PSL works with entrepreneurs to apply emerging software technologies to problems in order to create value. We’ve been experimenting with generative AI for several years. Researchers are making amazing breakthroughs at an increasing pace with techniques like GPT and DALL-E. These latest AI innovations enable machines to demonstrate common-sense reasoning, long-form text composition, and image creation. Still, there’s a lot to learn when applying these new techniques to the real world. Here are some of the challenges we’ve run into in this exciting space.
Quality. GPT-3 often generates confident but incorrect text. The art generated by DALL-E 2 and similar algorithms is not yet suitable for commercial use.
Continuity. GPT-3 doesn’t retain context from one request to another. Finding a prompt template that works consistently (given varying input) is difficult. Generating a sequence of images with DALL-E that contain the same subject is currently impossible.
Defensibility. It’s hard to build a defensible business model on top of a third-party API or an open-source model.
Cost. Foundation models are trained on huge amounts of data, with huge numbers of parameters. So, replicating the model is probably impossible for a startup. Depending on the prompts and number of calls your solutions require, using an API for someone else’s model may be cost-prohibitive.
These hurdles can be summarized simply as “timing.” Applying new technology at the right moment in the right markets is a huge challenge, but one we love. Here are some of the ideas we’ve tried, our results, and links to interesting related products. If you’re interested in founding a startup to address problems like these, please send us an email at email@example.com.
Advertisers want to show a diverse set of faces in their advertisements and stock photography. It’s reasonable to expect viewers will click more ads if the people in the ads look like them. Generating faces will soon be more cost-effective than expensive real-life photo shoots. Inspired by “this person does not exist,” we invented a novel way to navigate the space of all possible photos based on the desired age, gender, and expression dimensions. We did this work in 2019, based on an AI pipeline called StyleGAN, a year before DALL-E 1 was released. We called it Ganvatar and ran a trial with some alpha partners generating and testing ad variations.
What we learned is that it’s very difficult to disrupt advertisers’ existing high-touch processes. We could adjust the age, race, and gender of a single face in a photo, but we didn’t have a method to control lighting, setting, etc. According to advertisers, the quality of the images we were generating was not good enough yet.
GPT-3 can increase the efficiency of human writers. Business writers can answer more questions, and marketers get better exposure in Google when they remix their existing content.
In 2021, we created a blog post creator called WriteSpeed. We were a bit too early. At the time, OpenAI was rolling GPT-3 out slowly and frowned upon using it for blog posts (out of concern for fake, inappropriate, or spam content). The cost of GPT-3 API calls was also high at that time, but pricing has since come down. Quality was again a concern, as was differentiation, given the dependency on OpenAI. The scope of what GPT-3 could write is limited. Any solution based on these technologies today still requires a large amount of editing by the human user. Lastly, we also had a limited pipeline of prospective CEOs.
The WriteSpeed project turned us onto the emerging field of “prompt engineering.” You can think of GPT-3 as the ultimate auto-complete. The prompt (or text preamble) input is critical. GPT-3 is a very flexible machine if you’re creative with what you ask it to do. For example, we’ve begun to use it as a generalized text-summarization engine inside our other tools. It’s also a powerful engine for common-sense reasoning, fuzzy logic, and simple reasoning.
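To make the “auto-complete” framing concrete, here is a minimal sketch in Python of how a summarization prompt might be assembled. The function name, the wording of the preamble, and the trailing "Summary:" cue are illustrative assumptions, not the prompt used in any of the tools described here.

```python
def build_summary_prompt(text: str, max_sentences: int = 2) -> str:
    """Wrap arbitrary text in a preamble so a completion model continues with a summary.

    Hypothetical helper: the instruction wording is an assumption for illustration.
    """
    # Collapse runs of whitespace so the layout of the input does not leak into the prompt.
    cleaned = " ".join(text.split())
    return (
        f"Summarize the following text in at most {max_sentences} sentences.\n\n"
        f"Text: {cleaned}\n\n"
        # Ending on an open cue invites the model to "auto-complete" the summary itself.
        "Summary:"
    )
```

The returned string would then be sent to whichever completion endpoint is in use. Small changes to the preamble can change the output noticeably, which is why the prompt deserves the same care as code.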
However, keeping GPT “focused” for an unstructured blog post was very difficult. For example, we could use it to write "top 5 reasons to get a dog during the pandemic" but we couldn't write "Why we think the housing market will stagnate in 2023." It’s worth noting that GPT-3’s training data predates the pandemic, so it knows nothing about Covid-19, for example. More specific topics require more interaction by the human writer, or information extraction from the web, in addition to careful construction of fragile prompts.
We also tried writing fiction, from children’s books to romance novels. Here we ran into the character continuity issue (characters change from one request to the next) and also into OpenAI blocking certain types of content requests.
We discovered interesting capabilities with taking loose text and summarizing it into a structured form. For example, we were working on providing a free-text interface to a legacy form application. With a prompt such as
GPT-3 would respond with:
This structured information can then be used to programmatically fill out a form.
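A minimal sketch of that round trip in Python, under stated assumptions: the field names, the "field: value" reply format, and both helper functions are hypothetical and chosen only for illustration. The model call itself is left out, so the parser works on whatever reply text comes back.

```python
from typing import Dict, List


def build_extraction_prompt(free_text: str, fields: List[str]) -> str:
    """Ask a completion model to restate free text as 'field: value' lines."""
    field_lines = "\n".join(f"{name}:" for name in fields)
    return (
        "Extract the following fields from the message. "
        "Answer with one 'field: value' pair per line and leave unknown fields blank.\n\n"
        f"Message: {free_text.strip()}\n\n"
        f"Fields:\n{field_lines}\n\nAnswer:\n"
    )


def parse_extraction_reply(reply: str, fields: List[str]) -> Dict[str, str]:
    """Turn the model's reply into a dict, ignoring lines that are not requested fields."""
    wanted = {name.lower(): name for name in fields}
    result = {name: "" for name in fields}  # every requested field is present, possibly empty
    for line in reply.splitlines():
        key, sep, value = line.partition(":")  # split on the first colon only
        if not sep:
            continue  # skip chatter that has no 'key: value' shape
        name = wanted.get(key.strip().lower())
        if name is not None:
            result[name] = value.strip()
    return result
```

Because generated text is not guaranteed to follow the requested format, the parser is deliberately forgiving: unknown keys and stray lines are dropped, and missing fields come back as empty strings that the form code can flag for the user.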
We’ve even used GPT-3 to answer the question “Does person X live in Seattle?” An internal tool of ours does a Google Search via the Programmable Search API. It then feeds the links and summaries from the Google results into GPT-3 and asks it to summarize where the person lives. So, while large language models are not a replacement for humans yet, they have amazing flexibility as a generalized Natural Language Processing engine.
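The search-then-summarize flow can be sketched in Python as follows. This is an illustration under assumptions, not the internal tool: the search and completion steps are passed in as plain functions so that no particular client library is implied, and the "link" and "snippet" keys, the prompt wording, and the function names are all hypothetical.

```python
from typing import Callable, Dict, List

# Hypothetical signatures: a search step returning result dicts, and a text-completion step.
SearchFn = Callable[[str], List[Dict[str, str]]]
CompleteFn = Callable[[str], str]


def build_location_prompt(person: str, results: List[Dict[str, str]]) -> str:
    """Fold search results into a prompt that asks where the person lives."""
    lines = [f"- {r['link']}: {r['snippet']}" for r in results]
    return (
        f"Search results about {person}:\n" + "\n".join(lines) + "\n\n"
        f"Based only on these results, which city does {person} live in? "
        "Answer with the city name, or 'unknown'.\nAnswer:"
    )


def lives_in(person: str, city: str, search: SearchFn, complete: CompleteFn) -> bool:
    """Return True if the model's answer mentions the given city."""
    results = search(person)
    answer = complete(build_location_prompt(person, results))
    return city.lower() in answer.lower()
```

Injecting the two functions keeps the pipeline testable with canned data, and the final substring check is intentionally simple; a real tool would likely want a stricter comparison and a human review step for uncertain answers.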
The nature of programming is changing. As with writing, foundation models trained on code will soon let programmers train and guide processes into existence rather than code specific rules to handle every condition.
One idea we looked into was converting Figma designs into code. It’s hard for engineers to start from scratch when working from visual designs created by designers. The project was called CodeBrew. Demo video
As in the experiments above, we learned that the process still needs a human chaperone and can introduce subtle bugs. Also, Codex (the code-trained version of GPT-3) is a pragmatic code producer. Even if it produces working code, it won’t necessarily produce code in the style you want (or even very readable code). Many others have carried this work further with impressive demos of code generation for constrained scenarios.
Here are two more ideas we’re investigating to inspire you.
Datatron is an OpenAI Codex and GPT-3 mash-up that allows users to drag their data into a web browser and start asking questions. Under the hood, Datatron has heuristics that write initial code describing the contents and datatypes found in the CSVs. This code seeds future calls to Codex. Datatron decides as the user types whether the reply should be in English (in which case it uses GPT-3 to generate it), should be determined by writing code (in which case it writes it via OpenAI and executes it in browser), or needs a combo of both. Datatron can easily present new tables and basic chart types, and users can debug and tweak the generated code using an integrated code editor.
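As a rough illustration of the “heuristics that write initial code” step, the Python sketch below inspects a CSV and emits a comment header describing its columns, which could then be placed in front of a code-generation prompt. It is an assumption-laden sketch rather than Datatron’s implementation: Datatron runs in the browser, and the type rules, function names, and header format here are hypothetical.

```python
import csv
import io
from typing import List


def infer_type(values: List[str]) -> str:
    """Guess a column type from its string values: int, then float, else str."""
    non_empty = [v for v in values if v.strip()]
    if not non_empty:
        return "str"
    for name, cast in (("int", int), ("float", float)):
        try:
            for v in non_empty:
                cast(v)  # raises ValueError on the first value that does not fit
            return name
        except ValueError:
            continue
    return "str"


def describe_csv(name: str, csv_text: str) -> str:
    """Build a comment header describing a CSV, suitable for seeding a code prompt."""
    reader = csv.DictReader(io.StringIO(csv_text))
    rows = list(reader)
    columns = reader.fieldnames or []
    lines = [f"# Table `{name}` has {len(rows)} rows and these columns:"]
    for col in columns:
        kind = infer_type([row[col] or "" for row in rows])
        lines.append(f"#   {col}: {kind}")
    return "\n".join(lines)
```

Giving the model an explicit description of the table up front means later questions can refer to column names and types without the full data being resent on every call.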
Knowledge workers perform many repetitive tasks that can potentially be recorded (or described in text) and then generalized with generative AI techniques. Tools like OpenAI’s Codex (the backend of GitHub Copilot) will soon be used to generate scripts and macros that are currently created or recorded by humans. Because the scripts will be generated and refined using English, they will be more robust. “Clippy, every day, send me an email with the latest sales numbers from the accounting spreadsheet.” We’re looking for knowledge workers with repetitive tasks we can automate. Please email firstname.lastname@example.org if you’d like to be an early user.
GPT-based algorithms still need to be fine-tuned for your specific business. Realms will be a public/private space for generating, storing, and forking example sets. Critically, Realms will also provide automatic monthly fine-tuning so that your foundation model will have up-to-date knowledge of current events. Realms also enables generated output to follow the desired styles and best practices for your business. Realms is an online catalog of style guides for your marketing communications, contracts, web design, and stock photography.
Applying new technology at the right moment (and in the right verticals) is a huge challenge, but one we love. We believe the innovation and new customer opportunities introduced by this wave of AI are significant enough that, unlike prior AI waves where big incumbent companies benefited, startups stand to benefit disproportionately. If you’re interested in founding a startup to address problems like these, please send us an email at email@example.com.