Knock Knock. Who's There? Understanding GPT-3 Prompt Engineering

Apps wrapping GPT-3 in a domain-specific interface are starting to appear, and I wanted to give non-technical folks some insight into how these apps are being built. It's really fascinating and probably not what you'd expect.

You already understand GPT-3 because you understand Knock-Knock jokes. If I asked you to complete the following prompt, what would you say?

Knock knock.

If you're an English speaker, you know the answer is almost certainly "Who's there?". You know this because "Knock Knock" is a common PROMPT for which the COMPLETION is a particular formulaic joke.

Now imagine a Knock Knock Joke AI. It embeds within it the structure of a knock knock joke, a memory of every knock knock joke ever told, and other useful patterns of human language.

Give the Knock Knock Joke AI the first part of a joke (PROMPT), say "Knock knock. Who's there? Banana.", and it can generate various plausible COMPLETIONS. For this prompt, "Banana who?" is a likely COMPLETION, and "One hovercraft, please" is an unlikely one.

This is all you need to know to understand how some of the apps using GPT-3 work. GPT-3 has read every knock knock joke on the internet, as well as everything else. It embeds that information in a representation that allows it to create COMPLETIONS for all sorts of PROMPTS.
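If a sketch helps, here's that mental model in toy Python form. The knock_knock_ai function below is entirely made up for illustration, with hard-coded scores standing in for what a real model learns from text; the point is only the shape of the thing: a PROMPT goes in, scored COMPLETIONS come out.

# Toy illustration of the PROMPT -> COMPLETION idea.
# knock_knock_ai is a made-up stand-in for a language model: it takes
# the start of a joke and returns plausible continuations with scores.
def knock_knock_ai(prompt):
    # A real model derives these scores from everything it has read;
    # here they are hard-coded just to show the shape of the output.
    completions = {
        "Knock knock. Who's there? Banana.": [
            ("Banana who?", 0.92),
            ("One hovercraft, please.", 0.0001),
        ],
    }
    return completions.get(prompt, [])

for completion, score in knock_knock_ai("Knock knock. Who's there? Banana."):
    print(f"{score:.4f}  {completion}")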

To get GPT-3 to do any one particular thing reliably (write restaurant reviews, suggest startup ideas, simplify headlines), you have to find the right prompt to produce the completion you want. You have to find a Knock Knock joke pattern it understands.

Here's an example: say I want a GPT-3-powered app that shows sightseeing recommendations on a map. I'll need to find a PROMPT that reliably causes GPT-3 to return a COMPLETION with tourism recommendations, no matter which location I plug in.

Right now, many of the first wave of GPT-3 apps are built around a useful PROMPT found through trial and error. Here's one that works well for sightseeing recommendations. Just replace $YOUR_CITY with a new city name.

New York, USA: Empire State Building, Lincoln Center, Union Square
Taipei, Taiwan: Taipei 101, Maokong, National Palace Museum
San Francisco, USA: Golden Gate Park, Embarcadero, Haight-Ashbury
$YOUR_CITY: 

And here's an example COMPLETION of that PROMPT when $YOUR_CITY is replaced with Madrid, Spain. No cherry-picking of results was performed.

Gran Via, Santiago Bernabeu Stadium, Torre Tagle, La Gran Espanola, Toledo
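For the technically curious, an app built around this pattern might look roughly like the sketch below. I'm using OpenAI's Python library as it existed at the time of writing; the engine name, the sampling parameters, and the get_sights wrapper are my own illustrative assumptions, not the recipe of any particular app.

import openai  # assumes the OpenAI Python client is installed and an API key is configured

PROMPT_TEMPLATE = (
    "New York, USA: Empire State Building, Lincoln Center, Union Square\n"
    "Taipei, Taiwan: Taipei 101, Maokong, National Palace Museum\n"
    "San Francisco, USA: Golden Gate Park, Embarcadero, Haight-Ashbury\n"
    "{city}: "
)

def get_sights(city):
    # Substitute the city into the PROMPT and return GPT-3's COMPLETION.
    # Engine name and sampling parameters are illustrative guesses.
    response = openai.Completion.create(
        engine="davinci",
        prompt=PROMPT_TEMPLATE.format(city=city),
        max_tokens=40,
        temperature=0.7,
        stop="\n",  # stop at the line break so we only get one city's list
    )
    return response.choices[0].text.strip()

print(get_sights("Madrid, Spain"))

Notice that the few-shot examples in the PROMPT do the real work here; the code is just string substitution plus one API call.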

So how are many of these initial GPT-3 apps being made? Essentially through a weird new kind of conversation engineering. Prompt engineering. Because GPT-3 learned from human language directly, the only way to access its knowledge is via more human language. Wild, right?

"So is this as powerful as I thought it was?" Yes and no. The above comments are about the public version of GPT-3: a pre-trained and general purpose model. The generality is why all this prompt exploration is necessary. (And it's astounding it's even possible)

The real power of GPT-3 will come from the second wave of apps that fine-tune this general-purpose model on specific datasets. This, I suspect, is what Microsoft was really after when they licensed the model exclusively.