In this article, we’re going to discuss the basics of large language models like GPT-4, PaLM, etc., and the interfaces they use, such as ChatGPT, Bing, Bard, Microsoft Copilot, etc.
How do I write better prompts?
That is the number one question people ask. So today, that’s what we’re going to tackle: how to write better prompts for large language models.
We’re going to rely on how the technology works to inform our protocols and processes for writing better prompts. For the most part, I’ll be discussing the models released by OpenAI, such as InstructGPT, GPT-3.5 Turbo (the default for ChatGPT), and GPT-4.
First, let’s discuss what these models are capable of and what specific tasks they were trained to do.
Part One: Prompt Types and Structure
About a year ago, OpenAI released a research paper on InstructGPT, the immediate precursor to GPT-3.5, the model ChatGPT now uses.
ChatGPT launched with that precursor model back in November 2022.
OpenAI specified a collection of six core types of tasks that the model performed well on and that people were using it for:
- Generation and Brainstorming
- Knowledge Seeking via Open and Closed Questions and Answers
- Conversation
- Rewriting
- Summarization and Extraction
- Classification
So, what are these tasks?
Based on the documentation, they break down as follows:
➜ Generation and Brainstorming should be fairly obvious. It includes tasks such as content creation that result in completed content like a first draft, outlines of content, or some other form of content. Ironically, this is the category large language models are least good at. We’ll come back to that later.
➜ Knowledge Seeking via Open and Closed Questions and Answers involves using the language model like a search engine. Open questions invite free-form answers; closed questions constrain the model to a fixed set of possible answers, like a multiple-choice test.
➜ Conversation refers to actual chat, with people having real conversations with the models.
➜ Rewriting involves taking a piece of text and rewriting it in some way. This is one of the tasks that these models excel at.
➜ Summarization and Extraction involves feeding a model a large amount of text and having it condense or extract the text. This is another task large language models excel at.
➜ Classification involves giving a model a lot of text and having it perform classifying tasks.
Now, are there emergent tasks that don’t fall into these categories neatly? Yes, there are, as well as tasks that are a combination of one or more of these categories.
Somewhat ironically, the task people seem to use these models the most for, which is generation, is the task these models tend to do the least well at. That’s not to say they do it badly, but it’s the most complex and difficult task with the highest likelihood of unsatisfactory results.
Why? Because the underlying architecture of the models is designed for transformation. GPT stands for Generative Pre-trained Transformer. Transformers are really good at understanding the relationships among words and keeping track of context, which is why they can create such realistic text.
When companies like OpenAI make these models, they train them on billions of pages of text to create essentially a massive probability matrix, like a probability library. When we work with these models, we are using these pre-trained probabilities. Hence the “P” in “GPT”: it’s a “Pre-trained” transformer.
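To make the "probability library" idea concrete, here is a deliberately toy sketch: a bigram table built from a tiny corpus. Real GPT models use transformer neural networks over billions of pages, not lookup tables like this, so treat it purely as an analogy for how next-word probabilities get baked in during pre-training.

```python
from collections import Counter, defaultdict

# Toy corpus standing in for "billions of pages of text".
corpus = (
    "the dog wears a seat belt . "
    "the dog likes the car . "
    "the cat likes the dog ."
).split()

# Count how often each word follows each other word.
follows = defaultdict(Counter)
for prev, nxt in zip(corpus, corpus[1:]):
    follows[prev][nxt] += 1

# Convert counts to probabilities: P(next word | previous word).
probs = {
    prev: {w: c / sum(counts.values()) for w, c in counts.items()}
    for prev, counts in follows.items()
}

# In this corpus, "the" is followed by "dog" 3 times out of 5,
# so P(dog | the) = 0.6 — the "pre-trained" probability we look up.
print(probs["the"]["dog"])
```

Once the table is built, "using the model" is just looking up those stored probabilities, which is the sense in which the "P" in GPT (Pre-trained) matters for prompting: your words decide which probabilities get looked up.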
So, how does this relate to the six categories and writing better prompts?
Well, consider the extent to which a machine must guess probabilities during generation tasks.
If you say, “write a blog post about the importance of seat belts in cars” as a prompt, the machine has to delve into its table of probabilities, its library of probabilities, to understand what a car is, what seat belts are, why they are important, what a blog post is, and then generate patterns of probabilities to answer the request.
That’s why, when you write a short prompt for a generation task, you tend to receive lackluster outputs—outputs filled with bland language.
The machine has to guess many probabilities to fulfill that request, and when it does, the output is quite generic.
Contrast this with a prompt like, “rewrite this text, fixing grammar, spelling, punctuation, and formatting,” followed by the actual text.
What is the mechanism? What does the machine need to do?
It needs to scan the original text and examine the probability of words in its library of probabilities. It must look at the actual probabilities, the actual relationship of the words in the text you provided, and essentially fix the text based on its library of probabilities.
That’s why these machines, these tools, excel at tasks like rewriting. They don’t have to create anything. They don’t have to guess probabilities. They’re essentially just editing.
Think about this in your own life. Is it easier for you to write or edit?
Chances are, most people find it easier to edit something they’ve written rather than face a blank page. There’s that blinking cursor, and you’re like, “Oh, lovely.” It’s easier to edit something than it is to conquer the blank page.
So, let’s revisit the task list.
Which tasks use existing information versus which tasks require the machine to create something new? In other words, which is a writing task versus an editing task?
Generation and Brainstorming? That’s writing.
Knowledge Seeking via Open and Closed Questions and Answers? That’s writing.
Conversation? That’s writing.
Rewriting? That’s editing.
Summarization and Extraction? That’s editing.
Classification? Mostly editing.
So, what does this mean when it comes to prompts?
The more writing a machine has to do, the longer and more complex your prompts must be to provide it with the necessary raw materials required to produce a satisfactory output.
“Write a blog post about dogs,” is a terribly short prompt that will yield poor results.
A page-long prompt about specific dogs you care about, including their characteristics, data, breeding history, diet, etc., will produce a much more satisfying result for a generation task.
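One way to make that page-long prompt repeatable is to assemble it from a structured brief. The helper below is hypothetical (the function name, breed, and field names are all illustrative, not from any OpenAI documentation); it just shows how specific details turn a terse prompt into a detailed one.

```python
# Hypothetical helper: turn a structured "creative brief" into a
# detailed generation prompt. All field names here are illustrative.
def build_generation_prompt(topic: str, details: dict) -> str:
    lines = [f"Write a blog post about {topic}."]
    for heading, info in details.items():
        lines.append(f"{heading}: {info}")
    return "\n".join(lines)

prompt = build_generation_prompt(
    "Bernese Mountain Dogs",
    {
        "Characteristics": "large, calm, tri-color working breed",
        "Breeding history": "Swiss farm dog bred for draft and droving work",
        "Diet": "large-breed formula, careful calorie management",
    },
)
print(prompt)
```

Every extra field gives the model more concrete probabilities to latch onto, which is exactly the effect the creative-brief analogy describes.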
Again, we see this in the real world.
Suppose you hire a freelance writer.
How long does your creative brief need to be to help them generate a good result?
If you say, “Hey, write a 1500-word article about dogs,” any good writer is going to ask, “Could you be a little more specific?”
If you tell that same writer, “Hey, I need a 1500-word article about this specific kind of dog, its breeding habits, and all kinds of other details,” then you’re going to receive a better article from that writer.
Now, if you hire an editor, how detailed do your instructions need to be to help them generate a good result?
I would wager that the instructions you need to give to an editor will be shorter than the instructions you need to give a writer.
Still not so short that you say, “Here’s my book. Edit it.” That’s still not going to give you a good result.
However, you might say, “Hey, I want you to perform grammar edits or spelling edits or developmental edits,” so you give them something to focus on, but it’s still going to be shorter than what you have to provide in the creative brief to the writer.
The same is true for large language models like ChatGPT and the GPT-4 model.
For an editing task, a prompt like “fix grammar, spelling, punctuation, and formatting,” along with providing the text itself, is going to yield a very satisfactory result, despite the shortness of the prompt, because it’s an editing task.
That’s Part One of how to write better prompts. Now let’s tackle:
Part Two: Prompt Formatting
What should the format of a good prompt be?
Well, it depends on the system and the model.
For OpenAI’s ChatGPT and the GPT family of models, the company is very clear about how it wants developers, the people who write code, to interface with its models.
And this, by the way, is the secret to getting good prompts. Read the documentation for developers to see what they tell developers to do.
OpenAI offers a playground for developers to work with ChatGPT. What we see in the developer’s version of ChatGPT are three components: a system prompt, a user prompt, and an assistant.
The system part of the prompt intake is what we call a role. Here we define what role the model will be or what role it’s going to take.
For example, we might say, “You will act as a B2B marketer. You have expertise in B2B marketing, especially lead generation and lead nurturing. You specialize in email marketing and email newsletters as key parts of an audience retention and engagement strategy.”
This role statement is essential for the model to understand what it’s supposed to be doing, because the words used here help set guardrails. They help to refine the context of what we’re talking about. They give the model more to look up in its probability library.
The second part of the prompt is the user statement. This is where we give the model specific directions.
For example: “Your first task is to write a blog post about the importance of a weekly email newsletter in an overall marketing strategy.”
These instructions are what the model carries out.
The third part is the assistant part, where the model returns information. So, for writing tasks, having a robust system statement and an equally robust user statement is essential to getting the model to perform well.
The more words, the more text we provide, the better the model is going to perform because it basically means the model has to generate fewer wild guesses. It has more probabilities to look up and latch onto.
For editing tasks, you may not even need the system statement, because you’re providing all the text you want the model to work with.
The art and science of writing prompts is a discipline called prompt engineering.
It’s a form of software development, but instead of writing in a language like C, Java, or Python, we’re writing in plain natural language. But we are still giving directions to a machine for repeatable output.
That means we are programming the machine; we are developers, we are coders—we’re just coding in regular language.
For your prompts to perform better with these machines, adhere to the way the system is architected and designed, stick to the way the models work best, understand the different classes of tasks and what you’re asking of the machine, and then provide appropriate prompts for the kind of task you’re performing.
The bottom line? Always include a detailed system statement in any writing task. Optionally include one in editing tasks. And don’t be afraid to be very detailed with either, as you will always get better results.
Why is this method of prompt engineering different from the “Top 50 ChatGPT Prompts” webinar or eBook currently being advertised on your social media feed?
This method aligns with how the technology actually works, how it was built, and how companies like OpenAI are instructing traditional software developers to communicate with their models for optimal performance.
When you know how something works, you can generally make it work better, and that’s why this method will likely work for you.