The trajectory of generative AI points in several possible directions. This article looks at where the field seems to be heading and how we should prepare for these advancements.
Firstly, large generative models like OpenAI’s GPT-4 are expected to keep growing. GPT-4’s knowledge still stops at September 2021, but the model is widely believed to have roughly an order of magnitude more parameters than its predecessor, which translates into better memory and stronger generation. With more parameters, a model’s capacity for pattern recognition expands. Early GPT-2 models could stay coherent for only a couple of sentences, GPT-3 began generating coherent paragraphs, and GPT-4 can now produce multiple pages of text. According to OpenAI’s developer documentation, GPT-4 can handle up to 32,000 tokens, or approximately 24,000 words. In practical terms, publicly accessible generative language models have evolved from producing paragraphs to churning out novellas.
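If you want to know whether a given document actually fits inside that window, you can measure it directly. Here is a minimal sketch using OpenAI’s tiktoken tokenizer; the file name is a placeholder, and the 32,000-token limit assumes the 32k variant of GPT-4.

```python
# Minimal sketch: count how many tokens a document consumes before sending it
# to a model with a fixed context window (e.g. GPT-4's 32,000-token variant).
# Requires the tiktoken package: pip install tiktoken
import tiktoken

def count_tokens(text: str, model: str = "gpt-4") -> int:
    """Return the number of tokens the model's tokenizer sees in the text."""
    encoding = tiktoken.encoding_for_model(model)
    return len(encoding.encode(text))

# "my_long_draft.txt" is a hypothetical file standing in for your own document.
draft = open("my_long_draft.txt", encoding="utf-8").read()
print(f"{count_tokens(draft)} tokens used of a 32,000-token context window")
```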
Future models like GPT-5 may manage 64,000 tokens, or roughly 48,000 words at the same tokens-to-words ratio, about the length of an average business book. However, as models expand, they demand more computational power to train and to run. Despite this, the public and corporations will likely embrace these models for their convenience.
Imagine enjoying a story that was never finished, whether because the author abandoned it or the show was cancelled. With current and upcoming large language models, ordinary individuals could feed in the incomplete story and receive a logical conclusion. The same applies to other large text projects, such as drafting books or polishing long blog posts.
To capitalize on larger models, plan now for how you will use them. Think about the long-form content you want to tackle and design comprehensive, multi-page prompts to get the results you want.
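As a concrete illustration, here is a minimal sketch of the unfinished-story idea using the openai Python package (v1+ client). The model name, file name, and system prompt are assumptions to be swapped for whatever you actually use.

```python
# Minimal sketch: feed an unfinished story into a large-context model and ask
# for a plausible conclusion. Assumes the openai package (v1+ client) and an
# API key in the OPENAI_API_KEY environment variable; file and model names are
# placeholders.
from openai import OpenAI

client = OpenAI()
unfinished_story = open("abandoned_novel.txt", encoding="utf-8").read()

response = client.chat.completions.create(
    model="gpt-4-32k",  # any large-context model you have access to
    messages=[
        {"role": "system",
         "content": "You are a fiction editor who finishes incomplete manuscripts "
                    "in the original author's voice."},
        {"role": "user",
         "content": f"Here is an unfinished story:\n\n{unfinished_story}\n\n"
                    "Write a logical, satisfying conclusion consistent with the "
                    "plot and tone so far."},
    ],
)
print(response.choices[0].message.content)
```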
However, bigger generative models aren’t always superior. Computationally, multibillion- or trillion-parameter models are slow, expensive to operate, and prone to oddities precisely because of their vast linguistic scope. This is driving the development of smaller, purpose-built custom models designed for specific tasks. Companies often begin with a small model like EleutherAI’s GPT-J-6B and fine-tune it for a single domain of expertise. While these models lose capabilities in unrelated areas, they excel at their focused function, often outperforming larger, general models.
Examples of custom models include BloombergGPT for financial analysis and GoCharlie’s CHARLIE LLM for content marketing. Forward-thinking companies with data that’s challenging for average users to process are increasingly opting for custom models tailored to their domain. These models also need less elaborate prompting, because their domain knowledge is highly concentrated.
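For teams exploring this route, the fine-tuning workflow itself is well trodden. Below is a minimal sketch using Hugging Face transformers to adapt GPT-J-6B to a single domain; the corpus file and hyperparameters are illustrative assumptions, and fine-tuning a 6-billion-parameter model in practice calls for substantial GPU memory or parameter-efficient methods such as LoRA.

```python
# Minimal sketch: fine-tune a small open model (EleutherAI's GPT-J-6B) on a
# single-domain corpus with Hugging Face transformers. Dataset path and
# hyperparameters are illustrative, not a recipe.
from datasets import load_dataset
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          DataCollatorForLanguageModeling, Trainer,
                          TrainingArguments)

model_name = "EleutherAI/gpt-j-6b"
tokenizer = AutoTokenizer.from_pretrained(model_name)
tokenizer.pad_token = tokenizer.eos_token
model = AutoModelForCausalLM.from_pretrained(model_name)

# Hypothetical JSONL corpus of in-domain documents, one "text" field per record.
dataset = load_dataset("json", data_files="domain_corpus.jsonl")["train"]
dataset = dataset.map(
    lambda batch: tokenizer(batch["text"], truncation=True, max_length=1024),
    batched=True, remove_columns=dataset.column_names,
)

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="gptj-domain", num_train_epochs=1,
                           per_device_train_batch_size=1,
                           gradient_accumulation_steps=8),
    train_dataset=dataset,
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
trainer.train()
```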
Software companies with moderately complex interfaces should consider integrating large language models to simplify tasks. For instance, Adobe could benefit from an LLM for tools like Photoshop, letting average users accomplish tasks through plain-language commands.
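One simple pattern for this kind of integration is to have the model translate a plain-English request into a structured command the application already understands. The sketch below assumes the openai package (v1+ client); the operation names are hypothetical stand-ins for whatever the host software actually exposes.

```python
# Minimal sketch: map a natural-language request onto one of a fixed set of
# application operations. The operations are made up for illustration.
import json
from openai import OpenAI

client = OpenAI()
OPERATIONS = ["remove_background", "resize_canvas", "adjust_exposure", "export_png"]

def interpret(user_request: str) -> dict:
    """Ask the model to pick one known operation and return it as JSON."""
    response = client.chat.completions.create(
        model="gpt-4",
        messages=[
            {"role": "system",
             "content": ("Map the user's request to exactly one operation from "
                         f"{OPERATIONS} and respond with JSON only, shaped like "
                         '{"operation": "...", "parameters": {...}}.')},
            {"role": "user", "content": user_request},
        ],
    )
    return json.loads(response.choices[0].message.content)

print(interpret("Cut out the subject and give me a transparent PNG at 1024x1024."))
```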
Generative AI and large language models typically produce mediocre to good results, rarely matching true experts’ skill. Nevertheless, they can be useful when world-class results are not essential. For instance, AI-generated music may suffice for background tunes in a video, though it won’t replace professional musicians.
Custom-built large language models can also be applied to marketing, architecture, or engineering domains, automating specific repetitive tasks and saving valuable time.
The third upcoming trend is multimodal technology, in which one mode of data is transformed into another. Tools like Stable Diffusion, Midjourney, and DALL-E already convert word prompts into images. The reverse process, transforming images into words, is arriving as well. This capability opens up practical applications, like generating written summaries of photos or data visualizations.
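Open models can already do a rough version of this image-to-text step. Here is a minimal sketch using the BLIP captioning model from Hugging Face; the photo path is a placeholder, and the caption quality is basic compared with what larger multimodal models promise.

```python
# Minimal sketch of the reverse direction (image in, words out) using an open
# image-captioning model (BLIP) from Hugging Face. Requires transformers,
# torch, and Pillow; the image file name is a placeholder.
from PIL import Image
from transformers import BlipForConditionalGeneration, BlipProcessor

processor = BlipProcessor.from_pretrained("Salesforce/blip-image-captioning-base")
model = BlipForConditionalGeneration.from_pretrained(
    "Salesforce/blip-image-captioning-base")

image = Image.open("conference_photo.jpg").convert("RGB")
inputs = processor(images=image, return_tensors="pt")
caption_ids = model.generate(**inputs, max_new_tokens=30)
print(processor.decode(caption_ids[0], skip_special_tokens=True))
```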
Andrej Karpathy’s quip on Twitter, “The hottest new programming language is English,” captures the idea of using natural language to program machines effectively.
The essential message here is to begin identifying non-text inputs, such as images, that you handle repetitively, and to plan how to process them with large language models.
The final direction these models are headed in involves inputs that aren’t text per se but can function as text. Consider a genetic sequence, which can be written like this:
AGTCATTGACATAAATCCAAGGATAATA
This string represents the four DNA bases in text form. Imagine if, instead of creating limericks, we developed a model specifically designed to work with DNA and RNA while employing the same foundational technology. What possibilities would emerge if you could input genetic data? What could be generated or predicted?
1.) Innovative gene therapies.
2.) Potential vaccines.
3.) Insights into protein folding and misfolding.
4.) Quite literally, a cure for cancer.
The technology behind tools like GPT-4 and ChatGPT can achieve this with sufficient training and specialization. These models are adept at handling text-like data and making predictions based on that data – it’s difficult to believe that the world’s largest pharmaceutical companies aren’t already exploring this.
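Representing a genome as text is not hand-waving; it is exactly how existing DNA language models feed sequences into the transformer architecture. The sketch below shows one common approach, overlapping k-mer tokenization (used by models such as DNABERT), applied to the sequence above; it is illustrative only.

```python
# Minimal sketch: treat a genetic sequence as "text" by splitting it into
# overlapping k-mers, the DNA analogue of words, before a language model
# consumes it. Purely illustrative.
def kmer_tokenize(sequence: str, k: int = 6) -> list[str]:
    """Break a DNA string into overlapping k-length 'words'."""
    return [sequence[i:i + k] for i in range(len(sequence) - k + 1)]

dna = "AGTCATTGACATAAATCCAAGGATAATA"  # the sequence from the article
print(kmer_tokenize(dna)[:5])  # ['AGTCAT', 'GTCATT', 'TCATTG', 'CATTGA', 'ATTGAC']
```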
Generative AI’s future involves larger models, custom models, multimodal applications, and non-text use cases. If we, as a society and civilization, manage this effectively, we’ll witness immense benefits and significant advancements. Conversely, if we fail, we risk exacerbating income inequality and intensifying competition over scarce resources. Regardless of the outcome, this is the trajectory we’re on in the immediate future – think a year or less.
Are you prepared? Is your organization? The time to start planning is now.
And so, to summarize, this is where Generative AI is headed:
1.) Expanding Large Generative Models
Models like OpenAI’s GPT-4 are becoming larger, enabling them to remember more and generate more. For instance, GPT-4 can handle up to 32,000 tokens or approximately 24,000 words, compared to GPT-3.5, which manages around 4,096 tokens or 3,000 words. This expansion allows AI to generate content ranging from paragraphs to novellas, and potentially even full-length books in the future. However, as models grow, they require more computational power to create and operate.
2.) Customized, Task-Specific Models
As larger models become more computationally demanding, there is a trend towards developing smaller, purpose-built models tailored to specific tasks. Companies like Bloomberg and GoCharlie have already created custom models for financial analysis and content marketing, respectively. These specialized models offer improved performance within their domains and require less specific prompts due to their focused knowledge.
3.) Multimodal AI Applications
Generative AI is moving towards multimodal applications, transforming one mode of data to another. Currently, tools like Stable Diffusion, Midjourney, and DALL-E can generate images from word prompts. The next step is to reverse the process, inputting images to generate text. This development will enable tasks such as summarizing data from screenshots or describing conference photos.
4.) Text-Like Inputs For Non-Text Applications
Generative AI can also be applied to text-like inputs, such as genetic sequences. By building models to work exclusively with DNA and RNA data, we could potentially generate novel gene therapies, vaccine candidates, or even cures for diseases like cancer.