The AI world has seen some groundbreaking progress lately. One case in point: Fable Studios has successfully harnessed a collection of AI systems, aptly named SHOW-1 Showrunner Agents, to recreate an episode of the popular TV show, South Park.
From a casual viewer’s standpoint, the AI-created episode is passable. While it might not have you rolling on the floor laughing, it captures the essence of the show so convincingly that one might chalk up any discrepancies to an off day for the writers.
This raises a fascinating question, a Turing test for our time: can consumers discern whether a piece of content is machine-generated? More importantly, would they care?
Diving into Fable’s paper and the demonstration video reveals some intriguing facts. First, the AI-recreated episode’s accuracy is remarkably high. Using OpenAI models, the creators achieved decent comedic value, though it was slightly muted compared to South Park’s traditional edginess; an uncensored open-source model might have better captured the show’s distinctive humor.
Astonishingly, the SHOW-1 system crafted this 22-minute episode in just 3 hours and 15 minutes. For anyone not in the know, this kind of turnaround time is unheard of in the entertainment industry, where just agreeing on a plot could consume this amount of time or more. Nevertheless, it’s crucial to factor in the training time required for the Stable Diffusion models. Constructing 1,200 characters and 600 background images can take a couple of days using modern GPUs. However, this is a one-time upfront cost in the production process, with no repeat required unless the output quality degrades.
The implications are stark. For formulaic shows like South Park or The Simpsons, with decades’ worth of content available for training, AI could feasibly generate new episodes. Original shows, lacking this wealth of training data, would still need a human touch. However, for a franchise with 3-5 seasons, AI could very well continue the narrative. While it’s no surprise that South Park’s simplistic animation was easy to duplicate, it remains a considerable achievement.
The rapid progress of AI is astounding, as showcased by the relative ‘infancy’ of the Stable Diffusion model, which was released barely a year ago. The advancement of this technology is akin to a toddler transitioning from finger painting to commercial animation in a single year.
This scenario underlines a broader lesson: Ensemble systems like SHOW-1 are more proficient than single, larger systems. It’s unrealistic to expect AI systems to be ‘one-size-fits-all’ solutions. The most effective AI strategy involves an ensemble of tools, AI or otherwise, working in harmony, much like a chef using various appliances for distinct purposes in creating the perfect meal.
And now, for the second major development. Meta (previously Facebook) has launched LLaMa 2, its new language model, as promised. With significant enhancements over its predecessor, LLaMa 2 matches the performance of OpenAI’s GPT-3.5-Turbo, the engine that powers ChatGPT, in many standard AI benchmarks.
This is monumental news. LLaMa 2, as an open-source release, surpasses many other open models in benchmark comparisons, sometimes significantly, and in certain cases it even outperforms models with higher parameter counts. For those unacquainted, the ‘xB’ in a model’s name (LLaMa 2 ships in 7B, 13B, and 70B variants) refers to its number of parameters, in billions, akin to the density of toppings on a pizza: more parameters improve accuracy (taste) but may slow down the model (cooking time).
LLaMa 2, however, delivers superior performance without sacrificing speed.
Further, LLaMa 2 is closing the gap with closed-source models by matching the quality of OpenAI’s GPT-3.5. GPT-4 still holds the crown, but LLaMa 2, backed by a robust open-source community, is catching up quickly, promising exciting advancements in the world of language models.
Mastering version control is among the most formidable challenges in enterprise software management. When deploying software to an audience in the thousands or millions, be it employees or customers across the globe, you need a system that delivers consistent performance under standard operating conditions. That software must stay reliable over long periods, which is why production systems often lag behind consumer hardware in even basic operating system updates. Unreliable software deployment on a global scale is simply a risk enterprises cannot shoulder.
These software lifecycles can span years. Case in point: Windows 10, released in 2015, still dominates the PC market 8 years later, with a 71% share of Windows installs, per Statcounter data.
Model lifecycles at companies like OpenAI, by contrast, operate at a much swifter pace than enterprises are accustomed to. For instance, OpenAI recently announced that it is sunsetting its older models and transitioning everyone to the Chat Completions API by January 2024, within a year of that API’s launch. A single year might be a sizable timeframe in the AI industry, but in enterprise software it is a fleeting moment. For large corporations, a two-year timeline for software deployment isn’t uncommon. Now imagine implementing a major code update in the middle of that.
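To make the scale of that migration concrete, here is a minimal sketch of what the switch looks like with the OpenAI Python SDK (the 0.x series current as of this writing); the model names and the prompt are illustrative examples, not a recommendation.

```python
# A minimal sketch of the migration to the Chat Completions API, assuming the
# openai Python package (0.x series) and an API key in the OPENAI_API_KEY
# environment variable. Model names are illustrative examples.
import openai

prompt = "Summarize this support ticket in one sentence: ..."

# Legacy Completions style (being sunset): a single free-form prompt string.
legacy = openai.Completion.create(
    model="text-davinci-003",
    prompt=prompt,
    max_tokens=100,
)
print(legacy["choices"][0]["text"])

# Chat Completions style: a list of role-tagged messages.
chat = openai.ChatCompletion.create(
    model="gpt-3.5-turbo",
    messages=[
        {"role": "system", "content": "You are a concise support assistant."},
        {"role": "user", "content": prompt},
    ],
)
print(chat["choices"][0]["message"]["content"])
```

The call shapes and response formats differ, which is exactly the kind of change that ripples through every integration point in a large codebase.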
Possessing software that you can download and run on local hardware offers significant control. It grants you authority over versioning, deployment, and user experience, because the decision to distribute new versions rests with you and your corporate IT team. With Meta’s LLaMa 2 model, you can integrate a large language model interface into your enterprise and keep it stable across your internal distribution until you’re ready to upgrade, on your terms.
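As a rough illustration, here is a minimal sketch of running LLaMa 2 entirely on your own hardware, assuming the Hugging Face transformers library (plus accelerate) and access to the gated meta-llama/Llama-2-7b-chat-hf checkpoint, which requires accepting Meta’s license; the prompt is a made-up example.

```python
# A minimal sketch of running LLaMa 2 locally, assuming Hugging Face
# transformers + accelerate and access to the gated
# meta-llama/Llama-2-7b-chat-hf checkpoint (accept Meta's license first).
from transformers import AutoModelForCausalLM, AutoTokenizer, pipeline

model_id = "meta-llama/Llama-2-7b-chat-hf"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

generate = pipeline("text-generation", model=model, tokenizer=tokenizer)

# Once the weights are downloaded, nothing here calls an external API:
# version pinning, rollout, and uptime are entirely in your hands.
result = generate(
    "Explain our expense-report policy in plain language:",
    max_new_tokens=200,
    do_sample=True,
    temperature=0.7,
)
print(result[0]["generated_text"])
```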
Another major benefit of LLaMa 2 is its cost-effectiveness. Until your application reaches a threshold of hundreds of millions of monthly users (per the license terms), it’s free. A similar strategy was adopted by Google for Android, propelling it to become nearly ubiquitous in mobile devices. Meta seems to be emulating this strategy, giving away high-end software in the hopes of setting the standard.
Most importantly, it democratizes access to large language models. Even individuals who can’t afford OpenAI’s or Anthropic’s APIs, particularly those in developing countries, can leverage LLaMa 2’s near cutting-edge performance for free. In effect, capabilities on par with ChatGPT’s free tier become accessible globally, without the need for a credit card.
Why did Meta adopt this approach? Firstly, because it fosters innovation: the nature of open-source software encourages hundreds of thousands of developers to contribute improvements, which Meta can fold back into its internal models.
Secondly, it acts as a safeguard against an AI monopoly, given the looming threat from major providers like OpenAI. By releasing open-source models that can become industry standards, Meta avoids becoming an OpenAI customer and instead positions its own model as the gold standard. The strategy appeared to bear fruit immediately: within the first 24 hours of posting, the Hugging Face model hub listed 301 derivatives of the LLaMa 2 model.
For consumers and businesses alike, LLaMa 2’s launch is a milestone. It allows us to integrate versions of the model into our own products without licensing or fee concerns, and it functions even without an Internet connection. We can now incorporate one of the most proficient models into an array of software packages, from accounting to web development to film production software; essentially, any software that could benefit from a natural language interface. With LLaMa 2’s introduction, I expect language model interfaces to pervade commercial software, rapidly leaving those without one behind the curve.
Given its open-source nature, which allows anyone to download the model for free, fine-tuning it for specific tasks becomes feasible using a range of advanced techniques. We could even fine-tune it to strip out unnecessary abilities, like telling jokes, while enhancing its capability to pick stocks, enter medical records data, or detect stress in written customer communication. LLaMa 2’s architecture is conducive to fine-tuning, and because it’s software you install on your own machines, the tuning process is under your control.
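As one example of what that can look like in practice, here is a minimal sketch of parameter-efficient fine-tuning with LoRA adapters, assuming the Hugging Face transformers and peft libraries; the hyperparameters and the stress-detection use case are illustrative, not a recommended recipe.

```python
# A minimal sketch of parameter-efficient fine-tuning (LoRA) on LLaMa 2,
# assuming Hugging Face transformers + peft; values are illustrative only.
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import LoraConfig, get_peft_model

model_id = "meta-llama/Llama-2-7b-chat-hf"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

# LoRA freezes the base weights and trains small adapter matrices instead,
# so domain-specific tuning (e.g. flagging stress in customer emails)
# fits on far more modest hardware than full fine-tuning would require.
lora_config = LoraConfig(
    r=8,
    lora_alpha=16,
    target_modules=["q_proj", "v_proj"],  # attention projections in LLaMa-style models
    lora_dropout=0.05,
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()  # typically a small fraction of the full model

# From here, a standard Trainer loop over your labelled examples
# (e.g. customer messages annotated for stress) updates only the adapters,
# and everything stays on infrastructure you control.
```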
The impact that this model and its successors will have on the large language model and generative AI landscape is enormous. For tech enthusiasts, honing your skills with these models is essential: sooner or later, stakeholders will ask you to embed a language model into your products or services, customer care included, because these models deliver high-quality output at virtually no extra cost. Regular business users and consumers can expect large language models to become an integral part of their software, which makes an understanding of prompt engineering vital to getting the most out of them.
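Prompt engineering is quite concrete with LLaMa 2, because its chat variants were trained on a specific instruction format. The sketch below shows that format with an illustrative customer-care system prompt, intended to be fed into a local text-generation pipeline like the one sketched earlier.

```python
# A minimal sketch of prompt engineering for the LLaMa 2 chat models, which
# expect the [INST] / <<SYS>> instruction format; the system prompt and the
# customer message below are illustrative examples.
system = (
    "You are a customer-care assistant. Reply politely in two sentences, "
    "then label the message FRUSTRATED or CALM on a final line."
)
user_message = "This is the third time my order has shipped to the wrong address."

prompt = f"<s>[INST] <<SYS>>\n{system}\n<</SYS>>\n\n{user_message} [/INST]"

# Feed `prompt` to a local text-generation pipeline; in practice, tightening
# the system instructions is usually the highest-leverage change you can make.
```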
Used wisely and responsibly, these novel models and technologies signal the arrival of unprecedented capabilities and enhancements. Imagine conversing with almost anything and receiving coherent responses. Imagine engaging in meaningful dialogue with machines as seamlessly as you would with a fellow human. We are entering the Intelligence Revolution, a time when humans are bolstered and empowered by our machines. As a developer, manager, or creator, you have the opportunity to lead in this Intelligence Revolution, fortified by AI. The future is now.