Navigating the world of generative AI can feel overwhelming, especially for those just taking their initial steps into this burgeoning field. A question I am frequently asked is, “So, where do I start with generative AI? What’s the first step? And which tool should I use?”
The tool you choose will be contingent upon your specific needs and objectives. Do you wish to distill vast amounts of text? Craft compelling narratives? Or create stunning visual artwork? To assist you in making these decisions, I have compiled this introductory guide.
About This Guide
Let’s open by delving into the intricacies of this guide. It is crucial to grasp the underlying philosophy that informed my selection of tools, as it is integral to the overall value and utility of the guide.
First, you will note a distinct lack of mention of specific vendor names, although the marketplace is brimming with commendable companies. Moreover, you will find many named products conspicuously absent from this guide. What you will find, however, is a plethora of foundational technologies. This deliberate choice stems from the observation that foundational models evolve at a pace that often outstrips the ability of most vendors to keep pace. Case in point, less than a month ago, OpenAI unveiled GPT-4V, their groundbreaking multi-modal vision model capable of interpreting image data.
Adhering to best practices in AI necessitates staying abreast of foundational models, as their advancements translate into enhancements in your own capabilities. It is not advisable to become overly reliant on a specific SaaS vendor unless their offerings are exceptionally unique and unparalleled in the market. That said, I find such companies to be few and far between.
A plethora of AI companies essentially serve as polished interfaces for foundational models. So the question becomes, why not leverage the foundational models directly?
It is pertinent to note that this guide includes tools that are third-party software, such as ChatGPT. It is important to recognize that you do not own this software. There are numerous scenarios involving sensitive information, such as trade secrets or confidential data, where the utilization of third-party tools would be deemed inappropriate or even illegal. As a user, it is your responsibility to ascertain the appropriateness of a tool based on the nature of the data at hand.
Finally, let’s touch briefly upon the financial aspect. While there are free versions of these tools available, it is important to understand that their capabilities are limited in comparison to their paid counterparts. For instance, the free version of ChatGPT operates on the GPT 3.5 model, which is significantly less potent and less informed than its paid version. Currently, the free access to ChatGPT is three iterations behind the paid version. So if your budget allows, I strongly recommend investing in the premium versions of ChatGPT, Anthropic’s Claude, or both. Each are priced at $20 per month. If your intended use cases align with those outlined in this guide, I believe you will find the investment to be more than worthwhile. With that groundwork in place, let’s delve into the substance of this guide.
For Reading And Writing Text Under 8,000 Words: ChatGPT Plus with GPT-3.5 or GPT-4
Our initial categories focus primarily on text-based tasks. For those of you delving into text that doesn’t exceed 8,000 words in length, such as blog posts, emails, and the like, I wholeheartedly recommend ChatGPT Plus, a premium iteration that utilizes the GPT-4 model. The rationale for this preference lies in the immense popularity of this tool, which consequently boasts an extensive repertoire of examples and a vast community of knowledgeable individuals ready and willing to assist you should you encounter any obstacles. GPT-4 stands a full head and shoulders above its counterparts when it comes to tackling these more concise text-related tasks.
That said, it is important to note that GPT-4 is equipped with what is known in technical parlance as a “context window,” which essentially functions as its working memory, dictating the extent of its recollection capabilities. Currently, this context window ranges from 8,192 to 32,768 tokens, with tokens being akin to fragments of words approximately four letters in length. By way of illustration, the paragraph you’ve just read comprises approximately 90 words and 120 tokens.
This ChatGPT limitation, even in its paid version, becomes evident when engaged in prolonged conversations; it begins to exhibit signs of forgetfulness. Its memory operates akin to a rolling window, where new information supersedes the old, causing the latter to fade out of memory.
For Reading And Writing Text Over 8,000 Words: Anthropic’s Claude 2
For textual endeavors that surpass the 8,000-word threshold, my recommendation shifts to Anthropic’s Claude 2, given its formidable context window that accommodates up to 100,000 tokens, translating to roughly 70,000 words at any given time. So, you might be wondering then why one shouldn’t simply opt for Claude for all text-related tasks. The answer lies in Claude’s limitations. While commendable for its generous token limit, it tends to falter when it comes to reasoning. Despite these advancements, the GPT-4 model, especially when paired with ChatGPT Plus, remains the undisputed champion in a multitude of tasks, particularly those necessitating intricate reasoning and meticulous data analysis.
For Creating Images: Microsoft Bing Image Creator Or ChatGPT Plus with DALL-E
Where image generation is concerned, there are two commendable options at your disposal. On the free end of the spectrum we have the Microsoft Bing image creator, which leverages the prowess of OpenAI’s DALL-E 3 model to serve as its backend engine. This no-cost option generously provides users with the capacity to generate approximately 100 images daily. The prerequisites for this service include a Microsoft account (which you can create for free), and a Microsoft-supported web browser, with Microsoft Edge being a prime example, particularly for Mac users.
Furthermore, this feature is seamlessly integrated into the Bing mobile application, making it conveniently accessible across various mobile platforms. On the paid front, ChatGPT Plus emerges as an optimal choice. This premium version natively incorporates DALL-E 3, distinguishing it as my preferred option. Unlike the Bing interface, ChatGPT Plus facilitates a natural language conversation with ChatGPT, empowering you to fine-tune your images effortlessly. This innovative approach allows you to articulate your preferences verbally, such as desiring more diversity in the age and body types of people depicted in a photo, or opting for scrambled eggs instead of sunny side up. In response, ChatGPT Plus will intelligently adjust the DALL-E 3 prompts on the backend to match your specifications. This method is not only more user-friendly, but also significantly more intuitive for those new to image generation models. Hence, if you’re a beginner with the financial wherewithal to opt for ChatGPT Plus, then I wholeheartedly recommend choosing ChatGPT Plus with its integrated DALL-E 3 feature, as it currently stands out as a premier tool in the realm of integrated image generation.
For Analyzing Images: Google Bard or ChatGPT Plus with GPT-4V
While the creation of images is an impressive feat in itself, the ability to analyze these images adds another layer of complexity and utility. At present, Google’s Bard software and ChatGPT Plus are the trailblazers in offering image upload and analysis capabilities. Bard, being free of charge, has demonstrated considerable aptitude in image recognition during my evaluations. ChatGPT Plus, with its premium status, also now supports image uploads, thereby allowing users to pose queries regarding about uploaded images. For instance, you could upload an image of a homepage for UI analysis or a page from a book written in a different language for translation purposes. Additionally, the AI could be prompted to list common ingredients and potential recipes upon uploading a photo of a meal. It’s worth noting that ChatGPT Plus occasionally fumbles when faced with relatively obscure images, and there are certain limitations to be mindful of. Currently, neither Bard nor ChatGPT Plus is authorized to process images depicting human faces. As such, should you wish to upload an image featuring a human face, you’ll need to obscure the face to prevent triggering the model’s security protocols.
For Real-Time Information: Microsoft Bing or Google Bard
Now let’s talk about real-time information, including all of the latest news and current events that shape our ever-evolving global landscape.
Tools like ChatGPT, along with its various iterations, are fundamentally rooted in large language models. These models are meticulously trained on extensive datasets, the specifics of which remain unknown to us. But it is crucial to highlight that these datasets are not current. ChatGPT’s knowledge is capped at January 2022, transporting our searches back in time by over a year and a half.
Similarly, Anthropic’s Claude draws the line at October 2022, placing our searches a full year in the past. In situations where the most recent and accurate data is paramount, the logical step is to employ an artificial intelligence system that seamlessly integrates with real-time data sources. Here there are two contenders that rise above the rest: Google Bard and Microsoft Bing. What sets these tools apart is their adept utilization of large language models as the foundation upon which they construct intricate search queries. These queries are then relayed to the search engine, which transforms them into natural language. This methodology is currently the gold standard for acquiring the latest and most current information. For instance, if your quest for knowledge revolves around the ongoing conflict in Ukraine, it is these state-of-the-art services — Google Bard and Microsoft Bing — that will deliver the most up-to-date and relevant information when you need it.
For Analyzing Data: ChatGPT Plus with Advanced Data Analysis
Here we venture into the realm of data analysis, with a particular focus on tabular or rectangular data. While we have the option to manually input CSV or tabular data within the text prompts themselves, the more efficient and user-friendly approach is to upload a spreadsheet directly, thereby allowing the AI to perform comprehensive analysis on the data provided. This methodology isn’t restricted to just spreadsheets; it’s equally applicable to Excel files and a multitude of other data formats as well. Today, standing tall as the premier tool for data analysis, powered by the latest in artificial intelligence, is OpenAI’s ChatGPT Plus, which comes equipped with the Advanced Data Analysis plug-in. This combination isn’t just a tool; it’s a guide that meticulously walks you through each step of the analysis, all while generating operational code. This is not just about simplifying the process; it’s about empowerment, giving you the ability to preserve the generated code for future use. And that’s not all; this tool is adept at handling even the most intricate forms of data science and statistical analysis. In essence, if there’s a Python code for the data analysis you seek, then you can rest assured that ChatGPT Plus has the capability to craft it for you, and tailor it to your specific requirements.
For Sensitive Information / Restricted Topics: Open Source, Locally Run Models Like LLaMa 2 via LM Studio, etc.
In this digital age, while technology brings us countless conveniences, it is imperative that we tread cautiously when it comes to certain types of data. There are instances where it is simply non-negotiable to avoid third-party systems, particularly with data that is of a sensitive nature, such as personally identifying information, protected health information, classified documents, state secrets, trade secrets, and the like. The security of this data must be protected at all costs.
In situations that demand the utmost confidentiality, the ideal course of action is to employ an artificial intelligence that operates solely within the confines of your own network or computer. This ensures that your critical data remains securely within your system, never venturing out into potentially vulnerable territory. When it comes to safeguarding sensitive information, your best bet is to opt for Open Source models such as Llama 2, complemented by interfaces like LM Studio. Notably, both of these options are available at no cost. However, it is worth noting that these models require a bit more effort to setup and come with certain hardware prerequisites. For example, if your laptop has the capacity to seamlessly run a Triple-A video game at its highest resolution, then you can rest assured that it is more than capable of running these systems. Conversely, if your laptop struggles with video gaming, it may not be adequately equipped to handle these tools.
The beauty of these Open Source models lies in their flexibility and independence. Once installed and operational, you can completely disconnect from the internet, turn off Wi-Fi and unplug all cables, and the models will continue to function impeccably, because they run locally on your machine, thereby providing you with a secure and reliable means of utilizing artificial intelligence with confidential information.
For Writing Code: ChatGPT Plus or CodeLLaMa
Finally, we delve into the realm of coding, where there are now many options available to suit various needs. But when it comes to the versatility and comprehensiveness of coding languages, ChatGPT Plus, powered by the GPT-4 model, stands out as a stellar tool. This AI model is proficient in a diverse range of languages, from COBOL to contemporary languages such as Swift and Python. Pro tip for those in the financial sector or similar industries that rely on legacy mainframes. Should you find yourself in a bind due to a retired COBOL programmer, ChatGPT can now fill that void.
It is important to note that while ChatGPT Plus shines as a general tool, there are specialized open source models that may surpass the GPT-4 model in specific languages. A prime example is Code Llama, a remarkable tool that is particularly honed for Python coding. So, for those with a penchant for Python, Code Llama is your go-to model.
Let’s not lose sight of the fact that generative AI encompasses a vast array of applications, far beyond what has been discussed here. This compilation is intended to serve only as a foundational guide for those who are just beginning to dip their toes into the expansive ocean of generative AI, as well as a robust generalist toolkit for accomplishing a wide range of tasks.
As of this moment, these tools are the pinnacle of what the market currently has to offer. But we all know that the ever-evolving landscape of technology means that there are no guarantees. A new and potentially superior tool could emerge tomorrow, rendering this guide, and it’s currently cutting-edge tools, completely obsolete.