Generative AI: The Operating System of the Future (Part 1)

We are at the cusp of one of the largest technological inflections of our generation. Some industries will be uprooted and replaced; others will grow significantly as new markets are unlocked. With the proper framework, investors can be ready to take advantage of these inflections. This research article is meant as a primer on Generative AI: what it is, how it works, and how it will evolve.

Generative AI Primer

The rise of Generative AI has ushered in a new era: the mass production of intelligence. Intelligence, unlike the products of previous general-purpose technologies, has unlimited demand; it drives both innovation and efficiency. ChatGPT was the big bang of Generative AI: within two months it had 100 million users, making it the fastest-growing application in history. VC investments in Gen AI increased almost 400% in 2023, reaching almost $30B, and Gen AI and LLMs are in the lexicon of almost every management team today. The appetite for Generative AI-driven workloads, such as large language models (LLMs), is voracious, fueled by a continuous stream of new applications such as text-to-video generation.

The pivotal question is the pace at which Gen AI will improve. Will it accelerate faster than expected? Will there be another AI winter? Are LLMs an off-ramp on the road to truly intelligent AI? Currently, these models are scaling so fast that Moore's Law looks like it is standing still: training computation is growing at a rate more than 100x that of GPU performance, as shown in the graph below. GPT-4 required orders of magnitude (roughly 1,000x) more training computation than the original GPT, released only a few years earlier. AI models are growing roughly 1,000x every two to three years because their performance keeps improving with the amount of data they process. We will go into these 'scaling laws' in more detail later in the article.

How do LLMs work?

LLMs are prediction machines. They differ from traditional predictive systems, like those behind Google search, which query a database of similar words and select the highest next-word correlation. LLMs, on the other hand, understand intent and context, an ability they acquire by training on vast amounts of text data. The architecture underpinning this training is the Transformer. Transformers convert text into numerical representations, or "tokens," which can be thought of as vectors in a very high-dimensional space. Through their attention mechanisms, they emphasize the most relevant aspects of the text, assigning weights to these representations based on their importance in making sequence predictions; this is how they derive proper context. The architecture's parameters can number in the trillions, each with a specific weight, enabling LLMs to handle vast amounts of unstructured data efficiently. When an input is provided, these parameters and their corresponding weights interact to generate an output. As Karpathy succinctly put it, LLMs are a form of "lossy compression": ChatGPT compressed the text corpus of the internet, around 10 terabytes, into about half a terabyte, yet it can answer almost anything a person would ever want to know.
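To make the attention idea above concrete, here is a toy sketch (not a real transformer, and far simpler than production models): a handful of "tokens" represented as small vectors, with each token's output becoming a relevance-weighted blend of the whole sequence.

```python
import numpy as np

# Toy illustration: four "tokens" as 3-dimensional vectors. Real models use
# learned embeddings with thousands of dimensions and separate projections.
np.random.seed(0)
tokens = np.random.randn(4, 3)

def attention(x):
    """Scaled dot-product self-attention (identity projections for brevity):
    each token's output mixes in context from every other token."""
    d = x.shape[-1]
    scores = x @ x.T / np.sqrt(d)                   # pairwise relevance scores
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)  # softmax: each row sums to 1
    return weights @ x                              # context-weighted mixture

out = attention(tokens)
print(out.shape)  # each of the 4 tokens now carries context from the sequence
```

The softmax row weights are exactly the "importance" weights described above: tokens most relevant to the prediction dominate the mixture.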

LLMs operate in two main stages: training followed by inference. During training, the LLM ingests a vast corpus of unstructured data, learning to predict the next word in a sentence and developing a broad understanding of language. It then undergoes fine-tuning, where its parameters are adjusted using more curated datasets and techniques like Reinforcement Learning from Human Feedback (RLHF), in which large teams of human raters compare the LLM's answers and pick the better one, giving it a feedback loop. Once fine-tuning is complete, the model can perform inference, akin to applying learned knowledge to generate responses. The cost of inference, or the computational expense per query, is falling rapidly due to more efficient hardware and querying methods; this trend is why ChatGPT now offers a limited free version and other models are completely free.
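The two stages can be caricatured with a deliberately tiny stand-in (a bigram counter rather than a neural network): "training" extracts next-word statistics from a corpus, and "inference" applies them to new queries.

```python
from collections import Counter, defaultdict

# Toy stand-in for pre-training: learn next-word statistics from a corpus.
corpus = "the model predicts the next word and the next word again".split()

counts = defaultdict(Counter)
for prev, nxt in zip(corpus, corpus[1:]):   # "training": count word transitions
    counts[prev][nxt] += 1

def predict(word):
    """ "Inference": apply what was learned to pick the likeliest next word."""
    return counts[word].most_common(1)[0][0]

print(predict("the"))   # -> "next" ("next" follows "the" most often above)
```

An LLM does the same thing in spirit, except the "counts" are trillions of learned parameter weights and the context is the entire preceding sequence rather than one word.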

LLMs are far from perfect, sometimes giving a completely erroneous response to a simple question. LLM error today stems from three factors: model error, data error, and a certain amount of irreducible error. Thus far, the largest cause of error is data quality.

LLM Error(# of parameters, data) = model error + data error + irreducible error

Potential Hurdles to Gen AI Improvement

The below factors could be large roadblocks to Gen AI improvements and could precipitate an AI winter.

Compute: The compute necessary to train these models is growing orders of magnitude faster than advancement in GPU processing capability. The solution is either significantly more GPU production or more specialized chips (ASICs) to train these models. Currently, GPU production is constrained by TSMC capacity.

Power: The infrastructure needed to train these models is getting ever larger. Training GPT-4, with its 1.7 trillion parameters, was a substantial undertaking: it cost about $100 million and used about 60 GWh of energy, equivalent to the total energy usage of 70,000 homes for one month. These are staggering numbers, and they are transforming the entire data center landscape. Data centers alone could consume almost ten percent of US energy needs by 2030, up from around 2% today. Bringing large new data centers online requires access to new power, which is held up by regulatory processes. We wrote about the evolution of data centers due to AI here.
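A quick sanity check of the equivalence cited above: spreading 60 GWh over 70,000 homes implies roughly 860 kWh per home, which is in line with typical monthly US household electricity use.

```python
# Sanity check: 60 GWh of training energy spread across 70,000 homes.
training_energy_kwh = 60e6          # 60 GWh expressed in kWh
homes = 70_000

per_home_month = training_energy_kwh / homes
print(round(per_home_month))        # ~857 kWh per home for the month
```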

Data: The appetite for data ingestion by LLM training is voracious, fueled by continuous innovation in applications like text-to-video generation. The supply of high-quality text data is set to be completely used by 2025, a limit dubbed the "data wall." However, there are exabytes of data in the form of video to offset this, as well as the possible use of synthetic data, where an LLM generates data that it then trains on. This is analogous to DeepMind's systems playing millions of chess games against themselves.

Gen AI Architecture: There is a nonzero probability that the Transformer sees diminishing returns and proves to be just a local maximum on the road to general intelligence. Even if that were the case, the economy should still see large productivity gains from existing technology.

The Ecosystem of LLMs

Looking across the LLM ecosystem, there are different "species": i) large foundational models such as ChatGPT (closed source) and Llama (open source, meaning the parameters and weights are publicly available); ii) Small Language Models (SLMs), trained purely on unique data sets; iii) Mixture of Experts (MoE) models, which combine groups of specialized models; and iv) customized foundational models, whose parameters are tweaked through additional training on domain-specific knowledge. The illustration below depicts the growth of this ecosystem.

image 1
Morgan Stanley Research
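Of the species above, Mixture of Experts is the least intuitive, so here is a minimal sketch of the idea (a toy gating network, not a production MoE, which routes sparsely to only the top experts): a gate scores each expert for a given input, and the output blends the experts by those scores.

```python
import numpy as np

# Toy Mixture of Experts: a gate decides how much each specialist contributes.
np.random.seed(1)

experts = [lambda x, w=w: x @ w for w in np.random.randn(3, 4, 2)]  # 3 experts
gate_w = np.random.randn(4, 3)                                      # gating network

def moe(x):
    """Softmax gate scores each expert; output is the gate-weighted blend."""
    logits = x @ gate_w
    g = np.exp(logits - logits.max())
    g /= g.sum()                                # gate weights sum to 1
    return sum(gi * e(x) for gi, e in zip(g, experts))

x = np.random.randn(4)
print(moe(x).shape)  # (2,)
```

The appeal for LLMs is economic: only the experts the gate selects need to run, so a very large model can answer a query at a fraction of the compute.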

The competition among large foundational LLMs is fierce. The dynamics include i) heavy fixed costs via enormous data centers, ii) high incremental margins, iii) enormous TAMs, and iv) a virtuous data feedback mechanism. As a result, META, MSFT, and AMZN are pouring hundreds of billions of dollars into building out the infrastructure for these models; Nvidia has been the clear winner of this arms race. Given these dynamics, the foundational models will likely coalesce around two to three players.

Current Drawbacks

Generative AI models today are still one-dimensional in their knowledge base: change a prompt slightly and the model can give entirely different results, even though it is almost the same prompt. LLMs do not yet have an intuitive model of how the world works. Although the memory and the querying ability are there, the intelligence is still lacking.

Another weakness is the lack of a System 2. Humans have both a System 1 and a System 2 mode of thinking: the former is instantaneous and intuitive, the latter is for deeper and more precise cognitive thought. Gen AI models today lack a System 2. They simply respond with the first thing generated, which sometimes results in "hallucinations"; there is no double-checking or deeper abstraction taking place. To counter these shortcomings, techniques such as chain-of-thought and tree-of-thought prompting and Retrieval-Augmented Generation (RAG) are used. RAG in particular has become very popular, essentially "looking up" answers that are not in the foundation model's current knowledge base.
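The RAG pattern is simple enough to sketch end to end. Below is a deliberately minimal version (a word-overlap retriever over three invented snippets; real systems use embedding similarity over a vector database): retrieve the most relevant document, then prepend it to the prompt so the model can ground its answer in it.

```python
# Minimal RAG sketch: retrieve a relevant snippet, then prepend it to the
# prompt so the model can "look up" facts outside its training data.
documents = [
    "Q2 revenue grew 12% year over year.",
    "The data center expansion completes in 2026.",
    "Headcount was flat quarter over quarter.",
]

def retrieve(query, docs):
    """Score each document by word overlap with the query (toy retriever;
    production systems use embedding similarity, not keyword overlap)."""
    q = set(query.lower().split())
    return max(docs, key=lambda d: len(q & set(d.lower().split())))

def build_prompt(query):
    context = retrieve(query, documents)
    return f"Context: {context}\nQuestion: {query}\nAnswer using the context."

print(build_prompt("When does the data center expansion finish?"))
```

The augmented prompt is then sent to the LLM as usual; the retrieval step is what lets the answer reflect data the model never trained on.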

Future Evolution of Gen AI

How will these Gen AI models progress? Will there simply be larger and larger models? Or will a new architecture allow them to learn more efficiently? In the interim at least, these models will get significantly larger, as more data and computing power continue to yield better results. Scaling laws show that simply increasing the number of parameters and the amount of data increases the accuracy of the LLM. A current rule of thumb is that ten times the compute yields half the error rate.
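That rule of thumb implies a power law: if error scales as compute^(-alpha) and a tenfold compute increase halves the error, then 10^(-alpha) = 1/2, so alpha = log10(2) ≈ 0.30. A quick check:

```python
import math

# Rule of thumb above as a power law: error ∝ compute^(-alpha),
# calibrated so that 10x compute halves the error.
alpha = math.log10(2)                        # ≈ 0.301

def error(compute, e0=1.0):
    return e0 * compute ** -alpha

print(round(error(10) / error(1), 6))        # 0.5  : tenfold compute, half the error
print(round(error(100) / error(1), 6))       # 0.25 : two tenfold steps, a quarter
```

The flip side of the same arithmetic is the cost of progress: each further halving of error demands another full order of magnitude of compute, which is why the infrastructure numbers discussed earlier are so staggering.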

Although these models are staggeringly large, they seem paltry compared to the human brain. There are over 100 billion neurons in the human brain and over 100 trillion synapses. The brain consumes only about 20 watts yet takes in roughly 2 megabytes per second of information. By age ten, a child has been exposed to 200 times more information than GPT-4 was trained on. It is estimated that the human brain could have up to 10^22 FLOPS of processing capability. All of this is to say that Gen AI has a long way to go to catch up to the amount of data the brain processes.

Large Language Models to Large Vision Models (LVMs). Although LLMs have been the primary use case for Generative AI, the next step in its evolution is multimodality (audio and visual). It can already speak to you with intonation. In the near future, creating a movie will be as easy as typing in "Give me the movie Dune, but written and directed by Quentin Tarantino."

Agentic Workflow. An agentic workflow is one where a "team" of AI agents interact with each other independently of human queries, so that they can carry out an entire workflow, like planning a trip and obtaining ticket reservations for each stop in the itinerary. This goes beyond simply responding to a prompt; it is more like assigning a task to a team. Since in principle there is no limit to how many agents can work as a team, this capability appears to have tremendous potential. The ability for agents to iterate at rapid speed yields fascinating results: for instance, GPT-3.5 wrapped in an agentic loop significantly outperformed GPT-4 responding to a single prompt.
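The core loop is easy to sketch. Below, the "writer" and "critic" agents are stub functions standing in for two model calls (the function names and revision logic are invented for illustration): the pair iterate without human input until the critic approves.

```python
# Toy agentic loop: a "writer" agent drafts, a "critic" agent reviews, and the
# two iterate with no human in the loop until the critic is satisfied.
def writer(task, feedback=None):
    draft = f"Draft for: {task}"
    return draft if feedback is None else draft + " (revised: " + feedback + ")"

def critic(draft):
    # Stand-in for a second model call; here it demands one concrete revision.
    return None if "revised" in draft else "add cost estimates"

def run_workflow(task, max_rounds=5):
    draft = writer(task)
    for _ in range(max_rounds):
        feedback = critic(draft)
        if feedback is None:          # critic approves: workflow complete
            return draft
        draft = writer(task, feedback)
    return draft

print(run_workflow("plan a 3-stop trip"))
```

In a real system each function would be an LLM call (possibly to different models), and the team could be widened with specialist agents for search, booking, and verification.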

The New Operating System. Analogous to how operating systems manage input, files, applications, memory, and output, LLMs will evolve to manage "contexts" and "agents" and incredible new forms of output, while evolving their interfaces with humans. They will be able to use tools such as the internet and high-powered applications and will have access to all of our data.

image 3
Karpathy: "Intro to Large Language Models"

First-order effects of Gen AI

“I thought AI was going to wash my dishes so I could write poetry; not the other way around.” Creativity is one of the first capabilities at which AI has truly shined; for instance, it can conjure up any conceivable image or write poetry in any style or fashion, e.g., "write The Raven as if you were Julius Caesar." Generative AI runs opposite to most other inventions, which effectively took the "plow" off the backs of animals and humans. Jobs requiring physical labor are likely to stick around, while more digital, higher-paying positions that require advanced education are at substantially more risk of being replaced.

Service as a Software: About 80% of the economy is service-based and largely untouched by technology, due to its abundance of unstructured data and complex reasoning requirements. This will change: Gen AI will automate a large part of the service industry. The level of automation will depend on both the volume of work and the expertise needed; the primary target will be work that is high-value, high-volume, or both. Pricing models will shift from a subscription basis to a more consumption-based one.

Companies with informational advantages, such as proprietary datasets, will be the initial primary beneficiaries of Gen AI. Prior to Gen AI, this data was analogous to hard-to-reach oil deposits; now it can be analyzed and queried for insights. This increases its value dramatically, and companies will want to keep it under lock and key. New industries such as 'Data Protection as a Service' will expand. Companies that act as data depositories, such as BOX, could see completely different economics as they aggregate and synthesize data for organizations.

Cost of producing content goes to zero. As inference costs drop, the cost to create an image or video drops with them. This will benefit social media platforms such as META, whose business model is to increase engagement through content recommendations: the higher the quality of the content, the better the engagement. Another consequence will be the rise of AI influencers, the latest social media evolution. The ability to cheaply spin up worlds and characters will also increase the use of Augmented and Virtual Reality.

Search costs go to zero. Search costs have risen as the availability of every type of good and service has increased; people are inundated with choices, and a Google search is a labor. An LLM that can understand intent, however, will greatly improve search. This will benefit marketplaces with significantly heterogeneous products and services that were previously difficult to parse, with Etsy and others as possible beneficiaries.
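The mechanism behind intent-aware search can be sketched in a few lines. Here both the catalog items and the query live in a shared "embedding" space (the three-dimensional vectors and catalog entries below are invented for illustration; real systems use learned embeddings with hundreds of dimensions), so a vaguely worded query can still land on the right heterogeneous product.

```python
import numpy as np

# Toy intent-based search over a heterogeneous catalog: items and queries are
# compared as vectors, not keywords (hand-made 3-d vectors for illustration).
catalog = {
    "hand-thrown ceramic mug":       np.array([0.9, 0.1, 0.0]),
    "vintage leather messenger bag": np.array([0.0, 0.9, 0.2]),
    "rustic pottery planter":        np.array([0.8, 0.0, 0.3]),
}

def search(query_vec):
    """Rank items by cosine similarity to the query's intent vector."""
    def cos(a, b):
        return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))
    return max(catalog, key=lambda item: cos(catalog[item], query_vec))

# A query like "handmade cup for coffee" would embed near the first axis:
print(search(np.array([1.0, 0.0, 0.1])))  # -> "hand-thrown ceramic mug"
```

Because matching happens in meaning-space rather than keyword-space, a listing never needs to contain the shopper's exact words to be found.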

In the interim, AI will magnify asymmetries between the 1% and the 99%. Whereas a top computer scientist used to be worth 10x the average computer scientist, they will now be worth 50x. This productivity magnification will see a burgeoning of small companies, as startup costs fall and more can be done with less. Jobs will become polarized toward either extreme generalization or extreme specialization.

Communication will become more tailored. This will lead to an increase in media and newsletters. Advertising agencies will become more effective at targeting. Education will see a resurgence as content becomes more specific and engaging for individual students instead of whole classes.

Weaponization of AI. As Generative AI proliferates, the number of cyber threat vectors will multiply, and AI-driven hacking will become a more prominent phenomenon. Adding to this is the fact that hacking is the greatest point of leverage for various state actors, a veritable cocktail for malfeasance. Security measures for protecting the "weights" and architecture of Generative AI models will grow accordingly. LLMs themselves are vulnerable to attacks such as prompt injection, insecure plugin design, and remote code execution.

Second-order effects of Gen AI

AI will displace jobs previously thought untouchable. As time goes on and social acceptance of AI grows, we will find AI more compelling in jobs like psychiatry. In the future, two factors will determine whether AI is heavily used in a given industry: the amount of existing data and the amount of data that can be generated.

Rise of the AI economy. Millions of AI agents will generate revenues, and people will be able to buy and sell fractional ownership of them. This marketplace will be one of the largest on the planet.

Trust will become a much rarer and more valuable commodity. As deep fakes proliferate, AI content and real content will not be discernible. "The Human Touch" will become a slogan promoting trust in companies that do not use AI. Pencil, paper, and biometric scanners will become more commonplace to ensure the human element.

Parting thoughts

Generative AI is a general-purpose technology with limitless potential. While there may be a lull in progress or setbacks, a single leap forward in ability will have profound impacts upon society. As forward-looking investors, this is a technology we will continue to closely monitor. In Part 2, we will delve into more recent advancements in Generative AI and identify specific beneficiaries poised to capitalize.

Past performance is not necessarily indicative of future results. All investments carry significant risk, and it’s important to note that we are not in the business of providing investment advice. All investment decisions of an individual remain the specific responsibility of that individual. There is no guarantee that our research, analysis, and forward-looking price targets will result in profits or that they will not result in a full loss or losses. All investors are advised to fully understand all risks associated with any kind of investing they choose to do.