The Evolution of Data Centers

In this industry essay, we discuss:

  • How data centers are evolving due to AI workloads and new hardware requirements
  • The trends this evolution will drive, from compute to chiplet architecture
  • The industry participants that are poised to benefit from this inflection
  • Potential headwinds that could inhibit the adoption of this new data center architecture

The rise of AI

Traditionally, data centers have gone unnoticed, humming in the global background powering things such as Google searches and Netflix video streams. Recently, however, data centers have taken the front seat due to the generative AI revolution and advances in high-performance computing. Jensen Huang, CEO and co-founder of Nvidia, forecasts that within the next four years, expenditures on data center equipment for AI workloads will reach an estimated $1 trillion.

With the tsunami of AI demand for compute and storage, the data center industry faces a pivotal challenge. The vast majority of existing data centers are structured around CPU-based servers, which are rapidly becoming inadequate for modern requirements. To address this, a significant overhaul is necessary to integrate newer GPUs. Unlike CPUs, which process instructions sequentially, GPUs can execute thousands of computations in parallel, offering a substantial edge in throughput and efficiency over traditional, energy-intensive CPU servers. (Although H100 GPUs are rated for a mind-boggling 700 watts, they are far more efficient than CPUs per unit of work.) The new standard of performance is computation per watt, a critical factor in preventing the power requirements of data centers from becoming untenable. Even though computational efficiency is advancing exponentially, the demand for compute is increasing at an even greater rate. The appetite for AI-driven workloads, such as large language models (LLMs), is voracious, fueled by continuous innovation and new applications such as text-to-video generation. These models are expanding rapidly: GPT-2 contained 1.5 billion parameters, whereas GPT-4 is estimated at 1.7 trillion. This shift from CPUs to GPUs necessitates a rearchitecture of data centers, spanning network infrastructure, cooling systems, and power management. Below, we delve into what a data center is and the hardware trends that will result from increasing AI adoption.
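
As a rough illustration of the computation-per-watt metric, the sketch below compares throughput per watt for a GPU and a CPU server; the throughput and power figures are illustrative assumptions rather than vendor specifications, and the parameter-growth factor uses the figures quoted above.

```python
# Back-of-envelope comparison of computation per watt (all hardware figures
# are illustrative assumptions, not vendor specifications).
gpu_tflops, gpu_watts = 1000.0, 700.0   # assumed dense FP16 throughput of a modern GPU
cpu_tflops, cpu_watts = 5.0, 350.0      # assumed throughput of a high-end server CPU

gpu_perf_per_watt = gpu_tflops / gpu_watts   # ~1.43 TFLOPS per watt
cpu_perf_per_watt = cpu_tflops / cpu_watts   # ~0.014 TFLOPS per watt
print(f"GPU: {gpu_perf_per_watt:.2f} TFLOPS/W, CPU: {cpu_perf_per_watt:.3f} TFLOPS/W")
print(f"Efficiency advantage: ~{gpu_perf_per_watt / cpu_perf_per_watt:.0f}x")

# Model growth quoted above: GPT-2 (1.5B parameters) to GPT-4 (est. 1.7T).
print(f"Parameter growth: ~{1.7e12 / 1.5e9:.0f}x")
```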

What is a data center?

A data center is a facility that houses many servers, all interconnected via cabling and switches to enable data processing, storage, and distribution. A server is a personal computer on steroids: it has far more RAM, far more computing power, and multiple redundancies. Data centers are the workhorse of every computational workload, so they need extremely high reliability and uptime; some data centers are offline for only about 25 minutes in an entire year. Data centers are typically owned and operated by co-location providers, telcos, or hyperscalers.

A data center has thousands of servers housed in racks. A rack is simply a cabinet full of servers, which range from compute servers to memory and storage servers, and connecting them are switches. Traditional data center networks use a 'spine and leaf' configuration. In this configuration [1], the leaf layer consists of switches that connect directly to terminal equipment such as servers, storage devices, and other networking components. Each leaf switch is interconnected with every spine switch, which facilitates communication between devices in the leaf layer and extends connectivity to other segments of the network via the spine layer. These layers are, in turn, connected to a backbone network, which links to routers that integrate the data center with the internet or other data centers. Approximately 80% of data transmission occurs internally within data centers.
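
To make the spine-and-leaf wiring concrete, the short sketch below builds the full mesh between leaf and spine switches and counts the resulting links; the switch counts and servers-per-leaf figure are arbitrary assumptions chosen for illustration.

```python
# Minimal sketch of a spine-and-leaf fabric (topology sizes are illustrative).
from itertools import product

num_spines, num_leaves, servers_per_leaf = 4, 16, 32  # assumed topology sizes

# Every leaf switch connects to every spine switch (full mesh).
spine_leaf_links = list(product(range(num_spines), range(num_leaves)))

# Each server attaches to exactly one leaf switch.
server_links = num_leaves * servers_per_leaf

print(f"Leaf-to-spine links: {len(spine_leaf_links)}")   # 4 * 16 = 64
print(f"Server-to-leaf links: {server_links}")           # 16 * 32 = 512
# Any server can reach any other in at most two switch hops
# (leaf -> spine -> leaf), which keeps east-west latency predictable.
```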

[Figure: Data Center Configuration]

There are about 5,500 data centers in the USA, making up half the global total. The average data center holds about 100k servers and requires significant energy to run; each rack typically consumes 15-20 kilowatts. Currently, data centers in the USA account for about 2% of electricity demand, or about 80 TWh. A byproduct of this energy consumption is heat. Running data centers gives off tremendous heat that, left unchecked, would compromise the equipment; with modern cooling methods, however, temperatures can be kept in a stable operating range. Surprisingly, about half the power consumed in a data center (shown below [2]) goes to cooling servers so that they can run reliably for extended periods. The traditional method of cooling is with air: cool air enters the server through a fan, absorbs heat from the heat sink, and is expelled out the other side of the server.
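
As a rough back-of-envelope check on these power figures, the sketch below converts a per-rack draw of 15-20 kW into annual energy and asks how many such racks 80 TWh would correspond to; continuous operation and the midpoint rack draw are simplifying assumptions.

```python
# Back-of-envelope rack energy math (assumes racks draw power continuously).
rack_kw = 17.5                 # midpoint of the 15-20 kW per-rack range above
hours_per_year = 8760

rack_mwh_per_year = rack_kw * hours_per_year / 1000       # ~153 MWh per rack-year
us_dc_twh = 80                                             # quoted US data center demand

equivalent_racks = us_dc_twh * 1e6 / rack_mwh_per_year     # TWh -> MWh
print(f"Energy per rack-year: {rack_mwh_per_year:.0f} MWh")
print(f"80 TWh is roughly {equivalent_racks:,.0f} fully loaded racks running year-round")
```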

[Figure: Data Center Power Breakdown]

What’s next?

The traditional data center has not changed much since the 1980s. That is, until now. The latest data center rack, unveiled at Nvidia's GTC 2024 (dubbed "the Woodstock of AI"), packs 72 Blackwell chips connected by 4 NVLink switches, is liquid cooled, and runs 150 Tbps InfiniBand (we go into more detail on these below). It sounds like something out of a science fiction novel. With data centers becoming "AI Factories", we believe this is the first inning of major technological change, and we delve into the data center trends below.

Trends

Network Debottlenecking

Jensen Huang recently debuted NVIDIA's vision for 'AI Factories': these data centers on steroids contain about 32k GPUs with 645 exaFLOPS of compute. Compute has gone up 1000x in a decade, and as long as there are returns to higher compute, demand will continue to increase. However, while GPUs have accelerated computation, other parts of the chain have not kept up, namely data connectivity. This ranges from cabling, switches, and specialized server chips to other components within the data center environment: really, anything that transmits data between servers and storage systems. Industry reports estimate that 30 percent of the time spent training an LLM goes to network latency, with the other 70 percent spent on compute. The very last bytes of data control when the next computation cycle begins, which means GPUs sit idle while data is being transmitted. This is compounded when data packets get dropped or delayed, the stragglers that give rise to so-called tail latency. And these are the problems that beset just the training phase of an LLM.
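
The 70/30 compute-to-network split implies that faster networking alone has a bounded payoff, much like Amdahl's law. The sketch below makes that explicit, treating the split as fixed and the speedup factors as hypothetical.

```python
# Amdahl-style view of LLM training time, using the 70% compute / 30% network
# split quoted above. Network speedup factors are hypothetical.
compute_frac, network_frac = 0.70, 0.30

def relative_step_time(network_speedup: float, compute_speedup: float = 1.0) -> float:
    """Training step time relative to the baseline (lower is better)."""
    return compute_frac / compute_speedup + network_frac / network_speedup

for net_x in (1, 2, 4, 100):
    t = relative_step_time(net_x)
    print(f"{net_x:>3}x faster network -> step time {t:.2f}x baseline "
          f"({(1/t - 1)*100:.0f}% overall speedup)")
# Even an infinitely fast network cannot recover more than the ~30% of time
# the GPUs currently spend waiting on data.
```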

Other problems beset the inference phase, when a model is put into production (this is when an LLM can take queries and respond). To serve a query, all the parameters and their weights need to be held in memory. The largest recommendation models require over 30 TB of memory to hold these parameters, which is equivalent to about 8k HD movies. These parameters are held in memory across different servers and must be recalled very rapidly in inference mode. The longer the memory recall takes, the longer a query or task takes.
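
To put the 30 TB figure in perspective, the sketch below estimates parameter memory from parameter count and numeric precision, and converts it into the HD-movie equivalent mentioned above; the bytes-per-parameter and movie-size values are assumptions.

```python
# Rough memory-footprint math for serving large models (precision and movie
# size are assumptions chosen for illustration).
def model_memory_tb(num_params: float, bytes_per_param: int = 2) -> float:
    """Parameter memory in terabytes, assuming e.g. 2 bytes per FP16 weight."""
    return num_params * bytes_per_param / 1e12

llm_tb = model_memory_tb(1.7e12)   # ~3.4 TB for a 1.7T-parameter model in FP16
rec_model_tb = 30                  # large recommendation models quoted above
hd_movie_gb = 4                    # assumed size of one HD movie

print(f"1.7T-parameter LLM (FP16): ~{llm_tb:.1f} TB")
print(f"30 TB of parameters is about {rec_model_tb * 1000 / hd_movie_gb:,.0f} HD movies")
```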

So what's being done? Advancements in cabling and switches are mitigating networking issues, significantly enhancing data transfer capabilities. Notably, switch data rates have increased from 5 Tbps in 2017 to 50 Tbps currently, while transceiver rates have improved from 100G to 400G over the same period. The switch data rate determines the overall network capacity to handle data traffic, whereas the transceiver data rate specifies the speed at which an individual data stream is transmitted and received. Furthermore, cabling technology has evolved, with most high-performance computing workloads utilizing Nvidia's InfiniBand technology instead of Ethernet. InfiniBand enables low-latency data packet transfer with zero packet loss. Through Remote Direct Memory Access (RDMA), InfiniBand achieves data transfer rates of up to 400 Gbps by moving data from storage server memory directly to GPUs, bypassing intermediate steps that contribute to tail latency. Ethernet nevertheless remains the prevailing global standard in most data centers, and industry collaborations are addressing its limitations, such as lower speeds and data packet loss, to develop innovative solutions.
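
Link speed translates directly into how long GPUs wait for data. A simple sketch, assuming an idealized, fully utilized link with no protocol overhead, shows the difference between 100 Gbps and 400 Gbps when moving a 30 TB parameter set.

```python
# Idealized transfer-time math (ignores protocol overhead and congestion).
def transfer_minutes(data_tb: float, link_gbps: float) -> float:
    bits = data_tb * 1e12 * 8          # terabytes -> bits
    return bits / (link_gbps * 1e9) / 60

data_tb = 30                           # the parameter set size quoted above
for gbps in (100, 400):
    print(f"{data_tb} TB over a {gbps} Gbps link: ~{transfer_minutes(data_tb, gbps):.0f} min")
# 100 Gbps -> ~40 minutes; 400 Gbps -> ~10 minutes of potential GPU idle time.
```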

Additionally, a new specialized processor addressing networking needs is the Data Processing Unit (DPU). Conventional CPUs are not optimized for high-speed data transfer, which has led to the development of DPUs as special-purpose processors for offloading data processing tasks. This division of labor enables efficient resource allocation, with CPUs handling general-purpose computing, GPUs focusing on accelerated computing, and DPUs accelerating data processing. DPUs significantly reduce application latency by efficiently parsing, processing, and transferring data, thereby taking a substantial workload off CPUs. This results in a notable acceleration of high-performance computing workloads, with improvements of nearly 30%. The DPU market is currently dominated by industry leaders Nvidia and Broadcom, with newer entrants including Marvell and Kalray.

Power Infrastructure

Training an LLM like GPT-4, with 1.7 trillion parameters, is a substantial undertaking. It cost about $100 million and used about 60 GWh of energy, equivalent to the total energy usage of 70k homes for one month. These are staggering numbers. A single one of Nvidia's DGX H100 systems will consume about 5 MWh annually, and Nvidia will produce about 500k of these in 2024. By 2027, AI power consumption could reach 200 TWh, or 5% of US electricity needs, more than 2x the power needs of all the data centers in the US today. Finding utilities to power these data centers will prove difficult, and other energy sources will be needed, such as small modular nuclear reactors. Currently, Oklo, whose board is chaired by Sam Altman, is working on developing these modular reactors, which will initially be fueled by high-assay low-enriched uranium (HALEU) produced by Centrus Energy. While the technology is available, regulatory frameworks will play a crucial role in determining how quickly energy can be brought online, potentially posing a significant obstacle to new data center builds.
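
The homes comparison can be sanity-checked with simple division; the average-household consumption figure in the comment below is an assumption of roughly 900 kWh per month.

```python
# Sanity check of the "70k homes for one month" comparison.
training_gwh = 60                  # quoted energy to train a GPT-4-class model
homes = 70_000

kwh_per_home_month = training_gwh * 1e6 / homes     # GWh -> kWh
print(f"Implied usage: ~{kwh_per_home_month:.0f} kWh per home per month")
# ~860 kWh/month, close to the ~900 kWh a typical US household uses (assumption).
```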

Within the data center, to cope with the energy loads from GPUs, higher-power equipment will be needed, from uninterruptible power supplies to batteries, switchgear, generators, and dry-type transformers. The majority of this equipment will require upgrades. Below [2] is an illustration of all the parts in the power chain from the grid to the data center rack. This perennial trend of higher power demand will only accelerate; data centers 20 years ago required only 10 MW, data centers today require over 100 MW, and it is not far-fetched to think that data centers in the near future will need greater than 1 GW. Companies like Eaton, Hammond Power, and Powell Industries have benefited thus far from this electrical upgrade cycle.

[Figure: Data Center Power Distribution]

Cooling

As more compute and networking are packed into servers, power demand increases commensurately. And with high power comes high heat. The power usage of a single AI rack can currently be as high as 100 kilowatts and will likely exceed 200 kW within a few years. With GPUs drawing significantly more power, methods other than air cooling are needed. Newer techniques such as liquid cooling and immersion cooling, which cool directly at the chip level, will be implemented. A large benefit of liquid cooling is that it also lowers the power consumption of servers by up to 20%. Liquid cooling can also eliminate air conditioning and server fans, increasing the reliability of data centers. A company poised to continue to benefit in this space is Vertiv. Other methods are being explored to increase the rate of heat transfer away from chips, for example by using a better thermal conductor: diamond is one of the best thermal conductors on the planet and is being experimented with at the substrate level. The faster heat can be drawn away from a chip, the cooler it can be kept and the higher the loads it can run.
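
How much liquid does it take to carry away rack-level heat? The sketch below applies the basic heat-transfer relation Q = m_dot * c_p * delta_T for water, with the coolant temperature rise treated as an assumption.

```python
# Coolant flow needed to remove rack heat with water: Q = m_dot * c_p * delta_T.
rack_heat_w = 100_000          # 100 kW rack, as cited above
cp_water = 4186                # J/(kg*K), specific heat of water
delta_t = 10                   # K, assumed coolant temperature rise across the rack

mass_flow_kg_s = rack_heat_w / (cp_water * delta_t)        # ~2.4 kg/s
liters_per_min = mass_flow_kg_s * 60                       # ~143 L/min (1 kg of water ~ 1 L)
print(f"Required water flow: ~{mass_flow_kg_s:.1f} kg/s (~{liters_per_min:.0f} L/min)")
# Doubling the rack to 200 kW doubles the required flow (or the temperature rise).
```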

The Rise of Chiplets

Servers will see a significant assortment of chiplets moving forward. A chiplet-based design combines a set of smaller, specialized dies that work collectively, enabling a cheaper and more powerful product than a single large chip. The chiplet industry is currently in its early stages, particularly as the limitations of Moore's Law become increasingly apparent. With semiconductor technology now advancing to 2nm levels, the expense associated with designing chips has escalated significantly. The design costs for a single 5nm integrated circuit can reach upwards of USD $550 million, marking an almost 2x increase from the previous 7nm generation [3]. This rise in costs encompasses all aspects of chip development, from validation to prototyping. Additionally, manufacturing these advanced chips is becoming prohibitively expensive, as the defect rate for monolithic chips, or single-chip solutions, increases markedly with the move to leading-edge technologies.
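
One way to see why large monolithic dies become prohibitively expensive at leading-edge nodes is a simple yield model. The sketch below uses a Poisson defect model with an assumed defect density and die areas chosen purely for illustration.

```python
import math

# Poisson yield model: probability a die has zero defects, Y = exp(-area * D0).
defect_density = 0.1           # defects per cm^2 (assumed, illustrative)
monolithic_area = 8.0          # cm^2, one large monolithic die (assumed)
chiplet_area = 2.0             # cm^2, each of four smaller chiplets (assumed)

def die_yield(area_cm2: float, d0: float = defect_density) -> float:
    return math.exp(-area_cm2 * d0)

print(f"Monolithic die yield: {die_yield(monolithic_area):.0%}")   # ~45%
print(f"Per-chiplet yield:    {die_yield(chiplet_area):.0%}")      # ~82%
# With chiplets, a single defect scraps one small die instead of the whole
# large die, so far more of each wafer ends up in sellable products.
```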

The emerging solution to these mounting costs and challenges is the adoption of chiplets. As the advancement of traditional monolithic chip designs slows, the demand for specialized chips that can efficiently interconnect is on the rise. New technologies such as interposers allow for high connectivity between dies (the individual pieces of silicon that make up a chip). New standards are also being developed, such as the Universal Chiplet Interconnect Express (UCIe), whose purpose is to standardize the communication protocols between chiplets, specifically focusing on die-to-die interconnects. Customers can focus on building out their core differentiator and outsource the remainder of the chip design. Custom silicon can make AI workloads such as inference more efficient. This increases the need for communication, as now all the parts in the chiplet package need to talk to one another, whereas before they were all on the same die. Broadcom is the de facto leader in this space, having built Tensor Processing Units for Google for the last several years. Other competitors that have joined the fray are Marvell and Alphawave. We did a deep dive on Alphawave here.

Sources:

[1] https://www.iridian.ca/learning_center/optical-interconnects-for-data-centers-lighting-the-cloud-dup/

[2] https://www.device42.com/data-center-infrastructure-management-guide/data-center-power/

[3] https://semiwiki.com/