
Is There a Place for CPUs in the GenAI World?


The role of GPUs and CPUs in Generative AI use cases for scalable enterprise applications

In a candid admission to the U.S. Senate, OpenAI's CEO, Sam Altman, highlighted a significant AI challenge today’s companies face: a severe graphics processing unit (GPU) shortage. Altman said, "We're so short on GPUs that the fewer people who use the tool, the better." This raises a vital question: Are we overlooking the potential of Central Processing Units (CPUs) in generative AI (GenAI)? Additionally, do all companies truly need the full range of capabilities offered by large language models (LLMs) like ChatGPT that have billions of parameters?

GPUs and GenAI seem like a perfect match

GPUs were initially designed for rendering graphics in video games and animations. More recently, they have taken on a new role: supporting AI workloads. GPUs divide large tasks into smaller ones and execute them in parallel, a capability known as parallel computing that lets them handle thousands of operations simultaneously. This makes GPUs especially suited for deep learning and reinforcement learning applications such as facial recognition, object identification, and critical autonomous driving functions like recognizing stop signs and navigating obstacles.
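To make that parallelism concrete, the sketch below (illustrative only, assuming PyTorch is installed) times the same large matrix multiplication on a CPU and, when one is available, on a GPU. The single `a @ b` call is exactly the kind of operation a GPU spreads across thousands of cores.

```python
# Illustrative sketch: the same matrix multiply on CPU vs. GPU (PyTorch).
import time
import torch

def time_matmul(device: str, n: int = 4096) -> float:
    a = torch.randn(n, n, device=device)
    b = torch.randn(n, n, device=device)
    if device == "cuda":
        torch.cuda.synchronize()  # ensure setup work has finished
    start = time.perf_counter()
    _ = a @ b  # one large operation, split across thousands of GPU cores
    if device == "cuda":
        torch.cuda.synchronize()  # wait for the asynchronous GPU kernel
    return time.perf_counter() - start

print(f"CPU: {time_matmul('cpu'):.3f}s")
if torch.cuda.is_available():
    print(f"GPU: {time_matmul('cuda'):.3f}s")
```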

LLMs like GPT-3 and GPT-4 power AI applications such as virtual assistants and chatbots, and they are intricate models containing billions of parameters. Training those parameters requires processing vast datasets at high speed. In this context, GPUs play a pivotal role: they accelerate the complex training phases, including forward and backward propagation, and their proficiency in parallel processing ensures that these massive LLMs can be optimized and updated efficiently.
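As a hedged illustration of those phases, the sketch below runs a single training step on a tiny stand-in network (nothing close to an LLM): the forward pass computes predictions, the backward pass computes gradients, and the optimizer updates the parameters, with everything moved to a GPU when one is present.

```python
# Illustrative sketch: one forward/backward training step (PyTorch).
import torch
import torch.nn as nn

device = "cuda" if torch.cuda.is_available() else "cpu"

model = nn.Sequential(nn.Linear(512, 512), nn.ReLU(), nn.Linear(512, 10)).to(device)
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)
loss_fn = nn.CrossEntropyLoss()

inputs = torch.randn(32, 512, device=device)         # synthetic mini-batch
targets = torch.randint(0, 10, (32,), device=device)

logits = model(inputs)          # forward propagation
loss = loss_fn(logits, targets)
optimizer.zero_grad()
loss.backward()                 # backward propagation (gradient computation)
optimizer.step()                # parameter update
```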

A perfect storm: The long roots of the GPU shortage

OpenAI's introduction of ChatGPT last year ignited a surge of public interest in AI and showcased GenAI's immense capabilities. ChatGPT continues to gain popularity because it produces coherent, contextually relevant responses that are often indistinguishable from human-written text. As a result, there has been a rush across industries to integrate GenAI into every application and solution, and almost every startup now claims to be an AI company. This has led to unprecedented demand for high-end GPUs. Despite manufacturers operating at full capacity, demand has outpaced supply, and several experts predict the shortage may persist for a few years. Significant efforts are under way to explore alternatives to GPUs, including Application-Specific Integrated Circuits (ASICs), Field-Programmable Gate Arrays (FPGAs), and CPUs.

Most companies have bypassed the need for large, fixed data centers by using cloud computing services from leading providers such as Google, Microsoft, and Amazon. These services offer access to specialized AI chips and significant computational power with seemingly limitless scalability. With the newfound focus on AI, however, wait times to acquire and use GPU chips have in some instances stretched to a year. For companies willing to skip the wait and pay a premium, the inflated costs lower return on investment (ROI). The GPU shortage is also complicating AI integration efforts, as companies grapple with prolonged project timelines and difficulty procuring the necessary computational assets.

A recent New York Times article highlighted the extraordinary measures companies are taking to secure the compute they need for their AI-powered initiatives. As both affluent nations and top-tier corporations scramble for these resources, GPUs have become as sought-after as rare earth metals.

The cost of GPUs

The extraordinary capabilities of LLMs like GPT-3, with its 175 billion parameters, come with a hefty price tag. With an estimated training expenditure of $4.6 million, GPT-3 stands as a testament to the rising costs of cutting-edge AI. Such prohibitive budgets have created a chasm in the AI world, placing state-of-the-art LLMs within the exclusive grasp of industry behemoths and affluent institutions in the Global North. Meanwhile, many startups and entities without the luxury of supercomputers or substantial cloud credits find themselves sidelined.

An industry analyst told The Information that keeping ChatGPT running on high-end servers might cost OpenAI about $700,000 a day, illustrating just how expensive it is to operate large AI models. These operational costs make many companies think twice about how they use AI and where they spend their money. The scale of computational power required for ChatGPT to answer queries effectively underscores the profound cost differentials in the GPU-driven AI landscape, challenging many enterprises to rethink their strategies and resource allocations. Many enterprises may now need to redefine their business and operational strategies; not long ago, GenAI-driven capabilities seemed to be several years away.

Can CPUs be used for GenAI?

Both CPUs and GPUs have distinct computational strengths that play vital roles in generative AI. Designed as general-purpose processors, CPUs excel in handling diverse tasks, especially those that are sequential. They efficiently manage the initial stages of AI processes such as data ingestion, cleaning, and basic processing. In fact, around 65% to 70% of generative AI tasks, including Machine Learning Ops (MLOps) and pipeline management, are driven by CPUs.

(Source: ZERO Systems)
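Those CPU-driven stages, ingestion, cleaning, and basic processing, are typically plain data work that needs no accelerator at all. Below is a minimal, hypothetical sketch using pandas; the file and column names are placeholders, not taken from any real pipeline.

```python
# Hypothetical sketch: CPU-bound data preparation ahead of any GPU work.
import pandas as pd

def prepare_corpus(path: str) -> pd.DataFrame:
    df = pd.read_csv(path)                            # ingestion
    df = df.dropna(subset=["text"])                   # cleaning: drop empty rows
    df["text"] = df["text"].str.strip().str.lower()   # basic normalization
    return df.drop_duplicates(subset=["text"])        # deduplication

# corpus = prepare_corpus("documents.csv")  # runs entirely on the CPU
```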

Even recent frameworks like Microsoft Semantic Kernel and LangChain effectively utilize both CPUs and GPUs, aiming for optimal performance within controlled cost environments. The evolution of CPU technology, especially features like built-in deep learning acceleration, has bolstered the CPU's place in AI, as highlighted by the successful training of models like Stable Diffusion (a deep learning text-to-image model) and the Hugging Face BERT model (designed to pre-train deep bidirectional representations from unlabeled text).
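Running the Hugging Face BERT model mentioned above on a CPU takes only a few lines with the Transformers library. In this minimal sketch, `device=-1` pins the pipeline to the CPU; the example sentence is our own.

```python
# Illustrative sketch: BERT masked-word inference on CPU (Transformers).
from transformers import pipeline

fill_mask = pipeline("fill-mask", model="bert-base-uncased", device=-1)  # -1 = CPU
for prediction in fill_mask("GPUs are scarce, so we run inference on the [MASK]."):
    print(prediction["token_str"], round(prediction["score"], 3))
```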

In the highly regulated legal industry, early adoption of AI centers on two pivotal metrics: the extent to which AI can automate a specific task, and the inherent value derived from that automation. Ropers Majeski, an international law firm, exemplifies the practical use of CPUs in real-world generative AI applications.

What’s the best solution for enterprises looking to deploy GenAI solutions today?

Navigating the GenAI ecosystem requires informed decisions about computational choices and requirements. These decisions must weigh both operational and application needs in a balanced, well-articulated architecture. While GPUs have been the go-to chips for data-intensive tasks, the capabilities and flexibility of modern CPUs should not be underestimated. The distinction between the two affects hardware selection, deployment efficiency, and ROI.

Beyond raw computational power, the holistic architecture, encompassing networking, memory, and the ability to group compute into clusters, is vital. The optimal approach might involve harnessing the strengths of both GPUs and CPUs for AI training while capitalizing on CPUs for efficient inference.

Today’s enterprises want to reduce AI operational costs and increase power efficiency by leveraging CPUs, especially for specific inference tasks and application architectures. Although CPUs will not replace GPUs in LLM training, they present a promising avenue for the cost-effective deployment of pre-trained models. Embracing this dynamic not only stands to make generative AI more accessible but also challenges CPU manufacturers to forge strategic alliances, target research and development, and ensure seamless integration with the current AI stack, from data to inference and back.
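One common route to that cost-effective CPU deployment, offered here as an illustrative sketch rather than a prescribed recipe, is dynamic int8 quantization of a pre-trained model's linear layers, which shrinks the model and typically speeds up CPU inference.

```python
# Illustrative sketch: dynamic int8 quantization for cheaper CPU inference.
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

model_name = "distilbert-base-uncased-finetuned-sst-2-english"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name).eval()

# Quantize the linear layers' weights to int8; embeddings stay float32.
quantized = torch.quantization.quantize_dynamic(
    model, {torch.nn.Linear}, dtype=torch.qint8
)

inputs = tokenizer("CPU inference keeps our GenAI costs predictable.",
                   return_tensors="pt")
with torch.no_grad():
    probs = quantized(**inputs).logits.softmax(dim=-1)
print(probs)  # sentiment probabilities, computed entirely on the CPU
```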

Author

Catalin is the Head of Enterprise Architecture at ZERO Systems, where he leads the development and operation of end-to-end AI platforms and solutions that drive business value in fast-paced technology and GenAI environments. 

With over 30 years of experience in enterprise architecture and business strategy, including significant roles as Chief Architect and Head of Innovation at global corporations like Microsoft, AWS, and Marriott International, Catalin specializes in fostering innovation and crafting cutting-edge technology strategies to enhance enterprise productivity, security, compliance, and revenue growth.
