Your Laptop Isn’t Ready for LLMs—But That’s Changing Soon
Chances are, the PC you use at work today isn’t equipped to run AI large language models (LLMs) locally. Currently, most people access LLMs through online browser interfaces. More technical users might interact with these models via application programming interfaces (APIs) or command line interfaces. In all cases, the queries are sent to remote data centers where the models are hosted and executed. This setup generally works well—until it doesn’t. Data-center outages can leave models offline for hours. Additionally, some users hesitate to send personal data to unknown third parties.
Running AI models directly on your own computer could bring major advantages. It would reduce latency, allow the model to better understand your personal needs, and provide greater privacy by keeping your data on your device. However, even laptops that are only a year old are usually not powerful enough to run useful AI models locally. These machines typically have four to eight CPU cores, lack dedicated graphics processing units (GPUs) or neural processing units (NPUs), and offer only around 16 gigabytes of RAM. That hardware is simply underpowered for running LLMs.
Even newer, high-end laptops with NPUs and GPUs can struggle. The largest AI models contain over a trillion parameters and require hundreds of gigabytes of memory. While smaller versions of these models exist and are widely available, they often lack the intelligence and capabilities of the larger models that only specialized AI data centers can handle. The challenge grows when considering additional AI features designed to enhance model performance. Small language models (SLMs) running on local hardware often reduce or omit these features. Tasks like image and video generation remain difficult to perform on laptops and have traditionally been limited to high-end desktop PCs.
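To make the memory gap concrete, here is a hedged back-of-the-envelope sketch in Python; the parameter counts and precisions are illustrative assumptions, not figures for any specific model:

```python
# Rough, illustrative estimates of the memory needed just to hold model
# weights. Parameter counts and precisions are assumptions for this example.

def weight_memory_gb(num_params: float, bytes_per_param: float) -> float:
    """Approximate weight storage in gigabytes."""
    return num_params * bytes_per_param / 1e9

# A frontier-scale model (~1 trillion parameters) at 16-bit precision:
print(weight_memory_gb(1e12, 2.0))   # ~2000 GB: data-center territory

# A small local model (~7 billion parameters) quantized to 4 bits:
print(weight_memory_gb(7e9, 0.5))    # ~3.5 GB: fits in a 16 GB laptop
```

(Real memory use is higher once activations and the key-value cache are counted, but the orders of magnitude hold.)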
This situation poses a significant barrier to wider AI adoption. To enable running AI models locally, laptops will need both hardware and software upgrades. This marks the beginning of a fundamental shift in laptop design, offering engineers the chance to rethink PCs from the ground up.
The Role of NPUs and Memory
One of the most straightforward ways to improve AI performance on PCs is to add a powerful NPU alongside the CPU. NPUs are specialized chips designed to handle the matrix multiplications that AI models rely on. These operations are highly parallelizable, which is why GPUs have traditionally been preferred for AI tasks in data centers. NPUs, however, are more power-efficient than GPUs because they focus solely on these matrix operations, unlike GPUs, which must also handle 3D graphics.
This power efficiency is crucial for portable devices like laptops. NPUs also support low-precision arithmetic better than most laptop GPUs. AI models often use low-precision calculations to reduce computational and memory demands on portable hardware.
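As a minimal sketch of what low precision buys, here is a toy symmetric int8 quantization of a weight matrix in NumPy; actual NPU and runtime schemes (per-channel scales, 4-bit formats, and so on) are more sophisticated:

```python
import numpy as np

# Toy symmetric int8 quantization of a weight matrix. Real deployment
# schemes use per-channel scales, calibration, and lower-bit formats.

rng = np.random.default_rng(0)
weights = rng.standard_normal((512, 512)).astype(np.float32)  # ~1 MB at fp32

scale = np.abs(weights).max() / 127.0           # map largest value to int8 range
q = np.round(weights / scale).astype(np.int8)   # 4x smaller: ~0.25 MB
dequantized = q.astype(np.float32) * scale      # approximate reconstruction

print("bytes:", weights.nbytes, "->", q.nbytes)
print("max abs error:", float(np.abs(weights - dequantized).max()))
```

Shrinking weights this way cuts both the memory footprint and the memory traffic per operation, which is exactly where battery-powered hardware is most constrained.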
Laptops are being redesigned to better support AI workloads in several ways. First, NPUs are being integrated to accelerate AI model processing. Second, laptops are increasing both the capacity and speed of their memory to accommodate large models that require hundreds of gigabytes of RAM. Third, the traditional split memory architecture—where system memory and GPU memory are separate—is being reconsidered. Instead, memory is being pooled together with faster interconnects to meet AI’s heavy data demands.
Fourth, CPUs, GPUs, and NPUs are increasingly being combined onto a single silicon chip. This integration shortens the data path to shared memory and improves communication between processing units, although it may complicate maintenance. Finally, power management is becoming a priority. AI workloads can be intensive and continuous, such as Microsoft’s always-on Windows Recall feature. Power-efficient NPUs help laptops run these AI tasks without draining the battery excessively.
Microsoft’s Steven Bathiche explained that NPUs are designed around tensor data types, making them much more specialized than CPUs. Qualcomm’s Snapdragon X chip, which powers Microsoft’s Copilot+ features like Windows Recall and Windows Photos’ Generative erase, exemplifies this advancement. Qualcomm’s early lead sparked a performance race among AMD, Intel, and Qualcomm, with NPUs now delivering between 40 and 50 trillion operations per second (TOPS). Dell’s upcoming Pro Max Plus AI PC will feature a Qualcomm AI 100 NPU capable of up to 350 TOPS, a 35-fold increase over the best NPUs from just a few years ago. Experts believe NPUs capable of thousands of TOPS are only a few years away.
It's not yet clear exactly how many TOPS are needed to run state-of-the-art models, which can have hundreds of billions of parameters, but consumer hardware is clearly closing in on that capability. NPUs are also essential for other AI tasks, such as image generation and manipulation, that are difficult to perform without specialized hardware.
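For a rough sense of what those TOPS figures could mean for responsiveness, here is a hedged estimate using the common rule of thumb that generating one token costs about two operations per model parameter; real decoding is often limited by memory bandwidth rather than compute, so treat these as optimistic upper bounds:

```python
# Back-of-the-envelope, compute-bound token throughput from NPU TOPS.
# Assumes ~2 operations per parameter per generated token (a rule of thumb);
# memory bandwidth usually caps real-world throughput well below this.

def tokens_per_second(tops: float, num_params: float) -> float:
    ops_per_token = 2 * num_params
    return tops * 1e12 / ops_per_token

for tops in (45, 350, 1000):
    rate = tokens_per_second(tops, 7e9)  # a 7-billion-parameter local model
    print(f"{tops:>5} TOPS -> ~{rate:,.0f} tokens/s (compute ceiling)")
```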
Building Balanced AI Chips and Unified Memory Architectures
Faster NPUs will improve AI performance by processing more tokens per second, resulting in smoother and faster AI interactions. However, running AI locally requires more than just powerful NPUs. Chip designers must balance AI acceleration against traditional PC tasks. AMD's Mike Clark emphasized that PCs must remain efficient at low-latency, branch-heavy code and at handling smaller data types while also supporting AI workloads. The CPU still plays a critical role in preparing data for AI, so a weak CPU can bottleneck performance.
NPUs must also work alongside GPUs, which remain important for AI tasks. High-end GPUs like Nvidia's GeForce RTX 5090 can deliver AI performance of up to 3,352 TOPS, far exceeding current NPUs. However, these GPUs consume significant power (up to 575 watts for the desktop version and 175 watts for the laptop version), which drains a battery quickly.
Intel and AMD agree that NPUs offer more efficient AI processing at lower power levels. This efficiency is vital because AI workloads often run longer than other demanding tasks, such as video encoding or graphics rendering. For example, AI personal assistants that are always listening require sustained, low-power operation.
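A crude performance-per-watt comparison shows why; the GPU figures are the published peaks cited above, while the NPU power draw is an assumed ballpark, not a published number:

```python
# Rough TOPS-per-watt comparison. GPU numbers are published peaks cited
# above; the NPU's ~5 W draw is an assumption for illustration only.

def tops_per_watt(tops: float, watts: float) -> float:
    return tops / watts

print("RTX 5090 desktop GPU:", round(tops_per_watt(3352, 575), 1))  # ~5.8 TOPS/W
print("45-TOPS NPU at ~5 W: ", round(tops_per_watt(45, 5), 1))      # ~9.0 TOPS/W
```

Even where the ratios are close, the absolute draw matters: a sustained 575-watt workload is a non-starter on battery, while a few watts is not.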
Memory architecture is another crucial factor. Most modern PCs have separate memory pools for the CPU and GPU, a legacy design from over 25 years ago. This split architecture is inefficient for AI, which requires large models to be loaded entirely into memory and shared seamlessly between processing units. Moving data between separate memory pools increases power consumption and slows performance.
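A small illustrative calculation shows the cost of those copies; the link bandwidths are nominal peak rates, and the model size is a made-up example:

```python
# Illustrative time to copy model data between separate memory pools.
# Bandwidths are nominal peak link rates; the model size is hypothetical.

def transfer_seconds(gigabytes: float, gb_per_s: float) -> float:
    return gigabytes / gb_per_s

model_gb = 30  # a hypothetical quantized model headed for GPU memory

print("PCIe 4.0 x16 (~32 GB/s):", transfer_seconds(model_gb, 32), "s")
print("PCIe 5.0 x16 (~64 GB/s):", transfer_seconds(model_gb, 64), "s")
```

Every such copy also costs energy, and interactive AI workloads can trigger them repeatedly.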
The solution is a unified memory architecture that allows the CPU, GPU, and NPU to access the same fast memory pool. Apple’s in-house silicon is a well-known example of this design, but unified memory is still rare in PCs. AMD is adopting this approach with its new Ryzen AI Max APUs, announced at CES 2025. These chips combine Ryzen CPU cores, Radeon GPU cores, and a 50 TOPS NPU on a single silicon die with unified memory supporting up to 128 GB accessible by all units. This design improves memory and performance management and allows better power control.
The Ryzen AI Max is already available in laptops like the HP ZBook Ultra G1a and the Asus ROG Flow Z13, as well as in mini desktops from smaller brands. Intel and Nvidia are also collaborating on chips that combine Intel CPUs with Nvidia GPUs, likely featuring unified memory and Intel NPUs.
These advances promise to transform PC architecture by enabling larger memory pools and tighter integration of CPUs, GPUs, and NPUs. This integration will make it easier to assign AI workloads to the most suitable hardware at any moment. However, it will also make upgrades and repairs more difficult because these components will be physically inseparable on the mainboard, unlike traditional PCs where CPU, GPU, and memory can be replaced individually.
Microsoft’s AI Vision and the Future of PCs
While Apple’s macOS and Apple Silicon offer attractive interfaces and unified memory, Apple’s GPUs are not as powerful as the best PC GPUs, and its AI development tools are less widely adopted. Some AI professionals prefer PCs for AI work due to better GPU performance.
This opens an opportunity for competitors to become the preferred platform for AI on PCs, and Microsoft is actively pursuing this. At its 2024 Build developer conference, Microsoft launched Copilot+ PCs, which include AI features like Windows Recall. Despite some initial issues, this launch pushed the industry toward adopting NPUs, with AMD and Intel releasing new chips with enhanced NPUs later in 2024.
At Build 2025, Microsoft introduced Windows AI Foundry, a runtime stack whose Foundry Local component provides a catalog of popular open-source large language models from companies such as Alibaba, Meta, Nvidia, and OpenAI. Windows can execute AI tasks locally by automatically directing workloads to the CPU, GPU, or NPU best suited to the job. AI Foundry also offers advanced developer tools for customizing AI models and for on-device semantic search.
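For developers, local execution could look something like the sketch below, since Foundry Local exposes an OpenAI-compatible endpoint on the machine; the port and model alias here are placeholders, not confirmed defaults, so check your local service for the actual values:

```python
# Hypothetical sketch of querying a locally hosted model through an
# OpenAI-compatible endpoint such as the one Foundry Local serves.
# The port and model alias below are placeholder assumptions.

from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:5273/v1",  # placeholder local endpoint
    api_key="not-needed-locally",         # local servers typically ignore this
)

response = client.chat.completions.create(
    model="phi-4-mini",  # placeholder alias for a locally installed model
    messages=[{"role": "user", "content": "Summarize my meeting notes."}],
)
print(response.choices[0].message.content)
```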
Steven Bathiche described AI Foundry as a smart system that efficiently uses all available processors and prioritizes workloads across CPU, NPU, and other units, with room for future improvements.
The rapid evolution of AI-capable PC hardware represents a major shift. It is likely to replace PC architectures designed decades ago with new designs focused on AI. The combination of powerful NPUs, unified memory, and advanced software optimizations is closing the gap between local and cloud-based AI faster than expected.
Chip designers are moving toward highly integrated chips that combine CPU, GPU, and NPU with unified memory, even in laptops and desktops. AMD’s Mahesh Subramony envisions users carrying “a mini workstation in your hand” capable of handling AI workloads without needing the cloud.
This transformation will take time but signals a strong commitment from the PC industry to reinvent everyday computers for AI. Qualcomm’s Vinesh Sukumar even envisions affordable consumer laptops achieving artificial general intelligence (AGI) in the near future. He stated, “I want a complete artificial general intelligence running on Qualcomm devices. That’s what we’re trying to push for.”
