
Estimated Read Time: 15 minutes
On November 18, 2025, Google unveiled Gemini 3 to the world, and the AI community collectively gasped. Here was a model that could reason through complex problems, juggle a million tokens of context, and seamlessly process text, images, audio, and code, proving once again that scaling laws remain the North Star of AI progress.
But the real story wasn't just what Gemini 3 could do. It was how Google pulled it off.
While the industry has rightfully celebrated Nvidia's incredible GPU innovations that powered the AI revolution, Google chose to explore a parallel path. Their approach? Custom silicon called Ironwood, wired into room-sized, liquid-cooled pods that Google says deliver more than 24 times the compute of the world's largest supercomputer.
Part I: The Hardware Arms Race Nobody Saw Coming
The Origins: When Google Saw the Future
The story of the Google TPU begins not with a breakthrough in chip manufacturing, but with a realization about math and logistics. Around 2013, Google's leadership, specifically Jeff Dean, Jonathan Ross (now CEO of Groq), and the Google Brain team, ran a projection that alarmed them. They calculated that if every Android user utilized Google's new voice search feature for just three minutes a day, the company would need to double its global data center capacity just to handle the compute load.
At the time, Google was relying on standard CPUs and GPUs for these tasks. Everyone in AI was fighting over the same Nvidia GPUs. Prices were through the roof, wait times were measured in months, and if you weren't a tech giant, good luck getting your hands on the latest hardware.
Google looked at this situation and made a decision that seemed crazy at the time: build their own chips specifically for AI. Not general-purpose processors that could also do AI, but chips that did nothing BUT artificial intelligence calculations. The team went from design concept to deploying silicon in data centers in just 15 months. By 2015, TPUs were already powering Google Maps, Google Photos, and Google Translate before the world even knew they existed. Google publicly announced the TPU at I/O 2016.
Meet Ironwood: The Monster Nobody Talks About
The Ironwood chip (officially the TPU v7) isn't just another incremental upgrade. It's what happens when you ask the question: 'what if we built silicon that cared about only one thing, training and serving massive AI models?'
While the industry was distracted by the GPU shortage, Google quietly iterated. Ironwood represents the seventh generation of this lineage, and the specs are terrifying:
Each chip cranks out 4,614 teraflops (FP8) of raw compute
192 GB of high-bandwidth memory per chip (enough to hold about 40 HD movies in RAM)
Roughly 7.4 TB/s of memory bandwidth per chip, fast enough to stream the text of Wikipedia in well under a second
9,216 chips wired together into a single 'pod' delivering 42.5 exaflops
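If you want to see how the per-chip numbers roll up to the pod figures, here's a quick back-of-the-envelope check in Python. These are the spec-sheet values as quoted above, not independent measurements:

```python
# Rolling per-chip Ironwood specs up to a pod, using the figures quoted above.

chips_per_pod = 9_216
tflops_per_chip = 4_614            # FP8 teraflops per chip
hbm_per_chip_gb = 192              # GB of high-bandwidth memory per chip

pod_exaflops = chips_per_pod * tflops_per_chip / 1_000_000   # tera -> exa
pod_memory_pb = chips_per_pod * hbm_per_chip_gb / 1_000_000  # GB -> PB (decimal)

print(f"Pod compute: {pod_exaflops:.1f} exaflops")   # ~42.5
print(f"Pod memory:  {pod_memory_pb:.2f} PB")        # ~1.77
```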

TPUs for Non-Engineers: What's Actually Happening
At its core, a neural net is just a machine for doing enormous numbers of weighted sums. For those of us who are investors rather than AI engineers, it helps to reduce neural nets to a simple formula: Inputs x Weights = Outputs.
Take handwritten digit recognition as an example. A 28x28 grayscale image is just 784 numbers. An '8-detector' neuron holds 784 weights, one for each pixel. The core operation is multiplying each pixel by its weight, adding everything up, and comparing that score with the scores from the other digit detectors. Scale this up to modern AI models and you're doing trillions of these 'matrix multiplication' (matmul) operations.
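For the hands-on reader, here's a minimal Python sketch of that digit detector. The weights are random placeholders rather than a trained model; the point is simply that the whole thing boils down to one matrix-vector multiply:

```python
import numpy as np

# Toy version of the digit "scoring" described above: one detector per digit,
# each holding 784 weights (one per pixel of a 28x28 grayscale image).
# Weights here are random placeholders, not a trained model.
rng = np.random.default_rng(0)
image = rng.random(784)                   # flattened 28x28 image, values in [0, 1)
weights = rng.standard_normal((10, 784))  # 10 digit detectors x 784 weights each

scores = weights @ image                  # 10 weighted sums: Inputs x Weights = Outputs
prediction = int(np.argmax(scores))       # pick the detector with the highest score
print(prediction)
```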
So how do CPUs, GPUs, and TPUs differ in handling these operations?
A CPU is like one very smart and capable worker, good at anything but slow at repetitive jobs. Consumer CPUs top out at around 16-24 cores; data center chips reach 128 or more.
A GPU is like thousands of generalized workers. Nvidia's Blackwell B200, for example, has 192 streaming multiprocessors (SMs), each with 4 tensor cores and 128 CUDA cores, totaling 768 and 24,576 physical cores respectively.
A TPU is like an automated factory line built specifically for one task, with machines arranged so the work has almost no back and forth. TPUs dedicate nearly the entire silicon area to crunching matmuls.

The Systolic Array: TPU's Secret Weapon
The architectural driver behind the TPU is the systolic array. The array is made up of MAC (multiply-accumulate) cells: stripped-down ALUs that accept no instructions, only data. They never have to choose among hundreds of possible instructions; they just multiply and accumulate. That saves space by eliminating instruction caches and, because data flows through the array in a predetermined pattern, most data caches as well.
This is why TPUs can offer very high performance per watt and per dollar on machine learning, but are essentially useless outside that domain. You get extremely high throughput on the linear algebra that defines neural nets, which means lower power per operation and smaller die area for a given level of ML performance. The tradeoff is much less flexibility than a GPU.
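To make the idea concrete, here's a deliberately simplified Python sketch of a systolic-style matrix multiply. It captures the per-cell behavior (multiply, accumulate, nothing else) but not the real hardware's skewed data flow or clock-accurate timing:

```python
import numpy as np

# A (very) simplified picture of an output-stationary systolic array:
# cell (i, j) holds one running accumulator and, on every "clock tick" k,
# multiplies the operand pair streaming past it and adds the result.
# Real arrays also skew the data flow so neighbors hand operands along,
# but the per-cell work is exactly this multiply-accumulate and nothing else.

def systolic_matmul(A: np.ndarray, B: np.ndarray) -> np.ndarray:
    n, m = A.shape
    m2, p = B.shape
    assert m == m2
    acc = np.zeros((n, p))                 # one accumulator per MAC cell
    for k in range(m):                     # one "tick" per step along the shared dimension
        # every cell (i, j) sees A[i, k] arriving from the left and
        # B[k, j] arriving from above, multiplies them, and accumulates
        acc += np.outer(A[:, k], B[k, :])
    return acc

A = np.arange(6, dtype=float).reshape(2, 3)
B = np.arange(12, dtype=float).reshape(3, 4)
assert np.allclose(systolic_matmul(A, B), A @ B)
```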

Part II: The 3D Donut That Changed Everything
How Google Broke the Rules of Computer Design
Here's where Google's approach gets genuinely fascinating, and a bit weird. While Nvidia built increasingly sophisticated switches to connect their GPUs (imagine a super-smart traffic controller directing data), Google went in a completely different direction.
They arranged their TPUs in what engineers call a '3D Torus,' essentially a three-dimensional donut where each chip talks directly to its six nearest neighbors. Picture it like this: You're at a massive party with 64 people arranged in a perfect cube formation. Instead of everyone shouting at a DJ in the middle, each person whispers directly to the six people closest to them: up, down, left, right, front, back.

It sounds chaotic, but it's actually brilliant. Google arranges 64 TPUs in each rack as a 4x4x4 cube, all connected with high-speed copper cables and cooled by liquid. The beauty is that if one chip fails, messages just route around it, like that party continuing even if someone steps out for fresh air.
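Here's a toy Python sketch of that wiring rule. The wraparound ('torus') edges are what turn a plain cube into the donut, and they guarantee that every chip, even one on a corner, has exactly six neighbors:

```python
# Toy picture of the 4x4x4 torus wiring described above: each chip at
# coordinate (x, y, z) links to its six nearest neighbors, with edges
# wrapping around at the faces of the cube.

def torus_neighbors(x: int, y: int, z: int, size: int = 4):
    return [
        ((x + 1) % size, y, z), ((x - 1) % size, y, z),  # right / left
        (x, (y + 1) % size, z), (x, (y - 1) % size, z),  # front / back
        (x, y, (z + 1) % size), (x, y, (z - 1) % size),  # up / down
    ]

print(torus_neighbors(0, 0, 0))  # even a corner chip has exactly six neighbors
```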
The Invisible Conductor: Why Software is the Secret Weapon
However, arranging chips in a 3D donut only works if you know exactly where every piece of data is going before it leaves the station. This is where Google's 'compiler magic' (specifically the XLA compiler) comes in.
Think of an Nvidia GPU network like a fleet of Uber drivers: smart, autonomous, and able to navigate dynamic traffic jams on the fly. Google's TPU system is more like a high-speed train network. The compiler maps out the entire schedule in advance; it knows that at exactly 12:00:01, Chip A will pass a tensor to Chip B.
Because the movement is deterministic, Google can strip out all the hardware logic required for 'traffic control' and use that silicon real estate for more math. It's harder to program, but once the train leaves the station, nothing moves faster.
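For a feel of what 'planning the schedule in advance' looks like in practice, here's a minimal JAX example. It's an illustration of XLA's ahead-of-time compilation model, not Google's internal training code:

```python
import jax
import jax.numpy as jnp

# A minimal illustration of the "plan the whole schedule in advance" idea:
# jax.jit traces the function once, and the XLA compiler turns that trace into
# a fixed program for the target hardware (CPU, GPU, or TPU) before data flows.

def layer(x, w):
    return jax.nn.relu(x @ w)              # a matmul-plus-activation "layer"

x = jnp.ones((8, 128))
w = jnp.ones((128, 256))

print(jax.make_jaxpr(layer)(x, w))         # the traced computation XLA will compile
compiled_layer = jax.jit(layer)            # compile once, reuse for every batch
print(compiled_layer(x, w).shape)          # (8, 256)
```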
Google now runs clusters with over 10,000 TPUs working together. Amazon liked this approach so much they adopted similar principles for their Trainium2 chips. Word on the street is that Meta, Microsoft, and even OpenAI are exploring similar designs.
Part III: Two Roads to Rome - Why Architecture Matters
Imagine you're trying to coordinate 10,000 people to solve a massive puzzle. You have two options:
Option 1 (Nvidia's Way): Build an incredibly sophisticated command center with the world's best coordinators. Every piece of information flows through this center, which instantly routes it to exactly where it needs to go.
Option 2 (Google's Way): Forget the command center. Arrange everyone in a giant 3D grid and have them pass messages to their neighbors. It takes more hops to get a message across, but you can keep adding people to the grid almost infinitely.
Here's the kicker: both approaches are right. They're just solving different problems.

Part IV: The Beautiful Truth About Scaling Laws
Back in 2020, if you'd told AI researchers that the path to smarter AI was simply 'make it bigger and feed it more data,' they'd have laughed you out of the room. Surely intelligence required clever algorithms, sophisticated reasoning systems, something more... elegant?
Nope. Turns out, if you make a model 10x bigger and feed it 10x more data, it gets predictably, reliably smarter. Every. Single. Time.

This discovery, what we now call 'scaling laws,' is like finding out that the secret to making a better cake isn't some exotic ingredient, it's just using more of everything in the right proportions. It's almost embarrassingly simple.
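For those who want the recipe written down: the widely cited functional form comes from the Chinchilla paper (Hoffmann et al., 2022). The coefficients below are roughly the published fit and are shown purely for illustration:

```python
# The "bigger model + more data = predictably lower loss" claim has a standard
# functional form (Hoffmann et al., 2022). The coefficients below are roughly
# the published Chinchilla fit and are for illustration only.

def predicted_loss(params: float, tokens: float,
                   E: float = 1.69, A: float = 406.4, B: float = 410.7,
                   alpha: float = 0.34, beta: float = 0.28) -> float:
    return E + A / params**alpha + B / tokens**beta

# Scaling parameters and data together by 10x lowers the predicted loss.
print(predicted_loss(7e10, 1.4e12))   # a Chinchilla-scale model
print(predicted_loss(7e11, 1.4e13))   # 10x more of everything
```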
Google's Gemini 3, running entirely on TPUs with their unique 3D donut architecture, proves the recipe works regardless of the kitchen. November 18, 2025 wasn't just a product launch. It was validation that we're still on the exponential curve.
Part V: The $200 Million Training Run
Let's talk about the elephant in the room: cost.
Google's training run for Gemini 3 consumed an estimated 10^26 floating-point operations. That's:
100,000,000,000,000,000,000,000,000 calculations
Enough compute to run your laptop continuously for 10 billion years
About $200 million in pure computational costs
Enough electricity to power San Francisco for a month
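As a rough sanity check, here's what 10^26 operations implies for a single Ironwood-class pod. The utilization figure is my own assumption for illustration, not a disclosed Google number:

```python
# Rough sanity check on the 1e26 figure, assuming (for illustration) roughly one
# Ironwood-class pod running at ~40% sustained utilization. These assumptions
# are mine, not disclosed Google numbers.

total_flops = 1e26
pod_peak_flops = 42.5e18           # 42.5 exaflops FP8 peak, from the spec above
utilization = 0.40                 # assumed sustained fraction of peak

seconds = total_flops / (pod_peak_flops * utilization)
print(f"{seconds / 86_400:.0f} days on one pod")   # on the order of two months
```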

But here's the thing: in the grand scheme of Google's ambitions, $200 million is a bargain. This isn't just about having the best chatbot. This is about fundamentally changing how computers understand and interact with the world.
The Memory Mountain
Raw compute is only half the story. The dirty secret of AI training is that moving data around, not computation, is often the bottleneck. It's like having a Ferrari engine in a car with bicycle wheels.

Google solved this with an audacious amount of memory. A full Ironwood pod contains 1.77 petabytes of total memory, enough to hold the entire English Wikipedia 250 times over. In RAM. Ready for instant access.
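Capacity is only part of it; bandwidth is what keeps the math units fed. A rough arithmetic-intensity calculation using the per-chip figures (the roughly 7.4 TB/s bandwidth is the publicly reported spec) shows why data movement dominates:

```python
# Why data movement dominates: to keep the math units busy, every byte pulled
# from memory has to feed a lot of arithmetic. Spec-sheet values as given above;
# the ~7.37 TB/s bandwidth is the publicly reported per-chip figure for Ironwood.

peak_flops = 4_614e12        # FP8 FLOPs per second, per chip
hbm_bandwidth = 7.37e12      # bytes per second, per chip

flops_needed_per_byte = peak_flops / hbm_bandwidth
print(f"~{flops_needed_per_byte:.0f} FLOPs per byte to stay compute-bound")
# Anything below that ratio (e.g. streaming huge weights for a single token)
# leaves the compute idle, waiting on memory -- the "memory mountain."
```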
Part VI: Why This Changes Everything
Here's something wild that nobody talks about: building one of these TPU super-pods is so difficult that only one company in the world has truly mastered it. Celestica has spent five years perfecting the art of cramming 64 liquid-cooled TPUs into a single rack configured as a 3D cube.
Think about what that actually means. You're connecting 64 chips that each generate as much heat as a space heater, cooling them with liquid (pray for no leaks), and wiring them together in a perfect three-dimensional grid where every connection has to work flawlessly.
Amazon tried to replicate this for their Trainium2 chips. Best they could do? 32 chips per rack, half of what Google achieves. That's how hard this is.
The Efficiency Revolution
Here's what really matters to users: Gemini 3 doesn't just perform better. It's cheaper to run. Early users report inference costs 50% lower than alternatives. This isn't just about saving money; it makes previously impossible applications suddenly viable.
Part VII: Who's Actually Using TPUs
The list of companies betting on Google's TPUs reads like a who's who of AI innovation. Anthropic, the company behind Claude, announced a landmark expansion in October 2025: access to up to one million TPU chips, a deal worth tens of billions of dollars that will bring over a gigawatt of compute capacity online in 2026. Anthropic chose TPUs due to their price-performance and efficiency.
But Anthropic isn't alone. Current TPU customers include Midjourney (the AI image generation company), Salesforce, Safe Superintelligence (the startup founded by OpenAI co-founder Ilya Sutskever), Figma, Palo Alto Networks, and Cursor. The numbers tell a compelling story: more than 60% of funded generative AI startups and nearly 90% of gen AI unicorns use Google Cloud's AI infrastructure, including Cloud TPUs.
Notably, even Anthropic maintains what they call a 'diversified compute strategy,' efficiently using three chip platforms: Google's TPUs, Amazon's Trainium, and Nvidia's GPUs. This multi-platform approach, they say, ensures they can continue advancing capabilities while maintaining strong partnerships across the industry. It's a recognition that different hardware excels at different tasks.
The Double-Edged Sword: Optimization vs. Freedom
There is, however, a catch to this efficiency. Achieving these results on TPUs often requires deep optimization using Google's JAX or TensorFlow frameworks. Unlike CUDA code, which can be ported relatively easily between different clouds, highly optimized TPU code has a high 'stickiness' factor.
For a startup, this creates a dilemma: Do you build on Nvidia GPUs to keep your cloud options open, or do you commit to Google's TPUs for the 30-40% cost efficiency? Anthropic's strategy of maintaining codebases for both suggests that for the biggest players, the 'portability tax' is a price worth paying to avoid being locked into a single infrastructure provider.
Part VIII: The Software Ecosystem Angle
Hardware is only half the equation. TPUs are tightly integrated with Google's TensorFlow and JAX frameworks, creating a vertically optimized stack where hardware and software are designed together. This tight coupling enables optimizations that wouldn't be possible with general-purpose chips.
Compare this to Nvidia's approach. The CUDA ecosystem, built over more than a decade, represents a fortress of developer tools, frameworks, and community support. PyTorch, the most popular deep learning framework, runs beautifully on CUDA. Thousands of AI libraries are optimized for Nvidia's architecture. For many developers, switching away from CUDA would require hundreds of hours of testing and rewriting code.
This creates an interesting dynamic. TPUs offer deep optimization for specific workloads within Google's ecosystem, while GPUs offer flexibility and portability across virtually any environment. For hyperscalers with predictable workloads and the engineering talent to optimize for specific hardware, TPUs can deliver significant advantages. For the broader market of researchers, startups, and enterprises who need flexibility, Nvidia's ecosystem remains the default choice.
Part IX: Investment Implications - The Competitive Landscape
The Market Narrative Shift
While the writing has been on the wall for some time, the market's perception of Google has reversed dramatically over the past several months, transforming from an 'AI loser bleeding its search dominance' into a formidable challenger poised to undercut the most consensus AI winners.
This sentiment has only accelerated with the release of Gemini 3. Not only is Google now firmly positioned at the cutting edge of frontier models, but it's doing it on its own terms, with TPUs. It is indeed possible to train a frontier model without Nvidia, that is, if you're an ML-pioneering hyperscaler who has spent the past 10 years developing and optimizing for custom silicon.
What This Means for Nvidia
The announcements that both Anthropic and Meta are planning to implement TPU chips raise questions about Nvidia's dominance. Surprisingly, Meta reportedly wants TPUs for training, not just inference. But there are two ways to think about this:
Switching costs for architecture and ecosystem are high for AI accelerators. Gemini 3 should be bullish for compute demand as labs locked into Nvidia rush to buy more next-gen chips to compete.
Nvidia's margins are so high that any competitive threat must be aggressively priced in, even if it means higher volumes.
In other words, isn't a breakthrough model broadly bullish for compute demand? The reality is that CUDA and PyTorch are still dominant today, which probably means Nvidia continues to see strong demand in the near term regardless of Google's TPU plans. Keep in mind, despite its use of TPUs, Google is still one of Nvidia's largest customers.
It's probably quite bullish for the supply chain. Both Google and Nvidia are going to be fighting for the exact same CoWoS capacity at TSMC, which will remain the rate-limiting factor.
One key thing to emphasize: switching from GPUs to TPUs means learning a new toolchain, typically moving from PyTorch to JAX and retargeting code for the XLA compiler. Switching software environments is non-trivial, disruptive, and expensive. Google has spent ten years building internal workflows, physical infrastructure, and model architectures around TPUs. For others, particularly smaller players, the learning curve will be far steeper.
The custom AI chip market is heating up, but the picture is more nuanced than simple competition. The AI data center total addressable market is expected to grow roughly 5x, from $242 billion in 2025 to more than $1.2 trillion by 2030. That's a rising tide that lifts all boats.
Every major hyperscaler is developing custom silicon: Google has TPUs, Amazon has Trainium and Inferentia, Microsoft is working on Maia chips, and Meta is building MTIA accelerators. OpenAI recently announced plans to work with Broadcom on custom ASICs starting in 2026. As more customers develop custom chip options to handle the diversity of training and inference workloads, Nvidia's share may normalize toward 75% from over 85% currently. But any share changes are likely to be gradual in nature.
Why gradual? There are significant advantages to merchant GPU chips: off-the-shelf availability, multi-cloud portability, Nvidia's full stack of software and developers, and a larger addressable market with sovereign and enterprise on-premise customers who don't have the expertise to build custom chips. The tight supply chain and Nvidia's scale advantages make it difficult to capture share quickly since not enough components are available in the near to medium term.
Here's the key insight: custom chips can be lower-cost for a specific range of internal workloads, which suits customers with large internal workloads such as Google and perhaps Meta. However, they are less useful in a public cloud such as Microsoft Azure or Amazon Web Services, or the 100+ neoclouds where intense levels of flexibility are required. This is why even Google uses GPUs in its public cloud offerings alongside TPUs.
The numbers support this coexistence. AWS claims Trainium offers 30-40% cost savings versus Nvidia GPUs for specific workloads. But AWS also fills its data centers with Nvidia GPUs to meet demand from AI customers like OpenAI. Amazon's strategy is about providing cheaper alternatives and reducing supply chain dependency, not replacing Nvidia. Microsoft's CTO Kevin Scott told CNBC that up to this point, Nvidia has offered the best price-performance, and they're willing to use whatever meets demand.
Custom ASICs make economic sense for hyperscalers with massive, predictable workloads. But designing a custom ASIC requires tens to hundreds of millions of dollars in upfront costs, something only the largest companies can afford. For everyone else, Nvidia's GPUs remain the accessible, flexible, and proven choice. The market is simply large enough for multiple approaches to thrive.
Part X: The Google Cloud Moat
TPUs represent arguably Google Cloud's biggest competitive advantage for the next decade. Unlike Nvidia GPUs, which are available across AWS, Azure, and virtually every cloud provider, TPUs are only available through Google Cloud. This exclusivity creates a powerful lock-in effect.
Google's cloud business reported operating income of $2.8 billion in Q2 2025, more than double the amount from the same quarter the previous year. The Anthropic deal alone represents a powerful validation of TPUs that could attract more companies to try them. As one analyst put it: 'A lot of people were already thinking about it, and a lot more people are probably thinking about it now.'
The Crack in the Walled Garden?
Internally, Google has formed a more sales-oriented team to push TPUs, a significant cultural shift for an engineering-first organization. There is also intense ongoing debate about whether to keep TPUs exclusive to Google Cloud or to license them externally.
Rumors suggest Google may soon offer TPU access through smaller 'neo-cloud' providers. If true, this would be a strategic pivot designed to alleviate antitrust pressure while standardizing the TPU architecture across the industry, much like they did with Kubernetes. But for now, if you want Ironwood, you go to Google.
Will Google Become a Merchant TPU Vendor?
Google faces three strategic options for TPU distribution:
Keep TPUs for themselves as a competitive advantage
Rent out TPUs via GCP (which they already do)
Become a merchant TPU vendor and alternative to Nvidia
The first alternative, keeping TPUs as a purely in-house advantage, risks under-monetizing an incredibly valuable asset (remember, Nvidia is worth more than Google today) and has already been eroded by TPU deals with Anthropic, Fluidstack, and (reportedly) Meta.
However, if Google decides to become a selective merchant vendor, shipping TPUs into neoclouds, sovereigns, and selective hyperscaler data centers like Meta, they would generate revenue and gain presence without committing to infrastructure. In a world where stocks get hit every six months on capex fears, this might be a good way of hedging. Selling TPUs is essentially a pure margin game with variable production that can be ratcheted down. This shifts risk and produces optionality.
In a somewhat reflexive way, the market's reaction here may force Google to sell TPUs. If Google doesn't break out TPUs as merchant revenue while data center and power constraints cap GCP capacity growth, the narrative becomes a refusal to monetize the silicon while being throttled by an infrastructure bottleneck it does not control.
Back in September, Google reportedly approached Crusoe, CoreWeave, and Fluidstack to give them access to TPUs. Fluidstack, for example, has a deal to deploy TPUs in a TeraWulf-leased data center in New York State, with Google providing a $3.2 billion backstop and taking an equity stake of roughly 14% in TeraWulf. It's the 'franchise' model: you find the power and build the shell, we'll drop in our TPUs.
Part X.5: The TPU Supply Chain
If a TPU scale-up is on the horizon, who else stands to benefit? The direct TPU supplier basket has been among the best performing AI infrastructure groups year-to-date, with the majority of outperformance beginning around August 2025.
One thing to emphasize: there's a huge amount of overlap in the Nvidia and Google supply chains, and the same overall capacity constraints haven't changed. To the extent that two giants are competing for the same production capacity, it should benefit the pricing and bargaining power of these supply chain players.

Key Supply Chain Winners
Advanced Packaging: TSMC, ASE, and Amkor should do well as competition heats up. Memory (HBM) remains the bottleneck, with SK Hynix, Samsung, and Micron supplying memory in that order.
Optical Networking: Google's internal TPU forecast for the next cycle has reportedly been revised from roughly 2 million units to approximately 4 million. That step-up in TPU v7 servers is viewed as a key driver behind a jump in 1.6T optical module demand from roughly 3 million modules in 2025 to around 20 million in 2026, with Google alone consuming 6 to 10 million of those via TPU v7 racks.
MEMS-Based Optical Switches: Unlike GPU clusters, which rely on electrical packet switches (InfiniBand or Ethernet), Google uses MEMS-based optical mirrors to physically redirect light between groups of TPUs, a technology called OCS, or Optical Circuit Switching. This eliminates the need for power-hungry electrical transceivers at the switch level. Google has increasingly partnered with Lumentum (LITE) to mass-produce these switches, specifically the Lumentum R300 and newer R64 series.
PCB Makers: TPU boards are denser with proprietary interconnect routing, benefiting niche high-end PCB makers such as Isu Petasys, TTM Technologies, and Unimicron.
Part XI: Energy Efficiency - The Sustainability Angle
One of the most pressing challenges in AI compute is energy consumption. Training a large foundation model today can consume hundreds of megawatt-hours of electricity. According to the International Energy Agency, electricity consumption for accelerated servers (GPUs, TPUs, and ASICs) is projected to grow by 30 percent annually. By 2030, these accelerated servers will account for almost half of the net increase in global data center electricity consumption.
The 10 Megawatt Problem
The efficiency of the TPU isn't just about saving money; it's about physics. A single Ironwood pod consumes roughly 10 MW of power. To put that in perspective, a typical legacy data center rack draws 10-20 kilowatts.
You cannot simply roll an Ironwood pod into a standard data center; it would melt the fuses and overwhelm the cooling instantly. These machines require bespoke facilities built around the hardware. This physical constraint, finding locations that can actually supply and cool 10 MW densities, is perhaps the biggest bottleneck of the AI era, and one that vertically integrated players like Google are uniquely positioned to solve.
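Dividing the quoted figures makes the point (illustrative arithmetic using the numbers above, not a disclosed per-rack rating):

```python
# Rough power-density arithmetic from the figures quoted above: 10 MW per pod,
# 9,216 chips per pod, 64 chips per rack.

pod_power_w = 10e6
chips_per_pod = 9_216
chips_per_rack = 64

racks_per_pod = chips_per_pod / chips_per_rack           # 144 racks
power_per_rack_kw = pod_power_w / racks_per_pod / 1_000  # ~69 kW per rack
power_per_chip_w = pod_power_w / chips_per_pod           # ~1.1 kW per chip

print(f"{racks_per_pod:.0f} racks, ~{power_per_rack_kw:.0f} kW per rack, "
      f"~{power_per_chip_w:.0f} W per chip")
```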
This is where TPUs shine. Google's Ironwood (TPU v7) is reportedly nearly 30x more energy-efficient than the first-generation TPU. In cloud computing environments, TPUs demonstrate 30-40% better energy efficiency than GPUs when accounting for the entire system infrastructure. TPU v4 setups are estimated to reduce overall costs by 20-30% compared to similar GPU deployments, thanks to lower power consumption, reduced cooling needs, and lower maintenance expenses.
The efficiency advantage stems from specialization. High-end GPUs consume between 300-700W during operation, with some reaching up to 1,000W. TPUs, designed exclusively for tensor operations, can achieve comparable AI performance with significantly lower power draw. As one industry insider noted: 'TPU v6 is 60-65% more efficient than GPUs, prior generations 40-45%.'
Because Google designs the TPU, the server rack, the cooling system, and the data center itself, they can optimize everything holistically. The cooling flow matches the exact thermal profile of the chip, squeezing out efficiency gains that are impossible with off-the-shelf hardware. The shift to TPUs and high-end GPUs has made water cooling a necessity rather than a luxury, as air alone can no longer dissipate the heat generated by modern silicon.
Conclusion: The Future Has Multiple Paths
The story of Gemini 3 isn't about one company or one type of chip winning. It's about how the entire industry is pushing forward together, each taking different paths up the same mountain.
Google's November 18, 2025 release proves what many of us suspected: there's more than one way to build the future. Nvidia provides the versatile, powerful tools backed by an unmatched software ecosystem that democratized AI development for researchers, startups, and enterprises worldwide. Google shows the benefits of vertical integration and specialization for hyperscale workloads. Amazon, Microsoft, and others are finding their own paths.
This diversity isn't just healthy. It's essential. Competition drives innovation. Different approaches reveal different insights. Custom ASICs push the boundaries of efficiency while GPUs push the boundaries of flexibility. And ultimately, we all benefit from the race to make AI more capable, more efficient, and more accessible.
The next time someone asks you how Google caught up in the AI race, tell them this: They didn't just build a better model. They built a better way to build models. And in the long game of AI, that might matter even more.

Ben Pouladian
Ben is a tech investor and AI infrastructure enthusiast with a degree in Electrical Engineering from UC San Diego. Drawing on his technical background, Ben has long championed the transformative power of GPU computing and Nvidia's crucial role in the AI revolution. He writes about the intersection of hardware innovation, scaling laws, and the exponential progress of artificial intelligence.
Ben believes that hardware diversity, from Nvidia's versatile GPUs to Google's specialized TPUs, strengthens the entire AI ecosystem and accelerates our path toward AGI.
Follow Ben for more insights on AI compute, scaling laws, and the hardware powering our AI future.
👉 Connect with Ben on LinkedIn

