In 2023, a single product became the most sought-after commodity on the planet: not oil, not rare earth minerals, but a chip. NVIDIA's H100 GPUs, priced at $30,000–$40,000 each, became the bottleneck through which the entire AI industry had to pass. That scarcity was not accidental. It was structural.
How the Monopoly Was Built
NVIDIA's dominance in AI compute did not emerge from a single decision. It was constructed over a decade through three interlocking advantages: CUDA lock-in, manufacturing partnerships, and first-mover timing on the deep learning transition.
CUDA, NVIDIA's proprietary parallel computing platform, was introduced in 2006, long before AI was commercially relevant. By the time deep learning exploded in 2012 with AlexNet, the research community had already built its tooling, frameworks, and institutional knowledge around CUDA. Switching costs had compounded invisibly for six years.
The CUDA Moat
PyTorch, TensorFlow, and every other major ML framework treat CUDA as the default, best-supported backend. The global talent pool of ML engineers was trained on CUDA. PhD students wrote dissertations assuming NVIDIA hardware. By 2020, AMD's ROCm and Intel's oneAPI offered technically viable alternatives, but the ecosystem gap had grown too wide for most practitioners to justify crossing.
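What "CUDA-first" means in practice is visible in ordinary training code. The sketch below is minimal and hypothetical (the model and dimensions are illustrative, not drawn from any real codebase), but the hardware assumption it encodes, a CUDA-first device check, appears in some form in nearly every PyTorch tutorial and production repository:

```python
import torch
import torch.nn as nn

# The idiomatic device check: CUDA first, CPU as the fallback.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# A hypothetical toy model; the specifics are illustrative, the pattern is not.
model = nn.Linear(512, 10).to(device)
optimizer = torch.optim.SGD(model.parameters(), lr=1e-3)

x = torch.randn(32, 512, device=device)             # a batch of 32 feature vectors
target = torch.randint(0, 10, (32,), device=device)  # matching class labels

loss = nn.functional.cross_entropy(model(x), target)
loss.backward()
optimizer.step()
```

Nothing in this snippet names NVIDIA, yet everything about it assumes NVIDIA: torch.cuda is the canonical namespace, and competing backends must either impersonate it (AMD's ROCm builds of PyTorch report their devices as "cuda") or ship as separate add-on packages.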
Scarcity as Leverage
The H100 shortage of 2023–2024 demonstrated something important: even when NVIDIA's chips were available, the company's allocation decisions carried power implications. Which cloud providers got priority? Which frontier labs got early access? Which countries were restricted by export controls?
These allocation decisions — made by a single private company — determined who could train frontier models and on what timeline. That is not a market outcome. It is a power structure.
The Durability Question
The obvious challenge to NVIDIA's position is custom silicon. AWS Trainium, Google TPUs, Microsoft's Maia, Meta's MTIA: every hyperscaler is investing in proprietary chips to reduce NVIDIA dependency. But these chips face the same CUDA ecosystem problem in reverse: they require custom software stacks, hand-optimized kernels, and ported training and inference pipelines.
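The porting burden is concrete. As a hedged sketch, here is what moving the training step shown earlier from CUDA to a Google TPU looks like under PyTorch/XLA (the torch_xla package); the toy model is unchanged, but the execution model is not:

```python
import torch
import torch.nn as nn
import torch_xla.core.xla_model as xm  # TPU support ships as a separate package

# A different device API entirely: no torch.cuda in sight.
device = xm.xla_device()

model = nn.Linear(512, 10).to(device)
optimizer = torch.optim.SGD(model.parameters(), lr=1e-3)

x = torch.randn(32, 512, device=device)
target = torch.randint(0, 10, (32,), device=device)

loss = nn.functional.cross_entropy(model(x), target)
loss.backward()
xm.optimizer_step(optimizer)  # wraps optimizer.step() with XLA-aware gradient handling
xm.mark_step()                # XLA is lazy: the computation actually runs at this barrier
```

Only a few lines differ, but the semantics underneath change with them: XLA traces and compiles the computation lazily, so profiling, debugging, and custom-kernel work all follow different paths than they do on CUDA. Multiplied across a full training and serving stack, that difference is the ecosystem gap in miniature.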
The transition is happening, but slowly. For frontier model training — the highest-stakes, highest-value use case — NVIDIA remains structurally irreplaceable in the near term. That window is likely 3–5 years, not indefinite.
Strategic Implications
For investors: compute scarcity is a sustained structural advantage, not a temporary supply chain issue. NVIDIA's pricing power will remain elevated as long as CUDA lock-in persists.
For operators: over-indexing on NVIDIA creates geopolitical and supply chain risk. The hyperscalers building custom silicon are hedging correctly, but the timeline for parity is longer than most assume.
For policymakers: compute allocation decisions by a single company constitute critical infrastructure governance in all but name. The policy frameworks for this are decades behind the reality.