The Gorilla Game
Introduction
Life often comes full circle before we even realize it. After a decade in the semiconductor industry, working on firmware and software for networking, storage, and mobile chips, I had seen software engineers treated as second-class citizens. Around 2012, a colleague in hardware security encouraged me to join a FAANG company. We had many discussions about career prospects, especially as Broadcom was winding down its multimedia and communications chipsets. He joined Nvidia, while I took a job at Amazon.
Fast forward a decade, and I now work at a startup developing advanced AI/ML solutions for the medical field, interpreting complex medical data at scale. We partner with many tech giants, including Nvidia. This collaboration with a semiconductor company, albeit at a higher level of the software stack, brings back memories. But is Nvidia just a semiconductor company? While it is not a direct competitor, Nvidia GPU Cloud (NGC) seems to be encroaching on territory held by cloud providers such as AWS and GCP. This reminded me of the book “The Gorilla Game” by Geoffrey Moore and Paul Johnson.
In this post, I apply the Gorilla Game framework to the Machine Learning (ML) and Artificial Intelligence (AI) space and share my views on how the dynamics are unfolding between semiconductor companies like Nvidia and cloud providers like AWS and Google Cloud Platform (GCP).
What is a Gorilla?
Gorillas are dominant companies that leverage their position to maintain market leadership and profitability. These companies excel by adopting disruptive technologies, setting new market standards, and significantly influencing market trends. Examples include Intel, Oracle, SAP, and Cisco. As these gorillas dominate their industries, they expand into adjacent markets, increasing competition and market disruption. This leads to innovation, strategic alliances, and shifts in market dynamics, reshaping the competitive landscape.
Market Leadership and Dominance
Nvidia's GPU Stronghold: Nvidia has become the undisputed "gorilla" in Graphics Processing Units (GPUs), which are crucial for training complex ML models. Further solidifying its position, Nvidia's CUDA platform has become the de facto standard for GPU parallel computing, which is well suited to the massive datasets involved in AI.
Cloud Providers' Expansion: Recognizing the importance of GPUs, cloud giants like AWS and GCP haven't been idle. They've incorporated Nvidia GPUs into their cloud offerings, providing scalable GPU instances for machine learning practitioners and enterprises.
Encroachment and Competitive Strategies
Nvidia
DGX Systems and Supercomputers: Nvidia's DGX series offers integrated AI systems for high-performance AI computation in data centers, positioning Nvidia as a comprehensive solution provider.
AI Software Ecosystem: Nvidia's software tools (cuDNN, TensorRT) complement their hardware, creating a tightly integrated AI workflow.
Cloud Providers
Custom Chips Development: AWS’s Trainium and Inferentia, and Google’s TPUs, are custom chips designed for AI workloads, challenging Nvidia’s dominance.
Integrated AI Services: AWS SageMaker and Google Vertex AI provide end-to-end AI services, integrating custom chips with other AI tools.
Innovation and Differentiation
Nvidia continues to innovate with new GPU architectures that push the boundaries of performance and efficiency. At Computex this year, Nvidia CEO Jensen Huang announced that the company would launch a new product family every year instead of once every two years. They plan to launch Blackwell this year, Blackwell Ultra next year, and a next-generation chip named Rubin in 2026. Their focus on AI-specific features, like Tensor Cores, keeps them at the forefront of hardware innovation.
Amazon's custom chips, Inferentia and Trainium, are optimized for specific AI workloads to offer cost and performance benefits. Inferentia targets inference: it runs pre-trained deep learning models on instances that can be cheaper than comparable GPU instances, which can translate into significant cost savings. Trainium, on the other hand, is optimized for training large deep learning models with high throughput, enabling faster and potentially more cost-effective training compared to traditional GPUs. Additionally, both AWS and GCP integrate these chips tightly with their cloud platforms through SageMaker and Vertex AI, simplifying deployment and management of AI workloads for enterprises.
Strategic Partnerships and Acquisitions
Nvidia has pursued strategic acquisitions to expand its technological capabilities and strengthen its position in the data center market: Mellanox for networking, and an attempted acquisition of ARM for CPU capabilities. When the ARM acquisition failed, Nvidia turned it into a strategic partnership. In 2003, we shared an office building with Mellanox, which was developing networking products based on InfiniBand and Ethernet while we were building Fibre Channel switches. Nearly 16 years later, in 2019, Nvidia announced its acquisition of Mellanox (the deal closed in 2020) to bridge the gap needed to offer a full-stack solution, from AI compute to networking, for their next-gen data center platforms.
AWS's acquisition of Annapurna Labs was instrumental in launching the AWS Nitro system in 2016. During my time at AWS, I worked on the Nitro security module as part of the EC2 team. The same Annapurna team is likely behind AWS's recent AI chips. In addition, AWS has strategically partnered with Mistral AI and Anthropic to enhance its AI and machine learning capabilities. These partnerships allow AWS to integrate cutting-edge AI research and technologies into its cloud services. Mistral AI contributes expertise in advanced AI models, while Anthropic focuses on creating reliable and interpretable AI systems. Together, these initiatives strengthen AWS's position in the AI and machine learning space.
Conclusion
The Gorilla Game is an interesting read and provides a valuable framework for analyzing industry leadership. I wanted to apply it to the domain I currently work in, so I have shared my thoughts on the industry leaders in the AI/ML space. While Nvidia is currently dominant, it is too early to declare a definitive winner. Cloud companies like Amazon and Google are building their own custom chips and offering complete AI services in one place. The future may well see Nvidia and the cloud companies both cooperating and competing, with the winner being whoever solves customers' problems most effectively.
The "Gorilla Game" lens can be applied to other tech sectors too! See if you can spot the gorillas and challengers shaping the future of your favorite technology.