In a remarkable technological advancement, Cerebras, a company far less known than industry giant Nvidia, has announced the Cerebras Wafer Scale Engine-2 (WSE-2), a massive processor designed exclusively for artificial intelligence (AI) and deep learning. Despite its plain name, the WSE-2 is a groundbreaking achievement that even Nvidia's CEO, Jensen Huang, might covet.

At the core of this innovation is the sheer size of the processor, which spans an entire 12-inch wafer. Built on TSMC's 7nm process, the second-generation WSE-2 packs a staggering 2.6 trillion transistors, dwarfing Nvidia's Hopper with its 80 billion. The WSE-2's architecture comprises 850,000 cores optimized for tensor computing, setting a new benchmark for the industry. To feed that processing power, each wafer incorporates 40GB of high-speed static RAM (SRAM) with a bandwidth of 20 petabytes per second, while the on-wafer interconnect fabric operates at an astonishing 220 petabits per second.
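To put those headline figures in perspective, a quick back-of-the-envelope calculation (using only the numbers quoted above) shows the scale gap:

```python
# Back-of-the-envelope comparison of the WSE-2 figures quoted above.
# All inputs come from the article; the ratios are simple arithmetic.

wse2_transistors = 2.6e12    # 2.6 trillion
hopper_transistors = 80e9    # 80 billion (Nvidia Hopper)
wse2_cores = 850_000
wse2_sram_bytes = 40e9       # 40 GB of on-wafer SRAM

# How many Hopper-sized transistor budgets fit on one WSE-2?
transistor_ratio = wse2_transistors / hopper_transistors
print(f"Transistor ratio: {transistor_ratio:.1f}x")      # 32.5x

# On-wafer SRAM available per core, in kilobytes.
sram_per_core_kb = wse2_sram_bytes / wse2_cores / 1e3
print(f"SRAM per core: {sram_per_core_kb:.1f} KB")       # ~47 KB
```

In other words, the WSE-2 carries roughly 32 times Hopper's transistor count, and its 40GB of SRAM works out to nearly 48KB of fast local memory per core.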

Cerebras aims to leverage this monumental processing capacity to construct a series of supercomputers. The first, named Condor Galaxy 1 (CG-1), is being developed in collaboration with AI service provider G42. This impressive machine comprises 64 WSE-2 chips, totaling a remarkable 54 million cores capable of delivering 4 exaflops of computing power. Cerebras plans to complete two siblings, CG-2 and CG-3, by early 2024. These three supercomputers will be interconnected via dedicated fiber optic cables, forming a distributed AI supercomputer with a combined 12 exaflops and 162 million cores, supported by more than 218,000 high-performance AMD EPYC CPU cores. Ultimately, nine of these 64-chip supercomputers, boasting a total of 36 exaflops of raw power, will be constructed, with the first three located at the Colovore data center in Santa Clara, California. The locations of the remaining six are yet to be announced.
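The aggregate figures follow directly from the per-system numbers, as a short sanity check shows (small discrepancies, such as 162 vs. 163.2 million cores, come from Cerebras rounding 54.4 million down to 54 million per system):

```python
# Sanity-checking the Condor Galaxy aggregates from the per-system figures.
cores_per_wse2 = 850_000
chips_per_system = 64
exaflops_per_system = 4

cores_per_system = chips_per_system * cores_per_wse2
print(f"Cores per system: {cores_per_system:,}")          # 54,400,000 -> "54 million"

# First phase: three interconnected systems (CG-1 through CG-3).
print(f"Phase-1 exaflops: {3 * exaflops_per_system}")     # 12
print(f"Phase-1 cores:    {3 * cores_per_system:,}")      # 163,200,000 (~162 million quoted)

# Full build-out: nine systems.
print(f"Total exaflops:   {9 * exaflops_per_system}")     # 36
```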

To facilitate the scaling of large AI tasks, Cerebras clusters integrate technologies for broadcasting neural-network weights to the compute engines and aggregating weight updates back from them. Notably, Cerebras has invested heavily in its software development kit (SDK), streamlining the programming process and improving scalability. Unlike conventional distributed clusters, where spreading work across compute engines can be complex, the Cerebras SDK simplifies scaling: with a single configuration file change, machine learning applications built on it can scale out across all available compute nodes. Cerebras claims near-linear scaling, with 540 Wafer Scale Engines delivering a 540-fold speedup over a single engine, a 1:1 scaling factor. This is a notable improvement over existing technologies, including Nvidia's DGX-A100 system with its less favorable scaling factor of 89x.
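The "1:1" claim can be expressed as a simple scaling-efficiency metric, sketched below (the function name is illustrative, not part of the Cerebras SDK; the 540-engine figure is the one Cerebras quotes, and the article gives no engine count behind Nvidia's 89x figure, so only the Cerebras case is computed):

```python
def scaling_efficiency(speedup: float, n_engines: int) -> float:
    """Fraction of ideal linear scaling achieved: 1.0 means perfectly linear."""
    return speedup / n_engines

# Cerebras' claim: 540 Wafer Scale Engines yield a 540x speedup over one engine.
print(scaling_efficiency(540, 540))  # 1.0 -> perfectly linear, the "1:1" factor
```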

The power and potential of even a single Wafer Scale Engine-2 are awe-inspiring. If Cerebras can truly achieve near-linear scaling, the interconnected clusters of the Galaxy project have the potential to redefine the boundaries of computing power. As these clusters come online over the next year, their capabilities are set to challenge the limits of imagination.