Cerebras Unveils Andromeda, a 13.5 Million Core AI Supercomputer that Delivers Near-Perfect Linear Scaling for Large Language Models
With more than 13.5 million AI-optimized compute cores and fed by 18,176 3rd Gen AMD EPYC processor cores, Andromeda features more cores than 1,953 Nvidia A100 GPUs and 1.6 times as many cores as the largest supercomputer in the world, Frontier, which has 8.7 million cores. Unlike any known GPU-based cluster, Andromeda delivers near-perfect scaling via simple data parallelism across GPT-class large language models, including GPT-3, GPT-J and GPT-NeoX.
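The core counts quoted in this release can be cross-checked with simple arithmetic. The sketch below is only an illustration: the variable names are hypothetical, "more than 13.5 million" is treated as exactly 13.5 million, and the figure of 284 64-core EPYC processors is the one given in the deployment details later in this release.

```python
# Figures as quoted in this release (rounded; names are illustrative).
ANDROMEDA_AI_CORES = 13_500_000          # "more than 13.5 million", taken as a floor
LARGEST_SUPERCOMPUTER_CORES = 8_700_000  # comparison system cited in the release
EPYC_PROCESSORS = 284                    # from the deployment details
CORES_PER_EPYC = 64

# Total host CPU cores feeding the cluster.
epyc_cores = EPYC_PROCESSORS * CORES_PER_EPYC

# Ratio of Andromeda AI cores to the comparison system's cores.
core_ratio = ANDROMEDA_AI_CORES / LARGEST_SUPERCOMPUTER_CORES

print(epyc_cores)            # 18176
print(round(core_ratio, 2))  # 1.55
```

The product matches the 18,176 figure, and the ratio of about 1.55 is consistent with the rounded "1.6 times" claim.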
Near-perfect scaling means that as additional CS-2s are used, training time is reduced in near-perfect proportion. This includes large language models with very large sequence lengths, a task that is impossible to achieve on GPUs. In fact, GPU-impossible work was demonstrated by one of Andromeda’s first users, who achieved near-perfect scaling on GPT-J at 2.5 billion and 25 billion parameters with long sequence lengths (MSL of 10,240). The users attempted the same work on Polaris, a 2,000 Nvidia A100 cluster, and the GPUs were unable to do it because of GPU memory and memory bandwidth limitations.
Access to Andromeda is available now, and customers and academic researchers are already running real workloads and deriving value from the leading AI supercomputer’s extraordinary capabilities, including:
- Argonne National Laboratory: “In collaboration with Cerebras researchers, our team at Argonne has completed pioneering work on gene transformers – work that is a finalist for ACM Gordon Bell Special Prize for HPC-Based COVID-19 Research. Using GPT3-XL, we put the entire COVID-19 genome into the sequence window, and Andromeda ran our unique genetic workload with long sequence lengths (MSL of 10K) across 1, 2, 4, 8 and 16 nodes, with near-perfect linear scaling. Linear scaling is amongst the most sought-after characteristics of a big cluster, and Cerebras Andromeda delivered 15.87X throughput across 16 CS-2 systems, compared to a single CS-2, and a reduction in training time to match. Andromeda sets a new bar for AI accelerator performance,” said Rick Stevens, Associate Lab Director at Argonne National Laboratory.
- JasperAI: “Jasper uses large language models to write copy for marketing, ads, books, and much more. We have over 85,000 customers who use our models to generate moving content and ideas. Given our large and growing customer base, we’re exploring testing and scaling models fit to each customer and their use cases. Creating complex new AI systems and bringing them to customers at increasing levels of granularity demands a lot from our infrastructure. We are thrilled to partner with Cerebras and leverage Andromeda’s performance and near-perfect scaling without traditional distributed computing and parallel programming pains to design and optimize our next set of models,” said Dave Rogenmoser, CEO of JasperAI.
- AMD: “AMD is investing in technology that will pave the way for pervasive AI, unlocking new efficiency and agility abilities for businesses. The combination of the Cerebras Andromeda AI supercomputer and a data pre-processing pipeline powered by AMD EPYC-powered servers will together put more capacity in the hands of researchers and support faster and deeper AI capabilities,” said Kumaran Siva, Corporate Vice President, Software & Systems Business Development, AMD.
- University of Cambridge: “It is extraordinary that Cerebras provided graduate students with free access to a cluster this big. Andromeda delivers 13.5 million AI cores and near-perfect linear scaling across the largest language models, without the pain of distributed compute and parallel programming. This is every ML graduate student’s dream,” said Mateo Espinosa, doctoral candidate at the University of Cambridge in the United Kingdom.
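To put the scaling figure quoted by Argonne in concrete terms, scaling efficiency is commonly defined as measured speedup divided by the number of systems. The following is a minimal sketch under that assumption; the function names are illustrative and are not part of any Cerebras tooling.

```python
def scaling_efficiency(speedup: float, n_systems: int) -> float:
    """Fraction of ideal linear speedup actually achieved."""
    return speedup / n_systems


def relative_training_time(speedup: float) -> float:
    """Training time as a fraction of the single-system time."""
    return 1.0 / speedup


# Figures quoted above: 15.87X throughput on 16 CS-2 systems.
efficiency = scaling_efficiency(15.87, 16)
time_fraction = relative_training_time(15.87)

print(f"{efficiency:.1%}")     # 99.2%
print(f"{time_fraction:.1%}")  # 6.3%
```

Under this definition, perfect linear scaling on 16 systems would be 100% efficiency and 6.25% of the single-system training time.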
Andromeda’s near-perfect scaling across the largest natural language processing models is made possible by the second-generation Cerebras Wafer Scale Engine (WSE-2), the industry’s largest and most powerful processor, and by Cerebras’ MemoryX and SwarmX technologies. MemoryX enables even a single CS-2 to support multi-trillion parameter models. SwarmX technology links MemoryX to a cluster of CS-2s. Together these industry-leading technologies enable Cerebras’ large clusters to avoid two of the major challenges plaguing traditional clusters used for modern AI work: the complexity of parallel programming and the performance degradation of distributed computing.
The 16 CS-2s powering Andromeda run in a strictly data parallel mode, enabling simple and easy model distribution, and single-keystroke scaling from 1 to 16 CS-2s. In fact, sending AI jobs to Andromeda can be done quickly and painlessly from a Jupyter notebook, and users can switch from one model to another with a few keystrokes. Andromeda’s 16 CS-2s were assembled in only 3 days, without any changes to the code, and immediately thereafter workloads scaled linearly across all 16 systems. And because the Cerebras WSE-2 processor, at the heart of its CS-2s, has 1,000 times more memory bandwidth than a GPU, Andromeda can harvest structured and unstructured sparsity as well as static and dynamic sparsity. These are things other hardware accelerators, including GPUs, simply can’t do. The result is that Cerebras can train models in excess of 90% sparse to state-of-the-art accuracy.
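For readers unfamiliar with the term, data parallelism means every system holds the same model weights, processes a different slice of the batch, and the per-system gradients are then combined. The sketch below illustrates the general idea with a toy one-parameter model in plain Python. It is a generic illustration, not Cerebras’ implementation, and all function names are hypothetical.

```python
def grad_mse(w, xs, ys):
    """Gradient of mean squared error for the 1-D model y_hat = w * x."""
    return sum(2.0 * (w * x - y) * x for x, y in zip(xs, ys)) / len(xs)


def shard(items, n_workers):
    """Split a list into n_workers contiguous, near-equal shards."""
    base, extra = divmod(len(items), n_workers)
    shards, start = [], 0
    for i in range(n_workers):
        end = start + base + (1 if i < extra else 0)
        shards.append(items[start:end])
        start = end
    return shards


def data_parallel_grad(w, xs, ys, n_workers):
    """Each worker computes a gradient on its shard with the same weight w;
    the results are averaged, weighted by shard size."""
    total = len(xs)
    grad = 0.0
    for xs_i, ys_i in zip(shard(xs, n_workers), shard(ys, n_workers)):
        if xs_i:  # skip empty shards when there are more workers than samples
            grad += grad_mse(w, xs_i, ys_i) * len(xs_i) / total
    return grad


xs = [float(x) for x in range(1, 33)]
ys = [3.0 * x for x in xs]

single = grad_mse(0.5, xs, ys)                  # one system, full batch
parallel = data_parallel_grad(0.5, xs, ys, 16)  # 16 workers, 2 samples each
print(abs(single - parallel) < 1e-6)            # True
```

Because the combined gradient equals the full-batch gradient, adding workers changes how fast a step is computed but not the result of the step, which is what makes this mode simple to scale.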
Andromeda can be used simultaneously by multiple users. Users can easily specify how many of Andromeda’s CS-2s they want to use within seconds. This means Andromeda can be used as a 16 CS-2 supercomputer cluster for a single user working on a single job, or as 16 individual CS-2 systems for sixteen distinct users with sixteen distinct jobs, or any combination in between.
Andromeda is deployed in Santa Clara, California, in 16 racks at Colovore, a leading high performance data center. The 16 CS-2 systems, with a combined 13.5 million AI-optimized cores, are fed by 284 64-core AMD 3rd Gen EPYC processors. The SwarmX fabric, which links the MemoryX parameter storage solution to the 16 CS-2s, provides more than 96.8 terabits of bandwidth. Through gradient accumulation, Andromeda can support all batch sizes.
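Gradient accumulation, mentioned above, generally means summing gradients over several smaller micro-batches before applying a single weight update, so the effective batch size is not limited by how much fits in one pass. A minimal sketch of the technique with a toy one-parameter model follows. It is a generic illustration with hypothetical names and does not describe how Cerebras implements it.

```python
def grad_sum(w, batch):
    """Sum (not mean) of squared-error gradients for the model y_hat = w * x."""
    return sum(2.0 * (w * x - y) * x for x, y in batch)


def sgd_step_accumulated(w, batch, micro_batch_size, lr):
    """One SGD update whose gradient is accumulated over micro-batches."""
    accumulated = 0.0
    for start in range(0, len(batch), micro_batch_size):
        micro = batch[start:start + micro_batch_size]
        accumulated += grad_sum(w, micro)  # no weight update here
    # Single update using the mean gradient over the whole batch.
    return w - lr * accumulated / len(batch)


batch = [(float(x), 3.0 * x) for x in range(1, 9)]  # 8 samples of y = 3x

w_full = sgd_step_accumulated(0.0, batch, micro_batch_size=len(batch), lr=0.01)
w_accumulated = sgd_step_accumulated(0.0, batch, micro_batch_size=3, lr=0.01)
print(abs(w_full - w_accumulated) < 1e-12)  # True
```

The update is the same whether the batch is processed in one pass or in micro-batches of three, which is why accumulation lets a fixed amount of hardware emulate an arbitrary batch size at the cost of more passes per step.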
For more information on the Cerebras Andromeda supercomputer, please visit www.cerebras.net/andromeda.