
13.5 million core Andromeda AI supercomputer unveiled

This week, Cerebras Systems introduced the new Andromeda supercomputer. The 13.5 million core AI supercomputer is now available for commercial and academic work. It is equipped with more than 13.5 million AI-optimized compute cores and 18,176 3rd Gen AMD EPYC processor cores (1.6 times as many cores as the world's largest supercomputer). Andromeda delivers near-perfect scaling via simple data parallelism on large GPT-class language models, including GPT-3, GPT-J and GPT-NeoX, unlike any known GPU-based cluster.

Andromeda is deployed in Santa Clara, California, in 16 racks at Colovore, a leading high-performance data center. Its 16 CS-2 systems, with 13.5 million AI-optimized cores, are fed by 284 64-core 3rd Gen AMD EPYC processors. The SwarmX fabric, which links the MemoryX parameter store to the 16 CS-2s, delivers over 96.8 terabits per second of throughput.
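The core counts above can be cross-checked with simple arithmetic. A minimal sketch, assuming the published Cerebras figure of 850,000 AI-optimized cores per WSE-2 (a spec not stated in this article):

```python
# Arithmetic check of Andromeda's published core counts (figures from the article,
# except WSE2_CORES, which is the Cerebras WSE-2 spec and an assumption here).
EPYC_PROCESSORS = 284          # 3rd Gen AMD EPYC processors
CORES_PER_EPYC = 64            # cores per processor
CS2_SYSTEMS = 16               # CS-2 systems in the cluster
WSE2_CORES = 850_000           # AI-optimized cores per WSE-2 (Cerebras spec)

epyc_cores = EPYC_PROCESSORS * CORES_PER_EPYC
ai_cores = CS2_SYSTEMS * WSE2_CORES

print(epyc_cores)  # 18176, matching the article's EPYC core count
print(ai_cores)    # 13600000, i.e. the "13.5 million" figure, rounded down
```

This reconciles the two numbers in the article: 284 processors at 64 cores each gives exactly 18,176 EPYC cores, and 16 wafer-scale engines give roughly 13.5 million AI cores.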

“Near-perfect scaling means that as additional CS-2s are added, training time is reduced in near-perfect proportion. This includes large language models with very long sequence lengths that cannot be run on GPUs at all. One of the first Andromeda users demonstrated this, achieving near-perfect scaling on GPT-J at 2.5 billion and 25 billion parameters with long sequences (MSL 10,240). The users attempted the same work on Polaris, a 2,000-GPU Nvidia A100 cluster, but the GPUs were unable to do the job because of GPU memory and memory-bandwidth limitations.”


“Near-perfect scaling on Andromeda for the largest natural-language-processing models is made possible by the second-generation Cerebras Wafer-Scale Engine (WSE-2), the largest and most powerful processor in the industry, together with Cerebras MemoryX and SwarmX technologies.

MemoryX allows even a single CS-2 to support models with trillions of parameters, and SwarmX technology links MemoryX to clusters of CS-2s. Together, these industry-leading technologies let large Cerebras clusters avoid two of the main challenges facing traditional clusters used for modern AI work: the complexity of parallel programming and the performance degradation of distributed computing.”


“Andromeda’s 16 CS-2s run in a strictly data-parallel mode, allowing simple, easy model distribution and scaling from 1 to 16 CS-2s with a single keystroke. In fact, AI jobs can be submitted to Andromeda quickly and painlessly from a Jupyter notebook, and users can switch from one model to another with a few keystrokes.

Andromeda’s 16 CS-2s were assembled in just three days, and workloads scaled linearly across all 16 systems immediately, with no code changes. And because the Cerebras WSE-2 processor at the heart of each CS-2 has 1,000 times more memory bandwidth than a GPU, Andromeda can exploit structured and unstructured sparsity, as well as static and dynamic sparsity. This is something that other hardware accelerators, including GPUs, simply cannot do. As a result, Cerebras can train models that are over 90% sparse to state-of-the-art accuracy.”
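The data-parallel mode described above can be illustrated generically: each worker holds a full copy of the model weights, computes gradients on its own shard of the batch, and the gradients are averaged each step. A minimal NumPy sketch of this idea (this is an illustration of data parallelism in general, not the Cerebras software stack; the linear model and function names are hypothetical):

```python
import numpy as np

def data_parallel_step(weights, batch_x, batch_y, n_workers, lr=0.1):
    """One data-parallel SGD step for a toy linear model y = x @ w."""
    # Split the global batch into one shard per worker.
    shards_x = np.array_split(batch_x, n_workers)
    shards_y = np.array_split(batch_y, n_workers)
    # Each worker computes the mean-squared-error gradient on its shard.
    grads = []
    for x, y in zip(shards_x, shards_y):
        pred = x @ weights
        grads.append(2 * x.T @ (pred - y) / len(x))
    # Averaging the per-worker gradients (an all-reduce on a real cluster)
    # recovers the full-batch gradient when shards are equally sized.
    avg_grad = np.mean(grads, axis=0)
    return weights - lr * avg_grad

rng = np.random.default_rng(0)
x = rng.normal(size=(64, 3))
y = x @ np.array([1.0, -2.0, 0.5])
w = data_parallel_step(np.zeros(3), x, y, n_workers=16)
```

Because every worker runs the same program on a different data shard, scaling from 1 to 16 workers changes only the shard sizes, which is why data parallelism is the simplest distribution strategy to scale.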

  • Argonne National Laboratory: “In collaboration with Cerebras researchers, our team at Argonne has completed groundbreaking work on genetic transformers, work that was a finalist for the ACM Gordon Bell Special Prize for High Performance Computing-Based COVID-19 Research. Using GPT-3XL, we placed the entire COVID-19 genome in the sequence window, and Andromeda ran our unique long-sequence (MSL 10,000) genetic workload on 1, 2, 4, 8, and 16 nodes with nearly perfect linear scaling. Linear scaling is one of the most requested features of a large cluster, and Cerebras Andromeda delivered 15.87x throughput on 16 CS-2 systems compared to a single CS-2, along with reduced training time. Andromeda sets a new benchmark for AI accelerator performance,” said Rick Stevens, Associate Laboratory Director, Argonne National Laboratory.
  • JasperAI: “Jasper uses large language models to write copy for marketing, advertising, books and more. We have over 85,000 customers who use our models to create dynamic content and ideas. Given our large and growing customer base, we are testing and scaling models to suit each customer and their use cases. Creating complex new AI systems and delivering them to customers at an increasing level of detail places heavy demands on our infrastructure. We are very excited to partner with Cerebras and leverage Andromeda’s performance and near-perfect scaling, without the traditional challenges of distributed computing and parallel programming, to develop and optimize our next set of models,” said Dave Rogenmoser, CEO of JasperAI.
  • AMD: “AMD is investing in technologies that will pave the way for pervasive artificial intelligence, opening up new opportunities for business efficiency and agility. The combination of the Cerebras Andromeda AI supercomputer and an AMD EPYC server-based pre-processing pipeline will give researchers more options and support faster, deeper AI capabilities,” said Kumaran Siva, Corporate Vice President, Software and Systems Business Development, AMD.
  • University of Cambridge: “It’s amazing that Cerebras gave graduate students free access to such a large cluster. Andromeda offers 13.5 million AI cores and near-perfect linear scaling for the largest language models, without the hassle of distributed computing and parallel programming. This is the dream of every machine learning PhD student,” said Mateo Espinosa, PhD student at the University of Cambridge in the United Kingdom.
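The Argonne figure above implies a concrete scaling efficiency, which is worth making explicit. A quick check using only the numbers quoted in the article:

```python
# Scaling efficiency implied by Argonne's measurement: 15.87x throughput
# on 16 CS-2 systems versus a single CS-2 (figures from the article).
speedup = 15.87
systems = 16

efficiency = speedup / systems
print(f"{efficiency:.1%}")  # 99.2%
```

An efficiency of about 99.2% of ideal linear scaling is what the article means by "nearly perfect linear scaling": doubling the systems very nearly doubles throughput, so 16 systems deliver almost 16x the work of one.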

Source: Cerebras Systems.
