
Is AMD narrowing the AI gap on Nvidia?

2023-07-05 08:25:11

AMD-built artificial intelligence chips are "almost" as fast as the industry-leading devices from Nvidia. That is according to a new study by Databricks-owned AI software company MosaicML, which found AMD's technology achieved 80% of Nvidia's performance when training large language models and performing other AI-intensive tasks.

MosaicML put the AMD MI250 against the Nvidia A100 and had both train different sized large language models (Photo: Jimmy Tudeschi / Shutterstock)

Nvidia currently dominates the market when it comes to training AI models such as those used to run ChatGPT or Midjourney. The success of these products and demand for compute power has pushed Nvidia to a $1trn valuation and sparked a shortage of GPUs. 

MosaicML recently put AMD's MI250 GPUs to the test against Nvidia's A100s. Both devices, each one generation behind its developer's top-of-the-range chip, were used to train large language models. Researchers found that both the AMD and Nvidia chips worked "out of the box" in training the models, with the AMD chip delivering about 80% of the Nvidia chip's performance.

The team trained models ranging from one billion to 13 billion parameters, similar to those used in enterprises to provide AI-driven search and summarisation of large company datasets. The models were trained on a single node of four GPUs, and the researchers found the throughput of the MI250 was within 80% of the A100's. The MI250 had a slight edge in floating-point operations per second and memory, which MosaicML says allows for larger models per GPU.
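MosaicML's exact accounting is not given in the article, but the link between GPU memory and "larger models per GPU" can be sketched with rough, assumed numbers: 16 bytes per parameter is a common estimate for mixed-precision Adam training, and the 80GB/128GB figures are the published HBM capacities of the A100 and MI250 (neither number comes from the article, and the overhead fraction is a guess):

```python
# Back-of-the-envelope sketch (assumed numbers, not from the article):
# estimate how many trainable parameters fit in a GPU's memory.

def training_bytes_per_param(weight_bytes=2, grad_bytes=2, optimizer_bytes=12):
    """Mixed-precision Adam: fp16 weights (2) + fp16 gradients (2) +
    fp32 master weights and two fp32 optimizer moments (4 + 4 + 4 = 12)."""
    return weight_bytes + grad_bytes + optimizer_bytes

def max_params(gpu_mem_gib, overhead_frac=0.3):
    """Largest parameter count that fits, reserving a (guessed) fraction
    of memory for activations, buffers and framework workspace."""
    usable_bytes = gpu_mem_gib * (1 - overhead_frac) * 2**30
    return usable_bytes / training_bytes_per_param()

# Published HBM capacities: A100 80GB vs MI250 128GB.
for name, mem_gib in [("A100-80GB", 80), ("MI250-128GB", 128)]:
    print(f"{name}: ~{max_params(mem_gib) / 1e9:.1f}B params per GPU")
```

Under these assumptions the MI250's extra 48GB of HBM buys room for roughly 60% more parameters per GPU before any model- or tensor-parallel splitting is needed.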

The company plans to profile larger models on larger clusters of GPUs to confirm whether the AMD systems can perform at scale, and is doing so in partnership with hyperscalers. There are also plans to create inference benchmarks and to run other models, such as diffusion models, on both systems to test a wider range of options.

While the chips weren’t the top-tier products from each company, both are widely used in datacentres and in training AI models. MosaicML says new ML training hardware is necessary to “increase compute availability amid the Nvidia supply crunch”.

AMD driven by software

MosaicML says the AMD performance was related to a new version of the vendor's software, released last year, which interacts with the open-source AI framework PyTorch. Hanlin Tang, MosaicML's CTO, says further software updates from AMD for the MI250 will allow it to match the performance of the Nvidia A100 by the end of the year.

He said that AMD had done particularly well in software, allowing it to keep pace with and catch up to Nvidia despite differences in hardware performance. Tang says it's possible to switch to AMD without changing code bases or rewriting the large language model, adding that he believes "they're essentially interchangeable".


Tang said AMD did not pay MosaicML to conduct the research. His company produces software designed to make it easier for enterprises to create AI models and train them in-house rather than rely on tools from OpenAI or other large AI labs. He said the research was intended to show there are choices beyond Nvidia.


“Overall, we are incredibly optimistic about the future market for AI training hardware,” he said. “More good options means more compute supply, more market pressure on prices, and ultimately lower costs for users who want to train their own models.”

Databricks revealed last week that it had paid $1.3bn for MosaicML as part of a wider effort to build an ecosystem of enterprise-ready open-source AI models. Both companies produce tools that make AI algorithms smaller and cheaper to run on large datasets, and the MosaicML software will be used to enhance Databricks' offering.

The report comes as Intel announced long-term plans last week to compete on AI chips from 2025, shifting its strategy to focus on building products that go up against hardware from Nvidia and AMD.

Last week Intel announced that its Falcon Shores chip will have 288GB of memory and support 8-bit floating-point computation, which is important for training AI models. Intel also claims its Ponte Vecchio AI chip outperforms the Nvidia H100. Ponte Vecchio has faced delays, but it will be at the core of the latest supercomputer from Argonne National Laboratory, with shipments due to be completed this year.
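The article does not detail Intel's FP8 implementation, but the appeal of 8-bit floats for training can be illustrated with the two common FP8 layouts, E4M3 and E5M2 from the OCP 8-bit floating-point specification (an assumption here, not a format the article names). The largest finite value of either layout follows directly from its bit widths:

```python
# Sketch: largest finite value of a small binary float format, showing the
# range/precision trade-off between the two common FP8 layouts.

def fp8_max_normal(exp_bits, man_bits, bias, ieee_like=True):
    """Largest finite value of a sign + exp_bits + man_bits float.
    E5M2 follows IEEE conventions (top exponent code reserved for
    inf/NaN); E4M3 in the OCP spec reclaims that code for finite
    values, keeping only mantissa 0b111 as NaN."""
    if ieee_like:
        max_exp = (2**exp_bits - 2) - bias       # top code is inf/NaN
        max_mantissa = 2 - 2**(-man_bits)        # all mantissa bits set
    else:
        max_exp = (2**exp_bits - 1) - bias       # top code is finite
        max_mantissa = 2 - 2 * 2**(-man_bits)    # mantissa 0b110 (0b111 = NaN)
    return max_mantissa * 2**max_exp

e5m2 = fp8_max_normal(5, 2, bias=15)                   # wide range, 2 mantissa bits
e4m3 = fp8_max_normal(4, 3, bias=7, ieee_like=False)   # narrower range, 3 mantissa bits
fp16 = fp8_max_normal(5, 10, bias=15)                  # IEEE half, for comparison
```

E5M2 keeps fp16's exponent range (max 57344 versus fp16's 65504) at the cost of precision, while E4M3 tops out at 448 but carries an extra mantissa bit; in common practice gradients use E5M2 and weights/activations use E4M3.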

Read more: France wants to become Europe’s capital for AI

Topics in this article : AI , AMD , NVIDIA
