Ant Group uses domestic chips to train AI models and cut costs

3 April 2025


Ant Group is relying on Chinese-made semiconductors to train artificial intelligence models, aiming to reduce costs and lessen its dependence on restricted US technology, according to people familiar with the matter.

The Alibaba-owned company has used chips from domestic suppliers, including those tied to its parent, Alibaba, and Huawei Technologies, to train large language models using the Mixture of Experts (MoE) method. The results were reportedly comparable to those produced with Nvidia’s H800 chips, sources claim. While Ant continues to use Nvidia chips for some of its AI development, one source said the company is turning increasingly to alternatives from AMD and Chinese chip-makers for its latest models.

The development signals Ant’s deeper involvement in the growing AI race between Chinese and US tech firms, particularly as companies look for cost-effective ways to train models. The experimentation with domestic hardware reflects a broader effort among Chinese firms to work around export restrictions that block access to high-end chips like Nvidia’s H800, which, although not the most advanced, is still one of the more powerful GPUs available to Chinese organisations.

Ant has published a research paper describing its work, stating that its models, in some tests, performed better than those developed by Meta. Bloomberg News, which initially reported the matter, has not verified the company’s results independently. If the models perform as claimed, Ant’s efforts may represent a step forward in China’s attempt to lower the cost of running AI applications and reduce its reliance on foreign hardware.

MoE models divide tasks into smaller data sets handled by separate components, and have gained attention among AI researchers and data scientists. The technique has been used by Google and the Hangzhou-based startup, DeepSeek. The MoE concept is similar to having a team of specialists, each handling part of a task to make the process of producing models more efficient. Ant has declined to comment on its work with respect to its hardware sources.
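To make the team-of-specialists idea concrete, here is a minimal toy sketch of top-k MoE routing in plain Python. It is not Ant's implementation and is not taken from the paper: the names `route_top_k` and `moe_forward`, the four hand-written "experts", and the gate vectors are all hypothetical, chosen only to show how a gate can send each input to a few experts and blend their outputs.

```python
import math


def softmax(scores):
    """Turn raw gate scores into probabilities that sum to 1."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def route_top_k(gate_scores, k):
    """Pick the k highest-scoring experts and renormalise their weights."""
    probs = softmax(gate_scores)
    top = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:k]
    norm = sum(probs[i] for i in top)
    return [(i, probs[i] / norm) for i in top]


def moe_forward(x, experts, gate_weights, k=2):
    """Run only the k selected experts on x and blend their outputs."""
    # One gate score per expert: dot product of the input with that
    # expert's (hypothetical) gate vector.
    gate_scores = [sum(w * xi for w, xi in zip(gw, x)) for gw in gate_weights]
    chosen = route_top_k(gate_scores, k)
    out = [0.0] * len(x)
    for idx, weight in chosen:
        y = experts[idx](x)
        out = [o + weight * yi for o, yi in zip(out, y)]
    return out, [idx for idx, _ in chosen]


# Assumed toy setup: four "experts" that simply scale the input.
experts = [lambda x, s=s: [s * xi for xi in x] for s in (1.0, 2.0, 3.0, 4.0)]
gate_weights = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0], [0.0, -1.0]]

output, active = moe_forward([2.0, 1.0], experts, gate_weights, k=2)
print("active experts:", active)
print("output:", output)
```

Only two of the four experts run for this input, which is the source of the efficiency gain: the model's total parameter count can grow while the compute spent per input stays roughly fixed.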

Training MoE models depends on high-performance GPUs, which can be too expensive for smaller companies to acquire or use. Ant’s research focused on reducing that cost barrier. The paper’s title is suffixed with a clear objective: Scaling Models “without premium GPUs.” [our quotation marks]

The direction taken by Ant, and its use of MoE to reduce training costs, contrasts with Nvidia’s approach. Nvidia CEO Jensen Huang has said that demand for computing power will continue to grow, even with the introduction of more efficient models like DeepSeek’s R1. His view is that companies will seek more powerful chips to drive revenue growth, rather than aiming to cut costs with cheaper alternatives. Nvidia’s strategy remains focused on building GPUs with more cores, transistors, and memory.

According to the Ant Group paper, training on one trillion tokens – the basic units of data AI models use to learn – cost about 6.35 million yuan (roughly $880,000) using conventional high-performance hardware. The company’s optimised training method reduced that cost to around 5.1 million yuan by using lower-specification chips.
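The size of that saving can be worked out from the two reported figures. The short sketch below is illustrative arithmetic only: the dollar conversion reuses the exchange rate implied by the article's own "6.35 million yuan, roughly $880,000" figure, which is an assumption, not a number from the paper.

```python
# Figures reported for training on one trillion tokens, in yuan.
BASELINE_YUAN = 6_350_000   # conventional high-performance hardware
OPTIMISED_YUAN = 5_100_000  # lower-specification chips

# Assumption: exchange rate implied by "6.35 million yuan (roughly $880,000)".
IMPLIED_USD_PER_YUAN = 880_000 / BASELINE_YUAN

saving_yuan = BASELINE_YUAN - OPTIMISED_YUAN
saving_percent = 100 * saving_yuan / BASELINE_YUAN
optimised_usd = OPTIMISED_YUAN * IMPLIED_USD_PER_YUAN

print(f"Saving: {saving_yuan:,} yuan ({saving_percent:.1f}%)")
print(f"Optimised cost at the implied rate: about ${optimised_usd:,.0f}")
```

On these figures the reduction is 1.25 million yuan, or a little under 20% per trillion tokens.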

Ant said it plans to apply the models produced in this way – Ling-Plus and Ling-Lite – to industrial AI use cases like healthcare and finance. Earlier this year, the company acquired Haodf.com, a Chinese online medical platform, to further Ant’s ambition to deploy AI-based solutions in healthcare. It also operates other AI services, including a virtual assistant app called Zhixiaobao and a financial advisory platform known as Maxiaocai.

“If you find one point of attack to beat the world’s best kung fu master, you can still say you beat them, which is why real-world application is important,” said Robin Yu, chief technology officer of Beijing-based AI firm, Shengshang Tech.

Ant has made its models open source. Ling-Lite has 16.8 billion parameters – settings that help determine how a model functions – while Ling-Plus has 290 billion. For comparison, estimates suggest closed-source GPT-4.5 has around 1.8 trillion parameters, according to MIT Technology Review.

Despite progress, Ant’s paper noted that training models remains challenging. Small adjustments to hardware or model structure during training sometimes resulted in unstable performance, including spikes in error rates.

(Image by Unsplash)