BEIJING, December 6 (TMTPOST) – Infinigence AI, a computing power company for large AI models affiliated with Tsinghua University, has recently added heavyweight investors including Baidu, Tencent, Zhipu AI, and Sequoia China, according to its business registration records.

The company has also added several key personnel, including Cao Xi, founding partner of Monolith, as a director on its board.

According to public information, Infinigence AI has completed three rounds of financing, all for undisclosed amounts, with Sequoia China as the lead investor.

The company was founded in May 2023 with the aim of creating the best integrated software and hardware solution for large models.

The founding team is led by Wang Yu, the director of the Department of Electronic Engineering at Tsinghua University. Xia Lixue, the CEO of the company, and Zeng Shulin, the legal representative of the company, are both students of Wang.

Wang, the youngest department head at Tsinghua University, is known as a "pioneer" in AI chips and holds titles such as IEEE Fellow.

In January 2016, the deep learning processor project led by Wang passed an evaluation by the School of Electronic Information at Tsinghua University and received its support. The project team subsequently converted its intellectual property into equity and established the AI chip company DeePhi Tech to commercialize the work.

In 2018, DeePhi Tech was acquired by Xilinx, the world's largest FPGA manufacturer, which was itself later acquired by U.S. chip giant AMD. DeePhi Tech became the most successful AI chip company in China.

In early 2023, as large AI models represented by ChatGPT gained popularity globally, the era of AI 2.0 began to arrive. However, challenges such as high training costs and the difficulty of deploying computing facilities still constrained the development of large models.

In July 2023, at the World Artificial Intelligence Conference, Wang said that Infinigence AI is building an "MxN" platform for the joint optimization of software and hardware for large models. With this platform infrastructure, the company aims to cut inference costs, fine-tuning costs, and labor costs by more than 10 times each, while also increasing supported text length by more than 10 times.

In November 2023, Infinigence AI, in collaboration with teams from Tsinghua University and Shanghai Jiao Tong University, published a paper on arXiv introducing a new method called FlashDecoding++. The method achieves true parallelism in attention computation through asynchronous techniques, yielding a 2-4x speedup in GPU (graphics processing unit) inference. On Nvidia A100 GPUs, inference is accelerated by 37% on average, and the method supports GPUs from both Nvidia and AMD.
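
To give a rough sense of what asynchronous attention computation can mean, the following Python sketch splits a softmax (the normalization step inside attention) into chunks that each use a fixed, shared reference value instead of first agreeing on a global maximum. This is a simplified illustration under stated assumptions, not the authors' implementation: the function names, the chunk size, and the reference value of 0.0 are all hypothetical, and a real system would need to handle scores far from the reference value to avoid numerical overflow.

```python
import math


def softmax_reference(scores):
    """Conventional softmax: needs the global maximum before any exp()."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def partial_exp_sum(chunk, shared_ref):
    """Work for one chunk. It depends only on the chunk and a constant,
    so chunks could run in parallel without exchanging a running maximum."""
    exps = [math.exp(s - shared_ref) for s in chunk]
    return exps, sum(exps)


def softmax_chunked(scores, chunk_size, shared_ref=0.0):
    """Softmax assembled from independently computed chunk results.
    shared_ref is an assumed constant chosen ahead of time (hypothetical)."""
    chunks = [scores[i:i + chunk_size] for i in range(0, len(scores), chunk_size)]
    partials = [partial_exp_sum(c, shared_ref) for c in chunks]
    total = sum(chunk_sum for _, chunk_sum in partials)
    return [e / total for exps, _ in partials for e in exps]


if __name__ == "__main__":
    scores = [0.5, -1.2, 2.0, 0.3, 1.1, -0.7]
    print(softmax_reference(scores))
    print(softmax_chunked(scores, chunk_size=2))
```

Because the shared reference cancels out in the final division, both functions return the same probabilities up to floating-point rounding; the difference is that the chunked version never has to wait for a global maximum.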

FlashDecoding++ has been integrated into "Infini-ACC," Infinigence AI's computing engine for large models. Building on "Infini-ACC," the company is developing a series of integrated software and hardware solutions for large models, including its own large model, "Infini-Megrez."

"Infinigence AI aims to promote the development of large model technologies across various industries," said Wang in July of this year.

(This article was first published on the TMTPost App. Author: Lin Zhijia)