Tencent has expanded its family of open-source Hunyuan AI models, which are versatile enough for broad use. This new family of models is engineered to deliver strong performance across computational environments, from small edge devices to demanding, high-concurrency production systems.
The release includes a comprehensive set of pre-trained and instruction-tuned models available on the developer platform Hugging Face. The models come in several sizes, with parameter scales of 0.5B, 1.8B, 4B, and 7B, providing substantial flexibility for developers and businesses.
Tencent has indicated that these models were developed using training strategies similar to its more powerful Hunyuan-A13B model, allowing them to inherit its performance characteristics. This approach lets users select the optimal model for their needs, whether a smaller variant for resource-constrained edge computing or a larger model for high-throughput production workloads, all while retaining strong capabilities.
One of the most notable features of the Hunyuan series is its native support for an ultra-long 256K context window. This allows the models to maintain stable performance on long-text tasks, a vital capability for complex document analysis, extended conversations, and in-depth content generation. The models support what Tencent calls “hybrid reasoning,” which allows for both fast and slow thinking modes that users can choose between depending on their specific requirements.
The company has also placed a strong emphasis on agentic capabilities. The models have been optimised for agent-based tasks and have demonstrated leading results on established benchmarks such as BFCL-v3, τ-Bench, and C3-Bench, suggesting a high degree of proficiency in complex, multi-step problem-solving. For instance, on C3-Bench, the Hunyuan-7B-Instruct model achieves a score of 68.5, while the Hunyuan-4B-Instruct model scores 64.3.
Central to the series’ performance is a focus on efficient inference. Tencent’s Hunyuan models use Grouped Query Attention (GQA), a technique known for improving processing speed and reducing computational overhead. This efficiency is further enhanced by advanced quantisation support, a key element of the Hunyuan architecture designed to lower deployment barriers.
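To illustrate why GQA reduces overhead, the toy sketch below shows several query heads sharing a smaller set of key/value heads, shrinking the KV cache relative to standard multi-head attention. The head counts and dimensions are illustrative only, not Hunyuan's actual configuration:

```python
import numpy as np

def grouped_query_attention(x, num_q_heads=8, num_kv_heads=2, head_dim=16):
    """Toy grouped-query attention: each group of query heads shares one
    KV head. Shapes and head counts are illustrative stand-ins."""
    seq_len, d_model = x.shape
    rng = np.random.default_rng(0)
    # Random stand-ins for learned projection weights.
    wq = rng.standard_normal((d_model, num_q_heads * head_dim)) * 0.02
    wk = rng.standard_normal((d_model, num_kv_heads * head_dim)) * 0.02
    wv = rng.standard_normal((d_model, num_kv_heads * head_dim)) * 0.02

    q = (x @ wq).reshape(seq_len, num_q_heads, head_dim)
    k = (x @ wk).reshape(seq_len, num_kv_heads, head_dim)  # fewer KV heads
    v = (x @ wv).reshape(seq_len, num_kv_heads, head_dim)  # -> smaller KV cache

    group = num_q_heads // num_kv_heads  # query heads per shared KV head
    outputs = []
    for h in range(num_q_heads):
        kv = h // group  # map each query head to its shared KV head
        scores = q[:, h] @ k[:, kv].T / np.sqrt(head_dim)
        weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
        weights /= weights.sum(axis=-1, keepdims=True)
        outputs.append(weights @ v[:, kv])
    return np.concatenate(outputs, axis=-1)  # (seq_len, num_q_heads * head_dim)

out = grouped_query_attention(np.random.default_rng(1).standard_normal((4, 32)))
```

With 8 query heads sharing 2 KV heads, the KV cache is a quarter of the multi-head-attention size, which is where the speed and memory savings come from.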
Tencent has developed its own compression toolset, AngleSlim, to provide a more user-friendly and effective model compression solution. Using this tool, the company offers two main types of quantisation for the Hunyuan series.
The first is FP8 static quantisation, which employs an 8-bit floating-point format. This method uses a small amount of calibration data to pre-determine the quantisation scale without requiring full retraining, converting model weights and activation values into the FP8 format to boost inference efficiency.
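The core idea of static quantisation can be sketched in a few lines: a single scale is fixed ahead of time from calibration data, then reused at inference. This toy version mimics the E4M3 format's maximum representable value (448) but is not AngleSlim's actual implementation, and it omits rounding for brevity:

```python
FP8_E4M3_MAX = 448.0  # max representable magnitude in FP8 E4M3

def calibrate_scale(calibration_activations):
    """Fix one static scale so the observed calibration max maps to FP8 range."""
    observed_max = max(abs(v) for batch in calibration_activations for v in batch)
    return observed_max / FP8_E4M3_MAX

def fake_quantize(values, scale):
    """Simulate an FP8 round-trip: scale down, clamp to range, scale back up."""
    out = []
    for v in values:
        scaled = max(-FP8_E4M3_MAX, min(FP8_E4M3_MAX, v / scale))
        out.append(scaled * scale)  # dequantise (rounding omitted for brevity)
    return out

calib = [[0.5, -2.0, 3.1], [1.7, -0.9, 2.2]]   # tiny calibration set
scale = calibrate_scale(calib)
restored = fake_quantize([3.0, -1.5, 10.0], scale)  # 10.0 clamps to the calibrated max
```

Because the scale is precomputed, no per-request statistics are needed at inference time, which is what makes the "static" variant cheap to serve.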
The second method is INT4 quantisation, which achieves W4A16 quantisation through the GPTQ and AWQ algorithms:
- The GPTQ approach processes model weights layer by layer, using calibration data to minimise errors in the quantised weights. This process avoids model retraining and improves inference speed.
- The AWQ algorithm works by statistically analysing the amplitude of activation values from a small set of calibration data. It then calculates a scaling coefficient for each weight channel, which expands the numerical range of important weights to retain more information during compression.
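The activation-aware scaling idea behind AWQ can be sketched as follows: channels with large activations get their weights scaled up before INT4 rounding (and inputs scaled down to compensate), so quantisation error hits the important channels less. This is an illustrative toy, not AngleSlim's code, and `alpha` is a hypothetical smoothing exponent:

```python
import numpy as np

def awq_style_scales(calib_acts, alpha=0.5):
    """Per-channel scaling coefficients from calibration activation magnitudes."""
    magnitude = np.abs(calib_acts).mean(axis=0)  # mean |activation| per channel
    scales = magnitude ** alpha                  # larger activations -> larger scale
    return scales / scales.mean()                # normalise around 1.0

def quantize_int4(w):
    """Symmetric per-tensor INT4 fake-quantisation (W4A16: weights only)."""
    s = np.abs(w).max() / 7.0                    # symmetric int4 range [-7, 7]
    return np.round(w / s).clip(-7, 7) * s

rng = np.random.default_rng(0)
acts = rng.standard_normal((64, 8))              # calibration activations
w = rng.standard_normal((8, 4))                  # weight matrix (in x out)

scales = awq_style_scales(acts)
# Fold scales into the weights; at inference the input is divided by
# `scales`, so (x / s) @ (s * w) leaves the layer's output unchanged.
w_q = quantize_int4(w * scales[:, None]) / scales[:, None]
```

The key property is that the rescaling is mathematically a no-op in full precision, but it shifts rounding error away from channels the calibration data marks as important.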
Developers can either run the AngleSlim tool themselves or download the pre-quantised models directly.
Performance benchmarks confirm the strong capabilities of the Tencent Hunyuan models across a range of tasks. The pre-trained Hunyuan-7B model, for example, achieves a score of 79.82 on the MMLU benchmark, 88.25 on GSM8K, and 74.85 on the MATH benchmark, demonstrating solid reasoning and mathematical skills.
The instruction-tuned variants show impressive results in specialised areas. In mathematics, the Hunyuan-7B-Instruct model scores 81.1 on the AIME 2024 benchmark, while the 4B version scores 78.3. In science, the 7B model reaches 76.5 on OlympiadBench, and in coding it scores 42 on Livecodebench.
The quantisation benchmarks show minimal performance degradation. On the DROP benchmark, the Hunyuan-7B-Instruct model scores 85.9 in its base B16 format, 86.0 with FP8, and 85.7 with Int4 GPTQ, indicating that the efficiency gains do not come at a cost to accuracy.
For deployment, Tencent recommends using established frameworks like TensorRT-LLM, vLLM, or SGLang to serve the Hunyuan models and create OpenAI-compatible API endpoints, ensuring they can be integrated smoothly into existing development workflows. This combination of performance, efficiency, and deployment flexibility positions the Hunyuan series as a continuing strong contender in open-source AI.
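As a sketch of the recommended serving path, vLLM can expose a model behind an OpenAI-compatible endpoint with a single command. The model ID below assumes Tencent's Hugging Face naming; check the actual repository name before use:

```shell
# Serve the model with an OpenAI-compatible API on port 8000
# (model ID assumed from Hugging Face naming conventions).
vllm serve tencent/Hunyuan-7B-Instruct --port 8000

# Query it via the standard OpenAI chat-completions route.
curl http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model": "tencent/Hunyuan-7B-Instruct",
       "messages": [{"role": "user", "content": "Hello"}]}'
```

Because the endpoint speaks the OpenAI API, existing client libraries can point at it by changing only the base URL.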
See also: Deep Cogito v2: Open-source AI that hones its reasoning skills
