🚀JAIST Accelerates Japanese LLM Development with FPT AI Factory
Summary
The Japan Advanced Institute of Science and Technology (JAIST), a leading national research university, required a robust and scalable infrastructure to build a state-of-the-art Large Language Model (LLM) specifically for the Japanese language. Their goal was to conduct extensive experimentation, from optimal data combination discovery to large-scale continual pre-training, demanding significant computational power and a streamlined MLOps platform.
Partnering with FPT AI Factory, JAIST leveraged a comprehensive suite of services, including FPT AI Studio and FPT AI Inference, to accelerate their research and development pipeline. This collaboration enabled JAIST to systematically identify the best data mixtures, execute multiple large-scale continual pre-training phases on massive datasets, and efficiently evaluate model performance. By offloading the complexities of GPU infrastructure management to FPT AI Factory, JAIST's research team could focus on their core mission: advancing the frontiers of natural language processing for Japanese.
About JAIST
The Japan Advanced Institute of Science and Technology (JAIST) is a national graduate university established in October 1990 in Nomi, Ishikawa, Japan. Situated at the heart of Ishikawa Science Park, JAIST is dedicated exclusively to postgraduate education and research in advanced science and technology. JAIST concentrates on three core schools: Information Science, Materials Science, and Knowledge Science, and is home to specialized research centers such as the International Research Center for Artificial Intelligence and Entertainment Science and the Research Center for AI & Soft Robots. Its mission is to cultivate the next generation of global leaders by providing world-class education and research opportunities, enabling students to drive scientific breakthroughs and contribute to society’s future through technology and innovation. JAIST emphasizes an interdisciplinary, high-level research environment and promotes close collaboration with industry and international institutions.
Challenges
Building a large-scale Japanese LLM required extensive computational resources and a flexible infrastructure to support multi-node, multi-phase training. JAIST faced long training cycles, unpredictable infrastructure demands, and a small research team without access to large in-house GPU clusters. They needed a scalable, managed AI infrastructure that could handle complex workloads while allowing researchers to focus on model development instead of system operations.
FPT AI Factory Solution
JAIST's ambitious project to build a premier Japanese LLM required a partner that could provide not just raw computing power, but also a sophisticated platform to manage the entire model development lifecycle. FPT AI Factory, with its integrated FPT AI Studio and FPT AI Inference services, provided the end-to-end solution JAIST needed.
Data Discovery
The collaboration began with a systematic search for the most effective training data combination. Using FPT AI Studio, JAIST’s researchers trained the Qwen3-0.6B model on 768 unique training data combinations, one training run per combination. This search was further accelerated by using FPT AI Inference’s embedding models to analyze and classify text domains within the mixed training data.
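The exact sweep tooling inside FPT AI Studio is not described publicly, but the idea of enumerating candidate data mixtures can be sketched in plain Python. The domain names and ratio grid below are hypothetical; the real domain taxonomy came from embedding-based classification:

```python
from itertools import product

# Hypothetical text domains in the mixed Japanese corpus; the actual
# taxonomy (derived via FPT AI Inference embedding models) is not public.
DOMAINS = ["web", "news", "wiki", "academic"]
RATIOS = [0.0, 0.1, 0.2, 0.3, 0.4]  # candidate sampling weights per domain

def mixture_grid(domains, ratios, total=1.0, eps=1e-9):
    """Enumerate every weight assignment whose ratios sum to `total`."""
    grid = []
    for combo in product(ratios, repeat=len(domains)):
        if abs(sum(combo) - total) < eps:
            grid.append(dict(zip(domains, combo)))
    return grid

mixes = mixture_grid(DOMAINS, RATIOS)
print(f"{len(mixes)} candidate data mixtures, one small-model run each")
```

Each surviving mixture then maps to one Qwen3-0.6B training run; scoring those runs on a held-out benchmark is what identifies the best combination before committing to large-scale training.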
Training phases
Once the ideal data combination was identified, JAIST embarked on a massive continual pre-training effort using Qwen2.5-32B as the base model. This process was broken down into three distinct, computationally intensive phases, all managed within FPT AI Studio:
Phase 1: The base model was trained on a 100B-token dataset, utilizing a powerful cluster of 30 nodes, each equipped with 8 NVIDIA H100 GPUs.
Phase 2: The training was scaled up significantly, with the model learning from a 267B-token dataset. When one node turned out to be faulty, FPT AI Factory's engineers promptly detected and isolated it, and training continued on the remaining 29 nodes.
Phase 3: The final phase involved a 273B-token dataset. This dataset included the 267B tokens from the previous phase, augmented with new instruction data generated by the Qwen3-235B-A22B model, a task facilitated by FPT AI Inference services. This phase reused a 30-node H100 GPU cluster for training.
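To give a sense of scale, the three phases can be summarized with some back-of-the-envelope arithmetic. Node counts and token budgets come from the case study above; per-GPU token shares are a simple derived figure, not a measured throughput:

```python
# Phase parameters as reported in the case study.
PHASES = [
    {"name": "Phase 1", "tokens": 100e9, "nodes": 30},
    {"name": "Phase 2", "tokens": 267e9, "nodes": 29},  # faulty node isolated
    {"name": "Phase 3", "tokens": 273e9, "nodes": 30},
]
GPUS_PER_NODE = 8  # NVIDIA H100 per node

summary = {}
for phase in PHASES:
    gpus = phase["nodes"] * GPUS_PER_NODE
    summary[phase["name"]] = {
        "gpus": gpus,
        "tokens_per_gpu": phase["tokens"] / gpus,  # naive even split
    }
    print(f'{phase["name"]}: {gpus} GPUs, '
          f'{summary[phase["name"]]["tokens_per_gpu"] / 1e9:.2f}B tokens/GPU')
```

Even under this naive even split, each H100 processes on the order of a billion tokens per phase, which is why managed multi-node orchestration and rapid fault isolation mattered for keeping the schedule.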
Throughout this complex process, FPT AI Factory's engineers provided close, dedicated support, ensuring the seamless execution of these large-scale training jobs.
Evaluation
For evaluation, JAIST utilized the full capabilities of FPT AI Studio. The continually pretrained models underwent LoRA fine-tuning and were rigorously benchmarked against the Nejumi Leaderboard 3 using the Test Jobs feature. Furthermore, the Interactive Session feature allowed JAIST researchers to serve the fine-tuned models and conduct their own internal, custom benchmarks.
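The LoRA fine-tuning step does not retrain the 32B base weights; it learns a small low-rank update on top of them. A minimal NumPy sketch of that idea (dimensions and initialization are illustrative, and this is the underlying math, not the FPT AI Studio API):

```python
import numpy as np

# LoRA in one picture: keep the pretrained weight W frozen and learn a
# low-rank delta B @ A, scaled by alpha / r.
rng = np.random.default_rng(0)
d, r = 64, 8                            # hidden size and LoRA rank (illustrative)
alpha = 16                              # LoRA scaling factor

W = rng.standard_normal((d, d))         # frozen pretrained weight
A = rng.standard_normal((r, d)) * 0.01  # trainable down-projection
B = np.zeros((d, r))                    # trainable up-projection, zero-initialized

def lora_forward(x):
    """Forward pass with the adapter: x @ (W + (alpha / r) * B @ A).T"""
    return x @ W.T + (alpha / r) * (x @ A.T) @ B.T

x = rng.standard_normal((1, d))
# With B zero-initialized, the adapter starts as an exact no-op:
assert np.allclose(lora_forward(x), x @ W.T)
print(f"trainable params: {A.size + B.size} vs frozen: {W.size}")
```

The trainable parameter count scales with the rank r rather than the full weight dimensions, which is what makes fine-tuning a 32B-parameter model tractable after the expensive pre-training phases are done.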
Business Impact
Infrastructure Efficiency: By leveraging the FPT AI Factory platform, JAIST completely eliminated the overhead of managing a complex, large-scale GPU infrastructure. This allowed their team of researchers to dedicate their time and expertise to model development and core research challenges rather than MLOps and job orchestration.
Accelerated Research & Development: The powerful, scalable infrastructure and streamlined workflow provided by FPT AI Studio enabled JAIST to rapidly iterate through experiments. The ability to systematically test hundreds of data combinations and execute multi-stage pre-training on hundreds of billions of tokens significantly accelerated their path to developing a high-performing Japanese LLM.
Enhanced Model Performance and Evaluation: The integrated solution allowed for a seamless transition from large-scale training to fine-tuning and robust evaluation. Access to features like Test Jobs for standardized benchmarking and Interactive Session for custom assessments provided JAIST with the comprehensive tools needed to validate and refine their models effectively.
Collaborative Partnership: The close support from FPT AI Factory's AI engineers functioned as an extension of the JAIST team. This collaborative approach ensured that technical challenges were swiftly addressed, and the project maintained its ambitious timeline, fostering an environment of shared learning and innovation.