Mistral AI and NVIDIA have launched Mistral NeMo 12B, a state-of-the-art language model built for enterprise AI applications. With 12 billion parameters, the model is designed to deliver high accuracy, flexibility, and efficiency across a wide range of enterprise workloads.
Mistral NeMo 12B is designed to excel at a wide range of tasks, including chatbots, multilingual processing, coding, and summarization. One of its standout features is a context window of up to 128,000 tokens, which lets the model process and understand long, complex inputs more coherently than its predecessors.
Guillaume Lample, co-founder and chief scientist of Mistral AI, highlighted the significance of this collaboration:
"We are fortunate to collaborate with the NVIDIA team, leveraging their top-tier hardware and software. Together, we have developed a model with unprecedented accuracy, flexibility, high efficiency, and enterprise-grade support and security thanks to NVIDIA AI Enterprise deployment."
The Mistral NeMo 12B was trained on the NVIDIA DGX Cloud AI platform, which provides scalable access to the latest NVIDIA architecture. This model utilizes NVIDIA TensorRT-LLM for accelerated inference performance and the NVIDIA NeMo development platform for building custom generative AI models. This combination ensures that the Mistral NeMo 12B delivers high performance across diverse applications.
One of the key technological advancements in the Mistral NeMo 12B is its use of the FP8 data format for model inference. This reduces memory size and speeds up deployment without compromising accuracy. Additionally, the model's architecture allows it to fit on the memory of a single NVIDIA L40S, NVIDIA GeForce RTX 4090, or NVIDIA RTX 4500 GPU, making it highly efficient and cost-effective.
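To see why FP8 matters at this scale, a back-of-the-envelope estimate of the weight footprint is instructive. The sketch below is illustrative only: it counts weights alone, while a real deployment also needs memory for activations, the KV cache, and runtime buffers.

```python
# Rough weight-memory estimate for a 12B-parameter model at different precisions.
# Illustrative only: activations, the KV cache, and runtime buffers add overhead.

PARAMS = 12e9  # Mistral NeMo parameter count

BYTES_PER_PARAM = {
    "FP32": 4,
    "FP16/BF16": 2,
    "FP8": 1,
}

for fmt, nbytes in BYTES_PER_PARAM.items():
    gib = PARAMS * nbytes / 2**30
    print(f"{fmt:>10}: ~{gib:.1f} GiB of weights")

# FP8 cuts the weights to roughly 11 GiB, leaving headroom for the KV cache
# on a single 24 GB card such as the RTX 4090 or RTX 4500 (the L40S has 48 GB).
```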
Mistral NeMo 12B is designed for global applications, with robust multilingual capabilities. It excels in languages such as English, French, German, Spanish, Italian, Portuguese, Chinese, Japanese, Korean, Arabic, and Hindi. This broad linguistic proficiency is achieved through the new Tekken tokenizer, which is based on Tiktoken and trained on over 100 languages. Tekken is approximately 30% more efficient at compressing source code and several major languages compared to previous tokenizers, making it a significant advancement in natural language processing.
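The compression claim is easy to probe empirically. The sketch below compares token counts from the Tekken tokenizer against the previous-generation Mistral tokenizer, using the public Hugging Face repos; exact counts will vary with the input, and both repos may require accepting the model license on the Hub.

```python
# Sketch: count how many tokens each tokenizer needs for the same text.
# Fewer tokens for the same input means better compression.
from transformers import AutoTokenizer

tekken = AutoTokenizer.from_pretrained("mistralai/Mistral-Nemo-Instruct-2407")
legacy = AutoTokenizer.from_pretrained("mistralai/Mistral-7B-Instruct-v0.3")

samples = {
    "English": "Large language models are transforming enterprise software.",
    "German": "Große Sprachmodelle verändern die Unternehmenssoftware grundlegend.",
    "Code": "def fib(n):\n    return n if n < 2 else fib(n - 1) + fib(n - 2)",
}

for name, text in samples.items():
    print(f"{name:>8}: Tekken={len(tekken.encode(text)):3d} tokens, "
          f"previous={len(legacy.encode(text)):3d} tokens")
```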
Packaged as an NVIDIA NIM inference microservice, Mistral NeMo 12B offers performance-optimized inference with NVIDIA TensorRT-LLM engines. This containerized format allows for easy deployment across various environments, providing enhanced flexibility for enterprise applications. The model also comes with comprehensive support, direct access to NVIDIA AI experts, and defined service-level agreements, ensuring reliable and consistent performance.
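NIM microservices expose an OpenAI-compatible HTTP API, so a running container can be queried with standard client libraries. The sketch below assumes a NIM container already serving on localhost port 8000; the model identifier is an assumption, so verify the exact value against the container's documentation.

```python
# Sketch: querying a locally running NIM container via its OpenAI-compatible API.
from openai import OpenAI

# NIM containers conventionally serve on port 8000; no real API key is needed locally.
client = OpenAI(base_url="http://localhost:8000/v1", api_key="not-needed")

response = client.chat.completions.create(
    model="mistralai/mistral-nemo-12b-instruct",  # assumed model id; verify via GET /v1/models
    messages=[{"role": "user", "content": "Summarize the benefits of FP8 inference in one sentence."}],
    max_tokens=128,
)
print(response.choices[0].message.content)
```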
Mistral NeMo 12B's release under the Apache 2.0 license encourages innovation and supports the broader AI community. This open-source approach is likely to accelerate the model's adoption among researchers and enterprises, facilitating the development of advanced AI solutions. The model's weights are hosted on Hugging Face, making them readily available for developers and researchers to experiment with and adapt to their specific needs.
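For anyone who wants to go straight to the open weights, a minimal loading sketch with the transformers library looks like the following; it assumes a GPU with sufficient memory and, if the repo is gated, a logged-in Hugging Face account.

```python
# Sketch: loading the openly released weights from Hugging Face.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "mistralai/Mistral-Nemo-Instruct-2407"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # BF16 weights need ~24 GB; quantize for smaller GPUs
    device_map="auto",
)

prompt = "Explain the advantage of a 128k-token context window in one sentence."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=60)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```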