Jul 23, 2024
On Tuesday, July 23, Meta announced the launch of the Llama 3.1 collection of multilingual large language models (LLMs). Llama 3.1 comprises both pre-trained and instruction-tuned text in/text out open source generative AI models in sizes of 8B, 70B and—for the first time—405B parameters.
Llama 3.1 405B is the first openly available model that rivals the top AI models when it comes to state-of-the-art capabilities in general knowledge, steerability, math, tool use, and multilingual translation. With the release of the 405B model, we’re poised to supercharge innovation—with unprecedented opportunities for growth and exploration. We believe the latest generation of Llama will ignite new applications and modeling paradigms, including synthetic data generation to enable the improvement and training of smaller models, as well as model distillation—a capability that has never been achieved at this scale in open source.
As part of this latest release, we’re introducing upgraded versions of the 8B and 70B models. These are multilingual and have a significantly longer context length of 128K, state-of-the-art tool use, and overall stronger reasoning capabilities. This enables our latest models to support advanced use cases, such as long-form text summarization, multilingual conversational agents, and coding assistants. We’ve also made changes to our license, allowing developers to use the outputs from Llama models—including the 405B—to improve other models. True to our commitment to open source, starting today, we’re making these models available to the community for download on llama.meta.com and Hugging Face and available for immediate development on our broad ecosystem of partner platforms.
An important step forward for accessible, open, responsible AI innovation
In December of 2023, Meta and IBM launched the AI Alliance in collaboration with over 50 global founding members and collaborators. Bringing together leading organizations across industry, startups, academia, research and government, the AI Alliance aspires to shape the evolution of AI to best reflect the needs and complexity of our societies. Since its founding, the Alliance has grown to over 100 members.
More specifically, the AI Alliance is dedicated to fostering an open community that enables developers and researchers to accelerate responsible innovation while ensuring trust, safety, security, diversity, scientific rigor and economic competitiveness. To that end, the Alliance supports projects that develop and deploy benchmarks and evaluation standards, help address society-wide challenges, support global AI skills building and encourage open development of AI in safe and beneficial ways.
Llama 3.1 furthers that mission by providing the global AI community with an open, state-of-the-art model family and development ecosystem to build, experiment and responsibly scale new ideas and approaches. Alongside its powerful new models, the release includes robust system-level safety measures, new cybersecurity evaluation measures and updated inference-time guardrails. Collectively, these resources encourage standardization in the development and use of trust and safety tools for generative AI.
Getting started with Llama 3.1
Unlike closed models, Llama model weights are available to download. Developers can fully customize the models for their needs and applications, train them on new datasets, and conduct additional fine-tuning. They can also run the models in any environment, including on prem, in the cloud, or even locally on a laptop, all without sharing data with Meta. This enables the broader developer community and the world to more fully realize the power of generative AI.
While many may argue that closed models are more cost effective, Llama models offer some of the lowest cost per token in the industry, according to testing by Artificial Analysis. And as Mark Zuckerberg noted, open source will ensure that more people around the world have access to the benefits and opportunities of AI, that power isn’t concentrated in the hands of a small few, and that the technology can be deployed more evenly and safely across society. That’s why Meta says it will continue taking steps toward making open access AI the industry standard.
For the average developer, using a model at the scale of the 405B is challenging. Meta aims to help everyone get the most out of the 405B across a range of workflows, including:
Real-time and batch inference
Supervised fine-tuning
Evaluation of your model for your specific application
Continual pre-training
Retrieval-Augmented Generation (RAG)
Function calling
Synthetic data generation
This is where the Llama ecosystem can help. On day one, developers can take advantage of all the advanced capabilities of the 405B model and start building immediately. Developers can also explore advanced workflows like easy-to-use synthetic data generation, follow turnkey directions for model distillation, and enable seamless RAG with solutions from partners, including AWS, NVIDIA, and Databricks. Additionally, Groq has optimized low-latency inference for cloud deployments, with Dell achieving similar optimizations for on-prem systems.
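To make the scale challenge mentioned above concrete, here is a rough back-of-the-envelope sketch of how much memory the model weights alone would need at a few common numeric precisions. It uses the nominal parameter counts from this article (8B, 70B, 405B). The helper names `weight_memory_gb` and `footprint_table` are hypothetical and exist only for this illustration. The estimate ignores the KV cache, activations, and runtime overhead, so real requirements would be higher.

```python
# Rough estimate of weight memory for the Llama 3.1 sizes named in the article.
# Assumption: nominal parameter counts (8B, 70B, 405B); exact counts differ slightly.
# Assumption: weights only; KV cache, activations, and overhead are not included.

PARAMS = {"8B": 8e9, "70B": 70e9, "405B": 405e9}

# Bytes used to store one parameter at each precision.
BYTES_PER_PARAM = {"fp16/bf16": 2.0, "int8": 1.0, "int4": 0.5}


def weight_memory_gb(num_params: float, bytes_per_param: float) -> float:
    """Return the memory needed for the weights alone, in GB (10**9 bytes)."""
    return num_params * bytes_per_param / 1e9


def footprint_table() -> dict:
    """Build a {model size: {precision: GB}} table for all listed sizes."""
    return {
        size: {
            precision: weight_memory_gb(n, b)
            for precision, b in BYTES_PER_PARAM.items()
        }
        for size, n in PARAMS.items()
    }


if __name__ == "__main__":
    for size, row in footprint_table().items():
        cells = ", ".join(f"{p}: {gb:,.0f} GB" for p, gb in row.items())
        print(f"Llama 3.1 {size} -> {cells}")
```

Under these assumptions, the 405B weights at 16-bit precision come to roughly 810 GB. That is far beyond a single laptop or a single typical accelerator, which is why hosted inference and partner tooling matter for most developers.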
Try the Llama 3.1 collection of models today