New Llama 3.3: the 70B AI model from Meta for developers


dida


Meta AI introduced Llama 3.3 in December 2024, a 70-billion-parameter language model designed to be both powerful and efficient. Despite being much smaller than its predecessor, the Llama 3.1 405B model, it delivers similar performance while requiring far less computing power. This makes advanced AI tooling accessible to developers who lack high-end hardware. Llama 3.3 handles a wide range of tasks, from coding to multilingual understanding, making it a versatile tool for many applications. In this article, we take a closer look at its key features and performance highlights, and compare it with both its predecessors and other leading language models on the market.


Key features of Llama 3.3 70B model


Llama 3.3 features a 70-billion-parameter, transformer-based architecture optimized for performance and efficiency. Compared to its predecessor, Llama 3.1, it delivers noticeably better results while demanding much less computational power, making it accessible to developers with standard hardware setups. The optimizations include Grouped-Query Attention (GQA), which speeds up text processing and reduces memory usage. The model focuses solely on text-based tasks and does not yet support image or voice input, prioritizing efficiency in natural language understanding and generation. Trained on over 15 trillion tokens from publicly available data sources, it has a knowledge cutoff of December 2023, equipping it with relatively recent and diverse information. It has been fine-tuned with supervised learning and reinforcement learning from human feedback (RLHF) to align its outputs with human preferences. Designed to run efficiently on common GPUs, it also supports quantization techniques such as 8-bit and 4-bit precision, reducing resource requirements further. The model supports multiple languages, including English, German, French, Italian, Portuguese, Hindi, Spanish, and Thai, making it a practical tool for a variety of text-based tasks.
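The practical impact of those quantization options can be sketched with a back-of-the-envelope memory estimate. The numbers below count only the weights themselves (activations, the KV cache, and framework overhead come on top), so treat them as rough illustrations rather than exact hardware requirements:

```python
# Rough memory needed just to store 70 billion parameters at
# different precisions. Illustrative only: real deployments also
# need room for activations, the KV cache, and runtime overhead.
PARAMS = 70e9  # parameter count of Llama 3.3 70B

def weight_memory_gb(bits_per_param: float) -> float:
    """Approximate weight storage in gigabytes (decimal GB)."""
    return PARAMS * bits_per_param / 8 / 1e9

for label, bits in [("fp16", 16), ("int8 (8-bit)", 8), ("int4 (4-bit)", 4)]:
    print(f"{label:>13}: ~{weight_memory_gb(bits):.0f} GB")
```

Under these assumptions, 4-bit quantization cuts the weight footprint to roughly a quarter of the fp16 size, which is what brings a 70B model within reach of a single high-memory GPU.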


Benchmarks: comparison with different models


From the comparison overview shared by Ollama, Llama 3.3 shows clear improvements over the earlier Llama 3.1 models. On general knowledge (MMLU), Llama 3.3 matches Llama 3.1 70B in the zero-shot test (86.0%) and pulls ahead on the harder 5-shot MMLU Pro test (68.9% vs. 66.4%). It also follows instructions more accurately, scoring 92.1% on IFEval compared to 87.5% for Llama 3.1 70B.

In coding benchmarks, Llama 3.3 is stronger, with a pass@1 score of 88.4% on HumanEval, compared to Llama 3.1 70B’s 80.5%. It also improves on math problems, scoring 77.0% on the MATH benchmark versus 68.0% from Llama 3.1 70B. For reasoning tasks (GPQA Diamond), Llama 3.3 is slightly better at 50.5%, up from 48.0%.
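The pass@1 figure reported for HumanEval is the estimated probability that a single generated solution passes the benchmark's unit tests. A common way to compute pass@k is the unbiased estimator introduced with HumanEval; a minimal sketch:

```python
from math import comb

def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimate: probability that at least one of k
    samples, drawn without replacement from n generated samples of
    which c pass the tests, is correct."""
    if n - c < k:
        return 1.0  # too few failures left for k draws to all fail
    return 1.0 - comb(n - c, k) / comb(n, k)

# With 10 samples per problem and 4 passing, pass@1 estimates the
# single-sample success rate:
print(pass_at_k(n=10, c=4, k=1))  # 0.4
```

The benchmark score is then this estimate averaged over all problems; a pass@1 of 88.4% means a single completion solves the task almost nine times out of ten.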

Its multilingual abilities are impressive, with a score of 91.1% on MGSM, beating Llama 3.1 70B's 86.9%. Pricing is also unchanged from Llama 3.1, meaning developers get better performance at no extra cost.


Comparing Llama 3.3 to other LLM models


Compared to other large language models like Amazon Nova Pro, Gemini Pro 1.5, and GPT-4o, the new Llama 3.3 holds its own. For general knowledge (MMLU), Llama 3.3 matches Amazon Nova Pro at 86.0% in zero-shot tests and edges out GPT-4o (85.9%). In following instructions (IFEval), Llama 3.3 scores 92.1%, tying with Amazon Nova Pro and beating GPT-4o (84.6%) and Gemini Pro 1.5 (81.9%).

For coding tasks, Llama 3.3 scores 88.4% on HumanEval, slightly behind Amazon Nova Pro (89.0%) but ahead of GPT-4o (86.0%). It also handles math problems well, scoring 77.0%, which is better than Amazon Nova Pro (76.6%) and GPT-4o (76.9%) but not as strong as Gemini Pro 1.5 (82.9%).

In multilingual tasks, Llama 3.3 stands out with 91.1%, ahead of GPT-4o (90.6%) and Gemini Pro 1.5 (89.6%). Most importantly, Llama 3.3 is the most cost-effective option, offering the lowest price per input and output token, making it a great choice for developers who want both high performance and affordability.
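Per-token price differences compound quickly at scale. As a hedged illustration of how such costs are computed, the sketch below derives a per-request cost from a price table; the prices are placeholders for the sake of the example, not actual vendor rates:

```python
# Hedged sketch: estimating inference cost from per-token prices.
# The prices below are PLACEHOLDERS, not real vendor pricing.
PRICE_PER_1M = {                      # (input $, output $) per million tokens
    "llama-3.3-70b": (0.20, 0.20),    # hypothetical
    "proprietary-x": (2.50, 10.00),   # hypothetical
}

def request_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Dollar cost of one request under the placeholder price table."""
    in_price, out_price = PRICE_PER_1M[model]
    return (input_tokens * in_price + output_tokens * out_price) / 1_000_000

for model in PRICE_PER_1M:
    print(model, round(request_cost(model, 5_000, 1_000), 4))
```

Multiplied across millions of requests, even a few dollars' difference per million tokens is what makes the per-token pricing gap between models matter in production.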


Accessibility


Llama 3.3 is built with accessibility in mind, enabling it to run efficiently on standard developer workstations without requiring specialized hardware. It incorporates quantization techniques, such as 8-bit and 4-bit precision, to reduce memory usage while maintaining strong performance. This adaptability allows developers to easily test and deploy it across a wide range of hardware setups, from single GPUs to large-scale distributed systems.


Open source and pricing


Llama 3.3 is free to use and openly available under Meta's community license, continuing the approach of its previous versions. This sets Meta apart from proprietary competitors by making the model weights publicly accessible. However, while the model itself incurs no licensing cost for most uses, running it through third-party vendors or on self-hosted servers involves fees, depending on the chosen infrastructure setup or services.


Conclusion


In conclusion, the new model offers performance comparable to its much larger predecessor, Llama 3.1 405B, while demanding significantly less hardware, making it accessible to a far wider range of developers. Compared with other AI models on the market, Llama 3.3 is also far more cost-effective, delivering high-quality results at a fraction of the price. While it currently handles only text and does not support image or voice processing, its efficiency and affordability make it an excellent choice for applications involving large-scale text generation and processing.