Snowflake joins Meta to optimize new model family in Snowflake Cortex AI


Key Takeaways

Snowflake is hosting the Llama 3.1 multilingual large language models, including Meta's largest open-source LLM, Llama 3.1 405B, in Snowflake Cortex AI to support real-time AI applications at scale.

The optimized Llama 3.1 405B model enables real-time inference with significantly lower latency and higher throughput, and it supports fine-tuning on a single GPU node, reducing costs and complexity for developers.

The partnership between Meta and Snowflake emphasizes trust and safety in AI, while also open-sourcing the LLM Inference and Fine-Tuning System Optimization Stack to promote innovation within the AI community.

Growing model scale and memory requirements make it difficult for users to achieve low-latency inference for real-time use cases, and to get the cost-effectiveness and long context support that enterprise-grade generative AI demands.

To address these constraints, Snowflake has announced that it will host the Llama 3.1 collection of multilingual open-source large language models (LLMs) in Snowflake Cortex AI, so that organizations can build powerful AI applications at scale.

This offering includes Meta’s largest open-source LLM, Llama 3.1 405B, with Snowflake developing and open-sourcing the inference system stack to enable real-time, high-throughput inference and further democratize natural language processing and generation applications. 


Through this partnership, Meta and Snowflake aim to provide users with easy, efficient and trusted ways to access, fine-tune and deploy Meta’s newest models in the AI Data Cloud, with a comprehensive approach to trust and safety built in at the foundational level.

Snowflake’s AI Research Team has optimized Llama 3.1 405B for both inference and fine-tuning, supporting a 128K context window while enabling real-time inference with up to three times lower end-to-end latency and 1.4 times higher throughput than existing open-source solutions.

The optimizations are also intended to allow the model to be fine-tuned on a single GPU node, reducing costs and complexity for developers and users within Cortex AI.
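A rough calculation helps show why memory is the constraint here. The sketch below estimates the memory needed just to hold 405 billion parameters at a few common numeric precisions and compares it with a single GPU node. The node size (8 GPUs with 80 GB each) and the precisions are illustrative assumptions, not figures from Snowflake or Meta, and the estimate ignores activations, the KV cache and optimizer state, all of which add to the total.

```python
# Back-of-the-envelope estimate of weight memory for a 405B-parameter model.
# Assumptions (not from the article): the precisions listed below and a
# hypothetical node of 8 GPUs with 80 GB of memory each.

PARAMS = 405e9  # parameter count implied by the model name

# Bytes used to store one parameter at each precision.
BYTES_PER_PARAM = {"fp32": 4, "bf16": 2, "fp8": 1}

NODE_GPUS = 8        # assumed GPUs per node
GPU_MEMORY_GB = 80   # assumed memory per GPU, in GB


def weight_memory_gb(params: float, bytes_per_param: int) -> float:
    """Memory needed for the weights alone, in GB (1 GB = 1e9 bytes)."""
    return params * bytes_per_param / 1e9


node_memory_gb = NODE_GPUS * GPU_MEMORY_GB

for precision, nbytes in BYTES_PER_PARAM.items():
    needed = weight_memory_gb(PARAMS, nbytes)
    verdict = "fits" if needed <= node_memory_gb else "does not fit"
    print(f"{precision}: {needed:.0f} GB of weights, {verdict} in {node_memory_gb} GB")
```

Under these assumptions, the 16-bit weights alone exceed the memory of the whole node, which is why serving or fine-tuning a model of this size on one node depends on system-level optimizations such as lower-precision storage.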

Snowflake’s AI Research Team also continues to push the boundaries of open-source innovation through regular contributions to the AI community and transparency about how it builds LLM technologies.

Along with the launch of Llama 3.1 405B, Snowflake’s AI Research Team is now open-sourcing its LLM Inference and Fine-Tuning System Optimization Stack in collaboration with DeepSpeed, Hugging Face, vLLM and the broader AI community.

“As a leader in the customer engagement and customer data platform space, Twilio’s customers need access to the right data to create the right message for the right audience at the right time,” said Kevin Niparko, VP of product and technology strategy at Twilio Segment.

“The ability to choose the right model for their use case within Snowflake Cortex AI empowers our joint customers to generate AI-driven, intelligent insights and easily activate them in downstream tools. In an era of rapid evolution, businesses need to iterate quickly on unified data sets to drive the best outcomes.”

Matthew Scullion, Matillion CEO and co-founder, said: “The upcoming addition of Llama 3.1 gives our team and users even more choice and flexibility to access the large language models that suit use cases best and stay on the cutting-edge of AI innovation. Llama 3.1 within Snowflake Cortex AI will be immediately available with Matillion on Snowflake’s launch day.”