The Week in AI: GitHub Models, DeepMind's Gemma Scope, Stability AI's Stable Fast 3D, More

This edition of our weekly roundup of AI products and services includes Google DeepMind's Gemma Scope, GitHub's new GitHub Models, Arcee AI's DistillKit, Stability AI's Stable Fast 3D, and more.

Google DeepMind announced Gemma Scope, a comprehensive, open suite of sparse autoencoders for language model interpretability. Designed to help researchers understand "the inner workings of Gemma 2," Google's lightweight family of open models, Gemma Scope is a collection of hundreds of freely available, open sparse autoencoders (SAEs) for Gemma 2 9B and Gemma 2 2B. The company is also open sourcing Mishax, a tool it built that enabled much of the interpretability work behind Gemma Scope. "We hope today’s release enables more ambitious interpretability research," the DeepMind group said in a blog post. "Further research has the potential to help the field build more robust systems, develop better safeguards against model hallucinations, and protect against risks from autonomous AI agents like deception or manipulation."
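Sparse autoencoders of this kind decompose a model's internal activations into a much larger set of sparsely firing, more interpretable features. The PyTorch sketch below shows only the basic idea; the dimensions and L1 sparsity penalty are illustrative assumptions, and Gemma Scope's actual SAEs use a JumpReLU activation variant.

```python
import torch
import torch.nn as nn

class SparseAutoencoder(nn.Module):
    """Toy SAE: reconstruct activations through an overcomplete, sparse code.
    Illustrative only -- Gemma Scope's SAEs use a JumpReLU activation and
    different dimensions."""
    def __init__(self, d_model=2304, d_features=16384):
        super().__init__()
        self.encoder = nn.Linear(d_model, d_features)
        self.decoder = nn.Linear(d_features, d_model)

    def forward(self, acts):
        features = torch.relu(self.encoder(acts))  # sparse feature activations
        recon = self.decoder(features)
        return recon, features

sae = SparseAutoencoder()
acts = torch.randn(8, 2304)  # stand-in for residual-stream activations
recon, features = sae(acts)
# Training objective: reconstruction error plus an L1 penalty that pushes
# most features to zero, so each one tends to capture a single concept.
loss = ((recon - acts) ** 2).mean() + 1e-3 * features.abs().mean()
```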

Amazon announced a new advanced AI model to increase the accuracy of its "Just Walk Out" checkout-free service for retailers. Just Walk Out, introduced in 2018, uses cameras, weight sensors, and a combination of AI technologies to enable shoppers in physical stores to buy things like food, beverages, and other merchandise without having to wait in a checkout line or stop at a cashier. The new multi-modal foundation model uses the same transformer-based machine learning architecture that underlies many generative AI applications and applies it to physical stores. The model analyzes data from cameras and sensors throughout a store simultaneously, instead of looking at which items shoppers pick up and put back in a linear sequence. "For retailers, the new AI system makes Just Walk Out faster, easier to deploy, and more efficient," said Jon Jenkins, VP of Just Walk Out Technology in the AWS Applications group, in a blog post. "For shoppers, this means worry-free shopping at even more third-party checkout-free stores worldwide."

GitHub launched GitHub Models, a new sandbox environment in which developers can experiment with AI models from different providers. Each model is accessible via a built-in playground that lets developers test different prompts and model parameters for free, right in GitHub. GitHub Models currently includes models from AI21 Labs, Cohere, Meta, Mistral, OpenAI, and Microsoft’s Phi-3. There's also a glide path to bring the models to the developer environment in Codespaces and VS Code. When developers are ready to go to production, Azure AI offers built-in responsible AI, enterprise-grade security and data privacy, and global availability, with provisioned throughput and availability in more than 25 Azure regions for some models. GitHub CEO Thomas Dohmke promised in a blog post that "no prompts or outputs in GitHub Models will be shared with model providers, nor used to train or improve the models."
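For developers who want to script against the playground's models rather than click through the UI, GitHub documented an OpenAI-compatible inference endpoint authenticated with a GitHub token at launch. The sketch below assumes that setup; treat the endpoint URL and model name as illustrative, since both may differ from GitHub's current configuration.

```python
import os
from openai import OpenAI  # pip install openai

# Assumption: GitHub Models exposes an OpenAI-compatible endpoint that
# accepts a GitHub personal access token; URL and model name are illustrative.
client = OpenAI(
    base_url="https://models.inference.ai.azure.com",
    api_key=os.environ["GITHUB_TOKEN"],
)

response = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[{"role": "user", "content": "Explain what a model playground is."}],
    temperature=0.7,  # the playground exposes parameters like this for tuning
)
print(response.choices[0].message.content)
```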

Stability AI introduced a new generative AI model called Stable Fast 3D, which was designed to transform a single input image into a detailed 3D asset in 0.5 seconds. The new model "sets a new standard for speed and quality in the field of 3D reconstruction," the company said. It's built on the foundation of TripoSR, a fast 3D reconstruction model, and features "significant architectural improvements and enhanced capabilities." The company is pitching the new model to both enterprises and indie developers in gaming and virtual reality, as well as retail, architecture, and design. The model is available now on GitHub, with model weights and a demo space on Hugging Face. It was released under the Stability AI Community License.

Arcee AI, which specializes in creating AI-driven tools for business automation, introduced DistillKit, an open-source tool for the development and deployment of Small Language Models (SLMs). This release supports Arcee AI’s mission to make artificial intelligence more accessible and efficient for researchers, users, and businesses by providing user-friendly distillation methods. DistillKit focuses on model distillation, a process that transfers knowledge from large, resource-intensive models to smaller, more efficient ones. The primary objective of DistillKit is to produce smaller models that maintain the sophistication of their larger counterparts while being optimized for use on devices with limited processing power, such as laptops and smartphones. This democratizes access to advanced AI while promoting energy efficiency and cost savings. With this tool, Arcee AI aims to extend advanced AI capabilities to a broader audience by reducing the computational resources needed to run these models, the company said.
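Distillation of this kind typically trains the small "student" model to match the large "teacher" model's full output distribution rather than only hard labels. The sketch below shows the standard soft-target objective as a generic recipe; it is not DistillKit's actual implementation, and the temperature and mixing weight are illustrative.

```python
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels, T=2.0, alpha=0.5):
    """Generic knowledge-distillation objective (not DistillKit's exact code):
    blend a softened KL term against the teacher with the usual hard-label loss."""
    soft = F.kl_div(
        F.log_softmax(student_logits / T, dim=-1),
        F.softmax(teacher_logits / T, dim=-1),
        reduction="batchmean",
    ) * (T * T)  # rescale gradients to compensate for the temperature
    hard = F.cross_entropy(student_logits, labels)
    return alpha * soft + (1 - alpha) * hard

# Toy usage: batch of 4 examples, vocabulary of 100 tokens
student = torch.randn(4, 100, requires_grad=True)
teacher = torch.randn(4, 100)  # frozen teacher outputs
labels = torch.randint(0, 100, (4,))
loss = distillation_loss(student, teacher, labels)
loss.backward()
```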

Argilla, developer of a collaboration platform designed to streamline the model development process, announced Magpie-Ultra, a new synthetically generated dataset for supervised fine-tuning. It features 50,000 instruction-response pairs and utilizes the advanced Llama 3.1 405B-Instruct model and other Llama models, such as Llama-Guard-3-8B and Meta-Llama-3.1-8B-Instruct. The dataset covers a range of tasks, including coding, mathematics, data analysis, creative writing, advice-seeking, and brainstorming, offering challenging instructions and responses to enhance AI model training. The dataset was created with distilabel, Argilla's framework for synthetic data generation and AI feedback, and its creation follows the Magpie recipe outlined in the paper "Magpie: Alignment Data Synthesis from Scratch by Prompting Aligned LLMs with Nothing," the company said.
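The Magpie recipe exploits the fact that an aligned chat model, given only its pre-query template, will "autocomplete" a plausible user instruction, which can then be fed back to produce the paired response. A rough sketch with Hugging Face transformers follows; the template strings match Llama 3's chat format but should be treated as illustrative, and the production pipeline (distilabel) adds filtering and quality scoring on top.

```python
from transformers import pipeline

# Rough sketch of the Magpie recipe; template strings follow Llama 3's chat
# format but are illustrative, and the real dataset build adds filtering.
generator = pipeline("text-generation",
                     model="meta-llama/Meta-Llama-3.1-8B-Instruct")

# Step 1: prompt with ONLY the pre-query template. An aligned model
# "autocompletes" the user turn, inventing an instruction from nothing.
pre_query = "<|begin_of_text|><|start_header_id|>user<|end_header_id|>\n\n"
instruction = generator(pre_query, max_new_tokens=64,
                        return_full_text=False)[0]["generated_text"]

# Step 2: append the synthesized instruction plus the assistant header,
# then generate the paired response to complete the training example.
prompt = (pre_query + instruction
          + "<|eot_id|><|start_header_id|>assistant<|end_header_id|>\n\n")
response = generator(prompt, max_new_tokens=256,
                     return_full_text=False)[0]["generated_text"]
```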

Israel-based AI startup aiOla unveiled Whisper-Medusa, an open-source AI model that combines OpenAI's Whisper automatic speech recognition (ASR) system with its own Medusa multi-head attention architecture. The model is trained using weak supervision: the main components of OpenAI’s Whisper are frozen while additional parameters are trained, using Whisper’s transcriptions of audio datasets as labels for Medusa’s token prediction modules. Whisper-Medusa operates 50% faster than Whisper without sacrificing performance, the company says, because it can predict ten tokens simultaneously, compared with Whisper’s one-at-a-time token prediction. (Tokens are units of data processed by algorithms.) This significantly accelerates speech prediction speed and runtime, especially for long-form audio. aiOla currently offers a 10-head version of Whisper-Medusa and plans to release a 20-head version with equivalent accuracy in the future. Whisper-Medusa’s model weights and code are now available on Hugging Face and GitHub.
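Medusa-style decoding attaches extra prediction heads to a frozen base model, each trained to guess the token k positions ahead, so a single decoder pass proposes several tokens at once. A minimal PyTorch sketch of that head structure is below; the dimensions are placeholders, not aiOla's actual Whisper-Medusa configuration.

```python
import torch
import torch.nn as nn

class MedusaHeads(nn.Module):
    """Illustrative Medusa-style heads: head k predicts the token k positions
    ahead from the frozen decoder's last hidden state. Dimensions are
    placeholders, not aiOla's actual Whisper-Medusa configuration."""
    def __init__(self, d_model=1280, vocab_size=51865, n_heads=10):
        super().__init__()
        self.heads = nn.ModuleList(
            nn.Sequential(nn.Linear(d_model, d_model), nn.SiLU(),
                          nn.Linear(d_model, vocab_size))
            for _ in range(n_heads)
        )

    def forward(self, hidden):  # hidden: (batch, d_model)
        # One decoder pass yields n_heads candidate tokens instead of one.
        return torch.stack([head(hidden) for head in self.heads], dim=1)

heads = MedusaHeads()
hidden = torch.randn(2, 1280)  # stand-in for the decoder's hidden state
logits = heads(hidden)         # (2, 10, 51865): ten tokens proposed at once
candidates = logits.argmax(dim=-1)
```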