The Week in AI: Intel's RAG Foundry, Mistral AI's Open Weight Models, More -- Pure AI

The Week in AI: Intel's RAG Foundry, Mistral AI's Open Weight Models, More

By Pure AI Editors
08/12/2024

This edition of our weekly roundup of AI products and services includes Mistral AI's new open-weight language models, Intel's new RAG Foundry framework, Hugging Face's new inference-as-a-service offering, the new Qwen 2-Math series, and more!!

Intel Labs introduced RAG Foundry, an open-source framework designed to address the challenges inherent in Retrieval-Augmented Generation (RAG) systems. Built on existing contributions in the field, RAG Foundry integrates data creation, training, inference, and evaluation into a unified workflow, enabling rapid prototyping, dataset generation, and model training using specialized knowledge sources. RAG Foundry's modular structure allows for extensive customization and isolated experimentation across various RAG aspects, including data selection, retrieval, and prompt design. This flexibility aims to overcome common challenges in RAG systems, such as evaluation difficulties, reproducibility issues, and the inherent complexity of integrating large language models (LLMs) with retrieval mechanisms.

Hugging Face introduced an inference-as-a-service offering that leverages NVIDIA NIM to provide developers with streamlined access to accelerated AI model inference. The service, powered by NVIDIA DGX Cloud and optimized through NVIDIA NIM microservices, enables the rapid deployment of leading large language models such as Llama 3 and Mistral. This new capability allows developers to quickly prototype and deploy AI models hosted on the Hugging Face Hub, using scalable GPU resources tailored for generative AI tasks. Access requires an Enterprise Hub organization and a fine-grained token for authentication. Initially, the service supports chat.completions.create and models.list APIs, and there are plans to expand API offerings and model support. Billing is based on compute time, utilizing NVIDIA H100 Tensor Core GPUs. Additionally, Hugging Face is collaborating with NVIDIA to integrate the NVIDIA TensorRT-LLM library into its Text Generation Inference framework, enhancing AI inference performance. Hugging Face also offers a separate AI training service, Train on DGX Cloud.

Qwen announced the launch of its Qwen 2-Math series, a collection of AI models designed to tackle complex mathematical challenges. The series features six models, each optimized for different computational needs:

Qwen 2-Math-72B
Qwen 2-Math-72B-Instruct
Qwen 2-Math-7B
Qwen 2-Math-7B-Instruct
Qwen 2-Math-1.5B
Qwen 2-Math-1.5B-Instruct

At the top of the range, the Qwen 2-Math-72B model, with 72 billion parameters, is tailored for advanced mathematical computations requiring deep learning and extensive data processing. The “Instruct” versions across all models are enhanced to follow user instructions with greater precision. The series also includes more accessible options, such as the Qwen 2-Math-7B and Qwen 2-Math-1.5B, catering to users with varying needs in computational power and efficiency. Each model builds on the previous Qwen architecture, incorporating new deep learning techniques, natural language processing, and symbolic reasoning to solve a wide range of mathematical problems.

Mistral AI announced the release of three new open-weight language models: Mistral NeMo, Codestral Mamba, and Mathstral. Mistral NeMo, a 12-billion parameter general-purpose language model, boasts a 128k token context window and supports 11 languages, including Chinese, Japanese, and Arabic. It outperforms similar models on benchmarks like MMLU and Winogrande. Codestral Mamba, a 7-billion parameter code-generation model, is based on the Mamba architecture, which provides faster inference and theoretically infinite context length. Mathstral, also a 7-billion parameter model, is fine-tuned for STEM subjects and was developed in collaboration with Project Numina, achieving high scores on MMLU and MATH benchmarks. All three models are available under the Apache 2.0 license and can be downloaded from Huggingface or via Mistral's SDK. Mistral NeMo and Codestral Mamba are accessible through Mistral AI's la Plateforme API, with additional deployment options available through NVIDIA and TensorRT-LLM.

Scamnetic, an AI-powered scam detection startup, announced that its flagship platform is now available to software providers via an application programming interface (API). The platform is designed to protect users from a wide range of scams in real-time by scanning communications across multiple channels, including email, text messages, chat platforms, and phone calls. Scamnetic's platform leverages AI to detect fraudulent communications. The platform features three core tools: Scan&Score, which evaluates the risk of any communication; IDeveryone, which verifies the identity of counterparties; and Scam Intervention, offering 24/7 support to scam victims. Scamnetic CEO Al Pascual emphasized the importance of AI in countering AI-driven scams, stating that traditional detection methods are no longer sufficient. The startup plans to roll out a direct-to-consumer offering later this year, following the API integration release.

Vaadin, provider of an open-source web app dev platform for Java developers, announced the release of version 24.4. The new release introduces enhancements aimed at improving the developer experience. The update includes the integration of the Hilla framework with the Vaadin platform and the introduction of Vaadin Copilot, an AI-powered development tool designed to streamline UI creation. Vaadin Copilot was designed to enable developers to drag and drop components, reorganize layouts, and edit labels and captions in real-time. The tool is integrated with supported IDEs, automatically updating source code as changes are made. Additionally, Copilot’s generative AI capabilities allow for the modification of UI components based on user prompts, and its Theme Editor offers an easy way to adjust application themes without altering CSS. However, Vaadin Copilot is currently limited to views built in Hilla/React. Version 24.4 also marks the unification of Vaadin's platform with the inclusion of the Hilla framework in the Vaadin BOM and Vaadin Spring Boot starter. This integration allows for the creation of hybrid applications combining Flow and Hilla views. Developers can now embed Flow components into Hilla/React views and vice versa, fostering greater flexibility in application design.

A collaborative team of researchers from leading institutions, including KAUST, Carnegie Mellon University, Stanford University, and Oxford University, introduced the Crab framework, a novel benchmarking tool designed to assess the performance of autonomous agents in complex, cross-environment tasks. The Crab framework addresses limitations in traditional benchmarks by enabling the evaluation of agents across multiple platforms, such as desktops and mobile devices, and incorporating a graph-based method that provides a more nuanced assessment of agent capabilities. The Crab framework's innovative approach decomposes complex tasks into manageable sub-tasks, offering a detailed evaluation at multiple stages of task execution. This allows for a more accurate reflection of real-world conditions and agent performance, the team says. The framework was tested using advanced multimodal language models, including GPT-4o and Claude 3 Opus, which revealed significant insights into the strengths and challenges faced by current autonomous agents. This new tool is expected to advance research in autonomous agents by providing a more comprehensive and realistic evaluation framework, paving the way for improved human-computer interaction across diverse environments.