
Last Week in AI: Announcements from Neural Magic, Zilliz, Rabbit, aiOla, DeepSeek, and more.

This edition of our roundup covers AI products and services announced last week that didn't make splashy headlines but should be on your radar, including Neural Magic's LLM Compressor, the Zilliz Cloud update, new features in Rabbit's r1 device, aiOla's Whisper-NER, and more.

Neural Magic unveiled the LLM Compressor, an advanced optimization tool designed to accelerate the performance of large language models (LLMs) through state-of-the-art compression techniques. The tool integrates fragmented model compression methods into a unified library, simplifying workflows for the deep learning community. LLM Compressor consolidates algorithms like GPTQ, SmoothQuant, and SparseGPT into a single framework, enabling developers to compress models efficiently while maintaining high accuracy. This optimization reduces inference latency and is especially suited for production environments. Key advancements include support for activation and weight quantization, leveraging INT8 and FP8 tensor cores for GPUs with NVIDIA’s Ada Lovelace and Hopper architectures. By optimizing compute-bound workloads, the LLM Compressor delivers up to a twofold performance boost for inference tasks under heavy server loads. For instance, a compressed Llama 3.1 70B model achieves near-uncompressed latency performance on two GPUs versus four.
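
To make the idea of weight quantization concrete, here is a minimal sketch in plain Python. It is not LLM Compressor's API or any of the algorithms named above (GPTQ, SmoothQuant, SparseGPT); it assumes the simplest possible scheme, symmetric per-tensor INT8 rounding, and the function names and sample weights are made up for illustration.

```python
def quantize_int8(weights):
    """Symmetric per-tensor quantization: map floats to integers in [-127, 127]."""
    max_abs = max(abs(w) for w in weights)
    # One shared scale for the whole tensor; fall back to 1.0 for all-zero input.
    scale = max_abs / 127 if max_abs > 0 else 1.0
    quantized = [max(-127, min(127, round(w / scale))) for w in weights]
    return quantized, scale


def dequantize(quantized, scale):
    """Recover approximate float weights from the stored integers."""
    return [v * scale for v in quantized]


# Hypothetical weights, chosen only to show the round trip.
weights = [0.5, -1.27, 0.0, 0.9]
q, scale = quantize_int8(weights)
restored = dequantize(q, scale)
max_error = max(abs(a - b) for a, b in zip(weights, restored))
print(q, scale, max_error)
```

Each weight is stored in one byte instead of two or four, and the rounding error stays within half a quantization step. Production tools add calibration data and per-channel or per-group scales to protect accuracy.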

Zilliz, a startup specializing in vector databases for AI apps, unveiled an updated version of its managed platform, Zilliz Cloud, designed to offer significant performance improvements and enhanced features to simplify AI development. Zilliz Cloud, a paid, cloud-based version of the open-source Milvus database, is optimized to store embeddings, the mathematical structures AI models use to represent and process data. The new version promises up to a 10-fold improvement in query processing speed, making it particularly valuable for use cases like recommendation systems, retrieval-augmented generation (RAG), and image searches. The upgraded platform introduces a feature called AutoIndex, which automates the creation of database indexes, saving developers time. Additionally, it supports advanced search options, including hybrid searches that combine keyword filtering with similarity matching. The new release also includes a specialized unified IVF- and graph-based index, designed to narrow the data that AI models must analyze, as well as performance enhancements tailored to Intel and Arm processors.
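
For readers unfamiliar with hybrid search, the sketch below shows the general pattern of keyword filtering followed by similarity ranking. It is plain Python, not the Zilliz Cloud or Milvus API; the item layout, the `hybrid_search` function, and the two-dimensional vectors are assumptions made for illustration, and a real vector database would use an index rather than a linear scan.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def hybrid_search(items, query_vec, keyword, top_k=2):
    """Keep items tagged with the keyword, then rank them by similarity."""
    candidates = [it for it in items if keyword in it["tags"]]
    ranked = sorted(candidates, key=lambda it: cosine(it["vec"], query_vec), reverse=True)
    return [it["id"] for it in ranked[:top_k]]


# Hypothetical catalog: each item carries keyword tags and an embedding.
items = [
    {"id": "a", "tags": {"shoes"}, "vec": [1.0, 0.0]},
    {"id": "b", "tags": {"shoes"}, "vec": [0.6, 0.8]},
    {"id": "c", "tags": {"hats"}, "vec": [1.0, 0.1]},
    {"id": "d", "tags": {"shoes"}, "vec": [0.0, 1.0]},
]
results = hybrid_search(items, [1.0, 0.0], "shoes")
print(results)
```

Item "c" is close to the query vector but is dropped by the keyword filter before any similarity is computed, which is the point of combining the two steps.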

Rabbit, maker of a pocket-sized AI mobile device, introduced "teach mode," a new feature that lets users train its AI assistant to perform tasks on websites and digital interfaces. The feature, now in beta testing, allows the device to navigate websites, check information, and operate a variety of digital systems by learning from user demonstrations. Rabbit's r1 device, launched earlier this year, is powered by Rabbit OS and features a unique AI model called a large action model (LAM). The r1 allows users to perform tasks like purchasing plane tickets or transcribing text using natural language commands. The new teach mode expands its capabilities, enabling users to teach the AI to automate custom actions without requiring coding or advanced technical knowledge.

AI startup aiOla unveiled Whisper-NER, an open-source AI model that combines speech-to-text transcription with Named Entity Recognition (NER). The model transcribes spoken content while simultaneously identifying key entities such as names, dates, and specialized terminology, offering real-time contextual understanding. Built on OpenAI’s Whisper architecture, Whisper-NER leverages transformer technology to provide accurate transcription and entity recognition in a single step. Designed for privacy-conscious applications, it includes features for real-time data redaction, making it ideal for industries such as healthcare, customer service, and legal services where sensitive information must be protected. The company has made the model available as open-source software, encouraging developers and researchers to customize and innovate further.
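
The link between entity recognition and redaction can be sketched in a few lines of Python. This is not Whisper-NER's output format or API; it assumes a hypothetical list of entities given as character offsets plus a label, and shows how tagged spans in a transcript could be masked once they are known.

```python
def redact(text, entities, labels_to_hide):
    """Replace each sensitive entity span with its label in square brackets."""
    # Work from the end of the string so earlier offsets stay valid.
    for ent in sorted(entities, key=lambda e: e["start"], reverse=True):
        if ent["label"] in labels_to_hide:
            text = text[:ent["start"]] + "[" + ent["label"] + "]" + text[ent["end"]:]
    return text


# Hypothetical transcript and entity spans, invented for this example.
transcript = "Patient John Smith was admitted on March 3."
entities = [
    {"start": 8, "end": 18, "label": "NAME"},
    {"start": 35, "end": 42, "label": "DATE"},
]
redacted = redact(transcript, entities, {"NAME", "DATE"})
print(redacted)
```

Because the entities arrive alongside the transcription, masking can happen before the text is stored or displayed, rather than in a separate pass.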

Chinese AI startup DeepSeek launched a preview of its first reasoning model, DeepSeek-R1, which it claims rivals OpenAI's o1 large language model in solving math and science problems with enhanced accuracy. Reasoning models like DeepSeek-R1 are designed to address a common shortfall of traditional large language models (LLMs): hallucinations. These models rely on techniques such as "chain of thought" (CoT), which breaks down complex problems into smaller steps, enabling more accurate responses. DeepSeek-R1’s approach also provides transparency, allowing users to follow its reasoning process step by step.

Elastic announced the integration of Amazon Web Services (AWS) generative AI services into Elastic Observability. The collaboration includes large language model (LLM) observability support for Amazon Bedrock, AWS’s fully managed service that offers foundation models from leading AI providers via a single API. The integration allows site reliability engineers (SREs) to monitor the performance and usage of Amazon Bedrock-powered LLMs through Elastic Observability. Key metrics such as invocations, errors, and latency can now be tracked, enabling SREs to proactively prevent issues, diagnose root causes, and optimize generative AI application performance. Additionally, Elastic AI Assistant, powered by Amazon Bedrock, aids SREs in analyzing data, creating visualizations, and resolving issues with actionable recommendations.

NVIDIA unveiled the Generative AI Red-teaming & Assessment Kit (Garak), a new tool aimed at identifying and mitigating vulnerabilities in large language models (LLMs). The comprehensive framework automates the assessment process, combining static and dynamic analyses with adaptive testing to provide a holistic evaluation of AI system security. Garak employs a three-step methodology: vulnerability identification, classification, and mitigation. The tool uses static analysis to inspect model architecture and training data, dynamic analysis to simulate interactions via diverse prompts, and adaptive testing to iteratively refine its processes and uncover hidden weaknesses. Vulnerabilities are categorized based on severity, and the tool offers mitigation strategies such as retraining models, refining prompts, and implementing content filters.

Blockchain project Morpheus officially launched its decentralized AI compute network, aiming to empower developers and users with greater control over generative AI technologies. Morpheus offers an open-source, peer-to-peer platform for running generative AI using blockchain technology. Described as a "Linux-type alternative for developers," the Morpheus platform enables quick deployment of large language models (LLMs) at no cost. Users retain control over their data, minimizing risks such as data leaks or hacks. The network also connects directly to blockchain and cryptocurrency infrastructures, allowing developers to integrate AI with decentralized finance tools, cryptocurrency transactions, and token exchanges. Morpheus operates on the Arbitrum blockchain, a leading Ethereum layer 2 scaling solution. The platform incentivizes developers and users with MOR tokens, rewarding those who build applications and provide compute power by running network nodes to process user queries.
