News

The Week in AI: OpenAI Fine Tuning for GPT-4o, Gemini Update, Alibaba's Quen2-VL, More

It was a short week, thanks to the long Labor Day weekend, but the indefatigable entrepreneurs driving the AI revolution didn't seem to notice. Here are some of the products and services they announced last week while your attention was turned to the barbecue.

Google announced updates to its Gemini AI assistant, introducing new image generation and customization capabilities that were previewed at the Google I/O event in May. These enhancements will be available in both the consumer and business versions of Gemini. Among the updates is the introduction of Imagen 3, a new image generation model designed for creating photorealistic images and handling complex user instructions. Users can refine images through follow-up prompts if the initial output doesn’t meet expectations. Google also said it is reactivating Gemini's capability to generate images of people, a feature previously disabled due to accuracy concerns. The reintroduction comes with enhanced guardrails to prevent the creation of harmful content. The new "Gems," a feature allows users to create customized versions of Gemini tailored to specific tasks. Several premade Gems are available, focusing on tasks like code troubleshooting and writing assistance.

Google Researchers also announced the development of a GameNGen, an AI-powered gaming engine capable of real-time environment generation with a neural model. According to Google, GameNGen can generate complex environments at very high frame rates The company says it can simulate the classic 1993 video game Doom at over 20 frames per second on a single Tensor Processing Unit (TPU). The announcement was made through a paper published on the pre-print platform arXiv and detailed in a GitHub listing. The creation of such a game engine is a significant achievement, requiring the system to generate both 2D and 3D environments swiftly and logically, maintaining level progression. GameNGen's training involved two processes: first, it was trained using Stable Diffusion v1.4, and a novel method was employed to mitigate auto-regression drift by adding Gaussian noise to encode the frames. The second process utilized automatic reinforcement learning (RL) agents to collect data, as using human players at scale would have been impractical. Currently, the AI-powered game engine is not available for public use, and the research paper has yet to undergo peer review, leaving some claims unverified.

Alibaba Cloud announced the release of the latest version of its vision-language model, Qwen2-VL, which comes with some significant advancements over its predecessor, Qwen-VL. This new model delivers state-of-the-art performance in visual understanding, the company says, and its capable of handling images of various resolutions and aspect ratios. It also excels at complex tasks, such as MathVista and DocVQA. Qwen2-VL also supports video comprehension for clips over 20 minutes long, enabling high-quality video-based question answering and content creation. The model was designed to integrate with mobile devices and robots, facilitating complex reasoning and decision-making. It also offers multilingual support, including languages like Japanese, Korean, Arabic, and many European languages. Qwen2-VL is available in three versions: a 2B model optimized for mobile use, a 7B model offering competitive performance in document and multilingual comprehension, and the flagship 72B model, which surpasses many closed-source alternatives in visual understanding tasks, the company says. The 2B and 7B models are open-sourced under the Apache 2.0 license, with integration into popular frameworks such as Hugging Face Transformers and vLLM, while the 72B model is accessible via an API.

OpenAI launched a new fine-tuning capability for GPT-4o, calling it one of the most requested features from developers. Developers can now fine-tune GPT-4o with custom datasets to get higher performance at a lower cost for their specific use cases, the company says. Fine-tuning enables the model to customize structure and tone of responses, or to follow complex domain-specific instructions. Developers can already produce strong results for their applications with as little as a few dozen examples in their training data set. GPT-4o fine-tuning is available now to all developers on all paid usage tiers. GPT-4o mini fine-tuning is also available to all developers on all paid usage tiers.

IBM and Intel have announced a new collaboration to deploy Intel Gaudi 3 AI accelerators as a service on IBM Cloud, with availability expected in early 2025. This initiative aims to provide enterprises with cost-effective AI scaling solutions that prioritize security and resiliency. IBM Cloud will be the first cloud service provider to adopt Gaudi 3, which will be integrated into both hybrid and on-premises environments and supported within IBM's watsonx AI and data platform. The two companies are partnering to address the growing demand for affordable and high-performance AI computing solutions, they said in a statement. By combining Gaudi 3 AI Accelerators with Intel's 5th Gen Xeon CPUs, the offering will enhance enterprise AI workloads in the cloud and data centers. This collaboration also emphasizes the importance of an open and collaborative ecosystem for AI development. The Gaudi 3 service on IBM Cloud is expected to optimize the cost and performance of AI inferencing workloads, providing enterprises with scalable, secure, and flexible solutions. The collaboration continues the long-standing partnership between IBM and Intel, furthering their joint commitment to innovation in AI and cloud computing. The Gaudi 3-powered IBM Cloud offerings will be generally available in early 2025.

The Plaud NotePin, an AI wearable device designed to record and transcribe conversations, is now available for pre-order. While Plaud does not use an in-house AI model, it offers a selection of AI models, including OpenAI's GPT-4o and Claude 3.5 Sonnet, for enhanced functionality. The basic AI features are available for free. Originally gaining popularity on TikTok with its AI-powered recording app, Plaud has expanded its offerings with this device, which can be worn on the wrist, as a necklace, or as a tie-pin. The NotePin, measuring 51x21x11mm and weighing just 25g, features 64GB of storage and a 270mAh battery. It is equipped with two MEMS microphones, ensuring clear audio capture. The device comes with various accessories, including a magnetic pin, clip, lanyard, wristband, charging dock, and USB Type-C charging cable. The AI within the NotePin transcribes recorded conversations onto more than 20 professional templates and custom formats for saving data. Users can also access advanced features such as summary templates and speaker labeling with a $79 annual subscription. Additionally, users can search and interact with transcriptions, prompting the AI to share specific data.

 

Featured