News
DeepSeek-AI Releases Open-Source Janus-Pro 7B Multimodal AI Model
- By John K. Waters
- 07/22/2025
Chinese AI research company DeepSeek-AI has released Janus-Pro, an open-source multimodal artificial intelligence model that can both interpret images and generate them from text prompts, positioning it as a competitor to proprietary models from OpenAI and Stability AI.
The model is available in 1-billion- and 7-billion-parameter versions. According to DeepSeek-AI's testing, the 7B variant scored 80% on the GenEval text-to-image benchmark, outperforming DALL-E 3 (67%) and Stable Diffusion 3 Medium (74%), and achieved 79.2 on the MMBench multimodal understanding benchmark.
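For readers who want the margins spelled out, the short Python sketch below computes the percentage-point gaps from the GenEval figures quoted above. The dictionary layout and the `margins` helper are illustrative assumptions, not part of any DeepSeek-AI tooling; only the three scores come from the reported results.

```python
# GenEval scores (percent) as reported by DeepSeek-AI and quoted in this article.
GENEVAL_SCORES = {
    "Janus-Pro": 80,
    "DALL-E 3": 67,
    "Stable Diffusion 3 Medium": 74,
}


def margins(scores: dict, subject: str) -> dict:
    """Return the subject's lead over every other model, in percentage points.

    `margins` is a hypothetical helper written for this illustration.
    """
    base = scores[subject]
    return {name: base - value for name, value in scores.items() if name != subject}


if __name__ == "__main__":
    for rival, gap in margins(GENEVAL_SCORES, "Janus-Pro").items():
        print(f"Janus-Pro leads {rival} by {gap} percentage points")
```

These are differences between self-reported scores, so they indicate relative standing on this one benchmark rather than overall image quality.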
Janus-Pro uses separate visual encoders for understanding and generation tasks, addressing what DeepSeek-AI identified as inefficiencies in models that share a single encoder for both functions. The architecture employs SigLIP to extract semantic features for understanding and a vector-quantized (VQ) tokenizer to convert images into discrete tokens for generation.
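The decoupled design can be pictured with a minimal Python sketch. This is not DeepSeek-AI's code: the class names, the `route` function, and the toy encodings are hypothetical stand-ins, assuming only what the article states, namely that understanding uses a continuous feature encoder (SigLIP in the real model) and generation uses a discrete VQ tokenizer.

```python
from dataclasses import dataclass
from typing import List, Union


@dataclass
class UnderstandingEncoder:
    """Toy stand-in for a SigLIP-style encoder: image -> continuous features."""
    dim: int = 4

    def encode(self, pixels: List[float]) -> List[List[float]]:
        # Emit one continuous feature vector per input value (illustrative only).
        return [[p] * self.dim for p in pixels]


@dataclass
class GenerationTokenizer:
    """Toy stand-in for a VQ tokenizer: image -> discrete codebook indices."""
    codebook_size: int = 16

    def encode(self, pixels: List[float]) -> List[int]:
        # Quantize each value in [0, 1] to one of `codebook_size` integer codes.
        top = self.codebook_size - 1
        return [min(int(p * self.codebook_size), top) for p in pixels]


def route(task: str, pixels: List[float]) -> Union[List[List[float]], List[int]]:
    """Send the image down the encoder path that matches the task.

    Hypothetical helper: each task gets its own encoder instead of a shared one.
    """
    if task == "understanding":
        return UnderstandingEncoder().encode(pixels)
    if task == "generation":
        return GenerationTokenizer().encode(pixels)
    raise ValueError(f"unknown task: {task}")


if __name__ == "__main__":
    image = [0.0, 0.5, 1.0]  # a three-"pixel" toy image
    print(route("understanding", image))  # continuous feature vectors
    print(route("generation", image))     # discrete token ids
```

The point of the sketch is the split itself: the understanding path yields continuous vectors while the generation path yields integer token ids, so neither representation has to compromise to serve the other task.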
The model was trained on an expanded dataset that includes 72 million synthetic aesthetic samples and 90 million multimodal understanding samples. DeepSeek-AI implemented a three-stage training process with extended pretraining duration and adjusted data ratios to improve convergence and performance.
On the DPG-Bench benchmark testing dense prompt handling, Janus-Pro scored 84.19, demonstrating its capability in processing complex text-to-image instructions with detailed semantic requirements.
The release adds to growing competition in open-source multimodal AI, where models combine text and image processing capabilities. Major technology companies, including Google, OpenAI, and Anthropic, have developed similar systems, though most remain proprietary.
DeepSeek-AI has made both model variants publicly available, continuing the company's pattern of releasing open-source AI research. The models are designed for applications including visual question answering, instruction-following, and creative content generation.
The company did not disclose computational requirements or training costs for the models. Janus-Pro builds on DeepSeek-AI's original Janus framework, which introduced the decoupled encoding approach but faced scalability and efficiency limitations that the new version aims to address.
About the Author
John K. Waters is the editor in chief of a number of Converge360.com sites, with a focus on high-end development, AI and future tech. He's been writing about cutting-edge technologies and the culture of Silicon Valley for more than two decades, and he's written more than a dozen books. He also co-scripted the documentary film Silicon Valley: A 100 Year Renaissance, which aired on PBS.