Microsoft Simplifies AI Image Generation in DALL-E 2
Microsoft simplifies DALL-E 2 image generation for its apps.
Microsoft has made AI-driven image generation with DALL-E 2 easier to access by bringing it to its range of consumer applications, letting users produce their own images there, the company announced. DALL-E 2 technology is being integrated into applications such as the "new Bing" search site and a preview version of Microsoft Designer.
OpenAI, a Microsoft partner, developed DALL-E, a generative AI technology that creates new images from text prompts. OpenAI also created the ChatGPT chatbot, which is based on its GPT series of large language models (LLMs), and recently introduced GPT-4.
DALL-E 2 is an AI system that generates lifelike images and artwork from natural language descriptions, thanks to breakthroughs in natural language processing (NLP). It can also combine concepts, attributes, and styles specified in a text prompt. However, users trying to generate images on the DALL-E website are often met with the message "The server is currently overloaded with other requests."
Microsoft has addressed this problem, starting with two applications.
Microsoft Designer (Preview)
Microsoft Designer aims to assist users in creating high-quality social media posts, invitations, digital postcards, graphics, and more.
Using Microsoft Designer is straightforward. First, users visit the website, enter their email address, and click the "Join the waitlist" button. Once accepted, they can return to the site and select "Add image" to work with an existing image from their device, or "Generate image" to describe the image they want the AI to create. The tool appears to focus on social media and marketing, automatically adding branding text to generated images regardless of their intended purpose. However, users can customize all text and image elements after generation.
Earlier this year, Microsoft published guidance titled "How to use AI image prompts to generate art using DALL-E," which applies to both the DALL-E site and Designer. The guidance highlights Designer's advantage as a graphic design app: beyond generating unique images from user-provided ideas, it lets users add design elements such as text and graphics and offers AI-powered editing to integrate them seamlessly into the design.
The guidance also advises users to be specific, supplying plenty of adjectives and other details along with directive instructions. For example, instead of simply asking for an image in an "oil painting" style, users should make a request such as an "oil-on-canvas masterpiece by Caravaggio from 1599."
The guidance also lists pitfalls to avoid, including:
- Complex scenes featuring multiple subjects
- Detailed layout requests (e.g., "A big red Object X on the left, friendly Object Y on the right, a small Object Z wearing Item A above them")
- Images with multiple faces (these often come out distorted)
- Requests for text (e.g., "a sign saying, 'Happy birthday!'"), because the generator struggles to render text
Microsoft has also extended the "Copilot" label, which originated with GitHub Copilot, an "AI pair programmer" tool, to AI features across its entire product portfolio. The Designer site suggests that the product may be renamed "Designer Copilot" after the preview stage.
Bing Image Creator
Microsoft's "new Bing" search experience has been powered by OpenAI's GPT-4 LLM for some time, but the company introduced Bing Image Creator only a couple of weeks ago.
In a March 21 announcement, Microsoft stated, "We're excited to announce we are bringing Bing Image Creator, new AI-powered visual Stories, and updated Knowledge Cards to the new Bing and Edge preview." Powered by an advanced version of OpenAI's DALL-E model, Bing Image Creator lets users generate images simply by describing the desired picture in their own words. Users can thus create both written and visual content in one place, directly within the chat interface. Note that using the image generator requires switching to Creative mode.
The functionality is also expected to come to the Edge browser, accessible via an Image Creator icon in the sidebar. However, some users may not see it right away, even after updating Edge or installing an Insider build.
Microsoft emphasizes that it has been gathering real-world feedback and is gradually rolling out Bing Image Creator to preview users before expanding its availability. Initially, Image Creator will be limited to Creative mode within Bing chat, with plans to make it accessible in Balanced and Precise modes over time. The company is also working on optimizing Image Creator for use in multi-turn chats.
Image Creator is simple to use. For example, asking it to generate an image of "an astronaut walking through a galaxy of sunflowers" yields several image options.
Compared with the often-overloaded DALL-E site, Microsoft's tools in Bing, Microsoft Designer, and other consumer apps deliver more immediate results.
David Ramel is an editor and writer for Converge360.