AI Castaway
Battle of the AI Image Generators
A surreal, surprising and occasionally disturbing tour of the top AI-powered image generators in the market today, from Stable Diffusion to DALL-E to Gemini and beyond.
- By Ginger Grant
- 03/07/2024
As a person who cannot draw, I was happy to find out that AI could draw for me. Getting an AI model to create a picture using a text prompt is easy and, for the most part, free. But after trying out different generators for this column, one thing stands out: The quality of the images differs greatly from site to site.
The sites reviewed here are Stable Diffusion, Google Gemini, OpenAI DALL-E, Microsoft Copilot Designer, Craiyon, ImgCreator and Dream by Wombo. Below, I cover the strengths and weaknesses of each and the factors that might lead you to choose one over another. To test the image generators, I gave each of them the same three prompts:
- Show me an image of a cat playing with a dog
- Show me an image of a family eating sandwiches at a picnic outdoors
- Show me an image of some people getting their fingernails manicured
To set a baseline, whenever I was given the option to choose an image style, I selected "photographic" or the equivalent.
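For readers who want to repeat the comparison, the test setup described above can be sketched in a few lines of Python. This is only an illustration of the test matrix: `build_test_matrix` is a hypothetical helper name, and nothing here calls any of the generators.

```python
from itertools import product

# The seven generators and three prompts used in this column.
GENERATORS = [
    "Stable Diffusion",
    "Google Gemini",
    "OpenAI DALL-E",
    "Microsoft Copilot Designer",
    "Craiyon",
    "ImgCreator",
    "Dream by Wombo",
]

PROMPTS = [
    "Show me an image of a cat playing with a dog",
    "Show me an image of a family eating sandwiches at a picnic outdoors",
    "Show me an image of some people getting their fingernails manicured",
]


def build_test_matrix(generators, prompts):
    """Pair every generator with every prompt (hypothetical helper)."""
    return [
        {"generator": g, "prompt": p}
        for g, p in product(generators, prompts)
    ]


matrix = build_test_matrix(GENERATORS, PROMPTS)
print(len(matrix))  # 7 generators x 3 prompts = 21 test cases
```

Keeping the prompts identical across all seven tools is what makes the results comparable; the style setting ("photographic" or the equivalent) still has to be chosen by hand on the sites that offer one.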
Stable Diffusion
Stable Diffusion has a really good image generator. To access it, users are required to create a free account, which provides a limited set of options but more than enough to generate the images for this article. The different style options include cinematic, animation, line art, cyberpunk, photograph, pixel art, "GTA," papercraft, 3-D character, baroque and caricature.
As you can see in the image below, Stable Diffusion had problems determining what a dog was. I tried it twice to see if I could get a better picture.
Those definitely look like a cat and a kitten, without a dog in the frame. My second try got me closer to a cat and a dog, but judging by the eyes and forehead, the "dog" is more catlike than any dog I have ever seen.
It did do a reasonable job on the image of people at a picnic, but the arms and hands are not accurate on the two women and the sandwiches are a little off.
For the last image, the hands look pretty good for being AI-generated, although I do not think the algorithm is familiar with what a manicure is -- not to mention, the number of fingers is incorrect.
Overall, I was impressed with the quality of the photos created by Stable Diffusion, the ease of use and the number of free options. Just know that if you want an image that does not have a watermark (see the lower-right corner of each of the images above), you will need to pay.
Google Gemini
As with everything created by Alphabet, you need a Google ID to access Gemini. Google also has several terms of service documents that you have to agree to in order to generate a photo.
There has been a lot of coverage of Gemini's image generator, so I was surprised by the photos it generated. Its photos have the lowest resolution of any of the ones I created for this column, and their accuracy varied.
Based on the picture below, Gemini understood that I was asking for a picture of a cat and a dog and rendered them correctly. Interestingly, though, when I asked Gemini to show me a cartoon picture of a cat playing with a dog, I got clipart from Pinterest, which was not what I was expecting.
For the second image, Gemini was less successful. The family at the picnic contained only two people, and there were no sandwiches. The faces and hands were pretty good, however.
Gemini also missed the mark with the third try. The image below was not what I asked for, as no one is getting a manicure here. The fingers are also a little off.
Unlike Stable Diffusion, Gemini offered no options for different styles of images.
OpenAI DALL-E
While there is technically a free version of OpenAI's DALL-E image generator, if you try to use it, you get the message, "DALL·E 2 Registration is Now Closed." To generate images, you need to upgrade to a Plus account, which is $20 a month. This provides you with the ability to access GPT-4, which allows you to generate images using DALL-E 3.
The images created by DALL-E were all high-resolution and accurately depicted what I asked for. However, there's one thing that I did not like: When downloading DALL-E images, the default format is WebP rather than JPG, the default for most of the other tools.
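If you end up with a folder of downloads from several tools, one way to tell which files are WebP is to check the first bytes of each file. The sketch below relies only on the published file signatures: WebP files start with `RIFF`, a four-byte size, then `WEBP`, while JPEG files start with the bytes `FF D8 FF`. The function name `sniff_image_format` is hypothetical and is not part of any tool reviewed here.

```python
def sniff_image_format(header: bytes) -> str:
    """Guess the image format from the first 12 bytes of a file."""
    if len(header) >= 12 and header[:4] == b"RIFF" and header[8:12] == b"WEBP":
        return "webp"
    if header[:3] == b"\xff\xd8\xff":
        return "jpeg"
    return "unknown"


# Example headers built by hand so the sketch runs without any image files.
webp_header = b"RIFF" + (1024).to_bytes(4, "little") + b"WEBP"
jpeg_header = b"\xff\xd8\xff\xe0" + bytes(8)

print(sniff_image_format(webp_header))  # webp
print(sniff_image_format(jpeg_header))  # jpeg
```

With a real download, the header would come from reading the first 12 bytes of the file in binary mode. Converting a WebP file to JPG requires an image library or editor and is not shown here.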
Also, there is no easy way to change the style to be photorealistic. I just got cartoons, which I did not ask for. I tried changing the prompt from "Show me an image of a cat playing with a dog" to "Show me an image of a cat playing with a dog, photorealistic," but it still generated a cartoon. It also took longer to generate the images than most of the other tools here.
Case in point: This is an accurate depiction of a dog and cat playing, but rendered as a cartoon, which -- again -- was not the desired style.
The "photorealistic" version, below, still looks cartoonish.
The family in this image is also depicted in a cartoon style, and there are a lot more food choices than the requested sandwiches. One person is also missing a leg.
When it came to the third prompt, DALL-E 3 was the only image generator that showed me a salon when I asked for people getting a manicure, which was interesting; I expected to see a salon from all of them.
Once again, it is very cartoon-like, so I changed the prompt to request an image of a person getting their fingernails manicured close-up and photorealistic.
"Realistic" included a robot hand close-up. There is no nail polish on the brush, but there are also no out-of-place fingers -- so, a mixed bag.
Microsoft Copilot Designer
The Copilot Designer image generator can be accessed via the Microsoft Bing search engine. You can also access it via the Microsoft Edge browser by clicking on the Copilot icon. As Microsoft has paid for an exclusive license from OpenAI, Copilot Designer also uses DALL-E 3.
One thing to note about Copilot Designer: If you do not specifically use the word "generate" in your prompt, it will not create an image -- it will just select images from the Internet. Used correctly, it will automatically generate four photos from a single prompt. There are no style options, however; the four photos may include multiple styles or all share the same one. In my case, some were cartoons and some were photorealistic, like this one.
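Since the test prompts in this column begin with "Show me" and do not contain the word "generate," they need rewording before Copilot Designer will create anything. A minimal sketch of that rewording follows; the `ensure_generate` helper is hypothetical, and the rule it encodes is based only on the behavior observed above, not on any documented Copilot requirement.

```python
import re


def ensure_generate(prompt: str) -> str:
    """Return a prompt that contains the word "generate".

    Hypothetical helper: it rewrites a leading "Show me" to "Generate",
    or prefixes the request if the word is missing.
    """
    if re.search(r"\bgenerate\b", prompt, flags=re.IGNORECASE):
        return prompt
    if prompt.lower().startswith("show me "):
        return "Generate " + prompt[len("show me "):]
    return "Generate an image of " + prompt


print(ensure_generate("Show me an image of a cat playing with a dog"))
# Generate an image of a cat playing with a dog
```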
This is a pretty good image of a dog and cat playing. It also gave a pretty accurate visual for my family picnic request. They don't have a lot of other food besides sandwiches, but Copilot Designer did include french fries.
Bing also provides suggestions if you don't like the four pictures it generates, including things like:
- Add a frisbee to the image
- Make it look like they are near a lake
- Change the sandwiches to burgers
The manicure photo looked pretty good, giving me exactly what I asked it to generate. There are a few things that are a little off, including one set of two left hands, and the fact that the manicurist has oddly shaped fingernails and too many fingers.
Overall, though, the images were very accurate and contained the photorealistic style I wanted.
Craiyon
The Craiyon site is really easy to use, as you don't need to sign in or create a login to use the free version. However, I was not impressed with the images. It did not generate what I requested at all and -- like DALL-E -- only gives WebP-formatted files.
For starters, this is supposed to be a cat playing with a dog, but it was morphed into a clown-dog-cat (I think).
Meanwhile, the people in the picnic picture are downright creepy.
The manicure picture took several tries. The first one, below, did not match my request for an image of people getting their fingernails manicured, so I generated another image.
The second image was not an improvement. The nails are inaccurate, there is no thumb and the hand is misshapen.
ImgCreator
Like Stable Diffusion, ImgCreator lets you pick the style of picture you want, including anime, magic journey, realistic photo, freeform, vector illustration, art, character and 3-D design. I selected realistic photo for my tests.
You can also change the aspect ratio to 9:16, 1:1, 3:4 or something else, as well as change the output resolution to 640 px, 1024 px or 2048 px, but those are premium features that are not available with the free account. The paid accounts are $4.90 to start, but you have to pay for a year upfront. The only way you can download photos made by ImgCreator is to pay for an account.
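To get a feel for what those settings mean in pixels, here is a rough sketch of the arithmetic. It ASSUMES the resolution setting refers to the longer side of the image, which I could not confirm on the free account; the `dimensions` function is hypothetical.

```python
def dimensions(ratio_w: int, ratio_h: int, long_side_px: int) -> tuple[int, int]:
    """Return (width, height) for an aspect ratio.

    ASSUMPTION: the resolution setting is the length of the longer side.
    """
    scale = long_side_px / max(ratio_w, ratio_h)
    return round(ratio_w * scale), round(ratio_h * scale)


print(dimensions(9, 16, 1024))  # (576, 1024)
print(dimensions(1, 1, 640))    # (640, 640)
print(dimensions(3, 4, 2048))   # (1536, 2048)
```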
I declined to do this because the pictures were not very good. When I asked it to draw a cat playing with a dog, there were no dogs in the photo and some of the cats were disembodied heads.
Besides an image generator, the ImgCreator site also offers a picture editor -- which, to be honest, might be more marketable.
Dream by Wombo
A free registration is required to use Wombo. The free version does offer a lot of different style options, including dreamland, retro pop, sketch and cartoon, to name a few. It also provides a phone-sized picture for each request.
All that being said, I cannot figure out why, for my first prompt, I got three cats instead of a dog.
For the picnic image, the people looked a little off, as did the sandwiches.
It also stumbled on my manicure prompt. While the image below does count as showing manicured hands, it does not represent a person getting a manicure and the number of fingers is a little off.
Conclusion
Going into this experiment, I expected that my third prompt ("Show me an image of some people getting their fingernails manicured") would be the one that image generators would have the most difficulty reproducing correctly, and that was true. What surprised me was their inability to distinguish a cat from a dog.
Overall, though, I was surprised to see how easy it was to create many different AI-generated pictures, and mostly for free. The only one where there was a cost involved was OpenAI's DALL-E 3. However, since Copilot Designer uses the same model, I don't see a reason to pay for OpenAI's version.
As far as results go, with most of the AI tools I tried here, you'll get different results depending on what and how you ask, so changing the request to be more detailed will probably provide better results. Gemini and DALL-E 3 (which straddles both OpenAI and Copilot Designer) were the most accurate, though Gemini's pictures are lower-resolution than DALL-E's and neither of them lets you select a specific style.
Of all of them, my favorite is Copilot Designer because there is no cost to use it and it was the most accurate. If I were trying to generate art, however, I would use Stable Diffusion, as I liked the variety of styles you could pick from and it was pretty accurate.
About the Author
Ginger Grant is a Data Platform MVP who provides consulting services in advanced analytic solutions, including machine learning, data warehousing, and Power BI. She is an author of articles and books, blogs at DesertIsleSQL.com and uses her MCT to provide data platform training in topics such as Azure Synapse Analytics, Python and Azure Machine Learning. You can find her on X/Twitter at @desertislesql.