
Stability AI to Offer Text Generation

Stability AI expands into the text generation space.

Stability AI, the company behind the popular Stable Diffusion image generation tool, is expanding into the text generation space. The company has released the pre-trained model weights for StableLM, a suite of open-source large language models capable of generating text and code from natural language prompts.

The base models are being released under the Creative Commons CC BY-SA-4.0 license, which allows both commercial and non-commercial usage. This licensing is consistent with the company's focus on the ethical and legal use of generative AI technology.

Stable Diffusion is often featured in lists of top AI image generators alongside DALL-E 2 and Midjourney, all of which create images based on natural language text inputs. What makes Stability AI's offerings unique is that they're open source. The company is positioning Stable Diffusion as "a revolutionary image model that represents a transparent, open, and scalable alternative to proprietary AI."

Proprietary AI is seen by some as advancing too far, too quickly in the drive for immediate profits on the part of Microsoft, Google, and others, prompting many in the industry to advocate for slowing down the creation of more advanced large language models (LLMs) until safety, legal, ethical, and other considerations can be addressed. While AI leaders like Microsoft have lately hidden the underlying details of their advanced AI systems -- a departure from the norm in research-centric projects -- Stability AI encourages developers to freely inspect, use, and adapt the company's new StableLM base models for commercial or research purposes, subject to the terms of the project's CC BY-SA-4.0 license.

According to the "StableLM: Stability AI Language Models" GitHub repo, the new StableLM models, now in the Alpha stage, can be used for:

  • Chit-Chat
  • Formal Writing
  • Creative Writing
  • Writing Code

Some example prompts listed by the company include:

  • "What would you say to a friend who is graduating high school?"
  • "Please write an email."
  • "Write an epic rap battle song between deep neural networks and symbolic AI."
  • "Write me a C program that computes the meaning of life."

"In 2022, Stability AI drove the public release of Stable Diffusion, a revolutionary image model that represents a transparent, open, and scalable alternative to proprietary AI," Stability AI said in an April 19 announcement. "With the launch of the StableLM suite of models, Stability AI is continuing to make foundational AI technology accessible to all. Our StableLM models can generate text and code and will power a range of downstream applications. They demonstrate how small and efficient models can deliver high performance with appropriate training.

"The release of StableLM builds on our experience in open-sourcing earlier language models with EleutherAI, a nonprofit research hub. These language models include GPT-J, GPT-NeoX, and the Pythia suite, which were trained on The Pile open-source dataset. Many recent open-source language models continue to build on these efforts, including Cerebras-GPT and Dolly-2."

The company noted "surprisingly high performance" in the StableLM models, even though they have far fewer parameters than the GPT-3 LLM from Microsoft partner OpenAI. That company has since released GPT-4 but has not divulged underlying details such as the number of parameters used, which are key in machine learning models in that they determine how well a model can learn from data and generalize to new tasks. While GPT-3 is said to use 175 billion parameters, the StableLM models use only 3 billion to 7 billion. And though OpenAI has kept mum about the number of parameters used by GPT-4, industry speculation ranges from 1 trillion to as many as 100 trillion.
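For a rough sense of what those parameter counts mean in practice, the back-of-envelope sketch below estimates the memory footprint of the raw model weights alone, assuming 2 bytes per parameter (fp16 precision); real deployments also need memory for activations, so these figures are floors, not totals.

    # Back-of-envelope weight footprint at various parameter counts,
    # assuming 2 bytes per parameter (fp16). Serving a model also needs
    # memory for activations and the KV cache, so treat these as floors.
    GIB = 1024 ** 3

    for name, params in [
        ("StableLM 3B", 3e9),
        ("StableLM 7B", 7e9),
        ("GPT-3 175B", 175e9),
    ]:
        print(f"{name}: ~{params * 2 / GIB:,.0f} GiB of fp16 weights")

At roughly 6 GiB and 13 GiB of weights, the StableLM checkpoints can fit on a single high-end consumer GPU, while a GPT-3-class model at roughly 326 GiB needs a multi-GPU server just to hold its weights.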

As for the underlying data used to train StableLM -- a new, larger, experimental dataset built on The Pile open-source dataset -- Stability AI said it "will release details on the dataset in due course," noting that a full technical report is expected in the near future.

Stability AI also released a set of research models that it described as instruction fine-tuned. "Initially, these fine-tuned models will use a combination of five recent open-source datasets for conversational agents: Alpaca, GPT4All, Dolly, ShareGPT and HH," the company said. "These fine-tuned models are intended for research use only and are released under a noncommercial CC BY-NC-SA 4.0 license, in line with Stanford's Alpaca license."

Stability AI isn't the only company providing an alternative to proprietary, profit-driven AI initiatives, as Mozilla, an open-source champion famous for the Firefox web browser, recently announced a new startup, Mozilla.ai, for "trustworthy AI."

"We've learned that this coming wave of AI (and also the last one) has tremendous potential to enrich people's lives," said Mark Surman, president, and executive director of the Mozilla Foundation, in a March 22 blog post. "But it will only do so if we design the technology very differently -- if we put human agency and the interests of users at the core, and if we prioritize transparency and accountability. The AI inflection point that we're in right now offers a real opportunity to build technology with different values, new incentives and a better ownership model."

About the Author

David Ramel is an editor and writer at Converge 360.
