The open source DALL-E Mini software isn’t perfect, but sometimes it does offer images that match people’s textual descriptions.
If you’ve scrolled through your social media feeds lately, you’ve likely noticed illustrations accompanied by captions. They are popular right now.
The images you see are probably made possible by a text-to-image program called DALL-E. Before posting the illustrations, people type in words, which artificial intelligence models then convert into images.
For example, a Twitter user posted a tweet with the text “To be or not to be, rabbi holding avocado, marble sculpture.” The attached picture, which is quite elegant, shows a marble statue of a bearded man in a robe and a bowler hat, holding an avocado.
The AI models come from Google’s Imagen software as well as from OpenAI, the Microsoft-backed startup that developed DALL-E 2. On its website, OpenAI calls DALL-E 2 “a new AI system that can create realistic images and art from a description in natural language.”
But much of what happens in this space comes from a relatively small group of people who share their pictures and, in some cases, draw heavy engagement. That is because Google and OpenAI have not made the technology widely available to the public.
Many of OpenAI’s early users are friends and relatives of employees. If you would like access, you must join a waitlist and indicate whether you are a professional artist, developer, academic researcher, journalist or online creator.
“We are working hard to speed up access, but it will likely take some time to reach everyone; as of June 15, we have invited 10,217 people to try DALL-E,” OpenAI’s Joanne Jang wrote on a help page on the company’s website.
One system that is publicly available is DALL-E Mini. It is based on open-source code from a loosely organized team of developers and is often overloaded with requests. Attempts to use it may be met with a dialog box that says “Too much traffic, please try again.”
It’s a bit like Google’s Gmail service, which in 2004 lured people in with a then-unheard-of 1 GB of free email storage. Early adopters could get in by invitation only at first, leaving millions of people waiting. Today Gmail is one of the most popular email services in the world.
Creating images from text may never be as common as email. But the technology is certainly having a moment, and part of its appeal is its exclusivity.
The private research lab Midjourney requires people to fill out a form if they want to experiment with its image-generation bot from a channel on the Discord chat app. Only a select group of people are using Imagen and posting pictures from it.
The text-to-image services are sophisticated, identifying the most important parts of a user’s prompt and then guessing the best way to illustrate those terms. Google trained its Imagen model with hundreds of its in-house AI chips on 460 million internal image-text pairs, in addition to external data.
The interfaces are simple. There is usually a text box, a button to start the generation process and an area below to display images. To indicate the source, Google and OpenAI add watermarks to the lower right corner of images from DALL-E 2 and Imagen.
The companies and groups building the software are justifiably concerned about everyone knocking on the door at once. Handling web requests to run queries against these AI models can get expensive. More importantly, the models are not perfect and do not always produce results that accurately represent the world.
The engineers trained the models on large collections of words and images from the Internet, including photos posted on Flickr.
San Francisco-based OpenAI recognizes the potential for harm from a model that has learned to create images by essentially browsing the web. To try to mitigate the risk, employees have removed violent content from the training data, and there are filters in place to prevent DALL-E 2 from generating images if users submit requests that might violate the company’s policy against nudity, violence, conspiracies or political content.
“There is an ongoing process of improving the security of these systems,” said OpenAI researcher Prafulla Dhariwal.
Biases in the results are also important to understand, and they represent a broader problem for AI. Boris Dayma, a developer from Texas, and others who worked on DALL-E Mini spelled out the problem in a description of their software.
“Professions showing a higher level of education (for example, engineers, doctors, or scientists) or high physical labor (for example, in the construction industry) are predominantly represented by white men,” they wrote. “On the other hand, nurses, secretaries or assistants are usually women, often also white.”
Google described similar flaws in their Imagen model in an academic paper.
Despite the risks, OpenAI is excited about what the technology can do. Dhariwal said it could open up creative possibilities for people and could help with commercial applications such as interior decorating or dressing up websites.
Results should continue to improve over time. DALL-E 2, introduced in April, produces more realistic images than the original version OpenAI announced last year, and the company’s text-generation model, GPT, has become more sophisticated with each generation.
“You can expect this to happen with many of these systems,” Dhariwal said.