The ridiculed Stable Diffusion 3 release excels at AI-generated body horror

An AI-generated image created using Stable Diffusion 3 of a girl lying in the grass.

On Wednesday, Stability AI released weights for Stable Diffusion 3 Medium, an AI image synthesis model that converts text prompts into AI-generated images. However, its arrival has been ridiculed online, as it generates images of people in a way that appears to be a step back from other advanced image synthesis models such as Midjourney or DALL-E 3. It can produce wild, anatomically incorrect visual horrors with ease.

A thread on Reddit titled “Is this release meant as a joke? [SD3-2B]” describes SD3 Medium’s spectacular failures at rendering people, especially human limbs like hands and feet. Another thread, titled “Why is SD3 so bad at generating girls lying on the grass?”, shows similar problems, but for entire human bodies.

Hands have traditionally been a challenge for AI image generators due to a lack of good examples in early training datasets, but recently several image synthesis models seem to have solved the problem. In that sense, SD3 appears to be a huge step backwards for the image synthesis enthusiasts gathering on Reddit, especially compared to recent Stability releases like SD XL Turbo in November.

“It wasn’t that long ago that StableDiffusion was competing with Midjourney, now it just seems like a joke. At least our datasets are safe and ethical!” wrote one Reddit user.

So far, AI graphics fans blame Stable Diffusion 3’s anatomical errors on Stability AI’s insistence on filtering adult content (often called “NSFW” content) out of the SD3 training data that teaches the model how to generate images. “Believe it or not, heavily censoring a model also takes away the human anatomy, so… that’s what happened,” one Reddit user wrote in the thread.

Basically, whenever a user prompts for a concept that is not well represented in the AI model’s training dataset, the image synthesis model produces its best interpretation of the request. And sometimes that can be downright terrifying.

The 2022 release of Stable Diffusion 2.0 suffered from similar problems in properly portraying humans, and AI researchers quickly discovered that censoring adult content that contains nudity can seriously hinder an AI model’s ability to generate accurate human anatomy. At the time, Stability AI reversed course with SD 2.1 and SD XL, regaining some of the ability lost by heavily filtering NSFW content.

Another problem that can arise during model pretraining is that the NSFW filter researchers use to remove adult images from the dataset is sometimes too aggressive, accidentally removing images that are likely not offensive and depriving the model of depictions of humans in certain situations. “[SD3] works fine as long as there are no humans in the picture, I think their improved nsfw filter for filtering training data has decided that anything humanoid is nsfw,” one Redditor wrote on the topic.

Using a free online demo of SD3 on Hugging Face, we ran prompts and saw similar results to others. For example, the prompt “a man showing his hands” displayed an image of a man holding up two giant backward hands, even though each hand had at least five fingers.

Stability announced Stable Diffusion 3 in February, and the company plans to make it available in several model sizes. Today’s release is for the “Medium” version, a model with 2 billion parameters. In addition to being hosted on Hugging Face, the weights are also available for experimentation via the company’s Stability Platform. They are free to download and use only under a non-commercial license.

Shortly after the February announcement, delays in releasing the SD3 model weights led to rumors that the release was held up by technical issues or mismanagement. Stability AI as a company recently entered a downward spiral with the resignation of its founder and CEO, Emad Mostaque, in March and a subsequent series of layoffs. Just before that, three key engineers – Robin Rombach, Andreas Blattmann, and Dominik Lorenz – left the company. And the problems go back even further, with news of the company’s dire financial position circulating since 2023.

For some Stable Diffusion fans, the failures of Stable Diffusion 3 Medium are a visual manifestation of the company’s mismanagement, and a clear sign that things are falling apart. Although the company hasn’t filed for bankruptcy, some users made dark jokes about the possibility after seeing SD3 Medium:

“I think they can now go bankrupt in a safe and ethical [sic] way after all.”