AI Image Generators Default to the Same 12 Photo Styles, Study Finds - Anything your imagination desires, as long as it's one of just a few options.

[Image: A grid of examples showing how AI image generators often produce the same style of images. © Hintze et al., Patterns]


AI image generation models have massive sets of visual data to pull from in order to create unique outputs. And yet researchers have found that when the models are pushed to produce images from a series of slowly shifting prompts, they default to just a handful of visual motifs, resulting in an ultimately generic style.

A study published in the journal Patterns took two AI models, the image generator Stable Diffusion XL and the vision-language model LLaVA, and put them to the test by playing a game of visual telephone. The game went like this: the Stable Diffusion XL model would be given a short prompt and required to produce an image. One example prompt read, “As I sat particularly alone, surrounded by nature, I found an old book with exactly eight pages that told a story in a forgotten language waiting to be read and understood.” That image was presented to the LLaVA model, which was asked to describe it. That description was then fed back to Stable Diffusion XL, which was asked to create a new image based on it. This went on for 100 rounds.
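The loop described above can be sketched in a few lines of Python. This is only an illustration of the generate-describe-repeat structure, not the study's actual code: `visual_telephone`, `generate`, and `describe` are hypothetical names, and the toy stand-ins below take the place of Stable Diffusion XL and LLaVA so that the sketch runs without any model installed.

```python
from typing import Callable, List, Tuple, TypeVar

Image = TypeVar("Image")


def visual_telephone(
    prompt: str,
    generate: Callable[[str], Image],   # hypothetical text-to-image step
    describe: Callable[[Image], str],   # hypothetical image-to-text step
    rounds: int = 100,
) -> List[Tuple[str, Image]]:
    """Alternate generation and description, recording each round."""
    history = []
    for _ in range(rounds):
        image = generate(prompt)          # prompt -> image
        history.append((prompt, image))
        prompt = describe(image)          # image -> next prompt
    return history


# Toy stand-ins (assumptions for illustration only): the "image" keeps
# just the first five words of the prompt, and the "description" simply
# reads those words back. Detail is lost on the first pass and the
# chain then stops changing.
def toy_generate(prompt: str) -> tuple:
    return tuple(prompt.split()[:5])


def toy_describe(image: tuple) -> str:
    return " ".join(image)


history = visual_telephone(
    "an old book with exactly eight pages",
    toy_generate,
    toy_describe,
    rounds=3,
)
for round_prompt, _ in history:
    print(round_prompt)
```

With real models, `generate` and `describe` would wrap whatever text-to-image and image-captioning calls are available. The stand-ins only show that once a step discards information, later rounds cannot recover it.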

[Image: © Hintze et al., Patterns]

Much like a game of human telephone, the original image was quickly lost. No surprise there, especially if you’ve ever seen one of those time-lapse videos where people ask an AI model to reproduce an image without making any changes, only for the picture to quickly turn into something that doesn’t remotely resemble the original. What did surprise the researchers, though, was the fact that the models defaulted to just a handful of generic-looking styles. Across 1,000 different iterations of the telephone game, the researchers found that most of the image sequences would eventually fall into just one of 12 dominant motifs.

In most cases, the shift was gradual. A few times, it happened suddenly. But it almost always happened. And the researchers were not impressed. In the study, they referred to the common image styles as “visual elevator music,” basically the type of pictures that you’d see hanging up in a hotel room. The most common scenes included things like maritime lighthouses, formal interiors, urban night settings, and rustic architecture.

Even when the researchers switched to different models for image generation and descriptions, the same types of trends emerged. Researchers said that when the game is extended to 1,000 turns, coalescing around a style still happens around turn 100, but variations spin out in those extra turns. Interestingly, though, those variations still typically pull from one of the popular visual motifs.

[Image: © Hintze et al., Patterns]

So what does that all mean? Mostly that AI isn’t particularly creative. In a human game of telephone, you’ll end up with extreme variance because each message is delivered and heard differently, and each person has their own internal biases and preferences that may impact what message they receive. AI has the opposite problem. No matter how outlandish the original prompt, the models almost always default to a narrow selection of styles.

Of course, the AI model is pulling from human-created prompts, so there is something to be said about the data set and what humans are drawn to take pictures of. If there’s a lesson here, perhaps it is that copying styles is much easier than teaching taste.

 
Retarded article, it used TWO models out of countless that exist and it doesn't account for different prompting styles and settings for sure. This just reads like anti-AI seething
 
A study published in the journal Patterns took two AI image generators, Stable Diffusion XL and LLaVA, and put them to test by playing a game of visual telephone. The game went like this: the Stable Diffusion XL model would be given a short prompt and required to produce an image—for example, “As I sat particularly alone, surrounded by nature, I found an old book with exactly eight pages that told a story in a forgotten language waiting to be read and understood.” That image was presented to the LLaVA model, which was asked to describe it. That description was then fed back to Stable Diffusion, which was asked to create a new image based off that prompt. This went on for 100 rounds.
This is the most retarded shit I have ever heard in my life. Everyone involved needs to get a real job.
 
When you feed the output directly back into the machine as fresh input? This is inevitable.

Same way that photocopying a photocopy again and again will eventually produce just a black smear.

This isn't something unique or insightful that needed to be scientifically proven, it's just that a lot of people still don't get that AI does not interpret what it is given. It's just taking the average, and over enough iterations? The average of everything it's been fed will become the dominant output.

It's just a limitation of the architecture, not proof the machines are being lazy.
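
For anyone who wants to see the photocopy effect in numbers, here is a toy sketch. It is not how a diffusion model works and it is not from the study: the attractor values, the `pull` factor, and the function names are all made up for illustration. Each pass nudges a value halfway toward whichever "motif" it is already closest to, and after enough passes every starting point ends up sitting on one of a few values.

```python
def lossy_pass(x: float, attractors: list, pull: float = 0.5) -> float:
    """One round trip: move x part of the way toward its nearest attractor."""
    nearest = min(attractors, key=lambda a: abs(a - x))
    return x + pull * (nearest - x)


def iterate(x: float, attractors: list, rounds: int = 100) -> float:
    """Apply the lossy pass repeatedly, like re-copying a photocopy."""
    for _ in range(rounds):
        x = lossy_pass(x, attractors)
    return x


# Hypothetical "motifs" on a number line, and 101 evenly spaced starts.
attractors = [2.0, 5.0, 8.0]
starts = [i / 10 for i in range(101)]  # 0.0, 0.1, ..., 10.0

# Rounding hides floating-point noise far below the sixth decimal place.
finals = {round(iterate(x, attractors), 6) for x in starts}
print(len(starts), "starting points ->", len(finals), "final values")
```

The variety in the starting points is gone by the end, which is the same shape as the result the article describes, even though the real mechanism is far more complicated than a number line.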
 
Besides the fact that you get the same result if you aggregate human face data, it all comes down to this:
Mostly that AI isn’t particularly creative.
Neither are humans. Every medium is a derivative of a derivative. It's not even slop, just the natural result of humans sticking with what works over time. The genre of slop articles about how AI is not creative is just embarrassing.
 
the Stable Diffusion XL model would be given a short prompt and required to produce an image—for example, “As I sat particularly alone, surrounded by nature, I found an old book with exactly eight pages that told a story in a forgotten language waiting to be read and understood.”
That’s a creative writing prompt, not an image generation prompt. Wtf are these niggas doing lmao
 
There is literally not one human in existence who has done more than dip a toe into open-source image generation and still uses a base model for anything, even if it's just straight text-to-image with no latent to work from or style LoRA added to it.

Trash article like 90% of AI articles but at least its authors are aware open source is even a thing.
 
When you feed the output directly back into the machine as fresh input? This is inevitable.

Same way that photocopying a photocopy again and again will eventually produce just a black smear.

This isn't something unique or insightful that needed to be scientifically proven, it's just that a lot of people still don't get that AI does not interpret what it is given. It's just taking the average, and over enough iterations? The average of everything it's been fed will become the dominant output.

It's just a limitation of the architecture, not proof the machines are being lazy.
This isn't even an issue with AI training on AI. Human art has generic styles too; there is a reason art tends to have eras, as humans copy humans. Go on a booru site and you will see plenty of generic anime, go on ArtStation and you will see plenty of generic digital art, same with photography and so on. If you want the AI to get creative, you need to get creative with your prompt and better describe what kind of style you want, or use LoRAs. Give a generic prompt, get generic results. Personally I think it is a feature because it gives me some consistency when I am prompt crafting: once I get the subject and scene right, then I work on the style.

That’s a creative writing prompt, not an image generation prompt. Wtf are these niggas doing lmao
This is what you get when you hate something and don't understand it at all.
 