Every Rose Has Its Thorns, Including AI. Is the Quality of Chatbots Threatened?

Published 7. 12. 2023

The popularity of AI is expected to keep rising over time. In the future, we may no longer be able to distinguish whether a given output was generated by AI or created by a human being.


Nowadays, we use AI in many areas of our lives. And already, we struggle to identify what was made by AI and what by humans. Often, such content is created by early, rudimentary versions of AI that are deeply rooted in many technologies we use daily.

It is bittersweet that generative AI has been a sensation for so many. Such excitement suggests that some of us are still not ready for the grand arrival of artificial intelligence.

What terrifies many even more is the fact that AI is not ready for itself. Some experts warn that the supply of natural data is finite.

Natural data plays a key role in the economics of artificial intelligence. It is vital for AI models to function properly and to produce good-quality content. The more natural (i.e. human-made) data AI models train on, the more useful they become.

Unfortunately, the amount of natural data is limited.




Rita Matulionyte, who teaches IT law at Macquarie University, writes in her essay for The Conversation: “AI researchers have been sounding the dwindling-data-supply alarm bells for nearly a year. One study last year by researchers at the AI forecasting organization Epoch AI estimated that AI companies could run out of high-quality textual training data as soon as 2026, while low-quality text and image data wells could run dry anytime between 2030 and 2060.”

Her article is available here.

We have the option to use synthetic, AI-generated data instead. But such a solution may not be viable. Why? There is a possibility that synthetic data might degrade AI models completely. Research shows that models trained on AI-generated content suffer an effect analogous to inbreeding: just as inbreeding increases genetic disorders, each generation of models accumulates more defects.




Due to the current omnipresence of AI, more and more synthetic content is being produced. Paradoxically, this synthetic content may be the biggest threat to generative AI. In other words, by consuming its own data, AI can become dumb.

I first came across this issue in February of this year, in a comment by data researcher Jathan Sadowski of Monash University, who described “a system that is so heavily trained on the outputs of other generative AIs that it becomes an inbred mutant, likely with exaggerated, grotesque features.”

Sina Alemohammad and Josue Casco-Rodriguez, machine learning researchers and Ph.D. students in Rice University’s Electrical and Computer Engineering department, have dug into this issue quite thoroughly, too. In collaboration with their supervisor, Richard G. Baraniuk, and researchers at Stanford, they wrote an article titled Self-Consuming Generative Models Go MAD (not yet peer-reviewed). MAD is an abbreviation for Model Autophagy Disorder.

You’ll find the interview with them here.




Baraniuk explains: “Say there are companies that, for whatever reason - maybe it’s cheaper to use synthetic data, or they just don’t have enough real data - and they just throw caution to the wind. They say, ‘we’re going to use synthetic data.’”

He continues: “What they don’t realize is that if they do this generation after generation, one thing that’s going to happen is the artifacts are going to be amplified. Your synthetic data is going to start to drift away from reality. That’s the thing that’s really the most dangerous, and you might not even realize it’s happening.”

Baraniuk’s concerns about the use of synthetic data are well-founded. He explains: “And by drift away from reality, I mean you’re generating images that are going to become increasingly, like, monotonous and dull. The same thing will happen for text as well if you do this — the diversity of the generated images is going to steadily go down. In one experiment that we ran, instead of artifacts getting amplified, the pictures all converge into basically the same person. It’s totally freaky.”
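The diversity collapse Baraniuk describes can be illustrated with a toy experiment of my own (this is a simplified sketch, not the method from the MAD paper): repeatedly fit a one-dimensional Gaussian "model" to samples drawn from the previous generation's fit, then sample from the new fit. The function name `autophagy_loop` and its parameters are hypothetical choices for this illustration. Over many generations, the standard deviation, a crude stand-in for the diversity of generated content, steadily collapses toward zero.

```python
import random
import statistics

def autophagy_loop(generations=1000, n_samples=100, seed=0):
    """Toy 'self-consuming' training loop.

    Each generation draws samples from the current Gaussian model,
    refits the model to those samples, and repeats. Because each fit
    is based on a finite sample of the model's own output, diversity
    (the standard deviation) drifts downward and eventually collapses.
    """
    rng = random.Random(seed)
    mu, sigma = 0.0, 1.0  # generation 0: the "real data" distribution
    history = [sigma]
    for _ in range(generations):
        # Generate synthetic data from the current model.
        data = [rng.gauss(mu, sigma) for _ in range(n_samples)]
        # Refit the model to its own output (maximum-likelihood fit).
        mu = statistics.fmean(data)
        sigma = statistics.pstdev(data, mu)
        history.append(sigma)
    return history

hist = autophagy_loop()
print(f"diversity (std) at generation 0:    {hist[0]:.4f}")
print(f"diversity (std) at generation 1000: {hist[-1]:.6f}")
```

The mechanism is the same one the article warns about: nothing re-anchors the model to real data, so small sampling errors compound generation after generation, and the outputs converge toward "basically the same" sample.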