Artificial Intelligence

Outside the Box: What Happens When ChatGPT Holds the Pepperoni?

In “Outside the Box,” we interrogate ChatGPT to better understand how AI “reasons.” It’s like a conversation with an intelligent friend, sharing ideas and challenging some of the explanations. This week, Roberta Campani realized that ChatGPT has not learned the word “flat” or any of its variations. Or has it?
By Roberta Campani

Vintage tin toy robot © josefkubes / shutterstock.com

May 13, 2024 06:34 EDT

Working on a new project, I’ve gotten into the habit of asking our intelligent friend ChatGPT for a few prompts or a collection of ideas. He’s great at lining things up. Today, I needed a logo for the project. Who better than ChatGPT to put me on the right track? My colleague on the project suggested, as the subject, a meditating person surrounded by an energy field in the shape of a toroid. I made an attempt with AI, a journey I’ll document in the paragraphs below. In the end, however, I’ll just have to ask a human artist.

I began with this prompt: “Draw an image of a toroid shape lying flat above ground and a meditator sitting in its center.” After repeating the prompt many times over and experimenting with all sorts of variations in the phrasing, I ended up with the image below.

It’s not too bad, but definitely not what I was looking for. So I asked it one more time, modifying my instructions. 

Prompt: “This is an image you generated earlier. Please do the same, with the thick circle (toroid) positioned flat a bit above the ground, but move the meditator figure to its center, not in the shadow.” 

ChatGPT proposed this, including the explanation below the illustration:

“Here is the updated illustration with the toroid lying flat slightly above the ground and the meditator positioned in its center, not in the shadow.”

At this point, one could think that ChatGPT had a subconscious, or that it functioned like one of those funky programming languages, such as Prolog, where negation does not work the way it does in everyday speech. (Prolog’s “negation as failure” can only reject what it fails to prove; it cannot assert an absence.) So if I use the word “shadow,” there will be a shadow, even if I ask for “not in the shadow.” I’m thinking as I speak, or rather as I write.
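My prompts, of course, went to ChatGPT, which as far as I can tell offers no way to mark a word as forbidden. Some open image generators do: they accept a separate “negative prompt” listing what must not appear, which is far more reliable than negating a word inside the main prompt. Here is a minimal sketch, assuming Hugging Face’s diffusers library and a GPU; the model name and the prompt wording are my own illustrations, not anything from my exchange with ChatGPT:

```python
# A minimal sketch, assuming the Hugging Face diffusers library and a GPU.
# The negative_prompt argument is a separate channel for what must NOT
# appear, so negation never has to survive inside the main prompt.
import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")

image = pipe(
    prompt="a toroid lying flat above the ground, a meditator seated at its center",
    negative_prompt="shadow, upright ring, vertical torus",  # what to exclude
).images[0]
image.save("toroid_meditator.png")
```

Whether the model honors the exclusion is another matter, but at least the request itself is unambiguous.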

As we have noticed before, ChatGPT is very polite and consistently apologizes after every failed attempt. It is supernaturally patient as well. Every time I offered a new prompt, it would churn out a new, aesthetically pleasing image, announced with great enthusiasm. The chatbot’s trainers have done a great job of instilling in it a taste for emotional communication. My own patience equalled ChatGPT’s as I offered yet another prompt. It replied:

“Let’s try again to create the image as you described, with the toroid lying flat and the meditator in its center. I’ll make sure to adjust the positioning correctly this time.”

“Here’s the corrected image with the toroid lying completely flat on the ground and the meditator in the center. I hope this aligns with your vision this time!”

No, dear presumably intelligent friend, this toroid is not flat; it is upright. This doesn’t align with my vision. Not even close! I secretly thought that if it were a human, even a five-year-old child with age-appropriate language and drawing skills, I would get a better result.

So here’s the first thing I learned. If we interact with large language models (LLMs) and generative artificial intelligence applications as if they were human, it quickly becomes maddening. But it’s our fault. We come with the wrong expectations. These things are not human and, of course, they are not even thinking, at least not the way we do.

It got the toroid correctly positioned flat above the ground before, in the first image above. Why change it? That will be a question for another time. Apparently, every time I ask it to keep everything the same except one detail or object, ChatGPT changes everything. Its mode of operation appears to be linear, not as constructive as it would be if I were working and speaking with a graphic designer.
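Open diffusion models make the mechanism visible: every request redraws the whole scene from fresh random noise, so nothing anchors the previous composition. Pinning the random seed, something ChatGPT’s interface does not let me do, keeps the composition stable while you change one word. A sketch under the same assumptions as before; it reuses the hypothetical pipe object from the earlier snippet:

```python
# Pinning the random seed so that a small prompt edit changes one detail
# rather than redrawing everything. Reuses the hypothetical `pipe` above.
import torch

seed = 42  # any fixed integer
base = pipe(
    prompt="a flat toroid above the ground, a meditator at its center",
    generator=torch.Generator("cuda").manual_seed(seed),
).images[0]

# Same seed, one changed detail: the rest of the scene stays far closer
# to the original than a fresh, unseeded request would.
variant = pipe(
    prompt="a flat toroid above the ground, a meditator at its center, at dusk",
    generator=torch.Generator("cuda").manual_seed(seed),
).images[0]
```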

Thus I’ve come to realize that the LLM cannot understand, or even try to understand, what the word “flat” means, at least not in the way we humans understand it. We can take a word that functions both as a noun and as an adjective and, from the context, figure out which one it is in this specific situation. “Flat,” moreover, can have many different meanings. So why do we spontaneously have a better capacity than any LLM to interpret words that convey a position in space? It might well be because our brains, as explained by my colleague William Softky, are “incredible at reproducing 3D environments for our survival; this ability developed and evolved since the first worm, or the simplest moving animal.”

So I try a geometric description, imagining that I am giving instructions to a friend who uses something like AutoCAD for technical design: “Draw lines from the two farthest points in the circumference of the toroid, there’s an infinite number, those lines create a circular surface, that surface must be placed horizontally such as to be parallel to the horizon and the landscape. Then the meditator must be seated in the middle of this circle as if its circumference were the rim of some big huge bowl (except it has no bottom).”

ChatGPT: “Thank you for clarifying your vision further, emphasizing the horizontal positioning and the landscape context. Let’s adjust the prompt and create the image accordingly, with the toroid fully horizontal and parallel to the horizon, and the meditator centered as you described. I’ll make sure to capture these details correctly this time.”

When the LLM paraphrases my instructions, I always get the impression that it finally understands and that I will get closer to my goal. But behold…

“Here’s the newly generated image, designed to meet your specifications with the toroid completely horizontal and the meditator centered within it. I hope this version matches what you envisioned.”
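In hindsight, the geometry I kept failing to convey in words is trivial to state in code. Here is a minimal sketch of the shape I wanted, a torus whose axis is vertical so the ring lies flat, parallel to the ground, written with Python and matplotlib (my own illustration; the radii and the height are arbitrary):

```python
# A torus with a vertical axis: the ring lies flat, parallel to the ground,
# hovering at height h. This is the shape the prompts kept describing.
import numpy as np
import matplotlib.pyplot as plt

R, r, h = 3.0, 0.5, 1.0  # major radius, tube radius, height above the ground
theta, phi = np.meshgrid(
    np.linspace(0, 2 * np.pi, 40),   # around the tube
    np.linspace(0, 2 * np.pi, 80),   # around the ring
)
x = (R + r * np.cos(theta)) * np.cos(phi)
y = (R + r * np.cos(theta)) * np.sin(phi)
z = r * np.sin(theta) + h            # z varies only within the tube: flat

ax = plt.figure().add_subplot(projection="3d")
ax.plot_surface(x, y, z)
ax.set_zlim(0, 2 * R)                # keep the ground plane in view
plt.show()
```

The meditator, of course, would sit at the origin.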

Can you picture the expression on my face? Now I have to become snarky. Let’s have fun in the face of misery and try the description a “hangry” bloke might give: “Think of a pizza, now lay the pizza flat on the ground a huge pizza. Sit the meditator in the middle of the pizza, like a candle on a birthday cake. Then erase all the parts with tomato and pepperoni and leave only the crust of the pizza and the meditator sitting in its middle.”

ChatGPT: “Thanks for the analogy, that really clarifies your vision! I’ll create an image based on your description, with the toroid resembling the crust of a giant flat pizza, and the meditator positioned in the center like a candle on a cake. Here’s to getting it just right this time!”

“Here’s the newly generated image with the toroid depicted as the crust of a giant, flat pizza, and the meditator positioned right in the center. I hope this matches your description accurately this time!”

Let me not bore all our readers with another fifteen tries. As YouTuber Chris “The Brain” suggests in his vlog, the biggest problem with AI is that we will all become accustomed to the computational, combinatorial rearranging of ideas that already exist, ideas that appear statistically more often than lesser-known ones, because AI is not making up anything new. “The effectiveness of AI comes down to the intelligence that was put into it. At any point in time, AI is more a reflection of our own intelligence than the computer’s capacity to think.”

Should we learn to live without true creativity? Sabine Hossenfelder, a theoretical physicist and science popularizer, addresses another intriguing question in her vlog: “Is AI becoming less and less creative?” According to Sabine, there’s a risk that “the more people use AI to produce all sorts of content, the more AI will be fed data that they have produced themselves. What does this mean? Contrary to what one might think, the more AI eats its own produced content, the less divergent its production becomes.”
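Her point is easy to caricature in a few lines of code. Below is a toy simulation, entirely my own hypothetical setup: a “model” that keeps retraining on samples of its own output. Its spread, a crude stand-in for creative diversity, tends to drift toward zero, generation after generation:

```python
# Toy model collapse: refit a Gaussian to samples drawn from the previous
# fit. The spread (our stand-in for diversity) tends to shrink over
# generations; it will not decrease monotonically on every run.
import numpy as np

rng = np.random.default_rng(0)
mu, sigma = 0.0, 1.0                          # the original "human" data
for generation in range(1, 31):
    samples = rng.normal(mu, sigma, size=25)  # train on your own output
    mu, sigma = samples.mean(), samples.std()
    if generation % 5 == 0:
        print(f"generation {generation:2d}: spread = {sigma:.3f}")
```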

Is this the reason why I am always getting more of the same? 

If you have a point of view on this question, please follow Roberta’s example and share it with us by writing to dialogue@fairobserver.com. It can be a comment or a full article. We will share your point of view with the world.

The views expressed in this article are the author’s own and do not necessarily reflect Fair Observer’s editorial policy.
