Clarification Note: All images in the post were crafted in cooperation with Midjourney AI, which followed my prompts.

Quick Summary for Those Who Prefer a Short Read:

2024. Things are going wrong, and we’re unsure how to fix them. ‘Thinking outside the box’ sounds good, but it usually just creates another box. Using intuition and non-logical thinking might help, but it’s hard to convey the results to others because words often fall short. We’re at a confusing crossroads, and here’s my visual take on this less-than-optimistic observation.

'Suppose you succeed in breaking the wall with your head.

And what, then, will you do in the next cell?'

Stanislaw Jerzy Lec

2024. Everything is going in the wrong direction and nobody knows what to do with it.

We need leaders who can think outside the box, just like the heroes in all those great books and movies. ‘Thinking outside the box’ sounds like a great idea. It should save the world, right?

But the more I thought about it, the more I realized that stepping outside one box just puts us into another box. Here's why:

If we define ‘thinking’ as the mental process that involves language and logic, then those things create a sort of ‘box’ for our thoughts. So, ‘thinking outside the box’ means going beyond the usual ideas we know, but it’s still just thinking within another set of ideas, or another box.

What about ‘thinking outside A box’? That means thinking beyond any set of established ideas, not just swapping one set for another. But here's the tricky part: as soon as we start thinking, we create structures or ‘boxes’ to organize our thoughts. So, the idea of thinking completely outside any box is kind of impossible because even trying to do that makes a new box. It leads nowhere.

Well, there is a way out of the ‘thinking box’ trap that’s not very popular in our science-focused world: tapping into intuition and other non-logical ways of knowing. While we usually think of logical thinking as the best kind, this overlooks other ways of understanding the world. Intuition, for example, lets us process information quickly and often subconsciously, leading to insights that logic might miss. Non-verbal thinking, like visual or spatial reasoning, also plays a big part in solving problems and being creative.

But here's the catch: communicating these intuitive and non-verbal insights is tough. They don’t follow the clear, structured paths of logical reasoning. When we try to explain these insights, we have to put them into words, which can be limiting and might not capture the full essence of our understanding.

So, we have a paradox: while intuition and non-verbal thinking are essential for a deeper understanding of the world, sharing these insights effectively requires words. And words can only roughly translate the original insight, losing some of the nuance along the way. This means that while we can personally appreciate these ways of knowing, sharing them with others is often a complex and imperfect task.

Although blending all these ways of thinking might happen in the future if our civilization survives (kind of like what Hermann Hesse imagined in his book [Ch.4]), it’s not something we can use right now. Today, we’re at a crossroads without a clear view of the path ahead.

These thoughts have been bugging me for a while. After almost a year of navigating through various attempts with mixed results, here’s my visual take on this less-than-optimistic observation.

Glimpses into AI Image Creation:

Crafting this image took me longer than any other so far: almost a year of on-and-off attempts. It's a good example of how the real process of AI image creation works. Contrary to the common misconception that one just needs to write a verbal prompt, click a button, and voilà, a great result appears, there is much more to it.

It is true that one can submit a prompt of a few words, many words, or even a blank prompt to the AI and receive a visual response within minutes. The response could be a great image or garbage, but it is almost never what the prompter had in mind. To get a more predictable result, a prompter has to do a lot of tweaking and adjusting of the prompt's text and parameters, sometimes through hundreds of rounds of trial and error with no guarantee of success. Depending on the prompter's persistence and patience, the process can take anywhere from hours to weeks or even months. Eventually, the prompter simply stops out of exhaustion and accepts whatever result has been achieved.

One unforeseen aspect of this endeavor is how long the process takes and how the model evolves along the way. AI-assisted image generation is a rapidly developing field. For example, I use the Midjourney AI model, and during the year it took to create this image, I started with version 4 and successively switched to versions 5, 5.1, 5.2, and 6. Each version had significantly improved abilities, which forced me to change directions and ideas in my attempts. Of course, this would not have happened if I had completed the image in days or weeks.

Another important aspect of the current state of AI image generation is that, despite explosive growth, the field is still in its infancy (Midjourney is about 2 years old). It can generate great images, but the more complex the image, the less perfect and predictable the result. Therefore, to create a complex image like the one shown above, one has to break the final idea into parts, create them separately, and gradually blend them together. This is an extremely tedious process that requires plenty of trial and error in its own right.

To illustrate the process, here is an image that contains a tiny fraction of the hundreds of attempts I made over the year:

The first two rows show examples of images generated by the AI before any selection and tweaking. Eventually, I got an image I considered the seed for the final result (the second from the left in the bottom row) and started to work around it. I created additional parts of the final image separately (each requiring its own rounds of trial and error), blended them together, and finally did all tonal and color corrections and resizing in Photoshop.

So, the question, 'Who is the creator of an AI-generated image — the prompter, the AI, the artists whose works were loaded into the AI database, or all of them?' does not have a simple answer. You be the judge. I am only positive that the image is unique, and nobody, including myself, can reproduce it even by thoroughly repeating each step of its creation. Welcome to the brave new world!

mark berman

mjb and mj...ch6. 2024...box that is always with you
