AI-generated images

Posted on 2022-September-01 in fun

In case you haven’t been following, the summer of 2022 brought a wave of AI tools that generate images from a text prompt. It works this way: describe what you want to see, press a button, see what you get.

Here are some prompts and the images that were generated from them, taken from the Dall-E2 examples:

An armchair in the shape of an avocado.

An expressive oil painting of a basketball player dunking, depicted as an explosion of a nebula.

A photo of a white fur monster standing in a purple room.

How does it work? Feed an AI with billions of images and let it compress them all down to a much smaller set of coefficients that encode all the inputs. When I say “feed”, it takes a bit more than that: if you give it Van Gogh paintings, you also tag each image as ‘Van Gogh’ so it can attach a name to a style, and you provide a rather complete description of what is in each image, and where. That’s the hard part, and it has already been done by several companies out there.

A few years back those tools gave us Prisma, a mobile app that transforms your photos into paintings by copying an artist’s style. That was very neat and fun. Here are some examples:

Go one step further: instead of essentially applying a filter to a photograph, tell the AI what you want to see and let it randomly pick from its own knowledge base to generate something completely new. The results you will get are entirely dependent on the kind of images that were used to train the AI. From what I observed, Dall-E has no trouble generating photo-realistic pictures that are hard to tell apart from stock photos. Midjourney must have been trained with a lot of paintings and gothic art, because most of what I got were museum-quality paintings.

And of course there’s the matter of the prompt you use to generate the image. Finding the right way to express what you want can be very hard. The AI engine does not handle ambiguity very well and sometimes completely misunderstands the request.

The prompt was:

Cartoon drawing of a cute monster hidden under a child's bed, with the child awake sitting on the bed

If you haven’t tried them yet, I would encourage you to go and have fun with those tools. It doesn’t require any software knowledge: just be creative with what you want to see and let the AI do all the legwork! So what is available today?

OpenAI has a beta program for Dall-E, now accessible to all. You get a small number of free credits to start with, then you pay to get more, which is fair.

Midjourney receives prompts through a Discord server. Invites are free, and it is the same story as Dall-E: get free credits to start, then buy a subscription to play more.

Another option is to download the whole software and data and run it yourself on your PC. Stable Diffusion lets you do just that, and it works fine provided you have a PC running Linux with a beefy Nvidia graphics card with 6GB or more of video memory. That’s expensive hardware and spicy electricity bills, so all in all a subscription to a cloud-based AI might well be a lot cheaper.
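For the curious, here is a minimal sketch of what running it locally can look like through the Hugging Face `diffusers` Python library. It is only one possible setup, under assumptions: `torch` and `diffusers` are installed, a CUDA-capable Nvidia card is available, and the `CompVis/stable-diffusion-v1-4` weights can be downloaded (which may require accepting the model license on the Hugging Face Hub). The function name `generate` is made up for this example.

```python
def generate(prompt, out_path="out.png",
             model_id="CompVis/stable-diffusion-v1-4"):
    """Generate one image from a text prompt and save it as a PNG.

    Illustrative sketch only: assumes torch, diffusers and a CUDA GPU.
    """
    # Imported here so this file can still be loaded on a machine
    # that does not have the GPU libraries installed.
    import torch
    from diffusers import StableDiffusionPipeline

    # Loading the weights in half precision reduces the video memory needed.
    pipe = StableDiffusionPipeline.from_pretrained(
        model_id, torch_dtype=torch.float16
    )
    pipe = pipe.to("cuda")  # assumption: an Nvidia GPU with CUDA drivers

    image = pipe(prompt).images[0]  # the pipeline returns PIL images
    image.save(out_path)
    return out_path
```

Calling `generate("A photo of a white fur monster standing in a purple room")` would download several gigabytes of weights on the first run, then write the result to `out.png`.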

Here are some images I generated using Stable Diffusion with the prompt:

A digital illustration of Steampunk Paris in 1900

First question: who created those images? Is it me, the human who came up with the idea in the first place? Or is it the Stable Diffusion engineers, who took the pains to process billions of tagged images, while I merely used their results? Or is it all the common knowledge contained in the billions of images fed to a fancy data compression algorithm?

Let’s take an example: here’s what came out when I asked for some cute monsters.

Good results, right? If I had contracted out a human illustrator to do the same, I am pretty sure we would agree that all the creativity came from whoever did the drawing and the customer merely expressed an intent.

Getting quality pictures requires fairly elaborate prompts that are rooted in graphic design know-how. If you want to get an idea, check out some of the sites dedicated to explaining what you can do with elaborate prompts.
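To give a rough idea of what an elaborate prompt is made of, here is a small Python sketch that assembles one from a subject, a medium, a style reference and a few modifiers. The helper `build_prompt` and its parameters are invented for this illustration and are not part of any of the tools mentioned here; the generators only ever see the final string.

```python
def build_prompt(subject, medium=None, style=None, modifiers=()):
    """Assemble a text prompt from designer-style building blocks.

    Hypothetical helper, for illustration only.
    """
    # The medium goes first so the prompt reads naturally,
    # e.g. "A digital illustration of ...".
    head = f"{medium} of {subject}" if medium else subject
    parts = [head]
    if style:
        parts.append(f"in the style of {style}")
    # Extra modifiers (lighting, mood, level of detail) are appended as-is,
    # skipping any that are empty.
    parts.extend(m.strip() for m in modifiers if m.strip())
    return ", ".join(parts)


prompt = build_prompt(
    subject="Steampunk Paris in 1900",
    medium="A digital illustration",
    modifiers=["highly detailed", "warm evening light"],
)
print(prompt)
```

The modifiers shown are only examples; which words actually improve the result depends on the tool and is mostly found by trial and error.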

Hard to tell who the real artist is, right?

A friend of mine pointed out that the same questions came up when photography started. I think everybody agrees that photography is an art created by humans despite the fact that the artist didn’t create the reality they are capturing. Yes, reality exists independently from photographers, but seeing it through their opto-mechanical eyes changes everything. Think about portraits by Annie Leibovitz or Harold Feinstein if that isn’t obvious.

I don’t have the answer to that question. Right now I get the feeling that there is a consensus that generated pictures are the work of whoever came up with the prompt, but time will tell.

If you want to find inspiration for prompts or just have some fun, check out the subreddits about AI-generated images. The Dall-E2 subreddit is just fantastic.

Here is a recent one: Mona Lisa attending a techno party, having the time of her life.

Mona Lisa having the time of her life in a techno festival

That one had me in tears! At first the face looks familiar but you just don’t know where from, and then the prompt hits: of course that’s Mona Lisa! But it’s not a painting, it’s a photo, so probably something recent, and she’s probably old by now (about 500 years old). Look at her again: she’s way past her youth but she still has that haircut and tattoo, and that dress, topped with jewellery she must have picked for the festival. She is way too old for techno festivals but she’s having the time of her life! Her smile is unequivocally sincere this time, and she’s tilting her head in an attempt to be charming, just like when she was 15 and seduced Leonardo to the point he decided to make her immortal. A picture is worth a thousand words.

Of course I am making up that complete backstory for that picture, but that’s the whole point behind art, right? It makes us think, imagine, feel, invent and be creative ourselves.

Here is another touching one: two girls pranking a boy on the phone in 1968. The tone is right, the picture is messed up as expected, the characters are perfectly composed.

Girls pranking a boy on the phone in 1968, by Dall-E2

I will finish with that one:

Daughter looking at her dad who just made a dad joke, by Dall-E2

Girl looking at her dad who just made a dad joke

There are some fantastic guides on how to generate the best images with the various services out there. I just discovered Making AI Art with Midjourney.

The look in her eyes is priceless. I can see a teenager feeling desperate about her dad’s poor jokes. There is both tenderness and regret in those eyes. I can’t imagine how a photographer would manage to capture that except by sheer luck.

If anyone had asked me just a few months back which jobs would never be replaced by an AI, art would have been my first choice. There is still a very long way to go until we replace graphic designers, photographers, illustrators, and design artists in general, but there is no question that those tools will change the game in very radical ways.

Right now I am having fun re-creating complete worlds in the style of Miyazaki so stay tuned :-)