After a year or so mucking around on Midjourney I’ve only recently begun using the --sref and --cref functions. What are these, you ask? Well, they are offshoots of the basic reference pic users can paste into their /imagine prompts. Midjourney calls them Imagine URLs. As the user’s manual says, “Imagine URLs can be added to a prompt to influence the style and content of the finished result.” You simply find a pic online that you like, copy the URL, and paste it into the prompt before the main text.
This prompt was used to generate the pic of the white-hatted fiddler at the start of this post. If you have a stash of pics online, say through your own site or an image cache such as Pinterest, so much the better.
After this, Midjourney does its magic by compressing and saving the image from the URL. I’m not sure how long it’s stored there, but it’s at least a few days: I’ve revisited the same prompt a week or so later and it still worked. The Imagine URL constrains the generated image by passing its style or subject matter along to it, but in a random way. Random: that’s a word to remember.
The --sref and --cref parameters let the user further tune these linked reference pics, and make the result less of a crapshoot, by copying the pic’s style (--sref) or character (--cref). Broadly, --sref is best for inanimate images, --cref for ones with a live subject. But they can also be used outside this context.
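To make the three ways of using one reference pic concrete, here is a minimal Python sketch that only assembles the prompt text you would paste after /imagine. It assumes the usual layout: a plain reference URL goes before the main text, while --sref and --cref go at the end, each followed by the URL. The helper name `build_prompt` and the example.com URL are hypothetical placeholders of mine, not the prompts actually used for this post.

```python
def build_prompt(text, ref_url, mode="image"):
    """Assemble prompt text that uses ref_url in one of three ways.

    mode="image": plain reference pic, URL placed before the main text
    mode="sref":  style reference, URL passed to --sref at the end
    mode="cref":  character reference, URL passed to --cref at the end
    """
    if mode == "image":
        return f"{ref_url} {text}"
    if mode == "sref":
        return f"{text} --sref {ref_url}"
    if mode == "cref":
        return f"{text} --cref {ref_url}"
    raise ValueError(f"unknown mode: {mode}")


# Hypothetical placeholder; swap in the URL of your own reference pic.
REF_URL = "https://example.com/reference.jpg"
subject = "Japanese bento box, winter foods, photoshoot for Gourmet magazine"

# Print the same subject three ways: plain reference, --sref, --cref.
for mode in ("image", "sref", "cref"):
    print(build_prompt(subject, REF_URL, mode))
```

Each printed line is one prompt. Running the same subject through all three makes it easy to compare what each kind of reference does to the result.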
For this post, I’m going to take the image on the left, one of my favorite pieces of Michael Whelan artwork, and apply it to five different subjects in various ways. (The illustration is from a recent edition of the epic SF novel The Snow Queen, by Joan Vinge.)
The subjects are:
A war elephant with many tusks, two horns, and armor, like the mumakil from The Lord of the Rings, fantasy illustration by Michael Whelan.
Modern, cozy reading nook with a pile of books in a Scandinavian country house, higher level than the floor below, a few steps up. Through a window is a field of snow. Photorealistic architecture magazine style shoot.
American soldier arriving home and being greeted by his dog, journalism photo.
Ruth Bader Ginsburg, cartoon portrait, cool blues, mint green.
Japanese bento box, winter foods, photoshoot for Gourmet magazine.
In each series of images, the regular reference is on the left, the --sref in the middle, and the --cref on the right. Let’s see what happens.
All of the above are very… odd, to say the least, and overdetailed. But the original pic was too. I’d say the first two satisfied the prompt, but the last one is just… whoopsville. A Snow Queen does not a good war elephant make, especially as the only part of her humanity left is a peek of her bare midriff and ornamented panties.
We’re on a better footing here with a scene as the subject matter. The first pic really knocks it out of the park, IMO. I can’t call it photorealistic, as it’s more of an illustration in the same style as the original. But the combo of mask, furry throw (?), books, landscape, and insect-ornamented wooden staircase leading up to a night sky adds up to something meta. Like a commentary on the many published versions of the Snow Queen tale and how universal it is.
The center copies the original painting’s style in its colors, forms, and shapes but the end result is nothing special.
The last is photorealistic and would be dull, save for that out-of-place costumed mannequin bust. Here it’s clear what happened in the inner workings of the AI: the human head of the Snow Queen was transformed into a sculpture in order for the prompt to “make sense.”
Here I used a specific style and a specific subject matter and action. Of the images generated, I used the least ridiculous. The first copies the original’s palette and would have been acceptably realistic save for the plumage on the soldier’s helmet — the Snow Queen element reduced to just feathers.
In the second, it’s only the palette that was copied, but I like it. Notice how the white, feathery headdress has its echoes in the white fur of the dog in this one and the previous one.
The third incorporates the look of a B&W newspaper photo but it’s a costumed woman, not a soldier, embracing the white-furred retriever dog. The gold of the hawk’s beak and the pendants of the original have been transformed into a metallic visor on the woman, an interesting touch. I kinda like it — it’s both futuristic AND touching.
Next we’ll move on to a specific person: Ruth Bader Ginsburg.
The first is an acceptable portrait in fantasy style — it’s clearly her, wearing a feathered collar and jeweled skullcap. But the second moves further away from reality with the overlong neck and caricatured features, not to mention that amoeba appendage at the back of her head. If I didn’t tell you it was Ruth Bader Ginsburg, you probably wouldn’t have guessed.
The third is clearly not her. It might be the woman of the original pic, in a different costume and headdress.
Using the prompts on a bento box gave me the most varied results. The first came out as surrealist art, a common result for Midjourney, with a mini bird mask, whole raw fish, and split-handled spoon. The second likely could exist in real life, with some styling. It’s half English teatime, half Japanese. The third incorporates the Snow Queen character as another food in the box next to the quail eggs and enoki mushrooms, but it’s unclear what she’s made of. It looks like two mussel shells are sitting in her hair, but that’s all I’ve got. So, number 2 wins the prize in this case.
In later posts, I’ll be looking more at the intricacies of using --sref and --cref.