Top 3 Photorealistic Stable Diffusion Models — How To Prompt The Best Images
The Experiment
Today, I ran an experiment on Stable Diffusion models. I’ve been digging into this subject recently, looking at factors such as file size, file format (.ckpt or .safetensors), and how well each model can be optimized. I also wanted to find out which models produced the best results for my specific project goals.
The Process
Having worked as a fashion photographer and cinematographer in the past, I always prioritize understanding what the camera is doing in each scene. That means thinking about its settings and how they relate to the shot, both of which are critical to me. When working with Stable Diffusion, I bring the same approach and develop the visual story first, before writing any series of prompts.
In my latest series, Warzone, I wanted to move beyond simply having the model assume a sexy pose in a military outfit next to a tank or holding a weapon. Instead, I aimed to convey the experience of being caught in a WW2 city or village bombing, capturing the intensity and emotions of the situation. To achieve this, I carefully crafted the visuals to tell a compelling story that showcases the character’s experience amidst the chaos.
So I ask myself these five questions:
1. What were they doing when the bombing occurred?
2. Where were they located in relation to the bombing site?
3. What attire were they wearing at the time of the bombing, if any?
4. How did the bombing impact them physically? Are there any visible injuries, rubble, or dirt on them or in their hair? Did it result in any harm or damage to them or their clothing?
5. Most importantly, how are they engaging with the camera in response to the bombing?
After considering these five questions and outlining the visual storyline, I divide it into three sections. To illustrate, I will use examples from my Warzone project.
The Character’s Story
What is the character doing
A young French woman is standing barefoot in the rubble-strewn street of a bombed-out city during World War II. She is barely visible in the dimly lit, dark street. She is wearing normal attire for a French woman in 1945. A leather satchel hangs on her shoulder. She is seen assuming multiple poses and positions. The attractive 24-year-old woman is assuming various looks in contrast to the background. Her short hair in a messy bun and her skin shiny and sweaty.
The Background
What characteristics are detailed in the world around the character?
In the background, destroyed cars, enemy tanks, and crashed planes. Creating a stark contrast against the dark, hazy smoke filled sky, smoky surroundings. Smoke stains and ash can be seen on the buildings around the impact zone. Burning buildings, shops, and cars are strewn about the field. Puddles of water can be seen on the muddy, rubble strewn streets.
NOTE: This is also where I would describe any other characters seen in the background. In this example there are none. We assume that she is alone and everyone else is dead.
The Camera
What is the camera doing that really finishes this scene?
The image is ultra-detailed, with a hyper-realistic quality that captures the intricacies of every texture, from the grime and dirt on the woman’s skin to the smoldering embers of the destroyed buildings behind and around her. This masterpiece of photography is a true work of art, rendered in full color and ultra-high resolutions of 32k, 16k, and 8k, with cinematic lighting, DLSS, Ray Tracing.
Capture the WWII scene with a 35mm HASSELBLAD 1600F NR. 0001 film camera, use an f/5.6 lens. Set the aperture to f/5.6 to achieve a sharp image. In order to avoid overexposure, set the ISO to 100, which will reduce the amount of light entering the lens.
I always like to finish the scene description by telling the AI whose work I think the scene should be modeled after. So I’ll say something like:
And photography in the style of Greg Rutkowski and Stanley Artgerm Lau.
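The three-section outline above (character, background, camera, plus the closing style line) can be kept as separate fields and joined only when it is time to generate. The following is a minimal sketch of that idea, not the author's actual tooling; the `SceneDescription` class and its field names are hypothetical, and the sample text is abbreviated from the Warzone example.

```python
from dataclasses import dataclass


@dataclass
class SceneDescription:
    """Hypothetical container for the three-part scene outline."""
    character: str
    background: str
    camera: str
    style: str = ""  # optional closing "in the style of ..." line

    def to_prompt(self) -> str:
        # Join the non-empty sections in a fixed order and
        # collapse any stray whitespace inside each section.
        parts = [self.character, self.background, self.camera, self.style]
        cleaned = [" ".join(p.split()) for p in parts if p.strip()]
        return " ".join(cleaned)


scene = SceneDescription(
    character=(
        "A young French woman is standing barefoot in the rubble-strewn "
        "street of a bombed-out city during World War II."
    ),
    background="In the background, destroyed cars, enemy tanks, and crashed planes.",
    camera="The image is ultra-detailed, with cinematic lighting.",
    style="Photography in the style of Greg Rutkowski and Stanley Artgerm Lau.",
)

print(scene.to_prompt())
```

Keeping the sections apart makes it easy to swap only the background or only the camera text between images in a series while the character stays the same.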
Positive Prompts
As many of you well know, the key to high-quality outputs with Stable Diffusion is giving lots of detail. The most direct way to do this is with your positive prompts. For this series, these are many of the prompts that I used:
Zoomed out, 35mm film, full color, night-time, Full Body midnight Photography showing legs and feet, beautiful feet, skinny French woman, beautiful, ultra detailed, ultra high resolution, hyper realistic photography, 32k resolution, 16k resolution, 8k resolution, 24 year old female, super real life night photo, World War Two, WWII, World War II, war, detailed, bombed city, fire, smoke, embers, Hyper realistic photography, super resolution, super real life photo, ultra detailed, ultra high resolution, hyper realistic night photography, normal, highly detailed, intricate, sharp focus, midnight lighting, attractive, high quality, masterpiece, award winning, different poses, different positions, different looks, multiple backgrounds, natural makeup, female, zoomed out full body photography, Short hair, bruised, cute, woman next door, uncensored, 1945, 1945_photo, 1945_period, 1945_era, vintage-photo, dark, underground, dirt, grime, rubble, war, destruction, uncensored, looking away, sweaty, shiny skin, facing the camera, side view,
By including as much detail as I can in the prompts, I can ensure that every theme of this series I produce will have a similar nature, look and feel.
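Long tag lists like the one above tend to pick up repeated entries (for example, "ultra detailed" and "uncensored" each appear more than once). The repetition may be deliberate, and the article does not say either way, so the sketch below only reports repeats and offers an optional de-duplicated version. The helper names are hypothetical and the sample string is a shortened stand-in for the full prompt.

```python
from collections import Counter


def split_tags(prompt: str) -> list[str]:
    """Split a comma-separated prompt into trimmed, non-empty tags."""
    return [tag.strip() for tag in prompt.split(",") if tag.strip()]


def repeated_tags(prompt: str) -> dict[str, int]:
    """Return tags that occur more than once, compared case-insensitively."""
    counts = Counter(tag.lower() for tag in split_tags(prompt))
    return {tag: n for tag, n in counts.items() if n > 1}


def dedupe_tags(prompt: str) -> str:
    """Keep the first occurrence of each tag, preserving the original order."""
    seen = set()
    kept = []
    for tag in split_tags(prompt):
        key = tag.lower()
        if key not in seen:
            seen.add(key)
            kept.append(tag)
    return ", ".join(kept)


# Shortened sample, with a trailing comma like the original prompt.
sample = "ultra detailed, war, smoke, Ultra detailed, uncensored, war, uncensored,"

print(repeated_tags(sample))
print(dedupe_tags(sample))
```

Running the report before each batch is a cheap way to confirm that every image in a series really shares the same tag set.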
Negative Prompts
The negative prompts require a more nuanced approach. While creating negative prompts, I aim to explicitly mention all the elements that I don’t want to see in the image.
However, unlike the Positive Prompts section, I prefer to use less detailed prompts in the Negative Prompts section. In my experience, it’s better to use broader themes rather than specific details. For instance, some of the negative prompts I may use could be:
sun, sunlight, daylight, day-time, sun-ray, light, day-light, day-time, day, sepia, black-and-white, black-&-white, white-and-black, gray-scale, stars, moon, starry, large-breasts, big-breasts, huge-breasts, massive-breasts, giant-breasts, gradient, noise, poorly Rendered face, poorly drawn face, poor facial details, poorly rendered hands, low resolution, head cropped, frames, frame, framed, Images cut out at the top, left, right, bottom, bad composition, mutated body parts, blurry image, disfigured, over saturated, username, watermark, signature, text, error, cropped, worst quality, low quality, jpeg artifacts, ugly, duplicate, unattractive, morbid, mutated hands, amateur, cursed, dull, boring, weird, beginner
When I use thematic negative prompts instead of specific ones, I broaden the range of things the AI avoids in the generated image. Finding the right level of generality can be tricky, but I’ve noticed that the less specific my negative prompts are, the higher the quality of the final image. This approach helps minimize artifacts, malformations, and aberrations in the image.
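One way to keep negative prompts thematic is to group them under a few broad headings and switch whole themes on or off. The sketch below is an illustration under stated assumptions: the theme names are my own grouping of tags taken from the list above, and the settings dictionary uses field names modeled on the AUTOMATIC1111 web UI txt2img API (`prompt`, `negative_prompt`, `seed`, `steps`, `cfg_scale`, `width`, `height`). The article does not say which frontend was used, and the numeric values are placeholders, not the author's settings.

```python
# Hypothetical grouping; the tags themselves come from the article's list.
NEGATIVE_THEMES = {
    "daylight": ["sun", "sunlight", "daylight", "day-time"],
    "monochrome": ["sepia", "black-and-white", "gray-scale"],
    "defects": ["low resolution", "blurry image", "watermark", "jpeg artifacts"],
}


def build_negative_prompt(themes: dict, enabled: list) -> str:
    """Flatten the enabled themes into one comma-separated negative prompt."""
    tags = []
    for name in enabled:
        tags.extend(themes[name])
    return ", ".join(tags)


def build_settings(positive: str, negative: str, seed: int = -1) -> dict:
    """Assemble generation settings as a plain dict; nothing is sent anywhere."""
    return {
        "prompt": positive,
        "negative_prompt": negative,
        "seed": seed,      # -1 is assumed to mean "random" in the target tool
        "steps": 30,       # placeholder value, not from the article
        "cfg_scale": 7.0,  # placeholder value, not from the article
        "width": 512,      # placeholder value, not from the article
        "height": 768,     # placeholder value, not from the article
    }


negative = build_negative_prompt(NEGATIVE_THEMES, ["daylight", "defects"])
settings = build_settings("night-time, bombed city, fire, smoke", negative, seed=1945)

print(settings["negative_prompt"])
```

Because each theme is a single switch, it is easy to test whether dropping a whole theme changes the output, which fits the broad-over-specific approach described above.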
Ok, the brass tacks!
After going through this entire process, I tested this method on roughly 100 different SD models. The results varied widely, but the few models that I felt had the absolute best outcomes were these:
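Comparing many models is easier to judge when every model receives the same prompts and the same seeds. The article does not describe how the test runs were organized, so the following is only a sketch of one possible setup: the function names, seed values, and output naming scheme are hypothetical, and only the model names come from this article.

```python
from itertools import product


def build_comparison_jobs(models, seeds, prompt, negative):
    """Create one job per (model, seed) pair with identical prompts."""
    return [
        {"model": model, "seed": seed, "prompt": prompt, "negative_prompt": negative}
        for model, seed in product(models, seeds)
    ]


def output_name(job: dict) -> str:
    """Build a file name that records which model and seed produced the image."""
    slug = job["model"].lower().replace(" ", "_").replace(".", "")
    return f"{slug}_seed{job['seed']}.png"


models = ["Deliberate V2", "Realistic Vision V2.0", "Protogen x3.4"]
seeds = [1, 2]  # hypothetical seeds; any fixed values would do

jobs = build_comparison_jobs(models, seeds, "bombed city, fire, smoke", "daylight, sepia")

for job in jobs:
    print(output_name(job))
```

With the model and seed encoded in each file name, images can be laid side by side and differences attributed to the model rather than to a change in prompt or seed.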
Deliberate V2
My favorite model is Deliberate V2, which I use as my daily workhorse. I’ve created over 7,000 images with it. While it may not be the most photorealistic option available, I find that it excels at capturing finer details and produces consistent image styles. It is also exceptional at generating manga art styles.
Pros: One of the strengths of Deliberate is its ability to effectively capture intricate details, lighting, reflections, and environmental characteristics in digital art.
Cons: A major drawback of Deliberate is that it requires a considerable amount of iteration. The model also struggles with hands and tends to add extra body parts. This issue is not unique to Deliberate; it is a consistent trend with most SD models, although it seems more noticeable here.
All in all, though, I still find it to be the best one I’ve used, even if it does require a little extra “finessing”.
Realistic Vision V2.0
Realistic Vision V2.0 takes the number two spot on my list. Although I have only been using it for a few weeks, I am already impressed with its performance. In the very first round of images I generated with it, I noticed that the results were exceptional. One thing I appreciate about this model is that it produces very realistic images based on the camera settings. Its standout characteristic, however, is the level of detail it can generate for fabrics and textures, which is unmatched by any other model I have used. For instance, the waxed canvas jacket in the image above has incredibly raw and realistic lighting, shadows, and highlights that make the clothing look like it was actually photographed.
Pros: One of the major advantages of this model is its ability to produce high-quality textures in fabrics and textiles. The amount of detail it adds to the skin’s normal map is unparalleled and results in a realistic and visually appealing final product.
Cons: Like other models of its kind, this one struggles with realistic hands and fingers, but that is a common challenge. Another minor issue is that the model’s image composition is often too consistent, which can make the final product feel repetitive. I don’t consider this a major drawback.
All in all, Realistic Vision V2.0 is a fantastic model and will produce high-quality, beautiful portraits.
Protogen x3.4
https://civitai.com/models/3666/protogen-x34-photorealism-official-release
Out of the list of available models, Protogen x3.4 takes the third spot. To be frank, I must say that I am not fond of it. While Protogen has been around since the beginning and has gone through numerous iterations, I find that it doesn’t perform as well as the first two models we have discussed. While it does produce decent images, it falls short in comparison to the other models when it comes to image quality.
Pros: The model is extremely fast, which can be a significant advantage when time is critical. In terms of overall quality, I would rate it a 7 out of 10 on a subjective scale. That is a decent score, but the model’s performance has some limitations.
Cons: The texture, detail, and overall quality of the output can be lacking in some cases. This may not be noticeable in every situation, but for use cases where high-quality results are essential, it could be a significant issue. It’s possible that more powerful machines could improve the model’s performance and mitigate some of these limitations.
Protogen x3.4 is a great choice for those with lower-performance machines. It offers solid performance and is relatively lightweight compared to alternatives like Deliberate or Realistic Vision. However, on a high-performance machine like mine, I find that Protogen x3.4 falls short of the depth and detail that I’m looking for.
Conclusion
In conclusion, I’ve found that the Deliberate V2 and Realistic Vision V2.0 models are exceptional at generating high-quality, visually appealing images with intricate details and textures. Protogen x3.4 may be a great choice for those with lower-performance machines, but it falls short of the depth and detail the other two models deliver. By carefully crafting the visuals and using thematic prompts, you can create a compelling visual storyline that captures the intensity and emotions of the situation. Overall, specific camera settings and themes can produce incredible results, especially with models like Deliberate and Realistic Vision.