Flux.1-Dev - Photorealistic (and Cute) Images
Did you have trouble making great images with AI? Maybe you were using the wrong model.
When I first saw images generated by the Flux.1 models from Black Forest Labs [1], I had to pause and wonder whether they were AI-generated or crafted by a human. The most exciting part? Some of these models come with open weights, so you can try them out too!
The FLUX.1 models include three variants, each designed for different needs:
- FLUX.1 [pro]: Delivers the highest quality, but the weights aren't available. It's intended for commercial use.
- FLUX.1 [dev]: Offers great quality with open weights, perfect for non-commercial projects.
- FLUX.1 [schnell]: Provides decent quality with faster inference speeds, ideal for local deployment—perfect for developers working on personal projects.
At the core of these models is a 12 billion parameter rectified flow transformer architecture. This architecture incorporates advanced features like rotary positional embeddings and parallel attention layers, enhancing both visual quality and computational efficiency.
According to the team at Black Forest Labs, the FLUX.1 models excel in several key areas:
- Prompt adherence: They follow user instructions with high accuracy.
- Image quality: They generate detailed and visually appealing outputs.
- Output diversity: They offer a wide range of style options.
FLUX.1 integrates smoothly with Hugging Face's Diffusers [2], making it easy to create stunning images with just a bit of prompt expertise. Let's dive in and see how you can use the FLUX.1-dev model to generate photorealistic and charming images.
Setup
Want to follow along? All code for the bootcamp is available at this GitHub repository.
To run our experiments, we'll need just two libraries: Diffusers and PEFT. Let's install them:
pip install -Uqqq pip --progress-bar off
pip install -qqq git+https://github.com/huggingface/diffusers.git@d8a16635f47ac455abd61879bcc6be32dfeaa561
pip install -qqq peft==0.12.0 --progress-bar off
I ran the experiments on Google Colab using an Nvidia A100 (40GB) GPU. The code should work on any machine with a sufficiently powerful GPU.
The Diffusers library is installed from a specific commit hash to ensure compatibility with the LoRA adapter we'll use for the FLUX.1-dev model.
Let's begin by importing the necessary libraries and setting a seed for reproducibility:
import matplotlib
import matplotlib.pyplot as plt
import torch
from diffusers import FluxPipeline
SEED = 42
The dev version of the FLUX.1 model is available on the Hugging Face model hub. We can load it using the FluxPipeline.from_pretrained method and move it to the GPU:
pipe = FluxPipeline.from_pretrained(
    "black-forest-labs/FLUX.1-dev", torch_dtype=torch.bfloat16
)
pipe.to("cuda")
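To see why a large GPU helps here, the following back-of-the-envelope sketch estimates the memory taken by the transformer weights alone. It assumes exactly 12 billion parameters stored in bfloat16 (2 bytes each) and ignores the text encoders, the VAE, and activations, so treat the result as a rough lower bound rather than a measurement.

```python
# Rough lower bound on weight memory for the 12B-parameter transformer.
# Assumptions: exactly 12e9 parameters, 2 bytes per parameter (bfloat16).
N_PARAMS = 12_000_000_000
BYTES_PER_PARAM = 2  # bfloat16


def weight_memory_gib(n_params: int, bytes_per_param: int) -> float:
    """Return the size of the weights in GiB (2**30 bytes)."""
    return n_params * bytes_per_param / 2**30


print(f"{weight_memory_gib(N_PARAMS, BYTES_PER_PARAM):.2f} GiB")  # 22.35 GiB
```

Under these assumptions the transformer alone takes more than half of a 40GB card before any image is generated.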
Generating Images
To generate images, we'll create a helper function called generate_images. This function will take a prompt and a few optional parameters, returning a list of images generated by the model:
def generate_images(
    prompt: str,
    guidance_scale: float = 3.5,
    n_steps: int = 30,
    lora_scale: float = 1.0,
    n_images: int = 1,
):
    return pipe(
        prompt=prompt,
        width=1024,
        height=768,
        guidance_scale=guidance_scale,
        output_type="pil",
        num_inference_steps=n_steps,
        max_sequence_length=512,
        num_images_per_prompt=n_images,
        generator=torch.Generator("cpu").manual_seed(SEED),
        joint_attention_kwargs={"scale": lora_scale},
    ).images
All our images will be generated at a size of 1024 (width) x 768 (height) pixels. The generator ensures that the SEED value is respected, making the results reproducible.
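Here is a minimal sketch of what the seeded generator buys us. The helper name initial_noise and the tensor shape are made up for illustration and are not the latent shape the pipeline uses. The point is only that a fresh generator with the same seed yields the same starting noise on every call.

```python
import torch

SEED = 42


def initial_noise(seed: int, shape=(1, 4, 8, 8)) -> torch.Tensor:
    # Fresh CPU generator, seeded the same way as in generate_images.
    generator = torch.Generator("cpu").manual_seed(seed)
    return torch.randn(shape, generator=generator)


same_a = initial_noise(SEED)
same_b = initial_noise(SEED)
other = initial_noise(SEED + 1)

print(torch.equal(same_a, same_b))  # True: same seed, same noise
print(torch.equal(same_a, other))   # False: different seed, different noise
```

Because generate_images builds a new generator on every call, each experiment below starts from the same noise, so differences between images come from the parameter being varied.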
Parameters
prompt = """Create a highly detailed and realistic portrait of a beautiful woman with a warm, inviting smile.
Her expression should convey a subtle, yet captivating sense of allure, with her gaze directed straight at the camera.
Her eyes should sparkle with a mix of charm and confidence, drawing the viewer in.
The background should be blurred, ensuring full focus remains on her expressive face.
"""
Inference Steps
Diffusion models are trained to denoise random Gaussian noise step-by-step to generate a sample of interest, such as an image or audio.
num_inference_steps is the number of denoising steps. More denoising steps usually lead to a higher quality image at the expense of slower inference.
steps = [5, 10, 15, 20, 25, 30]
images = []
for n_steps in steps:
    images.append(generate_images(prompt, n_steps=n_steps)[0])
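For intuition on why the step count matters, here is a toy sketch that is not the sampler FLUX.1 uses. It integrates a simple one-dimensional ODE with Euler steps and compares the result with the closed-form answer. The ODE, the start value, and the names euler_solve and exact_solution are all assumptions chosen to keep the example small.

```python
import math


def euler_solve(x0: float, target: float, n_steps: int) -> float:
    """Integrate dx/dt = target - x from t=0 to t=1 with n_steps Euler steps."""
    x = x0
    dt = 1.0 / n_steps
    for _ in range(n_steps):
        x += dt * (target - x)
    return x


def exact_solution(x0: float, target: float) -> float:
    """Closed-form value of x(1) for the same ODE."""
    return target + (x0 - target) * math.exp(-1.0)


x0, target = 1.0, 0.0  # toy "noise" start and "data" target
toy_steps = [5, 10, 15, 20, 25, 30]
errors = [
    abs(euler_solve(x0, target, n) - exact_solution(x0, target)) for n in toy_steps
]
for n, err in zip(toy_steps, errors):
    print(f"{n:>2} steps -> error {err:.4f}")
```

The error shrinks as the number of steps grows, and each extra step costs one more evaluation. In a diffusion pipeline that evaluation is a full forward pass of the model, which is where the slower inference comes from.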
Guidance
A higher guidance scale encourages the model to generate images that are closely linked to the text prompt, usually at the expense of lower image quality.
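As intuition only, the sketch below shows the classic classifier-free guidance blend used by many diffusion models. FLUX.1 [dev] is trained with guidance distillation, so the pipeline may consume guidance_scale differently internally. The function name and the two-element "predictions" are hypothetical.

```python
def classifier_free_guidance(uncond, cond, guidance_scale):
    """Blend two noise predictions: uncond + scale * (cond - uncond)."""
    return [u + guidance_scale * (c - u) for u, c in zip(uncond, cond)]


# Hypothetical two-element "noise predictions", for illustration only.
uncond_pred = [0.0, 1.0]
cond_pred = [1.0, 3.0]

for scale in [0.0, 1.0, 3.5]:
    print(scale, classifier_free_guidance(uncond_pred, cond_pred, scale))
```

In this formulation a scale of 1.0 returns the prompt-conditioned prediction unchanged, and larger values push the result further in the direction of the prompt.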
guidances = [0, 2, 4, 6, 8, 10]
images = []
for guidance in guidances:
    images.append(generate_images(prompt, guidance_scale=guidance)[0])
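The loops above collect images in a list but don't display them. One possible way to view them side by side is a small matplotlib helper like the sketch below. The name show_images and its parameters are my own, not part of Diffusers.

```python
import matplotlib.pyplot as plt


def show_images(images, titles, figsize_per_image=(4, 3)):
    """Plot images side by side in one row, with a title above each."""
    n = len(images)
    width, height = figsize_per_image
    # squeeze=False keeps axes two-dimensional even for a single image.
    fig, axes = plt.subplots(1, n, figsize=(width * n, height), squeeze=False)
    for ax, image, title in zip(axes[0], images, titles):
        ax.imshow(image)
        ax.set_title(title)
        ax.axis("off")
    fig.tight_layout()
    return fig


# Example with the guidance sweep (uncomment in the notebook):
# show_images(images, [f"guidance={g}" for g in guidances])
# plt.show()
```

The same helper works for the inference-steps sweep if you pass titles built from the steps list instead.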
I promised (as per the title) some cute images too. Let's see how the model performs with a prompt for a cute hydra:
prompt = """
A mythical, cute, cartoonish hydra with a single, large head, covered in soft purple scales.
The hydra has big, expressive eyes and a wide, friendly smile, exuding a cuddly and approachable demeanor.
The body is chubby and playful, with short, rounded limbs and a small, wagging tail.
The background features a whimsical, enchanted forest.
"""
Output Diversity
The authors of the FLUX.1 models claim that their models generate different styles of images. Let's test this by generating images for a prompt about penguins:
prompt = """
Two adorable penguins holding hands on a sunny beach.
One penguin has a small bowtie, while the other wears a colorful sun hat.
They stand on soft golden sand with gentle ocean waves lapping in the background.
The sky is bright and clear, with a few fluffy clouds, and a seashell lies nearby, adding to the playful, heartwarming scene.
"""
images = generate_images(prompt, n_images=4)
The style of the penguins varies across the images. It seems the authors were right.
Photorealistic Adapter
The guys at XLabs AI have published a LoRA adapter for the FLUX.1-dev model. This adapter, called RealismLoRA, enhances the model's ability to generate photorealistic images and is available on the Hugging Face model hub [3]. Let's load the adapter on top of the FLUX.1-dev model:
pipe.load_lora_weights(
    pretrained_model_name_or_path_or_dict="XLabs-AI/flux-RealismLora",
    weight_name="lora.safetensors",
)
Let's try the same prompt that we used for generating the woman at the start:
prompt = """Create a highly detailed and realistic portrait of a beautiful woman with a warm, inviting smile.
Her expression should convey a subtle, yet captivating sense of allure, with her gaze directed straight at the camera.
Her eyes should sparkle with a mix of charm and confidence, drawing the viewer in.
The background should be blurred, ensuring full focus remains on her expressive face.
"""
Notice the differences? They are subtle, but the image generated with the RealismLoRA adapter has (for example) much fuller hair and imperfect skin texture.
You can vary the lora_scale parameter to control the level/weight of the LoRA adapter (photorealism in this case). Here's an example:
prompt = """
A baby blue and turquoise excavator parked at a bustling Slavic street repair site.
The excavator has a friendly, compact design with oversized, rounded features.
The surrounding scene includes cobblestone streets, traditional Slavic architecture with colorful facades,
and construction tools scattered around. The background shows a vibrant urban setting with people and
vintage buildings, adding a lively, authentic touch
"""
scales = [0.0, 0.25, 0.5, 1.0]
images = []
for scale in scales:
    images.append(generate_images(prompt, lora_scale=scale)[0])
Compare the level of detail between lora_scale=0.0 and lora_scale=1.0. The latter has much sharper details.
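To make the effect of the scale concrete, here is a toy sketch of the usual LoRA weight update, base + scale * (B @ A), on a tiny matrix in plain Python. The 2x2 weights, the rank-1 factors, and the function names are hypothetical. Real implementations also apply a rank-dependent factor, and how the pipeline maps lora_scale onto the update internally is an assumption here.

```python
def matmul(a, b):
    """Multiply two matrices given as nested lists."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def apply_lora(base, lora_b, lora_a, scale):
    """Return base + scale * (lora_b @ lora_a)."""
    delta = matmul(lora_b, lora_a)
    return [
        [w + scale * d for w, d in zip(w_row, d_row)]
        for w_row, d_row in zip(base, delta)
    ]


# Hypothetical 2x2 base weight and rank-1 adapter factors, for illustration only.
base_weight = [[1.0, 0.0], [0.0, 1.0]]
lora_b = [[1.0], [2.0]]  # shape 2x1
lora_a = [[0.5, -0.5]]   # shape 1x2

for scale in [0.0, 0.25, 0.5, 1.0]:
    print(scale, apply_lora(base_weight, lora_b, lora_a, scale))
```

With a scale of 0.0 the base weights come back unchanged, which matches the idea that lora_scale=0.0 behaves like the model without the adapter. Larger values mix in more of the adapter's update.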
Let's try a few more prompts:
prompt = """A stunning modern oceanfront house with sleek architecture and expansive glass windows,
offering a breathtaking view of the sparkling water. The house features clean lines, an open layout, and a spacious deck.
Surrounding it is lush, vibrant greenery with flourishing flora under a bright, sunny sky.
The scene captures a warm, inviting atmosphere with the shimmering ocean in the background.
"""
prompt = """A modern Scandinavian-inspired bedroom with a minimalistic design featuring white furniture.
The room is bathed in natural sunlight streaming through two large windows, creating a bright, airy atmosphere.
Incorporate subtle blue accents in decorative pillows, a rug, or a piece of wall art to add pops of color.
The space is uncluttered, with sleek lines and an emphasis on functionality.
"""
Stunning, right? Let's finish with something really cute:
prompt = """A charming baby unicorn avatar for a habit tracker app.
The unicorn is small, with a soft pastel color palette featuring light pink and lavender.
It has a sparkling, multicolored mane and a golden horn, with big, expressive eyes and a friendly smile.
The unicorn's pose is playful and engaging, perfect for motivating users and adding a touch of magic to their daily tracking.
"""
Conclusion
If you're anything like me, you're probably amazed by the quality of the images generated by the FLUX.1-dev model. The ability to create photorealistic and captivating images with just a few prompts truly showcases the power of AI models like this one. The open weights make it easy to experiment and seamlessly integrate the model into your own projects. So go ahead—give it a try!