Recent advancements in diffusion models have significantly improved image generation, but challenges remain in synthesizing pixel-based human-drawn sketches, a unique form of abstract expression.
StableSketcher is a novel framework designed to enhance prompt fidelity in sketch generation with diffusion models. Its three main contributions are:
1. Fine-Tuned Variational Autoencoder (VAE)
Optimizes latent decoding to better capture sketch characteristics.
2. Reinforcement Learning with VQA-based Reward Function
Enhances text-image alignment and semantic consistency in generated sketches.
3. A New Benchmark Dataset (SketchDUO)
The first dataset with instance-level sketch-caption-QA pairs.
Addresses limitations of existing datasets that rely on image-label pairs.
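To illustrate contribution 1, the decoder fine-tuning objective can be sketched as minimizing pixel reconstruction error on sketch images. This is a minimal, hypothetical illustration, not the paper's implementation: a real VAE decoder is a deep network trained with an optimizer library, while here a single scalar weight stands in so the training-loop shape is visible.

```python
# Hypothetical sketch of decoder fine-tuning: gradient descent on the MSE
# between decoded latents (here simply w * z) and target sketch pixels.

def fine_tune_decoder(w, latents, sketches, lr=0.1, steps=100):
    """Fit scalar decoder weight w so that w * z reconstructs each sketch."""
    for _ in range(steps):
        grad = 0.0
        for z, x in zip(latents, sketches):
            # d/dw of mean((w*z - x)^2) over pixels
            grad += sum(2 * (w * zi - xi) * zi for zi, xi in zip(z, x)) / len(z)
        w -= lr * grad / len(latents)
    return w

# Toy data: targets are exactly 2x the latents, so w should converge to 2.
latents = [[1.0, 2.0], [0.5, 1.5]]
sketches = [[2.0, 4.0], [1.0, 3.0]]
w = fine_tune_decoder(0.0, latents, sketches)
```

In the actual framework this loop would update the decoder's parameters on sketch data so that latent decoding better preserves sketch characteristics.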
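Contribution 2 can be sketched as follows: a VQA model is queried about the generated sketch, and the reward is the fraction of questions it answers correctly, which ties the RL signal to semantic consistency. The interfaces below (`vqa_reward`, `toy_vqa`) are hypothetical stand-ins for illustration; the real pipeline would use a trained VQA model on actual images.

```python
# Hypothetical sketch of a VQA-based reward for RL fine-tuning.

def vqa_reward(image, qa_pairs, answer):
    """Fraction of QA pairs the VQA model answers correctly on `image`."""
    if not qa_pairs:
        return 0.0
    correct = sum(1 for q, a in qa_pairs if answer(image, q) == a)
    return correct / len(qa_pairs)

# Toy stand-in VQA model: answers come from a fixed lookup keyed by question.
toy_answers = {"What animal is drawn?": "cat", "How many legs?": "four"}

def toy_vqa(image, question):
    return toy_answers.get(question, "unknown")

qa = [("What animal is drawn?", "cat"), ("How many legs?", "three")]
r = vqa_reward(None, qa, toy_vqa)  # 1 of 2 answers match -> reward 0.5
```

A higher reward indicates stronger text-image alignment, so maximizing it during fine-tuning pushes the diffusion model toward sketches that faithfully reflect the prompt.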
Overview of StableSketcher and SketchDUO.