Recent advancements in diffusion models have significantly improved image generation, but challenges remain in synthesizing pixel-based human-drawn sketches, a unique form of abstract expression.
StableSketcher is a novel framework designed to enhance prompt fidelity in sketch generation with diffusion models. Its three main contributions are:
1. Fine-Tuned Variational Autoencoder (VAE)
Optimizes latent decoding to better capture sketch characteristics.
2. Reinforcement Learning with VQA-based Reward Function
Enhances text-image alignment and semantic consistency in generated sketches.
3. A New Benchmark Dataset (SketchDUO)
The first dataset with instance-level sketch-caption-QA pairs.
Addresses limitations of existing datasets that rely on image-label pairs.
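To illustrate contribution 1, the decoder fine-tuning objective can be sketched as minimizing pixel reconstruction error on sketch images. This is a minimal, hypothetical illustration, not the paper's implementation: a real VAE decoder is a deep network trained with an optimizer library, while here a single scalar weight stands in so the training-loop shape is visible.

```python
# Hypothetical sketch of decoder fine-tuning: gradient descent on the MSE
# between decoded latents (here simply w * z) and target sketch pixels.

def fine_tune_decoder(w, latents, sketches, lr=0.1, steps=100):
    """Fit scalar decoder weight w so that w * z reconstructs each sketch."""
    for _ in range(steps):
        grad = 0.0
        for z, x in zip(latents, sketches):
            # d/dw of mean((w*z - x)^2) over pixels
            grad += sum(2 * (w * zi - xi) * zi for zi, xi in zip(z, x)) / len(z)
        w -= lr * grad / len(latents)
    return w

# Toy data: targets are exactly 2x the latents, so w should converge to 2.
latents = [[1.0, 2.0], [0.5, 1.5]]
sketches = [[2.0, 4.0], [1.0, 3.0]]
w = fine_tune_decoder(0.0, latents, sketches)
```

In the actual framework this loop would update the decoder's parameters on sketch data so that latent decoding better preserves sketch characteristics.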
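Contribution 2 can be sketched as follows: a VQA model is queried about the generated sketch, and the reward is the fraction of questions it answers correctly, which ties the RL signal to semantic consistency. The interfaces below (`vqa_reward`, `toy_vqa`) are hypothetical stand-ins for illustration; the real pipeline would use a trained VQA model on actual images.

```python
# Hypothetical sketch of a VQA-based reward for RL fine-tuning.

def vqa_reward(image, qa_pairs, answer):
    """Fraction of QA pairs the VQA model answers correctly on `image`."""
    if not qa_pairs:
        return 0.0
    correct = sum(1 for q, a in qa_pairs if answer(image, q) == a)
    return correct / len(qa_pairs)

# Toy stand-in VQA model: answers come from a fixed lookup keyed by question.
toy_answers = {"What animal is drawn?": "cat", "How many legs?": "four"}

def toy_vqa(image, question):
    return toy_answers.get(question, "unknown")

qa = [("What animal is drawn?", "cat"), ("How many legs?", "three")]
r = vqa_reward(None, qa, toy_vqa)  # 1 of 2 answers match -> reward 0.5
```

A higher reward indicates stronger text-image alignment, so maximizing it during fine-tuning pushes the diffusion model toward sketches that faithfully reflect the prompt.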
Overview of StableSketcher and SketchDUO.