# Diffusion Models App

A Python application that uses Hugging Face inference endpoints and on-device models for text-to-image and image-to-image generation, with a Gradio UI and API endpoints.

## Features

- Text-to-image generation
- Image-to-image transformation with optional prompt
- ControlNet depth-based image transformation
- Gradio UI for interactive use
- API endpoints for integration with other applications
- Configurable models via text input
- Default values for prompts, negative prompts, and models

## Project Structure

- `main.py` - Entry point that can run both the UI and the API
- `app.py` - Gradio UI implementation
- `api.py` - FastAPI server for API endpoints
- `inference.py` - Core functionality for Hugging Face inference
- `controlnet_pipeline.py` - ControlNet depth model pipeline
- `config.py` - Configuration and settings
- `requirements.txt` - Dependencies

## Setup & Usage

### Local Development

1. Clone the repository
2. Create a `.env` file with your Hugging Face token (copy from `.env.example`)
3. Install dependencies: `pip install -r requirements.txt`
4. Run the application: `python main.py`

### Hugging Face Spaces Deployment

1. Never commit the `.env` file with your token to the repository!
2. Instead, add your `HF_TOKEN` as a secret in the Spaces UI:
   - Go to your Space's Settings tab
   - Navigate to Repository Secrets
   - Add a secret named `HF_TOKEN` with your token as the value
3. The application will automatically use this secret in the Spaces environment
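Because Spaces injects repository secrets as ordinary environment variables, the same lookup works locally (loaded from `.env`) and in deployment. A minimal sketch of that lookup; the helper name is hypothetical, and the real code lives in `config.py`:

```python
import os

def get_hf_token() -> str:
    # Works both locally (token loaded from .env) and on Spaces,
    # where repository secrets arrive as environment variables.
    token = os.environ.get("HF_TOKEN", "")
    if not token:
        raise RuntimeError("HF_TOKEN is not set; add it to .env or Spaces secrets")
    return token
```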
## Running Options

- Run both the UI and the API: `python main.py`
- Run only the API: `python main.py --mode api`
- Run only the UI: `python main.py --mode ui`
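The mode switch above can be sketched with `argparse`. This is a minimal sketch, assuming `main.py` exposes a single `--mode` flag with these three choices; the helper name is hypothetical:

```python
import argparse

def parse_mode(argv=None) -> str:
    # The default ("both") starts the Gradio UI and the API server together.
    parser = argparse.ArgumentParser(description="Diffusion Models App")
    parser.add_argument(
        "--mode",
        choices=["both", "api", "ui"],
        default="both",
        help="Run the UI, the API, or both",
    )
    return parser.parse_args(argv).mode
```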
## API Endpoints

- `POST /text-to-image` - Generate an image from text
- `POST /image-to-image` - Transform an image with optional prompt
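A minimal client sketch for the text-to-image endpoint using only the standard library. The JSON field names (`prompt`, `negative_prompt`) are assumptions; check `api.py` for the actual request schema:

```python
import json
import urllib.request

API_URL = "http://localhost:8000"  # default API_HOST / API_PORT

def build_request(prompt, negative_prompt=None):
    # Field names here are assumptions; see api.py for the real schema.
    payload = {"prompt": prompt}
    if negative_prompt:
        payload["negative_prompt"] = negative_prompt
    return urllib.request.Request(
        f"{API_URL}/text-to-image",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )

def text_to_image(prompt, negative_prompt=None) -> bytes:
    with urllib.request.urlopen(build_request(prompt, negative_prompt)) as resp:
        return resp.read()  # raw image bytes
```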
## Default Values

The application includes defaults for:

- Sample prompts for text-to-image and image-to-image
- Negative prompts to exclude unwanted elements
- Pre-filled model names for both text-to-image and image-to-image

These defaults are applied to both the Gradio UI and the API endpoints for consistency.

## ControlNet Implementation

The application supports running a ControlNet depth model directly on the Hugging Face Spaces GPU using the `spaces.GPU` decorator. This feature allows for:

1. **On-device processing**: Instead of relying solely on remote inference endpoints, the app can perform image transformations on the local GPU.
2. **Depth-based transformations**: The ControlNet implementation extracts depth information from the input image, enabling transformations that preserve the image's structure.
3. **Integration with the existing workflow**: The ControlNet option is integrated into the image-to-image tab via a simple checkbox.

### How it works

1. When a user uploads an image and enables the ControlNet option, the app runs the image through a depth estimator.
2. The resulting depth map guides the ControlNet model during image generation.
3. The `spaces.GPU` decorator ensures these operations run on the GPU for optimal performance.
4. The generated image keeps the spatial structure of the original while applying the creative transformation specified in the prompt.
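Between steps 1 and 2, the single-channel depth map typically has to be converted into the three-channel control image the ControlNet model consumes. A minimal sketch of that conversion, assuming NumPy arrays; the helper name is hypothetical:

```python
import numpy as np

def depth_to_control_image(depth: np.ndarray) -> np.ndarray:
    # Normalize raw depth values to [0, 255], then replicate the single
    # channel three times to match the control-image layout.
    depth = depth.astype(np.float32)
    span = max(float(depth.max() - depth.min()), 1e-8)
    depth_u8 = ((depth - depth.min()) / span * 255.0).astype(np.uint8)
    return np.stack([depth_u8] * 3, axis=-1)
```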
The implementation uses:

- `stable-diffusion-v1-5` as the base model
- `lllyasviel/sd-controlnet-depth` as the ControlNet model
- The Hugging Face Transformers depth estimation pipeline
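These pieces might be wired together as sketched below, assuming the `diffusers` and `transformers` APIs; the repo id used for the `stable-diffusion-v1-5` base checkpoint and the function name are assumptions, and the real wiring lives in `controlnet_pipeline.py`:

```python
import spaces
import torch
from diffusers import ControlNetModel, StableDiffusionControlNetPipeline
from transformers import pipeline

# Depth estimator (step 1 of "How it works").
depth_estimator = pipeline("depth-estimation")

controlnet = ControlNetModel.from_pretrained(
    "lllyasviel/sd-controlnet-depth", torch_dtype=torch.float16
)
pipe = StableDiffusionControlNetPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5",  # assumed repo id for the v1-5 base model
    controlnet=controlnet,
    torch_dtype=torch.float16,
).to("cuda")

@spaces.GPU  # run on the Spaces GPU (step 3)
def depth_guided_transform(image, prompt: str):
    depth_map = depth_estimator(image)["depth"]    # structure from the input
    return pipe(prompt, image=depth_map).images[0]  # depth guides generation (step 2)
```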
## Environment Variables

- `HF_TOKEN` - Your Hugging Face API token
- `API_HOST` - Host for the API server (default: `0.0.0.0`)
- `API_PORT` - Port for the API server (default: `8000`)
- `GRADIO_HOST` - Host for the Gradio UI (default: `0.0.0.0`)
- `GRADIO_PORT` - Port for the Gradio UI (default: `7860`)
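For local development, all of these go in the `.env` file; a sketch with placeholder values (the token value is a placeholder, and the host/port lines can be omitted to use the defaults above):

```
# .env - never commit this file
HF_TOKEN=your-hugging-face-token
API_HOST=0.0.0.0
API_PORT=8000
GRADIO_HOST=0.0.0.0
GRADIO_PORT=7860
```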