---
license: apache-2.0
datasets:
- BLIP3o/BLIP3o-Pretrain-Long-Caption
- BLIP3o/BLIP3o-Pretrain-Short-Caption
- BLIP3o/BLIP3o-Pretrain-JourneyDB
- UCSC-VLAA/GPT-Image-Edit-1.5M
- BLIP3o/BLIP3o-60k
- FreedomIntelligence/ShareGPT-4o-Image
base_model:
- OpenGVLab/InternVL3-1B
---

This repository contains the 1B model presented in the paper [UniLIP: Adapting CLIP for Unified Multimodal Understanding, Generation and Editing](https://www.arxiv.org/abs/2507.23278).

UniLIP proposes a unified CLIP-based encoder that provides both rich semantics and fine-grained image details. Through a **two-stage, self-distillation training** scheme for reconstruction, CLIP is adapted to achieve strong reconstruction quality **without compromising its original understanding abilities**. Leveraging this unified representation, UniLIP performs well across understanding, generation, and editing tasks.

For more details, please refer to the paper and the GitHub repository:

- Paper: https://www.arxiv.org/abs/2507.23278
- GitHub: https://github.com/nnnth/UniLIP
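
As a rough starting point, the checkpoint can likely be loaded through the standard `transformers` remote-code path used by its InternVL3-1B base model. The sketch below is an assumption, not the official usage: the repository id placeholder and the `AutoModel`/`AutoTokenizer` classes are guesses, and the GitHub repository above should be treated as the authoritative reference for inference on understanding, generation, and editing tasks.

```python
# Minimal loading sketch (assumption: the checkpoint ships custom modeling code
# that transformers can load via trust_remote_code, as the InternVL3 base model does).
import torch
from transformers import AutoModel, AutoTokenizer

MODEL_ID = "..."  # hypothetical placeholder: replace with this repository's Hub id

# Load the unified UniLIP model weights in bfloat16 on GPU.
model = AutoModel.from_pretrained(
    MODEL_ID,
    torch_dtype=torch.bfloat16,
    trust_remote_code=True,
).eval().cuda()

# Load the matching tokenizer shipped with the checkpoint.
tokenizer = AutoTokenizer.from_pretrained(MODEL_ID, trust_remote_code=True)
```

For the actual prompting format and the generation/editing pipelines, follow the scripts provided in the GitHub repository.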