LongCat-Image
We introduce LongCat-Image, an open-source, bilingual (Chinese-English) foundation model for image generation. It is designed to address four challenges common in current leading models: multilingual text rendering, photorealism, deployment efficiency, and developer accessibility.
Key Features
- 🌟 Exceptional Efficiency and Performance: With only 6B parameters, LongCat-Image surpasses numerous open-source models that are several times larger across multiple benchmarks, demonstrating the immense potential of efficient model design.
- 🌟 Superior Editing Performance: The LongCat-Image-Edit model achieves state-of-the-art performance among open-source models, delivering leading instruction following and image quality with superior visual consistency.
- 🌟 Powerful Chinese Text Rendering: LongCat-Image demonstrates superior accuracy and stability in rendering common Chinese characters compared to existing SOTA open-source models and achieves industry-leading coverage of the Chinese dictionary.
- 🌟 Remarkable Photorealism: Through an innovative data strategy and training framework, LongCat-Image achieves remarkable photorealism in generated images.
- 🌟 Comprehensive Open-Source Ecosystem: We provide a complete toolchain, from intermediate checkpoints to full training code, significantly lowering the barrier for further research and development.
For more details, please refer to the LongCat-Image Technical Report.
Usage Example
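import torch
from diffusers import LongCatImagePipeline

# Load the weights in bfloat16 to reduce memory use
weight_dtype = torch.bfloat16
pipe = LongCatImagePipeline.from_pretrained("meituan-longcat/LongCat-Image", torch_dtype=weight_dtype)
pipe.to('cuda')
# If GPU memory is limited, call this instead of pipe.to('cuda'):
# pipe.enable_model_cpu_offload()

# The prompt below is in Chinese. English gloss (abridged): a young Asian woman
# in a yellow knit sweater and white necklace, hands resting on her knees, with
# a serene expression, in front of a rough brick wall in warm afternoon sunlight;
# medium shot, soft light, simple composition.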
prompt = '一个年轻的亚裔女性,身穿黄色针织衫,搭配白色项链。她的双手放在膝盖上,表情恬静。背景是一堵粗糙的砖墙,午后的阳光温暖地洒在她身上,营造出一种宁静而温馨的氛围。镜头采用中距离视角,突出她的神态和服饰的细节。光线柔和地打在她的脸上,强调她的五官和饰品的质感,增加画面的层次感与亲和力。整个画面构图简洁,砖墙的纹理与阳光的光影效果相得益彰,突显出人物的优雅与从容。'
image = pipe(
    prompt,
    height=768,
    width=1344,
    guidance_scale=4.0,
    num_inference_steps=50,
    num_images_per_prompt=1,
    generator=torch.Generator("cpu").manual_seed(43),  # fixed seed for reproducible output
    enable_cfg_renorm=True,
    enable_prompt_rewrite=True,
).images[0]
image.save('./longcat_image_t2i_example.png')

This pipeline was contributed by the LongCat-Image Team. The original codebase can be found here.
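The usage example passes `height=768` and `width=1344`, a 7:4 landscape frame. As a minimal sketch for picking other sizes, the hypothetical helper below keeps roughly the same pixel count and snaps both sides to a multiple of 16. Both the pixel budget (taken from the example) and the multiple of 16 are assumptions made for illustration, not documented requirements of `LongCatImagePipeline`.

```python
import math
from typing import Tuple


def dims_for_aspect(
    aspect_w: int,
    aspect_h: int,
    pixel_budget: int = 768 * 1344,  # assumption: same pixel count as the example
    multiple: int = 16,              # assumption: sizes snapped to multiples of 16
) -> Tuple[int, int]:
    """Return (width, height) close to pixel_budget for the given aspect ratio."""
    if aspect_w <= 0 or aspect_h <= 0:
        raise ValueError("aspect ratio terms must be positive")

    # Solve width * height = pixel_budget with width / height = aspect_w / aspect_h.
    width = math.sqrt(pixel_budget * aspect_w / aspect_h)
    height = width * aspect_h / aspect_w

    def snap(value: float) -> int:
        # Round to the nearest multiple, never going below one multiple.
        return max(multiple, round(value / multiple) * multiple)

    return snap(width), snap(height)


if __name__ == "__main__":
    width, height = dims_for_aspect(7, 4)
    print(width, height)
```

The returned values could then be passed as `width=` and `height=` in the pipeline call.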
Available models:
| Model | Type | Description | Download Link |
|---|---|---|---|
| LongCat-Image | Text-to-Image | Final release. The standard model for out-of-the-box inference. | 🤗 Hugging Face |
| LongCat-Image-Dev | Text-to-Image | Development. Mid-training checkpoint, suitable for fine-tuning. | 🤗 Hugging Face |
| LongCat-Image-Edit | Image Editing | Specialized model for image editing. | 🤗 Hugging Face |
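The table above can be read as a small decision rule: pick by task, then prefer the Dev checkpoint when the goal is fine-tuning. The sketch below encodes only what the table states. The `ModelEntry` type, the `choose_model` function, and the task strings are hypothetical names for illustration, and repository IDs are deliberately left out because the download links are not reproduced on this page.

```python
from typing import NamedTuple


class ModelEntry(NamedTuple):
    name: str             # model name as listed in the table
    task: str             # "text-to-image" or "image-editing" (illustrative labels)
    fine_tune_base: bool  # True for the mid-training checkpoint


MODELS = (
    ModelEntry("LongCat-Image", "text-to-image", False),
    ModelEntry("LongCat-Image-Dev", "text-to-image", True),
    ModelEntry("LongCat-Image-Edit", "image-editing", False),
)


def choose_model(task: str, fine_tune: bool = False) -> str:
    """Return the table entry matching the task and intended use."""
    for entry in MODELS:
        if entry.task == task and entry.fine_tune_base == fine_tune:
            return entry.name
    raise LookupError(f"no listed model for task={task!r}, fine_tune={fine_tune}")


if __name__ == "__main__":
    print(choose_model("text-to-image"))
    print(choose_model("text-to-image", fine_tune=True))
```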
LongCatImagePipeline
class diffusers.LongCatImagePipeline
( scheduler: FlowMatchEulerDiscreteScheduler, vae: AutoencoderKL, text_encoder: Qwen2_5_VLForConditionalGeneration, tokenizer: Qwen2Tokenizer, text_processor: Qwen2VLProcessor, transformer: LongCatImageTransformer2DModel )
The pipeline for text-to-image generation.
LongCatImagePipelineOutput
class diffusers.pipelines.longcat_image.LongCatImagePipelineOutput
( images: typing.Union[typing.List[PIL.Image.Image], numpy.ndarray] )
Output class for LongCat-Image pipelines.
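Because `images` may be either a list of PIL images or a NumPy array, code that consumes the output should check which one it received. The following is a minimal sketch of a hypothetical `save_images` helper (not part of Diffusers) that writes each image in a list to disk. It accepts any object with a PIL-style `.save(path)` method and rejects anything that is not a list or tuple rather than guessing how to convert an array.

```python
from pathlib import Path
from typing import Any, List, Sequence


def save_images(images: Sequence[Any], out_dir: str, stem: str = "longcat") -> List[Path]:
    """Save each PIL-style image to out_dir and return the written paths."""
    if not isinstance(images, (list, tuple)):
        # A NumPy array would need converting to PIL images first;
        # that conversion is intentionally outside this sketch.
        raise TypeError(f"expected a list of images, got {type(images).__name__}")

    target = Path(out_dir)
    target.mkdir(parents=True, exist_ok=True)

    paths = []
    for index, image in enumerate(images):
        path = target / f"{stem}_{index:02d}.png"
        image.save(path)  # PIL.Image.Image.save accepts a path-like object
        paths.append(path)
    return paths
```

With a pipeline result named `output`, this could be called as `save_images(output.images, "./outputs")`.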