Does Hearing Help Seeing? Investigating Audio-Video Joint Denoising for Video Generation Paper • 2512.02457 • Published 8 days ago • 13
Uniworld-V2: Reinforce Image Editing with Diffusion Negative-aware Finetuning and MLLM Implicit Feedback Paper • 2510.16888 • Published Oct 19 • 21
ConsistentID: Portrait Generation with Multimodal Fine-Grained Identity Preserving Paper • 2404.16771 • Published Apr 25, 2024 • 19
Muddit: Liberating Generation Beyond Text-to-Image with a Unified Discrete Diffusion Model Paper • 2505.23606 • Published May 29 • 14
On Path to Multimodal Generalist: General-Level and General-Bench Paper • 2505.04620 • Published May 7 • 82
RelationBooth: Towards Relation-Aware Customized Object Generation Paper • 2410.23280 • Published Oct 30, 2024 • 1
Decouple and Track: Benchmarking and Improving Video Diffusion Transformers for Motion Transfer Paper • 2503.17350 • Published Mar 21 • 1
UPME: An Unsupervised Peer Review Framework for Multimodal Large Language Model Evaluation Paper • 2503.14941 • Published Mar 19 • 5
MetaTool Benchmark for Large Language Models: Deciding Whether to Use Tools and Which to Use Paper • 2310.03128 • Published Oct 4, 2023 • 1
MLLM-as-a-Judge: Assessing Multimodal LLM-as-a-Judge with Vision-Language Benchmark Paper • 2402.04788 • Published Feb 7, 2024
The Best of Both Worlds: Toward an Honest and Helpful Large Language Model Paper • 2406.00380 • Published Jun 1, 2024
GUI-WORLD: A Dataset for GUI-oriented Multimodal LLM-based Agents Paper • 2406.10819 • Published Jun 16, 2024 • 2
UniGen: A Unified Framework for Textual Dataset Generation Using Large Language Models Paper • 2406.18966 • Published Jun 27, 2024
TrustGPT: A Benchmark for Trustworthy and Responsible Large Language Models Paper • 2306.11507 • Published Jun 20, 2023
LLM-as-a-Coauthor: Can Mixed Human-Written and Machine-Generated Text Be Detected? Paper • 2401.05952 • Published Jan 11, 2024
On the Trustworthiness of Generative Foundation Models: Guideline, Assessment, and Perspective Paper • 2502.14296 • Published Feb 20 • 45
Justice or Prejudice? Quantifying Biases in LLM-as-a-Judge Paper • 2410.02736 • Published Oct 3, 2024 • 1
Native Sparse Attention: Hardware-Aligned and Natively Trainable Sparse Attention Paper • 2502.11089 • Published Feb 16 • 166