---
license: cc-by-nc-4.0
language:
- en
base_model:
- facebook/metaclip-2-worldwide-s16
pipeline_tag: image-classification
library_name: transformers
tags:
- text-generation-inference
- open-scene
---

![1](https://cdn-uploads.huggingface.co/production/uploads/65bb837dbfb878f46c77de4c/Cwn-cWX3RDmAhywdLgocX.png)

# **MetaCLIP-2-Open-Scene**

> **MetaCLIP-2-Open-Scene** is a vision-language encoder model fine-tuned from **[facebook/metaclip-2-worldwide-s16](https://huggingface.co/facebook/metaclip-2-worldwide-s16)** for single-label image classification.
> It identifies and categorizes natural and urban scenes using the **MetaClip2ForImageClassification** architecture.

> [!note]
> MetaCLIP 2: A Worldwide Scaling Recipe: https://huggingface.co/papers/2507.22062

```
Classification Report:
              precision    recall  f1-score   support

   buildings     0.9644    0.9703    0.9673      2625
      forest     0.9948    0.9978    0.9963      2694
     glacier     0.9531    0.9427    0.9479      2671
    mountain     0.9470    0.9512    0.9491      2723
         sea     0.9909    0.9920    0.9915      2758
      street     0.9728    0.9694    0.9711      2874

    accuracy                         0.9706     16345
   macro avg     0.9705    0.9706    0.9705     16345
weighted avg     0.9706    0.9706    0.9706     16345
```

![download](https://cdn-uploads.huggingface.co/production/uploads/65bb837dbfb878f46c77de4c/gJUwvNsxBQAh30FprXlyP.png)

The model classifies images into six open-scene categories:

* **Class 0:** "buildings"
* **Class 1:** "forest"
* **Class 2:** "glacier"
* **Class 3:** "mountain"
* **Class 4:** "sea"
* **Class 5:** "street"

# **Run with Transformers**

```python
!pip install -q transformers torch pillow gradio
```

```python
import gradio as gr
import torch
from PIL import Image
from transformers import AutoImageProcessor, AutoModelForImageClassification

# Load model and processor
model_name = "prithivMLmods/MetaCLIP-2-Open-Scene"
model = AutoModelForImageClassification.from_pretrained(model_name)
processor = AutoImageProcessor.from_pretrained(model_name)

def scene_classification(image):
    """Predicts the type of scene represented in an image."""
    image = Image.fromarray(image).convert("RGB")
    inputs = processor(images=image, return_tensors="pt")

    with torch.no_grad():
        outputs = model(**inputs)
        logits = outputs.logits
        probs = torch.nn.functional.softmax(logits, dim=1).squeeze().tolist()

    labels = {
        "0": "buildings", "1": "forest", "2": "glacier",
        "3": "mountain", "4": "sea", "5": "street"
    }
    predictions = {labels[str(i)]: round(probs[i], 3) for i in range(len(probs))}
    return predictions

# Create Gradio interface
iface = gr.Interface(
    fn=scene_classification,
    inputs=gr.Image(type="numpy"),
    outputs=gr.Label(label="Prediction Scores"),
    title="Open Scene Classification",
    description="Upload an image to classify the scene type (e.g., forest, sea, street, mountain, etc.)."
)

# Launch the app
if __name__ == "__main__":
    iface.launch()
```

# **Sample Inference:**

![Screenshot 2025-11-13 at 19-39-55 Open Scene Classification](https://cdn-uploads.huggingface.co/production/uploads/65bb837dbfb878f46c77de4c/9vHPyQsv3FeduOU1s4A6r.png)

![Screenshot 2025-11-13 at 19-37-07 Open Scene Classification](https://cdn-uploads.huggingface.co/production/uploads/65bb837dbfb878f46c77de4c/H_5f4qIB5XYQuOyeNGh_P.png)

![Screenshot 2025-11-13 at 19-37-50 Open Scene Classification](https://cdn-uploads.huggingface.co/production/uploads/65bb837dbfb878f46c77de4c/KM0hBDUFF5-zU9tCXoU1w.png)

![Screenshot 2025-11-13 at 19-38-37 Open Scene Classification](https://cdn-uploads.huggingface.co/production/uploads/65bb837dbfb878f46c77de4c/ErSRLvFr3fLGUTO86klOT.png)

![Screenshot 2025-11-13 at 19-39-24 Open Scene Classification](https://cdn-uploads.huggingface.co/production/uploads/65bb837dbfb878f46c77de4c/_NYjjKZXQx8fJGOyjhY4r.png)

# **Intended Use:**

The **MetaCLIP-2-Open-Scene** model is designed to classify a wide range of natural and urban environments. Potential use cases include:

* **Geographical Image Analysis:** Categorizing landscapes for environmental and mapping studies.
* **Tourism and Travel Applications:** Automatically tagging scenic photos for organization and recommendations.
* **Autonomous Systems:** Supporting navigation and perception in robotics and self-driving systems.
* **Environmental Monitoring:** Detecting and classifying geographic features for research.
* **Media and Photography:** Assisting in photo organization and content-based retrieval.
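
For programmatic use outside the Gradio demo (for example, tagging photo collections for organization or retrieval), the model can be called directly through `transformers`. The snippet below is a minimal sketch under that assumption: `"example.jpg"` is a placeholder path, and the class names are read from the checkpoint's `id2label` mapping rather than hard-coded.

```python
import torch
from PIL import Image
from transformers import AutoImageProcessor, AutoModelForImageClassification

model_name = "prithivMLmods/MetaCLIP-2-Open-Scene"
processor = AutoImageProcessor.from_pretrained(model_name)
model = AutoModelForImageClassification.from_pretrained(model_name)
model.eval()

# "example.jpg" is a placeholder; substitute any RGB scene image.
image = Image.open("example.jpg").convert("RGB")
inputs = processor(images=image, return_tensors="pt")

with torch.no_grad():
    logits = model(**inputs).logits

# Map the highest-scoring logit to its class name via the model config.
predicted_id = logits.argmax(dim=-1).item()
print(model.config.id2label[predicted_id])  # one of: buildings, forest, glacier, mountain, sea, street
```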