|
|
--- |
|
|
license: cc-by-nc-4.0 |
|
|
language: |
|
|
- en |
|
|
base_model: |
|
|
- facebook/metaclip-2-worldwide-s16 |
|
|
pipeline_tag: image-classification |
|
|
library_name: transformers |
|
|
tags: |
|
|
- text-generation-inference |
|
|
- open-scene |
|
|
--- |
|
|
|
|
|
 |
|
|
|
|
|
# **MetaCLIP-2-Open-Scene** |
|
|
|
|
|
> **MetaCLIP-2-Open-Scene** is a vision-language encoder model fine-tuned from **[facebook/metaclip-2-worldwide-s16](https://huggingface.co/facebook/metaclip-2-worldwide-s16)** for single-label image classification.
> It is designed to identify and categorize natural and urban scenes using the **MetaClip2ForImageClassification** architecture.
|
|
|
|
|
> [!note]
> MetaCLIP 2: A Worldwide Scaling Recipe: https://huggingface.co/papers/2507.22062
|
|
|
|
|
```
Classification Report:
              precision    recall  f1-score   support

   buildings     0.9644    0.9703    0.9673      2625
      forest     0.9948    0.9978    0.9963      2694
     glacier     0.9531    0.9427    0.9479      2671
    mountain     0.9470    0.9512    0.9491      2723
         sea     0.9909    0.9920    0.9915      2758
      street     0.9728    0.9694    0.9711      2874

    accuracy                         0.9706     16345
   macro avg     0.9705    0.9706    0.9705     16345
weighted avg     0.9706    0.9706    0.9706     16345
```
|
|
|
|
|
 |
|
|
|
|
|
The model classifies images into six open-scene categories: |
|
|
|
|
|
* **Class 0:** "buildings" |
|
|
* **Class 1:** "forest" |
|
|
* **Class 2:** "glacier" |
|
|
* **Class 3:** "mountain" |
|
|
* **Class 4:** "sea" |
|
|
* **Class 5:** "street" |
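
If the label mapping was exported with the checkpoint, the same class-index-to-label table can be read directly from the model configuration. A minimal sketch (assuming a `transformers` release with MetaCLIP 2 support; if the mapping was not saved, generic `LABEL_N` names may be returned instead):

```python
from transformers import AutoModelForImageClassification

# Load the fine-tuned classifier and inspect its class-index -> label mapping
model = AutoModelForImageClassification.from_pretrained("prithivMLmods/MetaCLIP-2-Open-Scene")

# Expected to mirror the list above, e.g. {0: 'buildings', 1: 'forest', ...},
# provided the mapping was stored in config.json
print(model.config.id2label)
```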
|
|
|
|
|
# **Run with Transformers** |
|
|
|
|
|
```python |
|
|
!pip install -q transformers torch pillow gradio |
|
|
``` |
|
|
|
|
|
```python
import gradio as gr
from transformers import AutoImageProcessor, AutoModelForImageClassification
from PIL import Image
import torch

# Load model and processor
model_name = "prithivMLmods/MetaCLIP-2-Open-Scene"
model = AutoModelForImageClassification.from_pretrained(model_name)
processor = AutoImageProcessor.from_pretrained(model_name)

def scene_classification(image):
    """Predicts the type of scene represented in an image."""
    image = Image.fromarray(image).convert("RGB")
    inputs = processor(images=image, return_tensors="pt")

    with torch.no_grad():
        outputs = model(**inputs)
        logits = outputs.logits
        probs = torch.nn.functional.softmax(logits, dim=1).squeeze().tolist()

    # Class-index -> label mapping for the six scene categories
    labels = {
        "0": "buildings",
        "1": "forest",
        "2": "glacier",
        "3": "mountain",
        "4": "sea",
        "5": "street"
    }
    predictions = {labels[str(i)]: round(probs[i], 3) for i in range(len(probs))}

    return predictions

# Create Gradio interface
iface = gr.Interface(
    fn=scene_classification,
    inputs=gr.Image(type="numpy"),
    outputs=gr.Label(label="Prediction Scores"),
    title="Open Scene Classification",
    description="Upload an image to classify the scene type (e.g., forest, sea, street, mountain, etc.)."
)

# Launch the app
if __name__ == "__main__":
    iface.launch()
```
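
For quick, non-interactive use, the same checkpoint can also be called through the `image-classification` pipeline instead of the Gradio app. A minimal sketch (the image path is a placeholder; a sufficiently recent `transformers` release with MetaCLIP 2 support is assumed):

```python
from transformers import pipeline

# Single-image inference without the Gradio UI
classifier = pipeline("image-classification", model="prithivMLmods/MetaCLIP-2-Open-Scene")

# "example_scene.jpg" is a placeholder; a local path or an image URL both work
for pred in classifier("example_scene.jpg", top_k=6):
    print(f"{pred['label']}: {pred['score']:.3f}")
```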
|
|
|
|
|
# **Sample Inference:** |
|
|
|
|
|
 |
|
|
 |
|
|
 |
|
|
 |
|
|
 |
|
|
|
|
|
# **Intended Use:** |
|
|
|
|
|
The **MetaCLIP-2-Open-Scene** model is designed to classify a wide range of natural and urban environments. |
|
|
Potential use cases include: |
|
|
|
|
|
* **Geographical Image Analysis:** Categorizing landscapes for environmental and mapping studies. |
|
|
* **Tourism and Travel Applications:** Automatically tagging scenic photos for organization and recommendations. |
|
|
* **Autonomous Systems:** Supporting navigation and perception in robotics and self-driving systems. |
|
|
* **Environmental Monitoring:** Detecting and classifying geographic features for research. |
|
|
* **Media and Photography:** Assisting in photo organization and content-based retrieval. |