Upload folder using huggingface_hub

Files changed (8)

README.md +88 -0
config.json +24 -0
model.safetensors +3 -0
preprocessor_config.json +23 -0
special_tokens_map.json +23 -0
spiece.model +3 -0
tokenizer.json +0 -0
tokenizer_config.json +33 -0

README.md ADDED

	@@ -0,0 +1,88 @@

+---
+license: apache-2.0
+tags:
+- Vision
+- Multi-modal
+- Vision-Language
+- Remote-sensing
+widget:
+- src: >-
+    https://huggingface.co/datasets/mishig/sample_images/resolve/main/cat-dog-music.png
+  candidate_labels: playing music, playing sports
+  example_title: Cat & Dog
+---
+# Git-RSCLIP
+Git-RSCLIP is pre-trained on the Git-10M dataset, a global-scale remote sensing dataset of 10 million image-text pairs, at an image size of 256x256. It was first released in [this repository](https://github.com/chen-yang-liu/Text2Earth) and uses an architecture similar to Google's [SigLIP](https://arxiv.org/abs/2303.15343).
+## Intended uses & limitations
+You can use the raw model for tasks like zero-shot image classification and image-text retrieval.
+### How to use
+Here is how to use this model to perform zero-shot image classification:
+```python
+from PIL import Image
+import requests
+from transformers import AutoProcessor, AutoModel
+import torch
+model = AutoModel.from_pretrained("lcybuaa/Git-RSCLIP")
+processor = AutoProcessor.from_pretrained("lcybuaa/Git-RSCLIP")
+url = "https://github.com/Chen-Yang-Liu/PromptCC/blob/main/Example/B/train_000051.png?raw=true"
+image = Image.open(requests.get(url, stream=True).raw)
+texts = ["a remote sensing image of river", "a remote sensing image of houses and roads"]
+inputs = processor(text=texts, images=image, padding="max_length", return_tensors="pt")
+with torch.no_grad():
+    outputs = model(**inputs)
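+logits_per_image = outputs.logits_per_image
+probs = torch.sigmoid(logits_per_image)  # independent per-text probabilities (sigmoid, not softmax)
+# Rank the candidate texts for each image, best match first (at most five are kept)
+top5_indices = torch.argsort(probs, descending=True)[:, :5].cpu().numpy()
+top1_indices = top5_indices[:, 0]
+print(f"the image 0 is '{texts[int(top1_indices[0])]}'")  # print the label text, not its index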
+```
+For more code examples, see the [SigLIP documentation](https://huggingface.co/transformers/main/model_doc/siglip.html#).
+## Training procedure
+### Training data
+Git-RSCLIP is pre-trained on Git-10M, a global-scale remote sensing dataset of 10 million image-text pairs [(Liu et al., 2025)](https://github.com/chen-yang-liu/Text2Earth).
+### Preprocessing
+Images are resized to 256x256, rescaled to the [0, 1] range, and normalized across the RGB channels with mean (0.5, 0.5, 0.5) and standard deviation (0.5, 0.5, 0.5).
+Texts are tokenized and padded to the same length (64 tokens).
+## Evaluation results
+The evaluation of Git-RSCLIP against other CLIP-style models is shown below (taken from the paper).
+<img src="https://huggingface.co/lcybuaa/Git-RSCLIP/resolve/main/Git-RSCLIP.png"
+alt="drawing" width="600"/>
+### BibTeX entry and citation info
+```bibtex
+@misc{liu2025text2earthunlockingtextdrivenremote,
+      title={Text2Earth: Unlocking Text-driven Remote Sensing Image Generation with a Global-Scale Dataset and a Foundation Model},
+      author={Chenyang Liu and Keyan Chen and Rui Zhao and Zhengxia Zou and Zhenwei Shi},
+      year={2025},
+      eprint={2501.00895},
+      archivePrefix={arXiv},
+      primaryClass={cs.CV},
+      url={https://arxiv.org/abs/2501.00895},
+}
+```
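
The README's usage snippet applies `torch.sigmoid` to `logits_per_image` rather than a softmax. The sketch below, using only the standard library and made-up logits (the values are an assumption for illustration, not model output), shows the practical difference: sigmoid scores each text independently, so the scores need not sum to 1, while the ranking of texts is the same under both.

```python
import math


def sigmoid(x: float) -> float:
    """Map a single logit to an independent probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def softmax(xs: list[float]) -> list[float]:
    """Map a list of logits to probabilities that sum to 1."""
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical logits for one image against three candidate texts.
logits = [2.0, -1.0, -3.0]

sigmoid_probs = [sigmoid(x) for x in logits]
softmax_probs = softmax(logits)

# Both functions are monotonic in the logit, so the best match is the same.
best_by_sigmoid = max(range(len(logits)), key=lambda i: sigmoid_probs[i])
best_by_softmax = max(range(len(logits)), key=lambda i: softmax_probs[i])

print("sigmoid:", sigmoid_probs, "sum =", sum(sigmoid_probs))
print("softmax:", softmax_probs, "sum =", sum(softmax_probs))
```

With sigmoid scores, an image can score low against every candidate text, which a softmax over the same candidates cannot express.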

config.json ADDED

	@@ -0,0 +1,24 @@

+{
+  "architectures": [
+    "SiglipModel"
+  ],
+  "initializer_factor": 1.0,
+  "model_type": "siglip",
+  "text_config": {
+    "hidden_size": 1024,
+    "intermediate_size": 4096,
+    "model_type": "siglip_text_model",
+    "num_attention_heads": 16,
+    "num_hidden_layers": 24
+  },
+  "torch_dtype": "float32",
+  "transformers_version": "4.37.0.dev0",
+  "vision_config": {
+    "hidden_size": 1024,
+    "image_size": 256,
+    "intermediate_size": 4096,
+    "model_type": "siglip_vision_model",
+    "num_attention_heads": 16,
+    "num_hidden_layers": 24
+  }
+}
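
As a quick sanity check on the values above, the following sketch parses a copy of this config with the standard library and derives the per-head dimension of each tower. The helper name `head_dim` is hypothetical, and the even-split assumption (hidden size divided equally among heads) is the usual transformer convention rather than something stated in this commit.

```python
import json

# Copy of the config.json added in this commit.
config_text = """
{
  "architectures": ["SiglipModel"],
  "initializer_factor": 1.0,
  "model_type": "siglip",
  "text_config": {
    "hidden_size": 1024,
    "intermediate_size": 4096,
    "model_type": "siglip_text_model",
    "num_attention_heads": 16,
    "num_hidden_layers": 24
  },
  "torch_dtype": "float32",
  "transformers_version": "4.37.0.dev0",
  "vision_config": {
    "hidden_size": 1024,
    "image_size": 256,
    "intermediate_size": 4096,
    "model_type": "siglip_vision_model",
    "num_attention_heads": 16,
    "num_hidden_layers": 24
  }
}
"""

config = json.loads(config_text)


def head_dim(tower: dict) -> int:
    """Per-head dimension, assuming hidden_size is split evenly across heads."""
    hidden, heads = tower["hidden_size"], tower["num_attention_heads"]
    if hidden % heads != 0:
        raise ValueError("hidden_size must be divisible by num_attention_heads")
    return hidden // heads


for name in ("text_config", "vision_config"):
    tower = config[name]
    print(name, "layers:", tower["num_hidden_layers"], "head_dim:", head_dim(tower))
```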

model.safetensors ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:3cfce8058d573fa2dae91e0b872e2af724ae9337a923ff8b674a2aed4bd92750
+size 1304397540
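
The three lines above are a Git LFS pointer, not the weights themselves: they record the spec version, the SHA-256 of the real file, and its size in bytes. A minimal sketch of parsing such a pointer and checking downloaded bytes against it follows; the function names `parse_lfs_pointer` and `matches_pointer` are hypothetical helpers, not part of any library.

```python
import hashlib

# Pointer text as it appears in this commit.
pointer_text = """\
version https://git-lfs.github.com/spec/v1
oid sha256:3cfce8058d573fa2dae91e0b872e2af724ae9337a923ff8b674a2aed4bd92750
size 1304397540
"""


def parse_lfs_pointer(text: str) -> dict:
    """Split each 'key value' line of a Git LFS pointer into a dict."""
    fields = dict(line.split(" ", 1) for line in text.strip().splitlines())
    algo, digest = fields["oid"].split(":", 1)
    return {
        "version": fields["version"],
        "algo": algo,
        "digest": digest,
        "size": int(fields["size"]),
    }


def matches_pointer(data: bytes, pointer: dict) -> bool:
    """Check that the bytes have the size and hash recorded in the pointer."""
    if len(data) != pointer["size"]:
        return False
    return hashlib.new(pointer["algo"], data).hexdigest() == pointer["digest"]


pointer = parse_lfs_pointer(pointer_text)
print(pointer["algo"], pointer["size"])
```

For a file of this size (about 1.3 GB), hashing in fixed-size chunks read from disk would be preferable to holding the whole file in memory; the in-memory version is kept here for brevity.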

preprocessor_config.json ADDED

	@@ -0,0 +1,23 @@

+{
+  "do_normalize": true,
+  "do_rescale": true,
+  "do_resize": true,
+  "image_mean": [
+    0.5,
+    0.5,
+    0.5
+  ],
+  "image_processor_type": "SiglipImageProcessor",
+  "image_std": [
+    0.5,
+    0.5,
+    0.5
+  ],
+  "processor_class": "SiglipProcessor",
+  "resample": 3,
+  "rescale_factor": 0.00392156862745098,
+  "size": {
+    "height": 256,
+    "width": 256
+  }
+}
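
To make the numbers in this file concrete, the sketch below applies the rescale and normalize steps to individual 8-bit pixel values using plain Python. It covers only the arithmetic (`rescale_factor` is 1/255, then mean 0.5 and standard deviation 0.5 per channel) and omits the resize step; the function name `normalize_pixel` is hypothetical.

```python
RESCALE_FACTOR = 0.00392156862745098  # 1 / 255, from "rescale_factor"
IMAGE_MEAN = 0.5                      # same value for R, G, and B
IMAGE_STD = 0.5                       # same value for R, G, and B


def normalize_pixel(value: int) -> float:
    """Rescale an 8-bit channel value to [0, 1], then normalize to about [-1, 1]."""
    rescaled = value * RESCALE_FACTOR
    return (rescaled - IMAGE_MEAN) / IMAGE_STD


# One row of a hypothetical single-channel image.
row = [0, 64, 128, 255]
print([round(normalize_pixel(v), 4) for v in row])
```

With these settings, channel value 0 maps to -1 and 255 maps to 1 (up to floating-point rounding), so the model sees inputs centered on zero.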

special_tokens_map.json ADDED

	@@ -0,0 +1,23 @@

+{
+  "eos_token": {
+    "content": "</s>",
+    "lstrip": true,
+    "normalized": false,
+    "rstrip": true,
+    "single_word": false
+  },
+  "pad_token": {
+    "content": "</s>",
+    "lstrip": true,
+    "normalized": false,
+    "rstrip": true,
+    "single_word": false
+  },
+  "unk_token": {
+    "content": "<unk>",
+    "lstrip": true,
+    "normalized": false,
+    "rstrip": true,
+    "single_word": false
+  }
+}

spiece.model ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:1e5036bed065526c3c212dfbe288752391797c4bb1a284aa18c9a0b23fcaf8ec
+size 798330

tokenizer.json ADDED

The diff for this file is too large to render. See raw diff

tokenizer_config.json ADDED

	@@ -0,0 +1,33 @@

+{
+  "added_tokens_decoder": {
+    "1": {
+      "content": "</s>",
+      "lstrip": true,
+      "normalized": false,
+      "rstrip": true,
+      "single_word": false,
+      "special": true
+    },
+    "2": {
+      "content": "<unk>",
+      "lstrip": true,
+      "normalized": false,
+      "rstrip": true,
+      "single_word": false,
+      "special": true
+    }
+  },
+  "additional_special_tokens": [],
+  "clean_up_tokenization_spaces": true,
+  "do_lower_case": true,
+  "eos_token": "</s>",
+  "model_input_names": [
+    "input_ids"
+  ],
+  "model_max_length": 64,
+  "pad_token": "</s>",
+  "processor_class": "SiglipProcessor",
+  "sp_model_kwargs": {},
+  "tokenizer_class": "SiglipTokenizer",
+  "unk_token": "<unk>"
+}
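
This config sets `model_max_length` to 64 and reuses `</s>` (id 1 in `added_tokens_decoder`) as the pad token, which is why the README calls the processor with `padding="max_length"`. The sketch below illustrates fixed-length padding on made-up token ids. It is a simplified stand-in, not the `SiglipTokenizer` implementation: the input ids are hypothetical, and how the real tokenizer handles the end-of-sequence token and truncation is not modeled here.

```python
MAX_LENGTH = 64  # "model_max_length" in tokenizer_config.json
PAD_ID = 1       # id of "</s>", which this config also uses as pad_token


def pad_or_truncate(ids: list[int], max_length: int = MAX_LENGTH, pad_id: int = PAD_ID) -> list[int]:
    """Return exactly max_length ids: cut long inputs, right-pad short ones."""
    clipped = list(ids)[:max_length]
    return clipped + [pad_id] * (max_length - len(clipped))


# Hypothetical token ids for a short caption (not real vocabulary ids).
short_ids = [262, 6595, 25789, 1023]
padded = pad_or_truncate(short_ids)

print(len(padded), padded[:6])
```

Padding every text to the same length gives each batch a fixed shape that matches the 64-token setup described in the README.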