linh101201 committed · verified
Commit f284577 · 1 Parent(s): b4e1e67

Update README.md

Files changed (1):
  1. README.md +47 -4

README.md CHANGED
@@ -1,11 +1,54 @@
+ ---
+ language: en
+ license: apache-2.0
+ library_name: transformers
+ tags:
+ - scibert
+ - concept-annotation
+ - nlp
+ - sequence-classification
+
+ metrics:
+ - accuracy
+ pipeline_tag: text-classification
+ ---
+
+ # SciBERT Concept Annotation
+
+ This model is a fine-tuned version of SciBERT for **Concept Annotation**. It classifies the relationship between a piece of document text and a specific concept/term using sequence classification.
+
+ ## Model Description
+ - **Model type:** SciBERT (BERT-based)
+ - **Language(s):** English
+ - **License:** Apache 2.0
+ - **Fine-tuned from model:** `allenai/scibert_scivocab_uncased`
+
+ ## Usage
+
+ You can use this model directly with a custom inference script. Note that while the model weights are hosted here, the model is designed to work with the `allenai/scibert_scivocab_uncased` tokenizer.
+
+ ### Example Code
+
+ ```python
  from transformers import AutoModelForSequenceClassification, AutoTokenizer
  import torch

- model = AutoModelForSequenceClassification.from_pretrained("linh101201/scibert-concept-annotation", num_labels=2).to("cuda")
- tokenizer = AutoTokenizer.from_pretrained("allenai/scibert_scivocab_uncased")
+ # Load model and tokenizer
+ model_id = "linh101201/scibert-concept-annotation"
+ tokenizer_id = "allenai/scibert_scivocab_uncased"
+
+ model = AutoModelForSequenceClassification.from_pretrained(model_id, num_labels=2).to("cuda")
+ tokenizer = AutoTokenizer.from_pretrained(tokenizer_id)
+
+ # Example inputs: the document text and the concept to annotate
+ text = "Large Language Model in Law Documents Hub"
+ concept = "natural language processing"

- inputs = tokenizer("Large Language Model in Law Documents Hub", "natural language processing", return_tensors="pt").to("cuda")
+ inputs = tokenizer(text, concept, return_tensors="pt").to("cuda")

  with torch.no_grad():
      logits = model(**inputs).logits
- print(logits)
+ # Apply softmax to get probabilities
+ probs = torch.nn.functional.softmax(logits, dim=-1)
+ print(f"Logits: {logits}")
+ print(f"Probabilities: {probs}")
+ ```
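
The card's example assumes a CUDA device and stops at raw probabilities for a single text/concept pair. Below is a minimal sketch that extends it: it falls back to CPU when no GPU is available and scores several candidate concepts against the same text in one batch. The concept list is invented for illustration, and because the card publishes no `id2label` mapping, reading class index 1 as "the concept applies" is an assumption.

```python
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

# Fall back to CPU so the sketch also runs on machines without CUDA.
device = "cuda" if torch.cuda.is_available() else "cpu"

model = AutoModelForSequenceClassification.from_pretrained(
    "linh101201/scibert-concept-annotation", num_labels=2
).to(device)
model.eval()
tokenizer = AutoTokenizer.from_pretrained("allenai/scibert_scivocab_uncased")

# Score several candidate concepts against the same document text in one batch.
text = "Large Language Model in Law Documents Hub"
concepts = ["natural language processing", "computer vision", "information retrieval"]  # hypothetical candidates

inputs = tokenizer(
    [text] * len(concepts),  # first sequence: the document text, repeated per row
    concepts,                # second sequence: one candidate concept per row
    return_tensors="pt",
    padding=True,
    truncation=True,
).to(device)

with torch.no_grad():
    probs = torch.nn.functional.softmax(model(**inputs).logits, dim=-1)

# Assumption: no id2label is published, so index 1 is read here as
# "the concept applies to the text".
for concept, p in zip(concepts, probs):
    print(f"{concept}: {p[1].item():.3f}")
```

Since the tokenizer lives in a separate repository, the high-level `pipeline("text-classification", ...)` helper should also work here, provided the SciBERT tokenizer is passed explicitly via its `tokenizer` argument.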