How to use NeuML/pubmedbert-base-embeddings with sentence-transformers:

    from sentence_transformers import SentenceTransformer

    model = SentenceTransformer("NeuML/pubmedbert-base-embeddings")

    sentences = [
        "That is a happy person",
        "That is a happy dog",
        "That is a very happy person",
        "Today is a sunny day"
    ]
    # One embedding vector per sentence
    embeddings = model.encode(sentences)

    # Similarity between every pair of sentences
    similarities = model.similarity(embeddings, embeddings)
    print(similarities.shape)
    # [4, 4]
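`model.similarity` returns a 4x4 matrix because it compares every sentence with every other sentence. Assuming the model is configured with cosine similarity (the sentence-transformers default), the computation amounts to the pure-Python sketch below. The helper name is hypothetical, it is not the library's implementation, and the toy vectors are far shorter than real embeddings.

```python
import math


def cosine_similarity_matrix(embeddings):
    """Return the pairwise cosine similarity between every pair of vectors.

    Illustrative stand-in for model.similarity, assuming cosine similarity;
    all vectors are assumed to be non-zero.
    """
    norms = [math.sqrt(sum(x * x for x in vec)) for vec in embeddings]
    return [
        [
            sum(a * b for a, b in zip(u, v)) / (norm_u * norm_v)
            for v, norm_v in zip(embeddings, norms)
        ]
        for u, norm_u in zip(embeddings, norms)
    ]


# Toy 3-dimensional vectors standing in for real sentence embeddings
toy_embeddings = [[1.0, 0.0, 0.0], [1.0, 1.0, 0.0], [0.0, 0.0, 2.0]]
sims = cosine_similarity_matrix(toy_embeddings)
print(len(sims), len(sims[0]))  # 3 3
```

The diagonal is always 1 (each vector compared with itself) and the matrix is symmetric, which is why an N-sentence input gives an N x N result.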
How to use NeuML/pubmedbert-base-embeddings with Transformers:

    # Load model directly
    from transformers import AutoTokenizer, AutoModel

    tokenizer = AutoTokenizer.from_pretrained("NeuML/pubmedbert-base-embeddings")
    model = AutoModel.from_pretrained("NeuML/pubmedbert-base-embeddings")
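The Transformers snippet stops after loading the tokenizer and encoder, which return one vector per token (`last_hidden_state`) rather than one per sentence. The sketch below assumes the sentence embedding is the mean of the non-padding token vectors; check the model card to confirm which pooling the model was trained with. It uses plain lists so the arithmetic is visible; with the real model the same operation would be applied to `outputs.last_hidden_state` and the tokenizer's `attention_mask`. The function name is hypothetical.

```python
def mean_pooling(token_embeddings, attention_mask):
    """Average each sequence's token vectors, skipping padding tokens.

    token_embeddings: batch x seq_len x hidden (nested lists of floats)
    attention_mask:   batch x seq_len (1 = real token, 0 = padding)
    """
    pooled = []
    for tokens, mask in zip(token_embeddings, attention_mask):
        totals = [0.0] * len(tokens[0])
        count = 0
        for vector, keep in zip(tokens, mask):
            if keep:
                totals = [t + x for t, x in zip(totals, vector)]
                count += 1
        # max(count, 1) avoids dividing by zero for an all-padding sequence
        pooled.append([t / max(count, 1) for t in totals])
    return pooled


# One sequence of three tokens; the last one is padding and must be ignored
tokens = [[[1.0, 2.0], [3.0, 4.0], [100.0, 100.0]]]
mask = [[1, 1, 0]]
print(mean_pooling(tokens, mask))  # [[2.0, 3.0]]
```

Masking matters because padded positions still produce hidden states; averaging them in would make the embedding depend on how long the other sentences in the batch were.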
Training Data
Hi!
I love the model and am using it for my PhD. Would it be possible for you to share the training dataset? I would like to train a ModernBERT model with a larger context window using the same objective.
Thanks!
Jonatan
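Which objective the original model was trained with is not stated here. For title/abstract pairs, a common choice in sentence-transformers is `MultipleNegativesRankingLoss`, which treats the other positives in a batch as negatives. Assuming that objective, with its default cosine similarity and `scale=20`, the loss for one batch can be sketched in plain Python as follows. This illustrates the technique only; the function names are hypothetical and it is not the library's code.

```python
import math


def cosine(u, v):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def in_batch_negatives_loss(anchors, positives, scale=20.0):
    """Mean cross-entropy where anchor i must rank positive i highest.

    Every other positive in the batch serves as a negative for anchor i.
    """
    total = 0.0
    for i, anchor in enumerate(anchors):
        logits = [scale * cosine(anchor, p) for p in positives]
        # Numerically stable log-sum-exp over the batch
        peak = max(logits)
        log_sum_exp = peak + math.log(sum(math.exp(z - peak) for z in logits))
        total += log_sum_exp - logits[i]
    return total / len(anchors)


# Toy title/abstract embeddings: matched pairs vs. deliberately swapped pairs
titles = [[1.0, 0.0], [0.0, 1.0]]
matched = in_batch_negatives_loss(titles, [[1.0, 0.0], [0.0, 1.0]])
swapped = in_batch_negatives_loss(titles, [[0.0, 1.0], [1.0, 0.0]])
print(matched < swapped)  # True
```

Because the negatives come for free from the batch, larger batches give each anchor more negatives to rank against, which is one reason batch size is usually treated as an important hyperparameter for this kind of objective.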
Thank you, I appreciate it!
The dataset is just a random sample of PubMed title/abstract pairs, so I don't think it would be hard to reproduce, and it could probably even be improved with good dataset engineering, analysis, and parameter tuning. Then, for each randomly selected article, a similar title is found. PaperETL can handle all of the PubMed article processing.
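The sampling step described above might look roughly like the sketch below, under heavy assumptions. The article records are assumed to be already parsed (PaperETL is suggested for that, but no PaperETL API is used here), the record keys, sample size, and seed are hypothetical, and how a "similar title" is actually found, and how the two kinds of pairs are combined, is not specified, so the word-overlap (Jaccard) match is only a stand-in.

```python
import random


def title_tokens(text):
    """Lower-cased word set; a deliberately crude tokenizer for illustration."""
    return set(text.lower().split())


def jaccard(a, b):
    """Token-set overlap in [0, 1]."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def build_training_pairs(articles, sample_size, seed=0):
    """Sample articles and emit (title, abstract) and (title, similar title) pairs.

    articles: list of dicts with hypothetical "title" and "abstract" keys.
    """
    rng = random.Random(seed)
    pairs = []
    for article in rng.sample(articles, sample_size):
        pairs.append((article["title"], article["abstract"]))
        query = title_tokens(article["title"])
        others = [a for a in articles if a is not article]
        if others:
            # Most similar other title by word overlap (hypothetical stand-in)
            best = max(others, key=lambda a: jaccard(query, title_tokens(a["title"])))
            pairs.append((article["title"], best["title"]))
    return pairs


articles = [
    {"title": "insulin resistance in type 2 diabetes", "abstract": "Abstract A"},
    {"title": "type 2 diabetes and insulin therapy", "abstract": "Abstract B"},
    {"title": "deep learning for protein folding", "abstract": "Abstract C"},
    {"title": "protein folding with deep networks", "abstract": "Abstract D"},
]
for pair in build_training_pairs(articles, sample_size=2, seed=42):
    print(pair)
```

Comparing each sampled title against every article is quadratic in the corpus size, so at PubMed scale the similar-title lookup would need an index (for example, an embedding or inverted-index search) rather than this linear scan.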
There is also another model that uses a fine-tuned ModernBERT model as its base: https://huggingface.co/NeuML/bioclinical-modernbert-base-embeddings
Perfect! In that case I'll just use that model instead.
Thanks!