BCE-Vir-Prediction
A viral epitope prediction tool based on ESM (Evolutionary Scale Modeling). It uses a pre-trained ESM classification model to run sliding-window predictions over protein sequences and identify potential antigen epitopes and functional domains.
Features
- Epitope Prediction (bcepre_predict_logits.py): Splits protein sequences into subsequences with sliding windows, classifies each subsequence with a pre-trained ESM classification model (e.g., whether it is an antigen epitope or a functional domain), and saves the predictions together with the corresponding logits.
- Amino Acid Probability Prediction (bcepre_predict_softmax.py): Converts the sliding-window predictions into probabilities aggregated by amino acid position and outputs a results table containing the amino acid type, epitope probability, and coverage count for each position.
Model
The pre-trained model can be downloaded from Hugging Face:
Model Repository: jackkuo/BCE-Vir-Prediction_model
Code Repository: JackKuo666/BCE-Vir-Prediction
Model Download Instructions
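This folder (trained_esm_model) stores the trained ESM model files. The download commands below save into the current directory, so run them from inside this folder.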
How to Download the Model
Method 1: Using Hugging Face Hub (Recommended)
Use the huggingface_hub library to download the model:
pip install huggingface_hub
Then run the following Python code:
from huggingface_hub import snapshot_download

# Download the model to the current folder
snapshot_download(
    repo_id="jackkuo/BCE-Vir-Prediction_model",
    local_dir="./",
    local_dir_use_symlinks=False,
)
Or use huggingface-cli in the command line:
huggingface-cli download jackkuo/BCE-Vir-Prediction_model --local-dir ./ --local-dir-use-symlinks False
Method 2: Using Git LFS
If Git LFS is installed, you can clone directly:
git lfs install
git clone https://huggingface.co/jackkuo/BCE-Vir-Prediction_model .
Method 3: Manual Download
Visit the model page: https://huggingface.co/jackkuo/BCE-Vir-Prediction_model
Download the files listed under Model File Structure from the model page and save them to this folder.
Model File Structure
After downloading, this folder should contain the following files:
- config.json: Model configuration file
- model.safetensors: Model weights file (safetensors format)
- tokenizer_config.json: Tokenizer configuration file
- vocab.txt: Vocabulary file
- special_tokens_map.json: Special tokens mapping file
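After downloading, a quick check like the one below can confirm that the expected files are in place. This is a minimal sketch and not part of the repository: the folder name trained_esm_model is taken from the Usage section, and the helper name `missing_model_files` is hypothetical.

```python
from pathlib import Path

# Files listed in the Model File Structure section
EXPECTED_FILES = [
    "config.json",
    "model.safetensors",
    "tokenizer_config.json",
    "vocab.txt",
    "special_tokens_map.json",
]


def missing_model_files(model_dir="trained_esm_model"):
    """Return the expected files that are absent from model_dir (hypothetical helper)."""
    folder = Path(model_dir)
    return [name for name in EXPECTED_FILES if not (folder / name).is_file()]


if __name__ == "__main__":
    missing = missing_model_files()
    if missing:
        print("Missing files:", ", ".join(missing))
    else:
        print("All model files found.")
```

An empty list means every listed file exists; it does not check that the files are complete or uncorrupted.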
Usage
Step 1: Download the Model
First, download the pre-trained model to the trained_esm_model folder.
Step 2: Prepare Input Files
Place the FASTA file containing the protein sequences to predict in the example_data folder, or change the input file path in the script.
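A FASTA file consists of records that each start with a `>` header line followed by one or more sequence lines. The sketch below shows one way to read such a file in plain Python; it is an illustration, not the repository's own reader, and the function name `read_fasta` is hypothetical.

```python
def read_fasta(path):
    """Parse a FASTA file into {record_id: sequence} (hypothetical helper)."""
    records = {}
    current_id = None
    chunks = []
    with open(path) as handle:
        for line in handle:
            line = line.strip()
            if not line:
                continue  # skip blank lines
            if line.startswith(">"):
                # Store the previous record before starting a new one
                if current_id is not None:
                    records[current_id] = "".join(chunks)
                # Use the first word of the header as the record ID
                parts = line[1:].split()
                current_id = parts[0] if parts else ""
                chunks = []  # lines before the first header are discarded here
            else:
                chunks.append(line.upper())
    if current_id is not None:
        records[current_id] = "".join(chunks)
    return records
```

For example, a file containing `>seq1` followed by the lines `MKTA` and `YIAK` is read as `{"seq1": "MKTAYIAK"}`, since sequence lines are joined per record.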
Step 3: Run Epitope Prediction
Run the bcepre_predict_logits.py script for epitope prediction:
python bcepre_predict_logits.py
This script will:
- Read the protein sequence file in FASTA format
- Split the sequence using sliding windows (default minimum window size is 5)
- Perform classification predictions on each subsequence
- Output a CSV file containing the following fields:
  - sequence: Subsequence
  - window_size: Window size
  - prediction: Predicted class
  - logit_0, logit_1, ...: Logit values for each class
Output files are saved in the predictions/ folder by default.
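The sliding-window split can be pictured as enumerating every contiguous subsequence whose length lies between a minimum and a maximum window size. The sketch below assumes a step of one residue and a hypothetical `max_size` upper bound (only the default minimum of 5 is documented here), so the actual script may choose windows differently. It also keeps each window's 1-based start position, which a per-position aggregation needs.

```python
def sliding_windows(sequence, min_size=5, max_size=None):
    """Yield (start, window_size, subsequence) for each window length in [min_size, max_size].

    start is 1-based. max_size is a hypothetical parameter that defaults to the
    full sequence length.
    """
    if max_size is None:
        max_size = len(sequence)
    for size in range(min_size, min(max_size, len(sequence)) + 1):
        # Slide a window of this length along the sequence one residue at a time
        for start in range(len(sequence) - size + 1):
            yield start + 1, size, sequence[start:start + size]


if __name__ == "__main__":
    for start, size, sub in sliding_windows("MKTAYIAK", min_size=5, max_size=6):
        print(start, size, sub)
```

For an 8-residue sequence with window lengths 5 and 6, this yields 4 + 3 = 7 subsequences; in general a window of length k produces L - k + 1 subsequences for a sequence of length L, and each one would be passed to the classifier.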
Step 4: Calculate Amino Acid Position Probabilities
Run the bcepre_predict_softmax.py script to convert prediction results into aggregated probabilities by amino acid position:
python bcepre_predict_softmax.py
This script will:
- Read the CSV file generated by bcepre_predict_logits.py
- Calculate the epitope probability for each subsequence (using the softmax function)
- Aggregate probability values by amino acid position
- Output a CSV file containing the following fields:
  - position: Amino acid position (starting from 1)
  - amino_acid: Amino acid type
  - probability: Epitope probability at this position (average of all window predictions covering this position)
  - coverage: Number of windows covering this position
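The softmax and position-aggregation steps can be sketched as follows. This is an illustration under stated assumptions, not the script's code: it assumes a two-class model in which class 1 (`logit_1`) means "epitope", and it takes each window's 1-based start position as given (the Step 3 CSV lists `sequence` and `window_size` but no start column, so the real script may recover positions differently). The helper names are hypothetical.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def aggregate_by_position(sequence, windows, epitope_class=1):
    """Average per-window epitope probabilities onto each residue (hypothetical helper).

    windows: iterable of (start, subsequence, logits) with a 1-based start.
    epitope_class=1 is an assumption about which logit means "epitope".
    """
    prob_sum = [0.0] * len(sequence)
    coverage = [0] * len(sequence)
    for start, subsequence, logits in windows:
        p_epitope = softmax(logits)[epitope_class]
        # Every residue inside the window receives that window's probability
        for offset in range(len(subsequence)):
            index = start - 1 + offset
            prob_sum[index] += p_epitope
            coverage[index] += 1
    rows = []
    for index, amino_acid in enumerate(sequence):
        # Residues covered by no window get probability 0.0 (an assumption)
        probability = prob_sum[index] / coverage[index] if coverage[index] else 0.0
        rows.append({
            "position": index + 1,
            "amino_acid": amino_acid,
            "probability": probability,
            "coverage": coverage[index],
        })
    return rows
```

For instance, with sequence `MKTAYI` and two windows, `MKTAY` at position 1 with logits `[0, 0]` (epitope probability 0.5) and `KTAYI` at position 2 with logits `[0, ln 3]` (epitope probability 0.75), positions 2 to 5 have coverage 2 and probability 0.625, while positions 1 and 6 keep the single-window values 0.5 and 0.75. Averaging matches the description of the probability field; a maximum over covering windows would be an alternative that favors short, high-scoring windows.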
License
This project is licensed under the MIT License. See the LICENSE file for details.
Citation
If you use this tool for research, please cite the relevant models and code repositories.
Contact
For questions or suggestions, please contact us through GitHub Issues.