---
license: mit
tags:
- protein-generation
- antimicrobial-peptides
- flow-matching
- protein-design
- esm
- amp
library_name: pytorch
---

# FlowFinal: AMP Flow Matching Model

FlowFinal is a flow matching model for generating antimicrobial peptides (AMPs). It learns a continuous normalizing flow that generates peptides in a compressed ESM-2 embedding space. The generated embeddings are then decoded back into amino-acid sequences.

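The sketch below illustrates what flow matching trains by building a single training example under the common linear (rectified-flow style) probability path. Noise `x0` and a data embedding `x1` are interpolated at a random time `t`, and the network is regressed onto the velocity `x1 - x0`. The linear path and the helper names `interpolate`, `target_velocity` and `flow_matching_loss` are assumptions for illustration. The card does not specify which path or loss weighting FlowFinal uses.

```python
import random
from typing import Callable, List

Vector = List[float]


def interpolate(x0: Vector, x1: Vector, t: float) -> Vector:
    """Point on the ASSUMED linear path: x_t = (1 - t) * x0 + t * x1."""
    return [(1.0 - t) * a + t * b for a, b in zip(x0, x1)]


def target_velocity(x0: Vector, x1: Vector) -> Vector:
    """Velocity of the linear path, d x_t / dt = x1 - x0 (independent of t)."""
    return [b - a for a, b in zip(x0, x1)]


def flow_matching_loss(
    velocity_fn: Callable[[Vector, float], Vector],
    x1: Vector,
    rng: random.Random,
) -> float:
    """Mean squared error between predicted and target velocity for one sample."""
    x0 = [rng.gauss(0.0, 1.0) for _ in x1]  # Gaussian noise source
    t = rng.random()                        # t ~ U[0, 1)
    x_t = interpolate(x0, x1, t)
    pred = velocity_fn(x_t, t)
    target = target_velocity(x0, x1)
    return sum((p - u) ** 2 for p, u in zip(pred, target)) / len(x1)


if __name__ == "__main__":
    rng = random.Random(0)
    x1 = [0.5] * 80  # hypothetical stand-in for one 80-dim compressed embedding vector
    zero_model = lambda x, t: [0.0] * len(x)  # untrained "model" predicting zero velocity
    print(flow_matching_loss(zero_model, x1, rng))
```

In the real model the velocity network would also receive the conditioning signal used for classifier-free guidance.
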
## Model Description

- **Model Type**: Flow Matching for Protein Generation
- **Domain**: Antimicrobial Peptide (AMP) Generation
- **Base Model**: ESM-2 (650M parameters)
- **Architecture**: Transformer-based flow matching with classifier-free guidance (CFG)
- **Training Data**: Curated AMP dataset with ~7K sequences

## Key Features

- **Classifier-Free Guidance (CFG)**: Enables controlled generation with adjustable conditioning strength
- **ESM-2 Integration**: Leverages pre-trained protein language model embeddings
- **Compression Architecture**: 16x compression of ESM-2 embeddings (1280 → 80 dimensions)
- **Multiple CFG Scales**: Supports no conditioning (0.0), weak (3.0), strong (7.5), and very strong (15.0) guidance

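Classifier-free guidance is commonly applied to a flow model by evaluating the velocity network twice, with and without the conditioning signal, and extrapolating between the two predictions: `v = v_uncond + w * (v_cond - v_uncond)`. Under this convention a scale of 0.0 reduces to the unconditional model, which matches the "no conditioning (0.0)" setting above. The sketch assumes this convention. Some codebases use `1 + w` instead, and the card does not say which one FlowFinal implements.

```python
from typing import List

Vector = List[float]


def cfg_velocity(v_cond: Vector, v_uncond: Vector, scale: float) -> Vector:
    """Classifier-free guidance: v_uncond + scale * (v_cond - v_uncond).

    ASSUMED convention:
    scale = 0.0 -> purely unconditional prediction
    scale = 1.0 -> purely conditional prediction
    scale > 1.0 -> extrapolate further toward the condition
    """
    return [u + scale * (c - u) for c, u in zip(v_cond, v_uncond)]


if __name__ == "__main__":
    v_cond, v_uncond = [1.0, 0.2], [0.5, 0.4]  # hypothetical network outputs
    for scale in (0.0, 3.0, 7.5, 15.0):        # the four scales listed above
        print(scale, cfg_velocity(v_cond, v_uncond, scale))
```

Large scales amplify the difference between the two predictions. Excessive extrapolation is one plausible reason why very strong guidance can degrade sample quality.
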
## Model Components

### Core Architecture

- `final_flow_model.py`: Main flow matching model implementation
- `compressor_with_embeddings.py`: Embedding compression/decompression modules
- `final_sequence_decoder.py`: ESM-2 embedding to sequence decoder

### Trained Weights

- `final_compressor_model.pth`: Trained compressor (315MB)
- `final_decompressor_model.pth`: Trained decompressor (158MB)
- `amp_flow_model_final_optimized.pth`: Main flow model checkpoint

### Generated Samples (2025-08-29 Run)

- Generated AMP sequences at each of the four CFG scales
- HMD-AMP validation results: 8.8% of generated sequences predicted as AMPs

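The generated sequences ship as FASTA (`generated_samples/generated_sequences_20250829.fasta` in the file structure listed later in this card). A minimal standard-library reader is sketched here. It assumes nothing about the header format beyond the standard `>` prefix. The function name `read_fasta` is hypothetical and is not part of the FlowFinal code.

```python
from typing import Iterable, Iterator, List, Optional, Tuple


def read_fasta(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (header, sequence) pairs from FASTA-formatted lines.

    Sequences may be wrapped over several lines; blank lines are ignored.
    """
    header: Optional[str] = None
    chunks: List[str] = []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:].strip(), []
        else:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


if __name__ == "__main__":
    # With the generated file (path taken from the file structure):
    # with open("generated_samples/generated_sequences_20250829.fasta") as fh:
    #     for name, seq in read_fasta(fh):
    #         print(name, len(seq))
    demo = [">seq_1", "ILVLVLARRIVGVIVAKVVLYAIVRSVVAA", "AKSISAVTVAKVTVFFQTTA"]  # hypothetical header
    print(list(read_fasta(demo)))
```
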
## Performance Results

### HMD-AMP Validation (80 sequences tested)

- **Total AMPs Predicted**: 7/80 (8.8%)
- **By CFG Configuration** (20 sequences each):
  - No CFG (scale 0.0): 1/20 (5.0%)
  - Weak CFG (scale 3.0): 2/20 (10.0%)
  - Strong CFG (scale 7.5): 4/20 (20.0%), best performance
  - Very Strong CFG (scale 15.0): 0/20 (0.0%)

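The rates above are simple counts over the per-sequence results. The sketch below recomputes them from CSV rows. The column names `cfg_config` and `is_amp` (with values `1`/`0`) are assumptions, because the card does not document the schema of `hmd_amp_detailed_results.csv`. The helper `amp_rates` is likewise hypothetical.

```python
import csv
import io
from collections import defaultdict
from typing import Dict, Iterable, Tuple


def amp_rates(rows: Iterable[Dict[str, str]]) -> Dict[str, Tuple[int, int, float]]:
    """Return {config: (n_amp, n_total, percent)} plus an overall 'ALL' entry.

    ASSUMES hypothetical columns 'cfg_config' and 'is_amp' ('1' or '0').
    """
    hits: Dict[str, int] = defaultdict(int)
    totals: Dict[str, int] = defaultdict(int)
    for row in rows:
        for key in (row["cfg_config"], "ALL"):
            totals[key] += 1
            hits[key] += int(row["is_amp"])
    return {k: (hits[k], totals[k], 100.0 * hits[k] / totals[k]) for k in totals}


if __name__ == "__main__":
    # Synthetic rows reproducing the reported counts (1, 2, 4, 0 hits out of 20 each).
    buf = io.StringIO()
    writer = csv.writer(buf)
    writer.writerow(["cfg_config", "is_amp"])
    for config, n_hits in [("no_cfg", 1), ("weak", 2), ("strong", 4), ("very_strong", 0)]:
        for i in range(20):
            writer.writerow([config, int(i < n_hits)])
    buf.seek(0)
    for config, (n, total, pct) in amp_rates(csv.DictReader(buf)).items():
        print(f"{config}: {n}/{total} ({pct:.1f}%)")
```

With 20 sequences per configuration, a single extra hit moves a rate by 5 percentage points. The per-configuration differences are therefore best read as indicative rather than conclusive.
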
### Best Performing Sequences

1. `ILVLVLARRIVGVIVAKVVLYAIVRSVVAAAKSISAVTVAKVTVFFQTTA` (No CFG)
2. `EDLSKAKAELQRYLLLSEIVSAFTALTRFYVVLTKIFQIRVKLIAVGQIL` (Weak CFG)
3. `IKLSRIAGIIVKRIRVASGDAQRLITASIGFTLSVVLAARFITIILGIVI` (Strong CFG)

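Net cationic charge is one of the first properties usually inspected for candidate AMPs. The rough estimate below counts Lys/Arg as +1 and Asp/Glu as −1 at neutral pH. It ignores His, the termini and context-dependent pKa shifts. This is a quick sanity check and not part of the FlowFinal or HMD-AMP pipeline. `net_charge` is a hypothetical helper.

```python
POSITIVE = frozenset("KR")  # Lys, Arg: roughly +1 at pH 7
NEGATIVE = frozenset("DE")  # Asp, Glu: roughly -1 at pH 7


def net_charge(seq: str) -> int:
    """Crude net charge at neutral pH: (#K + #R) - (#D + #E).

    Ignores His, N/C-termini and sequence-context pKa effects.
    """
    seq = seq.upper()
    return sum(aa in POSITIVE for aa in seq) - sum(aa in NEGATIVE for aa in seq)


if __name__ == "__main__":
    best = {
        "no_cfg": "ILVLVLARRIVGVIVAKVVLYAIVRSVVAAAKSISAVTVAKVTVFFQTTA",
        "weak_cfg": "EDLSKAKAELQRYLLLSEIVSAFTALTRFYVVLTKIFQIRVKLIAVGQIL",
        "strong_cfg": "IKLSRIAGIIVKRIRVASGDAQRLITASIGFTLSVVLAARFITIILGIVI",
    }
    for name, seq in best.items():
        print(f"{name}: length={len(seq)}, net charge={net_charge(seq):+d}")
```

For the three sequences listed above, this count gives +6, +3 and +6 respectively. All three are 50 residues long.
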
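## Usage

The example assumes it is run from the repository root.

```python
import sys

# generate_amps.py lives in src/ (see the file structure below), so make it importable
sys.path.append("src")

from generate_amps import AMPGenerator

# Initialize the generator with the main flow model checkpoint (stored under models/)
generator = AMPGenerator(
    model_path="models/amp_flow_model_final_optimized.pth",
    device="cuda",
)

# Generate AMP samples
samples = generator.generate_amps(
    num_samples=20,
    num_steps=25,   # number of sampling (ODE integration) steps
    cfg_scale=7.5,  # strong CFG, recommended: best HMD-AMP hit rate above
)
```
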
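`num_steps` sets how finely the learned flow is integrated from noise to an embedding. The sketch below shows a fixed-step Euler integrator from t = 0 to t = 1. The choice of solver and the name `euler_sample` are assumptions, because the card does not state which integrator `AMPGenerator` uses. In a guided sampler, `velocity_fn` would return the classifier-free-guided velocity at the chosen `cfg_scale`.

```python
from typing import Callable, List

Vector = List[float]


def euler_sample(
    velocity_fn: Callable[[Vector, float], Vector],
    x0: Vector,
    num_steps: int = 25,
) -> Vector:
    """Integrate dx/dt = velocity_fn(x, t) from t=0 to t=1 with explicit Euler steps.

    ASSUMED solver; FlowFinal's actual integrator is not documented in this card.
    """
    if num_steps < 1:
        raise ValueError("num_steps must be >= 1")
    dt = 1.0 / num_steps
    x = list(x0)
    for i in range(num_steps):
        t = i * dt
        v = velocity_fn(x, t)
        x = [xi + dt * vi for xi, vi in zip(x, v)]
    return x


if __name__ == "__main__":
    # Toy field pulling each coordinate toward 1.0 (stand-in for the trained network).
    toward_one = lambda x, t: [1.0 - xi for xi in x]
    print(euler_sample(toward_one, [0.0, 0.5], num_steps=25))
```

More steps reduce discretization error at proportionally higher cost. If the learned velocity is close to constant along each trajectory, as a linear path encourages, even few steps can be accurate.
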
## Training Details

- **Optimizer**: AdamW with cosine annealing
- **Learning Rate**: 4e-4 (final)
- **Epochs**: 2000
- **Final Loss**: 1.318
- **Training Time**: 2.3 hours on H100
- **Dataset Size**: 6,983 samples

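Cosine annealing is the schedule implemented by PyTorch's `torch.optim.lr_scheduler.CosineAnnealingLR`. It decays the rate as lr(e) = lr_min + ½(lr_max − lr_min)(1 + cos(π·e / E)). The sketch below evaluates this formula in plain Python under several assumptions: 4e-4 is the peak rate, lr_min is 0, there is no warmup, and E = 2000 epochs. The card labels 4e-4 as "final", so whether it is the starting value or the end point of the schedule is unclear.

```python
import math


def cosine_lr(epoch: int, total_epochs: int = 2000, lr_max: float = 4e-4, lr_min: float = 0.0) -> float:
    """Cosine-annealed learning rate at a given epoch (no warmup, no restarts).

    ASSUMES lr_max=4e-4 is the peak and lr_min=0.0; neither is confirmed by the card.
    """
    progress = min(max(epoch, 0), total_epochs) / total_epochs
    return lr_min + 0.5 * (lr_max - lr_min) * (1.0 + math.cos(math.pi * progress))


if __name__ == "__main__":
    for epoch in (0, 500, 1000, 1500, 2000):
        print(epoch, f"{cosine_lr(epoch):.2e}")
```
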
## File Structure

```
FlowFinal/
├── models/
│   ├── final_compressor_model.pth
│   ├── final_decompressor_model.pth
│   └── amp_flow_model_final_optimized.pth
├── generated_samples/
│   ├── generated_sequences_20250829.fasta
│   └── hmd_amp_detailed_results.csv
├── src/
│   ├── final_flow_model.py
│   ├── compressor_with_embeddings.py
│   ├── final_sequence_decoder.py
│   └── generate_amps.py
└── README.md
```

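Before running `generate_amps.py`, it can help to confirm that a checkout matches the layout above. The helper below is a hypothetical convenience that is not shipped with FlowFinal. It lists any expected file missing under a repository root, using paths copied from the tree.

```python
from pathlib import Path
from typing import List

# Paths copied from the file structure above (relative to the FlowFinal/ root).
EXPECTED_FILES = [
    "models/final_compressor_model.pth",
    "models/final_decompressor_model.pth",
    "models/amp_flow_model_final_optimized.pth",
    "src/final_flow_model.py",
    "src/compressor_with_embeddings.py",
    "src/final_sequence_decoder.py",
    "src/generate_amps.py",
]


def missing_files(root: str = ".") -> List[str]:
    """Return the expected relative paths that are not regular files under `root`."""
    base = Path(root)
    return [rel for rel in EXPECTED_FILES if not (base / rel).is_file()]


if __name__ == "__main__":
    missing = missing_files(".")
    print("All expected files present." if not missing else f"Missing: {missing}")
```
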
## Citation

If you use FlowFinal in your research, please cite:

```bibtex
@misc{flowfinal2025,
  title={FlowFinal: Flow Matching for Antimicrobial Peptide Generation},
  author={Edward Sun},
  year={2025},
  url={https://huggingface.co/esunAI/FlowFinal}
}
```

## License

This model is released under the MIT License.