---
license: mit
tags:
- protein-generation
- antimicrobial-peptides
- flow-matching
- protein-design
- esm
- amp
library_name: pytorch
---

# FlowFinal: AMP Flow Matching Model

FlowFinal is a flow matching model for generating antimicrobial peptides (AMPs). The model uses continuous normalizing flows to generate protein sequences in the ESM-2 embedding space.

## Model Description

- **Model Type**: Flow matching for protein generation
- **Domain**: Antimicrobial peptide (AMP) generation
- **Base Model**: ESM-2 (650M parameters)
- **Architecture**: Transformer-based flow matching with classifier-free guidance (CFG)
- **Training Data**: Curated AMP dataset with ~7K sequences

## Key Features

- **Classifier-Free Guidance (CFG)**: Enables controlled generation at different conditioning strengths
- **ESM-2 Integration**: Leverages pre-trained protein language model embeddings
- **Compression Architecture**: Efficient 16x compression of ESM-2 embeddings (1280 → 80 dimensions)
- **Multiple CFG Scales**: Supports no conditioning (0.0), weak (3.0), strong (7.5), and very strong (15.0) guidance

## Model Components

### Core Architecture

- `final_flow_model.py`: Main flow matching model implementation
- `compressor_with_embeddings.py`: Embedding compression/decompression modules
- `final_sequence_decoder.py`: ESM-2 embedding-to-sequence decoder

### Trained Weights

- `final_compressor_model.pth`: Trained compressor (315 MB)
- `final_decompressor_model.pth`: Trained decompressor (158 MB)
- `amp_flow_model_final_optimized.pth`: Main flow model checkpoint

### Generated Samples

- Generated AMP sequences at each CFG scale
- HMD-AMP validation results showing an 8.8% AMP prediction rate

## Performance Results

### HMD-AMP Validation (80 sequences tested)

- **Total AMPs Predicted**: 7/80 (8.8%)
- **By CFG Configuration**:
  - No CFG: 1/20 (5.0%)
  - Weak CFG: 2/20 (10.0%)
  - Strong CFG: 4/20 (20.0%) ← best performance
  - Very Strong CFG: 0/20 (0.0%)
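The CFG scales above are applied at sampling time via the standard classifier-free guidance combination of conditional and unconditional velocity predictions. The sketch below illustrates the idea with an Euler integrator over the compressed 80-dimensional embedding space; the function name, model call signature, and sequence length are illustrative assumptions, not the repository's actual implementation.

```python
import torch

def cfg_euler_sample(velocity_model, cond, num_steps=25, cfg_scale=7.5,
                     seq_len=50, dim=80):
    """Illustrative CFG sampler: integrate a learned velocity field from
    Gaussian noise (t=0) toward data (t=1) with Euler steps, blending
    conditional and unconditional predictions per classifier-free guidance."""
    x = torch.randn(1, seq_len, dim)  # start from noise in the compressed space
    dt = 1.0 / num_steps
    for step in range(num_steps):
        t = torch.full((1,), step * dt)
        v_cond = velocity_model(x, t, cond)    # conditional velocity
        v_uncond = velocity_model(x, t, None)  # unconditional velocity
        # CFG: push the update in the direction the condition implies
        v = v_uncond + cfg_scale * (v_cond - v_uncond)
        x = x + dt * v  # Euler update
    return x
```

A `cfg_scale` of 0.0 reduces to unconditional sampling; the results above suggest 7.5 as the most effective setting.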
### Best Performing Sequences

1. `ILVLVLARRIVGVIVAKVVLYAIVRSVVAAAKSISAVTVAKVTVFFQTTA` (No CFG)
2. `EDLSKAKAELQRYLLLSEIVSAFTALTRFYVVLTKIFQIRVKLIAVGQIL` (Weak CFG)
3. `IKLSRIAGIIVKRIRVASGDAQRLITASIGFTLSVVLAARFITIILGIVI` (Strong CFG)

## Usage

```python
from generate_amps import AMPGenerator

# Initialize generator
generator = AMPGenerator(
    model_path="amp_flow_model_final_optimized.pth",
    device="cuda",
)

# Generate AMP samples
samples = generator.generate_amps(
    num_samples=20,
    num_steps=25,
    cfg_scale=7.5,  # Strong CFG recommended
)
```

## Training Details

- **Optimizer**: AdamW with cosine annealing
- **Learning Rate**: 4e-4 (final)
- **Epochs**: 2000
- **Final Loss**: 1.318
- **Training Time**: 2.3 hours on an H100
- **Dataset Size**: 6,983 samples

## File Structure

```
FlowFinal/
├── models/
│   ├── final_compressor_model.pth
│   ├── final_decompressor_model.pth
│   └── amp_flow_model_final_optimized.pth
├── generated_samples/
│   ├── generated_sequences_20250829.fasta
│   └── hmd_amp_detailed_results.csv
├── src/
│   ├── final_flow_model.py
│   ├── compressor_with_embeddings.py
│   ├── final_sequence_decoder.py
│   └── generate_amps.py
└── README.md
```

## Citation

If you use FlowFinal in your research, please cite:

```bibtex
@misc{flowfinal2025,
  title={FlowFinal: Flow Matching for Antimicrobial Peptide Generation},
  author={Edward Sun},
  year={2025},
  url={https://huggingface.co/esunAI/FlowFinal}
}
```

## License

This model is released under the MIT License.