# Attention-Seeker-V1
Attention-Seeker-V1 is a from-scratch Transformer implementation designed as a high-fidelity recreation of the "Attention Is All You Need" architecture. The model serves as the backend for an interactive React-based visualization tool that aims to peel back the "black box" of Neural Machine Translation (NMT).
## Model Details

### Model Description
- Developed by: Kinjal Chakraborty
- Model type: Standard Encoder-Decoder Transformer
- Language(s): English to French
- License: MIT
- Task: Sequence-to-Sequence Translation
### Technical Specifications
- Layers: 6 Encoder layers, 6 Decoder layers
- Attention Heads: 8
- Embedding Dimension ($d_{model}$): 512
- Max Sequence Length: 5000 tokens
- Dropout: 0.1
- Vocabulary Size: 30,000 tokens
- Tokenizer: Word-level tokenizer with whitespace preprocessing. (Note: Experimental BPE and Regex tokenizers were also developed as part of this project's research phase).
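The word-level tokenization described above can be sketched as follows. This is an illustrative assumption, not the project's actual implementation: the class name, special-token names, and vocabulary-building details are invented for the example.

```python
from collections import Counter

class WordTokenizer:
    """Minimal word-level tokenizer with whitespace preprocessing (illustrative sketch)."""

    def __init__(self, texts, vocab_size=30_000):
        # Reserve ids for special tokens (names assumed, not taken from the repo).
        self.specials = ["[PAD]", "[UNK]", "[SOS]", "[EOS]"]
        counts = Counter(word for t in texts for word in t.lower().split())
        most_common = [w for w, _ in counts.most_common(vocab_size - len(self.specials))]
        self.vocab = {tok: i for i, tok in enumerate(self.specials + most_common)}
        self.inv_vocab = {i: tok for tok, i in self.vocab.items()}

    def encode(self, text):
        # Unknown words fall back to the [UNK] id.
        unk = self.vocab["[UNK]"]
        ids = [self.vocab.get(w, unk) for w in text.lower().split()]
        return [self.vocab["[SOS]"]] + ids + [self.vocab["[EOS]"]]

    def decode(self, ids):
        return " ".join(self.inv_vocab[i] for i in ids)
```

A real 30,000-token vocabulary would be fit on the full training corpus; the BPE and Regex tokenizers mentioned above would replace the whitespace split with subword merges or pattern-based splitting.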
## Uses

### Intended Use
This model is primarily an educational tool. It is optimized for use with the Attention-Seeker Frontend to visualize:
- Multi-Head Attention weights
- Encoder-Decoder cross-attention
- Positional Encodings and Layer Normalization effects
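The attention heatmaps the frontend displays are the row-wise softmax of $QK^T / \sqrt{d_k}$, computed per head. A dependency-free sketch of one head's weights (no learned projections, illustrative only):

```python
import math

def attention_weights(Q, K):
    """Scaled dot-product attention weights: softmax(Q·K^T / sqrt(d_k)) per query row.

    Q and K are lists of equal-length float vectors. The returned matrix is what
    an attention heatmap visualizes, before the weights are applied to V.
    """
    d_k = len(K[0])
    weights = []
    for q in Q:
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d_k) for k in K]
        m = max(scores)                       # subtract max for numerical stability
        exps = [math.exp(s - m) for s in scores]
        z = sum(exps)
        weights.append([e / z for e in exps])
    return weights
```

Each row sums to 1, so a bright cell in the visualization means that query token concentrates its attention on the corresponding key token.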
### Out-of-Scope Use
Due to compute-constrained training (1 epoch), this model is a proof-of-concept. It should not be used for production-grade translation or sensitive localization tasks.
## Training Details

### Training Data
- Dataset: `Helsinki-NLP/opus_books` (English-French subset)
- Size: ~127,000 sentence pairs
### Training Procedure
- Hardware: NVIDIA GeForce RTX 4060 Laptop GPU
- Training Time: ~2.5 Hours
- Optimizer: Adam
- Learning Rate: 1e-4 (Fixed)
- Batch Size: 4
- Epochs: 1 (Proof of Concept)
- Loss Function: Cross-Entropy Loss
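The per-token cross-entropy above reduces to the negative log-softmax probability of the reference token. A small dependency-free sketch of that computation (in the actual project this would presumably be `torch.nn.CrossEntropyLoss`, with padding positions masked out — an assumption about the setup):

```python
import math

def cross_entropy(logits, target):
    """Cross-entropy for one position: -log softmax(logits)[target]."""
    m = max(logits)                                 # stabilize the exponentials
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return log_z - logits[target]

def sequence_loss(logit_rows, targets):
    """Mean cross-entropy over the target tokens of one sequence."""
    return sum(cross_entropy(l, t) for l, t in zip(logit_rows, targets)) / len(targets)
```

In training, `logit_rows` would be the decoder's output over the 30,000-token vocabulary at each position, and `targets` the reference translation shifted by one token.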
## Evaluation

### Sample Translation
| Input (English) | Output (French) |
|---|---|
| hello how are you? | comment vous êtes - vous ? [EOS] |
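The trailing `[EOS]` in the output comes from the autoregressive decoding loop, which feeds generated tokens back into the decoder until the end-of-sequence token is produced. A minimal greedy-decoding sketch; the `step_logits` callback is a stand-in assumption for a forward pass through the real Transformer decoder:

```python
def greedy_decode(step_logits, sos_id, eos_id, max_len=50):
    """Greedy decoding: pick the argmax token each step, stop at [EOS].

    step_logits(prefix) -> list of next-token logits over the vocabulary;
    in the real inference engine this would run the encoder-decoder model.
    """
    tokens = [sos_id]
    for _ in range(max_len):
        logits = step_logits(tokens)
        next_id = max(range(len(logits)), key=logits.__getitem__)
        tokens.append(next_id)
        if next_id == eos_id:
            break                 # emit [EOS] and terminate, as in the sample above
    return tokens
```

Beam search would replace the single argmax with several candidate prefixes, but greedy decoding is enough to reproduce outputs like the one in the table.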
## How to Get Started

To load this model into the Attention-Seeker Python inference engine and interact with the front-end, see the repository on GitHub:

https://github.com/nullPointer0x43/Attention-Seeker

Alternatively, pull and run the Docker images found at: