Attention-Seeker-V1

Attention-Seeker-V1 is a Transformer-from-scratch implementation designed as a high-fidelity recreation of the "Attention Is All You Need" architecture. This model serves as the backend for an interactive React-based visualization tool, aiming to peel back the "black box" of Neural Machine Translation (NMT).

Model Details

Model Description

  • Developed by: Kinjal Chakraborty
  • Model type: Standard Encoder-Decoder Transformer
  • Language(s): English to French
  • License: MIT
  • Task: Sequence-to-Sequence Translation

Technical Specifications

  • Layers: 6 Encoder layers, 6 Decoder layers
  • Attention Heads: 8
  • Embedding Dimension ($d_{model}$): 512
  • Max Sequence Length: 5000 tokens
  • Dropout: 0.1
  • Vocabulary Size: 30,000 tokens
  • Tokenizer: Word-level tokenizer with whitespace preprocessing. (Note: Experimental BPE and Regex tokenizers were also developed as part of this project's research phase).
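A word-level tokenizer with whitespace preprocessing can be sketched as follows. This is a hypothetical minimal implementation for illustration, not the project's exact tokenizer; the special-token names (`[PAD]`, `[SOS]`, `[EOS]`, `[UNK]`) are assumptions.

```python
import re

class WordTokenizer:
    """Minimal word-level tokenizer sketch (hypothetical, for illustration)."""

    def __init__(self, specials=("[PAD]", "[SOS]", "[EOS]", "[UNK]")):
        self.vocab = {tok: i for i, tok in enumerate(specials)}
        self.inverse = {i: tok for tok, i in self.vocab.items()}

    def _split(self, text):
        # Whitespace preprocessing: lowercase and collapse runs of whitespace.
        return re.sub(r"\s+", " ", text.strip().lower()).split(" ")

    def fit(self, corpus):
        # Build the vocabulary from a list of sentences.
        for sentence in corpus:
            for word in self._split(sentence):
                if word not in self.vocab:
                    idx = len(self.vocab)
                    self.vocab[word] = idx
                    self.inverse[idx] = word

    def encode(self, text):
        # Map words to ids, falling back to [UNK], and append [EOS].
        unk = self.vocab["[UNK]"]
        return [self.vocab.get(w, unk) for w in self._split(text)] + [self.vocab["[EOS]"]]

    def decode(self, ids):
        return " ".join(self.inverse[i] for i in ids)
```

In the real model the vocabulary would be capped at the 30,000 most frequent tokens; this sketch grows it unboundedly for simplicity.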

Uses

Intended Use

This model is primarily an educational tool. It is optimized for use with the Attention-Seeker Frontend to visualize:

  • Multi-Head Attention weights
  • Encoder-Decoder cross-attention
  • Positional Encodings and Layer Normalization effects
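The attention heatmaps the frontend renders are the weight matrices produced by scaled dot-product attention, `softmax(QKᵀ / √d_k)`. A dependency-free sketch (not the project's actual code, which operates on batched tensors) of how those weights arise:

```python
import math

def softmax(xs):
    # Numerically stable softmax over a list of scores.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def attention_weights(Q, K):
    """Scaled dot-product attention weights: softmax(QK^T / sqrt(d_k)).

    Q and K are lists of per-token vectors. The returned matrix is what the
    visualization heatmaps show: weights[i][j] is how strongly query token i
    attends to key token j, and each row sums to 1.
    """
    d_k = len(K[0])
    scores = [[sum(q * k for q, k in zip(qv, kv)) / math.sqrt(d_k) for kv in K]
              for qv in Q]
    return [softmax(row) for row in scores]
```

Cross-attention is the same computation with `Q` taken from the decoder and `K` from the encoder output.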

Out-of-Scope Use

Due to compute-constrained training (1 epoch), this model is a proof-of-concept. It should not be used for production-grade translation or sensitive localization tasks.

Training Details

Training Data

  • Dataset: Helsinki-NLP/opus_books (English-French subset)
  • Size: ~127,000 sentence pairs

Training Procedure

  • Hardware: NVIDIA GeForce RTX 4060 Laptop GPU
  • Training Time: ~2.5 Hours
  • Optimizer: Adam
  • Learning Rate: 1e-4 (Fixed)
  • Batch Size: 4
  • Epochs: 1 (Proof of Concept)
  • Loss Function: Cross-Entropy Loss
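The per-token objective above can be sketched in plain Python. This is an illustrative reimplementation, not the training code: it mirrors the padding-masking behavior of PyTorch's `nn.CrossEntropyLoss(ignore_index=pad_id)`, and the `pad_id` value is an assumption.

```python
import math

def cross_entropy(logits, targets, pad_id=0):
    """Average token-level cross-entropy with padding ignored (sketch).

    logits: one logit vector over the vocabulary per target position;
    targets: the gold token ids. Positions whose target is pad_id are
    skipped, so padding never contributes gradient signal.
    """
    total, count = 0.0, 0
    for pos_logits, gold in zip(logits, targets):
        if gold == pad_id:
            continue
        # log-sum-exp computed stably: loss = log Z - logit[gold]
        m = max(pos_logits)
        log_z = m + math.log(sum(math.exp(x - m) for x in pos_logits))
        total += log_z - pos_logits[gold]
        count += 1
    return total / max(count, 1)
```

With uniform logits over a vocabulary of size V the loss is exactly log V, which is a useful sanity check that training has moved past random initialization.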

Evaluation

Sample Translation

  • Input (English): hello how are you?
  • Output (French): comment vous êtes - vous ? [EOS]
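The trailing [EOS] in the output comes from autoregressive decoding: the decoder emits one token at a time until it predicts the end-of-sequence marker. A minimal greedy loop, with a hypothetical `step_fn` interface standing in for the model (not the project's actual API):

```python
def greedy_decode(step_fn, sos_id, eos_id, max_len=50):
    """Greedy autoregressive decoding sketch.

    step_fn(prefix) returns one logit per vocabulary entry for the next
    token given the decoded prefix so far. Decoding starts from [SOS] and
    stops when [EOS] is produced or max_len is reached.
    """
    prefix = [sos_id]
    for _ in range(max_len):
        logits = step_fn(prefix)
        next_id = max(range(len(logits)), key=logits.__getitem__)  # argmax
        prefix.append(next_id)
        if next_id == eos_id:
            break
    return prefix[1:]  # drop the leading [SOS]
```

Greedy argmax is the simplest strategy; beam search would track several candidate prefixes instead of one.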

How to Get Started

Load this model into the Attention-Seeker Python inference engine, available on GitHub, to access and interact with the front-end:

https://github.com/nullPointer0x43/Attention-Seeker

Alternatively, load and run the Docker images.
