VibecoderMcSwaggins committed
Commit 9c9d382 · 1 Parent(s): 9cb4917

docs: Add CRITICAL HuggingFace Free Tier section to all agent context files


All three files (AGENTS.md, GEMINI.md, CLAUDE.md) now have identical,
prominent documentation about the HuggingFace Free Tier architecture:

- Native Serverless (< 30B) vs Inference Providers (70B+)
- Why large models fail (Novita 500, Hyperbolic 401)
- The rule: Free Tier MUST use < 30B models
- Current safe models table

This prevents future confusion about model selection.

Files changed (3)
  1. AGENTS.md +35 -11
  2. CLAUDE.md +35 -12
  3. GEMINI.md +35 -11
AGENTS.md CHANGED

@@ -100,20 +100,44 @@ DeepBonerError (base)
 └── EmbeddingError
 ```
 
-## LLM Model Defaults (November 2025)
-
-Given the rapid advancements, as of November 29, 2025, the DeepBoner project uses the following default LLM models in its configuration (`src/utils/config.py`):
-
-- **OpenAI:** `gpt-5`
-  - Current flagship model (November 2025). Requires Tier 5 access.
-- **Anthropic:** `claude-sonnet-4-5-20250929`
-  - This is the mid-range Claude 4.5 model, released on September 29, 2025.
-  - The flagship `Claude Opus 4.5` (released November 24, 2025) is also available and can be configured by advanced users for enhanced capabilities.
-- **HuggingFace (Free Tier):** `Qwen/Qwen2.5-72B-Instruct`
-  - Changed from Llama-3.1-70B (Dec 2025) due to HuggingFace routing Llama to Hyperbolic provider which has unreliable "staging mode" auth.
-  - Qwen 2.5 72B offers comparable quality and works reliably via HuggingFace's native infrastructure.
-
-It is crucial to keep these defaults updated as the LLM landscape evolves.
+## LLM Model Defaults (December 2025)
+
+Default models in `src/utils/config.py`:
+
+- **OpenAI:** `gpt-5` - Flagship model (requires Tier 5 access)
+- **Anthropic:** `claude-sonnet-4-5-20250929` - Mid-range Claude 4.5
+- **HuggingFace (Free Tier):** `Qwen/Qwen2.5-7B-Instruct` - See critical note below
+
+---
+
+## ⚠️ CRITICAL: HuggingFace Free Tier Architecture
+
+**THIS IS IMPORTANT - READ BEFORE CHANGING THE FREE TIER MODEL**
+
+HuggingFace has TWO execution paths for inference:
+
+| Path | Host | Reliability | Model Size |
+|------|------|-------------|------------|
+| **Native Serverless** | HuggingFace infrastructure | ✅ High | < 30B params |
+| **Inference Providers** | Third-party (Novita, Hyperbolic) | ❌ Unreliable | 70B+ params |
+
+**The Trap:** When you request a large model (70B+) without a paid API key, HuggingFace **silently routes** the request to third-party providers. These providers have:
+- 500 Internal Server Errors (Novita - current)
+- 401 "Staging Mode" auth failures (Hyperbolic - past)
+
+**The Rule:** Free Tier MUST use models < 30B to stay on native infrastructure.
+
+**Current Safe Models (Dec 2025):**
+| Model | Size | Status |
+|-------|------|--------|
+| `Qwen/Qwen2.5-7B-Instruct` | 7B | ✅ **DEFAULT** - Native, reliable |
+| `mistralai/Mistral-Nemo-Instruct-2407` | 12B | ✅ Native, reliable |
+| `Qwen/Qwen2.5-72B-Instruct` | 72B | ❌ Routed to Novita (500 errors) |
+| `meta-llama/Llama-3.1-70B-Instruct` | 70B | ❌ Routed to Hyperbolic (401 errors) |
+
+**See:** `HF_FREE_TIER_ANALYSIS.md` for full analysis.
+
+---
 
 ## Testing
 
CLAUDE.md CHANGED

@@ -107,21 +107,44 @@ DeepBonerError (base)
 - **Mocking**: `respx` for httpx, `pytest-mock` for general mocking
 - **Fixtures**: `tests/conftest.py` has `mock_httpx_client`, `mock_llm_response`
 
-## LLM Model Defaults (November 2025)
-
-Given the rapid advancements, as of November 29, 2025, the DeepBoner project uses the following default LLM models in its configuration (`src/utils/config.py`):
-
-- **OpenAI:** `gpt-5`
-  - Current flagship model (November 2025). Requires Tier 5 access.
-- **Anthropic:** `claude-sonnet-4-5-20250929`
-  - This is the mid-range Claude 4.5 model, released on September 29, 2025.
-  - The flagship `Claude Opus 4.5` (released November 24, 2025) is also available and can be configured by advanced users for enhanced capabilities.
-- **HuggingFace (Free Tier):** `Qwen/Qwen2.5-7B-Instruct`
-  - Large models (70B+) are routed to third-party providers (Novita, Hyperbolic) with unreliable free tiers.
-  - Qwen 2.5 7B is small enough to run on HuggingFace's native serverless infrastructure.
-  - See `HF_FREE_TIER_ANALYSIS.md` for detailed analysis.
-
-It is crucial to keep these defaults updated as the LLM landscape evolves.
+## LLM Model Defaults (December 2025)
+
+Default models in `src/utils/config.py`:
+
+- **OpenAI:** `gpt-5` - Flagship model (requires Tier 5 access)
+- **Anthropic:** `claude-sonnet-4-5-20250929` - Mid-range Claude 4.5
+- **HuggingFace (Free Tier):** `Qwen/Qwen2.5-7B-Instruct` - See critical note below
+
+---
+
+## ⚠️ CRITICAL: HuggingFace Free Tier Architecture
+
+**THIS IS IMPORTANT - READ BEFORE CHANGING THE FREE TIER MODEL**
+
+HuggingFace has TWO execution paths for inference:
+
+| Path | Host | Reliability | Model Size |
+|------|------|-------------|------------|
+| **Native Serverless** | HuggingFace infrastructure | ✅ High | < 30B params |
+| **Inference Providers** | Third-party (Novita, Hyperbolic) | ❌ Unreliable | 70B+ params |
+
+**The Trap:** When you request a large model (70B+) without a paid API key, HuggingFace **silently routes** the request to third-party providers. These providers have:
+- 500 Internal Server Errors (Novita - current)
+- 401 "Staging Mode" auth failures (Hyperbolic - past)
+
+**The Rule:** Free Tier MUST use models < 30B to stay on native infrastructure.
+
+**Current Safe Models (Dec 2025):**
+| Model | Size | Status |
+|-------|------|--------|
+| `Qwen/Qwen2.5-7B-Instruct` | 7B | ✅ **DEFAULT** - Native, reliable |
+| `mistralai/Mistral-Nemo-Instruct-2407` | 12B | ✅ Native, reliable |
+| `Qwen/Qwen2.5-72B-Instruct` | 72B | ❌ Routed to Novita (500 errors) |
+| `meta-llama/Llama-3.1-70B-Instruct` | 70B | ❌ Routed to Hyperbolic (401 errors) |
+
+**See:** `HF_FREE_TIER_ANALYSIS.md` for full analysis.
+
+---
 
 ## Git Workflow
 
GEMINI.md CHANGED

@@ -82,20 +82,44 @@ Settings via pydantic-settings from `.env`:
 - `MAX_ITERATIONS`: 1-50, default 10
 - `LOG_LEVEL`: DEBUG, INFO, WARNING, ERROR
 
-## LLM Model Defaults (November 2025)
-
-Given the rapid advancements, as of November 29, 2025, the DeepBoner project uses the following default LLM models in its configuration (`src/utils/config.py`):
-
-- **OpenAI:** `gpt-5`
-  - Current flagship model (November 2025). Requires Tier 5 access.
-- **Anthropic:** `claude-sonnet-4-5-20250929`
-  - This is the mid-range Claude 4.5 model, released on September 29, 2025.
-  - The flagship `Claude Opus 4.5` (released November 24, 2025) is also available and can be configured by advanced users for enhanced capabilities.
-- **HuggingFace (Free Tier):** `Qwen/Qwen2.5-72B-Instruct`
-  - Changed from Llama-3.1-70B (Dec 2025) due to HuggingFace routing Llama to Hyperbolic provider which has unreliable "staging mode" auth.
-  - Qwen 2.5 72B offers comparable quality and works reliably via HuggingFace's native infrastructure.
-
-It is crucial to keep these defaults updated as the LLM landscape evolves.
+## LLM Model Defaults (December 2025)
+
+Default models in `src/utils/config.py`:
+
+- **OpenAI:** `gpt-5` - Flagship model (requires Tier 5 access)
+- **Anthropic:** `claude-sonnet-4-5-20250929` - Mid-range Claude 4.5
+- **HuggingFace (Free Tier):** `Qwen/Qwen2.5-7B-Instruct` - See critical note below
+
+---
+
+## ⚠️ CRITICAL: HuggingFace Free Tier Architecture
+
+**THIS IS IMPORTANT - READ BEFORE CHANGING THE FREE TIER MODEL**
+
+HuggingFace has TWO execution paths for inference:
+
+| Path | Host | Reliability | Model Size |
+|------|------|-------------|------------|
+| **Native Serverless** | HuggingFace infrastructure | ✅ High | < 30B params |
+| **Inference Providers** | Third-party (Novita, Hyperbolic) | ❌ Unreliable | 70B+ params |
+
+**The Trap:** When you request a large model (70B+) without a paid API key, HuggingFace **silently routes** the request to third-party providers. These providers have:
+- 500 Internal Server Errors (Novita - current)
+- 401 "Staging Mode" auth failures (Hyperbolic - past)
+
+**The Rule:** Free Tier MUST use models < 30B to stay on native infrastructure.
+
+**Current Safe Models (Dec 2025):**
+| Model | Size | Status |
+|-------|------|--------|
+| `Qwen/Qwen2.5-7B-Instruct` | 7B | ✅ **DEFAULT** - Native, reliable |
+| `mistralai/Mistral-Nemo-Instruct-2407` | 12B | ✅ Native, reliable |
+| `Qwen/Qwen2.5-72B-Instruct` | 72B | ❌ Routed to Novita (500 errors) |
+| `meta-llama/Llama-3.1-70B-Instruct` | 70B | ❌ Routed to Hyperbolic (401 errors) |
+
+**See:** `HF_FREE_TIER_ANALYSIS.md` for full analysis.
+
+---
 
 ## Development Conventions
 