Spaces:

nivakaran
/

modelx

Running

App Files Files Community

modelx / README.md

nivakaran

Upload folder using huggingface_hub

4134ab0 verified 3 days ago

preview code

raw

history blame contribute delete

47.5 kB

	---
	title: Roger Intelligence Platform
	emoji: ⚡
	colorFrom: blue
	colorTo: green
	sdk: docker
	pinned: false
	---

	# 🇱🇰 Roger Intelligence Platform

	Real-Time Situational Awareness for Sri Lanka

	A multi-agent AI system that aggregates intelligence from 50+ data sources to provide risk analysis and opportunity detection for businesses operating in Sri Lanka.

	## 🌐 Live Demo

	\| Component \| URL \|
	\|-----------\|-----\|
	\| Frontend Dashboard \| [https://model-x-frontend-snowy.vercel.app/](https://model-x-frontend-snowy.vercel.app/) \|
	\| Backend API \| [https://nivakaran-Roger.hf.space](https://nivakaran-Roger.hf.space) \|

	---

	## 🎯 Key Features

	✅ 5 Domain Agents + 2 Orchestrators running in parallel:
	- Social Agent - Reddit, Twitter, Facebook, Threads, BlueSky monitoring
	- Political Agent - Gazette, Parliament, District Social Media
	- Economical Agent - CSE Stock Market + Technical Indicators (SMA, EMA, RSI, MACD)
	- Meteorological Agent - DMC Weather + RiverNet + FloodWatch Integration
	- Intelligence Agent - Brand Monitoring + Threat Detection + User-Configurable Targets
	- Combined Agent (Orchestrator) - Fan-out/Fan-in coordination, LLM filtering, feed ranking
	- Data Retrieval Agent - Web scraping orchestration with anti-bot features

	✅ Situational Awareness Dashboard:
	- CEB Power Status - Load shedding / power outage monitoring
	- Fuel Prices - Petrol 92/95, Diesel, Kerosene (CEYPETCO)
	- CBSL Economic Indicators - Inflation, policy rates, forex reserves, USD/LKR
	- Health Alerts - Dengue case tracking, disease outbreak monitoring
	- Commodity Prices - 15 essential goods (rice, sugar, gas, eggs, etc.)
	- Water Supply Status - NWSDB disruption alerts

	✅ ML Anomaly Detection Pipeline (Integrated into Graph):
	- Language-specific BERT models (Sinhala, Tamil, English)
	- Real-time anomaly inference on every graph cycle
	- Clustering (DBSCAN, KMeans, HDBSCAN)
	- Anomaly Detection (Isolation Forest, LOF)
	- MLflow + DagsHub tracking

	✅ Weather Prediction ML Pipeline:
	- LSTM Neural Network (30-day sequences)
	- Predicts: Temperature, Rainfall, Flood Risk, Severity
	- 21 weather stations → 25 districts
	- Airflow DAG runs daily at 4 AM

	✅ Currency Prediction ML Pipeline:
	- GRU Neural Network (optimized for 8GB RAM)
	- Predicts: USD/LKR exchange rate
	- Features: Technical indicators + CSE + Gold + Oil + USD Index
	- MLflow tracking + Airflow DAG at 4 AM

	✅ Stock Price Prediction ML Pipeline:
	- Multi-Architecture: LSTM, GRU, BiLSTM, BiGRU
	- Optuna hyperparameter tuning (30 trials per stock)
	- Per-stock best model selection
	- 10 top CSE stocks (JKH, COMB, DIAL, HNB, etc.)

	✅ RAG-Powered Chatbot:
	- Chat-history aware Q&A
	- Queries all ChromaDB intelligence collections
	- Domain filtering (political, economic, weather, social)
	- Floating chat UI in dashboard

	✅ Trending/Velocity Detection:
	- SQLite-based topic frequency tracking (24-hour rolling window)
	- Momentum calculation: `current_hour / avg_last_6_hours`
	- Spike alerts when topic volume > 3x baseline
	- Integrated into Combined Agent dashboard

	✅ Real-Time Dashboard with:
	- Live Intelligence Feed
	- Floating AI Chatbox
	- Weather Predictions Tab
	- Live Satellite/Weather Map (Windy.com)
	- National Flood Threat Score
	- 30-Year Historical Climate Analysis
	- Trending Topics & Spike Alerts
	- Enhanced Operational Indicators (infrastructure_health, regulatory_activity, investment_climate)
	- Operational Risk Radar
	- ML Anomaly Detection Display
	- Market Predictions with Moving Averages
	- Risk & Opportunity Classification

	✅ Weather Data Scraper for ML Training:
	- Open-Meteo API (free historical data)
	- NASA FIRMS (fire/heat detection)
	- All 25 districts coverage
	- Year-wise CSV export for model training

	✅ Operational Dashboard Metrics:
	- Logistics Friction: Average confidence of mobility/social domain risk events
	- Compliance Volatility: Average confidence of political domain risks
	- Market Instability: Average confidence of market/economical domain risks
	- Opportunity Index: Average confidence of opportunity-classified events

	✅ Multi-District Province-Aware Event Categorization:
	- Events mentioning provinces are displayed in all constituent districts
	- Supports: Western, Southern, Central, Northern, Eastern, Sabaragamuwa, Uva, North Western, North Central provinces
	- Both frontend (MapView, DistrictInfoPanel) and backend are synchronized

	✅ 3-Tier Storage Architecture with Deduplication:
	- Tier 1: SQLite - Fast hash-based exact match (microseconds)
	- Tier 2: ChromaDB - Semantic similarity search with sentence transformers (milliseconds)
	- Tier 3: Neo4j Aura - Knowledge graph for event relationships and entity tracking
	- Unified `StorageManager` orchestrates all backends
	- Deduplication prevents duplicate feeds across all domain agents

	---

	## 🏗️ System Architecture

	```
	┌─────────────────────────────────────────────────────────────────────────┐
	│ Roger Combined Graph │
	│ ┌────────────────────────────────────────────────────────────────┐ │
	│ │ Graph Initiator (Reset) │ │
	│ └────────────────────────────────────────────────────────────────┘ │
	│ │ Fan-Out │
	│ ┌────────────┬────────────┼────────────┬────────────┬────────────┐ │
	│ ▼ ▼ ▼ ▼ ▼ ▼ │
	│ ┌──────┐ ┌──────┐ ┌──────────┐ ┌──────┐ ┌──────────┐ ┌────┐│
	│ │Social│ │Econ │ │Political │ │Meteo │ │Intellig- │ │Data││
	│ │Agent │ │Agent │ │Agent │ │Agent │ │ence Agent│ │Retr││
	│ └──────┘ └──────┘ └──────────┘ └──────┘ └──────────┘ └────┘│
	│ │ │ │ │ │ │ │
	│ └────────────┴────────────┴────────────┴────────────┴────────────┘ │
	│ │ Fan-In │
	│ ┌─────────▼──────────┐ │
	│ │ Feed Aggregator │ │
	│ │ (Rank & Dedupe) │ │
	│ └─────────┬──────────┘ │
	│ ┌─────────▼──────────┐ │
	│ │ Vectorization │ │
	│ │ Agent (Optional) │ │
	│ └─────────┬──────────┘ │
	│ ┌─────────▼──────────┐ │
	│ │ Router (Loop/End) │ │
	│ └────────────────────┘ │
	└─────────────────────────────────────────────────────────────────────────┘
	```

	---

	## 📊 Graph Implementations

	### 1. Combined Agent Graph (`combinedAgentGraph.py`)
	The Mother Graph - Orchestrates all domain agents in parallel.

	```mermaid
	graph TD
	A[Graph Initiator] -->\|Fan-Out\| B[Social Agent]
	A -->\|Fan-Out\| C[Economic Agent]
	A -->\|Fan-Out\| D[Political Agent]
	A -->\|Fan-Out\| E[Meteorological Agent]
	A -->\|Fan-Out\| F[Intelligence Agent]
	A -->\|Fan-Out\| G[Data Retrieval Agent]
	B -->\|Fan-In\| H[Feed Aggregator]
	C --> H
	D --> H
	E --> H
	F --> H
	G --> H
	H --> I[Data Refresher]
	I --> J{Router}
	J -->\|Loop\| A
	J -->\|End\| K[END]
	```

	Key Features:
	- Custom state reducers for parallel execution
	- Feed deduplication with content hashing
	- Loop control with configurable intervals
	- Real-time WebSocket broadcasting

	Architecture Improvements (v2.1):
	- Rate Limiting: Domain-specific rate limits prevent anti-bot detection
	- Twitter: 15 RPM, LinkedIn: 10 RPM, News: 60 RPM
	- Thread-safe semaphores for max concurrent requests
	- Error Handling: Per-agent try/catch prevents cascading failures
	- Failed agents return empty results, others continue
	- Non-Blocking Refresh: 60-second cycle with interruptible sleep
	- `threading.Event.wait()` instead of blocking `time.sleep()`

	### Storage Data Flow

	```
	┌─────────────────────────────────────────────────────────────────────────────┐
	│ DOMAIN AGENTS (Parallel) │
	│ ┌──────────┐ ┌──────────┐ ┌──────────┐ ┌──────────┐ ┌──────────────┐ │
	│ │ Social │ │Political │ │Economic │ │ Meteo │ │ Intelligence │ │
	│ │ Agent │ │ Agent │ │ Agent │ │ Agent │ │ Agent │ │
	│ └────┬─────┘ └────┬─────┘ └────┬─────┘ └────┬─────┘ └──────┬───────┘ │
	│ └────────────┴────────────┴────────────┴──────────────┘ │
	│ │ Fan-In │
	│ ┌────────────▼─────────────┐ │
	│ │ CombinedAgentNode │ │
	│ │ (LLM Filter + Rank) │ │
	│ └────────────┬─────────────┘ │
	└─────────────────────────────────┼───────────────────────────────────────────┘
	│
	┌─────────────▼──────────────┐
	│ StorageManager │
	│ (3-Tier Deduplication) │
	└─────────────┬──────────────┘
	┌───────────────────────┼──────────────────────────┐
	│ │ │
	▼ ▼ ▼
	┌─────────────────┐ ┌──────────────────┐ ┌─────────────────────────┐
	│ SQLite │ │ ChromaDB │ │ Neo4j Aura │
	│ (Fast Cache) │ │ (Vector Store) │ │ (Knowledge Graph) │
	│ ───────────── │ │ ────────────── │ │ ─────────────────── │
	│ Hash-based │ │ Semantic search │ │ Event relationships │
	│ Exact match │ │ Similarity 0.85 │ │ Domain nodes │
	│ ~microseconds │ │ ~milliseconds │ │ Entity tracking │
	└─────────────────┘ └──────────────────┘ └─────────────────────────┘
	```

	---

	### 2. Political Agent Graph (`politicalAgentGraph.py`)
	3-Module Hybrid Architecture

	\| Module \| Description \| Sources \|
	\|--------\|-------------\|---------\|
	\| Official Sources \| Government data \| Gazette, Parliament Minutes \|
	\| Social Media \| Political sentiment \| Twitter, Facebook, Reddit (National + 25 Districts) \|
	\| Feed Generation \| LLM Processing \| Categorize → Summarize → Format \|

	```
	┌─────────────────────────────────────────────┐
	│ Module 1: Official │ Module 2: Social │
	│ ┌─────────────────┐ │ ┌───────────────┐ │
	│ │ Gazette │ │ │ National │ │
	│ │ Parliament │ │ │ Districts (25)│ │
	│ └─────────────────┘ │ │ World Politics│ │
	│ │ └───────────────┘ │
	└────────────┬───────────┴────────┬──────────┘
	│ Fan-In │
	▼ ▼
	┌────────────────────────────┐
	│ Module 3: Feed Generation │
	│ Categorize → LLM → Format │
	└────────────────────────────┘
	```

	---

	### 3. Economic Agent Graph (`economicalAgentGraph.py`)
	Market Intelligence & Technical Analysis

	\| Component \| Description \|
	\|-----------\|-------------\|
	\| Stock Collector \| CSE market data (200+ stocks) \|
	\| Technical Analyzer \| SMA, EMA, RSI, MACD \|
	\| Trend Detector \| Bullish/Bearish signals \|
	\| Feed Generator \| Risk/Opportunity classification \|

	Indicators Calculated:
	- Simple Moving Average (SMA-20, SMA-50)
	- Exponential Moving Average (EMA-12, EMA-26)
	- Relative Strength Index (RSI)
	- MACD with Signal Line

	---

	### 4. Meteorological Agent Graph (`meteorologicalAgentGraph.py`)
	Weather & Disaster Monitoring + FloodWatch Integration

	```
	┌─────────────────────────────────────┐
	│ DMC Weather Collector │
	│ (Daily forecasts, 25 districts) │
	└─────────────┬───────────────────────┘
	│
	▼
	┌─────────────────────────────────────┐
	│ RiverNet Data Collector │
	│ (River levels, flood monitoring) │
	└─────────────┬───────────────────────┘
	│
	▼
	┌─────────────────────────────────────┐
	│ FloodWatch Historical Data │
	│ (30-year climate analysis) │
	└─────────────┬───────────────────────┘
	│
	▼
	┌─────────────────────────────────────┐
	│ National Threat Calculator │
	│ (Aggregated flood risk 0-100) │
	└─────────────┬───────────────────────┘
	│
	▼
	┌─────────────────────────────────────┐
	│ Alert Generator │
	│ (Severity classification) │
	└─────────────────────────────────────┘
	```

	Alert Levels:
	- 🟢 Normal: Standard conditions
	- 🟡 Advisory: Watch for developments
	- 🟠 Warning: Take precautions
	- 🔴 Critical: Immediate action required

	FloodWatch Features:
	\| Feature \| Description \|
	\|---------\|-------------\|
	\| Historical Analysis \| 30-year climate data (1995-2025) \|
	\| Decadal Comparison \| 3 periods: 1995-2004, 2005-2014, 2015-2025 \|
	\| National Threat Score \| 0-100 aggregated risk from rivers + alerts + season \|
	\| High-Risk Periods \| May-Jun (SW Monsoon), Oct-Nov (NE Monsoon) \|

	---

	### 5. Social Agent Graph (`socialAgentGraph.py`)
	Multi-Platform Social Media Monitoring

	\| Platform \| Data Source \| Coverage \|
	\|----------\|-------------\|----------\|
	\| Reddit \| PRAW API \| r/srilanka, r/colombo \|
	\| Twitter/X \| Nitter scraping \| #SriLanka, #Colombo \|
	\| Facebook \| Profile scraping \| News pages \|
	\| Threads \| Meta API \| Trending topics \|
	\| BlueSky \| AT Protocol \| Political discourse \|

	---

	### 6. Intelligence Agent Graph (`intelligenceAgentGraph.py`)
	Brand & Threat Monitoring + User-Configurable Targets

	```
	┌─────────────────┐ ┌─────────────────┐ ┌─────────────────┐
	│ Brand Monitor │ │ Threat Scanner │ │ User Targets │
	│ - Company news │ │ - Security │ │ - Custom keys │
	│ - Competitor │ │ - Compliance │ │ - User profiles │
	│ - Market share │ │ - Geopolitical │ │ - Products │
	└────────┬────────┘ └────────┬────────┘ └────────┬────────┘
	│ │ │
	└──────────────────────┼──────────────────────┘
	▼
	┌─────────────────────┐
	│ Intelligence Report │
	│ (Priority ranked) │
	└─────────────────────┘
	```

	User-Configurable Monitoring:
	Users can define custom monitoring targets via the frontend settings panel or API:

	\| Config Type \| Description \| Example \|
	\|-------------\|-------------\|---------\|
	\| Keywords \| Custom search terms \| "Colombo Port", "BOI Investment" \|
	\| Products \| Products to track \| "iPhone 15", "Samsung Galaxy" \|
	\| Profiles \| Social media accounts \| @CompetitorX (Twitter), CompanyY (Facebook) \|

	API Endpoints:
	```bash
	# Get current config
	GET /api/intel/config

	# Update full config
	POST /api/intel/config
	Body: {"user_keywords": ["keyword1"], "user_profiles": {"twitter": ["@account"]}, "user_products": ["Product"]}

	# Add single target
	POST /api/intel/config/add?target_type=keyword&value=Colombo+Port

	# Remove target
	DELETE /api/intel/config/remove?target_type=profile&value=CompetitorX&platform=twitter
	```

	Config File: `src/config/intel_config.json`

	---

	### 7. DATA Retrieval Agent Graph (`dataRetrievalAgentGraph.py`)
	Web Scraping Orchestrator

	Scraping Tools Available:
	- `scrape_news_site` - Generic news scraper
	- `scrape_cse_live` - CSE stock prices
	- `scrape_official_data` - Government portals
	- `scrape_social_media` - Multi-platform

	Anti-Bot Features:
	- Random delays (1-3s)
	- User-agent rotation
	- Retry with exponential backoff
	- Headless browser fallback

	---

	### 8. Vectorization Agent Graph (`vectorizationAgentGraph.py`)
	6-Step Multilingual NLP Pipeline with Anomaly + Trending Detection

	```
	┌─────────────────────────────────────────────────┐
	│ Step 1: Language Detection │
	│ FastText + Unicode script analysis │
	│ Supports: English, Sinhala (සිංහල), Tamil (தமிழ்)│
	└─────────────────┬───────────────────────────────┘
	│
	▼
	┌─────────────────────────────────────────────────┐
	│ Step 2: Text Vectorization │
	│ ┌─────────────┬─────────────┬─────────────────┐ │
	│ │ DistilBERT │ SinhalaBERTo│ Tamil-BERT │ │
	│ │ (English) │ (Sinhala) │ (Tamil) │ │
	│ └─────────────┴─────────────┴─────────────────┘ │
	│ Output: 768-dim vector per text │
	└─────────────────┬───────────────────────────────┘
	│
	▼
	┌─────────────────────────────────────────────────┐
	│ Step 3: Anomaly Detection (Isolation Forest) │
	│ - English: ML model inference │
	│ - Sinhala/Tamil: Skipped (incompatible vectors) │
	│ - Outputs anomaly_score (0-1) │
	└─────────────────┬───────────────────────────────┘
	│
	▼
	┌─────────────────────────────────────────────────┐
	│ Step 4: Trending Detection │
	│ - Entity extraction (hashtags, proper nouns) │
	│ - Momentum: current_hour / avg_last_6_hours │
	│ - Spike alerts when momentum > 3x │
	└─────────────────┬───────────────────────────────┘
	│
	▼
	┌─────────────────────────────────────────────────┐
	│ Step 5: Expert Summary (GroqLLM) │
	│ - Opportunity & threat identification │
	│ - Sentiment analysis │
	└─────────────────┬───────────────────────────────┘
	│
	▼
	┌─────────────────────────────────────────────────┐
	│ Step 6: Format Output │
	│ - Includes anomaly + trending in domain_insights│
	└─────────────────────────────────────────────────┘
	```

	Trending Detection API Endpoints:

	\| Endpoint \| Method \| Description \|
	\|----------\|--------\|-------------\|
	\| `/api/trending` \| GET \| Get trending topics & spike alerts \|
	\| `/api/trending/topic/{topic}` \| GET \| Get hourly history for a topic \|
	\| `/api/trending/record` \| POST \| Record a topic mention (testing) \|

	---

	### 10. Weather Prediction Pipeline (`models/weather-prediction/`)
	LSTM-Based Multi-District Weather Forecasting

	```
	┌─────────────────────────────────────────────────┐
	│ Data Source: Tutiempo.net (21 stations) │
	│ Historical data since 1944 │
	└─────────────────┬───────────────────────────────┘
	│
	▼
	┌─────────────────────────────────────────────────┐
	│ LSTM Neural Network │
	│ ┌─────────────────────────────────────────────┐ │
	│ │ Input: 30-day sequence (11 features) │ │
	│ │ Layer 1: LSTM(64) + BatchNorm + Dropout │ │
	│ │ Layer 2: LSTM(32) + BatchNorm + Dropout │ │
	│ │ Output: Dense(3) → temp_max, temp_min, rain │ │
	│ └─────────────────────────────────────────────┘ │
	└─────────────────┬───────────────────────────────┘
	│
	▼
	┌─────────────────────────────────────────────────┐
	│ Severity Classifier │
	│ - Combines temp, rainfall, flood risk │
	│ - Outputs: normal/advisory/warning/critical │
	└─────────────────┬───────────────────────────────┘
	│
	▼
	┌─────────────────────────────────────────────────┐
	│ Output: 25 District Predictions │
	│ - Temperature (high/low °C) │
	│ - Rainfall (mm + probability) │
	│ - Flood risk (integrated with RiverNet) │
	└─────────────────────────────────────────────────┘
	```

	Usage:
	```bash
	# Run full pipeline
	cd models/weather-prediction
	python main.py --mode full

	# Just predictions
	python main.py --mode predict

	# Train specific station
	python main.py --mode train --station COLOMBO
	```

	---

	### 11. Currency Prediction Pipeline (`models/currency-volatility-prediction/`)
	GRU-Based USD/LKR Exchange Rate Forecasting

	```
	┌─────────────────────────────────────────────────┐
	│ Data Sources (yfinance) │
	│ - USD/LKR exchange rate │
	│ - CSE stock index (correlation) │
	│ - Gold, Oil prices (global factors) │
	│ - USD strength index │
	└─────────────────┬───────────────────────────────┘
	│
	▼
	┌─────────────────────────────────────────────────┐
	│ Feature Engineering (25+ features) │
	│ - SMA, EMA, RSI, MACD, Bollinger Bands │
	│ - Volatility, Momentum indicators │
	│ - Temporal encoding (day/month cycles) │
	└─────────────────┬───────────────────────────────┘
	│
	▼
	┌─────────────────────────────────────────────────┐
	│ GRU Neural Network (8GB RAM optimized) │
	│ ┌─────────────────────────────────────────────┐ │
	│ │ Input: 30-day sequence │ │
	│ │ Layer 1: GRU(64) + BatchNorm + Dropout │ │
	│ │ Layer 2: GRU(32) + BatchNorm + Dropout │ │
	│ │ Output: Dense(1) → next_day_rate │ │
	│ └─────────────────────────────────────────────┘ │
	└─────────────────┬───────────────────────────────┘
	│
	▼
	┌─────────────────────────────────────────────────┐
	│ Output: USD/LKR Prediction │
	│ - Current & predicted rate │
	│ - Change % and direction │
	│ - Volatility classification (low/medium/high) │
	└─────────────────────────────────────────────────┘
	```

	Usage:
	```bash
	# Run full pipeline
	cd models/currency-volatility-prediction
	python main.py --mode full

	# Just predict
	python main.py --mode predict

	# Train GRU model
	python main.py --mode train --epochs 100
	```

	---

	### 12. RAG Chatbot (`src/rag.py`)
	Chat-History Aware Intelligence Q&A

	```
	┌─────────────────────────────────────────────────┐
	│ MultiCollectionRetriever │
	│ - Connects to ChromaDB intelligence collection │
	│ - Roger_feeds (all agent domain feeds) │
	└─────────────────┬───────────────────────────────┘
	│
	▼
	┌─────────────────────────────────────────────────┐
	│ Question Reformulation (History-Aware) │
	│ - Uses last 3-5 exchanges for context │
	│ - Reformulates follow-up questions │
	└─────────────────┬───────────────────────────────┘
	│
	▼
	┌─────────────────────────────────────────────────┐
	│ Groq LLM (llama-3.1-70b-versatile) │
	│ - RAG with source citations │
	│ - Domain-specific analysis │
	└─────────────────────────────────────────────────┘
	```

	Usage:
	```bash
	# CLI mode
	python src/rag.py

	# Or via API
	curl -X POST http://localhost:8000/api/rag/chat \
	-H "Content-Type: application/json" \
	-d '{"message": "What are the latest political events?"}'
	```

	---

	## 🤖 ML Anomaly Detection Pipeline

	Located in `models/anomaly-detection/`

	### Pipeline Components

	\| Component \| File \| Description \|
	\|-----------\|------\|-------------\|
	\| Data Ingestion \| `data_ingestion.py` \| SQLite + CSV fetching \|
	\| Data Validation \| `data_validation.py` \| Schema-based validation \|
	\| Data Transformation \| `data_transformation.py` \| Language detection + BERT vectorization \|
	\| Model Trainer \| `model_trainer.py` \| Optuna + MLflow training \|

	### Clustering Models

	\| Model \| Type \| Use Case \|
	\|-------\|------\|----------\|
	\| DBSCAN \| Density-based \| Noise-robust clustering \|
	\| KMeans \| Centroid-based \| Fast, fixed k clusters \|
	\| HDBSCAN \| Hierarchical density \| Variable density clusters \|
	\| Isolation Forest \| Anomaly detection \| Outlier identification \|
	\| LOF \| Local outlier \| Density-based anomalies \|

	### Training with Optuna

	```python
	# Hyperparameter optimization
	study = optuna.create_study(direction="maximize")
	study.optimize(objective, n_trials=50)
	```

	### MLflow Tracking

	```python
	mlflow.set_tracking_uri("https://dagshub.com/...")
	mlflow.log_params(best_params)
	mlflow.log_metrics(metrics)
	mlflow.sklearn.log_model(model, "model")
	```

	---

	## 🌧️ Weather Data Scraper (`scripts/scrape_weather_data.py`)

	Historical weather data collection for ML model training

	### Data Sources

	\| Source \| API Key? \| Data Available \|
	\|--------\|----------\|----------------\|
	\| Open-Meteo \| ❌ Free \| Historical weather since 1940 \|
	\| NASA FIRMS \| ✅ Optional \| Fire/heat spot detection \|

	### Collected Weather Variables

	- `temperature_2m_max/min/mean`
	- `precipitation_sum`, `rain_sum`
	- `precipitation_hours`
	- `wind_speed_10m_max`, `wind_gusts_10m_max`
	- `wind_direction_10m_dominant`

	### Usage

	```bash
	# Scrape last 30 days (default)
	python scripts/scrape_weather_data.py

	# Scrape specific date range
	python scripts/scrape_weather_data.py --start 2020-01-01 --end 2024-12-31

	# Scrape multiple years for training dataset
	python scripts/scrape_weather_data.py --years 2020,2021,2022,2023,2024

	# Include fire detection data
	python scripts/scrape_weather_data.py --years 2023,2024 --fires

	# Hourly resolution (default is daily)
	python scripts/scrape_weather_data.py --start 2024-01-01 --end 2024-01-31 --resolution hourly
	```

	### Output

	```
	datasets/weather/
	├── weather_daily_2020-01-01_2020-12-31.csv
	├── weather_daily_2021-01-01_2021-12-31.csv
	├── weather_combined.csv (merged file)
	└── fire_detections_20241207.csv
	```

	### Coverage

	All 25 Sri Lankan districts with coordinates:
	- Colombo, Gampaha, Kalutara, Kandy, Matale, Nuwara Eliya
	- Galle, Matara, Hambantota, Jaffna, Kilinochchi, Mannar
	- Vavuniya, Mullaitivu, Batticaloa, Ampara, Trincomalee
	- Kurunegala, Puttalam, Anuradhapura, Polonnaruwa
	- Badulla, Monaragala, Ratnapura, Kegalle

	---

	## 🚀 Quick Start

	### Prerequisites
	- Python 3.11+
	- Node.js 18+
	- Docker Desktop (for Airflow)
	- Groq API Key

	### Installation

	```bash
	# 1. Clone repository
	git clone <your-repo>
	cd Roger-Final

	# 2. Create virtual environment
	python -m venv .venv
	source .venv/bin/activate # Linux/Mac
	.\.venv\Scripts\activate # Windows

	# 3. Install dependencies
	pip install -r requirements.txt

	# 4. Configure environment
	cp .env.template .env
	# Edit .env with your API keys

	# 5. Download ML models
	python models/anomaly-detection/download_models.py

	# 6. Launch all services
	./start_services.sh # Linux/Mac
	.\start_services.ps1 # Windows
	```

	---

	## 🔧 API Endpoints

	### REST API (FastAPI - Port 8000)

	\| Endpoint \| Method \| Description \|
	\|----------\|--------\|-------------\|
	\| `/api/status` \| GET \| System health \|
	\| `/api/dashboard` \| GET \| Risk metrics \|
	\| `/api/feed` \| GET \| Latest events \|
	\| `/api/feeds` \| GET \| All feeds with pagination \|
	\| `/api/feeds/by_district` \| GET \| Feeds filtered by district \|
	\| `/api/rivernet` \| GET \| River monitoring data \|
	\| `/api/predict` \| POST \| Run anomaly predictions \|
	\| `/api/anomalies` \| GET \| Get anomalous feeds \|
	\| `/api/model/status` \| GET \| ML model status \|
	\| `/api/weather/predictions` \| GET \| All district forecasts \|
	\| `/api/weather/predictions/{district}` \| GET \| Single district \|
	\| `/api/weather/model/status` \| GET \| Weather model info \|
	\| `/api/weather/historical` \| GET \| 30-year climate analysis \|
	\| `/api/weather/threat` \| GET \| National flood threat score \|
	\| `/api/currency/prediction` \| GET \| USD/LKR next-day forecast \|
	\| `/api/currency/history` \| GET \| Historical rates \|
	\| `/api/currency/model/status` \| GET \| Currency model info \|
	\| `/api/stocks/predictions` \| GET \| All CSE stock forecasts \|
	\| `/api/stocks/predictions/{symbol}` \| GET \| Single stock prediction \|
	\| `/api/stocks/model/status` \| GET \| Stock models info \|
	\| `/api/rag/chat` \| POST \| Chat with RAG \|
	\| `/api/rag/stats` \| GET \| RAG system stats \|
	\| `/api/rag/clear` \| POST \| Clear chat history \|
	\| `/api/power` \| GET \| CEB power/load shedding status \|
	\| `/api/fuel` \| GET \| Current fuel prices \|
	\| `/api/economy` \| GET \| CBSL economic indicators \|
	\| `/api/health` \| GET \| Health alerts & dengue data \|
	\| `/api/commodities` \| GET \| Essential goods prices \|
	\| `/api/water` \| GET \| Water supply disruptions \|

	### WebSocket
	- `ws://localhost:8000/ws` - Real-time updates

	---

	## ⏰ Airflow Orchestration

	### DAG: `anomaly_detection_training`

	```
	start → check_records → data_ingestion → data_validation
	→ data_transformation → model_training → end
	```

	Triggers:
	- Batch threshold: 1000 new records
	- Daily fallback: Every 24 hours

	Access Dashboard:
	```bash
	cd models/anomaly-detection
	astro dev start
	# Open http://localhost:8080
	```

	### DAG: `weather_prediction_daily`

	```
	ingest_data → train_models → generate_predictions → publish_predictions
	```

	Schedule: Daily at 4:00 AM IST

	Tasks:
	- Scrape Tutiempo.net for latest data
	- Train LSTM models (MLflow tracked)
	- Generate 25-district predictions
	- Save to JSON for API

	### DAG: `currency_prediction_daily`

	```
	ingest_data → train_model → generate_prediction → publish_prediction
	```

	Schedule: Daily at 4:00 AM IST

	Tasks:
	- Fetch USD/LKR + indicators from yfinance
	- Train GRU model (MLflow tracked)
	- Generate next-day prediction
	- Save to JSON for API

	---

	## 📁 Project Structure

	```
	Roger-Ultimate/
	├── src/
	│ ├── graphs/ # LangGraph definitions
	│ │ ├── combinedAgentGraph.py # Mother graph
	│ │ ├── politicalAgentGraph.py
	│ │ ├── economicalAgentGraph.py
	│ │ ├── meteorologicalAgentGraph.py
	│ │ ├── socialAgentGraph.py
	│ │ ├── intelligenceAgentGraph.py
	│ │ ├── dataRetrievalAgentGraph.py
	│ │ └── vectorizationAgentGraph.py # 5-step with anomaly detection
	│ ├── nodes/ # Agent implementations
	│ ├── states/ # State definitions
	│ ├── llms/ # LLM configurations
	│ ├── storage/ # ChromaDB, SQLite, Neo4j stores
	│ ├── rag.py # RAG chatbot
	│ └── utils/
	│ └── utils.py # Tools incl. FloodWatch
	├── scripts/
	│ └── scrape_weather_data.py # Weather data scraper
	├── models/
	│ ├── anomaly-detection/ # ML Anomaly Pipeline
	│ │ ├── src/
	│ │ │ ├── components/ # Pipeline stages
	│ │ │ ├── entity/ # Config/Artifact classes
	│ │ │ ├── pipeline/ # Orchestrators
	│ │ │ └── utils/ # Vectorizer, metrics
	│ │ ├── dags/ # Airflow DAGs
	│ │ ├── data_schema/ # Validation schemas
	│ │ ├── output/ # Trained models
	│ │ └── models_cache/ # Downloaded BERT models
	│ ├── weather-prediction/ # Weather ML Pipeline
	│ │ ├── src/components/ # data_ingestion, model_trainer, predictor
	│ │ ├── dags/ # weather_prediction_dag.py (4 AM)
	│ │ ├── artifacts/ # Trained LSTM models (.h5)
	│ │ └── main.py # CLI entry point
	│ └── currency-volatility-prediction/ # Currency ML Pipeline
	│ ├── src/components/ # data_ingestion, model_trainer, predictor
	│ ├── dags/ # currency_prediction_dag.py (4 AM)
	│ ├── artifacts/ # Trained GRU model
	│ └── main.py # CLI entry point
	├── datasets/
	│ └── weather/ # Scraped weather CSVs
	├── frontend/
	│ └── app/
	│ ├── components/
	│ │ ├── dashboard/
	│ │ │ ├── AnomalyDetection.tsx
	│ │ │ ├── WeatherPredictions.tsx
	│ │ │ ├── CurrencyPrediction.tsx
	│ │ │ ├── NationalThreatCard.tsx # Flood threat score
	│ │ │ ├── HistoricalIntel.tsx # 30-year climate
	│ │ │ └── ...
	│ │ ├── map/
	│ │ │ ├── MapView.tsx
	│ │ │ └── SatelliteView.tsx # Windy.com embed
	│ │ ├── FloatingChatBox.tsx # RAG chat UI
	│ │ └── ...
	│ └── pages/
	│ └── Index.tsx # 7 tabs incl. SATELLITE
	├── main.py # FastAPI backend
	├── start.sh # Startup script
	└── requirements.txt
	```

	---

	## 🔐 Environment Variables

	```env
	# LLM
	GROQ_API_KEY=your_groq_key

	# Neo4j (Knowledge Graph)
	NEO4J_URI=neo4j+s://your-instance.databases.neo4j.io
	NEO4J_USERNAME=neo4j
	NEO4J_PASSWORD=your_password
	NEO4J_ENABLED=true
	NEO4J_DATABASE=neo4j

	# ChromaDB (Vector Store)
	CHROMADB_PATH=./data/chromadb
	CHROMADB_COLLECTION=Roger_feeds
	CHROMADB_SIMILARITY_THRESHOLD=0.85

	# SQLite (Fast Cache)
	SQLITE_DB_PATH=./data/cache/feeds.db

	# MLflow (DagsHub)
	MLFLOW_TRACKING_URI=https://dagshub.com/...
	MLFLOW_TRACKING_USERNAME=...
	MLFLOW_TRACKING_PASSWORD=...

	# Pipeline
	BATCH_THRESHOLD=1000
	```

	---

	## 🧪 Testing Framework

	Industry-level testing infrastructure for the agentic AI system.

	### Test Structure

	```
	tests/
	├── conftest.py # Pytest fixtures and configuration
	├── unit/ # Unit tests for individual components
	│ └── test_utils.py
	├── integration/ # Multi-component integration tests
	│ └── test_agent_routing.py
	├── evaluation/ # LLM-as-Judge evaluation tests
	│ ├── agent_evaluator.py # Evaluation harness
	│ ├── adversarial_tests.py # Prompt injection & edge cases
	│ └── golden_datasets/
	│ └── expected_responses.json
	└── e2e/ # End-to-end workflow tests
	└── test_full_pipeline.py
	```

	### LangSmith Integration

	Automatic tracing for all agent decisions when `LANGSMITH_API_KEY` is set.

	```env
	# Add to .env
	LANGSMITH_API_KEY=your_langsmith_api_key
	LANGSMITH_PROJECT=roger-intelligence # Optional, defaults to 'roger-intelligence'
	```

	View traces: [smith.langchain.com](https://smith.langchain.com/)

	### Running Tests

	```bash
	# Run all tests
	python run_tests.py

	# Run specific test suites
	python run_tests.py --unit # Unit tests only
	python run_tests.py --adversarial # Security/adversarial tests
	python run_tests.py --eval # LLM-as-Judge evaluation
	python run_tests.py --e2e # End-to-end tests

	# With coverage report
	python run_tests.py --coverage

	# Enable LangSmith tracing in tests
	python run_tests.py --with-langsmith
	```

	### Agent Evaluation Harness

	The `agent_evaluator.py` implements the LLM-as-Judge pattern:

	\| Metric \| Description \|
	\|--------\|-------------\|
	\| Tool Selection Accuracy \| Did the agent use the correct tools? \|
	\| Response Quality \| Is the response relevant and coherent? \|
	\| BLEU Score \| N-gram text similarity (0-1, higher = better match) \|
	\| Hallucination Detection \| Did the agent fabricate information? \|
	\| Graceful Degradation \| Does it handle failures properly? \|

	```bash
	# Run standalone evaluator
	python tests/evaluation/agent_evaluator.py
	```

	### Adversarial Testing

	Tests for security and robustness:

	\| Test Category \| Description \|
	\|--------------\|-------------\|
	\| Prompt Injection \| Ignore instructions, jailbreak, context switching \|
	\| Out-of-Domain \| Non-SL queries, illegal requests, impossible questions \|
	\| Malformed Input \| Empty, XSS, SQL injection, unicode flood \|
	\| Graceful Degradation \| API timeouts, empty responses, rate limiting \|

	### CI/CD Pipeline

	GitHub Actions workflow (`.github/workflows/test.yml`):

	```yaml
	on: [push, pull_request]

	jobs:
	unit-tests: # Runs on every push
	adversarial-tests: # Security tests on every push
	evaluation-tests: # LLM evaluation on main branch only
	lint: # Code quality checks
	```

	Required Secrets:
	- `LANGSMITH_API_KEY` - For evaluation test logging
	- `GROQ_API_KEY` - For LLM-based evaluation

	---

	## 🐛 Troubleshooting

	### FastText won't install on Windows
	```bash
	# Use pre-built wheel instead
	pip install fasttext-wheel
	```

	### BERT models downloading slowly
	```bash
	# Pre-download all models
	python models/anomaly-detection/download_models.py
	```

	### Airflow not starting
	```bash
	# Ensure Docker is running
	docker info

	# Initialize Astro project
	cd models/anomaly-detection
	astro dev init
	astro dev start
	```

	### NumPy 2.0 / ChromaDB compatibility error
	```bash
	# If you see "A module that was compiled using NumPy 1.x cannot be run in NumPy 2.x"
	pip install "numpy<2.0"

	# Or upgrade chromadb to latest
	pip install --upgrade chromadb
	```

	### Keras model loading error ("Could not locate function 'mse'")
	```bash
	# If currency/weather models fail to load with Keras 3.x
	# Retrain the model - it will save in .keras format automatically
	cd models/currency-volatility-prediction
	python main.py --mode train

	# Or for weather
	cd models/weather-prediction
	python main.py --mode train
	```

	---

	## 📄 License

	MIT License - Built for Production

	---

	## 🙏 Acknowledgments

	- Groq - High-speed LLM inference
	- LangGraph - Agent orchestration
	- HuggingFace - SinhalaBERTo, Tamil-BERT, DistilBERT
	- Optuna - Hyperparameter optimization
	- MLflow - Experiment tracking
	- Sri Lankan government for open data sources