---
title: Roger Intelligence Platform
emoji: ⚡
colorFrom: blue
colorTo: green
sdk: docker
pinned: false
---

# 🇱🇰 Roger Intelligence Platform

**Real-Time Situational Awareness for Sri Lanka**

A multi-agent AI system that aggregates intelligence from **50+ data sources** to provide risk analysis and opportunity detection for businesses operating in Sri Lanka.

## 🌐 Live Demo

| Component | URL |
|-----------|-----|
| **Frontend Dashboard** | [https://model-x-frontend-snowy.vercel.app/](https://model-x-frontend-snowy.vercel.app/) |
| **Backend API** | [https://nivakaran-Roger.hf.space](https://nivakaran-Roger.hf.space) |

---

## 🎯 Key Features

✅ **5 Domain Agents + 2 Orchestrators** running in parallel:
- **Social Agent** - Reddit, Twitter, Facebook, Threads, BlueSky monitoring
- **Political Agent** - Gazette, Parliament, District Social Media
- **Economical Agent** - CSE Stock Market + Technical Indicators (SMA, EMA, RSI, MACD)
- **Meteorological Agent** - DMC Weather + RiverNet + **FloodWatch Integration**
- **Intelligence Agent** - Brand Monitoring + Threat Detection + **User-Configurable Targets**
- **Combined Agent (Orchestrator)** - Fan-out/Fan-in coordination, LLM filtering, feed ranking
- **Data Retrieval Agent** - Web scraping orchestration with anti-bot features

✅ **Situational Awareness Dashboard**:
- **CEB Power Status** - Load shedding / power outage monitoring
- **Fuel Prices** - Petrol 92/95, Diesel, Kerosene (CEYPETCO)
- **CBSL Economic Indicators** - Inflation, policy rates, forex reserves, USD/LKR
- **Health Alerts** - Dengue case tracking, disease outbreak monitoring
- **Commodity Prices** - 15 essential goods (rice, sugar, gas, eggs, etc.)
- **Water Supply Status** - NWSDB disruption alerts

✅ **ML Anomaly Detection Pipeline** (Integrated into Graph):
- Language-specific BERT models (Sinhala, Tamil, English)
- Real-time anomaly inference on every graph cycle
- Clustering (DBSCAN, KMeans, HDBSCAN)
- Anomaly Detection (Isolation Forest, LOF)
- MLflow + DagsHub tracking

✅ **Weather Prediction ML Pipeline**:
- LSTM Neural Network (30-day sequences)
- Predicts: Temperature, Rainfall, Flood Risk, Severity
- 21 weather stations → 25 districts
- Airflow DAG runs daily at 4 AM

✅ **Currency Prediction ML Pipeline**:
- GRU Neural Network (optimized for 8GB RAM)
- Predicts: USD/LKR exchange rate
- Features: Technical indicators + CSE + Gold + Oil + USD Index
- MLflow tracking + Airflow DAG at 4 AM

✅ **Stock Price Prediction ML Pipeline**:
- Multi-Architecture: LSTM, GRU, BiLSTM, BiGRU
- Optuna hyperparameter tuning (30 trials per stock)
- Per-stock best model selection
- 10 top CSE stocks (JKH, COMB, DIAL, HNB, etc.)

✅ **RAG-Powered Chatbot**:
- Chat-history aware Q&A
- Queries all ChromaDB intelligence collections
- Domain filtering (political, economic, weather, social)
- Floating chat UI in dashboard

✅ **Trending/Velocity Detection**:
- SQLite-based topic frequency tracking (24-hour rolling window)
- Momentum calculation: `current_hour / avg_last_6_hours`
- Spike alerts when topic volume > 3x baseline
- Integrated into Combined Agent dashboard

✅ **Real-Time Dashboard** with:
- Live Intelligence Feed
- Floating AI Chatbox
- Weather Predictions Tab
- **Live Satellite/Weather Map** (Windy.com)
- **National Flood Threat Score**
- **30-Year Historical Climate Analysis**
- **Trending Topics & Spike Alerts**
- **Enhanced Operational Indicators** (infrastructure_health, regulatory_activity, investment_climate)
- Operational Risk Radar
- ML Anomaly Detection Display
- Market Predictions with Moving Averages
- Risk & Opportunity Classification

✅ **Weather Data Scraper for ML Training**:
- Open-Meteo API (free historical data)
- NASA FIRMS (fire/heat detection)
- All 25 districts coverage
- Year-wise CSV export for model training

✅ **Operational Dashboard Metrics**:
- **Logistics Friction**: Average confidence of mobility/social domain risk events
- **Compliance Volatility**: Average confidence of political domain risks
- **Market Instability**: Average confidence of market/economical domain risks
- **Opportunity Index**: Average confidence of opportunity-classified events

✅ **Multi-District Province-Aware Event Categorization**:
- Events mentioning provinces are displayed in all constituent districts
- Supports: Western, Southern, Central, Northern, Eastern, Sabaragamuwa, Uva, North Western, North Central provinces
- Both frontend (MapView, DistrictInfoPanel) and backend are synchronized

✅ **3-Tier Storage Architecture** with Deduplication:
- **Tier 1: SQLite** - Fast hash-based exact match (microseconds)
- **Tier 2: ChromaDB** - Semantic similarity search with sentence transformers (milliseconds)
- **Tier 3: Neo4j Aura** - Knowledge graph for event relationships and entity tracking
- Unified `StorageManager` orchestrates all backends
- Deduplication prevents duplicate feeds across all domain agents

---

## 🏗️ System Architecture

```
┌─────────────────────────────────────────────────────────────────────────┐
│                       Roger Combined Graph                              │
│  ┌────────────────────────────────────────────────────────────────┐    │
│  │                    Graph Initiator (Reset)                      │    │
│  └────────────────────────────────────────────────────────────────┘    │
│                              │ Fan-Out                                   │
│    ┌────────────┬────────────┼────────────┬────────────┬────────────┐  │
│    ▼            ▼            ▼            ▼            ▼            ▼  │
│ ┌──────┐   ┌──────┐   ┌──────────┐   ┌──────┐   ┌──────────┐   ┌────┐│
│ │Social│   │Econ  │   │Political │   │Meteo │   │Intellig- │   │Data││
│ │Agent │   │Agent │   │Agent     │   │Agent │   │ence Agent│   │Retr││
│ └──────┘   └──────┘   └──────────┘   └──────┘   └──────────┘   └────┘│
│    │            │            │            │            │            │  │
│    └────────────┴────────────┴────────────┴────────────┴────────────┘  │
│                              │ Fan-In                                   │
│                    ┌─────────▼──────────┐                              │
│                    │   Feed Aggregator   │                              │
│                    │  (Rank & Dedupe)    │                              │
│                    └─────────┬──────────┘                              │
│                    ┌─────────▼──────────┐                              │
│                    │  Vectorization     │                          │
│                    │  Agent (Optional)  │                              │
│                    └─────────┬──────────┘                              │
│                    ┌─────────▼──────────┐                              │
│                    │  Router (Loop/End) │                              │
│                    └────────────────────┘                              │
└─────────────────────────────────────────────────────────────────────────┘
```

---

## 📊 Graph Implementations

### 1. Combined Agent Graph (`combinedAgentGraph.py`)
**The Mother Graph** - Orchestrates all domain agents in parallel.

```mermaid
graph TD
    A[Graph Initiator] -->|Fan-Out| B[Social Agent]
    A -->|Fan-Out| C[Economic Agent]
    A -->|Fan-Out| D[Political Agent]
    A -->|Fan-Out| E[Meteorological Agent]
    A -->|Fan-Out| F[Intelligence Agent]
    A -->|Fan-Out| G[Data Retrieval Agent]
    B -->|Fan-In| H[Feed Aggregator]
    C --> H
    D --> H
    E --> H
    F --> H
    G --> H
    H --> I[Data Refresher]
    I --> J{Router}
    J -->|Loop| A
    J -->|End| K[END]
```

**Key Features:**
- Custom state reducers for parallel execution
- Feed deduplication with content hashing
- Loop control with configurable intervals
- Real-time WebSocket broadcasting

**Architecture Improvements (v2.1):**
- **Rate Limiting**: Domain-specific rate limits prevent anti-bot detection
  - Twitter: 15 RPM, LinkedIn: 10 RPM, News: 60 RPM
  - Thread-safe semaphores for max concurrent requests
- **Error Handling**: Per-agent try/catch prevents cascading failures
  - Failed agents return empty results, others continue
- **Non-Blocking Refresh**: 60-second cycle with interruptible sleep
  - `threading.Event.wait()` instead of blocking `time.sleep()`

### Storage Data Flow

```
┌─────────────────────────────────────────────────────────────────────────────┐
│                         DOMAIN AGENTS (Parallel)                            │
│  ┌──────────┐ ┌──────────┐ ┌──────────┐ ┌──────────┐ ┌──────────────┐      │
│  │ Social   │ │Political │ │Economic  │ │  Meteo   │ │ Intelligence │      │
│  │ Agent    │ │ Agent    │ │ Agent    │ │  Agent   │ │    Agent     │      │
│  └────┬─────┘ └────┬─────┘ └────┬─────┘ └────┬─────┘ └──────┬───────┘      │
│       └────────────┴────────────┴────────────┴──────────────┘              │
│                                 │ Fan-In                                    │
│                    ┌────────────▼─────────────┐                            │
│                    │   CombinedAgentNode      │                            │
│                    │   (LLM Filter + Rank)    │                            │
│                    └────────────┬─────────────┘                            │
└─────────────────────────────────┼───────────────────────────────────────────┘
                                  │
                    ┌─────────────▼──────────────┐
                    │      StorageManager        │
                    │   (3-Tier Deduplication)   │
                    └─────────────┬──────────────┘
          ┌───────────────────────┼──────────────────────────┐
          │                       │                          │
          ▼                       ▼                          ▼
┌─────────────────┐    ┌──────────────────┐    ┌─────────────────────────┐
│     SQLite      │    │    ChromaDB      │    │      Neo4j Aura         │
│   (Fast Cache)  │    │  (Vector Store)  │    │   (Knowledge Graph)     │
│  ─────────────  │    │  ──────────────  │    │  ───────────────────    │
│  Hash-based     │    │  Semantic search │    │  Event relationships    │
│  Exact match    │    │  Similarity 0.85 │    │  Domain nodes           │
│  ~microseconds  │    │  ~milliseconds   │    │  Entity tracking        │
└─────────────────┘    └──────────────────┘    └─────────────────────────┘
```

---

### 2. Political Agent Graph (`politicalAgentGraph.py`)
**3-Module Hybrid Architecture**

| Module | Description | Sources |
|--------|-------------|---------|
| **Official Sources** | Government data | Gazette, Parliament Minutes |
| **Social Media** | Political sentiment | Twitter, Facebook, Reddit (National + 25 Districts) |
| **Feed Generation** | LLM Processing | Categorize → Summarize → Format |

```
┌─────────────────────────────────────────────┐
│ Module 1: Official     │ Module 2: Social  │
│ ┌─────────────────┐    │ ┌───────────────┐ │
│ │ Gazette         │    │ │ National      │ │
│ │ Parliament      │    │ │ Districts (25)│ │
│ └─────────────────┘    │ │ World Politics│ │
│                        │ └───────────────┘ │
└────────────┬───────────┴────────┬──────────┘
             │       Fan-In       │
             ▼                    ▼
        ┌────────────────────────────┐
        │ Module 3: Feed Generation  │
        │ Categorize → LLM → Format  │
        └────────────────────────────┘
```

---

### 3. Economic Agent Graph (`economicalAgentGraph.py`)
**Market Intelligence & Technical Analysis**

| Component | Description |
|-----------|-------------|
| **Stock Collector** | CSE market data (200+ stocks) |
| **Technical Analyzer** | SMA, EMA, RSI, MACD |
| **Trend Detector** | Bullish/Bearish signals |
| **Feed Generator** | Risk/Opportunity classification |

**Indicators Calculated:**
- Simple Moving Average (SMA-20, SMA-50)
- Exponential Moving Average (EMA-12, EMA-26)
- Relative Strength Index (RSI)
- MACD with Signal Line

---

### 4. Meteorological Agent Graph (`meteorologicalAgentGraph.py`)
**Weather & Disaster Monitoring + FloodWatch Integration**

```
┌─────────────────────────────────────┐
│        DMC Weather Collector        │
│   (Daily forecasts, 25 districts)   │
└─────────────┬───────────────────────┘
              │
              ▼
┌─────────────────────────────────────┐
│        RiverNet Data Collector      │
│   (River levels, flood monitoring)  │
└─────────────┬───────────────────────┘
              │
              ▼
┌─────────────────────────────────────┐
│    FloodWatch Historical Data    │
│   (30-year climate analysis)        │
└─────────────┬───────────────────────┘
              │
              ▼
┌─────────────────────────────────────┐
│    National Threat Calculator    │
│   (Aggregated flood risk 0-100)     │
└─────────────┬───────────────────────┘
              │
              ▼
┌─────────────────────────────────────┐
│        Alert Generator              │
│   (Severity classification)         │
└─────────────────────────────────────┘
```

**Alert Levels:**
- 🟢 Normal: Standard conditions
- 🟡 Advisory: Watch for developments
- 🟠 Warning: Take precautions
- 🔴 Critical: Immediate action required

**FloodWatch Features:**
| Feature | Description |
|---------|-------------|
| **Historical Analysis** | 30-year climate data (1995-2025) |
| **Decadal Comparison** | 3 periods: 1995-2004, 2005-2014, 2015-2025 |
| **National Threat Score** | 0-100 aggregated risk from rivers + alerts + season |
| **High-Risk Periods** | May-Jun (SW Monsoon), Oct-Nov (NE Monsoon) |

---

### 5. Social Agent Graph (`socialAgentGraph.py`)
**Multi-Platform Social Media Monitoring**

| Platform | Data Source | Coverage |
|----------|-------------|----------|
| Reddit | PRAW API | r/srilanka, r/colombo |
| Twitter/X | Nitter scraping | #SriLanka, #Colombo |
| Facebook | Profile scraping | News pages |
| Threads | Meta API | Trending topics |
| BlueSky | AT Protocol | Political discourse |

---

### 6. Intelligence Agent Graph (`intelligenceAgentGraph.py`)
**Brand & Threat Monitoring + User-Configurable Targets**

```
┌─────────────────┐    ┌─────────────────┐    ┌─────────────────┐
│ Brand Monitor   │    │ Threat Scanner  │    │ User Targets │
│ - Company news  │    │ - Security      │    │ - Custom keys   │
│ - Competitor    │    │ - Compliance    │    │ - User profiles │
│ - Market share  │    │ - Geopolitical  │    │ - Products      │
└────────┬────────┘    └────────┬────────┘    └────────┬────────┘
         │                      │                      │
         └──────────────────────┼──────────────────────┘
                                ▼
                   ┌─────────────────────┐
                   │ Intelligence Report │
                   │ (Priority ranked)   │
                   └─────────────────────┘
```

**User-Configurable Monitoring**:
Users can define custom monitoring targets via the frontend settings panel or API:

| Config Type | Description | Example |
|-------------|-------------|---------|
| **Keywords** | Custom search terms | "Colombo Port", "BOI Investment" |
| **Products** | Products to track | "iPhone 15", "Samsung Galaxy" |
| **Profiles** | Social media accounts | @CompetitorX (Twitter), CompanyY (Facebook) |

**API Endpoints:**
```bash
# Get current config
GET /api/intel/config

# Update full config
POST /api/intel/config
Body: {"user_keywords": ["keyword1"], "user_profiles": {"twitter": ["@account"]}, "user_products": ["Product"]}

# Add single target
POST /api/intel/config/add?target_type=keyword&value=Colombo+Port

# Remove target
DELETE /api/intel/config/remove?target_type=profile&value=CompetitorX&platform=twitter
```

**Config File**: `src/config/intel_config.json`

---

### 7. DATA Retrieval Agent Graph (`dataRetrievalAgentGraph.py`)
**Web Scraping Orchestrator**

**Scraping Tools Available:**
- `scrape_news_site` - Generic news scraper
- `scrape_cse_live` - CSE stock prices
- `scrape_official_data` - Government portals
- `scrape_social_media` - Multi-platform

**Anti-Bot Features:**
- Random delays (1-3s)
- User-agent rotation
- Retry with exponential backoff
- Headless browser fallback

---

### 8. Vectorization Agent Graph (`vectorizationAgentGraph.py`)
**6-Step Multilingual NLP Pipeline with Anomaly + Trending Detection**

```
┌─────────────────────────────────────────────────┐
│ Step 1: Language Detection                       │
│ FastText + Unicode script analysis              │
│ Supports: English, Sinhala (සිංහල), Tamil (தமிழ்)│
└─────────────────┬───────────────────────────────┘
                  │
                  ▼
┌─────────────────────────────────────────────────┐
│ Step 2: Text Vectorization                       │
│ ┌─────────────┬─────────────┬─────────────────┐ │
│ │ DistilBERT  │ SinhalaBERTo│ Tamil-BERT      │ │
│ │ (English)   │ (Sinhala)   │ (Tamil)         │ │
│ └─────────────┴─────────────┴─────────────────┘ │
│ Output: 768-dim vector per text                 │
└─────────────────┬───────────────────────────────┘
                  │
                  ▼
┌─────────────────────────────────────────────────┐
│ Step 3: Anomaly Detection (Isolation Forest)    │
│ - English: ML model inference                    │
│ - Sinhala/Tamil: Skipped (incompatible vectors) │
│ - Outputs anomaly_score (0-1)                   │
└─────────────────┬───────────────────────────────┘
                  │
                  ▼
┌─────────────────────────────────────────────────┐
│ Step 4: Trending Detection                    │
│ - Entity extraction (hashtags, proper nouns)    │
│ - Momentum: current_hour / avg_last_6_hours     │
│ - Spike alerts when momentum > 3x               │
└─────────────────┬───────────────────────────────┘
                  │
                  ▼
┌─────────────────────────────────────────────────┐
│ Step 5: Expert Summary (GroqLLM)                │
│ - Opportunity & threat identification           │
│ - Sentiment analysis                            │
└─────────────────┬───────────────────────────────┘
                  │
                  ▼
┌─────────────────────────────────────────────────┐
│ Step 6: Format Output                           │
│ - Includes anomaly + trending in domain_insights│
└─────────────────────────────────────────────────┘
```

**Trending Detection API Endpoints:**

| Endpoint | Method | Description |
|----------|--------|-------------|
| `/api/trending` | GET | Get trending topics & spike alerts |
| `/api/trending/topic/{topic}` | GET | Get hourly history for a topic |
| `/api/trending/record` | POST | Record a topic mention (testing) |

---

### 10. Weather Prediction Pipeline (`models/weather-prediction/`)
**LSTM-Based Multi-District Weather Forecasting**

```
┌─────────────────────────────────────────────────┐
│ Data Source: Tutiempo.net (21 stations)         │
│ Historical data since 1944                       │
└─────────────────┬───────────────────────────────┘
                  │
                  ▼
┌─────────────────────────────────────────────────┐
│ LSTM Neural Network                              │
│ ┌─────────────────────────────────────────────┐ │
│ │ Input: 30-day sequence (11 features)        │ │
│ │ Layer 1: LSTM(64) + BatchNorm + Dropout     │ │
│ │ Layer 2: LSTM(32) + BatchNorm + Dropout     │ │
│ │ Output: Dense(3) → temp_max, temp_min, rain │ │
│ └─────────────────────────────────────────────┘ │
└─────────────────┬───────────────────────────────┘
                  │
                  ▼
┌─────────────────────────────────────────────────┐
│ Severity Classifier                              │
│ - Combines temp, rainfall, flood risk           │
│ - Outputs: normal/advisory/warning/critical     │
└─────────────────┬───────────────────────────────┘
                  │
                  ▼
┌─────────────────────────────────────────────────┐
│ Output: 25 District Predictions                  │
│ - Temperature (high/low °C)                     │
│ - Rainfall (mm + probability)                   │
│ - Flood risk (integrated with RiverNet)        │
└─────────────────────────────────────────────────┘
```

**Usage:**
```bash
# Run full pipeline
cd models/weather-prediction
python main.py --mode full

# Just predictions
python main.py --mode predict

# Train specific station
python main.py --mode train --station COLOMBO
```

---

### 11. Currency Prediction Pipeline (`models/currency-volatility-prediction/`)
**GRU-Based USD/LKR Exchange Rate Forecasting**

```
┌─────────────────────────────────────────────────┐
│ Data Sources (yfinance)                          │
│ - USD/LKR exchange rate                         │
│ - CSE stock index (correlation)                 │
│ - Gold, Oil prices (global factors)             │
│ - USD strength index                            │
└─────────────────┬───────────────────────────────┘
                  │
                  ▼
┌─────────────────────────────────────────────────┐
│ Feature Engineering (25+ features)              │
│ - SMA, EMA, RSI, MACD, Bollinger Bands         │
│ - Volatility, Momentum indicators              │
│ - Temporal encoding (day/month cycles)         │
└─────────────────┬───────────────────────────────┘
                  │
                  ▼
┌─────────────────────────────────────────────────┐
│ GRU Neural Network (8GB RAM optimized)          │
│ ┌─────────────────────────────────────────────┐ │
│ │ Input: 30-day sequence                      │ │
│ │ Layer 1: GRU(64) + BatchNorm + Dropout      │ │
│ │ Layer 2: GRU(32) + BatchNorm + Dropout      │ │
│ │ Output: Dense(1) → next_day_rate            │ │
│ └─────────────────────────────────────────────┘ │
└─────────────────┬───────────────────────────────┘
                  │
                  ▼
┌─────────────────────────────────────────────────┐
│ Output: USD/LKR Prediction                       │
│ - Current & predicted rate                      │
│ - Change % and direction                        │
│ - Volatility classification (low/medium/high)  │
└─────────────────────────────────────────────────┘
```

**Usage:**
```bash
# Run full pipeline
cd models/currency-volatility-prediction
python main.py --mode full

# Just predict
python main.py --mode predict

# Train GRU model
python main.py --mode train --epochs 100
```

---

### 12. RAG Chatbot (`src/rag.py`)
**Chat-History Aware Intelligence Q&A**

```
┌─────────────────────────────────────────────────┐
│ MultiCollectionRetriever                         │
│ - Connects to ChromaDB intelligence collection  │
│ - Roger_feeds (all agent domain feeds)          │
└─────────────────┬───────────────────────────────┘
                  │
                  ▼
┌─────────────────────────────────────────────────┐
│ Question Reformulation (History-Aware)          │
│ - Uses last 3-5 exchanges for context           │
│ - Reformulates follow-up questions              │
└─────────────────┬───────────────────────────────┘
                  │
                  ▼
┌─────────────────────────────────────────────────┐
│ Groq LLM (llama-3.1-70b-versatile)              │
│ - RAG with source citations                     │
│ - Domain-specific analysis                      │
└─────────────────────────────────────────────────┘
```

**Usage:**
```bash
# CLI mode
python src/rag.py

# Or via API
curl -X POST http://localhost:8000/api/rag/chat \
  -H "Content-Type: application/json" \
  -d '{"message": "What are the latest political events?"}'
```

---

## 🤖 ML Anomaly Detection Pipeline

Located in `models/anomaly-detection/`

### Pipeline Components

| Component | File | Description |
|-----------|------|-------------|
| Data Ingestion | `data_ingestion.py` | SQLite + CSV fetching |
| Data Validation | `data_validation.py` | Schema-based validation |
| Data Transformation | `data_transformation.py` | Language detection + BERT vectorization |
| Model Trainer | `model_trainer.py` | Optuna + MLflow training |

### Clustering Models

| Model | Type | Use Case |
|-------|------|----------|
| **DBSCAN** | Density-based | Noise-robust clustering |
| **KMeans** | Centroid-based | Fast, fixed k clusters |
| **HDBSCAN** | Hierarchical density | Variable density clusters |
| **Isolation Forest** | Anomaly detection | Outlier identification |
| **LOF** | Local outlier | Density-based anomalies |

### Training with Optuna

```python
# Hyperparameter optimization
study = optuna.create_study(direction="maximize")
study.optimize(objective, n_trials=50)
```

### MLflow Tracking

```python
mlflow.set_tracking_uri("https://dagshub.com/...")
mlflow.log_params(best_params)
mlflow.log_metrics(metrics)
mlflow.sklearn.log_model(model, "model")
```

---

## 🌧️ Weather Data Scraper (`scripts/scrape_weather_data.py`)

**Historical weather data collection for ML model training**

### Data Sources

| Source | API Key? | Data Available |
|--------|----------|----------------|
| **Open-Meteo** | ❌ Free | Historical weather since 1940 |
| **NASA FIRMS** | ✅ Optional | Fire/heat spot detection |

### Collected Weather Variables

- `temperature_2m_max/min/mean`
- `precipitation_sum`, `rain_sum`
- `precipitation_hours`
- `wind_speed_10m_max`, `wind_gusts_10m_max`
- `wind_direction_10m_dominant`

### Usage

```bash
# Scrape last 30 days (default)
python scripts/scrape_weather_data.py

# Scrape specific date range
python scripts/scrape_weather_data.py --start 2020-01-01 --end 2024-12-31

# Scrape multiple years for training dataset
python scripts/scrape_weather_data.py --years 2020,2021,2022,2023,2024

# Include fire detection data
python scripts/scrape_weather_data.py --years 2023,2024 --fires

# Hourly resolution (default is daily)
python scripts/scrape_weather_data.py --start 2024-01-01 --end 2024-01-31 --resolution hourly
```

### Output

```
datasets/weather/
├── weather_daily_2020-01-01_2020-12-31.csv
├── weather_daily_2021-01-01_2021-12-31.csv
├── weather_combined.csv  (merged file)
└── fire_detections_20241207.csv
```

### Coverage

All 25 Sri Lankan districts with coordinates:
- Colombo, Gampaha, Kalutara, Kandy, Matale, Nuwara Eliya
- Galle, Matara, Hambantota, Jaffna, Kilinochchi, Mannar
- Vavuniya, Mullaitivu, Batticaloa, Ampara, Trincomalee
- Kurunegala, Puttalam, Anuradhapura, Polonnaruwa
- Badulla, Monaragala, Ratnapura, Kegalle

---

## 🚀 Quick Start

### Prerequisites
- Python 3.11+
- Node.js 18+
- Docker Desktop (for Airflow)
- Groq API Key

### Installation

```bash
# 1. Clone repository
git clone <your-repo>
cd Roger-Final

# 2. Create virtual environment
python -m venv .venv
source .venv/bin/activate  # Linux/Mac
.\.venv\Scripts\activate   # Windows

# 3. Install dependencies
pip install -r requirements.txt

# 4. Configure environment
cp .env.template .env
# Edit .env with your API keys

# 5. Download ML models
python models/anomaly-detection/download_models.py

# 6. Launch all services
./start_services.sh       # Linux/Mac
.\start_services.ps1      # Windows
```

---

## 🔧 API Endpoints

### REST API (FastAPI - Port 8000)

| Endpoint | Method | Description |
|----------|--------|-------------|
| `/api/status` | GET | System health |
| `/api/dashboard` | GET | Risk metrics |
| `/api/feed` | GET | Latest events |
| `/api/feeds` | GET | All feeds with pagination |
| `/api/feeds/by_district` | GET | Feeds filtered by district |
| `/api/rivernet` | GET | River monitoring data |
| `/api/predict` | POST | Run anomaly predictions |
| `/api/anomalies` | GET | Get anomalous feeds |
| `/api/model/status` | GET | ML model status |
| `/api/weather/predictions` | GET | All district forecasts |
| `/api/weather/predictions/{district}` | GET | Single district |
| `/api/weather/model/status` | GET | Weather model info |
| `/api/weather/historical` | GET | 30-year climate analysis |
| `/api/weather/threat` | GET | National flood threat score |
| `/api/currency/prediction` | GET | USD/LKR next-day forecast |
| `/api/currency/history` | GET | Historical rates |
| `/api/currency/model/status` | GET | Currency model info |
| `/api/stocks/predictions` | GET | All CSE stock forecasts |
| `/api/stocks/predictions/{symbol}` | GET | Single stock prediction |
| `/api/stocks/model/status` | GET | Stock models info |
| `/api/rag/chat` | POST | Chat with RAG |
| `/api/rag/stats` | GET | RAG system stats |
| `/api/rag/clear` | POST | Clear chat history |
| `/api/power` | GET | CEB power/load shedding status |
| `/api/fuel` | GET | Current fuel prices |
| `/api/economy` | GET | CBSL economic indicators |
| `/api/health` | GET | Health alerts & dengue data |
| `/api/commodities` | GET | Essential goods prices |
| `/api/water` | GET | Water supply disruptions |

### WebSocket
- `ws://localhost:8000/ws` - Real-time updates

---

## ⏰ Airflow Orchestration

### DAG: `anomaly_detection_training`

```
start → check_records → data_ingestion → data_validation 
      → data_transformation → model_training → end
```

**Triggers:**
- Batch threshold: 1000 new records
- Daily fallback: Every 24 hours

**Access Dashboard:**
```bash
cd models/anomaly-detection
astro dev start
# Open http://localhost:8080
```

### DAG: `weather_prediction_daily`

```
ingest_data → train_models → generate_predictions → publish_predictions
```

**Schedule:** Daily at 4:00 AM IST

**Tasks:**
- Scrape Tutiempo.net for latest data
- Train LSTM models (MLflow tracked)
- Generate 25-district predictions
- Save to JSON for API

### DAG: `currency_prediction_daily`

```
ingest_data → train_model → generate_prediction → publish_prediction
```

**Schedule:** Daily at 4:00 AM IST

**Tasks:**
- Fetch USD/LKR + indicators from yfinance
- Train GRU model (MLflow tracked)
- Generate next-day prediction
- Save to JSON for API

---

## 📁 Project Structure

```
Roger-Ultimate/
├── src/
│   ├── graphs/                    # LangGraph definitions
│   │   ├── combinedAgentGraph.py  # Mother graph
│   │   ├── politicalAgentGraph.py
│   │   ├── economicalAgentGraph.py
│   │   ├── meteorologicalAgentGraph.py
│   │   ├── socialAgentGraph.py
│   │   ├── intelligenceAgentGraph.py
│   │   ├── dataRetrievalAgentGraph.py
│   │   └── vectorizationAgentGraph.py  # 5-step with anomaly detection
│   ├── nodes/                     # Agent implementations
│   ├── states/                    # State definitions
│   ├── llms/                      # LLM configurations
│   ├── storage/                   # ChromaDB, SQLite, Neo4j stores
│   ├── rag.py                     # RAG chatbot
│   └── utils/
│       └── utils.py               # Tools incl. FloodWatch
├── scripts/
│   └── scrape_weather_data.py     # Weather data scraper
├── models/
│   ├── anomaly-detection/         # ML Anomaly Pipeline
│   │   ├── src/
│   │   │   ├── components/        # Pipeline stages
│   │   │   ├── entity/            # Config/Artifact classes
│   │   │   ├── pipeline/          # Orchestrators
│   │   │   └── utils/             # Vectorizer, metrics
│   │   ├── dags/                  # Airflow DAGs
│   │   ├── data_schema/           # Validation schemas
│   │   ├── output/                # Trained models
│   │   └── models_cache/          # Downloaded BERT models
│   ├── weather-prediction/        # Weather ML Pipeline
│   │   ├── src/components/        # data_ingestion, model_trainer, predictor
│   │   ├── dags/                  # weather_prediction_dag.py (4 AM)
│   │   ├── artifacts/             # Trained LSTM models (.h5)
│   │   └── main.py                # CLI entry point
│   └── currency-volatility-prediction/  # Currency ML Pipeline
│       ├── src/components/        # data_ingestion, model_trainer, predictor
│       ├── dags/                  # currency_prediction_dag.py (4 AM)
│       ├── artifacts/             # Trained GRU model
│       └── main.py                # CLI entry point
├── datasets/
│   └── weather/                   # Scraped weather CSVs
├── frontend/
│   └── app/
│       ├── components/
│       │   ├── dashboard/
│       │   │   ├── AnomalyDetection.tsx
│       │   │   ├── WeatherPredictions.tsx
│       │   │   ├── CurrencyPrediction.tsx
│       │   │   ├── NationalThreatCard.tsx     # Flood threat score
│       │   │   ├── HistoricalIntel.tsx        # 30-year climate
│       │   │   └── ...
│       │   ├── map/
│       │   │   ├── MapView.tsx
│       │   │   └── SatelliteView.tsx          # Windy.com embed
│       │   ├── FloatingChatBox.tsx            # RAG chat UI
│       │   └── ...
│       └── pages/
│           └── Index.tsx                       # 7 tabs incl. SATELLITE
├── main.py                        # FastAPI backend
├── start.sh                       # Startup script
└── requirements.txt
```

---

## 🔐 Environment Variables

```env
# LLM
GROQ_API_KEY=your_groq_key

# Neo4j (Knowledge Graph)
NEO4J_URI=neo4j+s://your-instance.databases.neo4j.io
NEO4J_USERNAME=neo4j
NEO4J_PASSWORD=your_password
NEO4J_ENABLED=true
NEO4J_DATABASE=neo4j

# ChromaDB (Vector Store)
CHROMADB_PATH=./data/chromadb
CHROMADB_COLLECTION=Roger_feeds
CHROMADB_SIMILARITY_THRESHOLD=0.85

# SQLite (Fast Cache)
SQLITE_DB_PATH=./data/cache/feeds.db

# MLflow (DagsHub)
MLFLOW_TRACKING_URI=https://dagshub.com/...
MLFLOW_TRACKING_USERNAME=...
MLFLOW_TRACKING_PASSWORD=...

# Pipeline
BATCH_THRESHOLD=1000
```

---

## 🧪 Testing Framework

Industry-level testing infrastructure for the agentic AI system.

### Test Structure

```
tests/
├── conftest.py                 # Pytest fixtures and configuration
├── unit/                       # Unit tests for individual components
│   └── test_utils.py
├── integration/                # Multi-component integration tests
│   └── test_agent_routing.py
├── evaluation/                 # LLM-as-Judge evaluation tests
│   ├── agent_evaluator.py      # Evaluation harness
│   ├── adversarial_tests.py    # Prompt injection & edge cases
│   └── golden_datasets/
│       └── expected_responses.json
└── e2e/                        # End-to-end workflow tests
    └── test_full_pipeline.py
```

### LangSmith Integration

Automatic tracing for all agent decisions when `LANGSMITH_API_KEY` is set.

```env
# Add to .env
LANGSMITH_API_KEY=your_langsmith_api_key
LANGSMITH_PROJECT=roger-intelligence  # Optional, defaults to 'roger-intelligence'
```

**View traces:** [smith.langchain.com](https://smith.langchain.com/)

### Running Tests

```bash
# Run all tests
python run_tests.py

# Run specific test suites
python run_tests.py --unit           # Unit tests only
python run_tests.py --adversarial    # Security/adversarial tests
python run_tests.py --eval           # LLM-as-Judge evaluation
python run_tests.py --e2e            # End-to-end tests

# With coverage report
python run_tests.py --coverage

# Enable LangSmith tracing in tests
python run_tests.py --with-langsmith
```

### Agent Evaluation Harness

The `agent_evaluator.py` implements the **LLM-as-Judge** pattern:

| Metric | Description |
|--------|-------------|
| **Tool Selection Accuracy** | Did the agent use the correct tools? |
| **Response Quality** | Is the response relevant and coherent? |
| **BLEU Score** | N-gram text similarity (0-1, higher = better match) |
| **Hallucination Detection** | Did the agent fabricate information? |
| **Graceful Degradation** | Does it handle failures properly? |

```bash
# Run standalone evaluator
python tests/evaluation/agent_evaluator.py
```

### Adversarial Testing

Tests for security and robustness:

| Test Category | Description |
|--------------|-------------|
| **Prompt Injection** | Ignore instructions, jailbreak, context switching |
| **Out-of-Domain** | Non-SL queries, illegal requests, impossible questions |
| **Malformed Input** | Empty, XSS, SQL injection, unicode flood |
| **Graceful Degradation** | API timeouts, empty responses, rate limiting |

### CI/CD Pipeline

GitHub Actions workflow (`.github/workflows/test.yml`):

```yaml
on: [push, pull_request]

jobs:
  unit-tests:        # Runs on every push
  adversarial-tests: # Security tests on every push
  evaluation-tests:  # LLM evaluation on main branch only
  lint:              # Code quality checks
```

**Required Secrets:**
- `LANGSMITH_API_KEY` - For evaluation test logging
- `GROQ_API_KEY` - For LLM-based evaluation

---

## 🐛 Troubleshooting

### FastText won't install on Windows
```bash
# Use pre-built wheel instead
pip install fasttext-wheel
```

### BERT models downloading slowly
```bash
# Pre-download all models
python models/anomaly-detection/download_models.py
```

### Airflow not starting
```bash
# Ensure Docker is running
docker info

# Initialize Astro project
cd models/anomaly-detection
astro dev init
astro dev start
```

### NumPy 2.0 / ChromaDB compatibility error
```bash
# If you see "A module that was compiled using NumPy 1.x cannot be run in NumPy 2.x"
pip install "numpy<2.0"

# Or upgrade chromadb to latest
pip install --upgrade chromadb
```

### Keras model loading error ("Could not locate function 'mse'")
```bash
# If currency/weather models fail to load with Keras 3.x
# Retrain the model - it will save in .keras format automatically
cd models/currency-volatility-prediction
python main.py --mode train

# Or for weather
cd models/weather-prediction
python main.py --mode train
```

---

## 📄 License

MIT License - Built for Production

---

## 🙏 Acknowledgments

- **Groq** - High-speed LLM inference
- **LangGraph** - Agent orchestration
- **HuggingFace** - SinhalaBERTo, Tamil-BERT, DistilBERT
- **Optuna** - Hyperparameter optimization
- **MLflow** - Experiment tracking
- Sri Lankan government for open data sources