modelx / README.md
nivakaran's picture
Upload folder using huggingface_hub
4134ab0 verified
metadata
title: Roger Intelligence Platform
emoji: 
colorFrom: blue
colorTo: green
sdk: docker
pinned: false

🇱🇰 Roger Intelligence Platform

Real-Time Situational Awareness for Sri Lanka

A multi-agent AI system that aggregates intelligence from 50+ data sources to provide risk analysis and opportunity detection for businesses operating in Sri Lanka.

🌐 Live Demo


🎯 Key Features

5 Domain Agents + 2 Orchestrators running in parallel:

  • Social Agent - Reddit, Twitter, Facebook, Threads, BlueSky monitoring
  • Political Agent - Gazette, Parliament, District Social Media
  • Economical Agent - CSE Stock Market + Technical Indicators (SMA, EMA, RSI, MACD)
  • Meteorological Agent - DMC Weather + RiverNet + FloodWatch Integration
  • Intelligence Agent - Brand Monitoring + Threat Detection + User-Configurable Targets
  • Combined Agent (Orchestrator) - Fan-out/Fan-in coordination, LLM filtering, feed ranking
  • Data Retrieval Agent - Web scraping orchestration with anti-bot features

Situational Awareness Dashboard:

  • CEB Power Status - Load shedding / power outage monitoring
  • Fuel Prices - Petrol 92/95, Diesel, Kerosene (CEYPETCO)
  • CBSL Economic Indicators - Inflation, policy rates, forex reserves, USD/LKR
  • Health Alerts - Dengue case tracking, disease outbreak monitoring
  • Commodity Prices - 15 essential goods (rice, sugar, gas, eggs, etc.)
  • Water Supply Status - NWSDB disruption alerts

ML Anomaly Detection Pipeline (Integrated into Graph):

  • Language-specific BERT models (Sinhala, Tamil, English)
  • Real-time anomaly inference on every graph cycle
  • Clustering (DBSCAN, KMeans, HDBSCAN)
  • Anomaly Detection (Isolation Forest, LOF)
  • MLflow + DagsHub tracking

Weather Prediction ML Pipeline:

  • LSTM Neural Network (30-day sequences)
  • Predicts: Temperature, Rainfall, Flood Risk, Severity
  • 21 weather stations → 25 districts
  • Airflow DAG runs daily at 4 AM

Currency Prediction ML Pipeline:

  • GRU Neural Network (optimized for 8GB RAM)
  • Predicts: USD/LKR exchange rate
  • Features: Technical indicators + CSE + Gold + Oil + USD Index
  • MLflow tracking + Airflow DAG at 4 AM

Stock Price Prediction ML Pipeline:

  • Multi-Architecture: LSTM, GRU, BiLSTM, BiGRU
  • Optuna hyperparameter tuning (30 trials per stock)
  • Per-stock best model selection
  • 10 top CSE stocks (JKH, COMB, DIAL, HNB, etc.)

RAG-Powered Chatbot:

  • Chat-history aware Q&A
  • Queries all ChromaDB intelligence collections
  • Domain filtering (political, economic, weather, social)
  • Floating chat UI in dashboard

Trending/Velocity Detection:

  • SQLite-based topic frequency tracking (24-hour rolling window)
  • Momentum calculation: current_hour / avg_last_6_hours
  • Spike alerts when topic volume > 3x baseline
  • Integrated into Combined Agent dashboard

Real-Time Dashboard with:

  • Live Intelligence Feed
  • Floating AI Chatbox
  • Weather Predictions Tab
  • Live Satellite/Weather Map (Windy.com)
  • National Flood Threat Score
  • 30-Year Historical Climate Analysis
  • Trending Topics & Spike Alerts
  • Enhanced Operational Indicators (infrastructure_health, regulatory_activity, investment_climate)
  • Operational Risk Radar
  • ML Anomaly Detection Display
  • Market Predictions with Moving Averages
  • Risk & Opportunity Classification

Weather Data Scraper for ML Training:

  • Open-Meteo API (free historical data)
  • NASA FIRMS (fire/heat detection)
  • All 25 districts coverage
  • Year-wise CSV export for model training

Operational Dashboard Metrics:

  • Logistics Friction: Average confidence of mobility/social domain risk events
  • Compliance Volatility: Average confidence of political domain risks
  • Market Instability: Average confidence of market/economical domain risks
  • Opportunity Index: Average confidence of opportunity-classified events

Multi-District Province-Aware Event Categorization:

  • Events mentioning provinces are displayed in all constituent districts
  • Supports: Western, Southern, Central, Northern, Eastern, Sabaragamuwa, Uva, North Western, North Central provinces
  • Both frontend (MapView, DistrictInfoPanel) and backend are synchronized

3-Tier Storage Architecture with Deduplication:

  • Tier 1: SQLite - Fast hash-based exact match (microseconds)
  • Tier 2: ChromaDB - Semantic similarity search with sentence transformers (milliseconds)
  • Tier 3: Neo4j Aura - Knowledge graph for event relationships and entity tracking
  • Unified StorageManager orchestrates all backends
  • Deduplication prevents duplicate feeds across all domain agents

🏗️ System Architecture

┌─────────────────────────────────────────────────────────────────────────┐
│                       Roger Combined Graph                              │
│  ┌────────────────────────────────────────────────────────────────┐    │
│  │                    Graph Initiator (Reset)                      │    │
│  └────────────────────────────────────────────────────────────────┘    │
│                              │ Fan-Out                                   │
│    ┌────────────┬────────────┼────────────┬────────────┬────────────┐  │
│    ▼            ▼            ▼            ▼            ▼            ▼  │
│ ┌──────┐   ┌──────┐   ┌──────────┐   ┌──────┐   ┌──────────┐   ┌────┐│
│ │Social│   │Econ  │   │Political │   │Meteo │   │Intellig- │   │Data││
│ │Agent │   │Agent │   │Agent     │   │Agent │   │ence Agent│   │Retr││
│ └──────┘   └──────┘   └──────────┘   └──────┘   └──────────┘   └────┘│
│    │            │            │            │            │            │  │
│    └────────────┴────────────┴────────────┴────────────┴────────────┘  │
│                              │ Fan-In                                   │
│                    ┌─────────▼──────────┐                              │
│                    │   Feed Aggregator   │                              │
│                    │  (Rank & Dedupe)    │                              │
│                    └─────────┬──────────┘                              │
│                    ┌─────────▼──────────┐                              │
│                    │  Vectorization     │                          │
│                    │  Agent (Optional)  │                              │
│                    └─────────┬──────────┘                              │
│                    ┌─────────▼──────────┐                              │
│                    │  Router (Loop/End) │                              │
│                    └────────────────────┘                              │
└─────────────────────────────────────────────────────────────────────────┘

📊 Graph Implementations

1. Combined Agent Graph (combinedAgentGraph.py)

The Mother Graph - Orchestrates all domain agents in parallel.

graph TD
    A[Graph Initiator] -->|Fan-Out| B[Social Agent]
    A -->|Fan-Out| C[Economic Agent]
    A -->|Fan-Out| D[Political Agent]
    A -->|Fan-Out| E[Meteorological Agent]
    A -->|Fan-Out| F[Intelligence Agent]
    A -->|Fan-Out| G[Data Retrieval Agent]
    B -->|Fan-In| H[Feed Aggregator]
    C --> H
    D --> H
    E --> H
    F --> H
    G --> H
    H --> I[Data Refresher]
    I --> J{Router}
    J -->|Loop| A
    J -->|End| K[END]

Key Features:

  • Custom state reducers for parallel execution
  • Feed deduplication with content hashing
  • Loop control with configurable intervals
  • Real-time WebSocket broadcasting

Architecture Improvements (v2.1):

  • Rate Limiting: Domain-specific rate limits prevent anti-bot detection
    • Twitter: 15 RPM, LinkedIn: 10 RPM, News: 60 RPM
    • Thread-safe semaphores for max concurrent requests
  • Error Handling: Per-agent try/catch prevents cascading failures
    • Failed agents return empty results, others continue
  • Non-Blocking Refresh: 60-second cycle with interruptible sleep
    • threading.Event.wait() instead of blocking time.sleep()

Storage Data Flow

┌─────────────────────────────────────────────────────────────────────────────┐
│                         DOMAIN AGENTS (Parallel)                            │
│  ┌──────────┐ ┌──────────┐ ┌──────────┐ ┌──────────┐ ┌──────────────┐      │
│  │ Social   │ │Political │ │Economic  │ │  Meteo   │ │ Intelligence │      │
│  │ Agent    │ │ Agent    │ │ Agent    │ │  Agent   │ │    Agent     │      │
│  └────┬─────┘ └────┬─────┘ └────┬─────┘ └────┬─────┘ └──────┬───────┘      │
│       └────────────┴────────────┴────────────┴──────────────┘              │
│                                 │ Fan-In                                    │
│                    ┌────────────▼─────────────┐                            │
│                    │   CombinedAgentNode      │                            │
│                    │   (LLM Filter + Rank)    │                            │
│                    └────────────┬─────────────┘                            │
└─────────────────────────────────┼───────────────────────────────────────────┘
                                  │
                    ┌─────────────▼──────────────┐
                    │      StorageManager        │
                    │   (3-Tier Deduplication)   │
                    └─────────────┬──────────────┘
          ┌───────────────────────┼──────────────────────────┐
          │                       │                          │
          ▼                       ▼                          ▼
┌─────────────────┐    ┌──────────────────┐    ┌─────────────────────────┐
│     SQLite      │    │    ChromaDB      │    │      Neo4j Aura         │
│   (Fast Cache)  │    │  (Vector Store)  │    │   (Knowledge Graph)     │
│  ─────────────  │    │  ──────────────  │    │  ───────────────────    │
│  Hash-based     │    │  Semantic search │    │  Event relationships    │
│  Exact match    │    │  Similarity 0.85 │    │  Domain nodes           │
│  ~microseconds  │    │  ~milliseconds   │    │  Entity tracking        │
└─────────────────┘    └──────────────────┘    └─────────────────────────┘

2. Political Agent Graph (politicalAgentGraph.py)

3-Module Hybrid Architecture

Module Description Sources
Official Sources Government data Gazette, Parliament Minutes
Social Media Political sentiment Twitter, Facebook, Reddit (National + 25 Districts)
Feed Generation LLM Processing Categorize → Summarize → Format
┌─────────────────────────────────────────────┐
│ Module 1: Official     │ Module 2: Social  │
│ ┌─────────────────┐    │ ┌───────────────┐ │
│ │ Gazette         │    │ │ National      │ │
│ │ Parliament      │    │ │ Districts (25)│ │
│ └─────────────────┘    │ │ World Politics│ │
│                        │ └───────────────┘ │
└────────────┬───────────┴────────┬──────────┘
             │       Fan-In       │
             ▼                    ▼
        ┌────────────────────────────┐
        │ Module 3: Feed Generation  │
        │ Categorize → LLM → Format  │
        └────────────────────────────┘

3. Economic Agent Graph (economicalAgentGraph.py)

Market Intelligence & Technical Analysis

Component Description
Stock Collector CSE market data (200+ stocks)
Technical Analyzer SMA, EMA, RSI, MACD
Trend Detector Bullish/Bearish signals
Feed Generator Risk/Opportunity classification

Indicators Calculated:

  • Simple Moving Average (SMA-20, SMA-50)
  • Exponential Moving Average (EMA-12, EMA-26)
  • Relative Strength Index (RSI)
  • MACD with Signal Line

4. Meteorological Agent Graph (meteorologicalAgentGraph.py)

Weather & Disaster Monitoring + FloodWatch Integration

┌─────────────────────────────────────┐
│        DMC Weather Collector        │
│   (Daily forecasts, 25 districts)   │
└─────────────┬───────────────────────┘
              │
              ▼
┌─────────────────────────────────────┐
│        RiverNet Data Collector      │
│   (River levels, flood monitoring)  │
└─────────────┬───────────────────────┘
              │
              ▼
┌─────────────────────────────────────┐
│    FloodWatch Historical Data    │
│   (30-year climate analysis)        │
└─────────────┬───────────────────────┘
              │
              ▼
┌─────────────────────────────────────┐
│    National Threat Calculator    │
│   (Aggregated flood risk 0-100)     │
└─────────────┬───────────────────────┘
              │
              ▼
┌─────────────────────────────────────┐
│        Alert Generator              │
│   (Severity classification)         │
└─────────────────────────────────────┘

Alert Levels:

  • 🟢 Normal: Standard conditions
  • 🟡 Advisory: Watch for developments
  • 🟠 Warning: Take precautions
  • 🔴 Critical: Immediate action required

FloodWatch Features:

Feature Description
Historical Analysis 30-year climate data (1995-2025)
Decadal Comparison 3 periods: 1995-2004, 2005-2014, 2015-2025
National Threat Score 0-100 aggregated risk from rivers + alerts + season
High-Risk Periods May-Jun (SW Monsoon), Oct-Nov (NE Monsoon)

5. Social Agent Graph (socialAgentGraph.py)

Multi-Platform Social Media Monitoring

Platform Data Source Coverage
Reddit PRAW API r/srilanka, r/colombo
Twitter/X Nitter scraping #SriLanka, #Colombo
Facebook Profile scraping News pages
Threads Meta API Trending topics
BlueSky AT Protocol Political discourse

6. Intelligence Agent Graph (intelligenceAgentGraph.py)

Brand & Threat Monitoring + User-Configurable Targets

┌─────────────────┐    ┌─────────────────┐    ┌─────────────────┐
│ Brand Monitor   │    │ Threat Scanner  │    │ User Targets │
│ - Company news  │    │ - Security      │    │ - Custom keys   │
│ - Competitor    │    │ - Compliance    │    │ - User profiles │
│ - Market share  │    │ - Geopolitical  │    │ - Products      │
└────────┬────────┘    └────────┬────────┘    └────────┬────────┘
         │                      │                      │
         └──────────────────────┼──────────────────────┘
                                ▼
                   ┌─────────────────────┐
                   │ Intelligence Report │
                   │ (Priority ranked)   │
                   └─────────────────────┘

User-Configurable Monitoring: Users can define custom monitoring targets via the frontend settings panel or API:

Config Type Description Example
Keywords Custom search terms "Colombo Port", "BOI Investment"
Products Products to track "iPhone 15", "Samsung Galaxy"
Profiles Social media accounts @CompetitorX (Twitter), CompanyY (Facebook)

API Endpoints:

# Get current config
GET /api/intel/config

# Update full config
POST /api/intel/config
Body: {"user_keywords": ["keyword1"], "user_profiles": {"twitter": ["@account"]}, "user_products": ["Product"]}

# Add single target
POST /api/intel/config/add?target_type=keyword&value=Colombo+Port

# Remove target
DELETE /api/intel/config/remove?target_type=profile&value=CompetitorX&platform=twitter

Config File: src/config/intel_config.json


7. DATA Retrieval Agent Graph (dataRetrievalAgentGraph.py)

Web Scraping Orchestrator

Scraping Tools Available:

  • scrape_news_site - Generic news scraper
  • scrape_cse_live - CSE stock prices
  • scrape_official_data - Government portals
  • scrape_social_media - Multi-platform

Anti-Bot Features:

  • Random delays (1-3s)
  • User-agent rotation
  • Retry with exponential backoff
  • Headless browser fallback

8. Vectorization Agent Graph (vectorizationAgentGraph.py)

6-Step Multilingual NLP Pipeline with Anomaly + Trending Detection

┌─────────────────────────────────────────────────┐
│ Step 1: Language Detection                       │
│ FastText + Unicode script analysis              │
│ Supports: English, Sinhala (සිංහල), Tamil (தமிழ்)│
└─────────────────┬───────────────────────────────┘
                  │
                  ▼
┌─────────────────────────────────────────────────┐
│ Step 2: Text Vectorization                       │
│ ┌─────────────┬─────────────┬─────────────────┐ │
│ │ DistilBERT  │ SinhalaBERTo│ Tamil-BERT      │ │
│ │ (English)   │ (Sinhala)   │ (Tamil)         │ │
│ └─────────────┴─────────────┴─────────────────┘ │
│ Output: 768-dim vector per text                 │
└─────────────────┬───────────────────────────────┘
                  │
                  ▼
┌─────────────────────────────────────────────────┐
│ Step 3: Anomaly Detection (Isolation Forest)    │
│ - English: ML model inference                    │
│ - Sinhala/Tamil: Skipped (incompatible vectors) │
│ - Outputs anomaly_score (0-1)                   │
└─────────────────┬───────────────────────────────┘
                  │
                  ▼
┌─────────────────────────────────────────────────┐
│ Step 4: Trending Detection                    │
│ - Entity extraction (hashtags, proper nouns)    │
│ - Momentum: current_hour / avg_last_6_hours     │
│ - Spike alerts when momentum > 3x               │
└─────────────────┬───────────────────────────────┘
                  │
                  ▼
┌─────────────────────────────────────────────────┐
│ Step 5: Expert Summary (GroqLLM)                │
│ - Opportunity & threat identification           │
│ - Sentiment analysis                            │
└─────────────────┬───────────────────────────────┘
                  │
                  ▼
┌─────────────────────────────────────────────────┐
│ Step 6: Format Output                           │
│ - Includes anomaly + trending in domain_insights│
└─────────────────────────────────────────────────┘

Trending Detection API Endpoints:

Endpoint Method Description
/api/trending GET Get trending topics & spike alerts
/api/trending/topic/{topic} GET Get hourly history for a topic
/api/trending/record POST Record a topic mention (testing)

10. Weather Prediction Pipeline (models/weather-prediction/)

LSTM-Based Multi-District Weather Forecasting

┌─────────────────────────────────────────────────┐
│ Data Source: Tutiempo.net (21 stations)         │
│ Historical data since 1944                       │
└─────────────────┬───────────────────────────────┘
                  │
                  ▼
┌─────────────────────────────────────────────────┐
│ LSTM Neural Network                              │
│ ┌─────────────────────────────────────────────┐ │
│ │ Input: 30-day sequence (11 features)        │ │
│ │ Layer 1: LSTM(64) + BatchNorm + Dropout     │ │
│ │ Layer 2: LSTM(32) + BatchNorm + Dropout     │ │
│ │ Output: Dense(3) → temp_max, temp_min, rain │ │
│ └─────────────────────────────────────────────┘ │
└─────────────────┬───────────────────────────────┘
                  │
                  ▼
┌─────────────────────────────────────────────────┐
│ Severity Classifier                              │
│ - Combines temp, rainfall, flood risk           │
│ - Outputs: normal/advisory/warning/critical     │
└─────────────────┬───────────────────────────────┘
                  │
                  ▼
┌─────────────────────────────────────────────────┐
│ Output: 25 District Predictions                  │
│ - Temperature (high/low °C)                     │
│ - Rainfall (mm + probability)                   │
│ - Flood risk (integrated with RiverNet)        │
└─────────────────────────────────────────────────┘

Usage:

# Run full pipeline
cd models/weather-prediction
python main.py --mode full

# Just predictions
python main.py --mode predict

# Train specific station
python main.py --mode train --station COLOMBO

11. Currency Prediction Pipeline (models/currency-volatility-prediction/)

GRU-Based USD/LKR Exchange Rate Forecasting

┌─────────────────────────────────────────────────┐
│ Data Sources (yfinance)                          │
│ - USD/LKR exchange rate                         │
│ - CSE stock index (correlation)                 │
│ - Gold, Oil prices (global factors)             │
│ - USD strength index                            │
└─────────────────┬───────────────────────────────┘
                  │
                  ▼
┌─────────────────────────────────────────────────┐
│ Feature Engineering (25+ features)              │
│ - SMA, EMA, RSI, MACD, Bollinger Bands         │
│ - Volatility, Momentum indicators              │
│ - Temporal encoding (day/month cycles)         │
└─────────────────┬───────────────────────────────┘
                  │
                  ▼
┌─────────────────────────────────────────────────┐
│ GRU Neural Network (8GB RAM optimized)          │
│ ┌─────────────────────────────────────────────┐ │
│ │ Input: 30-day sequence                      │ │
│ │ Layer 1: GRU(64) + BatchNorm + Dropout      │ │
│ │ Layer 2: GRU(32) + BatchNorm + Dropout      │ │
│ │ Output: Dense(1) → next_day_rate            │ │
│ └─────────────────────────────────────────────┘ │
└─────────────────┬───────────────────────────────┘
                  │
                  ▼
┌─────────────────────────────────────────────────┐
│ Output: USD/LKR Prediction                       │
│ - Current & predicted rate                      │
│ - Change % and direction                        │
│ - Volatility classification (low/medium/high)  │
└─────────────────────────────────────────────────┘

Usage:

# Run full pipeline
cd models/currency-volatility-prediction
python main.py --mode full

# Just predict
python main.py --mode predict

# Train GRU model
python main.py --mode train --epochs 100

12. RAG Chatbot (src/rag.py)

Chat-History Aware Intelligence Q&A

┌─────────────────────────────────────────────────┐
│ MultiCollectionRetriever                         │
│ - Connects to ChromaDB intelligence collection  │
│ - Roger_feeds (all agent domain feeds)          │
└─────────────────┬───────────────────────────────┘
                  │
                  ▼
┌─────────────────────────────────────────────────┐
│ Question Reformulation (History-Aware)          │
│ - Uses last 3-5 exchanges for context           │
│ - Reformulates follow-up questions              │
└─────────────────┬───────────────────────────────┘
                  │
                  ▼
┌─────────────────────────────────────────────────┐
│ Groq LLM (llama-3.1-70b-versatile)              │
│ - RAG with source citations                     │
│ - Domain-specific analysis                      │
└─────────────────────────────────────────────────┘

Usage:

# CLI mode
python src/rag.py

# Or via API
curl -X POST http://localhost:8000/api/rag/chat \
  -H "Content-Type: application/json" \
  -d '{"message": "What are the latest political events?"}'

🤖 ML Anomaly Detection Pipeline

Located in models/anomaly-detection/

Pipeline Components

Component File Description
Data Ingestion data_ingestion.py SQLite + CSV fetching
Data Validation data_validation.py Schema-based validation
Data Transformation data_transformation.py Language detection + BERT vectorization
Model Trainer model_trainer.py Optuna + MLflow training

Clustering Models

Model Type Use Case
DBSCAN Density-based Noise-robust clustering
KMeans Centroid-based Fast, fixed k clusters
HDBSCAN Hierarchical density Variable density clusters
Isolation Forest Anomaly detection Outlier identification
LOF Local outlier Density-based anomalies

Training with Optuna

# Hyperparameter optimization
study = optuna.create_study(direction="maximize")
study.optimize(objective, n_trials=50)

MLflow Tracking

mlflow.set_tracking_uri("https://dagshub.com/...")
mlflow.log_params(best_params)
mlflow.log_metrics(metrics)
mlflow.sklearn.log_model(model, "model")

🌧️ Weather Data Scraper (scripts/scrape_weather_data.py)

Historical weather data collection for ML model training

Data Sources

Source API Key? Data Available
Open-Meteo ❌ Free Historical weather since 1940
NASA FIRMS ✅ Optional Fire/heat spot detection

Collected Weather Variables

  • temperature_2m_max/min/mean
  • precipitation_sum, rain_sum
  • precipitation_hours
  • wind_speed_10m_max, wind_gusts_10m_max
  • wind_direction_10m_dominant

Usage

# Scrape last 30 days (default)
python scripts/scrape_weather_data.py

# Scrape specific date range
python scripts/scrape_weather_data.py --start 2020-01-01 --end 2024-12-31

# Scrape multiple years for training dataset
python scripts/scrape_weather_data.py --years 2020,2021,2022,2023,2024

# Include fire detection data
python scripts/scrape_weather_data.py --years 2023,2024 --fires

# Hourly resolution (default is daily)
python scripts/scrape_weather_data.py --start 2024-01-01 --end 2024-01-31 --resolution hourly

Output

datasets/weather/
├── weather_daily_2020-01-01_2020-12-31.csv
├── weather_daily_2021-01-01_2021-12-31.csv
├── weather_combined.csv  (merged file)
└── fire_detections_20241207.csv

Coverage

All 25 Sri Lankan districts with coordinates:

  • Colombo, Gampaha, Kalutara, Kandy, Matale, Nuwara Eliya
  • Galle, Matara, Hambantota, Jaffna, Kilinochchi, Mannar
  • Vavuniya, Mullaitivu, Batticaloa, Ampara, Trincomalee
  • Kurunegala, Puttalam, Anuradhapura, Polonnaruwa
  • Badulla, Monaragala, Ratnapura, Kegalle

🚀 Quick Start

Prerequisites

  • Python 3.11+
  • Node.js 18+
  • Docker Desktop (for Airflow)
  • Groq API Key

Installation

# 1. Clone repository
git clone <your-repo>
cd Roger-Final

# 2. Create virtual environment
python -m venv .venv
source .venv/bin/activate  # Linux/Mac
.\.venv\Scripts\activate   # Windows

# 3. Install dependencies
pip install -r requirements.txt

# 4. Configure environment
cp .env.template .env
# Edit .env with your API keys

# 5. Download ML models
python models/anomaly-detection/download_models.py

# 6. Launch all services
./start_services.sh       # Linux/Mac
.\start_services.ps1      # Windows

🔧 API Endpoints

REST API (FastAPI - Port 8000)

Endpoint Method Description
/api/status GET System health
/api/dashboard GET Risk metrics
/api/feed GET Latest events
/api/feeds GET All feeds with pagination
/api/feeds/by_district GET Feeds filtered by district
/api/rivernet GET River monitoring data
/api/predict POST Run anomaly predictions
/api/anomalies GET Get anomalous feeds
/api/model/status GET ML model status
/api/weather/predictions GET All district forecasts
/api/weather/predictions/{district} GET Single district
/api/weather/model/status GET Weather model info
/api/weather/historical GET 30-year climate analysis
/api/weather/threat GET National flood threat score
/api/currency/prediction GET USD/LKR next-day forecast
/api/currency/history GET Historical rates
/api/currency/model/status GET Currency model info
/api/stocks/predictions GET All CSE stock forecasts
/api/stocks/predictions/{symbol} GET Single stock prediction
/api/stocks/model/status GET Stock models info
/api/rag/chat POST Chat with RAG
/api/rag/stats GET RAG system stats
/api/rag/clear POST Clear chat history
/api/power GET CEB power/load shedding status
/api/fuel GET Current fuel prices
/api/economy GET CBSL economic indicators
/api/health GET Health alerts & dengue data
/api/commodities GET Essential goods prices
/api/water GET Water supply disruptions

WebSocket

  • ws://localhost:8000/ws - Real-time updates

⏰ Airflow Orchestration

DAG: anomaly_detection_training

start → check_records → data_ingestion → data_validation 
      → data_transformation → model_training → end

Triggers:

  • Batch threshold: 1000 new records
  • Daily fallback: Every 24 hours

Access Dashboard:

cd models/anomaly-detection
astro dev start
# Open http://localhost:8080

DAG: weather_prediction_daily

ingest_data → train_models → generate_predictions → publish_predictions

Schedule: Daily at 4:00 AM IST

Tasks:

  • Scrape Tutiempo.net for latest data
  • Train LSTM models (MLflow tracked)
  • Generate 25-district predictions
  • Save to JSON for API

DAG: currency_prediction_daily

ingest_data → train_model → generate_prediction → publish_prediction

Schedule: Daily at 4:00 AM IST

Tasks:

  • Fetch USD/LKR + indicators from yfinance
  • Train GRU model (MLflow tracked)
  • Generate next-day prediction
  • Save to JSON for API

📁 Project Structure

Roger-Ultimate/
├── src/
│   ├── graphs/                    # LangGraph definitions
│   │   ├── combinedAgentGraph.py  # Mother graph
│   │   ├── politicalAgentGraph.py
│   │   ├── economicalAgentGraph.py
│   │   ├── meteorologicalAgentGraph.py
│   │   ├── socialAgentGraph.py
│   │   ├── intelligenceAgentGraph.py
│   │   ├── dataRetrievalAgentGraph.py
│   │   └── vectorizationAgentGraph.py  # 5-step with anomaly detection
│   ├── nodes/                     # Agent implementations
│   ├── states/                    # State definitions
│   ├── llms/                      # LLM configurations
│   ├── storage/                   # ChromaDB, SQLite, Neo4j stores
│   ├── rag.py                     # RAG chatbot
│   └── utils/
│       └── utils.py               # Tools incl. FloodWatch
├── scripts/
│   └── scrape_weather_data.py     # Weather data scraper
├── models/
│   ├── anomaly-detection/         # ML Anomaly Pipeline
│   │   ├── src/
│   │   │   ├── components/        # Pipeline stages
│   │   │   ├── entity/            # Config/Artifact classes
│   │   │   ├── pipeline/          # Orchestrators
│   │   │   └── utils/             # Vectorizer, metrics
│   │   ├── dags/                  # Airflow DAGs
│   │   ├── data_schema/           # Validation schemas
│   │   ├── output/                # Trained models
│   │   └── models_cache/          # Downloaded BERT models
│   ├── weather-prediction/        # Weather ML Pipeline
│   │   ├── src/components/        # data_ingestion, model_trainer, predictor
│   │   ├── dags/                  # weather_prediction_dag.py (4 AM)
│   │   ├── artifacts/             # Trained LSTM models (.h5)
│   │   └── main.py                # CLI entry point
│   └── currency-volatility-prediction/  # Currency ML Pipeline
│       ├── src/components/        # data_ingestion, model_trainer, predictor
│       ├── dags/                  # currency_prediction_dag.py (4 AM)
│       ├── artifacts/             # Trained GRU model
│       └── main.py                # CLI entry point
├── datasets/
│   └── weather/                   # Scraped weather CSVs
├── frontend/
│   └── app/
│       ├── components/
│       │   ├── dashboard/
│       │   │   ├── AnomalyDetection.tsx
│       │   │   ├── WeatherPredictions.tsx
│       │   │   ├── CurrencyPrediction.tsx
│       │   │   ├── NationalThreatCard.tsx     # Flood threat score
│       │   │   ├── HistoricalIntel.tsx        # 30-year climate
│       │   │   └── ...
│       │   ├── map/
│       │   │   ├── MapView.tsx
│       │   │   └── SatelliteView.tsx          # Windy.com embed
│       │   ├── FloatingChatBox.tsx            # RAG chat UI
│       │   └── ...
│       └── pages/
│           └── Index.tsx                       # 7 tabs incl. SATELLITE
├── main.py                        # FastAPI backend
├── start.sh                       # Startup script
└── requirements.txt

🔐 Environment Variables

# LLM
GROQ_API_KEY=your_groq_key

# Neo4j (Knowledge Graph)
NEO4J_URI=neo4j+s://your-instance.databases.neo4j.io
NEO4J_USERNAME=neo4j
NEO4J_PASSWORD=your_password
NEO4J_ENABLED=true
NEO4J_DATABASE=neo4j

# ChromaDB (Vector Store)
CHROMADB_PATH=./data/chromadb
CHROMADB_COLLECTION=Roger_feeds
CHROMADB_SIMILARITY_THRESHOLD=0.85

# SQLite (Fast Cache)
SQLITE_DB_PATH=./data/cache/feeds.db

# MLflow (DagsHub)
MLFLOW_TRACKING_URI=https://dagshub.com/...
MLFLOW_TRACKING_USERNAME=...
MLFLOW_TRACKING_PASSWORD=...

# Pipeline
BATCH_THRESHOLD=1000

🧪 Testing Framework

Industry-level testing infrastructure for the agentic AI system.

Test Structure

tests/
├── conftest.py                 # Pytest fixtures and configuration
├── unit/                       # Unit tests for individual components
│   └── test_utils.py
├── integration/                # Multi-component integration tests
│   └── test_agent_routing.py
├── evaluation/                 # LLM-as-Judge evaluation tests
│   ├── agent_evaluator.py      # Evaluation harness
│   ├── adversarial_tests.py    # Prompt injection & edge cases
│   └── golden_datasets/
│       └── expected_responses.json
└── e2e/                        # End-to-end workflow tests
    └── test_full_pipeline.py

LangSmith Integration

Automatic tracing for all agent decisions when LANGSMITH_API_KEY is set.

# Add to .env
LANGSMITH_API_KEY=your_langsmith_api_key
LANGSMITH_PROJECT=roger-intelligence  # Optional, defaults to 'roger-intelligence'

View traces: smith.langchain.com

Running Tests

# Run all tests
python run_tests.py

# Run specific test suites
python run_tests.py --unit           # Unit tests only
python run_tests.py --adversarial    # Security/adversarial tests
python run_tests.py --eval           # LLM-as-Judge evaluation
python run_tests.py --e2e            # End-to-end tests

# With coverage report
python run_tests.py --coverage

# Enable LangSmith tracing in tests
python run_tests.py --with-langsmith

Agent Evaluation Harness

The agent_evaluator.py implements the LLM-as-Judge pattern:

Metric Description
Tool Selection Accuracy Did the agent use the correct tools?
Response Quality Is the response relevant and coherent?
BLEU Score N-gram text similarity (0-1, higher = better match)
Hallucination Detection Did the agent fabricate information?
Graceful Degradation Does it handle failures properly?
# Run standalone evaluator
python tests/evaluation/agent_evaluator.py

Adversarial Testing

Tests for security and robustness:

Test Category Description
Prompt Injection Ignore instructions, jailbreak, context switching
Out-of-Domain Non-SL queries, illegal requests, impossible questions
Malformed Input Empty, XSS, SQL injection, unicode flood
Graceful Degradation API timeouts, empty responses, rate limiting

CI/CD Pipeline

GitHub Actions workflow (.github/workflows/test.yml):

on: [push, pull_request]

jobs:
  unit-tests:        # Runs on every push
  adversarial-tests: # Security tests on every push
  evaluation-tests:  # LLM evaluation on main branch only
  lint:              # Code quality checks

Required Secrets:

  • LANGSMITH_API_KEY - For evaluation test logging
  • GROQ_API_KEY - For LLM-based evaluation

🐛 Troubleshooting

FastText won't install on Windows

# Use pre-built wheel instead
pip install fasttext-wheel

BERT models downloading slowly

# Pre-download all models
python models/anomaly-detection/download_models.py

Airflow not starting

# Ensure Docker is running
docker info

# Initialize Astro project
cd models/anomaly-detection
astro dev init
astro dev start

NumPy 2.0 / ChromaDB compatibility error

# If you see "A module that was compiled using NumPy 1.x cannot be run in NumPy 2.x"
pip install "numpy<2.0"

# Or upgrade chromadb to latest
pip install --upgrade chromadb

Keras model loading error ("Could not locate function 'mse'")

# If currency/weather models fail to load with Keras 3.x
# Retrain the model - it will save in .keras format automatically
cd models/currency-volatility-prediction
python main.py --mode train

# Or for weather
cd models/weather-prediction
python main.py --mode train

📄 License

MIT License - Built for Production


🙏 Acknowledgments

  • Groq - High-speed LLM inference
  • LangGraph - Agent orchestration
  • HuggingFace - SinhalaBERTo, Tamil-BERT, DistilBERT
  • Optuna - Hyperparameter optimization
  • MLflow - Experiment tracking
  • Sri Lankan government for open data sources