Spaces:

nivakaran
/

modelx

Running

App Files Files Community

modelx / README.md

nivakaran

Upload folder using huggingface_hub

4134ab0 verified 1 day ago

preview code

raw

history blame contribute delete

47.5 kB

metadata

title: Roger Intelligence Platform
emoji: ⚡
colorFrom: blue
colorTo: green
sdk: docker
pinned: false

🇱🇰 Roger Intelligence Platform

Real-Time Situational Awareness for Sri Lanka

A multi-agent AI system that aggregates intelligence from 50+ data sources to provide risk analysis and opportunity detection for businesses operating in Sri Lanka.

🌐 Live Demo

Component	URL
Frontend Dashboard	https://model-x-frontend-snowy.vercel.app/
Backend API	https://nivakaran-Roger.hf.space

🎯 Key Features

✅ 5 Domain Agents + 2 Orchestrators running in parallel:

Social Agent - Reddit, Twitter, Facebook, Threads, BlueSky monitoring
Political Agent - Gazette, Parliament, District Social Media
Economical Agent - CSE Stock Market + Technical Indicators (SMA, EMA, RSI, MACD)
Meteorological Agent - DMC Weather + RiverNet + FloodWatch Integration
Intelligence Agent - Brand Monitoring + Threat Detection + User-Configurable Targets
Combined Agent (Orchestrator) - Fan-out/Fan-in coordination, LLM filtering, feed ranking
Data Retrieval Agent - Web scraping orchestration with anti-bot features

✅ Situational Awareness Dashboard:

CEB Power Status - Load shedding / power outage monitoring
Fuel Prices - Petrol 92/95, Diesel, Kerosene (CEYPETCO)
CBSL Economic Indicators - Inflation, policy rates, forex reserves, USD/LKR
Health Alerts - Dengue case tracking, disease outbreak monitoring
Commodity Prices - 15 essential goods (rice, sugar, gas, eggs, etc.)
Water Supply Status - NWSDB disruption alerts

✅ ML Anomaly Detection Pipeline (Integrated into Graph):

Language-specific BERT models (Sinhala, Tamil, English)
Real-time anomaly inference on every graph cycle
Clustering (DBSCAN, KMeans, HDBSCAN)
Anomaly Detection (Isolation Forest, LOF)
MLflow + DagsHub tracking

✅ Weather Prediction ML Pipeline:

LSTM Neural Network (30-day sequences)
Predicts: Temperature, Rainfall, Flood Risk, Severity
21 weather stations → 25 districts
Airflow DAG runs daily at 4 AM

✅ Currency Prediction ML Pipeline:

GRU Neural Network (optimized for 8GB RAM)
Predicts: USD/LKR exchange rate
Features: Technical indicators + CSE + Gold + Oil + USD Index
MLflow tracking + Airflow DAG at 4 AM

✅ Stock Price Prediction ML Pipeline:

Multi-Architecture: LSTM, GRU, BiLSTM, BiGRU
Optuna hyperparameter tuning (30 trials per stock)
Per-stock best model selection
10 top CSE stocks (JKH, COMB, DIAL, HNB, etc.)

✅ RAG-Powered Chatbot:

Chat-history aware Q&A
Queries all ChromaDB intelligence collections
Domain filtering (political, economic, weather, social)
Floating chat UI in dashboard

✅ Trending/Velocity Detection:

SQLite-based topic frequency tracking (24-hour rolling window)
Momentum calculation: current_hour / avg_last_6_hours
Spike alerts when topic volume > 3x baseline
Integrated into Combined Agent dashboard

✅ Real-Time Dashboard with:

Live Intelligence Feed
Floating AI Chatbox
Weather Predictions Tab
Live Satellite/Weather Map (Windy.com)
National Flood Threat Score
30-Year Historical Climate Analysis
Trending Topics & Spike Alerts
Enhanced Operational Indicators (infrastructure_health, regulatory_activity, investment_climate)
Operational Risk Radar
ML Anomaly Detection Display
Market Predictions with Moving Averages
Risk & Opportunity Classification

✅ Weather Data Scraper for ML Training:

Open-Meteo API (free historical data)
NASA FIRMS (fire/heat detection)
All 25 districts coverage
Year-wise CSV export for model training

✅ Operational Dashboard Metrics:

Logistics Friction: Average confidence of mobility/social domain risk events
Compliance Volatility: Average confidence of political domain risks
Market Instability: Average confidence of market/economical domain risks
Opportunity Index: Average confidence of opportunity-classified events

✅ Multi-District Province-Aware Event Categorization:

Events mentioning provinces are displayed in all constituent districts
Supports: Western, Southern, Central, Northern, Eastern, Sabaragamuwa, Uva, North Western, North Central provinces
Both frontend (MapView, DistrictInfoPanel) and backend are synchronized

✅ 3-Tier Storage Architecture with Deduplication:

Tier 1: SQLite - Fast hash-based exact match (microseconds)
Tier 2: ChromaDB - Semantic similarity search with sentence transformers (milliseconds)
Tier 3: Neo4j Aura - Knowledge graph for event relationships and entity tracking
Unified StorageManager orchestrates all backends
Deduplication prevents duplicate feeds across all domain agents

🏗️ System Architecture

┌─────────────────────────────────────────────────────────────────────────┐
│                       Roger Combined Graph                              │
│  ┌────────────────────────────────────────────────────────────────┐    │
│  │                    Graph Initiator (Reset)                      │    │
│  └────────────────────────────────────────────────────────────────┘    │
│                              │ Fan-Out                                   │
│    ┌────────────┬────────────┼────────────┬────────────┬────────────┐  │
│    ▼            ▼            ▼            ▼            ▼            ▼  │
│ ┌──────┐   ┌──────┐   ┌──────────┐   ┌──────┐   ┌──────────┐   ┌────┐│
│ │Social│   │Econ  │   │Political │   │Meteo │   │Intellig- │   │Data││
│ │Agent │   │Agent │   │Agent     │   │Agent │   │ence Agent│   │Retr││
│ └──────┘   └──────┘   └──────────┘   └──────┘   └──────────┘   └────┘│
│    │            │            │            │            │            │  │
│    └────────────┴────────────┴────────────┴────────────┴────────────┘  │
│                              │ Fan-In                                   │
│                    ┌─────────▼──────────┐                              │
│                    │   Feed Aggregator   │                              │
│                    │  (Rank & Dedupe)    │                              │
│                    └─────────┬──────────┘                              │
│                    ┌─────────▼──────────┐                              │
│                    │  Vectorization     │                          │
│                    │  Agent (Optional)  │                              │
│                    └─────────┬──────────┘                              │
│                    ┌─────────▼──────────┐                              │
│                    │  Router (Loop/End) │                              │
│                    └────────────────────┘                              │
└─────────────────────────────────────────────────────────────────────────┘

📊 Graph Implementations

1. Combined Agent Graph (`combinedAgentGraph.py`)

The Mother Graph - Orchestrates all domain agents in parallel.

graph TD
    A[Graph Initiator] -->|Fan-Out| B[Social Agent]
    A -->|Fan-Out| C[Economic Agent]
    A -->|Fan-Out| D[Political Agent]
    A -->|Fan-Out| E[Meteorological Agent]
    A -->|Fan-Out| F[Intelligence Agent]
    A -->|Fan-Out| G[Data Retrieval Agent]
    B -->|Fan-In| H[Feed Aggregator]
    C --> H
    D --> H
    E --> H
    F --> H
    G --> H
    H --> I[Data Refresher]
    I --> J{Router}
    J -->|Loop| A
    J -->|End| K[END]

Key Features:

Custom state reducers for parallel execution
Feed deduplication with content hashing
Loop control with configurable intervals
Real-time WebSocket broadcasting

Architecture Improvements (v2.1):

Rate Limiting: Domain-specific rate limits prevent anti-bot detection
- Twitter: 15 RPM, LinkedIn: 10 RPM, News: 60 RPM
- Thread-safe semaphores for max concurrent requests
Error Handling: Per-agent try/catch prevents cascading failures
- Failed agents return empty results, others continue
Non-Blocking Refresh: 60-second cycle with interruptible sleep
- threading.Event.wait() instead of blocking time.sleep()

Storage Data Flow

┌─────────────────────────────────────────────────────────────────────────────┐
│                         DOMAIN AGENTS (Parallel)                            │
│  ┌──────────┐ ┌──────────┐ ┌──────────┐ ┌──────────┐ ┌──────────────┐      │
│  │ Social   │ │Political │ │Economic  │ │  Meteo   │ │ Intelligence │      │
│  │ Agent    │ │ Agent    │ │ Agent    │ │  Agent   │ │    Agent     │      │
│  └────┬─────┘ └────┬─────┘ └────┬─────┘ └────┬─────┘ └──────┬───────┘      │
│       └────────────┴────────────┴────────────┴──────────────┘              │
│                                 │ Fan-In                                    │
│                    ┌────────────▼─────────────┐                            │
│                    │   CombinedAgentNode      │                            │
│                    │   (LLM Filter + Rank)    │                            │
│                    └────────────┬─────────────┘                            │
└─────────────────────────────────┼───────────────────────────────────────────┘
                                  │
                    ┌─────────────▼──────────────┐
                    │      StorageManager        │
                    │   (3-Tier Deduplication)   │
                    └─────────────┬──────────────┘
          ┌───────────────────────┼──────────────────────────┐
          │                       │                          │
          ▼                       ▼                          ▼
┌─────────────────┐    ┌──────────────────┐    ┌─────────────────────────┐
│     SQLite      │    │    ChromaDB      │    │      Neo4j Aura         │
│   (Fast Cache)  │    │  (Vector Store)  │    │   (Knowledge Graph)     │
│  ─────────────  │    │  ──────────────  │    │  ───────────────────    │
│  Hash-based     │    │  Semantic search │    │  Event relationships    │
│  Exact match    │    │  Similarity 0.85 │    │  Domain nodes           │
│  ~microseconds  │    │  ~milliseconds   │    │  Entity tracking        │
└─────────────────┘    └──────────────────┘    └─────────────────────────┘

2. Political Agent Graph (`politicalAgentGraph.py`)

3-Module Hybrid Architecture

Module	Description	Sources
Official Sources	Government data	Gazette, Parliament Minutes
Social Media	Political sentiment	Twitter, Facebook, Reddit (National + 25 Districts)
Feed Generation	LLM Processing	Categorize → Summarize → Format

┌─────────────────────────────────────────────┐
│ Module 1: Official     │ Module 2: Social  │
│ ┌─────────────────┐    │ ┌───────────────┐ │
│ │ Gazette         │    │ │ National      │ │
│ │ Parliament      │    │ │ Districts (25)│ │
│ └─────────────────┘    │ │ World Politics│ │
│                        │ └───────────────┘ │
└────────────┬───────────┴────────┬──────────┘
             │       Fan-In       │
             ▼                    ▼
        ┌────────────────────────────┐
        │ Module 3: Feed Generation  │
        │ Categorize → LLM → Format  │
        └────────────────────────────┘

3. Economic Agent Graph (`economicalAgentGraph.py`)

Market Intelligence & Technical Analysis

Component	Description
Stock Collector	CSE market data (200+ stocks)
Technical Analyzer	SMA, EMA, RSI, MACD
Trend Detector	Bullish/Bearish signals
Feed Generator	Risk/Opportunity classification

Indicators Calculated:

Simple Moving Average (SMA-20, SMA-50)
Exponential Moving Average (EMA-12, EMA-26)
Relative Strength Index (RSI)
MACD with Signal Line

4. Meteorological Agent Graph (`meteorologicalAgentGraph.py`)

Weather & Disaster Monitoring + FloodWatch Integration

┌─────────────────────────────────────┐
│        DMC Weather Collector        │
│   (Daily forecasts, 25 districts)   │
└─────────────┬───────────────────────┘
              │
              ▼
┌─────────────────────────────────────┐
│        RiverNet Data Collector      │
│   (River levels, flood monitoring)  │
└─────────────┬───────────────────────┘
              │
              ▼
┌─────────────────────────────────────┐
│    FloodWatch Historical Data    │
│   (30-year climate analysis)        │
└─────────────┬───────────────────────┘
              │
              ▼
┌─────────────────────────────────────┐
│    National Threat Calculator    │
│   (Aggregated flood risk 0-100)     │
└─────────────┬───────────────────────┘
              │
              ▼
┌─────────────────────────────────────┐
│        Alert Generator              │
│   (Severity classification)         │
└─────────────────────────────────────┘

Alert Levels:

🟢 Normal: Standard conditions
🟡 Advisory: Watch for developments
🟠 Warning: Take precautions
🔴 Critical: Immediate action required

FloodWatch Features:

Feature	Description
Historical Analysis	30-year climate data (1995-2025)
Decadal Comparison	3 periods: 1995-2004, 2005-2014, 2015-2025
National Threat Score	0-100 aggregated risk from rivers + alerts + season
High-Risk Periods	May-Jun (SW Monsoon), Oct-Nov (NE Monsoon)

5. Social Agent Graph (`socialAgentGraph.py`)

Multi-Platform Social Media Monitoring

Platform	Data Source	Coverage
Reddit	PRAW API	r/srilanka, r/colombo
Twitter/X	Nitter scraping	#SriLanka, #Colombo
Facebook	Profile scraping	News pages
Threads	Meta API	Trending topics
BlueSky	AT Protocol	Political discourse

6. Intelligence Agent Graph (`intelligenceAgentGraph.py`)

Brand & Threat Monitoring + User-Configurable Targets

┌─────────────────┐    ┌─────────────────┐    ┌─────────────────┐
│ Brand Monitor   │    │ Threat Scanner  │    │ User Targets │
│ - Company news  │    │ - Security      │    │ - Custom keys   │
│ - Competitor    │    │ - Compliance    │    │ - User profiles │
│ - Market share  │    │ - Geopolitical  │    │ - Products      │
└────────┬────────┘    └────────┬────────┘    └────────┬────────┘
         │                      │                      │
         └──────────────────────┼──────────────────────┘
                                ▼
                   ┌─────────────────────┐
                   │ Intelligence Report │
                   │ (Priority ranked)   │
                   └─────────────────────┘

User-Configurable Monitoring: Users can define custom monitoring targets via the frontend settings panel or API:

Config Type	Description	Example
Keywords	Custom search terms	"Colombo Port", "BOI Investment"
Products	Products to track	"iPhone 15", "Samsung Galaxy"
Profiles	Social media accounts	@CompetitorX (Twitter), CompanyY (Facebook)

API Endpoints:

# Get current config
GET /api/intel/config

# Update full config
POST /api/intel/config
Body: {"user_keywords": ["keyword1"], "user_profiles": {"twitter": ["@account"]}, "user_products": ["Product"]}

# Add single target
POST /api/intel/config/add?target_type=keyword&value=Colombo+Port

# Remove target
DELETE /api/intel/config/remove?target_type=profile&value=CompetitorX&platform=twitter

Config File: src/config/intel_config.json

7. DATA Retrieval Agent Graph (`dataRetrievalAgentGraph.py`)

Web Scraping Orchestrator

Scraping Tools Available:

scrape_news_site - Generic news scraper
scrape_cse_live - CSE stock prices
scrape_official_data - Government portals
scrape_social_media - Multi-platform

Anti-Bot Features:

Random delays (1-3s)
User-agent rotation
Retry with exponential backoff
Headless browser fallback

8. Vectorization Agent Graph (`vectorizationAgentGraph.py`)

6-Step Multilingual NLP Pipeline with Anomaly + Trending Detection

┌─────────────────────────────────────────────────┐
│ Step 1: Language Detection                       │
│ FastText + Unicode script analysis              │
│ Supports: English, Sinhala (සිංහල), Tamil (தமிழ்)│
└─────────────────┬───────────────────────────────┘
                  │
                  ▼
┌─────────────────────────────────────────────────┐
│ Step 2: Text Vectorization                       │
│ ┌─────────────┬─────────────┬─────────────────┐ │
│ │ DistilBERT  │ SinhalaBERTo│ Tamil-BERT      │ │
│ │ (English)   │ (Sinhala)   │ (Tamil)         │ │
│ └─────────────┴─────────────┴─────────────────┘ │
│ Output: 768-dim vector per text                 │
└─────────────────┬───────────────────────────────┘
                  │
                  ▼
┌─────────────────────────────────────────────────┐
│ Step 3: Anomaly Detection (Isolation Forest)    │
│ - English: ML model inference                    │
│ - Sinhala/Tamil: Skipped (incompatible vectors) │
│ - Outputs anomaly_score (0-1)                   │
└─────────────────┬───────────────────────────────┘
                  │
                  ▼
┌─────────────────────────────────────────────────┐
│ Step 4: Trending Detection                    │
│ - Entity extraction (hashtags, proper nouns)    │
│ - Momentum: current_hour / avg_last_6_hours     │
│ - Spike alerts when momentum > 3x               │
└─────────────────┬───────────────────────────────┘
                  │
                  ▼
┌─────────────────────────────────────────────────┐
│ Step 5: Expert Summary (GroqLLM)                │
│ - Opportunity & threat identification           │
│ - Sentiment analysis                            │
└─────────────────┬───────────────────────────────┘
                  │
                  ▼
┌─────────────────────────────────────────────────┐
│ Step 6: Format Output                           │
│ - Includes anomaly + trending in domain_insights│
└─────────────────────────────────────────────────┘

Trending Detection API Endpoints:

Endpoint	Method	Description
`/api/trending`	GET	Get trending topics & spike alerts
`/api/trending/topic/{topic}`	GET	Get hourly history for a topic
`/api/trending/record`	POST	Record a topic mention (testing)

10. Weather Prediction Pipeline (`models/weather-prediction/`)

LSTM-Based Multi-District Weather Forecasting

┌─────────────────────────────────────────────────┐
│ Data Source: Tutiempo.net (21 stations)         │
│ Historical data since 1944                       │
└─────────────────┬───────────────────────────────┘
                  │
                  ▼
┌─────────────────────────────────────────────────┐
│ LSTM Neural Network                              │
│ ┌─────────────────────────────────────────────┐ │
│ │ Input: 30-day sequence (11 features)        │ │
│ │ Layer 1: LSTM(64) + BatchNorm + Dropout     │ │
│ │ Layer 2: LSTM(32) + BatchNorm + Dropout     │ │
│ │ Output: Dense(3) → temp_max, temp_min, rain │ │
│ └─────────────────────────────────────────────┘ │
└─────────────────┬───────────────────────────────┘
                  │
                  ▼
┌─────────────────────────────────────────────────┐
│ Severity Classifier                              │
│ - Combines temp, rainfall, flood risk           │
│ - Outputs: normal/advisory/warning/critical     │
└─────────────────┬───────────────────────────────┘
                  │
                  ▼
┌─────────────────────────────────────────────────┐
│ Output: 25 District Predictions                  │
│ - Temperature (high/low °C)                     │
│ - Rainfall (mm + probability)                   │
│ - Flood risk (integrated with RiverNet)        │
└─────────────────────────────────────────────────┘

Usage:

# Run full pipeline
cd models/weather-prediction
python main.py --mode full

# Just predictions
python main.py --mode predict

# Train specific station
python main.py --mode train --station COLOMBO

11. Currency Prediction Pipeline (`models/currency-volatility-prediction/`)

GRU-Based USD/LKR Exchange Rate Forecasting

┌─────────────────────────────────────────────────┐
│ Data Sources (yfinance)                          │
│ - USD/LKR exchange rate                         │
│ - CSE stock index (correlation)                 │
│ - Gold, Oil prices (global factors)             │
│ - USD strength index                            │
└─────────────────┬───────────────────────────────┘
                  │
                  ▼
┌─────────────────────────────────────────────────┐
│ Feature Engineering (25+ features)              │
│ - SMA, EMA, RSI, MACD, Bollinger Bands         │
│ - Volatility, Momentum indicators              │
│ - Temporal encoding (day/month cycles)         │
└─────────────────┬───────────────────────────────┘
                  │
                  ▼
┌─────────────────────────────────────────────────┐
│ GRU Neural Network (8GB RAM optimized)          │
│ ┌─────────────────────────────────────────────┐ │
│ │ Input: 30-day sequence                      │ │
│ │ Layer 1: GRU(64) + BatchNorm + Dropout      │ │
│ │ Layer 2: GRU(32) + BatchNorm + Dropout      │ │
│ │ Output: Dense(1) → next_day_rate            │ │
│ └─────────────────────────────────────────────┘ │
└─────────────────┬───────────────────────────────┘
                  │
                  ▼
┌─────────────────────────────────────────────────┐
│ Output: USD/LKR Prediction                       │
│ - Current & predicted rate                      │
│ - Change % and direction                        │
│ - Volatility classification (low/medium/high)  │
└─────────────────────────────────────────────────┘

Usage:

# Run full pipeline
cd models/currency-volatility-prediction
python main.py --mode full

# Just predict
python main.py --mode predict

# Train GRU model
python main.py --mode train --epochs 100

12. RAG Chatbot (`src/rag.py`)

Chat-History Aware Intelligence Q&A

┌─────────────────────────────────────────────────┐
│ MultiCollectionRetriever                         │
│ - Connects to ChromaDB intelligence collection  │
│ - Roger_feeds (all agent domain feeds)          │
└─────────────────┬───────────────────────────────┘
                  │
                  ▼
┌─────────────────────────────────────────────────┐
│ Question Reformulation (History-Aware)          │
│ - Uses last 3-5 exchanges for context           │
│ - Reformulates follow-up questions              │
└─────────────────┬───────────────────────────────┘
                  │
                  ▼
┌─────────────────────────────────────────────────┐
│ Groq LLM (llama-3.1-70b-versatile)              │
│ - RAG with source citations                     │
│ - Domain-specific analysis                      │
└─────────────────────────────────────────────────┘

Usage:

# CLI mode
python src/rag.py

# Or via API
curl -X POST http://localhost:8000/api/rag/chat \
  -H "Content-Type: application/json" \
  -d '{"message": "What are the latest political events?"}'

🤖 ML Anomaly Detection Pipeline

Located in models/anomaly-detection/

Pipeline Components

Component	File	Description
Data Ingestion	`data_ingestion.py`	SQLite + CSV fetching
Data Validation	`data_validation.py`	Schema-based validation
Data Transformation	`data_transformation.py`	Language detection + BERT vectorization
Model Trainer	`model_trainer.py`	Optuna + MLflow training

Clustering Models

Model	Type	Use Case
DBSCAN	Density-based	Noise-robust clustering
KMeans	Centroid-based	Fast, fixed k clusters
HDBSCAN	Hierarchical density	Variable density clusters
Isolation Forest	Anomaly detection	Outlier identification
LOF	Local outlier	Density-based anomalies

Training with Optuna

# Hyperparameter optimization
study = optuna.create_study(direction="maximize")
study.optimize(objective, n_trials=50)

MLflow Tracking

mlflow.set_tracking_uri("https://dagshub.com/...")
mlflow.log_params(best_params)
mlflow.log_metrics(metrics)
mlflow.sklearn.log_model(model, "model")

🌧️ Weather Data Scraper (`scripts/scrape_weather_data.py`)

Historical weather data collection for ML model training

Data Sources

Source	API Key?	Data Available
Open-Meteo	❌ Free	Historical weather since 1940
NASA FIRMS	✅ Optional	Fire/heat spot detection

Collected Weather Variables

temperature_2m_max/min/mean
precipitation_sum, rain_sum
precipitation_hours
wind_speed_10m_max, wind_gusts_10m_max
wind_direction_10m_dominant

Usage

# Scrape last 30 days (default)
python scripts/scrape_weather_data.py

# Scrape specific date range
python scripts/scrape_weather_data.py --start 2020-01-01 --end 2024-12-31

# Scrape multiple years for training dataset
python scripts/scrape_weather_data.py --years 2020,2021,2022,2023,2024

# Include fire detection data
python scripts/scrape_weather_data.py --years 2023,2024 --fires

# Hourly resolution (default is daily)
python scripts/scrape_weather_data.py --start 2024-01-01 --end 2024-01-31 --resolution hourly

Output

datasets/weather/
├── weather_daily_2020-01-01_2020-12-31.csv
├── weather_daily_2021-01-01_2021-12-31.csv
├── weather_combined.csv  (merged file)
└── fire_detections_20241207.csv

Coverage

All 25 Sri Lankan districts with coordinates:

Colombo, Gampaha, Kalutara, Kandy, Matale, Nuwara Eliya
Galle, Matara, Hambantota, Jaffna, Kilinochchi, Mannar
Vavuniya, Mullaitivu, Batticaloa, Ampara, Trincomalee
Kurunegala, Puttalam, Anuradhapura, Polonnaruwa
Badulla, Monaragala, Ratnapura, Kegalle

🚀 Quick Start

Prerequisites

Python 3.11+
Node.js 18+
Docker Desktop (for Airflow)
Groq API Key

Installation

# 1. Clone repository
git clone <your-repo>
cd Roger-Final

# 2. Create virtual environment
python -m venv .venv
source .venv/bin/activate  # Linux/Mac
.\.venv\Scripts\activate   # Windows

# 3. Install dependencies
pip install -r requirements.txt

# 4. Configure environment
cp .env.template .env
# Edit .env with your API keys

# 5. Download ML models
python models/anomaly-detection/download_models.py

# 6. Launch all services
./start_services.sh       # Linux/Mac
.\start_services.ps1      # Windows

🔧 API Endpoints

REST API (FastAPI - Port 8000)

Endpoint	Method	Description
`/api/status`	GET	System health
`/api/dashboard`	GET	Risk metrics
`/api/feed`	GET	Latest events
`/api/feeds`	GET	All feeds with pagination
`/api/feeds/by_district`	GET	Feeds filtered by district
`/api/rivernet`	GET	River monitoring data
`/api/predict`	POST	Run anomaly predictions
`/api/anomalies`	GET	Get anomalous feeds
`/api/model/status`	GET	ML model status
`/api/weather/predictions`	GET	All district forecasts
`/api/weather/predictions/{district}`	GET	Single district
`/api/weather/model/status`	GET	Weather model info
`/api/weather/historical`	GET	30-year climate analysis
`/api/weather/threat`	GET	National flood threat score
`/api/currency/prediction`	GET	USD/LKR next-day forecast
`/api/currency/history`	GET	Historical rates
`/api/currency/model/status`	GET	Currency model info
`/api/stocks/predictions`	GET	All CSE stock forecasts
`/api/stocks/predictions/{symbol}`	GET	Single stock prediction
`/api/stocks/model/status`	GET	Stock models info
`/api/rag/chat`	POST	Chat with RAG
`/api/rag/stats`	GET	RAG system stats
`/api/rag/clear`	POST	Clear chat history
`/api/power`	GET	CEB power/load shedding status
`/api/fuel`	GET	Current fuel prices
`/api/economy`	GET	CBSL economic indicators
`/api/health`	GET	Health alerts & dengue data
`/api/commodities`	GET	Essential goods prices
`/api/water`	GET	Water supply disruptions

WebSocket

ws://localhost:8000/ws - Real-time updates

⏰ Airflow Orchestration

DAG: `anomaly_detection_training`

start → check_records → data_ingestion → data_validation 
      → data_transformation → model_training → end

Triggers:

Batch threshold: 1000 new records
Daily fallback: Every 24 hours

Access Dashboard:

cd models/anomaly-detection
astro dev start
# Open http://localhost:8080

DAG: `weather_prediction_daily`

ingest_data → train_models → generate_predictions → publish_predictions

Schedule: Daily at 4:00 AM IST

Tasks:

Scrape Tutiempo.net for latest data
Train LSTM models (MLflow tracked)
Generate 25-district predictions
Save to JSON for API

DAG: `currency_prediction_daily`

ingest_data → train_model → generate_prediction → publish_prediction

Schedule: Daily at 4:00 AM IST

Tasks:

Fetch USD/LKR + indicators from yfinance
Train GRU model (MLflow tracked)
Generate next-day prediction
Save to JSON for API

📁 Project Structure

Roger-Ultimate/
├── src/
│   ├── graphs/                    # LangGraph definitions
│   │   ├── combinedAgentGraph.py  # Mother graph
│   │   ├── politicalAgentGraph.py
│   │   ├── economicalAgentGraph.py
│   │   ├── meteorologicalAgentGraph.py
│   │   ├── socialAgentGraph.py
│   │   ├── intelligenceAgentGraph.py
│   │   ├── dataRetrievalAgentGraph.py
│   │   └── vectorizationAgentGraph.py  # 5-step with anomaly detection
│   ├── nodes/                     # Agent implementations
│   ├── states/                    # State definitions
│   ├── llms/                      # LLM configurations
│   ├── storage/                   # ChromaDB, SQLite, Neo4j stores
│   ├── rag.py                     # RAG chatbot
│   └── utils/
│       └── utils.py               # Tools incl. FloodWatch
├── scripts/
│   └── scrape_weather_data.py     # Weather data scraper
├── models/
│   ├── anomaly-detection/         # ML Anomaly Pipeline
│   │   ├── src/
│   │   │   ├── components/        # Pipeline stages
│   │   │   ├── entity/            # Config/Artifact classes
│   │   │   ├── pipeline/          # Orchestrators
│   │   │   └── utils/             # Vectorizer, metrics
│   │   ├── dags/                  # Airflow DAGs
│   │   ├── data_schema/           # Validation schemas
│   │   ├── output/                # Trained models
│   │   └── models_cache/          # Downloaded BERT models
│   ├── weather-prediction/        # Weather ML Pipeline
│   │   ├── src/components/        # data_ingestion, model_trainer, predictor
│   │   ├── dags/                  # weather_prediction_dag.py (4 AM)
│   │   ├── artifacts/             # Trained LSTM models (.h5)
│   │   └── main.py                # CLI entry point
│   └── currency-volatility-prediction/  # Currency ML Pipeline
│       ├── src/components/        # data_ingestion, model_trainer, predictor
│       ├── dags/                  # currency_prediction_dag.py (4 AM)
│       ├── artifacts/             # Trained GRU model
│       └── main.py                # CLI entry point
├── datasets/
│   └── weather/                   # Scraped weather CSVs
├── frontend/
│   └── app/
│       ├── components/
│       │   ├── dashboard/
│       │   │   ├── AnomalyDetection.tsx
│       │   │   ├── WeatherPredictions.tsx
│       │   │   ├── CurrencyPrediction.tsx
│       │   │   ├── NationalThreatCard.tsx     # Flood threat score
│       │   │   ├── HistoricalIntel.tsx        # 30-year climate
│       │   │   └── ...
│       │   ├── map/
│       │   │   ├── MapView.tsx
│       │   │   └── SatelliteView.tsx          # Windy.com embed
│       │   ├── FloatingChatBox.tsx            # RAG chat UI
│       │   └── ...
│       └── pages/
│           └── Index.tsx                       # 7 tabs incl. SATELLITE
├── main.py                        # FastAPI backend
├── start.sh                       # Startup script
└── requirements.txt

🔐 Environment Variables

# LLM
GROQ_API_KEY=your_groq_key

# Neo4j (Knowledge Graph)
NEO4J_URI=neo4j+s://your-instance.databases.neo4j.io
NEO4J_USERNAME=neo4j
NEO4J_PASSWORD=your_password
NEO4J_ENABLED=true
NEO4J_DATABASE=neo4j

# ChromaDB (Vector Store)
CHROMADB_PATH=./data/chromadb
CHROMADB_COLLECTION=Roger_feeds
CHROMADB_SIMILARITY_THRESHOLD=0.85

# SQLite (Fast Cache)
SQLITE_DB_PATH=./data/cache/feeds.db

# MLflow (DagsHub)
MLFLOW_TRACKING_URI=https://dagshub.com/...
MLFLOW_TRACKING_USERNAME=...
MLFLOW_TRACKING_PASSWORD=...

# Pipeline
BATCH_THRESHOLD=1000

🧪 Testing Framework

Industry-level testing infrastructure for the agentic AI system.

Test Structure

tests/
├── conftest.py                 # Pytest fixtures and configuration
├── unit/                       # Unit tests for individual components
│   └── test_utils.py
├── integration/                # Multi-component integration tests
│   └── test_agent_routing.py
├── evaluation/                 # LLM-as-Judge evaluation tests
│   ├── agent_evaluator.py      # Evaluation harness
│   ├── adversarial_tests.py    # Prompt injection & edge cases
│   └── golden_datasets/
│       └── expected_responses.json
└── e2e/                        # End-to-end workflow tests
    └── test_full_pipeline.py

LangSmith Integration

Automatic tracing for all agent decisions when LANGSMITH_API_KEY is set.

# Add to .env
LANGSMITH_API_KEY=your_langsmith_api_key
LANGSMITH_PROJECT=roger-intelligence  # Optional, defaults to 'roger-intelligence'

View traces: smith.langchain.com

Running Tests

# Run all tests
python run_tests.py

# Run specific test suites
python run_tests.py --unit           # Unit tests only
python run_tests.py --adversarial    # Security/adversarial tests
python run_tests.py --eval           # LLM-as-Judge evaluation
python run_tests.py --e2e            # End-to-end tests

# With coverage report
python run_tests.py --coverage

# Enable LangSmith tracing in tests
python run_tests.py --with-langsmith

Agent Evaluation Harness

The agent_evaluator.py implements the LLM-as-Judge pattern:

Metric	Description
Tool Selection Accuracy	Did the agent use the correct tools?
Response Quality	Is the response relevant and coherent?
BLEU Score	N-gram text similarity (0-1, higher = better match)
Hallucination Detection	Did the agent fabricate information?
Graceful Degradation	Does it handle failures properly?

# Run standalone evaluator
python tests/evaluation/agent_evaluator.py

Adversarial Testing

Tests for security and robustness:

Test Category	Description
Prompt Injection	Ignore instructions, jailbreak, context switching
Out-of-Domain	Non-SL queries, illegal requests, impossible questions
Malformed Input	Empty, XSS, SQL injection, unicode flood
Graceful Degradation	API timeouts, empty responses, rate limiting

CI/CD Pipeline

GitHub Actions workflow (.github/workflows/test.yml):

on: [push, pull_request]

jobs:
  unit-tests:        # Runs on every push
  adversarial-tests: # Security tests on every push
  evaluation-tests:  # LLM evaluation on main branch only
  lint:              # Code quality checks

Required Secrets:

LANGSMITH_API_KEY - For evaluation test logging
GROQ_API_KEY - For LLM-based evaluation

🐛 Troubleshooting

FastText won't install on Windows

# Use pre-built wheel instead
pip install fasttext-wheel

BERT models downloading slowly

# Pre-download all models
python models/anomaly-detection/download_models.py

Airflow not starting

# Ensure Docker is running
docker info

# Initialize Astro project
cd models/anomaly-detection
astro dev init
astro dev start

NumPy 2.0 / ChromaDB compatibility error

# If you see "A module that was compiled using NumPy 1.x cannot be run in NumPy 2.x"
pip install "numpy<2.0"

# Or upgrade chromadb to latest
pip install --upgrade chromadb

Keras model loading error ("Could not locate function 'mse'")

# If currency/weather models fail to load with Keras 3.x
# Retrain the model - it will save in .keras format automatically
cd models/currency-volatility-prediction
python main.py --mode train

# Or for weather
cd models/weather-prediction
python main.py --mode train

📄 License

MIT License - Built for Production

🙏 Acknowledgments

Groq - High-speed LLM inference
LangGraph - Agent orchestration
HuggingFace - SinhalaBERTo, Tamil-BERT, DistilBERT
Optuna - Hyperparameter optimization
MLflow - Experiment tracking
Sri Lankan government for open data sources

🇱🇰 Roger Intelligence Platform

🌐 Live Demo

🎯 Key Features

🏗️ System Architecture

📊 Graph Implementations

1. Combined Agent Graph (combinedAgentGraph.py)

Storage Data Flow

2. Political Agent Graph (politicalAgentGraph.py)

3. Economic Agent Graph (economicalAgentGraph.py)

4. Meteorological Agent Graph (meteorologicalAgentGraph.py)

5. Social Agent Graph (socialAgentGraph.py)

6. Intelligence Agent Graph (intelligenceAgentGraph.py)

7. DATA Retrieval Agent Graph (dataRetrievalAgentGraph.py)

8. Vectorization Agent Graph (vectorizationAgentGraph.py)

10. Weather Prediction Pipeline (models/weather-prediction/)

11. Currency Prediction Pipeline (models/currency-volatility-prediction/)

12. RAG Chatbot (src/rag.py)

🤖 ML Anomaly Detection Pipeline

Pipeline Components

Clustering Models

Training with Optuna

MLflow Tracking

🌧️ Weather Data Scraper (scripts/scrape_weather_data.py)

Data Sources

Collected Weather Variables

Usage

Output

Coverage

🚀 Quick Start

Prerequisites

Installation

🔧 API Endpoints

REST API (FastAPI - Port 8000)

WebSocket

⏰ Airflow Orchestration

DAG: anomaly_detection_training

DAG: weather_prediction_daily

DAG: currency_prediction_daily

📁 Project Structure

🔐 Environment Variables

🧪 Testing Framework

Test Structure

LangSmith Integration

Running Tests

Agent Evaluation Harness

Adversarial Testing

CI/CD Pipeline

🐛 Troubleshooting

FastText won't install on Windows

BERT models downloading slowly

Airflow not starting

NumPy 2.0 / ChromaDB compatibility error

Keras model loading error ("Could not locate function 'mse'")

📄 License

🙏 Acknowledgments

1. Combined Agent Graph (`combinedAgentGraph.py`)

2. Political Agent Graph (`politicalAgentGraph.py`)

3. Economic Agent Graph (`economicalAgentGraph.py`)

4. Meteorological Agent Graph (`meteorologicalAgentGraph.py`)

5. Social Agent Graph (`socialAgentGraph.py`)

6. Intelligence Agent Graph (`intelligenceAgentGraph.py`)

7. DATA Retrieval Agent Graph (`dataRetrievalAgentGraph.py`)

8. Vectorization Agent Graph (`vectorizationAgentGraph.py`)

10. Weather Prediction Pipeline (`models/weather-prediction/`)

11. Currency Prediction Pipeline (`models/currency-volatility-prediction/`)

12. RAG Chatbot (`src/rag.py`)

🌧️ Weather Data Scraper (`scripts/scrape_weather_data.py`)

DAG: `anomaly_detection_training`

DAG: `weather_prediction_daily`

DAG: `currency_prediction_daily`