---
title: Roger Intelligence Platform
emoji: ⚡
colorFrom: blue
colorTo: green
sdk: docker
pinned: false
---

# 🇱🇰 Roger Intelligence Platform

**Real-Time Situational Awareness for Sri Lanka**

A multi-agent AI system that aggregates intelligence from **50+ data sources** to provide risk analysis and opportunity detection for businesses operating in Sri Lanka.

## 🌐 Live Demo

| Component | URL |
|-----------|-----|
| **Frontend Dashboard** | [https://model-x-frontend-snowy.vercel.app/](https://model-x-frontend-snowy.vercel.app/) |
| **Backend API** | [https://nivakaran-Roger.hf.space](https://nivakaran-Roger.hf.space) |

---

## 🎯 Key Features

✅ **5 Domain Agents + 2 Orchestrators** running in parallel:
- **Social Agent** - Reddit, Twitter, Facebook, Threads, BlueSky monitoring
- **Political Agent** - Gazette, Parliament, District Social Media
- **Economical Agent** - CSE Stock Market + Technical Indicators (SMA, EMA, RSI, MACD)
- **Meteorological Agent** - DMC Weather + RiverNet + **FloodWatch Integration**
- **Intelligence Agent** - Brand Monitoring + Threat Detection + **User-Configurable Targets**
- **Combined Agent (Orchestrator)** - Fan-out/Fan-in coordination, LLM filtering, feed ranking
- **Data Retrieval Agent** - Web scraping orchestration with anti-bot features

✅ **Situational Awareness Dashboard**:
- **CEB Power Status** - Load shedding / power outage monitoring
- **Fuel Prices** - Petrol 92/95, Diesel, Kerosene (CEYPETCO)
- **CBSL Economic Indicators** - Inflation, policy rates, forex reserves, USD/LKR
- **Health Alerts** - Dengue case tracking, disease outbreak monitoring
- **Commodity Prices** - 15 essential goods (rice, sugar, gas, eggs, etc.)
- **Water Supply Status** - NWSDB disruption alerts

✅ **ML Anomaly Detection Pipeline** (Integrated into Graph):
- Language-specific BERT models (Sinhala, Tamil, English)
- Real-time anomaly inference on every graph cycle
- Clustering (DBSCAN, KMeans, HDBSCAN)
- Anomaly Detection (Isolation Forest, LOF)
- MLflow + DagsHub tracking

✅ **Weather Prediction ML Pipeline**:
- LSTM Neural Network (30-day sequences)
- Predicts: Temperature, Rainfall, Flood Risk, Severity
- 21 weather stations → 25 districts
- Airflow DAG runs daily at 4 AM

✅ **Currency Prediction ML Pipeline**:
- GRU Neural Network (optimized for 8GB RAM)
- Predicts: USD/LKR exchange rate
- Features: Technical indicators + CSE + Gold + Oil + USD Index
- MLflow tracking + Airflow DAG at 4 AM

✅ **Stock Price Prediction ML Pipeline**:
- Multi-Architecture: LSTM, GRU, BiLSTM, BiGRU
- Optuna hyperparameter tuning (30 trials per stock)
- Per-stock best model selection
- 10 top CSE stocks (JKH, COMB, DIAL, HNB, etc.)

✅ **RAG-Powered Chatbot**:
- Chat-history aware Q&A
- Queries all ChromaDB intelligence collections
- Domain filtering (political, economic, weather, social)
- Floating chat UI in dashboard

✅ **Trending/Velocity Detection**:
- SQLite-based topic frequency tracking (24-hour rolling window)
- Momentum calculation: `current_hour / avg_last_6_hours`
- Spike alerts when topic volume > 3x baseline
- Integrated into Combined Agent dashboard

✅ **Real-Time Dashboard** with:
- Live Intelligence Feed
- Floating AI Chatbox
- Weather Predictions Tab
- **Live Satellite/Weather Map** (Windy.com)
- **National Flood Threat Score**
- **30-Year Historical Climate Analysis**
- **Trending Topics & Spike Alerts**
- **Enhanced Operational Indicators** (infrastructure_health, regulatory_activity, investment_climate)
- Operational Risk Radar
- ML Anomaly Detection Display
- Market Predictions with Moving Averages
- Risk & Opportunity Classification

✅ **Weather Data Scraper for ML Training**:
- Open-Meteo API (free historical
data)
- NASA FIRMS (fire/heat detection)
- All 25 districts coverage
- Year-wise CSV export for model training

✅ **Operational Dashboard Metrics**:
- **Logistics Friction**: Average confidence of mobility/social domain risk events
- **Compliance Volatility**: Average confidence of political domain risks
- **Market Instability**: Average confidence of market/economical domain risks
- **Opportunity Index**: Average confidence of opportunity-classified events

✅ **Multi-District Province-Aware Event Categorization**:
- Events mentioning provinces are displayed in all constituent districts
- Supports: Western, Southern, Central, Northern, Eastern, Sabaragamuwa, Uva, North Western, North Central provinces
- Both frontend (MapView, DistrictInfoPanel) and backend are synchronized

✅ **3-Tier Storage Architecture** with Deduplication:
- **Tier 1: SQLite** - Fast hash-based exact match (microseconds)
- **Tier 2: ChromaDB** - Semantic similarity search with sentence transformers (milliseconds)
- **Tier 3: Neo4j Aura** - Knowledge graph for event relationships and entity tracking
- Unified `StorageManager` orchestrates all backends
- Deduplication prevents duplicate feeds across all domain agents

---

## 🏗️ System Architecture

```
┌─────────────────────────────────────────────────────────────────────────┐
│                          Roger Combined Graph                           │
│  ┌────────────────────────────────────────────────────────────────┐     │
│  │                    Graph Initiator (Reset)                     │     │
│  └────────────────────────────────────────────────────────────────┘     │
│                            │ Fan-Out                                    │
│  ┌────────────┬────────────┼────────────┬────────────┬────────────┐     │
│  ▼            ▼            ▼            ▼            ▼            ▼     │
│ ┌──────┐    ┌──────┐   ┌──────────┐   ┌──────┐    ┌──────────┐    ┌────┐│
│ │Social│    │Econ  │   │Political │   │Meteo │    │Intellig- │    │Data││
│ │Agent │    │Agent │   │Agent     │   │Agent │    │ence Agent│    │Retr││
│ └──────┘    └──────┘   └──────────┘   └──────┘    └──────────┘    └────┘│
│  │            │            │            │            │            │     │
│  └────────────┴────────────┴────────────┴────────────┴────────────┘     │
│                            │ Fan-In                                     │
│                  ┌─────────▼──────────┐                                 │
│                  │  Feed Aggregator   │                                 │
│                  │  (Rank & Dedupe)   │                                 │
│                  └─────────┬──────────┘                                 │
│                  ┌─────────▼──────────┐                                 │
│                  │   Vectorization    │                                 │
│                  │  Agent (Optional)  │                                 │
│                  └─────────┬──────────┘                                 │
│                  ┌─────────▼──────────┐                                 │
│                  │ Router (Loop/End)  │                                 │
│                  └────────────────────┘                                 │
└─────────────────────────────────────────────────────────────────────────┘
```

---

## 📊 Graph Implementations

### 1. Combined Agent Graph (`combinedAgentGraph.py`)

**The Mother Graph** - Orchestrates all domain agents in parallel.

```mermaid
graph TD
    A[Graph Initiator] -->|Fan-Out| B[Social Agent]
    A -->|Fan-Out| C[Economic Agent]
    A -->|Fan-Out| D[Political Agent]
    A -->|Fan-Out| E[Meteorological Agent]
    A -->|Fan-Out| F[Intelligence Agent]
    A -->|Fan-Out| G[Data Retrieval Agent]
    B -->|Fan-In| H[Feed Aggregator]
    C --> H
    D --> H
    E --> H
    F --> H
    G --> H
    H --> I[Data Refresher]
    I --> J{Router}
    J -->|Loop| A
    J -->|End| K[END]
```

**Key Features:**
- Custom state reducers for parallel execution
- Feed deduplication with content hashing
- Loop control with configurable intervals
- Real-time WebSocket broadcasting

**Architecture Improvements (v2.1):**
- **Rate Limiting**: Domain-specific rate limits avoid triggering anti-bot detection
  - Twitter: 15 RPM, LinkedIn: 10 RPM, News: 60 RPM
  - Thread-safe semaphores cap the number of concurrent requests
- **Error Handling**: Per-agent try/except prevents cascading failures
  - Failed agents return empty results while the others continue
- **Non-Blocking Refresh**: 60-second cycle with interruptible sleep
  - `threading.Event.wait()` instead of blocking `time.sleep()`

### Storage Data Flow

```
┌─────────────────────────────────────────────────────────────────────────────┐
│                          DOMAIN AGENTS (Parallel)                           │
│  ┌──────────┐ ┌──────────┐ ┌──────────┐ ┌──────────┐ ┌──────────────┐       │
│  │  Social  │ │Political │ │Economic  │ │  Meteo   │ │ Intelligence │       │
│  │  Agent   │ │  Agent   │ │  Agent   │ │  Agent   │ │    Agent     │       │
│  └────┬─────┘ └────┬─────┘ └────┬─────┘ └────┬─────┘ └──────┬───────┘       │
│       └────────────┴────────────┴────────────┴──────────────┘               │
│                                 │ Fan-In                                    │
│                    ┌────────────▼─────────────┐                             │
│                    │    CombinedAgentNode     │                             │
│                    │   (LLM Filter + Rank)    │                             │
│                    └────────────┬─────────────┘                             │
└─────────────────────────────────┼───────────────────────────────────────────┘
                                  │
                    ┌─────────────▼──────────────┐
                    │       StorageManager       │
                    │   (3-Tier Deduplication)   │
                    └─────────────┬──────────────┘
          ┌───────────────────────┼──────────────────────────┐
          │                       │                          │
          ▼                       ▼                          ▼
 ┌─────────────────┐    ┌──────────────────┐    ┌─────────────────────────┐
 │     SQLite      │    │     ChromaDB     │    │       Neo4j Aura        │
 │  (Fast Cache)   │    │  (Vector Store)  │    │    (Knowledge Graph)    │
 │  ─────────────  │    │  ──────────────  │    │   ───────────────────   │
 │  Hash-based     │    │  Semantic search │    │   Event relationships   │
 │  Exact match    │    │  Similarity 0.85 │    │   Domain nodes          │
 │  ~microseconds  │    │  ~milliseconds   │    │   Entity tracking       │
 └─────────────────┘    └──────────────────┘    └─────────────────────────┘
```

---

### 2. Political Agent Graph (`politicalAgentGraph.py`)

**3-Module Hybrid Architecture**

| Module | Description | Sources |
|--------|-------------|---------|
| **Official Sources** | Government data | Gazette, Parliament Minutes |
| **Social Media** | Political sentiment | Twitter, Facebook, Reddit (National + 25 Districts) |
| **Feed Generation** | LLM Processing | Categorize → Summarize → Format |

```
┌────────────────────────────────────────────┐
│  Module 1: Official    │ Module 2: Social  │
│  ┌─────────────────┐   │ ┌───────────────┐ │
│  │ Gazette         │   │ │ National      │ │
│  │ Parliament      │   │ │ Districts (25)│ │
│  └─────────────────┘   │ │ World Politics│ │
│                        │ └───────────────┘ │
└────────────┬───────────┴────────┬──────────┘
             │       Fan-In       │
             ▼                    ▼
         ┌────────────────────────────┐
         │ Module 3: Feed Generation  │
         │ Categorize → LLM → Format  │
         └────────────────────────────┘
```

---

### 3. Economic Agent Graph (`economicalAgentGraph.py`)

**Market Intelligence & Technical Analysis**

| Component | Description |
|-----------|-------------|
| **Stock Collector** | CSE market data (200+ stocks) |
| **Technical Analyzer** | SMA, EMA, RSI, MACD |
| **Trend Detector** | Bullish/Bearish signals |
| **Feed Generator** | Risk/Opportunity classification |

**Indicators Calculated:**
- Simple Moving Average (SMA-20, SMA-50)
- Exponential Moving Average (EMA-12, EMA-26)
- Relative Strength Index (RSI)
- MACD with Signal Line

---

### 4. Meteorological Agent Graph (`meteorologicalAgentGraph.py`)

**Weather & Disaster Monitoring + FloodWatch Integration**

```
┌─────────────────────────────────────┐
│        DMC Weather Collector        │
│   (Daily forecasts, 25 districts)   │
└─────────────┬───────────────────────┘
              │
              ▼
┌─────────────────────────────────────┐
│       RiverNet Data Collector       │
│  (River levels, flood monitoring)   │
└─────────────┬───────────────────────┘
              │
              ▼
┌─────────────────────────────────────┐
│     FloodWatch Historical Data      │
│     (30-year climate analysis)      │
└─────────────┬───────────────────────┘
              │
              ▼
┌─────────────────────────────────────┐
│     National Threat Calculator      │
│    (Aggregated flood risk 0-100)    │
└─────────────┬───────────────────────┘
              │
              ▼
┌─────────────────────────────────────┐
│           Alert Generator           │
│      (Severity classification)      │
└─────────────────────────────────────┘
```

**Alert Levels:**
- 🟢 Normal: Standard conditions
- 🟡 Advisory: Watch for developments
- 🟠 Warning: Take precautions
- 🔴 Critical: Immediate action required

**FloodWatch Features:**

| Feature | Description |
|---------|-------------|
| **Historical Analysis** | 30-year climate data (1995-2025) |
| **Decadal Comparison** | 3 periods: 1995-2004, 2005-2014, 2015-2025 |
| **National Threat Score** | 0-100 aggregated risk from rivers + alerts + season |
| **High-Risk Periods** | May-Jun (SW Monsoon), Oct-Nov (NE Monsoon) |

---

### 5. Social Agent Graph (`socialAgentGraph.py`)

**Multi-Platform Social Media Monitoring**

| Platform | Data Source | Coverage |
|----------|-------------|----------|
| Reddit | PRAW API | r/srilanka, r/colombo |
| Twitter/X | Nitter scraping | #SriLanka, #Colombo |
| Facebook | Profile scraping | News pages |
| Threads | Meta API | Trending topics |
| BlueSky | AT Protocol | Political discourse |

---

### 6. Intelligence Agent Graph (`intelligenceAgentGraph.py`)

**Brand & Threat Monitoring + User-Configurable Targets**

```
┌─────────────────┐    ┌─────────────────┐    ┌─────────────────┐
│ Brand Monitor   │    │ Threat Scanner  │    │ User Targets    │
│ - Company news  │    │ - Security      │    │ - Custom keys   │
│ - Competitor    │    │ - Compliance    │    │ - User profiles │
│ - Market share  │    │ - Geopolitical  │    │ - Products      │
└────────┬────────┘    └────────┬────────┘    └────────┬────────┘
         │                      │                      │
         └──────────────────────┼──────────────────────┘
                                ▼
                     ┌─────────────────────┐
                     │ Intelligence Report │
                     │  (Priority ranked)  │
                     └─────────────────────┘
```

**User-Configurable Monitoring**: Users can define custom monitoring targets via the frontend settings panel or the API:

| Config Type | Description | Example |
|-------------|-------------|---------|
| **Keywords** | Custom search terms | "Colombo Port", "BOI Investment" |
| **Products** | Products to track | "iPhone 15", "Samsung Galaxy" |
| **Profiles** | Social media accounts | @CompetitorX (Twitter), CompanyY (Facebook) |

**API Endpoints:**

```bash
# Get current config
curl http://localhost:8000/api/intel/config

# Update full config
curl -X POST http://localhost:8000/api/intel/config \
  -H "Content-Type: application/json" \
  -d '{"user_keywords": ["keyword1"], "user_profiles": {"twitter": ["@account"]}, "user_products": ["Product"]}'

# Add single target
curl -X POST "http://localhost:8000/api/intel/config/add?target_type=keyword&value=Colombo+Port"

# Remove target
curl -X DELETE "http://localhost:8000/api/intel/config/remove?target_type=profile&value=CompetitorX&platform=twitter"
```

**Config File**: `src/config/intel_config.json`

---

### 7. Data Retrieval Agent Graph (`dataRetrievalAgentGraph.py`)

**Web Scraping Orchestrator**

**Scraping Tools Available:**
- `scrape_news_site` - Generic news scraper
- `scrape_cse_live` - CSE stock prices
- `scrape_official_data` - Government portals
- `scrape_social_media` - Multi-platform

**Anti-Bot Features:**
- Random delays (1-3s)
- User-agent rotation
- Retry with exponential backoff
- Headless browser fallback

---

### 8. Vectorization Agent Graph (`vectorizationAgentGraph.py`)

**6-Step Multilingual NLP Pipeline with Anomaly + Trending Detection**

```
┌─────────────────────────────────────────────────┐
│ Step 1: Language Detection                      │
│ FastText + Unicode script analysis              │
│ Supports: English, Sinhala (සිංහල), Tamil (தமிழ்)│
└─────────────────┬───────────────────────────────┘
                  │
                  ▼
┌─────────────────────────────────────────────────┐
│ Step 2: Text Vectorization                      │
│ ┌─────────────┬─────────────┬─────────────────┐ │
│ │ DistilBERT  │ SinhalaBERTo│ Tamil-BERT      │ │
│ │ (English)   │ (Sinhala)   │ (Tamil)         │ │
│ └─────────────┴─────────────┴─────────────────┘ │
│ Output: 768-dim vector per text                 │
└─────────────────┬───────────────────────────────┘
                  │
                  ▼
┌─────────────────────────────────────────────────┐
│ Step 3: Anomaly Detection (Isolation Forest)    │
│ - English: ML model inference                   │
│ - Sinhala/Tamil: Skipped (incompatible vectors) │
│ - Outputs anomaly_score (0-1)                   │
└─────────────────┬───────────────────────────────┘
                  │
                  ▼
┌─────────────────────────────────────────────────┐
│ Step 4: Trending Detection                      │
│ - Entity extraction (hashtags, proper nouns)    │
│ - Momentum: current_hour / avg_last_6_hours     │
│ - Spike alerts when momentum > 3x               │
└─────────────────┬───────────────────────────────┘
                  │
                  ▼
┌─────────────────────────────────────────────────┐
│ Step 5: Expert Summary (GroqLLM)                │
│ - Opportunity & threat identification           │
│ - Sentiment analysis                            │
└─────────────────┬───────────────────────────────┘
                  │
                  ▼
┌─────────────────────────────────────────────────┐
│ Step 6: Format Output                           │
│ - Includes anomaly + trending in domain_insights│
└─────────────────────────────────────────────────┘
```

**Trending Detection API Endpoints:**

| Endpoint | Method | Description |
|----------|--------|-------------|
| `/api/trending` | GET | Get trending topics & spike alerts |
| `/api/trending/topic/{topic}` | GET | Get hourly history for a topic |
| `/api/trending/record` | POST | Record a topic mention (testing) |

---

### 9. Weather Prediction Pipeline (`models/weather-prediction/`)

**LSTM-Based Multi-District Weather Forecasting**

```
┌─────────────────────────────────────────────────┐
│ Data Source: Tutiempo.net (21 stations)         │
│ Historical data since 1944                      │
└─────────────────┬───────────────────────────────┘
                  │
                  ▼
┌─────────────────────────────────────────────────┐
│ LSTM Neural Network                             │
│ ┌─────────────────────────────────────────────┐ │
│ │ Input: 30-day sequence (11 features)        │ │
│ │ Layer 1: LSTM(64) + BatchNorm + Dropout     │ │
│ │ Layer 2: LSTM(32) + BatchNorm + Dropout     │ │
│ │ Output: Dense(3) → temp_max, temp_min, rain │ │
│ └─────────────────────────────────────────────┘ │
└─────────────────┬───────────────────────────────┘
                  │
                  ▼
┌─────────────────────────────────────────────────┐
│ Severity Classifier                             │
│ - Combines temp, rainfall, flood risk           │
│ - Outputs: normal/advisory/warning/critical     │
└─────────────────┬───────────────────────────────┘
                  │
                  ▼
┌─────────────────────────────────────────────────┐
│ Output: 25 District Predictions                 │
│ - Temperature (high/low °C)                     │
│ - Rainfall (mm + probability)                   │
│ - Flood risk (integrated with RiverNet)         │
└─────────────────────────────────────────────────┘
```

**Usage:**

```bash
# Run full pipeline
cd models/weather-prediction
python main.py --mode full

# Just predictions
python main.py --mode predict

# Train specific station
python main.py --mode train --station COLOMBO
```

---

### 10. Currency Prediction Pipeline (`models/currency-volatility-prediction/`)

**GRU-Based USD/LKR Exchange Rate Forecasting**

```
┌─────────────────────────────────────────────────┐
│ Data Sources (yfinance)                         │
│ - USD/LKR exchange rate                         │
│ - CSE stock index (correlation)                 │
│ - Gold, Oil prices (global factors)             │
│ - USD strength index                            │
└─────────────────┬───────────────────────────────┘
                  │
                  ▼
┌─────────────────────────────────────────────────┐
│ Feature Engineering (25+ features)              │
│ - SMA, EMA, RSI, MACD, Bollinger Bands          │
│ - Volatility, Momentum indicators               │
│ - Temporal encoding (day/month cycles)          │
└─────────────────┬───────────────────────────────┘
                  │
                  ▼
┌─────────────────────────────────────────────────┐
│ GRU Neural Network (8GB RAM optimized)          │
│ ┌─────────────────────────────────────────────┐ │
│ │ Input: 30-day sequence                      │ │
│ │ Layer 1: GRU(64) + BatchNorm + Dropout      │ │
│ │ Layer 2: GRU(32) + BatchNorm + Dropout      │ │
│ │ Output: Dense(1) → next_day_rate            │ │
│ └─────────────────────────────────────────────┘ │
└─────────────────┬───────────────────────────────┘
                  │
                  ▼
┌─────────────────────────────────────────────────┐
│ Output: USD/LKR Prediction                      │
│ - Current & predicted rate                      │
│ - Change % and direction                        │
│ - Volatility classification (low/medium/high)   │
└─────────────────────────────────────────────────┘
```

**Usage:**

```bash
# Run full pipeline
cd models/currency-volatility-prediction
python main.py --mode full

# Just predict
python main.py --mode predict

# Train GRU model
python main.py --mode train --epochs 100
```

---

### 11. RAG Chatbot (`src/rag.py`)

**Chat-History Aware Intelligence Q&A**

```
┌─────────────────────────────────────────────────┐
│ MultiCollectionRetriever                        │
│ - Connects to ChromaDB intelligence collection  │
│ - Roger_feeds (all agent domain feeds)          │
└─────────────────┬───────────────────────────────┘
                  │
                  ▼
┌─────────────────────────────────────────────────┐
│ Question Reformulation (History-Aware)          │
│ - Uses last 3-5 exchanges for context           │
│ - Reformulates follow-up questions              │
└─────────────────┬───────────────────────────────┘
                  │
                  ▼
┌─────────────────────────────────────────────────┐
│ Groq LLM (llama-3.1-70b-versatile)              │
│ - RAG with source citations                     │
│ - Domain-specific analysis                      │
└─────────────────────────────────────────────────┘
```

**Usage:**

```bash
# CLI mode
python src/rag.py

# Or via API
curl -X POST http://localhost:8000/api/rag/chat \
  -H "Content-Type: application/json" \
  -d '{"message": "What are the latest political events?"}'
```

---

## 🤖 ML Anomaly Detection Pipeline

Located in `models/anomaly-detection/`

### Pipeline Components

| Component | File | Description |
|-----------|------|-------------|
| Data Ingestion | `data_ingestion.py` | SQLite + CSV fetching |
| Data Validation | `data_validation.py` | Schema-based validation |
| Data Transformation | `data_transformation.py` | Language detection + BERT vectorization |
| Model Trainer | `model_trainer.py` | Optuna + MLflow training |

### Clustering and Anomaly Detection Models

| Model | Type | Use Case |
|-------|------|----------|
| **DBSCAN** | Density-based | Noise-robust clustering |
| **KMeans** | Centroid-based | Fast, fixed k clusters |
| **HDBSCAN** | Hierarchical density | Variable density clusters |
| **Isolation Forest** | Anomaly detection | Outlier identification |
| **LOF** | Local outlier | Density-based anomalies |

### Training with Optuna

```python
import optuna

# Hyperparameter optimization
# `objective` must be defined beforehand: it takes a trial and returns the score to maximize
study = optuna.create_study(direction="maximize")
study.optimize(objective, n_trials=50)
```

### MLflow Tracking

```python
import mlflow
import mlflow.sklearn

# `best_params`, `metrics`, and `model` come from the training step
mlflow.set_tracking_uri("https://dagshub.com/...")
mlflow.log_params(best_params)
mlflow.log_metrics(metrics)
mlflow.sklearn.log_model(model, "model")
```

---

## 🌧️ Weather Data Scraper (`scripts/scrape_weather_data.py`)

**Historical weather data collection for ML model training**

### Data Sources

| Source | API Key? | Data Available |
|--------|----------|----------------|
| **Open-Meteo** | ❌ Not needed (free) | Historical weather since 1940 |
| **NASA FIRMS** | ✅ Optional | Fire/heat spot detection |

### Collected Weather Variables

- `temperature_2m_max/min/mean`
- `precipitation_sum`, `rain_sum`
- `precipitation_hours`
- `wind_speed_10m_max`, `wind_gusts_10m_max`
- `wind_direction_10m_dominant`

### Usage

```bash
# Scrape last 30 days (default)
python scripts/scrape_weather_data.py

# Scrape specific date range
python scripts/scrape_weather_data.py --start 2020-01-01 --end 2024-12-31

# Scrape multiple years for training dataset
python scripts/scrape_weather_data.py --years 2020,2021,2022,2023,2024

# Include fire detection data
python scripts/scrape_weather_data.py --years 2023,2024 --fires

# Hourly resolution (default is daily)
python scripts/scrape_weather_data.py --start 2024-01-01 --end 2024-01-31 --resolution hourly
```

### Output

```
datasets/weather/
├── weather_daily_2020-01-01_2020-12-31.csv
├── weather_daily_2021-01-01_2021-12-31.csv
├── weather_combined.csv (merged file)
└── fire_detections_20241207.csv
```

### Coverage

All 25 Sri Lankan districts with coordinates:
- Colombo, Gampaha, Kalutara, Kandy, Matale, Nuwara Eliya
- Galle, Matara, Hambantota, Jaffna, Kilinochchi, Mannar
- Vavuniya, Mullaitivu, Batticaloa, Ampara, Trincomalee
- Kurunegala, Puttalam, Anuradhapura, Polonnaruwa
- Badulla, Monaragala, Ratnapura, Kegalle

---

## 🚀 Quick Start

### Prerequisites

- Python 3.11+
- Node.js 18+
- Docker Desktop (for Airflow)
- Groq API Key

### Installation

```bash
# 1. Clone repository
git clone <repository-url>
cd Roger-Final

# 2. Create virtual environment
python -m venv .venv
source .venv/bin/activate   # Linux/Mac
.\.venv\Scripts\activate    # Windows

# 3. Install dependencies
pip install -r requirements.txt

# 4. Configure environment
cp .env.template .env
# Edit .env with your API keys

# 5. Download ML models
python models/anomaly-detection/download_models.py

# 6. Launch all services
./start_services.sh     # Linux/Mac
.\start_services.ps1    # Windows
```

---

## 🔧 API Endpoints

### REST API (FastAPI - Port 8000)

| Endpoint | Method | Description |
|----------|--------|-------------|
| `/api/status` | GET | System health |
| `/api/dashboard` | GET | Risk metrics |
| `/api/feed` | GET | Latest events |
| `/api/feeds` | GET | All feeds with pagination |
| `/api/feeds/by_district` | GET | Feeds filtered by district |
| `/api/rivernet` | GET | River monitoring data |
| `/api/predict` | POST | Run anomaly predictions |
| `/api/anomalies` | GET | Get anomalous feeds |
| `/api/model/status` | GET | ML model status |
| `/api/weather/predictions` | GET | All district forecasts |
| `/api/weather/predictions/{district}` | GET | Single district |
| `/api/weather/model/status` | GET | Weather model info |
| `/api/weather/historical` | GET | 30-year climate analysis |
| `/api/weather/threat` | GET | National flood threat score |
| `/api/currency/prediction` | GET | USD/LKR next-day forecast |
| `/api/currency/history` | GET | Historical rates |
| `/api/currency/model/status` | GET | Currency model info |
| `/api/stocks/predictions` | GET | All CSE stock forecasts |
| `/api/stocks/predictions/{symbol}` | GET | Single stock prediction |
| `/api/stocks/model/status` | GET | Stock models info |
| `/api/rag/chat` | POST | Chat with RAG |
| `/api/rag/stats` | GET | RAG system stats |
| `/api/rag/clear` | POST | Clear chat history |
| `/api/power` | GET | CEB power/load shedding status |
| `/api/fuel` | GET | Current fuel prices |
| `/api/economy` | GET | CBSL economic indicators |
| `/api/health` | GET | Health alerts & dengue data |
| `/api/commodities` | GET | Essential goods prices |
| `/api/water` | GET | Water supply disruptions |

### WebSocket

- `ws://localhost:8000/ws` - Real-time updates

---

## ⏰ Airflow Orchestration

### DAG: `anomaly_detection_training`

```
start → check_records → data_ingestion → data_validation → data_transformation → model_training → end
```

**Triggers:**
- Batch threshold: 1000 new records
- Daily fallback: Every 24 hours

**Access Dashboard:**

```bash
cd models/anomaly-detection
astro dev start
# Open http://localhost:8080
```

### DAG: `weather_prediction_daily`

```
ingest_data → train_models → generate_predictions → publish_predictions
```

**Schedule:** Daily at 4:00 AM IST

**Tasks:**
- Scrape Tutiempo.net for latest data
- Train LSTM models (MLflow tracked)
- Generate 25-district predictions
- Save to JSON for API

### DAG: `currency_prediction_daily`

```
ingest_data → train_model → generate_prediction → publish_prediction
```

**Schedule:** Daily at 4:00 AM IST

**Tasks:**
- Fetch USD/LKR + indicators from yfinance
- Train GRU model (MLflow tracked)
- Generate next-day prediction
- Save to JSON for API

---

## 📁 Project Structure

```
Roger-Ultimate/
├── src/
│   ├── graphs/                          # LangGraph definitions
│   │   ├── combinedAgentGraph.py        # Mother graph
│   │   ├── politicalAgentGraph.py
│   │   ├── economicalAgentGraph.py
│   │   ├── meteorologicalAgentGraph.py
│   │   ├── socialAgentGraph.py
│   │   ├── intelligenceAgentGraph.py
│   │   ├── dataRetrievalAgentGraph.py
│   │   └── vectorizationAgentGraph.py   # 6-step with anomaly detection
│   ├── nodes/                           # Agent implementations
│   ├── states/                          # State definitions
│   ├── llms/                            # LLM configurations
│   ├── storage/                         # ChromaDB, SQLite, Neo4j stores
│   ├── rag.py                           # RAG chatbot
│   └── utils/
│       └── utils.py                     # Tools incl. FloodWatch
├── scripts/
│   └── scrape_weather_data.py           # Weather data scraper
├── models/
│   ├── anomaly-detection/               # ML Anomaly Pipeline
│   │   ├── src/
│   │   │   ├── components/              # Pipeline stages
│   │   │   ├── entity/                  # Config/Artifact classes
│   │   │   ├── pipeline/                # Orchestrators
│   │   │   └── utils/                   # Vectorizer, metrics
│   │   ├── dags/                        # Airflow DAGs
│   │   ├── data_schema/                 # Validation schemas
│   │   ├── output/                      # Trained models
│   │   └── models_cache/                # Downloaded BERT models
│   ├── weather-prediction/              # Weather ML Pipeline
│   │   ├── src/components/              # data_ingestion, model_trainer, predictor
│   │   ├── dags/                        # weather_prediction_dag.py (4 AM)
│   │   ├── artifacts/                   # Trained LSTM models (.h5)
│   │   └── main.py                      # CLI entry point
│   └── currency-volatility-prediction/  # Currency ML Pipeline
│       ├── src/components/              # data_ingestion, model_trainer, predictor
│       ├── dags/                        # currency_prediction_dag.py (4 AM)
│       ├── artifacts/                   # Trained GRU model
│       └── main.py                      # CLI entry point
├── datasets/
│   └── weather/                         # Scraped weather CSVs
├── frontend/
│   └── app/
│       ├── components/
│       │   ├── dashboard/
│       │   │   ├── AnomalyDetection.tsx
│       │   │   ├── WeatherPredictions.tsx
│       │   │   ├── CurrencyPrediction.tsx
│       │   │   ├── NationalThreatCard.tsx   # Flood threat score
│       │   │   ├── HistoricalIntel.tsx      # 30-year climate
│       │   │   └── ...
│       │   ├── map/
│       │   │   ├── MapView.tsx
│       │   │   └── SatelliteView.tsx        # Windy.com embed
│       │   ├── FloatingChatBox.tsx          # RAG chat UI
│       │   └── ...
│       └── pages/
│           └── Index.tsx                    # 7 tabs incl. SATELLITE
├── main.py                              # FastAPI backend
├── start.sh                             # Startup script
└── requirements.txt
```

---

## 🔐 Environment Variables

```env
# LLM
GROQ_API_KEY=your_groq_key

# Neo4j (Knowledge Graph)
NEO4J_URI=neo4j+s://your-instance.databases.neo4j.io
NEO4J_USERNAME=neo4j
NEO4J_PASSWORD=your_password
NEO4J_ENABLED=true
NEO4J_DATABASE=neo4j

# ChromaDB (Vector Store)
CHROMADB_PATH=./data/chromadb
CHROMADB_COLLECTION=Roger_feeds
CHROMADB_SIMILARITY_THRESHOLD=0.85

# SQLite (Fast Cache)
SQLITE_DB_PATH=./data/cache/feeds.db

# MLflow (DagsHub)
MLFLOW_TRACKING_URI=https://dagshub.com/...
MLFLOW_TRACKING_USERNAME=...
MLFLOW_TRACKING_PASSWORD=...

# Pipeline
BATCH_THRESHOLD=1000
```

---

## 🧪 Testing Framework

Industry-grade testing infrastructure for the agentic AI system.

### Test Structure

```
tests/
├── conftest.py                      # Pytest fixtures and configuration
├── unit/                            # Unit tests for individual components
│   └── test_utils.py
├── integration/                     # Multi-component integration tests
│   └── test_agent_routing.py
├── evaluation/                      # LLM-as-Judge evaluation tests
│   ├── agent_evaluator.py           # Evaluation harness
│   ├── adversarial_tests.py         # Prompt injection & edge cases
│   └── golden_datasets/
│       └── expected_responses.json
└── e2e/                             # End-to-end workflow tests
    └── test_full_pipeline.py
```

### LangSmith Integration

All agent decisions are traced automatically when `LANGSMITH_API_KEY` is set.
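
The key-gated behaviour described above can be pictured with a small startup helper. This is only a minimal sketch, not the project's actual code: the `configure_langsmith` function is hypothetical, and it assumes tracing is switched on through LangSmith's `LANGSMITH_TRACING` environment variable. The `roger-intelligence` default mirrors the project name used in this README.

```python
import os


def configure_langsmith(env=None):
    """Turn on LangSmith tracing only if an API key is configured.

    Hypothetical helper for illustration; returns True when tracing was enabled.
    """
    env = os.environ if env is None else env
    if not env.get("LANGSMITH_API_KEY", "").strip():
        # No key: leave tracing untouched so the app still runs without LangSmith.
        return False
    # Assumed default project name, taken from this README.
    env.setdefault("LANGSMITH_PROJECT", "roger-intelligence")
    # Assumption: tracing is toggled via the LANGSMITH_TRACING variable.
    env["LANGSMITH_TRACING"] = "true"
    return True


if __name__ == "__main__":
    print("LangSmith tracing enabled:", configure_langsmith())
```

Calling a helper like this once at startup, before any graph is built, would keep the agent code free of tracing checks. The variables it reads are set in `.env`:
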
```env
# Add to .env
LANGSMITH_API_KEY=your_langsmith_api_key
LANGSMITH_PROJECT=roger-intelligence  # Optional, defaults to 'roger-intelligence'
```

**View traces:** [smith.langchain.com](https://smith.langchain.com/)

### Running Tests

```bash
# Run all tests
python run_tests.py

# Run specific test suites
python run_tests.py --unit          # Unit tests only
python run_tests.py --adversarial   # Security/adversarial tests
python run_tests.py --eval          # LLM-as-Judge evaluation
python run_tests.py --e2e           # End-to-end tests

# With coverage report
python run_tests.py --coverage

# Enable LangSmith tracing in tests
python run_tests.py --with-langsmith
```

### Agent Evaluation Harness

`agent_evaluator.py` implements the **LLM-as-Judge** pattern:

| Metric | Description |
|--------|-------------|
| **Tool Selection Accuracy** | Did the agent use the correct tools? |
| **Response Quality** | Is the response relevant and coherent? |
| **BLEU Score** | N-gram text similarity (0-1, higher = better match) |
| **Hallucination Detection** | Did the agent fabricate information? |
| **Graceful Degradation** | Does it handle failures properly? |

```bash
# Run standalone evaluator
python tests/evaluation/agent_evaluator.py
```

### Adversarial Testing

Tests for security and robustness:

| Test Category | Description |
|--------------|-------------|
| **Prompt Injection** | Ignore instructions, jailbreak, context switching |
| **Out-of-Domain** | Non-Sri Lanka queries, illegal requests, impossible questions |
| **Malformed Input** | Empty, XSS, SQL injection, unicode flood |
| **Graceful Degradation** | API timeouts, empty responses, rate limiting |

### CI/CD Pipeline

GitHub Actions workflow (`.github/workflows/test.yml`):

```yaml
on: [push, pull_request]
jobs:
  unit-tests:         # Runs on every push
  adversarial-tests:  # Security tests on every push
  evaluation-tests:   # LLM evaluation on main branch only
  lint:               # Code quality checks
```

**Required Secrets:**
- `LANGSMITH_API_KEY` - For evaluation test logging
- `GROQ_API_KEY` - For LLM-based evaluation

---

## 🐛 Troubleshooting

### FastText won't install on Windows

```bash
# Use pre-built wheel instead
pip install fasttext-wheel
```

### BERT models downloading slowly

```bash
# Pre-download all models
python models/anomaly-detection/download_models.py
```

### Airflow not starting

```bash
# Ensure Docker is running
docker info

# Initialize Astro project
cd models/anomaly-detection
astro dev init
astro dev start
```

### NumPy 2.0 / ChromaDB compatibility error

```bash
# If you see "A module that was compiled using NumPy 1.x cannot be run in NumPy 2.x"
pip install "numpy<2.0"

# Or upgrade chromadb to latest
pip install --upgrade chromadb
```

### Keras model loading error ("Could not locate function 'mse'")

```bash
# If currency/weather models fail to load with Keras 3.x
# Retrain the model - it will save in .keras format automatically
cd models/currency-volatility-prediction
python main.py --mode train

# Or for weather
cd models/weather-prediction
python main.py --mode train
```

---

## 📄 License

MIT License - Built for Production

---

## 🙏 Acknowledgments

- **Groq** - High-speed
LLM inference
- **LangGraph** - Agent orchestration
- **HuggingFace** - SinhalaBERTo, Tamil-BERT, DistilBERT
- **Optuna** - Hyperparameter optimization
- **MLflow** - Experiment tracking
- Sri Lankan government for open data sources