SCENTINEL
AI-driven monitoring and control platform for H2S scavenging operations — from field sensors to real-time dashboards.
The Challenge
Manual Operations
H2S scavenging in oil and gas is largely manual. Operators drive to remote well sites to check chemical levels, read gauges, and adjust injection rates — costing $500-$1,000 per unnecessary truck roll.
Chemical Waste
Without real-time visibility, operators over-treat by 15-30% as a safety margin. Across hundreds of sites, this adds up to significant unnecessary chemical spend.
Scale vs. Cost
Commercial historian platforms (AVEVA PI, GE Proficy) cost $200K-$500K/year at scale. An open-source stack could deliver the same capability at a fraction of the cost — but requires custom engineering.
The Process & My Contribution
I designed and built the full platform end-to-end — from hardware selection to data pipeline to live dashboards.
Problem Scoping & Architecture Design
Analyzed the industrial monitoring problem space, mapped out tag counts (40-60 per tower), calculated data throughput at scale (25,000 writes/sec for 500 sites), and designed an edge-first architecture where field sites operate independently of cloud connectivity.
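The sizing arithmetic is simple enough to sanity-check in a few lines (tag and site counts taken from the figures above):

```python
# Back-of-envelope check of the throughput figures above.
TAGS_PER_TOWER = 50   # midpoint of the 40-60 tag range
SAMPLE_HZ = 1         # one reading per tag per second
SITES = 500

writes_per_sec = TAGS_PER_TOWER * SAMPLE_HZ * SITES
points_per_day = writes_per_sec * 86_400

print(writes_per_sec)   # 25000 writes/sec
print(points_per_day)   # 2160000000 (~2.2 billion points/day at 500 sites)
```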
Infrastructure & Server Sizing
Spec'd an on-premises server build (EPYC CPU, 256GB DDR5 ECC, NVMe + HDD tiered storage, workstation GPU for ML training). Built the complete TCO comparison showing $69K savings over 3 years vs. equivalent cloud infrastructure.
Demo Stack Implementation
Built a working prototype using Docker Compose: Mosquitto MQTT broker, Telegraf for data ingestion, InfluxDB for time-series storage, and Grafana for dashboards. Created a Python simulator that generates realistic tower telemetry with live anomaly injection.
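The simulator itself isn't reproduced here, but a minimal sketch of the idea — synthetic tower telemetry with an anomaly flag — might look like this (field names and value ranges are illustrative, not the real schema):

```python
import math
import random

def tower_telemetry(t: float, anomaly: bool = False) -> dict:
    """Generate one synthetic reading set for a scavenger tower.

    `t` is elapsed seconds; field names and ranges are illustrative.
    """
    base_h2s = 12 + 3 * math.sin(t / 300)  # slow drift around a 12 ppm baseline
    reading = {
        "h2s_outlet_ppm": max(0.0, random.gauss(base_h2s, 0.8)),
        "chemical_level_pct": max(0.0, 95 - 0.01 * t + random.gauss(0, 0.2)),
        "injection_rate_lph": random.gauss(4.0, 0.1),
    }
    if anomaly:  # inject a simulated H2S breakthrough excursion
        reading["h2s_outlet_ppm"] += random.uniform(40, 80)
    return reading
```

In the demo, each reading is serialized to JSON and published to the Mosquitto broker, where Telegraf's MQTT consumer input writes it into InfluxDB.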
Business Case & Cost Analysis
Developed a vendor-verified $150K budget with line-item costs, ROI analysis (9.1-month payback at 50 sites), and a phased scaling roadmap from 10 to 5,000+ sites. Every number backed by real vendor pricing.
Architecture & Technical Deep Dive
System Architecture
SCENTINEL uses an edge-first architecture where field devices collect and buffer data locally, transmitting aggregates over cellular and switching to full-resolution bursts only during anomalies.
Edge Layer
Ruggedized edge gateways collect sensor data at 1Hz, buffer locally in SQLite for offline resilience, and publish to MQTT. Each site runs autonomously — if connectivity drops, no data is lost.
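The store-and-forward pattern is the core of that resilience: readings land in SQLite first and are deleted only after the broker confirms them. A sketch under assumed schema and naming (the real gateway code differs):

```python
import json
import sqlite3
import time

class EdgeBuffer:
    """Store-and-forward buffer: rows persist until publish succeeds.

    Schema and class name are illustrative, not the production code.
    """
    def __init__(self, path: str = ":memory:"):
        self.db = sqlite3.connect(path)
        self.db.execute(
            "CREATE TABLE IF NOT EXISTS outbox "
            "(id INTEGER PRIMARY KEY, ts REAL, payload TEXT)"
        )

    def record(self, reading: dict) -> None:
        self.db.execute(
            "INSERT INTO outbox (ts, payload) VALUES (?, ?)",
            (time.time(), json.dumps(reading)),
        )
        self.db.commit()

    def drain(self, publish) -> int:
        """Publish buffered rows in order; on failure, keep the rest."""
        rows = self.db.execute(
            "SELECT id, payload FROM outbox ORDER BY id"
        ).fetchall()
        sent = 0
        for rowid, payload in rows:
            if not publish(payload):   # e.g. paho-mqtt publish + ack wait
                break                  # connectivity lost: retry next cycle
            self.db.execute("DELETE FROM outbox WHERE id = ?", (rowid,))
            sent += 1
        self.db.commit()
        return sent
```

If the cellular link drops mid-drain, unsent rows simply stay in the outbox for the next attempt — no data is lost.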
Transport Layer
MQTT provides lightweight pub/sub messaging optimized for constrained networks. Gateways send 5-minute aggregates under normal conditions and 1-second bursts during excursions, keeping cellular data to ~40MB/site/month.
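The bandwidth figure can be roughly reproduced; the bytes-per-tag and excursion-time values below are assumptions for illustration, not measurements from the platform:

```python
# Rough cellular budget; bytes-per-tag and excursion time are assumptions.
TAGS = 50
BYTES_PER_TAG = 60        # assumed: tag name + min/mean/max/last as JSON
AGG_INTERVAL_S = 300      # 5-minute aggregates
BURST_HOURS = 1           # assumed 1 Hz excursion traffic per month

secs_per_month = 30 * 86_400
agg_mb = (secs_per_month // AGG_INTERVAL_S) * TAGS * BYTES_PER_TAG / 1e6
burst_mb = BURST_HOURS * 3600 * TAGS * BYTES_PER_TAG / 1e6

print(round(agg_mb + burst_mb, 1))  # 36.7 -- protocol overhead pushes this toward ~40 MB
```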
Storage & Processing
InfluxDB handles time-series ingestion at 25,000+ writes/second. Tiered storage: NVMe for hot data (30-90 days), HDD array for cold compliance retention (1-7 years). All on a single on-premises server.
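The tier sizes follow directly from the ingest rate. Assuming roughly 3 bytes per point after compression — a commonly cited ballpark for InfluxDB's TSM engine, not a measured figure for this workload:

```python
POINTS_PER_DAY = 25_000 * 86_400   # 2.16 billion points/day at full scale
BYTES_PER_POINT = 3                # assumed post-compression average

hot_gb = POINTS_PER_DAY * 90 * BYTES_PER_POINT / 1e9          # 90-day hot window
cold_tb = POINTS_PER_DAY * 365 * 7 * BYTES_PER_POINT / 1e12   # 7-year retention

print(round(hot_gb))      # 583 GB on NVMe
print(round(cold_tb, 1))  # 16.6 TB on the HDD array
```

Both figures fit comfortably on a single server's NVMe + HDD tiers.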
Visualization & Control
Grafana dashboards provide real-time monitoring, historical trending, and alerting. Bidirectional communication allows operators to send commands back to field equipment — toggle pumps, adjust setpoints — from the dashboard.
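Control commands ride the same broker as telemetry. A sketch of what the command path could look like — the topic layout and payload fields here are assumptions, not the platform's actual contract:

```python
import json

def make_command(site_id: str, device: str, action: str, value=None):
    """Build an MQTT command message (illustrative schema)."""
    topic = f"scentinel/{site_id}/cmd/{device}"
    payload = json.dumps({"action": action, "value": value})
    return topic, payload

topic, payload = make_command("site-042", "pump1", "set_rate_lph", 3.5)
# topic   -> "scentinel/site-042/cmd/pump1"
# payload -> '{"action": "set_rate_lph", "value": 3.5}'
```

The edge gateway subscribes to its own `cmd/#` subtree and applies commands locally, so a dashboard action becomes a pump toggle or setpoint change in the field.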
Tech Stack
Edge: Python, paho-mqtt, SQLite
Transport & Ingestion: MQTT, Mosquitto, Telegraf
Storage & Compute: InfluxDB, PostgreSQL, Docker
Visualization & ML: Grafana, PyTorch, scikit-learn
Outcomes & Impact
Working prototype: Full Docker stack with live data flowing from simulator through MQTT to InfluxDB to Grafana dashboards.
Defensible cost model: Every line item backed by real vendor pricing — GPU options, DDR5 ECC RAM, industrial sensor costs, cellular bandwidth math.
Scalable architecture: Designed to grow from 10 sites (single server) to 5,000+ (clustered) with a clear phase-by-phase roadmap.
Open-source advantage: Entire stack built on InfluxDB OSS, Grafana OSS, Mosquitto — eliminating per-tag licensing that makes commercial platforms cost $200K-$500K/year.
Key Learnings
Building an industrial IoT platform from scratch — even as a prototype — taught me things no tutorial covers:
- Tag scaling math is real: At 50,000 tags and 1Hz, you're generating 4.3 billion data points per day. An ordinary workstation grinds to a halt trying to trend that data — understanding why (RAM paging, IOPS limits, index sizes) is the difference between a system that works and one that crashes.
- Edge-first is non-negotiable: In oil and gas, cellular drops. Power fluctuates. The field site must keep operating and buffering data locally regardless of connectivity. This constraint drives every architectural decision.
- Business case before architecture: The most technically elegant system means nothing if you can't justify the spend. Leading with cost savings and ROI — not MQTT and InfluxDB — is what gets budget approved.
- Hardware sizing requires domain knowledge: You can't Google "how much RAM for an IoT server." You need to understand your specific tag count, query patterns, retention requirements, and concurrent user load to size correctly.
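The scaling claim in the first bullet checks out with simple arithmetic (the 16-byte uncompressed point size is an assumption: one 8-byte float plus an 8-byte timestamp):

```python
TAGS = 50_000
points_per_day = TAGS * 1 * 86_400    # 1 Hz sampling
print(points_per_day)                 # 4320000000 (~4.3 billion)

# Naive uncompressed storage: 8-byte value + 8-byte timestamp per point
raw_gb_per_day = points_per_day * 16 / 1e9
print(round(raw_gb_per_day, 1))       # 69.1 GB/day before compression
```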
Future Vision
SCENTINEL was built as a prototype, but the architecture is designed for production. The roadmap includes:
- ML anomaly detection: LSTM and Transformer models trained on historical tag data to detect H2S breakthrough, equipment degradation, and chemical depletion before they become incidents.
- Predictive chemical optimization: AI-driven injection timing and dosing to reduce the 15-30% over-treatment that's standard in the industry today.
- Fleet management: OTA firmware updates, remote configuration, and health monitoring across hundreds of edge devices.
- Phase II scaling: Adding Kafka/Redpanda for stream processing and database replication at 200+ sites, with geographic distribution at 500+.