StreamForge is an in-progress industrial data gateway for moving OT data into modern data systems without treating the edge as an afterthought.
It is designed around a simple idea: every gateway should be able to collect industrial signals, write them to a durable local stream backbone, keep running through network and sink outages, and expose enough control-plane visibility for operators and data teams to understand what is happening.
The project keeps Kafka-compatible client semantics while moving the embedded edge broker direction to Redpanda. The local development/runtime path is now Redpanda-backed, but production packaging and image-pull templates are still pending.
I studied electrical and electronics engineering, and my interests sit where industrial automation, robotics, digital twins, data engineering, and AI meet. In industrial environments, the OT/IT gap is hard to miss: PLCs, SCADA systems, sensors, and machines produce valuable signals, but getting those signals into reliable, replayable, AI-ready pipelines is still harder than it should be.
StreamForge is my attempt to bridge that gap architecturally. It treats industrial data as something that must survive bad networks, sink failures, offline periods, changing downstream systems, and operator handoffs while still remaining understandable to people working close to the equipment.
StreamForge is not a finished production product yet. It is a serious engineering project with a working core and an honest production-readiness queue.
Implemented today:
- reusable adapter, sink, and deployment objects
- protocol-aware forms for Modbus TCP, Modbus RTU, MQTT, and OPC UA
- TimescaleDB, Kafka-compatible, HTTP, and alert-routing sink paths
- gateway runtime with local stream processing, validation, aggregation, logs, health, and overflow controls
- control-plane APIs for configuration, auth, RBAC, audit, alarms, DLQ, events, aggregates, fleet, logs, and operator checks
- React operator UI for adapters, sinks, deployments, validation/test/preflight, events, aggregates, fleet, logs, alarms, DLQ, health, and users
- regression coverage across control plane, runtime, adapters, sinks, and UI workflows
Still in progress:
- production packaging for the Redpanda-backed gateway/runtime stack
- production-grade gateway onboarding and approval flow
- gateway-executed physical-device connection tests
- topology redesign into a clearer architecture/dataflow view
- fresh-stack, failure-path, and full pre-AI verification gates
- documentation cleanup so implemented behavior and planned behavior stay clearly separated
StreamForge is split into a control plane and a data plane.
- FastAPI service for configuration, auth, RBAC, audit, gateway state, and operator workflows
- PostgreSQL-backed configuration and audit state
- React/TypeScript UI for operators and engineers
- Catalog-driven configuration contracts for adapters and sinks
- Gateway runtime that runs at the edge
- Containerized protocol adapters for industrial sources
- Local Kafka-compatible stream backbone backed by Redpanda in the local dev/runtime stack
- Validator, event validator, aggregator, overflow controller, and managed sink processes
- Sink services that write to databases, HTTP endpoints, alert destinations, or customer-owned Kafka-compatible systems
The browser UI talks to the central control plane. Remote gateways call out to the control plane for configuration and heartbeat updates. The frontend does not need direct network access to gateways in different physical locations.
Industrial data should land first in a durable local stream on the gateway. Everything downstream should be replaceable and replayable.
Today this is implemented with Kafka-compatible clients and a Redpanda-backed local dev/runtime stack. Production packaging is still pending, so the current claim is local/runtime readiness rather than a finished production install model.
The control plane defines what should run. The gateway runtime makes it happen locally. If the control plane is unavailable, gateways should keep running from their last known good configuration.
StreamForge separates data into:
- telemetry: continuous measurements
- events: discrete state changes
- alarms: lifecycle-aware abnormal conditions
- logs: operational/runtime messages
That model keeps downstream systems from receiving one undifferentiated stream of industrial noise.
Configuration should be testable before deployment. Runtime state should be visible without shell access. Failures should degrade clearly instead of failing silently.
streamforge/
├── control-plane/ # FastAPI control plane
├── gateway_runtime/ # Edge runtime and local orchestration
├── adapters/ # Protocol adapter implementations
├── sinks/ # Sink service implementations
├── ui/ # React/TypeScript operator UI
├── schemas/ # Avro and JSON schemas
├── deploy/ # Local dev stack and future package templates
├── docs/ # Architecture, security, readiness, and design docs
└── copilot/ # Reserved for future AI/copilot work
The current development stack uses Redpanda as the local Kafka-compatible broker. The config keys remain Kafka-named because they describe the protocol contract, not a dependency on Apache Kafka or Confluent images.
docker compose -f deploy/dev/docker-compose.yml up -d --buildSeed the demo environment:
docker compose -f deploy/dev/docker-compose.yml --profile seed run --rm dev_bootstrapThe seed path is only a local-demo shortcut. The production-like path is now
gateway enrollment: create an enrollment token in the Gateways UI, start the
gateway with CONTROL_PLANE_ENROLLMENT_TOKEN, approve the pending gateway, then
compose and activate a deployment.
See docs/DEPLOYMENT.md for the no-seed first-run command sequence.
The UI is exposed at:
http://localhost:5000
Useful local checks:
cd control-plane && ./.venv/bin/python -m pytest
python3 -m unittest discover -s gateway_runtime/tests
python3 -m unittest discover -s adapters/tests
python3 -m unittest discover -s sinks/tests
cd ui && npm run build| Area | StreamForge | SCADA/Ignition | Cloud IoT Platforms | Custom Scripts |
|---|---|---|---|---|
| OT protocol focus | Native adapter model | Strong, often plugin-based | Limited or gateway-dependent | Manual |
| Edge autonomy | Core design goal | Varies by deployment | Often cloud-centered | Fragile |
| Local replay buffer | Built around a Kafka-compatible edge stream | Usually database/historian-centered | Cloud queue dependent | Usually absent |
| Data engineering fit | Stream-native, replayable, sink-oriented | Possible but indirect | Strong in-cloud, weaker offline | Ad hoc |
| Operator visibility | Control-plane UI and runtime health surfaces | Strong HMI/SCADA visibility | Cloud dashboards | Minimal |
| Vendor neutrality | Designed for on-prem and hybrid use | Often proprietary | Cloud lock-in risk | Depends on author |
- architecture baseline
- control-plane API baseline
- gateway runtime baseline
- Redpanda-backed local dev/runtime broker path
- reusable adapters, sinks, and deployments
- validation, test connection, and deployment preflight flows
- events, aggregates, fleet, logs, alarms, and DLQ operator views
- RBAC, audit trail, safer browser sessions, and secret-field handling
- finish production packaging and image-pull templates for the Redpanda-backed gateway/runtime stack
- implement production-grade gateway enrollment and approval
- add gateway-side physical-device verification
- improve topology into a true dataflow/architecture view
- finish responsive/readability hardening
- complete fresh-stack and failure-path verification before AI work continues
- additional industrial adapters and sink destinations
- deeper production observability and long-term log retention
- optional enterprise auth UX
- AI/copilot features after the core platform is trustworthy enough to support them
- Architecture
- Data Flow
- Deployment
- Security
- Production Readiness Reconciliation
- UI Product Action List
- Adapter Development
- Control Plane API
Licensed under the Apache License 2.0. See LICENSE.
This project is currently owner-led while the core architecture and production readiness work settle. Technical feedback, issue reports, and review comments are welcome, especially around industrial protocols, edge reliability, and operator workflow quality.
There is no commercial support channel yet. For now, use GitHub issues or my public contact channels for questions and feedback.