Adnan Ahmad
June 15, 2026 · 3 min read

Scaling Real-Time Infrastructure for Smart City Systems

In modern urban environments, managing traffic is no longer just about optimizing signal timing loops; it’s about digesting continuous streams of spatial, temporal, and sensor data in real time. As a Principal Software Engineer, I see scaling these systems as a classic distributed systems problem: high throughput, low latency, and zero tolerance for dropped data.

This article walks through the architectural decisions behind designing server-side infrastructure for real-time traffic data processing, storage, and cross-system orchestration.

The Architecture: Overview

At a high level, the pipeline consists of three core layers:

  1. Ingestion & Ingress: Receiving spatial data and sensor outputs (including LiDAR point clouds and traffic camera events).
  2. Stream Processing (ETL): Extracting, transforming, and filtering data fields dynamically.
  3. AI Inference Orchestration: Routing processed packets to specialized deep learning networks for object recognition, speed analysis, and anomaly detection.

[LiDAR / Camera Sensors]
           │
           ▼
[Ingestion Gateways (gRPC)]
           │
           ▼
 [Apache Kafka Streams]  ◄───►  [ETL Transformation Pipelines]
           │
           ▼
 [Model Inference Hub]   ◄───►  [GPU Worker Nodes]
           │
           ▼
  [TimescaleDB / Redis]

Designing the Stream Ingestion (ETL)

Traffic sensors send rapid updates, often hundreds of times a second. We adopted gRPC on the ingress gateways for low-latency binary serialization over HTTP/2, avoiding HTTP/1.1 overhead.

The raw payloads are written to an Apache Kafka cluster, partition-keyed by intersection ID. Because Kafka preserves order within a partition, all messages from a single intersection are consumed in the order they were written, which is crucial for calculating velocity vectors and object trajectories.
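To make the keying idea concrete, here is a minimal sketch of key-based partition routing. It is an illustration, not the production setup: the partition count, the intersection IDs, and the helper names are hypothetical, and CRC32 stands in for the hash used by Kafka's default partitioner (murmur2 in the Java client). The point is only that a stable hash of the key sends every message for one intersection to the same partition, where arrival order is kept.

```python
import zlib

NUM_PARTITIONS = 12  # hypothetical partition count for the topic


def partition_for(intersection_id: str, num_partitions: int = NUM_PARTITIONS) -> int:
    """Map a key to a partition with a stable hash (CRC32 as a stand-in)."""
    return zlib.crc32(intersection_id.encode("utf-8")) % num_partitions


def route(messages: list[tuple[str, int]]) -> dict[int, list[tuple[str, int]]]:
    """Append (intersection_id, timestamp) messages to their partition in arrival order."""
    partitions: dict[int, list[tuple[str, int]]] = {}
    for intersection_id, ts in messages:
        pid = partition_for(intersection_id)
        partitions.setdefault(pid, []).append((intersection_id, ts))
    return partitions


# Interleaved updates from two hypothetical intersections
stream = [("int-001", 1), ("int-002", 1), ("int-001", 2), ("int-002", 2), ("int-001", 3)]
routed = route(stream)

for pid, msgs in sorted(routed.items()):
    print(pid, msgs)
```

Whatever partitions the two keys land on, each intersection's timestamps stay in ascending order within its partition, which is the property the trajectory calculations depend on.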

Here is a simplified Python model of a sensor frame, which checks coordinate bounds before the frame is pushed down the pipeline:

import json
from dataclasses import dataclass


@dataclass
class TrafficSensorFrame:
    sensor_id: str
    timestamp: int
    coordinates: list[float]  # [X, Y, Z]
    classifications: list[str]

    def to_clean_payload(self) -> str:
        # Validate the frame as a whole: dropping a single out-of-range
        # axis would silently turn [X, Y, Z] into a shorter, misaligned vector.
        in_bounds = len(self.coordinates) == 3 and all(
            -180.0 <= c <= 180.0 for c in self.coordinates
        )
        return json.dumps({
            "sensor": self.sensor_id,
            "ts": self.timestamp,
            # An empty list marks a frame whose coordinates failed validation
            "coords": self.coordinates if in_bounds else [],
            # Normalize class labels to upper case
            "classes": [c.upper() for c in self.classifications],
        })

LiDAR & Deep Learning Integration

A major challenge in smart city infrastructure is integrating 3D LiDAR point clouds with standard traffic management data.

LiDAR sensors yield dense, unstructured point clouds. Feeding raw point clouds to a deep learning model is prohibitively expensive, both in compute and in transfer over bandwidth-constrained network links. To resolve this:

  • We implemented edge filtering to trim each point cloud to the boundaries of the active roadway.
  • The ETL pipeline downsamples the point density, converting the 3D points into simplified object clusters (representing cars, pedestrians, or cyclists).
  • Details of these dynamic clusters are passed to deep learning models for classification and speed calculation.
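The first two steps above can be sketched in a few lines of plain Python. This is a simplified illustration under stated assumptions, not the pipeline's actual code: the roadway is assumed to be an axis-aligned rectangle, downsampling is done with a voxel grid (one common technique, chosen here for brevity), and the function names, bounds, and voxel size are all hypothetical. Grouping the reduced points into object clusters is left out.

```python
from collections import defaultdict

Point = tuple[float, float, float]


def crop_to_roadway(
    points: list[Point],
    x_range: tuple[float, float],
    y_range: tuple[float, float],
) -> list[Point]:
    """Keep only points whose X/Y fall inside the (assumed rectangular) roadway."""
    (x_min, x_max), (y_min, y_max) = x_range, y_range
    return [p for p in points if x_min <= p[0] <= x_max and y_min <= p[1] <= y_max]


def voxel_downsample(points: list[Point], voxel_size: float) -> list[Point]:
    """Replace all points that share a cubic voxel with their centroid."""
    buckets: dict[tuple[int, int, int], list[Point]] = defaultdict(list)
    for x, y, z in points:
        key = (int(x // voxel_size), int(y // voxel_size), int(z // voxel_size))
        buckets[key].append((x, y, z))
    centroids: list[Point] = []
    for pts in buckets.values():
        n = len(pts)
        centroids.append((
            sum(p[0] for p in pts) / n,
            sum(p[1] for p in pts) / n,
            sum(p[2] for p in pts) / n,
        ))
    return centroids


# Hypothetical frame: two nearby returns, one separate return, one far off the road
raw = [(0.1, 0.1, 0.0), (0.3, 0.2, 0.1), (5.2, 1.0, 0.4), (50.0, 1.0, 0.2)]
cropped = crop_to_roadway(raw, x_range=(0.0, 10.0), y_range=(0.0, 3.0))
reduced = voxel_downsample(cropped, voxel_size=1.0)
print(len(raw), len(cropped), len(reduced))  # 4 raw points, 3 after cropping, 2 centroids
```

Cropping first keeps the voxel grid small, so the cost of downsampling tracks the size of the roadway rather than the sensor's full field of view.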

Storing High-Frequency Time-Series Data

For storage, we used a partitioned time-series database (TimescaleDB) backed by Redis for sub-millisecond caching of active intersection states. The time-series tables are partitioned by hour, which keeps time-range queries fast and lets retention policies drop expired partitions automatically.
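As a rough illustration of why hourly partitioning helps, the toy in-memory store below groups rows by the hour they fall in. It is only a sketch of the idea and does not use TimescaleDB or Redis: the class name, row shape (timestamp, intersection ID, speed), and retention window are hypothetical. A time-range query touches only the hours it overlaps, and retention removes whole hours rather than individual rows.

```python
from collections import defaultdict

HOUR_SECONDS = 3600
Row = tuple[int, str, float]  # (epoch seconds, intersection ID, speed); hypothetical shape


def hour_bucket(ts_seconds: int) -> int:
    """Return the start (epoch seconds) of the hour containing ts_seconds."""
    return ts_seconds - (ts_seconds % HOUR_SECONDS)


class HourlyStore:
    """Toy store: rows grouped into hourly chunks, with whole-chunk retention."""

    def __init__(self, retention_hours: int) -> None:
        self.retention_hours = retention_hours
        self.chunks: dict[int, list[Row]] = defaultdict(list)

    def insert(self, ts_seconds: int, intersection_id: str, speed: float) -> None:
        self.chunks[hour_bucket(ts_seconds)].append((ts_seconds, intersection_id, speed))

    def query(self, start: int, end: int) -> list[Row]:
        """Rows with start <= ts < end, scanning only the chunks that overlap."""
        rows: list[Row] = []
        bucket = hour_bucket(start)
        while bucket < end:
            rows.extend(r for r in self.chunks.get(bucket, []) if start <= r[0] < end)
            bucket += HOUR_SECONDS
        return rows

    def apply_retention(self, now_seconds: int) -> int:
        """Drop whole chunks older than the retention window; return how many were dropped."""
        cutoff = hour_bucket(now_seconds) - self.retention_hours * HOUR_SECONDS
        expired = [b for b in self.chunks if b < cutoff]
        for b in expired:
            del self.chunks[b]
        return len(expired)


store = HourlyStore(retention_hours=2)
for ts in (0, 10, 3600, 7205, 14400):
    store.insert(ts, "int-001", 12.5)

first_hour = store.query(0, 3600)                      # scans one chunk only
dropped = store.apply_retention(now_seconds=14500)     # removes the two oldest hours
```

Dropping an entire chunk is much cheaper than deleting rows one by one, which is the main reason time-based partitioning and retention policies pair well.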

Key Takeaways

  1. Backpressure is Mandatory: If deep learning inference nodes slow down, ingestion must buffer. Kafka lets us decouple processing speed from ingestion speed.
  2. Perform Transformations Early: Clean and filter sensor data at the edge or immediately after ingress to prevent downstream network bottlenecks.
  3. Partition Strategically: Databases and streaming topics should partition on a logical key (such as intersection ID for topics, or time for time-series tables) so that load can be spread horizontally.
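The first takeaway can be illustrated with an in-process analogy. The sketch below is not how Kafka works internally; it uses a bounded queue from Python's standard library as a stand-in buffer between a fast producer (ingestion) and a slow consumer (inference). The buffer size, frame count, and delay are hypothetical. When the buffer fills, the producer blocks instead of dropping frames, and every frame still arrives in order.

```python
import queue
import threading
import time

BUFFER_SIZE = 4   # hypothetical buffer capacity
SENTINEL = None   # marks the end of the stream


def producer(buf: queue.Queue, n_frames: int) -> None:
    """Push frames as fast as possible; put() blocks while the buffer is full."""
    for frame_id in range(n_frames):
        buf.put(frame_id)  # blocking here is the backpressure
    buf.put(SENTINEL)


def slow_consumer(buf: queue.Queue, results: list[int], delay: float) -> None:
    """Pull frames one at a time, simulating slow inference with a sleep."""
    while True:
        item = buf.get()
        if item is SENTINEL:
            break
        time.sleep(delay)
        results.append(item)


def run(n_frames: int = 20, delay: float = 0.001) -> list[int]:
    buf: queue.Queue = queue.Queue(maxsize=BUFFER_SIZE)
    results: list[int] = []
    t_prod = threading.Thread(target=producer, args=(buf, n_frames))
    t_cons = threading.Thread(target=slow_consumer, args=(buf, results, delay))
    t_prod.start()
    t_cons.start()
    t_prod.join()
    t_cons.join()
    return results


processed = run()
print(len(processed))  # every frame is processed; none are dropped
```

A durable log such as Kafka takes the same idea further: the buffer lives on disk and is far larger, so short inference slowdowns are absorbed without stalling the sensors at all.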