🧱 Architectural Evolution – CNC Data Analytics System
⚠️ Preliminary Note
The following architectural proposals (Architectures A, B, and C) were initially drafted with the support of ChatGPT (OpenAI) as a reference framework for the project "Understanding from Numerical Control Driven Machining Control."
They represent conceptual drafts and working hypotheses, not final decisions.
All elements — including component choices, data flows, and execution layers — must be reviewed, discussed, and validated by the entire project team before implementation.
Any modification, simplification, or extension should be agreed upon collectively to ensure technical consistency and feasibility.
⚙️ Architecture A — “Local Batch Pipeline”
🔹 Concept
A simple, local-first architecture for initial experimentation.
All processing is performed on the user's PC, while the data are stored in a remote PostgreSQL database.
It enables the team to explore the dataset, understand machine behavior, and prototype the analytics pipeline.
This is the starting point for the project — low complexity, minimal infrastructure, and ideal for the early development phase.
🧩 Main Components
| Layer | Component | Execution / Hosting | Description |
|---|---|---|---|
| Production plant | CNC Machine | Physical system | Produces raw operational variables (axis position, spindle load, etc.) |
| Cloud / Server | PostgreSQL Database (raw data) | Hosted on UPM / remote server | Stores high-frequency machine signals |
| Local PC | Python ETL (pandas, SQL) | Executed locally in Jupyter / Spyder | Extracts and cleans data, performs simple aggregations |
| Local PC | Python Analytics | Executed locally in Jupyter / Spyder | Computes KPIs, operation time, energy per program, and alerts |
| Local PC | Streamlit Dashboard (UI) | Executed locally with Streamlit (Python web server) | Visualizes results and provides basic user interaction |
| User | Operator / Analyst | Local access | Explores data, applies filters, interprets results |
🔄 Workflow
- The CNC machine sends raw variables to the PostgreSQL database.
- Local ETL scripts retrieve and clean the data.
- Analytics modules compute machine indicators and alerts.
- The dashboard presents results interactively (steps 2–4 are sketched after this list).
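Steps 2–4 can be prototyped as a single small Streamlit script that queries the database, cleans the result with pandas, and renders a KPI. The following is a minimal sketch, not a final design: the connection string, the table `cnc_signals`, and its columns (`ts`, `program_id`, `spindle_load`) are hypothetical names to be replaced with the team's actual schema.

```python
# Minimal local batch sketch: query PostgreSQL, compute a KPI, display it in Streamlit.
# Connection string and table/column names are placeholders, not real project values.
import pandas as pd
import streamlit as st
from sqlalchemy import create_engine

engine = create_engine("postgresql://user:password@db.example.org:5432/cnc")

@st.cache_data(ttl=600)  # cache the query so widget changes don't re-hit the database
def load_signals(hours: int) -> pd.DataFrame:
    query = f"""
        SELECT ts, program_id, spindle_load
        FROM cnc_signals
        WHERE ts > now() - interval '{hours} hours'
        ORDER BY ts
    """
    return pd.read_sql(query, engine, parse_dates=["ts"])

hours = st.slider("Look-back window (hours)", 1, 48, 8)
df = load_signals(hours).dropna(subset=["spindle_load"])  # basic cleaning

# Simple KPI: mean spindle load per NC program.
kpi = df.groupby("program_id")["spindle_load"].mean().rename("avg_spindle_load")

st.metric("Rows loaded", len(df))
st.bar_chart(kpi)
st.line_chart(df.set_index("ts")["spindle_load"])
```

Saved as `app.py`, this runs with `streamlit run app.py`, which starts the local Python web server mentioned in the components table.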
✅ Advantages
- Very simple to deploy and maintain
- Ideal for small datasets or offline analysis
- Full transparency and control for debugging and learning
⚠️ Limitations
- Query time increases with dataset size
- Manual updates required (no automation)
- No real-time feedback; purely batch operation
🕒 Adoption timeline
Used in Phase 1 (project start).
Goal: validate the data model, ensure database connectivity, and develop the first analysis scripts.
📷 Architecture Diagram

⚡ Architecture B — “Optimized Time-Series Pipeline”
🔹 Concept
An upgraded version focusing on performance and scalability.
PostgreSQL is extended with TimescaleDB, enabling hypertables and continuous aggregates for fast time-based queries.
This allows the system to handle larger volumes of CNC data with near-real-time responsiveness.
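As a working hypothesis, the server-side setup this implies could look like the sketch below, executed once from Python with psycopg2. The table `cnc_signals`, its time column `ts`, and the 1-minute bucket are assumptions still to be agreed by the team; `create_hypertable`, continuous aggregates, and refresh policies are standard TimescaleDB features.

```python
# One-time TimescaleDB setup sketch (assumed table cnc_signals(ts, program_id, spindle_load)).
# Requires the timescaledb extension to be available on the PostgreSQL server.
import psycopg2

STATEMENTS = [
    "CREATE EXTENSION IF NOT EXISTS timescaledb;",
    # Turn the raw table into a hypertable partitioned by time.
    "SELECT create_hypertable('cnc_signals', 'ts', if_not_exists => TRUE);",
    # Continuous aggregate: 1-minute summaries maintained by the server.
    """
    CREATE MATERIALIZED VIEW IF NOT EXISTS cnc_signals_1min
    WITH (timescaledb.continuous) AS
    SELECT time_bucket('1 minute', ts) AS bucket,
           program_id,
           avg(spindle_load) AS avg_spindle_load,
           max(spindle_load) AS max_spindle_load
    FROM cnc_signals
    GROUP BY bucket, program_id;
    """,
    # Refresh policy: keep the aggregate up to date every minute.
    """
    SELECT add_continuous_aggregate_policy('cnc_signals_1min',
        start_offset      => INTERVAL '1 hour',
        end_offset        => INTERVAL '1 minute',
        schedule_interval => INTERVAL '1 minute');
    """,
]

with psycopg2.connect("postgresql://user:password@db.example.org:5432/cnc") as conn:
    conn.autocommit = True  # continuous aggregates cannot be created inside a transaction
    with conn.cursor() as cur:
        for sql in STATEMENTS:
            cur.execute(sql)
```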
🧩 Main Components
| Layer | Component | Execution / Hosting | Description |
|---|---|---|---|
| Production plant | CNC Machine | Physical source | Streams operational data continuously |
| Cloud / Server | PostgreSQL (raw data) | Hosted on remote server | Stores unprocessed variables |
| Cloud / Server | TimescaleDB | Hosted on cloud server – automatic aggregation | Extends PostgreSQL to manage time-series efficiently |
| Local PC | Python ETL | Executed locally in Jupyter / Spyder | Loads data, applies additional transformations and cleaning |
| Local PC | Python Analytics | Executed locally in Jupyter / Spyder | Performs KPI calculation, energy/time estimation, and alert generation |
| Local PC | Streamlit Dashboard (UI) | Executed locally with Streamlit | Visualizes aggregated data with time filters and metrics |
| User | Operator / Analyst | Local or LAN access | Uses the dashboard for insights and comparisons |
🔄 Workflow
- Raw CNC data are inserted into PostgreSQL.
- The raw tables are converted into TimescaleDB hypertables, and continuous aggregates keep time-bucketed summaries up to date automatically.
- Local ETL scripts query the pre-aggregated views instead of the raw tables (see the query sketch after this list).
- Analytics compute KPIs and alerts efficiently.
- The dashboard displays up-to-date results.
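With the continuous aggregate in place, step 3 reduces to querying minute-level rows instead of raw samples. A minimal sketch, reusing the hypothetical `cnc_signals_1min` view from the setup sketch above:

```python
# Query pre-aggregated minutes instead of raw high-frequency samples.
import pandas as pd
from sqlalchemy import create_engine

engine = create_engine("postgresql://user:password@db.example.org:5432/cnc")

df = pd.read_sql(
    """
    SELECT bucket, program_id, avg_spindle_load, max_spindle_load
    FROM cnc_signals_1min
    WHERE bucket > now() - interval '7 days'
    ORDER BY bucket
    """,
    engine,
    parse_dates=["bucket"],
)
# A 7-day window is now thousands of rows, not millions of raw samples.
print(df.describe())
```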
✅ Advantages
- Fast queries thanks to TimescaleDB hypertables
- Continuous aggregation handled automatically on the server
- Scalable to millions of records
- Compatible with the same local Python workflow
⚠️ Limitations
- Requires server configuration for TimescaleDB
- Still partially manual (batch updates)
- No true real-time streaming yet
🕒 Adoption timeline
Used in Phase 2 (mid-project).
Goal: optimize performance and manage higher data volumes once the prototype is validated.
📷 Architecture Diagram

🌐 Architecture C — “Streaming / Real-Time Pipeline”
🔹 Concept
The most advanced version, enabling real-time monitoring and analytics.
This architecture integrates Apache Kafka for data streaming and Spark Structured Streaming for on-the-fly processing.
It combines real-time pipelines with long-term storage and visualization.
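On the ingestion side, some gateway has to publish machine signals to Kafka. The sketch below uses the `kafka-python` client; the broker address, the topic name `cnc-signals`, and the payload shape are illustrative assumptions, and `read_machine_sample()` stands in for the real machine interface (still to be defined, e.g., an OPC UA read).

```python
# Sketch of an edge gateway publishing CNC samples to Kafka (kafka-python client).
# Broker, topic, and payload shape are assumptions for illustration only.
import json
import time

from kafka import KafkaProducer

producer = KafkaProducer(
    bootstrap_servers="kafka.example.org:9092",
    value_serializer=lambda v: json.dumps(v).encode("utf-8"),
)

def read_machine_sample() -> dict:
    """Placeholder for the real CNC read-out; returns one signal sample."""
    return {"ts": time.time(), "program_id": "P42", "spindle_load": 37.5}

while True:
    producer.send("cnc-signals", value=read_machine_sample())  # async, batched
    time.sleep(0.1)  # ~10 Hz sampling; adjust to the machine's real data rate
```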
🧩 Main Components
| Layer | Component | Execution / Hosting | Description |
|---|---|---|---|
| Production plant | CNC Machine | Physical source | Continuously generates real-time signals |
| Cloud / Processing layer | Apache Kafka (stream) | Hosted on cloud server – streaming message broker | Receives live data, buffers, and distributes messages |
| Cloud / Processing layer | Spark Structured Streaming | Executed on cloud server – real-time processing engine | Processes Kafka streams, detects patterns and alerts, forwards results |
| Cloud / Processing layer | PostgreSQL (operational storage) | Hosted on remote server | Stores latest operational data for fast queries |
| Cloud / Processing layer | TimescaleDB (historical storage) | Hosted on remote server | Maintains long-term time-series logs and aggregates |
| Local PC | Python ETL | Executed locally in Jupyter / Spyder | Periodically integrates cloud data for additional analysis |
| Local PC | Python Analytics | Executed locally in Jupyter / Spyder | Further exploration and validation of real-time results |
| Local PC | Streamlit Dashboard (UI) | Executed locally with Streamlit | Displays live machine state, alerts, and KPIs in near real-time |
| User | Operator / Analyst | Web access | Monitors operations and system health interactively |
🔄 Workflow
- CNC sensors send live data streams to Apache Kafka.
- Kafka buffers and forwards events to Spark Structured Streaming.
- Spark aggregates and processes data in micro-batches, detecting operation cycles and anomalies.
- Results are written to PostgreSQL (for fast querying) and TimescaleDB (for historical logs).
- The Streamlit dashboard continuously updates with live KPIs and alerts (the Spark side of steps 2–4 is sketched after this list).
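The Spark side of steps 2–4 could look like the following PySpark sketch, which reuses the topic and payload assumptions from the producer sketch; the window size, watermark, and target table `cnc_kpis_live` are placeholders for the team to decide.

```python
# Spark Structured Streaming sketch: Kafka in, windowed KPIs out to PostgreSQL.
# Requires the spark-sql-kafka package and the PostgreSQL JDBC driver on the classpath.
from pyspark.sql import SparkSession, functions as F
from pyspark.sql.types import DoubleType, StringType, StructField, StructType

spark = SparkSession.builder.appName("cnc-streaming").getOrCreate()

schema = StructType([
    StructField("ts", DoubleType()),
    StructField("program_id", StringType()),
    StructField("spindle_load", DoubleType()),
])

raw = (spark.readStream.format("kafka")
       .option("kafka.bootstrap.servers", "kafka.example.org:9092")
       .option("subscribe", "cnc-signals")
       .load())

events = (raw.select(F.from_json(F.col("value").cast("string"), schema).alias("e"))
          .select("e.*")
          .withColumn("event_time", F.col("ts").cast("timestamp")))

# 1-minute tumbling windows per program; late events accepted up to 2 minutes.
kpis = (events.withWatermark("event_time", "2 minutes")
        .groupBy(F.window("event_time", "1 minute"), "program_id")
        .agg(F.avg("spindle_load").alias("avg_spindle_load"),
             F.max("spindle_load").alias("max_spindle_load"))
        # Flatten the window struct: JDBC cannot store struct columns.
        .select(F.col("window.start").alias("window_start"),
                F.col("window.end").alias("window_end"),
                "program_id", "avg_spindle_load", "max_spindle_load"))

def write_batch(batch_df, batch_id):
    # Append each micro-batch to the operational store via JDBC.
    (batch_df.write.format("jdbc")
     .option("url", "jdbc:postgresql://db.example.org:5432/cnc")
     .option("dbtable", "cnc_kpis_live")
     .option("user", "user")
     .option("password", "password")
     .mode("append")
     .save())

query = kpis.writeStream.outputMode("update").foreachBatch(write_batch).start()
query.awaitTermination()
```

Note that in `update` mode a window can be emitted more than once as late events arrive, so the append-only JDBC write above would need deduplication (or an upsert) in a real deployment; that is deliberately left out of the sketch.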
✅ Advantages
- Real-time data ingestion and processing
- Immediate alerting and visualization
- Hybrid storage (short-term + long-term)
- Scalable to industrial workloads
⚠️ Limitations
- Higher infrastructure complexity
- Requires Kafka and Spark deployment
- Needs reliable network and cloud configuration
🕒 Adoption timeline
Used in Phase 3 (final stage).
Goal: demonstrate real-time analytics capability, provide continuous monitoring, and connect the full data lifecycle (machine → cloud → user).
📷 Architecture Diagram

🧭 Overall Evolution Summary
| Stage | Main Goal | Key Technologies | Deployment Level |
|---|---|---|---|
| A – Local Batch | Prototype, data understanding | PostgreSQL, Python (pandas), Streamlit | Local execution |
| B – Time-Series Optimized | Performance & scalability | TimescaleDB, Python (ETL/Analytics) | Local + Cloud |
| C – Streaming / Real-Time | Real-time insights & automation | Kafka, Spark, PostgreSQL, TimescaleDB | Cloud + Local UI |
🔄 Progressive Implementation Strategy
- Phase 1: Start with Architecture A → focus on ETL pipeline, KPIs, and dashboard.
- Phase 2: Migrate to Architecture B → activate TimescaleDB and optimize queries.
- Phase 3: Extend to Architecture C → integrate streaming for real-time feedback and alerts.
This progressive approach ensures that:
- The team learns the full data lifecycle step by step.
- Each phase delivers a working, demonstrable system.
- The final architecture (C) aligns with modern Industry 4.0 and IoT paradigms — turning raw CNC signals into real-time, actionable intelligence.