Real-time Data Center Health & Efficiency Monitor
An automated system for real-time monitoring, anomaly detection, and efficiency optimization of data center operations.
Python · PySpark · Kafka · SQLite · Flask · Docker · SQL · Pandas · NumPy
Project Overview
This project builds a complete system to monitor the health and energy efficiency of a data center in real time. It ingests high-volume sensor data, processes it on the fly to compute key metrics such as Power Usage Effectiveness (PUE), flags unusual activity, and presents everything on an easy-to-understand dashboard. The goal is to help data center managers spot problems quickly, keep operations running smoothly, and make decisions that save energy and costs.
Key Features
- Real-time Data Ingestion: Continuously collects simulated sensor data (power, temperature, CPU usage) from various data center devices.
- Live Data Processing: Instantly processes incoming data streams to calculate key performance indicators and detect anomalies.
- PUE Calculation: Automatically calculates Power Usage Effectiveness (PUE) to assess data center energy efficiency in real time (see the formula sketch after this list).
- Anomaly Detection: Identifies and alerts on unusual sensor readings or operational thresholds that might indicate a problem (e.g., overheating servers, power spikes).
- Historical Data Storage: Stores processed metrics and anomaly alerts for long-term analysis and trend identification.
- Interactive Dashboard: Provides a user-friendly web interface to visualize current and historical data, making it easy to monitor the data center's status.
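To make the PUE feature concrete: PUE is defined as total facility power divided by IT equipment power, so a value of 1.0 would mean every watt goes to IT gear. A minimal Python sketch (the example numbers and typical ranges are illustrative, not measurements from this project):

```python
def compute_pue(total_facility_kw: float, it_equipment_kw: float) -> float:
    """Power Usage Effectiveness: total facility power / IT equipment power.

    1.0 is the theoretical ideal; typical data centers land roughly
    between 1.2 and 2.0.
    """
    if it_equipment_kw <= 0:
        raise ValueError("IT equipment power must be positive")
    return total_facility_kw / it_equipment_kw

# Example: 500 kW total facility draw, 400 kW of it consumed by IT
# equipment gives PUE = 1.25 (the remaining 100 kW is cooling, lighting,
# power distribution losses, etc.).
print(compute_pue(500.0, 400.0))  # 1.25
```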
How It Works (Workflow Overview)
- A **Python** program simulates IoT sensors generating diverse data (power, temperature, CPU) from servers, CRACs, and PDUs within a data center environment (a producer sketch follows this list).
- This simulated data is immediately sent to a **Kafka** messaging system, acting as the central hub for incoming sensor readings.
- A **PySpark streaming application** continuously consumes data from Kafka, parsing JSON messages, extracting relevant metrics, and enriching the data with derived fields like room and rack information.
- The **PySpark** processor performs real-time aggregations (e.g., 5-minute averages per room, 1-minute overall data center metrics including PUE, IT power, and cooling power) and applies threshold-based anomaly detection for critical parameters (see the streaming sketch after this list).
- Processed data, including aggregated metrics and anomaly alerts, is then stored in an **SQLite database** structured with dimension and fact tables for historical analysis.
- A **Flask web application** retrieves the latest processed data from the database and displays it on an interactive dashboard, providing real-time operational insights into the data center's health and efficiency.
Behind the Scenes (Technical Architecture)
- Data Generation: Custom **Python** script using `kafka-python` and `NumPy` to simulate diverse IoT sensor data streams.
- Messaging Queue: **Kafka** and **Zookeeper** set up via **Docker Compose** for high-throughput, fault-tolerant ingestion of streaming sensor data.
- Stream Processing: **PySpark Structured Streaming** for real-time data ingestion from Kafka, complex aggregations, windowing operations, and threshold-based anomaly detection.
- Data Storage: **SQLite database** designed with a star schema (dimension tables: `dim_device`, `dim_location`; fact tables: `fact_sensor_readings`, `fact_pue_metrics`, `fact_anomalies`) for efficient storage and retrieval of time-series and event data (schema sketch after this list).
- Backend Logic: **Python** scripts for Kafka consumption (`pyspark_java.py`) and database interaction (`sql.py`, `sql1.py`), including functions to load processed data into the SQLite database.
- Dashboard: A **Flask** web application providing a dynamic dashboard to visualize real-time PUE, power metrics, temperature, and system status, enhancing operational visibility (endpoint sketch below).
- Orchestration: **Python** scripts (`pyspark2.py` / `pyspark.py`) leveraging `subprocess` to manage the lifecycle of Docker containers, Kafka topics, and Python components, ensuring seamless system startup and shutdown (startup sketch below).
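The star schema can be sketched from the table names above; the column definitions are my assumptions about what each table would hold, not the project's exact DDL:

```python
import sqlite3

conn = sqlite3.connect("dc_monitor.db")
conn.executescript("""
CREATE TABLE IF NOT EXISTS dim_location (
    location_id INTEGER PRIMARY KEY,
    room        TEXT,
    rack        TEXT
);
CREATE TABLE IF NOT EXISTS dim_device (
    device_id   TEXT PRIMARY KEY,
    device_type TEXT,                 -- server / CRAC / PDU
    location_id INTEGER REFERENCES dim_location(location_id)
);
CREATE TABLE IF NOT EXISTS fact_sensor_readings (
    device_id     TEXT REFERENCES dim_device(device_id),
    reading_time  TEXT,               -- ISO-8601 timestamp
    power_kw      REAL,
    temperature_c REAL,
    cpu_pct       REAL
);
CREATE TABLE IF NOT EXISTS fact_pue_metrics (
    window_start   TEXT,
    total_power_kw REAL,
    it_power_kw    REAL,
    pue            REAL
);
CREATE TABLE IF NOT EXISTS fact_anomalies (
    detected_at TEXT,
    device_id   TEXT REFERENCES dim_device(device_id),
    metric      TEXT,                 -- e.g. temperature_c
    value       REAL,
    threshold   REAL
);
""")
conn.commit()
conn.close()
```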
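For the dashboard, a minimal Flask endpoint might expose the latest PUE window for the front end to poll; the route and column names are illustrative and match the assumed schema above:

```python
import sqlite3

from flask import Flask, jsonify

app = Flask(__name__)

@app.route("/api/latest")
def latest_metrics():
    # Fetch the most recent PUE window for the dashboard to poll.
    conn = sqlite3.connect("dc_monitor.db")
    conn.row_factory = sqlite3.Row
    row = conn.execute(
        "SELECT window_start, total_power_kw, it_power_kw, pue "
        "FROM fact_pue_metrics ORDER BY window_start DESC LIMIT 1"
    ).fetchone()
    conn.close()
    return jsonify(dict(row) if row else {})

if __name__ == "__main__":
    app.run(debug=True)
```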
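Finally, the orchestration pattern in miniature: `subprocess` brings the Docker services up, launches the Python components, and tears everything down on exit. The compose invocation and script names are placeholders, not the project's actual files:

```python
import subprocess

# Start Kafka and Zookeeper from the project's docker-compose.yml.
subprocess.run(["docker-compose", "up", "-d"], check=True)

# Launch the simulator and the streaming job as child processes
# (script names are illustrative).
children = [
    subprocess.Popen(["python", "sensor_simulator.py"]),
    subprocess.Popen(["python", "streaming_job.py"]),
]
try:
    input("System running - press Enter to shut down...\n")
finally:
    for proc in children:
        proc.terminate()
    subprocess.run(["docker-compose", "down"], check=True)
```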
Visual Highlights
A screenshot of the main dashboard, showing current PUE, power consumption (Total, IT), and average temperature, along with system status.
Full Tech Stack Used
- Languages: Python
- Big Data Framework: PySpark
- Messaging: Kafka, Zookeeper
- Database: SQLite
- Web Framework: Flask
- Containerization: Docker, Docker Compose
- Data Processing Libraries: Pandas, NumPy
- Orchestration: Custom Python Scripts
Learnings & Impact
This project provided hands-on experience building a complete **real-time data pipeline**, from raw data generation to dashboard visualization. I worked extensively with **streaming data processing** in **PySpark**, set up and managed **Kafka** for high-throughput ingestion, and designed a database schema suited to time-series data. The project demonstrates my ability to integrate multiple technologies into a practical solution for **operational monitoring and efficiency optimization**, and it sharpened my skills in **anomaly detection**, **data aggregation**, and presenting complex data in a form decision-makers can act on.