Indoor AQI Monitoring & PM2.5 Prediction

A complete machine-learning pipeline for indoor air quality analysis and edge inference, built on the Dalton multi-site indoor AQI dataset.
The project trains a lightweight binary classifier that predicts whether PM2.5 will exceed 60 µg/m³ 10 minutes ahead, then exports the model as a float32 TFLite file ready to run on an ESP32-S3 microcontroller.

Background

Indoor air quality (IAQ) is strongly linked to occupant health. PM2.5, fine particulate matter with a diameter of 2.5 µm or less, is one of the most harmful indoor pollutants.
This project:

Loads continuous multi-sensor readings from dozens of real indoor sites.
Detects sudden PM2.5 spikes and sustained degradation events.
Trains a tiny neural network that predicts impending high-PM2.5 episodes.
Exports the trained model to TFLite for low-power edge inference.

Dataset

The project uses the Dalton indoor AQI dataset, organized by site type:

Site prefix	Description
`A*`	Academic/study desks
`C*`	Classroom teacher desks
`F*`	Food-prep kitchens
`H*`	Residential homes (multiple rooms each)
`R*`	General room deployments

Each CSV file contains ~1 Hz sensor readings with these raw columns:

Raw column	Friendly name	Unit
`ts`	`timestamp`	ISO datetime
`T`	`temperature`	°C
`H`	`humidity`	% RH
`PMS2_5`	`pm25`	µg/m³
`PMS10`	`pm10`	µg/m³
`CO2`	`co2`	ppm
`VoC`	`voc`	ppb

The dataset folder (dalton-dataset-files/) is excluded from version control via .gitignore.
Place it locally at d:/Projects/AQI/dalton-dataset-files/Data/ or update DATA_ROOT in each script/notebook.
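
As an illustration of the column mapping above, the following is a minimal sketch that reads one CSV and renames the raw columns. It assumes each file has a header row with the raw names and that `ts` parses with `datetime.fromisoformat`; the function name `read_site_rows` and the sample row are hypothetical and not taken from the project code.

```python
import csv
import io
from datetime import datetime

# Raw-to-friendly column names, copied from the table above.
COLUMN_MAP = {
    "ts": "timestamp",
    "T": "temperature",
    "H": "humidity",
    "PMS2_5": "pm25",
    "PMS10": "pm10",
    "CO2": "co2",
    "VoC": "voc",
}
NUMERIC = [name for name in COLUMN_MAP.values() if name != "timestamp"]


def read_site_rows(handle):
    """Yield one dict per CSV row with friendly names and parsed values."""
    for raw in csv.DictReader(handle):
        row = {COLUMN_MAP[k]: v for k, v in raw.items() if k in COLUMN_MAP}
        row["timestamp"] = datetime.fromisoformat(row["timestamp"])
        for name in NUMERIC:
            # Missing or empty optional columns (e.g. voc) become None.
            value = row.get(name)
            row[name] = float(value) if value not in (None, "") else None
        yield row


# Invented sample row, used only to show the shape of the result.
sample = io.StringIO(
    "ts,T,H,PMS2_5,PMS10,CO2,VoC\n"
    "2023-01-05T10:00:00,24.5,41.0,12.0,18.0,620,\n"
)
rows = list(read_site_rows(sample))
print(rows[0]["pm25"], rows[0]["voc"])
```

The notebook itself may well use pandas for this step; the standard-library version is shown only to keep the example dependency-free.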

Repository Structure

```
.
├── aqi_pm25_predictor.ipynb   # End-to-end training + TFLite export (main notebook)
├── analyze_pm_spikes.py       # Spike & degradation-event detection + 4 plot types
├── visualize_trends.py        # PM2.5 / temperature / humidity trend visualizations
├── compare_models.py          # Numerical parity check: Keras vs TFLite outputs
├── tflite_model_test.py       # Latency + I/O sanity test for the TFLite model
├── requirements.txt           # Python dependencies
├── LICENSE
└── .github/
    └── workflows/
        └── python-ci.yml      # CI: syntax check on every push / PR
```

Generated at runtime (excluded from Git):

```
figures/           # PNG plots produced by visualization scripts
model_output/      # Trained model files, scaler_params.json
```

ML Pipeline Overview

```
CSV files (all sites)
        │
        ▼
 Load & merge → downsample to ≤ 3M rows → parse timestamps → float32 cast
        │
        ▼
 Feature selection: [temperature, humidity, pm25] + optional [voc]
        │
        ▼
 Label creation: future_pm25 (t + 600 rows) > 60 µg/m³ → binary label
        │
        ▼
 Sliding window (size = 20) → flatten → input vector (dim = 20 × F)
        │
        ▼
 StandardScaler (fit on train split only) → z-score normalization
        │
        ▼
 Dense neural network → binary_crossentropy + EarlyStopping
        │
        ▼
 TFLite float32 export → scaler_params.json (for firmware)
```
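
The label-creation step can be sketched as a forward shift of the PM2.5 series. This is a minimal pure-Python illustration under stated assumptions: the function name `make_labels` is hypothetical, and rows that have no value 600 steps ahead are assumed to be dropped rather than filled.

```python
FORECAST_STEPS = 600   # rows ahead (~10 min at 1 Hz)
PM25_THRESHOLD = 60.0  # µg/m³


def make_labels(pm25, forecast_steps=FORECAST_STEPS, threshold=PM25_THRESHOLD):
    """Return label[t] = 1 if pm25[t + forecast_steps] > threshold, else 0.

    The last `forecast_steps` rows have no future value and are dropped,
    so the result is shorter than the input.
    """
    usable = len(pm25) - forecast_steps
    return [1 if pm25[t + forecast_steps] > threshold else 0
            for t in range(max(usable, 0))]


# Toy series with a 3-row horizon so the shift is easy to follow.
series = [10.0, 20.0, 30.0, 70.0, 80.0, 40.0]
print(make_labels(series, forecast_steps=3))
```

Two caveats follow from the shift itself. The horizon equals 10 minutes only when rows are contiguous 1 Hz samples, so any downsampling or gap before this step stretches it. If several sites are concatenated first, a shift over the merged rows pairs the end of one site with the start of the next; applying the shift per site avoids that.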

Model Architecture

A minimal Dense-only network designed for microcontroller deployment:

```
Input  (20 × F)    # flattened sliding window
  │
Dense(32, ReLU)
  │
Dense(16, ReLU)
  │
Dense(1,  Sigmoid) # P(PM2.5 will exceed threshold in 10 min)
```

F = number of active features (3 if VOC missing, 4 if present)
Input dim = 60 or 80 depending on VOC availability
Total parameters: 2,497 for F = 3 or 3,137 for F = 4 (float32 TFLite ≈ 12–15 KB)
Training uses class-weighted binary cross-entropy and Adam (lr = 1e-3)
EarlyStopping on val_loss with patience = 5, restore_best_weights = True
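
The parameter count follows directly from the layer sizes: each Dense layer has `inputs × units` weights plus `units` biases. A short sketch of that arithmetic (the helper names are illustrative only):

```python
def dense_params(n_in, n_out):
    """Weights plus biases of one fully connected layer."""
    return n_in * n_out + n_out


def total_params(n_features, window_size=20, hidden=(32, 16)):
    """Parameter count of the Dense(32) -> Dense(16) -> Dense(1) stack."""
    sizes = [window_size * n_features, *hidden, 1]
    return sum(dense_params(a, b) for a, b in zip(sizes, sizes[1:]))


for f in (3, 4):
    n = total_params(f)
    # 4 bytes per float32 weight; the .tflite file adds metadata on top.
    print(f"F={f}: {n} parameters, {n * 4 / 1024:.1f} KiB of raw float32 weights")
```

For F = 3 this gives 1,952 + 528 + 17 = 2,497 parameters; for F = 4 it gives 2,592 + 528 + 17 = 3,137.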

Scripts & Notebook Reference

`aqi_pm25_predictor.ipynb`

The primary end-to-end workflow. Cells in order:

#	Section	What it does
1	Install	pip-installs all dependencies
2	Imports	Libraries, versions, GPU check
3	GPU config	Enables memory growth, sets `DEVICE`
4	Load data	Reads all CSVs, renames columns, caps rows, casts to float32
5	Feature selection	Drops low-coverage optional columns
6	Label creation	Creates binary `label` via 600-step forward shift of pm25
7	Sliding window	Builds flat input vectors with `make_windows()`
8	Normalize	Fits `StandardScaler` on train split, saves `scaler_params.json`
9	Build model	Defines Keras Sequential model
10	Train	Fits with class weights, EarlyStopping, learning-curve plots
11	Evaluate	Accuracy, F1, confusion matrix, probability distribution
12	TFLite export	Converts model to float32 `.tflite`
13	Deployment summary	File size, sanity-check Keras vs TFLite delta

`analyze_pm_spikes.py`

Detects and visualizes two event types across the full dataset:

Method	Description
Spike	PM2.5 rose ≥ 15 µg/m³ over the preceding 10 samples (rate-of-change)
Degradation event	Sustained period ≥ 60 s above 60 µg/m³; events within 120 s are merged
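
A minimal sketch of both detection rules, assuming one row per second so that durations in seconds equal row counts. The function names are hypothetical, and the order of operations for events (merge nearby runs first, then drop short ones) is an assumption; the script may do it the other way round.

```python
SPIKE_DELTA = 15.0        # µg/m³
SPIKE_WINDOW = 10         # rows
PM25_THRESHOLD = 60.0     # µg/m³
MIN_EVENT_DURATION = 60   # s
MERGE_GAP = 120           # s


def find_spikes(pm25, delta=SPIKE_DELTA, window=SPIKE_WINDOW):
    """Indices t where pm25[t] - pm25[t - window] >= delta."""
    return [t for t in range(window, len(pm25))
            if pm25[t] - pm25[t - window] >= delta]


def find_events(pm25, threshold=PM25_THRESHOLD,
                min_duration=MIN_EVENT_DURATION, merge_gap=MERGE_GAP):
    """Return (start, end) index pairs with end exclusive."""
    # 1. Collect raw runs of consecutive rows above the threshold.
    runs, start = [], None
    for t, value in enumerate(pm25):
        if value > threshold and start is None:
            start = t
        elif value <= threshold and start is not None:
            runs.append((start, t))
            start = None
    if start is not None:
        runs.append((start, len(pm25)))
    # 2. Merge runs separated by a gap shorter than merge_gap.
    merged = []
    for run in runs:
        if merged and run[0] - merged[-1][1] < merge_gap:
            merged[-1] = (merged[-1][0], run[1])
        else:
            merged.append(run)
    # 3. Keep only events lasting at least min_duration.
    return [(s, e) for s, e in merged if e - s >= min_duration]


# Synthetic trace: two nearby bursts that merge, then a short isolated one.
pm25 = [10.0] * 20 + [80.0] * 70 + [10.0] * 50 + [80.0] * 30 + [10.0] * 200 + [80.0] * 10
print(find_events(pm25))
print(len(find_spikes(pm25)))
```

In this trace the first two bursts are 50 s apart and merge into one 150 s event, while the final 10 s burst is too short to count.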

Produces 4 figures in figures/spikes/:

pm25_spikes_overview.png — full timeseries with spike markers & event shading
pm25_roc_signal.png — rate-of-change signal below the PM2.5 trace
event_statistics.png — histograms of event duration and peak PM2.5
events_per_site.png — bar chart of event count per site

Prints a per-site summary table to stdout.

`visualize_trends.py`

Generates three individual trend figures + one combined overview:

Figure	Content
`pm25_trend.png`	Raw PM2.5 + 1-min and 10-min moving averages + 60 µg/m³ threshold
`temperature_trend.png`	Temperature with short and long moving averages
`humidity_trend.png`	Relative humidity with moving averages
`combined_trends.png`	All three metrics in vertically stacked subplots

All figures are saved to figures/ at 150 DPI.
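
The moving averages in these figures can be sketched as trailing means. At ~1 Hz, a 1-minute average covers about 60 rows and a 10-minute average about 600. Whether the script uses trailing or centered windows is not stated here, so the trailing form below is an assumption, and `moving_average` is an illustrative name.

```python
from collections import deque


def moving_average(values, window):
    """Trailing mean over up to `window` most recent samples."""
    buf, total, out = deque(), 0.0, []
    for v in values:
        buf.append(v)
        total += v
        if len(buf) > window:
            total -= buf.popleft()   # drop the oldest sample from the sum
        out.append(total / len(buf))
    return out


print(moving_average([1, 2, 3, 4], window=2))
```

The first `window - 1` outputs average over fewer samples, which avoids leading gaps in the plotted line.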

`compare_models.py`

Loads best_model.h5 and model_float32.tflite from model_output/, runs the same random input through both, and prints the numerical difference to verify conversion parity.
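
A parity check of this kind might look like the sketch below. It is not the script's actual code: `compare` and `max_abs_diff` are hypothetical names, `input_dim=60` assumes the three-feature model, and TensorFlow and NumPy are imported inside the function so the helper can be used without them.

```python
def max_abs_diff(a, b):
    """Largest element-wise absolute difference between two flat sequences."""
    return max(abs(x - y) for x, y in zip(a, b))


def compare(keras_path, tflite_path, input_dim=60):
    """Run one random input through both models and return the largest gap."""
    # Imported lazily so max_abs_diff stays usable without TensorFlow.
    import numpy as np
    import tensorflow as tf

    x = np.random.rand(1, input_dim).astype(np.float32)

    keras_model = tf.keras.models.load_model(keras_path)
    keras_out = keras_model.predict(x, verbose=0)

    interpreter = tf.lite.Interpreter(model_path=tflite_path)
    interpreter.allocate_tensors()
    inp = interpreter.get_input_details()[0]
    out = interpreter.get_output_details()[0]
    interpreter.set_tensor(inp["index"], x)
    interpreter.invoke()
    tflite_out = interpreter.get_tensor(out["index"])

    return max_abs_diff(keras_out.ravel().tolist(), tflite_out.ravel().tolist())


print(max_abs_diff([0.5, 0.25], [0.5, 0.75]))
```

For a float32 conversion the difference is usually tiny; a tolerance such as 1e-5 would be an illustrative choice, not a value taken from the project.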

`tflite_model_test.py`

Loads model_float32.tflite, runs a dummy input, and reports:

Input/output tensor shapes and dtypes
Model prediction
Inference latency in milliseconds
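
Latency measurement can be sketched with a small timing helper. `time_call` is a hypothetical name and the warm-up and run counts are arbitrary. In practice the callable would be the `invoke` method of a `tf.lite.Interpreter`; a trivial stand-in is used here so the example runs without TensorFlow.

```python
import statistics
import time


def time_call(fn, runs=100, warmup=5):
    """Call `fn` repeatedly and return latency statistics in milliseconds."""
    for _ in range(warmup):
        fn()                      # warm-up calls are not timed
    samples = []
    for _ in range(runs):
        t0 = time.perf_counter()
        fn()
        samples.append((time.perf_counter() - t0) * 1000.0)
    return {
        "mean_ms": statistics.mean(samples),
        "median_ms": statistics.median(samples),
        "max_ms": max(samples),
    }


# Stand-in workload; replace with interpreter.invoke for a real measurement.
stats = time_call(lambda: sum(range(1000)), runs=20)
print(sorted(stats))
```

Numbers measured this way describe the host machine, not the ESP32-S3.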

Key Configuration Parameters

Parameter	Location	Default	Meaning
`PM25_THRESHOLD`	notebook / scripts	`60` µg/m³	Alert threshold for labels and events; falls in the US EPA AQI "Unhealthy" band (24-h PM2.5 above 55.4 µg/m³), which is not a WHO category
`FORECAST_STEPS`	notebook	`600` rows	Prediction horizon (~10 min at 1 Hz)
`WINDOW_SIZE`	notebook	`20` rows	Sliding-window history (~20 seconds)
`MAX_ROWS_GLOBAL`	notebook (cell 8)	`3_000_000`	Cap on combined rows before training
`MAX_ROWS_PER_FILE`	notebook (cell 8)	`100_000`	Per-CSV row cap to protect peak RAM
`MAX_ROWS`	notebook (cell 16)	`1_000_000`	Stratified subsample before windowing
`SPIKE_DELTA`	analyze_pm_spikes	`15` µg/m³	Minimum PM2.5 rise to count as a spike
`SPIKE_WINDOW`	analyze_pm_spikes	`10` rows	Look-back window for rate-of-change
`MIN_EVENT_DURATION`	analyze_pm_spikes	`60` s	Minimum duration for a degradation event
`MERGE_GAP`	analyze_pm_spikes	`120` s	Gap below which two events are merged

Output Artifacts

After a full notebook run, model_output/ contains:

File	Description
`model_float32.tflite`	TFLite model, float32 weights, ready for ESP32-S3
`scaler_params.json`	z-score normalization params (mean + scale per feature) for firmware

scaler_params.json schema:

```
{
  "feature_cols":   ["temperature", "humidity", "pm25"],
  "window_size":    20,
  "input_dim":      60,
  "mean":           [...],
  "scale":          [...],
  "pm25_threshold": 60,
  "forecast_steps": 600
}
```
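
A consumer of this file may want to validate it before use. The sketch below checks the keys shown in the schema and the relation `input_dim = window_size × number of features`. `load_scaler_params` is a hypothetical helper. Because the README does not pin down whether `mean` and `scale` hold one value per feature or one per flattened input, both lengths are accepted; the numeric values in the example are placeholders, not real statistics.

```python
import json

REQUIRED = ("feature_cols", "window_size", "input_dim",
            "mean", "scale", "pm25_threshold", "forecast_steps")


def load_scaler_params(text):
    """Parse scaler_params.json content and check it is self-consistent."""
    params = json.loads(text)
    missing = [key for key in REQUIRED if key not in params]
    if missing:
        raise ValueError(f"missing keys: {missing}")
    n_features = len(params["feature_cols"])
    expected = params["window_size"] * n_features
    if params["input_dim"] != expected:
        raise ValueError(f"input_dim should be {expected}")
    for key in ("mean", "scale"):
        # Accept one value per feature or one per flattened input.
        if len(params[key]) not in (n_features, expected):
            raise ValueError(f"unexpected length for {key}")
    return params


example = json.dumps({
    "feature_cols": ["temperature", "humidity", "pm25"],
    "window_size": 20,
    "input_dim": 60,
    "mean": [0.0] * 60,    # placeholder values
    "scale": [1.0] * 60,   # placeholder values
    "pm25_threshold": 60,
    "forecast_steps": 600,
})
params = load_scaler_params(example)
print(params["input_dim"])
```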

Setup

Requirements: Python 3.9–3.11

```
# 1. Clone the repo
git clone https://github.com/<your-username>/aqi.git
cd aqi

# 2. Create a virtual environment
python -m venv venv
source venv/bin/activate        # Windows: venv\Scripts\activate

# 3. Install dependencies
pip install -r requirements.txt
```

GPU (optional): Install tensorflow-directml-plugin on Windows or tensorflow[and-cuda] on Linux for GPU-accelerated training.

Running the Project

Visualize sensor trends:

```
python visualize_trends.py
```

Analyze PM2.5 spikes and degradation events:

```
python analyze_pm_spikes.py
```

Train the model and export TFLite by opening `aqi_pm25_predictor.ipynb` and running all cells.

Verify the TFLite model after training:

```
python tflite_model_test.py
```

Compare Keras vs TFLite outputs:

```
python compare_models.py
```

Edge Deployment (ESP32-S3)

Copy model_output/model_float32.tflite to your firmware project.
Load scaler_params.json and apply z-score normalization to each incoming sensor reading before inference:
```
normalized = (raw_value - mean[i]) / scale[i]
```
Assemble a rolling window of 20 normalized readings per feature into a flat float32 array of length input_dim.
Run the TFLite interpreter; a sigmoid output above 0.5 predicts high PM2.5 10 minutes ahead.
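
The steps above can be prototyped on a host machine before porting them to firmware. The sketch below is a Python mirror of the buffering and normalization logic, not ESP32 code. `WindowBuffer` and `is_alert` are hypothetical names. The flatten order (oldest row first, features in `feature_cols` order) is an assumption that must match training, and `mean`/`scale` are accepted either per feature or per flattened input because the README leaves that open.

```python
from collections import deque


class WindowBuffer:
    """Rolling window of raw readings that yields a normalized flat input."""

    def __init__(self, params):
        self.n_features = len(params["feature_cols"])
        self.window_size = params["window_size"]
        self.input_dim = params["input_dim"]
        self.mean = self._expand(params["mean"])
        self.scale = self._expand(params["scale"])
        self.rows = deque(maxlen=self.window_size)  # oldest row drops out

    def _expand(self, values):
        # One value per flattened input: use as-is.
        if len(values) == self.input_dim:
            return list(values)
        # One value per feature: repeat for every row of the window.
        if len(values) == self.n_features:
            return list(values) * self.window_size
        raise ValueError("mean/scale length matches neither layout")

    def push(self, reading):
        self.rows.append(list(reading))

    def ready(self):
        return len(self.rows) == self.window_size

    def input_vector(self):
        flat = [v for row in self.rows for v in row]
        return [(v - m) / s for v, m, s in zip(flat, self.mean, self.scale)]


def is_alert(probability, cutoff=0.5):
    """Sigmoid output above the cutoff means high PM2.5 is predicted."""
    return probability > cutoff


# Tiny made-up configuration: 2 features, window of 2 rows.
toy = {"feature_cols": ["temperature", "pm25"], "window_size": 2,
       "input_dim": 4, "mean": [1.0, 10.0], "scale": [2.0, 5.0]}
buf = WindowBuffer(toy)
buf.push([3.0, 20.0])
buf.push([5.0, 10.0])
print(buf.ready(), buf.input_vector())
```

The resulting vector is what would be copied into the interpreter's input tensor; `is_alert` then applies the 0.5 cutoff to the model output.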

The model is ~12–15 KB and runs a single inference in < 1 ms on the ESP32-S3 CPU.

Contributing

Fork the repository and create a feature branch.
Follow existing code style (PEP 8, module-level docstrings, type hints where practical).
Test your changes against at least one site's CSV before submitting a PR.
Open a pull request with a clear description of changes and motivation.

License

This project is licensed under the MIT License.
