Benchmark Results

Reproducible benchmark results for all four frameworks. Every result on this page can be regenerated by running the provided benchmark scripts; the AI Power Electronics Diagnostics section reports expected ranges rather than exact values. All experiments use fixed random seeds and documented train/test splits.

Industrial Predictive Maintenance

Evaluation dataset: NASA CMAPSS FD001
Protocol: Sequence length = 30, max RUL cap = 125, 3-fold cross-validation
Reproduce: python benchmarks/run_benchmarks.py --dataset FD001 --all-models
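
The protocol above can be sketched for a single engine as follows. This is a minimal illustration, not the code in benchmarks/run_benchmarks.py. It assumes the common piecewise-linear target (remaining cycles clipped at the cap) and one window ending at every cycle from the 30th onward. The helper names rul_targets and make_windows are hypothetical.

```python
from typing import List, Sequence, Tuple

SEQ_LEN = 30   # window length from the protocol
RUL_CAP = 125  # maximum RUL target from the protocol


def rul_targets(num_cycles: int, cap: int = RUL_CAP) -> List[int]:
    """Piecewise-linear RUL for a run-to-failure trajectory: cycles remaining
    until the last recorded cycle, clipped at `cap` (assumed convention)."""
    return [min(num_cycles - 1 - t, cap) for t in range(num_cycles)]


def make_windows(
    features: Sequence[Sequence[float]],
    seq_len: int = SEQ_LEN,
    cap: int = RUL_CAP,
) -> List[Tuple[List[Sequence[float]], int]]:
    """Slide a window of `seq_len` cycles over one engine; each window is
    labelled with the RUL at its final cycle."""
    targets = rul_targets(len(features), cap)
    windows = []
    for end in range(seq_len, len(features) + 1):
        window = list(features[end - seq_len:end])
        windows.append((window, targets[end - 1]))
    return windows
```

For a 200-cycle engine this yields 171 windows; the first is labelled 125 (the cap) and the last 0.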

RUL Prediction — CMAPSS FD001

Model	RMSE ↓	MAE ↓	NASA Score ↓	Parameters
Transformer	12.89	9.71	198.7	412K
LSTM + Attention	13.42	10.28	214.3	523K
TCN	13.15	10.05	207.1	287K
Autoencoder (detection)	—	—	—	198K

NASA Score is the asymmetric scoring function from the PHM 2008 challenge: penalties grow faster for late predictions (predicted RUL above true RUL) than for early ones, reflecting the higher cost of a missed failure.
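
For reference, the PHM 2008 score sums exp(-d/13) - 1 for early predictions and exp(d/10) - 1 for late ones over all test engines, where d is predicted RUL minus true RUL. Here is a minimal sketch; the function name nasa_score is illustrative and not taken from the repo.

```python
import math
from typing import Sequence


def nasa_score(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """PHM 2008 scoring function with d = predicted RUL - true RUL.
    Early predictions (d < 0) use time constant 13, late ones (d >= 0) use 10,
    so lateness is penalised more steeply. Lower is better."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    score = 0.0
    for t, p in zip(y_true, y_pred):
        d = p - t
        if d < 0:
            score += math.exp(-d / 13.0) - 1.0  # early prediction
        else:
            score += math.exp(d / 10.0) - 1.0   # late prediction
    return score
```

Being 10 cycles late on one engine costs about 1.72, while being 10 cycles early costs about 1.16. Because the score is a sum rather than a mean, it also grows with the number of test engines.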

Anomaly Detection — CWRU Bearing

Model	AUC-ROC ↑	F1 ↑	Threshold Method
LSTM Autoencoder	0.97	0.94	Percentile (95th)
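
The 95th-percentile threshold method can be sketched as follows. The sketch assumes the threshold is fitted on reconstruction errors from normal (fault-free) data and that any test window with a larger error is flagged. Which data the repo uses to fit the threshold is not stated here, and the helper names are hypothetical.

```python
import math
from typing import List, Sequence


def percentile(values: Sequence[float], q: float) -> float:
    """q-th percentile using linear interpolation between order statistics
    (the same convention as NumPy's default)."""
    xs = sorted(values)
    if not xs:
        raise ValueError("need at least one value")
    pos = (len(xs) - 1) * q / 100.0
    lo = math.floor(pos)
    hi = min(lo + 1, len(xs) - 1)
    return xs[lo] + (xs[hi] - xs[lo]) * (pos - lo)


def fit_threshold(normal_errors: Sequence[float], q: float = 95.0) -> float:
    """Threshold = q-th percentile of reconstruction errors on normal data."""
    return percentile(normal_errors, q)


def flag_anomalies(errors: Sequence[float], threshold: float) -> List[bool]:
    """A window is anomalous when its reconstruction error exceeds the threshold."""
    return [e > threshold for e in errors]
```

By construction, about 5% of normal windows will exceed a 95th-percentile threshold. The percentile therefore sets the expected false-alarm rate on normal data.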

Industrial Time-Series AI

Reproduce: bash scripts/run_all_benchmarks.sh

Forecasting — SWaT Synthetic (pred_len=24, 3 epochs)

Results on synthetic white-noise data. For i.i.d. standard Gaussian targets, no forecaster can beat predicting the mean, so the theoretical minimum MSE (and RMSE) is 1.0. All models converge near this floor, which is consistent with a correct implementation. Run --task forecasting_ett for ETT dataset results.
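
A quick check of this floor, assuming the synthetic targets are zero-mean, unit-variance i.i.d. Gaussian noise. The best possible forecast is the constant 0. Its expected RMSE is 1, and its expected MAE is √(2/π) ≈ 0.798, close to the MAE column in the table. The function name is illustrative.

```python
import math
import random


def zero_forecast_errors(n: int = 200_000, seed: int = 0):
    """RMSE and MAE of the constant forecast 0 on i.i.d. N(0, 1) targets."""
    rng = random.Random(seed)
    targets = [rng.gauss(0.0, 1.0) for _ in range(n)]
    rmse = math.sqrt(sum(y * y for y in targets) / n)
    mae = sum(abs(y) for y in targets) / n
    return rmse, mae


rmse, mae = zero_forecast_errors()
print(f"RMSE={rmse:.4f} (expected 1.0), MAE={mae:.4f} "
      f"(expected {math.sqrt(2 / math.pi):.4f})")
```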

Model	RMSE	MAE	Parameters	Train time (s)
LSTM	0.9991	0.7964	143K	3.4
TCN	0.9989	0.7962	135K	3.3
Transformer	1.0006	0.7976	150K	3.1
PatchTST	0.9993	0.7964	118K	9.9
DLinear	1.0865	0.8667	237K	6.5

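DLinear observation: The linear decomposition baseline DLinear (AAAI 2023) has the most parameters in this table (237K), yet it is the only model whose error sits clearly above the ≈1.0 floor. This run uses white noise, which has no trend or periodicity for DLinear's decomposition to capture. The gap should therefore not be read as evidence against its published ETT results, but it does highlight the importance of dataset-specific evaluation.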

Anomaly Detection — SWaT Synthetic (3 epochs)

Model	ROC-AUC ↑	F1 ↑	F1-PA ↑	Precision	Recall	Parameters
LSTM Autoencoder	0.9999	0.9981	1.0000	1.0000	0.9963	108K

F1-PA (Point-Adjust) is a widely used metric in ICS/SCADA anomaly benchmarks (used by TranAD and Anomaly Transformer). If a predicted anomaly falls at any point within a contiguous ground-truth anomaly segment, every point in that segment counts as a true positive.
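
The adjustment step can be sketched in plain Python. F1-PA is then ordinary F1 computed on the adjusted predictions. The function names are illustrative, and binary labels (1 = anomaly) are assumed.

```python
from typing import List, Sequence


def point_adjust(labels: Sequence[int], preds: Sequence[int]) -> List[int]:
    """If any point in a contiguous ground-truth anomaly segment is predicted
    anomalous, mark the whole segment as detected. Predictions outside
    anomaly segments are left unchanged."""
    adjusted = list(preds)
    n, i = len(labels), 0
    while i < n:
        if labels[i] == 1:
            j = i
            while j < n and labels[j] == 1:
                j += 1  # labels[i:j] is one anomaly segment
            if any(preds[i:j]):
                adjusted[i:j] = [1] * (j - i)
            i = j
        else:
            i += 1
    return adjusted


def f1(labels: Sequence[int], preds: Sequence[int]) -> float:
    """Point-wise F1 for binary labels and predictions."""
    tp = sum(1 for y, p in zip(labels, preds) if y == 1 and p == 1)
    fp = sum(1 for y, p in zip(labels, preds) if y == 0 and p == 1)
    fn = sum(1 for y, p in zip(labels, preds) if y == 1 and p == 0)
    if tp == 0:
        return 0.0
    precision, recall = tp / (tp + fp), tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


labels = [0, 1, 1, 1, 1, 0, 0, 1, 1, 0]
preds = [0, 0, 1, 0, 0, 0, 1, 0, 0, 0]
adjusted = point_adjust(labels, preds)
raw_f1, pa_f1 = f1(labels, preds), f1(labels, adjusted)
```

In this toy case, a single hit inside a four-point segment lifts F1 from 0.25 to about 0.73. This is why point-adjusted scores typically run much higher than raw point-wise F1.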

AI Power Electronics Diagnostics

Reproduce: python benchmarks/benchmark_all_models.py

Results below are expected ranges on synthetic datasets. Run benchmark_all_models.py to generate exact numbers on your hardware.

Inverter Fault Detection (9 classes, synthetic)

Model	Accuracy ↑	Macro F1 ↑	Parameters
1D CNN (Residual)	~97–99%	~0.97	1.2M
Spectrogram CNN	~96–98%	~0.96	11M
Transformer	~95–97%	~0.95	800K
BiLSTM + Attention	~94–96%	~0.94	1.8M

Motor Drive Fault Detection (5 classes, synthetic)

Model	Accuracy ↑	Macro F1 ↑	Parameters
1D CNN (Residual)	~98–99%	~0.98	1.1M
Spectrogram CNN	~97–99%	~0.97	11M
Transformer	~96–98%	~0.96	750K
BiLSTM + Attention	~95–97%	~0.95	1.7M

Synthetic data is generated with controlled SNR and class-balanced sampling. Results on real hardware data (e.g., Kaggle Motor Temp) will differ.
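
Controlled SNR typically means scaling additive noise so that signal power divided by noise power hits a target in decibels. The sketch below assumes additive white Gaussian noise and uses a 50 Hz sine as a stand-in waveform. The repo's generator may use different fault signatures and noise models, and the helper names are hypothetical.

```python
import math
import random
from typing import List, Sequence


def signal_power(x: Sequence[float]) -> float:
    """Mean squared amplitude."""
    return sum(v * v for v in x) / len(x)


def add_awgn(x: Sequence[float], snr_db: float, rng: random.Random) -> List[float]:
    """Add white Gaussian noise so that P_signal / P_noise = 10 ** (snr_db / 10)."""
    noise_power = signal_power(x) / (10 ** (snr_db / 10))
    sigma = math.sqrt(noise_power)
    return [v + rng.gauss(0.0, sigma) for v in x]


fs = 10_000  # illustrative sample rate in Hz; one second = 50 full periods
clean = [math.sin(2 * math.pi * 50 * t / fs) for t in range(fs)]
noisy = add_awgn(clean, snr_db=20.0, rng=random.Random(42))
noise = [n - c for n, c in zip(noisy, clean)]
measured_snr_db = 10 * math.log10(signal_power(clean) / signal_power(noise))
```

The measured SNR lands close to the 20 dB target; the residual gap comes from estimating noise power on a finite sample.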

Smart Manufacturing AI

Reproduce:

# Vision benchmark
python benchmarks/run_vision_benchmark.py --dataset mvtec --category bottle --backbones resnet50 efficientnet_b4 --epochs 50

# Anomaly detection benchmark
python benchmarks/run_anomaly_benchmark.py --seq_len 50 --epochs 80

Defect Detection — MVTec AD (bottle category)

Model	AUROC ↑	Avg Precision ↑	Macro F1 ↑	Parameters
ResNet-18	0.951	0.932	0.891	11.7M
ResNet-50	0.971	0.958	0.924	25.6M
EfficientNet-B4	0.978	0.965	0.937	19.3M
ViT-B/16	0.982	0.971	0.945	86.6M

Robot Anomaly Detection (Synthetic Sensor Data)

Model	AUROC ↑	Avg Precision ↑	F1 ↑	FPR@95TPR ↓
LSTM-AE-small	0.941	0.889	0.872	0.124
LSTM-AE-base	0.967	0.931	0.918	0.078
LSTM-AE-large	0.972	0.943	0.929	0.065
BiLSTM-AE-base	0.975	0.948	0.934	0.058
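
FPR@95TPR is the fraction of normal samples flagged when the threshold is set so that at least 95% of anomalies are detected. Here is a minimal sketch. It assumes higher scores mean more anomalous and uses an inclusive (>=) threshold. Conventions such as interpolating along the ROC curve vary between implementations, and the function name is illustrative.

```python
import math
from typing import Sequence


def fpr_at_tpr(
    scores: Sequence[float], labels: Sequence[int], target_tpr: float = 0.95
) -> float:
    """False-positive rate at the highest threshold whose TPR >= target_tpr.
    Label 1 = anomaly; higher score = more anomalous."""
    pos = sorted((s for s, y in zip(scores, labels) if y == 1), reverse=True)
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need both anomalous and normal samples")
    # Minimum number of anomalies that must be caught; the epsilon guards
    # against float error pushing target_tpr * len(pos) just above an integer.
    k = max(1, math.ceil(target_tpr * len(pos) - 1e-9))
    threshold = pos[k - 1]  # k-th highest anomaly score
    return sum(1 for s in neg if s >= threshold) / len(neg)
```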

Reproducibility Notes

All benchmarks follow these conventions:

Random seeds are fixed via torch.manual_seed + numpy.random.seed in all training scripts.
Train/test splits match established protocols for each dataset (e.g., CMAPSS uses the official NASA split; CWRU uses load-condition stratification).
Hyperparameters are version-controlled in YAML config files under configs/ in each repo.
Result files are saved to benchmarks/results/ as CSV after each run.
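
A seeding helper of the kind described above might look like the following sketch. The name set_seed is hypothetical. It also seeds Python's built-in random module, which the notes above do not mention. NumPy and PyTorch are seeded only if they are installed.

```python
import random


def set_seed(seed: int) -> None:
    """Seed Python's RNG and, when available, NumPy and PyTorch."""
    random.seed(seed)
    try:
        import numpy as np
        np.random.seed(seed)
    except ImportError:
        pass
    try:
        import torch
        torch.manual_seed(seed)  # recent PyTorch versions seed all devices here
    except ImportError:
        pass
```

Fixed seeds alone do not make every GPU kernel deterministic. PyTorch provides torch.use_deterministic_algorithms(True) for stricter run-to-run reproducibility.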

To reproduce any result, clone the repo, install requirements, and run the corresponding benchmark script as shown above.