Benchmark Results

Benchmark results for all four frameworks. Every result on this page can be reproduced with the provided benchmark scripts; all experiments use fixed random seeds and documented train/test splits.


Industrial Predictive Maintenance

Evaluation dataset: NASA CMAPSS FD001
Protocol: Sequence length = 30, max RUL cap = 125, 3-fold cross-validation
Reproduce: python benchmarks/run_benchmarks.py --dataset FD001 --all-models
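
To make the protocol concrete, here is a minimal sketch of how a sequence length of 30 and an RUL cap of 125 are commonly applied to CMAPSS-style data. It assumes the usual piecewise-linear target (true RUL clipped at the cap) and windows labelled with the RUL at their final cycle. The helper names capped_rul and sliding_windows, and the 14-channel toy engine, are hypothetical and not taken from the repo.

```python
import numpy as np

SEQ_LEN = 30    # sequence length from the protocol above
MAX_RUL = 125   # RUL cap from the protocol above


def capped_rul(num_cycles, max_rul=MAX_RUL):
    """Piecewise-linear RUL target for one run-to-failure engine (assumed scheme)."""
    # True RUL counts down to 0 at the last recorded cycle.
    rul = np.arange(num_cycles - 1, -1, -1)
    return np.minimum(rul, max_rul)


def sliding_windows(features, targets, seq_len=SEQ_LEN):
    """Cut one engine's (cycles, sensors) array into overlapping windows.

    Each window is labelled with the RUL at its final cycle.
    """
    n = features.shape[0] - seq_len + 1
    if n <= 0:
        # Engine history is shorter than one window.
        return np.empty((0, seq_len, features.shape[1])), np.empty((0,))
    x = np.stack([features[i:i + seq_len] for i in range(n)])
    y = targets[seq_len - 1:]
    return x, y


# Toy engine: 200 cycles, 14 sensor channels (channel count is illustrative).
rng = np.random.default_rng(0)
engine = rng.normal(size=(200, 14))
x, y = sliding_windows(engine, capped_rul(200))
print(x.shape, y.shape, y.max(), y[-1])
```

With the cap in place, early windows all share the target 125, so the model is not asked to distinguish between healthy states that are far from failure.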

RUL Prediction — CMAPSS FD001

Model                    RMSE ↓  MAE ↓  NASA Score ↓  Parameters
Transformer              12.89   9.71   198.7         412K
LSTM + Attention         13.42   10.28  214.3         523K
TCN                      13.15   10.05  207.1         287K
Autoencoder (detection)  n/a     n/a    n/a           198K

NASA Score is the asymmetric scoring function from the PHM 2008 challenge: penalties are heavier for late predictions than early ones, reflecting the real cost of missed failures.
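
The asymmetry can be seen in a short sketch of the scoring function as it is usually stated for the PHM 2008 challenge, with d = predicted RUL minus true RUL and the constants 13 (early) and 10 (late). The function name nasa_score is an assumption for illustration; the repo's own implementation may differ in detail.

```python
import numpy as np


def nasa_score(y_true, y_pred):
    """PHM 2008 scoring function; d = predicted RUL - true RUL."""
    d = np.asarray(y_pred, dtype=float) - np.asarray(y_true, dtype=float)
    # Early predictions (d < 0) use the gentler constant 13,
    # late predictions (d >= 0) use the harsher constant 10.
    penalties = np.where(d < 0, np.exp(-d / 13.0) - 1.0, np.exp(d / 10.0) - 1.0)
    return float(penalties.sum())


early = nasa_score([50], [40])  # prediction is 10 cycles early
late = nasa_score([50], [60])   # prediction is 10 cycles late
print(early, late)
```

For the same 10-cycle error, the late prediction is penalized more than the early one. The score is a sum over test units, so it is not comparable across test sets of different size.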

Anomaly Detection — CWRU Bearing

Model             AUC-ROC ↑  F1 ↑  Threshold Method
LSTM Autoencoder  0.97       0.94  Percentile (95th)
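
The percentile threshold method can be sketched as follows. This assumes the threshold is the 95th percentile of reconstruction errors on normal training data, which is a common choice but is not confirmed by the source. The error arrays are random stand-ins, not CWRU data.

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in reconstruction errors; a real run would take these from the autoencoder.
train_errors = rng.gamma(shape=2.0, scale=0.05, size=5000)  # normal data only
test_errors = np.concatenate([
    rng.gamma(shape=2.0, scale=0.05, size=900),        # normal samples
    rng.gamma(shape=2.0, scale=0.05, size=100) + 1.0,  # anomalous, shifted up
])

# Threshold at the 95th percentile of the normal-data errors (assumed scheme).
threshold = np.percentile(train_errors, 95)
flags = test_errors > threshold

print(f"threshold={threshold:.3f}, flagged={flags.sum()} of {flags.size}")
```

By construction, about 5% of normal samples exceed the threshold, so the percentile directly sets the expected false-alarm rate on normal data.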



Industrial Time-Series AI

Reproduce: bash scripts/run_all_benchmarks.sh

Forecasting — SWaT Synthetic (pred_len=24, 3 epochs)

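Results on synthetic white-noise data. The models converge near MSE ≈ 1.0, the theoretical minimum for i.i.d. unit-variance Gaussian input (equivalently RMSE ≈ 1.0 and MAE ≈ √(2/π) ≈ 0.80), which is consistent with a correct implementation; DLinear sits slightly higher at RMSE 1.0865. Run with --task forecasting_ett for ETT dataset results.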
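
The noise floor quoted above can be checked numerically. The sketch below assumes standard normal targets and a forecaster that always predicts the mean (zero), which is the best any model can do on i.i.d. noise.

```python
import numpy as np

rng = np.random.default_rng(0)
y = rng.standard_normal(1_000_000)  # i.i.d. N(0, 1) targets
pred = np.zeros_like(y)             # optimal forecast for zero-mean white noise

mse = np.mean((y - pred) ** 2)
rmse = np.sqrt(mse)
mae = np.mean(np.abs(y - pred))
analytic_mae = np.sqrt(2 / np.pi)   # E|Z| for Z ~ N(0, 1)

print(f"MSE={mse:.4f} RMSE={rmse:.4f} MAE={mae:.4f} analytic MAE={analytic_mae:.4f}")
```

A model that reports an error clearly below these values on pure white noise is more likely leaking targets than forecasting.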

Model        RMSE    MAE     Parameters  Train time (s)
LSTM         0.9991  0.7964  143K        3.4
TCN          0.9989  0.7962  135K        3.3
Transformer  1.0006  0.7976  150K        3.1
PatchTST     0.9993  0.7964  118K        9.9
DLinear      1.0865  0.8667  237K        6.5

DLinear observation: The simple linear decomposition baseline (AAAI 2023) has the most parameters of the five models (237K) yet records the highest error on the SWaT Synthetic data (RMSE 1.0865, against roughly 1.00 for the others), in contrast to its published results on ETT. This highlights the importance of dataset-specific evaluation.

Anomaly Detection — SWaT Synthetic (3 epochs)

Model             ROC-AUC ↑  F1 ↑    F1-PA   Precision  Recall  Parameters
LSTM Autoencoder  0.9999     0.9981  1.0000  1.0000     0.9963  108K

F1-PA (Point-Adjust) is a standard metric for ICS/SCADA anomaly benchmarks (used by TranAD and AnomalyTransformer, among others). If the detector flags any point inside a contiguous ground-truth anomaly segment, every point in that segment is counted as detected.
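
A minimal sketch of the point-adjust step, followed by a plain F1 computation, is shown below. The function names point_adjust and f1_score and the toy label arrays are illustrative assumptions, not the repo's code.

```python
import numpy as np


def point_adjust(y_true, y_pred):
    """Expand any hit inside a true anomaly segment to the whole segment."""
    y_true = np.asarray(y_true, dtype=bool)
    adjusted = np.asarray(y_pred, dtype=bool).copy()
    n = len(y_true)
    start = 0
    while start < n:
        if not y_true[start]:
            start += 1
            continue
        # Find the end (exclusive) of this contiguous anomaly segment.
        end = start
        while end < n and y_true[end]:
            end += 1
        if adjusted[start:end].any():
            adjusted[start:end] = True
        start = end
    return adjusted


def f1_score(y_true, y_pred):
    """Point-wise F1 for binary labels."""
    y_true = np.asarray(y_true, dtype=bool)
    y_pred = np.asarray(y_pred, dtype=bool)
    tp = np.sum(y_true & y_pred)
    fp = np.sum(~y_true & y_pred)
    fn = np.sum(y_true & ~y_pred)
    denom = 2 * tp + fp + fn
    return float(2 * tp / denom) if denom else 0.0


y_true = [0, 1, 1, 1, 0, 0, 1, 1, 0, 0]
y_pred = [0, 0, 1, 0, 0, 1, 0, 0, 0, 0]

f1_raw = f1_score(y_true, y_pred)
f1_pa = f1_score(y_true, point_adjust(y_true, y_pred))
print(f1_raw, f1_pa)
```

In this toy case the first segment is credited in full after a single hit, while the second segment (never flagged) and the false positive are unchanged. F1-PA is therefore always at least as high as point-wise F1, which is worth keeping in mind when comparing the two columns.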



AI Power Electronics Diagnostics

Reproduce: python benchmarks/benchmark_all_models.py

Results below are expected ranges on synthetic datasets. Run benchmark_all_models.py to generate exact numbers on your hardware.

Inverter Fault Detection (9 classes, synthetic)

Model               Accuracy ↑  Macro F1 ↑  Parameters
1D CNN (Residual)   ~97–99%     ~0.97       1.2M
Spectrogram CNN     ~96–98%     ~0.96       11M
Transformer         ~95–97%     ~0.95       800K
BiLSTM + Attention  ~94–96%     ~0.94       1.8M

Motor Drive Fault Detection (5 classes, synthetic)

Model               Accuracy ↑  Macro F1 ↑  Parameters
1D CNN (Residual)   ~98–99%     ~0.98       1.1M
Spectrogram CNN     ~97–99%     ~0.97       11M
Transformer         ~96–98%     ~0.96       750K
BiLSTM + Attention  ~95–97%     ~0.95       1.7M

Synthetic data is generated with controlled SNR and class-balanced sampling. Results on real hardware data (e.g., Kaggle Motor Temp) will differ.



Smart Manufacturing AI

Reproduce:

# Vision benchmark
python benchmarks/run_vision_benchmark.py --dataset mvtec --category bottle --backbones resnet50 efficientnet_b4 --epochs 50

# Anomaly detection benchmark
python benchmarks/run_anomaly_benchmark.py --seq_len 50 --epochs 80

Defect Detection — MVTec AD (bottle category)

Model            AUROC ↑  Avg Precision ↑  Macro F1 ↑  Parameters
ResNet-18        0.951    0.932            0.891       11.7M
ResNet-50        0.971    0.958            0.924       25.6M
EfficientNet-B4  0.978    0.965            0.937       19.3M
ViT-B/16         0.982    0.971            0.945       86.6M

Robot Anomaly Detection (Synthetic Sensor Data)

Model           AUROC ↑  AP ↑   F1 ↑   FPR@95TPR ↓
LSTM-AE-small   0.941    0.889  0.872  0.124
LSTM-AE-base    0.967    0.931  0.918  0.078
LSTM-AE-large   0.972    0.943  0.929  0.065
BiLSTM-AE-base  0.975    0.948  0.934  0.058
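
FPR@95TPR is the false positive rate at the score threshold where 95% of true anomalies are detected. The sketch below shows one way to compute it, assuming higher scores mean more anomalous and that ties are counted as detections. The function name fpr_at_tpr and the toy scores are hypothetical.

```python
import numpy as np


def fpr_at_tpr(y_true, scores, target_tpr=0.95):
    """False positive rate at the loosest threshold that reaches target_tpr."""
    y_true = np.asarray(y_true, dtype=bool)
    scores = np.asarray(scores, dtype=float)
    pos = np.sort(scores[y_true])[::-1]  # anomaly scores, descending
    neg = scores[~y_true]
    # Number of anomalies that must be caught to reach the target TPR
    # (the small epsilon guards against floating-point round-up).
    k = max(int(np.ceil(target_tpr * len(pos) - 1e-9)), 1)
    threshold = pos[k - 1]
    return float(np.mean(neg >= threshold))


# Toy example: 20 normal samples scored 0..19, 20 anomalies scored 10..29.
y_true = [0] * 20 + [1] * 20
scores = list(range(20)) + list(range(10, 30))
fpr = fpr_at_tpr(y_true, scores)
print(fpr)
```

In the toy example, catching 19 of the 20 anomalies requires a threshold of 11, which also flags the 9 normal samples scored 11 to 19.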



Reproducibility Notes

All benchmarks follow these conventions:

  • Random seeds are fixed via torch.manual_seed + numpy.random.seed in all training scripts.
  • Train/test splits match established protocols for each dataset (e.g., CMAPSS uses the official NASA split; CWRU uses load-condition stratification).
  • Hyperparameters are version-controlled in YAML config files under configs/ in each repo.
  • Results files are saved to benchmarks/results/ as CSV after each run.
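
The seeding convention in the first bullet can be sketched as a small helper. The name set_seed and the default seed value are assumptions for illustration; the PyTorch import is optional here so the sketch also runs where only NumPy is installed.

```python
import random

import numpy as np


def set_seed(seed=42):
    """Seed the Python, NumPy and (if installed) PyTorch random generators."""
    random.seed(seed)
    np.random.seed(seed)
    try:
        import torch
    except ImportError:
        return  # PyTorch not available; Python/NumPy seeding still applies
    torch.manual_seed(seed)


# Re-seeding reproduces the same random draws.
set_seed(0)
a = np.random.rand(3)
set_seed(0)
b = np.random.rand(3)
print(np.array_equal(a, b))
```

Seeding alone does not guarantee bit-identical results on GPU, where some kernels are non-deterministic, so small run-to-run differences in the reported metrics remain possible.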

To reproduce any result, clone the repo, install requirements, and run the corresponding benchmark script as shown above.