Published March 11, 2026 | Version v1
Dataset Open

Health Indicator Degradation Dataset


Description

To represent realistic problem configurations, the degradation trajectory scenarios were designed with the following considerations in mind. In general, turbofan engines degrade gradually over time, depending on operating conditions, usage, and the maintenance operations performed during their lifetime. The degradation speed may also vary with these conditions and from one engine component to another. For example, the high-pressure compressor degrades faster than the other components, as it is exposed to much higher temperatures.

In this setting, each degradation trajectory S is a multivariate time series of dimension 10 (the number of health indicators associated with the different engine components). Each component has its own degradation pattern and bounds (minimum and maximum allowed values). Three degradation speeds are considered: slow, normal, and fast. The speed of each component is drawn from a probability distribution over these three values and can switch from one speed to another at a specific frequency (e.g., every 100 time steps). Maintenance operations may also take place, each after a number of time steps drawn at random from the interval [200, 500]. A maintenance event partially restores the previous health state; the degree of recovery is controlled by a coefficient drawn at random from the interval [0.6, 0.8].
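The generation process described above can be illustrated with a minimal sketch for a single health indicator. It is not the authors' generator: the function name `simulate_indicator`, the per-step degradation rates, the speed probabilities, the starting value at the upper bound, and the recovery rule (restore a fraction c of the health lost since the last maintenance) are all hypothetical assumptions. Only the three speeds, the 100-step switching example, the [200, 500] maintenance interval, and the [0.6, 0.8] recovery range come from the description.

```python
from typing import Optional, Tuple

import numpy as np


def simulate_indicator(
    n_steps: int,
    bounds: Tuple[float, float],
    rates: Tuple[float, float, float],
    speed_probs: Tuple[float, float, float],
    change_every: int = 100,
    maint_interval: Tuple[int, int] = (200, 500),
    recovery: Tuple[float, float] = (0.6, 0.8),
    rng: Optional[np.random.Generator] = None,
) -> np.ndarray:
    """Hypothetical degradation trajectory for ONE health indicator.

    `rates` and `speed_probs` are indexed 0 = slow, 1 = normal, 2 = fast.
    """
    rng = np.random.default_rng() if rng is None else rng
    lo, hi = bounds
    h = hi          # assumption: start fully healthy at the upper bound
    reference = h   # assumption: maintenance recovers toward the last restored state

    def draw_gap() -> int:
        # Steps until the next maintenance event, uniform over the closed interval.
        return int(rng.integers(maint_interval[0], maint_interval[1] + 1))

    speed = int(rng.choice(3, p=speed_probs))
    next_maint = draw_gap()
    out = np.empty(n_steps)
    for t in range(n_steps):
        if t > 0 and t % change_every == 0:
            speed = int(rng.choice(3, p=speed_probs))  # periodic speed re-draw
        h = max(lo, h - rates[speed])                  # degrade, clipped at the lower bound
        if t == next_maint:
            c = rng.uniform(*recovery)                 # recovery coefficient in [0.6, 0.8)
            h = min(hi, h + c * (reference - h))       # recover a fraction c of the lost health
            reference = h
            next_maint = t + draw_gap()
        out[t] = h
    return out


rng = np.random.default_rng(0)
rates = (1e-4, 3e-4, 6e-4)   # hypothetical per-step losses for slow/normal/fast
probs = (0.3, 0.5, 0.2)      # hypothetical speed distribution
# Ten independent indicators stacked column-wise: shape (T, 10), like a trajectory S.
S = np.column_stack(
    [simulate_indicator(1000, (0.0, 1.0), rates, probs, rng=rng) for _ in range(10)]
)
```

In the real dataset each indicator has its own bounds and pattern, so a faithful generator would pass different `bounds`, `rates`, and `speed_probs` per column rather than reusing one set.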

The dataset is provided as a single CSV file whose header row gives the column names. The first column is the sequence ID. The file contains more than 500 sequences of varying lengths. The columns include:

  • 10 health indicators ('deg_CmpBst_s_mapEff_in', 'deg_CmpBst_s_mapWc_in', 'deg_CmpFan_s_mapEff_in', 'deg_CmpFan_s_mapWc_in', 'deg_CmpH_s_mapEff_in', 'deg_CmpH_s_mapWc_in', 'deg_TrbH_s_mapEff_in', 'deg_TrbH_s_mapWc_in', 'deg_TrbL_s_mapEff_in', 'deg_TrbL_s_mapWc_in');
  • the time step within each sequence;
  • 3 parameters used to generate the trajectories ('maintenance', 'speed_change', 'speed_strategy');
  • sensor measurements and operational conditions for each flight phase (Cruise, Takeoff, Climb1, Climb2).
    • The sensor variables are ('PHASE_DeckSMR__HPC_Tout', 'PHASE_DeckSMR__HP_Nmech', 'PHASE_DeckSMR__HPC_Tin', 'PHASE_DeckSMR__LPT_Tin', 'PHASE_DeckSMR__Fuel_flow', 'PHASE_DeckSMR__HPC_Pout_st', 'PHASE_DeckSMR__LP_Nmech'), where PHASE is one of the flight phases listed above.
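The column lists above follow a regular naming pattern, so they can be generated rather than typed by hand. The sketch below assumes the phase prefixes appear in the CSV exactly as spelled in the list (only 'Cruise_DeckSMR__' is confirmed by the column names quoted in this description); the constant names are hypothetical. The health indicators pair each of five components with an efficiency (Eff) and a flow (Wc) map modifier, as the column names suggest.

```python
from itertools import product

# Flight phases and per-phase sensor suffixes, as listed in the description.
PHASES = ["Cruise", "Takeoff", "Climb1", "Climb2"]
SENSORS = ["HPC_Tout", "HP_Nmech", "HPC_Tin", "LPT_Tin",
           "Fuel_flow", "HPC_Pout_st", "LP_Nmech"]

# Five components x two map modifiers (Eff, Wc) = 10 health indicators,
# in the same order as the column list above.
COMPONENTS = ["CmpBst", "CmpFan", "CmpH", "TrbH", "TrbL"]
HEALTH_INDICATORS = [f"deg_{c}_s_map{m}_in" for c in COMPONENTS for m in ("Eff", "Wc")]

# PHASE_DeckSMR__<sensor> for every phase/sensor pair: 4 x 7 = 28 columns.
SENSOR_COLUMNS = [f"{p}_DeckSMR__{s}" for p, s in product(PHASES, SENSORS)]
```

With these lists, `df[SENSOR_COLUMNS]` and `df[HEALTH_INDICATORS]` would select the model inputs and targets, raising a `KeyError` if any assumed spelling does not match the file.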

In the inverse problem introduced in the article, the goal is to predict the 10 health indicators from 28 sensor measurements: 7 sensor variables recorded in each of the 4 flight phases (4 × 7 = 28).
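One way to assemble the inputs and targets of this inverse problem is to select columns by pattern. The sketch below is an assumption-based illustration: the regular expressions and the helper name `inverse_problem_arrays` are hypothetical and rely on the naming scheme described above, and the sensor regex deliberately excludes the operational-condition columns (DTAMB, ALT, COMMAND, MACH, Convergence).

```python
from typing import Tuple

import numpy as np
import pandas as pd

# Hypothetical patterns derived from the documented column names.
HI_REGEX = r"^deg_.*_s_map(Eff|Wc)_in$"
SENSOR_REGEX = (
    r"^(Cruise|Takeoff|Climb1|Climb2)_DeckSMR__"
    r"(HPC_Tout|HP_Nmech|HPC_Tin|LPT_Tin|Fuel_flow|HPC_Pout_st|LP_Nmech)$"
)


def inverse_problem_arrays(df: pd.DataFrame) -> Tuple[np.ndarray, np.ndarray]:
    """Return (X, y): X holds the 28 sensor columns, y the 10 health indicators."""
    X = df.filter(regex=SENSOR_REGEX).to_numpy(dtype=float)
    y = df.filter(regex=HI_REGEX).to_numpy(dtype=float)
    return X, y
```

Applied to the full CSV, `X` would have shape (n_rows, 28) and `y` shape (n_rows, 10) if the column names match these patterns; checking both shapes is a cheap guard against spelling differences.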

The operational conditions ('Cruise_DeckSMR__DTAMB', 'Cruise_DeckSMR__ALT', 'Cruise_DeckSMR__COMMAND', 'Cruise_DeckSMR__MACH', 'Cruise_DeckSMR__Convergence') are held constant in this study, but they could vary in future work.
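Whether the operational conditions are indeed constant in a given copy of the data can be checked quickly. This is a minimal sketch; the helper name `non_constant_conditions` and the regex are hypothetical, and the regex would also match these suffixes under phases other than Cruise if such columns exist.

```python
from typing import List

import pandas as pd

# Operating-condition suffixes named in the description.
OP_COND_REGEX = r"_DeckSMR__(DTAMB|ALT|COMMAND|MACH|Convergence)$"


def non_constant_conditions(df: pd.DataFrame) -> List[str]:
    """Return operating-condition columns that take more than one distinct value."""
    conditions = df.filter(regex=OP_COND_REGEX)
    n_unique = conditions.nunique(dropna=False)
    return n_unique[n_unique > 1].index.tolist()
```

On the full CSV, an empty list would be consistent with the statement that these conditions are held constant.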

Use the following Python code to reorganize the data into sequences:

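import numpy as np
import pandas as pd

file_name = "2000_aligned_and_clean_indexfalse.csv"
df = pd.read_csv(file_name)

# Group rows by sequence and convert each group (all columns except the
# sequence ID) into a 2-D array of shape (sequence_length, n_columns).
values = (
    df
    .groupby("sequence_id")[list(df.columns[1:])]
    .apply(lambda x: x.to_numpy())
    .to_list()
)

# Store the variable-length sequences in a 1-D object array. Filling it
# element by element keeps one entry per sequence even if all lengths match.
sequences = np.empty(len(values), dtype=object)
for i, seq in enumerate(values):
    sequences[i] = seq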
 

Files

2000_aligned_and_clean_indexfalse.csv (1.2 GB)
md5:ab0b9632b65e08a419ca3e29e1175d61
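Given the file size, it is worth verifying the download against the listed MD5 checksum. The sketch below uses only the standard library and reads the file in chunks so the whole 1.2 GB never has to sit in memory; the helper name `md5_of_file` and the 1 MiB default chunk size are illustrative choices.

```python
import hashlib
from pathlib import Path
from typing import Union

EXPECTED_MD5 = "ab0b9632b65e08a419ca3e29e1175d61"  # checksum listed for the CSV file


def md5_of_file(path: Union[str, Path], chunk_size: int = 1 << 20) -> str:
    """Compute the MD5 hex digest of a file, reading it in fixed-size chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as fh:
        # iter() with a sentinel keeps reading until read() returns b"" (EOF).
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()
```

For example, `md5_of_file("2000_aligned_and_clean_indexfalse.csv") == EXPECTED_MD5` should be `True` for an intact copy.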

Additional details

Additional titles

Alternative title (English)
Sequence-Based and multivariate HI Estimation Dataset