ENSO Index Skill - Hindcast Verification

How well the forecast models have predicted El Niño and La Niña in the past, checked against what actually happened, for every start month and lead time over 1993-2026.

What am I looking at?

Each heatmap below grades how skilful a forecast model has been at predicting ENSO, by replaying its past forecasts (hindcasts) and comparing them against what was actually observed. Higher, redder numbers mean more reliable; pale, lower numbers mean less reliable.

That grade is the skill score: 1.0 is perfect, about 0.6 or higher is useful, and near 0 means no skill. Each heatmap shows it for every start month and lead time.

New here? You do not need the terms below to read the maps: redder and higher means more reliable. They are explained in plain words on the Guide page.

SST = sea-surface temperature, the ocean signal that defines El Niño and La Niña.

Niño 3.4 / Niño 3 / Niño 4 = standard boxes in the tropical Pacific where that SST is averaged. Niño 3.4 is the headline ENSO index; Niño 3 also carries the rainfall signal we use to flag extreme events.

Lead time (L1-L6, the rows) = how many months ahead the forecast looks. L1 = next month, L6 = six months ahead. Skill naturally fades at longer leads.

Start month (the columns) = the calendar month the forecast was launched from. The dip in skill around boreal spring is the well-known "spring predictability barrier."

Correlation (shown from 0 to 1) = how closely the model's forecasts tracked reality across all hindcast years. 1.0 = perfect, about 0.6+ = useful, near 0 = no skill. For example, 0.85 at L3 means the model reliably anticipated ENSO three months ahead from that start month.
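As a rough sketch of how each heatmap cell could be computed, the Python below takes the Pearson correlation across hindcast years separately for every start month and lead. The array layout (years × 12 start months × leads, with the observed index already matched to each forecast's target month) and the name `skill_matrix` are assumptions for illustration, not the site's actual code.

```python
import numpy as np


def skill_matrix(hindcast_mean: np.ndarray, observed: np.ndarray) -> np.ndarray:
    """Correlation across hindcast years for each (lead, start month) cell.

    Assumed layout (hypothetical): both arrays have shape (n_years, 12, n_leads),
    where axis 1 is the start month (0 = January) and axis 2 the lead
    (index 0 = L1). `observed` holds the observed index at each forecast's
    target month. Returns shape (n_leads, 12): rows = leads, columns = start months.
    """
    # Remove the mean over years so each cell is compared as anomalies.
    f = hindcast_mean - hindcast_mean.mean(axis=0)
    o = observed - observed.mean(axis=0)
    # Pearson correlation over the year axis, computed for all cells at once.
    cov = (f * o).sum(axis=0)
    denom = np.sqrt((f ** 2).sum(axis=0) * (o ** 2).sum(axis=0))
    return (cov / denom).T  # (12, n_leads) -> (n_leads, 12), like the heatmap


# Usage with synthetic data: 30 "years", 12 start months, 6 leads.
rng = np.random.default_rng(0)
obs = rng.normal(size=(30, 12, 6))
forecast = obs + rng.normal(scale=0.5, size=obs.shape)
print(skill_matrix(forecast, obs).round(2))
```

Transposing at the end puts leads on the rows and start months on the columns, matching how the heatmaps are laid out.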

Colour scale: 0.0 to 1.0, temporal correlation (hindcast ensemble mean vs observations).


The actual track - hindcast vs observations

A correlation number is abstract, so here is the track behind it: observed (black) versus the hindcast ensemble mean at 3-month lead (orange), 1993-2026. The model catches every major El Niño and La Niña. Switch the model in the toolbar above; pick a region below.
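The track pairs each observed month with the forecast launched three months earlier. A minimal sketch of that re-indexing follows, assuming one row of ensemble-mean values per consecutive monthly start date and the "L1 = next month" convention, so that a forecast started in month t at lead L verifies in month t + L. `lead_track` is a hypothetical helper, not part of the site.

```python
import numpy as np


def lead_track(hindcast_mean: np.ndarray, lead: int = 3) -> np.ndarray:
    """Re-index lead-`lead` ensemble means by the month they verify.

    hindcast_mean: shape (n_starts, n_leads); row t is start date t
        (consecutive months, oldest first) and column j holds lead L(j+1).
    Returns shape (n_starts,): element t is the forecast verifying in month t,
    launched `lead` months earlier. The first `lead` months have no such
    forecast and stay NaN; forecasts verifying after the last start are dropped.
    """
    n_starts = hindcast_mean.shape[0]
    track = np.full(n_starts, np.nan)
    # Start month t at lead L verifies in month t + L (L1 = next month).
    track[lead:] = hindcast_mean[: n_starts - lead, lead - 1]
    return track
```

The returned series lines up month by month with the observed index, so the two can be plotted on the same time axis.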

Observations: SST indices from COBE-SST 2 (Hirahara et al. 2014); the Niño 3 rainfall index from GPCP (Adler et al. 2003). Full citations with DOIs are on the Methodology and data sources page.

Intensity discrimination - does the model get the strength right?

Correlation shows that the model tracks the index, but when acting on a forecast what matters is the intensity class. Only a handful of strong or extreme events are on record, so a separate reliability score for each class would rest on too few cases to mean anything. Instead we pool events into two groups, moderate versus strong + extreme, and ask: is the model's forecast probability of the stronger group higher when the event truly was stronger? A large gap between the paired bars means yes. Results are pooled over all leads and start months.
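One plausible reading of the paired bars is the mean forecast probability of strong + extreme, computed separately for cases observed as strong + extreme and for cases observed as moderate. The sketch below assumes that reading; the function name and its inputs are hypothetical.

```python
import numpy as np


def discrimination_gap(prob_strong, observed_strong) -> tuple[float, float, float]:
    """Mean forecast probability of strong + extreme, split by observed outcome.

    prob_strong: forecast probability (0 to 1) of the strong + extreme group,
        one value per hindcast case (all leads and start months pooled).
    observed_strong: booleans, True where the observed event was strong or
        extreme, False where it was moderate.
    Returns (mean prob when strong, mean prob when moderate, gap).
    """
    p = np.asarray(prob_strong, dtype=float)
    y = np.asarray(observed_strong, dtype=bool)
    when_strong = float(p[y].mean())     # bar for truly strong/extreme events
    when_moderate = float(p[~y].mean())  # bar for truly moderate events
    return when_strong, when_moderate, when_strong - when_moderate


print(discrimination_gap([0.8, 0.6, 0.2, 0.4], [True, True, False, False]))
```

A gap near zero would mean the forecast probabilities do not separate stronger from weaker events; a large positive gap means they do.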


How to read this

Each cell is the correlation, across hindcast years, between the ensemble-mean Niño index and the observed index for that start month (column) and lead time (row).
Higher (red) = the model reliably tracks observed ENSO at that start/lead. Skill drops at longer leads and across the boreal-spring predictability barrier.
Forecast probabilities elsewhere on the site use the raw ensemble fraction in each ENSO class, with each member bias- and amplitude-corrected to the hindcast (z = (anomaly − μ_hindcast) / σ_hindcast); no further statistical calibration is applied.
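The member correction and the raw ensemble fraction can be sketched as follows. The class edges and labels are hypothetical placeholders in standardised units, and μ_hindcast and σ_hindcast are assumed to be precomputed for the same start month and lead; this page does not state the site's actual class thresholds.

```python
import numpy as np


def class_fractions(member_anoms, mu_hindcast, sigma_hindcast, edges, labels):
    """Raw ensemble fraction per class after per-member standardisation.

    member_anoms: raw model anomalies, one per ensemble member.
    mu_hindcast, sigma_hindcast: hindcast mean and standard deviation
        (assumed precomputed for this start month and lead).
    edges: increasing class boundaries in standardised units (hypothetical).
    labels: len(edges) + 1 class names, lowest class first (hypothetical).
    """
    # Bias and amplitude correction: z = (anomaly - mu_hindcast) / sigma_hindcast
    z = (np.asarray(member_anoms, dtype=float) - mu_hindcast) / sigma_hindcast
    # Class index 0 is below edges[0]; len(edges) is at or above edges[-1].
    idx = np.digitize(z, edges)
    counts = np.bincount(idx, minlength=len(edges) + 1)
    # No calibration: the probability is simply the fraction of members per class.
    return dict(zip(labels, counts / z.size))


print(class_fractions([-1.0, 0.0, 1.0, 2.0], 0.0, 0.5,
                      edges=[-1.0, 1.0], labels=["La Niña", "Neutral", "El Niño"]))
```

Because the probabilities are plain member counts, a small ensemble can only produce coarse values (multiples of 1/n_members).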