Machine Learning

Last modified by s2s_wiki on 2020/05/17 10:09

Machine Learning / Artificial Intelligence for S2S prediction

There is currently a lot of excitement in the weather and climate communities about exploring the potential of data-driven approaches based on Artificial Intelligence/Machine Learning/Deep Learning for S2S prediction through, for instance, improved parameterization, improved calibration and multi-model calibration, extreme event attribution, and verification. The publicly available S2S database, which contains a considerable amount of data (re-forecasts and real-time forecasts from 11 operational centres), represents an ideal testbed for these data-driven methods. The SubX database provides another such opportunity.

Potential applications:
 1. Improved data assimilation (e.g. better quality control of observations)
 2. Improved parameterization (e.g. radiative schemes)
 3. Improved post-processing (model calibration, bias-correction, multi-ensemble combination...)
 4. Predictability diagnostics (e.g. teleconnections)
 5. S2S event attribution (e.g. origins of extreme events)
 6. Empirical forecasts

Links on activities of ML community 

Ongoing Community Research on Machine Learning / Artificial Intelligence for S2S prediction
1. Scripps Institution of Oceanography (From Dr. Peter Gibson. Updated on April 1, 2020)
 The project explores the potential for modern machine learning tools to improve seasonal prediction skill of precipitation over the Western US. Modern machine learning approaches are 'data hungry' while observations are data limited (relatively short in length for the purposes of seasonal forecasting). To circumvent this issue, we train a variety of machine learning tools on perturbed initial condition climate model ensembles that span several thousands of years, then use these 'learnt' teleconnections to make seasonal predictions. We are testing a hierarchy of machine learning approaches from simple to complex: simple logistic regression, LASSO, Random Forests, Gradient Boosted decision trees, and convolutional neural networks. This project is a collaboration between researchers at Scripps CW3E and JPL, and funded by the California Department of Water Resources.  
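The train-on-model-ensemble, predict-on-observations idea can be illustrated with the simplest member of the hierarchy, logistic regression. The following is a minimal sketch, not the project's code: the data are synthetic, the single predictor stands in for a hypothetical SST index, and the assumed teleconnection strength (1.5) and all function names are illustrative assumptions.

```python
import math
import random


def sigmoid(z):
    """Numerically safe logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def make_samples(n, rng):
    """Synthetic (x, y) pairs: x is a hypothetical SST index, y is 1 for a wet season."""
    samples = []
    for _ in range(n):
        x = rng.gauss(0.0, 1.0)
        p_wet = sigmoid(1.5 * x)  # assumed 'true' teleconnection strength
        y = 1 if rng.random() < p_wet else 0
        samples.append((x, y))
    return samples


def fit_logistic(samples, lr=0.5, epochs=200):
    """Fit P(y=1 | x) = sigmoid(w * x + b) by batch gradient descent."""
    w, b = 0.0, 0.0
    n = len(samples)
    for _ in range(epochs):
        grad_w = grad_b = 0.0
        for x, y in samples:
            err = sigmoid(w * x + b) - y
            grad_w += err * x
            grad_b += err
        w -= lr * grad_w / n
        b -= lr * grad_b / n
    return w, b


def accuracy(samples, w, b):
    """Fraction of samples whose wet/dry class is predicted correctly."""
    hits = sum((sigmoid(w * x + b) >= 0.5) == (y == 1) for x, y in samples)
    return hits / len(samples)


rng = random.Random(0)
ensemble = make_samples(5000, rng)  # stand-in for thousands of model years
observed = make_samples(40, rng)    # stand-in for a short observational record

w, b = fit_logistic(ensemble)       # learn the relationship from the large sample
obs_accuracy = accuracy(observed, w, b)  # then apply it to the short record
```

The large synthetic sample constrains the coefficient far better than 40 "observed" seasons could. In practice this only helps if the climate model represents the real teleconnection well, which the sketch simply assumes.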

2. Australian Bureau of Meteorology (From Catherine de Burgh-Day with Oscar Alves and Debbie Hudson. Updated on April 2, 2020)

We are at the early stages of developing an ML-based vegetation model which uses outputs of the Bureau's seasonal prediction system, ACCESS-S. The purpose of this work is twofold:

  • Investigate the possibility of making predictions of vegetation characteristics in the coming weeks and seasons using model outputs as predictors. Forecasts of vegetation could be useful for a number of sectors, including fire agencies and agriculture.
  • Attempt to use the vegetation model we develop to periodically update the vegetation ancillary file used in model runs. Currently ACCESS-S1 uses a static vegetation file. We plan to investigate what skill gains could be obtained from a more dynamic representation of vegetation, and then to try updating the vegetation ancillary of the model every N timesteps by passing it through our vegetation model, along with the latest model parameters.

We intend to start by trying an LSTM neural network for the vegetation model, potentially also including some convolutional layers. We will, however, be investigating what is most effective as we go. Initially we will train on the ACCESS-S1 hindcast; if a larger training set is needed, we may train on a larger dataset and then use transfer learning techniques to adapt the model to ACCESS-S.

3. APEC Climate Center (From Dr. Hyung Jin Kim with Dr. Uran Chung and Dr. Kyungwon Park. Updated on April 3, 2020)
Our project is to develop a deep learning ensemble technique to improve subseasonal forecasts over the Korean Peninsula. Deep learning is now recognized as a technique for improving climate forecasting, especially subseasonal climate prediction; however, its application is limited because the available subseasonal forecast data are too few to train and test deep learning models. We are therefore testing ensemble techniques for constructing sufficient subseasonal prediction data for the Korean Peninsula from climate models, and applying machine learning and various deep learning algorithms (e.g. SVM, RF, RNN, LSTM, and Convolutional LSTM) to the multi-model-ensemble-based subseasonal prediction data, to improve forecasts of daily maximum and minimum temperature and precipitation over the Korean Peninsula.
4. NOAA (ESRL/PSD) (From Dr. Michael Scheuerer. Updated on April 3, 2020)
'Using artificial neural networks for generating probabilistic subseasonal precipitation forecasts over California'
We have not obtained our data from the publicly available S2S database. For this study, we have used:
  • Subseasonal retrospective forecasts by the IFS ensemble, Cycle 43r3, that we retrieved from the ECMWF MARS archive system
  • The daily accumulated PRISM precipitation data set, obtained from 
  • ERA5 reanalysis data, obtained from the Copernicus Climate Change Service

ML/AI methodology used:
In our work (paper has been submitted recently) we propose two new approaches for statistical post-processing of subseasonal ensemble forecasts:

  • The first approach uses an artificial neural network to translate subseasonal IFS precipitation forecasts into reliable probabilistic forecasts of week-2, week-3, and week-4 precipitation accumulations over California
  • The second approach uses a convolutional neural network to link large-scale predictors (geopotential height and total column water over the north-eastern Pacific) calculated from ERA5 analyses to precipitation amounts over California; these relationships are then used to derive week-2, week-3, and week-4 precipitation forecasts from subseasonal IFS forecasts of these large-scale weather variables
5. Colorado State University (From Prof. Elizabeth A. Barnes. Updated on April 3, 2020)
I have multiple members of my group using ML for S2S prediction. Specifically, we are focused on interpretable neural networks - so the goal is to not only make better empirical predictions, but to also understand where the predictability is coming from. We are also working on using ML to leverage climate model information to improve observational predictions.
6. Climate Prediction Center, NOAA/NWS/NCEP (From Dr. Yun Fan with Dr. Jon Gottschalck. Updated on April 4, 2020)
Benefiting from recent advances in machine learning, such as more flexible and capable algorithms and the availability of large datasets, we designed neural network setups that enable us not only to explore nonlinear effects in big data, but also to extract more sophisticated patterns and co-variability relationships hidden in the multi-dimensional predictors and predictands. These learned, more complicated relationships and high-level statistical information are then used to correct the original bias-corrected NOAA NCEP Climate Forecast System (CFSv2) Week 3-4 precipitation and 2-meter temperature forecasts. The results show that neural network techniques can, to some extent, improve Week 3-4 forecast accuracy and greatly increase efficiency over the traditional pointwise multiple linear regression methods. The dataset currently used is the NOAA NCEP CFSv2. In the near future, we will work on the NCEP GEFS, ECMWF, CMC, etc. real-time data sets available here at NOAA CPC.
The following link has our short NN paper (on pages 59-63):
A paper submitted to the AMS Journal: WAF (under revision):
    Yun Fan, Vladimir Krasnopolsky, Huug van den Dool, Chung-Yu Wu and Jon Gottschalck; 2020: Using Artificial Neural Networks to Improve CFS Week 3-4 Precipitation and 2 Meter Air Temperature Forecasts. 
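For context, the pointwise linear regression baseline mentioned above fits an independent correction at every grid point. The sketch below is an illustrative simplification, not the CPC implementation: it uses a single predictor (the raw forecast) per grid point rather than multiple predictors, and the function names and toy numbers are assumptions made up for the example.

```python
def fit_point(forecasts, observations):
    """Least-squares fit obs ~ a + b * fcst for a single grid point."""
    n = len(forecasts)
    mean_f = sum(forecasts) / n
    mean_o = sum(observations) / n
    var_f = sum((f - mean_f) ** 2 for f in forecasts)
    cov_fo = sum((f - mean_f) * (o - mean_o)
                 for f, o in zip(forecasts, observations))
    b = cov_fo / var_f if var_f > 0 else 0.0
    a = mean_o - b * mean_f
    return a, b


def fit_grid(hindcasts, analyses):
    """Fit separate coefficients at each grid point.

    hindcasts[t][i] is the forecast for date t at grid point i;
    analyses[t][i] is the verifying value.
    """
    n_points = len(hindcasts[0])
    coeffs = []
    for i in range(n_points):
        fcst_series = [row[i] for row in hindcasts]
        obs_series = [row[i] for row in analyses]
        coeffs.append(fit_point(fcst_series, obs_series))
    return coeffs


def correct(forecast, coeffs):
    """Apply the per-point linear correction to a new forecast field."""
    return [a + b * f for f, (a, b) in zip(forecast, coeffs)]


# Toy data: 4 hindcast dates, 2 grid points (made-up numbers).
hindcasts = [[1.0, 10.0], [2.0, 12.0], [3.0, 14.0], [4.0, 16.0]]
analyses = [[3.0, 5.0], [5.0, 6.0], [7.0, 7.0], [9.0, 8.0]]

coeffs = fit_grid(hindcasts, analyses)
corrected = correct([2.5, 13.0], coeffs)
```

Because every grid point is fitted in isolation, this approach cannot use spatial patterns shared across points, which is one motivation for a neural network that sees the whole field at once.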
7. Royal Netherlands Meteorological Institute (KNMI) and the Institute for Environmental Studies at the Vrije Universiteit Amsterdam (IVM) (From Chiem van Straaten. Updated on April 6, 2020)
At the Royal Netherlands Meteorological Institute (KNMI) and the Institute for Environmental Studies at the Vrije Universiteit Amsterdam (IVM) we run a research project called ‘Improvement of sub-seasonal probabilistic forecasts of European high-impact weather events using machine learning techniques’. The project uses ML for post-processing and diagnostics (mainly dimension reduction and learning connections).
We evaluate whether probabilistic forecasts at the sub-seasonal timescale contain skill for surface variables in Europe (e.g. 2-meter temperature), and how this depends on scale, location and extremity. Then, for events in which some predictability is found (for hot extremes predictability is expected), we try to find their physical precursors in other variables. Ridge regression and unsupervised clustering are used for dimension reduction in SSTs, geopotential height and more.
Lastly, we combine the information on observed driving factors with information on shortcomings of the ensemble prediction systems (e.g. propagation of waves from the tropics to the mid-latitudes) to post-process the forecasts. We have experience with RFs and CNNs for post-processing at shorter timescales. Regarding data: forecast evaluation was done on ECMWF cycle 45r1, precursors are currently searched for in ERA5, and we might apply our post-processing to the EPSs in the S2S database.
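As a reminder of what ridge regression does when screening candidate precursors, here is a minimal sketch under stated assumptions. It is not the KNMI/IVM code: the predictors are assumed to be anomalies (so no intercept is fitted), the three candidate indices and the target are made-up numbers, and `solve` and `ridge_fit` are hypothetical helper names.

```python
def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - s) / M[i][i]
    return x


def ridge_fit(X, y, lam):
    """Return w minimising ||X w - y||^2 + lam * ||w||^2 (no intercept)."""
    p = len(X[0])
    # Normal equations: (X^T X + lam * I) w = X^T y
    gram = [[sum(row[i] * row[j] for row in X) + (lam if i == j else 0.0)
             for j in range(p)] for i in range(p)]
    rhs = [sum(row[i] * t for row, t in zip(X, y)) for i in range(p)]
    return solve(gram, rhs)


# Toy data: 4 samples of 3 candidate precursor indices (made-up anomalies).
X = [[1.0, 0.0, 0.0],
     [0.0, 1.0, 0.0],
     [0.0, 0.0, 1.0],
     [1.0, 1.0, 1.0]]
# Target built as 2 * index0 - 1 * index1; index2 is irrelevant.
y = [2.0, -1.0, 0.0, 1.0]

w_ols = ridge_fit(X, y, lam=0.0)    # ordinary least squares
w_ridge = ridge_fit(X, y, lam=1.0)  # penalised, coefficients shrink
```

The penalty `lam` shrinks the coefficient vector towards zero, which stabilises the fit when candidate precursors are many and correlated. How to choose `lam` (for example by cross-validation) is left out of this sketch.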
8. National Center of Scientific Research “Demokritos” (NCSRD) (From Dr. Athanasios Sfetsos. Updated on April 9, 2020)
Generic Title: implementing a Deep Learning approach for spatial and time error correction of S2S simulation data over Greece
The current work of the National Center of Scientific Research “Demokritos” (NCSRD) on Machine Learning (ML) and Subseasonal to Seasonal (S2S) prediction is based on the temporal and spatial enhancement of S2S predictions with Deep Learning approaches. More specifically, NCSRD locally produces an S2S prediction for Greece (at a very high spatial resolution of 5x5 km²) downscaled from a Europe-wide domain (at 20x20 km² grid resolution). The simulations are forced by the Climate Forecast System (CFS) model from the National Centers for Environmental Prediction (NCEP), in addition to existing datasets from the S2S database. To correct the error of the simulation results effectively, we are testing a deep learning approach that combines Convolutional Neural Network (CNN) and Recurrent Neural Network (RNN) architectures for the space and time domains respectively, focusing on Greece, with the aim of enhancing the accuracy and predictability of longer S2S simulations.
9. ETH Zurich (From Prof. Daniela Domeisen. Updated on April 13, 2020)
Our ongoing project is a collaboration between ETH and the Swiss Data Science Center (SDSC), exploring the subseasonal predictability of stratospheric extreme events using data science methods.
The stratosphere, the layer of the atmosphere at about 12-50 km above the Earth’s surface, provides increased predictability for Europe after extreme stratospheric events known as Sudden Stratospheric Warming (SSW) events. These events can provide skill over Europe for several weeks to months, with persistently colder-than-usual weather over Northern and central Europe. SSW events themselves can currently be predicted only several days in advance. An extended prediction of SSW events would therefore significantly benefit forecasts at the surface, which makes it crucial to understand the predictability of the stratosphere itself.
The main objective of this project is to extract novel insights from reanalysis data and the S2S prediction database using data science tools. A first step will be an improved classification of stratospheric events, allowing for a flexible definition that includes the predictability aspects of these events. For instance, we are building new representations of the polar vortex using non-linear dimension reduction techniques that can later be used in unsupervised clustering algorithms. In a second step, this project aims to classify remote predictors of long-term weather variability. In particular, known predictors for stratospheric and tropospheric variability will be evaluated using data science methods, and possible new predictors will be identified. This knowledge is expected to lead to improved predictability of the weather over Europe on weekly to monthly timescales.
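The clustering step on a low-dimensional representation can be sketched with plain k-means. This is an illustration only and not the ETH/SDSC method: the two-dimensional points stand in for hypothetical reduced coordinates of polar vortex states, the numbers are made up, and k-means is swapped in as one common unsupervised clustering algorithm.

```python
def dist2(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def init_centroids(points, k):
    """Deterministic farthest-point initialisation."""
    centroids = [points[0]]
    while len(centroids) < k:
        far = max(points, key=lambda p: min(dist2(p, c) for c in centroids))
        centroids.append(far)
    return centroids


def kmeans(points, k, iters=20):
    """Plain k-means: alternate assignment and centroid update."""
    centroids = init_centroids(points, k)
    labels = [0] * len(points)
    for _ in range(iters):
        # Assign each point to its nearest centroid.
        labels = [min(range(k), key=lambda j: dist2(p, centroids[j]))
                  for p in points]
        # Recompute each centroid as the mean of its members.
        new_centroids = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                dim = len(members[0])
                new_centroids.append(tuple(
                    sum(m[d] for m in members) / len(members)
                    for d in range(dim)))
            else:
                new_centroids.append(centroids[j])  # keep an empty cluster's centroid
        if new_centroids == centroids:
            break  # converged
        centroids = new_centroids
    return labels, centroids


# Hypothetical 2-D reduced coordinates of six vortex states (made-up numbers).
states = [(0.1, 0.0), (0.0, 0.2), (-0.1, 0.1),
          (3.0, 3.1), (3.2, 2.9), (2.9, 3.0)]
labels, centroids = kmeans(states, k=2)
```

With well-separated groups like these, the first three states fall in one cluster and the last three in the other. Real vortex states are unlikely to separate so cleanly, so the number of clusters and the choice of representation would need careful evaluation.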
10. ECMWF (From Dr. Michel Rixen. Updated on April 13, 2020)
Machine learning seminars:
Created by Administrator on 2020/04/01 10:19
This wiki is licensed under a Creative Commons 2.0 license