INTRO TO MACHINE LEARNING PROJECTS: CLIMATE DATA


Background and summary: Climate data provide a good playground for experimenting with time-series models that predict the value of a given weather metric for the next day.

There are many possible approaches to this data. Single index models have been successful in the field, but you are not required to use them here. Recurrent neural networks are another approach that seems to work well with this data. The papers listed at the end of this document may help you think about how to work with it.

While the goal listed below is the most natural task, many other things can be done with this data, and these would also qualify as a project in this class. For example, we can predict tomorrow’s temperature given the temperature values from the last few days. If you have any other ideas for projects related to this data, we would be happy to consider them as well.
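As a concrete illustration of the temperature example, here is a minimal sketch that turns a daily series into "last few days" features and fits an ordinary least-squares model. The function names (make_lagged, fit_linear, predict_next), the choice of three lags, and the synthetic temperature series are all assumptions made for illustration; they are not part of the provided dataset or a recommended method.

```python
import numpy as np

def make_lagged(series, n_lags):
    """Return (X, y): each row of X holds n_lags consecutive values,
    and y holds the value on the day that follows them."""
    series = np.asarray(series, dtype=float)
    X = np.column_stack(
        [series[i : len(series) - n_lags + i] for i in range(n_lags)]
    )
    y = series[n_lags:]
    return X, y

def fit_linear(X, y):
    """Least-squares fit of y on the lag columns plus an intercept."""
    A = np.column_stack([X, np.ones(len(X))])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coef  # lag weights first, intercept last

def predict_next(series, coef):
    """Predict the value following the end of the series."""
    n_lags = len(coef) - 1
    last = np.asarray(series[-n_lags:], dtype=float)
    return float(last @ coef[:-1] + coef[-1])

# Synthetic stand-in for one year of daily mean temperatures (Fahrenheit):
# a seasonal cycle plus noise. Replace with a real station series.
rng = np.random.default_rng(0)
days = np.arange(365)
temps = 55.0 + 20.0 * np.sin(2.0 * np.pi * days / 365.0) + rng.normal(0.0, 3.0, 365)

X, y = make_lagged(temps, n_lags=3)
coef = fit_linear(X, y)
print("predicted next-day temperature:", predict_next(temps, coef))
```

A linear model on lagged values is only a baseline; the same (X, y) pairs can be fed to any other regressor you want to try.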

Goal: Given a single daily element (a metric such as temperature or precipitation), your goal is to predict either the value of that metric on the next day or the probability that the metric exceeds a predetermined threshold on the next day.
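The probability version of the goal can be sketched as a binary classification problem: pair each day's value with a 0/1 label saying whether the next day exceeded the threshold, then fit a model that outputs a probability. The sketch below uses plain logistic regression fitted by gradient descent as a simple baseline; it is not one of the single index or recurrent models mentioned above. The function names, the toy series, the threshold of 1.0, and the learning-rate settings are assumptions for illustration only.

```python
import numpy as np

def make_exceedance_pairs(values, threshold):
    """Pair each day's value with a label: 1.0 if the NEXT day's value
    is above the threshold, else 0.0."""
    values = np.asarray(values, dtype=float)
    x = values[:-1]
    y = (values[1:] > threshold).astype(float)
    return x, y

def fit_logistic(x, y, lr=0.1, n_steps=5000):
    """Fit P(y=1 | x) = sigmoid(w * z + b) by gradient descent on the
    mean log-loss, where z is x standardized to zero mean, unit variance."""
    mu, sigma = x.mean(), x.std() + 1e-12
    z = (x - mu) / sigma
    w, b = 0.0, 0.0
    for _ in range(n_steps):
        p = 1.0 / (1.0 + np.exp(-(w * z + b)))
        w -= lr * np.mean((p - y) * z)  # gradient with respect to w
        b -= lr * np.mean(p - y)        # gradient with respect to b
    return w, b, mu, sigma

def predict_proba(x_new, params):
    """Probability that the next day exceeds the threshold."""
    w, b, mu, sigma = params
    z = (np.asarray(x_new, dtype=float) - mu) / sigma
    return 1.0 / (1.0 + np.exp(-(w * z + b)))

# Toy daily series with wet and dry spells (made-up numbers, arbitrary units).
toy = [0, 0, 0, 5, 6, 5, 0, 0, 0, 7, 6, 5, 0, 0, 4, 5, 6, 0, 0, 0]
x, y = make_exceedance_pairs(toy, threshold=1.0)
params = fit_logistic(x, y)
print("P(exceed tomorrow | today = 0):", float(predict_proba(0.0, params)))
print("P(exceed tomorrow | today = 6):", float(predict_proba(6.0, params)))
```

Predicted probabilities like these are usually evaluated on held-out days rather than on the days used for fitting.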

Input data: The following describes the Global Surface Summary of the Day (GSOD) product produced by the National Climatic Data Center (NCDC) in Asheville, NC. The daily summaries are built from the Integrated Surface Data (ISD), which includes global data obtained from the USAF Climatology Center, located in the Federal Climate Complex with NCDC. The online data files begin in 1929 and are now at the Version 7 software level.

The daily elements included in the dataset (as available from each station) are:
1. Mean temperature (.1 Fahrenheit)
2. Mean dew point (.1 Fahrenheit)
3. Mean sea level pressure (.1 mb)
4. Mean station pressure (.1 mb)
5. Mean visibility (.1 miles)
6. Mean wind speed (.1 knots)
7. Maximum sustained wind speed (.1 knots)
8. Maximum wind gust (.1 knots)
9. Maximum temperature (.1 Fahrenheit)
10. Minimum temperature (.1 Fahrenheit)
11. Precipitation amount (.01 inches)
12. Snow depth (.1 inches)
13. Indicator for occurrence of: Fog, Rain or Drizzle, Snow or Ice Pellets, Hail, Thunder, Tornado/Funnel Cloud
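The elements above are listed in imperial and nautical units, so you may want to convert them to metric units before modeling or plotting. The helpers below are a small sketch of the standard conversions. They assume each value is stored in the unit named in the list, with the leading figure (.1 or .01) read as the reporting precision; check that assumption against the files you actually download. The function names are hypothetical.

```python
KNOT_TO_MS = 1852.0 / 3600.0  # 1 knot = 1 nautical mile (1852 m) per hour
INCH_TO_MM = 25.4             # exact by definition
MILE_TO_KM = 1.609344         # statute mile, exact by definition

def fahrenheit_to_celsius(deg_f):
    """Temperature and dew point: Fahrenheit to Celsius."""
    return (deg_f - 32.0) * 5.0 / 9.0

def knots_to_ms(knots):
    """Wind speed and gusts: knots to metres per second."""
    return knots * KNOT_TO_MS

def inches_to_mm(inches):
    """Precipitation and snow depth: inches to millimetres."""
    return inches * INCH_TO_MM

def miles_to_km(miles):
    """Visibility: statute miles to kilometres."""
    return miles * MILE_TO_KM

# Pressure needs no conversion: 1 mb equals 1 hPa.
print(fahrenheit_to_celsius(50.0), knots_to_ms(10.0), inches_to_mm(0.5))
```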

For the provided dataset, we preprocessed precipitation data from 2007 to 2017, which is well suited to developing models that predict the probability that tomorrow’s precipitation exceeds a threshold, given the precipitation from the previous day. Feel free to download more data from GSOD if you deem it necessary or want to work with another element. The data set can be downloaded here: DATA

Relevant papers:
Efficient Learning of Generalized Linear and Single Index Models with Isotonic Regression
SILVar: Single Index Latent Variable Models
Neural Granger Causality for Nonlinear Time Series