=Wind Forecast Project= //Proposed by Anna-Lisa Miller, Joe Baesuvan, and Quang Nguyen//

Purpose
Renewable energy sources are the next wave of energy. Currently, many companies are investigating and using windmills to produce energy. Windmills provide an energy source that is sustainable and does not harm the environment. The problem with wind energy is that the wind is difficult to predict. When windmills are producing energy, other energy sources do not need to be used. Energy is difficult to store, so energy produced by windmills needs to be used as it is produced. Sudden changes in the wind therefore affect an energy company’s ability to maintain adequate energy levels.

Our group project will be to build a model to predict wind speed. We will evaluate the data in several ways and try different sets of predictors. Our goal is to build a model that predicts wind more accurately than the models currently in use. We will try to build two types of models: one predicting whether it is windy (a yes or no response), and one predicting wind speed. Using these predictions, both energy companies and windmill producers can better estimate the amount of energy that a windmill will produce at any given time.
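Both targets can be derived from the same wind speed column. The following is a minimal sketch, not part of the planned SPSS workflow. The cutoff `WINDY_THRESHOLD_MPH` and the sample readings are hypothetical, since the proposal does not define what speed counts as windy.

```python
# Hypothetical cutoff: the proposal does not state what speed counts as "windy".
WINDY_THRESHOLD_MPH = 8.0


def make_targets(wind_speeds_mph, threshold=WINDY_THRESHOLD_MPH):
    """Return (is_windy, speed) target lists for the two model types."""
    is_windy = [speed >= threshold for speed in wind_speeds_mph]
    return is_windy, list(wind_speeds_mph)


# Illustrative readings only, not taken from the project data.
sample_speeds = [0.0, 3.5, 8.0, 12.7]
windy_flags, speed_targets = make_targets(sample_speeds)
print(windy_flags)  # [False, False, True, True]
```

The yes/no model would be trained on `windy_flags` and the wind speed model on `speed_targets`.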

Variable Control
The data used is weather data from Portland, Oregon. It contains hourly weather conditions for the last 5 years, which is over 42,000 rows of data. The data includes date, time, temperature, dew point, humidity, sea level pressure, visibility, wind direction, wind speed, gust speed, precipitation, events, and conditions. Of this data, we will not be using wind direction or gust speed, as those variables presuppose that it is windy. Our output variable will be wind speed.
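The variable selection described above can be written out as a short sketch. The column spellings are assumptions based on the list in this section, not the actual headers of the collected file.

```python
# Column names follow the list in the proposal; exact spellings in the raw file are assumed.
ALL_COLUMNS = [
    "date", "time", "temperature", "dew_point", "humidity",
    "sea_level_pressure", "visibility", "wind_direction", "wind_speed",
    "gust_speed", "precipitation", "events", "conditions",
]
TARGET = "wind_speed"
EXCLUDED = {"wind_direction", "gust_speed"}  # these presuppose that it is windy

# Everything except the target and the excluded variables is a candidate predictor.
predictors = [c for c in ALL_COLUMNS if c != TARGET and c not in EXCLUDED]

# Rough size check: hourly readings over 5 years, ignoring leap days and gaps.
expected_rows = 5 * 365 * 24
print(predictors)
print(expected_rows)  # 43800
```

The rough count of 43,800 hourly readings is consistent with the "over 42,000 rows" figure once some missing hours are allowed for.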

We are using only 5 years of data because we assume that climate change, such as global warming, will make a larger range of data less relevant to current weather conditions.

Hypothesis
Our first hypothesis is that we will be able to predict the wind speed from the set of variables that we collected from the kgw.com website, which contains meteorological data from Portland, OR. This is a reasonable hypothesis because our research shows that experts use the same set of meteorological variables. Secondly, since the question “when is there wind?” is important, we hope to derive from the data set a pattern of when it is windy. Lastly, we expect that our models will forecast the wind reliably enough to be used by wind energy firms.

Data Collection
Since weather conditions differ dramatically from area to area, we decided to work on a single data set from the kgw.com website. We wrote an automatic bot (a small Java script) to collect the data from kgw.com. Below is a sample of the data from Apr 13, 2010.

As the process is automatic, manual transcription errors in collecting the data are avoided.

Procedure
The Cross Industry Standard Process for Data Mining (CRISP-DM) is strictly followed to make sure the process is easy to duplicate and reliable to apply. Below is a brief outline of the procedure.

 * **Business Understanding:** Approaching this issue begins by understanding why wind prediction is important. As a group, we strongly believe that a better wind prediction model will help wind energy companies use their resources more efficiently and effectively.
 * **Data Understanding:** Before collecting data, each variable was evaluated to make sure we understood its meaning and relationship to our outcome prediction, wind speed.
 * **Data Preparation:** Unnecessary data will be cleaned using standard data preparation tools found in SPSS.
 * **Modeling:** Various models provided by SPSS Modeler will be used to arrive at the most accurate models.
 * **Evaluation:** Comparison between training, validation, and testing sets of data will be used to evaluate how well the models perform. If necessary, more recent data from kgw.com will be collected to further test our models.
 * **Deployment:** At this phase, provided that the models are reliable enough, any Portland-based wind energy firm could use the models. Additionally, the general principles used to develop the models can be applied to data from other sites to build models that predict wind speed in those areas.
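The evaluation step can be illustrated with a small Python sketch (the project itself uses SPSS Modeler). Several details here are assumptions rather than part of the proposal: the 60/20/20 split percentages, splitting in time order instead of at random, the mean absolute error metric, and the persistence baseline that predicts each hour from the hour before it. The wind speeds are made up.

```python
def chronological_split(rows, train_pct=60, valid_pct=20):
    """Split time-ordered rows into train/validation/test without shuffling."""
    n = len(rows)
    train_end = n * train_pct // 100
    valid_end = n * (train_pct + valid_pct) // 100
    return rows[:train_end], rows[train_end:valid_end], rows[valid_end:]


def mean_absolute_error(actual, predicted):
    """Average absolute difference between actual and predicted values."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


# Illustrative hourly wind speeds (mph), not project data.
speeds = [4.0, 5.0, 7.0, 6.0, 6.0, 9.0, 10.0, 8.0, 5.0, 3.0]
train, valid, test = chronological_split(speeds)

# Persistence baseline: each test hour is predicted by the hour before it.
first_test_index = len(train) + len(valid)
baseline_pred = speeds[first_test_index - 1:-1]
baseline_mae = mean_absolute_error(test, baseline_pred)

print(len(train), len(valid), len(test))  # 6 2 2
print(baseline_mae)  # 2.5
```

A candidate model would be judged worthwhile only if its error on the test set is lower than that of a simple baseline such as this one.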

Diagram
To predict wind speed, input variables used include Date, Time, Temperature, Event, Humidity, Sea Level Pressure, Visibility, Wind Direction, Gust Speed, Precipitation and Condition. In order to construct an accurate model, our team will place heavy emphasis on data preparation. As can be seen from Figure 1.1, preparation begins by adjusting outliers and extremes. Second, a derive node is used to correct the skewed distribution of the humidity variable. Since the raw data is not perfect, variables that contain a high number of missing values are deleted. Auto Data Prep is also used in this model construction to normalize the data. Models from SPSS Modeler will then be used to predict wind speed. Additionally, different methods of manipulating the data, including PCA and Feature Selection, will be combined with the models to increase accuracy. Lastly, as the models produce outputs indicating the importance of different variables, new ways of using those variables will be tried in an attempt to further increase model accuracy.
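The preparation steps named above are performed with SPSS Modeler nodes. The Python sketch below only shows what each step does in principle. The clipping bounds, the 50% missing-value cutoff, the reflected log transform (which assumes humidity is piled up near 100%), and the sample values are all assumptions made for illustration.

```python
import math
from statistics import mean, pstdev


def clip_outliers(values, lower, upper):
    """Pull values outside [lower, upper] back to the nearest bound."""
    return [min(max(v, lower), upper) for v in values]


def reflect_log(values, maximum=100.0):
    """Reduce left skew (values piled up near `maximum`) via log(1 + maximum - x)."""
    return [math.log1p(maximum - v) for v in values]


def missing_fraction(values):
    """Share of entries in a column that are missing (None)."""
    return sum(v is None for v in values) / len(values)


def drop_sparse_columns(table, max_missing=0.5):
    """Keep only columns whose share of missing values is at most max_missing."""
    return {name: col for name, col in table.items()
            if missing_fraction(col) <= max_missing}


def z_score(values):
    """Normalize to zero mean and unit (population) standard deviation."""
    mu, sigma = mean(values), pstdev(values)
    return [(v - mu) / sigma for v in values]


# Made-up readings for illustration only.
table = {
    "temperature": [50.0, 52.0, 300.0, 48.0],  # 300.0 is an obvious bad reading
    "humidity": [95.0, 100.0, 60.0, 98.0],
    "events": [None, None, None, "Rain"],      # mostly missing
}
table = drop_sparse_columns(table)
table["temperature"] = clip_outliers(table["temperature"], lower=-20.0, upper=120.0)
table["humidity"] = reflect_log(table["humidity"])
table["temperature"] = z_score(table["temperature"])
print(sorted(table))  # ['humidity', 'temperature']
```

Here the mostly empty `events` column is dropped, the bad temperature reading is clipped to the upper bound before normalization, and humidity is transformed before modeling.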

Data interpretation plan
We recommend displaying the results in table format. In addition, graphical analysis should be displayed as a time graph. It will clearly present the predicted wind speed at a specific future date and time, which will help an analyst prepare for changes in wind speed as the weather changes. These two recommended formats are relatively easy to interpret and to present to the parties involved.