
=//Wind Forecast//= //Prepared by Anna-Lisa Miller, Joe Baesuvan, and Quang Nguyen//

=** Executive Summary **=
This paper demonstrates the process used to build a data mining model that predicts wind speed in Portland using historical meteorological data. The data was collected from http://weather.kgw.com, a reliable weather forecast site in the region. The data contains hourly weather information, including wind speed. Data was taken for the last five years, a time period short enough to limit the effects of global warming. Outside research corroborates that the important variables used in these models are consistent with those used in other developed models.

Several approaches were used in developing models to predict wind speed. The first model predicted the current wind speed using current weather data (excluding wind data). Because wind turbines need to be operated within a range of wind speeds, wind speed was subsequently binned into a practical range. Using this range as a Boolean (windy enough? y/n), a model predicting whether it was windy enough was developed. At the same time, a model was developed that predicted future wind speed in the next 2 hours, 6 hours, and 24 hours using the current weather information.

Based on overall accuracy, the model predicting future wind speed from current weather data was deemed the most practical. Because this model predicts future wind speeds, it can also be used to determine whether it will be windy enough. Within this model development, neural networks outperformed the other models, with each neural network exceeding 90% accuracy.

=** Introduction **=
Renewable energy sources are the next wave of power generation. Currently, many different companies are investigating and using windmills to produce energy. Windmills provide an energy source that is sustainable and does not harm the environment. The problem with wind energy is that it is difficult to predict the wind. When windmills are producing energy, other energy sources cannot be used. Energy is difficult to store, so energy produced by windmills needs to be used as it is produced. Sudden changes in the wind affect an energy company’s ability to maintain adequate energy levels.

Current windmills are able to generate power when the wind blows above seven miles per hour (MPH). Dependable power generation could be assumed when there is a high probability of a wind speed of at least 7 MPH.

Our team will use weather data to predict wind speed two hours, six hours, and 24 hours in the future. This model will be valuable for companies using wind generation as a power source by helping them better predict wind speed and monitor power. With accurate wind speed predictions, models can be used to better predict energy generation from wind farms.

=** Data Understanding **=
The data being used is weather data from Portland, Oregon. It contains hourly weather conditions for the last five years, which amounts to over 54,000 rows of data.

The data includes only five years because it is assumed that climate change, such as global warming, will make a larger range of data less relevant to current weather conditions.

The data set contains the following variables:

1. ** Date ** – Date of weather conditions
2. ** Time ** – Time that weather conditions were taken
3. ** Temperature ** – Temperature in degrees Fahrenheit
4. ** Dew Point ** – The temperature at which the water vapor in the air becomes saturated and condensation begins
5. ** Humidity ** – The amount of water in the air
6. ** Sea Level Pressure ** – The atmospheric pressure reduced by a formula to the pressure at sea level
7. ** Visibility ** – The distance at which an object or light can be clearly discerned
8. ** Wind Direction ** – The direction the wind is blowing
9. ** Wind Speed ** – The speed at which the wind is blowing
10. ** Gust Speed ** – The speed of intermittent bursts of wind
11. ** Precipitation ** – The amount of rainfall since the previous measurement was taken
12. ** Events ** – Meteorological events that are occurring
13. ** Conditions ** – A set of observable weather conditions

=** Data Visualization **=
Since the data set is extremely large, data visualization helps gain insights into the raw data. A line chart of wind speed against date demonstrates the volatility of wind speed across time. However, a visible pattern can be detected throughout the year: wind speed is highly volatile between the end of one year and the beginning of the next, and less volatile during the summer.

The bubble chart below illustrates the relationship between wind speed and time of day. This chart shows no discernible relationship between these two variables. The gap between 0 and 3.5 MPH indicates that no nonzero wind speed below 3.5 MPH was recorded. This might be caused by limitations in the wind measurement device. This chart also illustrates that wind speed falls in the range of [0, 40] MPH.

A bubble plot helps to visualize the relationship between wind speed, humidity and temperature. This chart indicates that wind speed is extremely volatile when humidity is in the range of [60, 100] and temperature is in the range of [40, 60]. Additionally, wind speed tends to be stronger when humidity is low.

A heat chart aids in understanding the relationship between weather conditions and wind speed. The top right of the chart indicates that the strongest wind occurs during the condition of widespread dust. Thunderstorms with snow and snow and rain are also good indicators of strong wind.

=** Hypothesis **=
Our first hypothesis is that wind speed is predictable from the set of variables collected. This hypothesis is reasonable: research shows that experts use the same set of meteorological variables to predict wind speed and power generation. Secondly, since the question “when is there wind?” is important, we hope to derive from the data set a pattern of when it is windy. Lastly, we assume our models will forecast the wind reliably enough to be used by wind energy firms.

=** Data Preparation **=
In order to model the data accurately, it must first be cleaned so that it can best be used by the various models, increasing predictive accuracy. Two separate data files were used in this project: one containing weather conditions, including current wind speed, and a second in which all previous variables were kept and three additional variables were added: wind speed in two hours, wind speed in six hours, and wind speed in 24 hours. Separate work was done on each of these data sets, although all initial data preparation was the same.
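The three future wind speed variables can be derived from the hourly series itself. The following is a minimal sketch of that step, not the procedure actually used in the project: it assumes the readings are in time order with no missing hours, and the function and column names (`add_future_targets`, `wind_speed_plus_2h`, and so on) are hypothetical.

```python
def add_future_targets(wind_speeds, horizons=(2, 6, 24)):
    """Pair each hourly reading with the wind speed observed h hours later.

    Assumes `wind_speeds` is a gap-free list of hourly readings in time order.
    Rows too close to the end of the series get None for that horizon.
    """
    rows = []
    n = len(wind_speeds)
    for i, current in enumerate(wind_speeds):
        row = {"wind_speed": current}
        for h in horizons:
            future_index = i + h
            row[f"wind_speed_plus_{h}h"] = (
                wind_speeds[future_index] if future_index < n else None
            )
        rows.append(row)
    return rows


# Made-up readings, one per hour, purely for illustration.
hourly = [float(x) for x in range(30)]
rows = add_future_targets(hourly)
print(rows[0])
```

Rows whose future value is `None` would need to be dropped before modeling, and real data with missing hours would need to be aligned on timestamps rather than list positions.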

Fields that were imported as strings but actually held numbers were converted into range (numeric) variables, and the unconverted fields were then filtered out. The data was partitioned into training (40%), testing (35%) and validation (25%) sets. The data was also normalized using the Auto Data Prep node in SPSS Modeler. At this point, for the original data set, models were built using wind speed as the predicted (target) variable and excluding wind direction and wind gusts, as both of those variables presuppose wind speed.
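For readers without SPSS Modeler, the partitioning and normalization steps can be approximated in a few lines. This is a sketch under assumptions: the random seed, the helper names, and the choice to compute the z-score statistics from the training partition only are ours, and the Auto Data Prep node may behave differently.

```python
import random
import statistics


def partition(rows, fractions=(0.40, 0.35, 0.25), seed=0):
    """Shuffle rows and split them into training, testing and validation sets."""
    shuffled = rows[:]
    random.Random(seed).shuffle(shuffled)
    n_train = round(len(shuffled) * fractions[0])
    n_test = round(len(shuffled) * fractions[1])
    train = shuffled[:n_train]
    test = shuffled[n_train:n_train + n_test]
    validation = shuffled[n_train + n_test:]  # whatever remains (about 25%)
    return train, test, validation


def z_score(values, mean, std):
    """Rescale values to zero mean and unit standard deviation."""
    return [(v - mean) / std for v in values]


# Made-up wind speeds, purely for illustration.
speeds = [float(s) for s in range(100)]
train, test, validation = partition(speeds)

# Assumption: the scaling statistics come from the training partition only.
mean = statistics.mean(train)
std = statistics.pstdev(train)
train_z = z_score(train, mean, std)
test_z = z_score(test, mean, std)
print(len(train), len(test), len(validation))
```

Reusing the training mean and standard deviation for the other partitions keeps the held-out data from influencing the scaling.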

On the data set containing future wind speed information, all current wind variables were included, and in each predictive model, the appropriate future wind speed variable was used as the target. The other two future wind speed variables were excluded from those models.

=** Model Development and Interpretation **=
This project seeks to predict wind speed using weather conditions. Different predictive and classification models were created and compared in order to find the one with the highest accuracy. Using current weather conditions and current wind speed, two classification models were constructed. These were designed to predict whether or not there will be sufficient wind for a wind turbine to generate usable power at a particular point in time. The cut-off value for these classification models is determined by the cut-in speed, the minimum wind speed at which a wind turbine will generate usable power. This wind speed is typically 7 MPH for most turbines. Two different models were developed for this task: a neural network and a logistic regression.
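The Boolean target for the classification models, and the accuracy measure used to compare them, can be sketched as follows. The sample speeds are made up, the helper names are hypothetical, and treating exactly 7 MPH as "windy enough" is an assumption (the paper says both "above seven" and "at least 7 MPH").

```python
CUT_IN_MPH = 7.0  # cut-in speed quoted in the paper


def is_windy_enough(wind_speed_mph, cut_in=CUT_IN_MPH):
    """Return True when the wind speed reaches the cut-in speed."""
    return wind_speed_mph >= cut_in


def accuracy(predicted, actual):
    """Fraction of records where the predicted label matches the actual one."""
    matches = sum(p == a for p, a in zip(predicted, actual))
    return matches / len(actual)


# Made-up observed and predicted speeds, purely for illustration.
actual_speeds = [3.5, 6.9, 7.0, 12.7, 0.0]
predicted_speeds = [4.1, 7.4, 8.2, 10.0, 1.3]

actual_labels = [is_windy_enough(s) for s in actual_speeds]
predicted_labels = [is_windy_enough(s) for s in predicted_speeds]
print(accuracy(predicted_labels, actual_labels))  # 4 of 5 labels agree: 0.8
```

The same thresholding can be applied to the output of a numeric wind speed model, which is how a regression model can also answer the "windy enough?" question.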

** Neural network model **
Results from these two classification models show very similar levels of accuracy. These models each classify wind speed at three different points of time in the future: 2, 6 and 24 hours. Both models yield similar results; however, the accuracy of classifying wind speed in the next 24 hours, shown in the tables below, is very low. Compared with these models, the selected model (described later) delivers more accurate results by predicting the wind speed itself. Using the selected model, predicted wind speed can then be categorized as below or above the cut-in speed. Current weather data was also used to predict current wind speed. In this scenario, future weather forecasts (as can be found on weather stations or websites) would be used to predict future wind speeds, and neural networks predict wind speed with 91.5% accuracy. The actual accuracy of this model is highly dependent on the accuracy of the weather forecasts used, which dilutes the overall accuracy of the model.

** Selected Models **
To generate a model that predicts future wind speed, the data set containing current weather conditions as well as wind speed in 2 hours, 6 hours, and 24 hours was used. Three separate models were generated, one for each of the three time frames. Data used in generating these models contained current weather conditions, including current wind speed, direction, and gust speed. Each model used the respective future wind speed as the output variable, while excluding the other future wind speed variables. Final models were selected using the Mean Absolute Error to assess the predictive power of the model. It should be noted that the reported Mean Absolute Error is computed on the z-scored wind speed prediction. The Mean Absolute Error is calculated as $\mathrm{MAE} = \frac{1}{n}\sum_{i=1}^{n} |y_i - \hat{y}_i|$, where $y_i$ is the actual value, $\hat{y}_i$ is the predicted value, and $n$ is the number of records. In all of the scenarios, a neural network has the lowest Mean Absolute Error. These models were built using two layers and a stopping point of 1000 cycles. These settings were chosen as a compromise between finding the best model and avoiding long processing times.
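A minimal sketch of the Mean Absolute Error, and of converting a z-scored error back into MPH, is given here. Because z-scoring divides by the standard deviation, an error in z-score units is multiplied by that standard deviation to recover MPH. The standard deviation of wind speed is not reported in this paper; the value of about 5.378 MPH used here is only an inference from the two-hour model's quoted figures (2.657 / 0.494), and the function names are hypothetical.

```python
def mean_absolute_error(actual, predicted):
    """Average of the absolute differences between actual and predicted values."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def z_error_to_mph(mae_z, std_mph):
    """Convert an error measured in z-score units back to MPH."""
    return mae_z * std_mph


# Tiny made-up example: errors of 0.5, 0.0 and 1.0 average to 0.5.
print(mean_absolute_error([1.0, 2.0, 3.0], [1.5, 2.0, 2.0]))

# Assumption: standard deviation inferred from the two-hour figures, not reported.
ASSUMED_STD_MPH = 5.378
print(round(z_error_to_mph(0.494, ASSUMED_STD_MPH), 3))
```

Only the standard deviation matters in this conversion; the mean cancels out when two z-scored values are subtracted.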


** Two Hours Ahead Wind Prediction Model **

The neural network had a predicted accuracy of 93.4% and a Mean Absolute Error of .494. This is an average error of 2.657 MPH. Therefore, a predicted wind speed of 9.5 MPH would give a very high probability of enough wind to reach the cut-in speed. As the chart illustrates, this model’s most important variables are current wind speed, precipitation, humidity and sea level pressure.

** Six Hours Ahead Wind Prediction Model **



The neural network had a predicted accuracy of 92.1% and a Mean Absolute Error of .591. This is an average error of 3.182 MPH. In order to better assure reaching the cut-in speed, a predicted wind speed of 10 MPH would be needed. As the chart illustrates, this model’s most important variables are current wind speed, precipitation, sea level pressure and wind direction. In this model, the relative importance of humidity and wind speed is inverted compared with the two-hours-ahead model.


** 24 Hours Ahead Wind Prediction Model **

The best neural network had a predicted accuracy of 90.5% and a Mean Absolute Error of .649. This is an average error of 3.888 MPH, indicating that for next-day predictions, a predicted wind speed of 11 MPH would be ideal to assure the needed cut-in speed in 24 hours. As the chart illustrates, this model’s most important variables are dew point, sea level pressure, and wind direction. This indicates that in predicting wind speed further in the future, visible weather factors are no longer as important as pressure bearings.
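The recommended prediction thresholds for the three models (9.5, 10 and 11 MPH) are roughly the cut-in speed plus each model's average error. The sketch below assumes that rule, which the paper does not state explicitly. It gives 9.657, 10.182 and 10.888 MPH before rounding. The function names are hypothetical.

```python
CUT_IN_MPH = 7.0  # cut-in speed quoted in the paper

# Average errors in MPH reported for the 2-, 6- and 24-hour models.
MODEL_ERROR_MPH = {2: 2.657, 6: 3.182, 24: 3.888}


def safe_threshold(horizon_hours):
    """Cut-in speed plus the model's average error (an assumed rule of thumb)."""
    return CUT_IN_MPH + MODEL_ERROR_MPH[horizon_hours]


def likely_above_cut_in(predicted_mph, horizon_hours):
    """True when the prediction clears the cut-in speed by the average error."""
    return predicted_mph >= safe_threshold(horizon_hours)


for hours in (2, 6, 24):
    print(hours, round(safe_threshold(hours), 3))

print(likely_above_cut_in(11.0, 24))  # 11.0 clears the 24-hour threshold
print(likely_above_cut_in(9.0, 2))    # 9.0 falls short of the 2-hour threshold
```

A margin based on the average error is only a rule of thumb; an operator wanting a stated probability of reaching the cut-in speed would need the distribution of the errors, not just their mean.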

=** Project Conclusion **=
Current power generation using windmills is unpredictable. Some of this unpredictability stems from the unstable nature of wind. The ability to predict wind speed in the future would better equip power companies to monitor and use the power generated. Historical meteorological data from the previous five years was gathered to predict future wind speed. Initial models were built using current wind speed as the predicted variable. This task was achieved, and the resulting model was intended to be used with future weather predictions (as can be found on weather sites). It was realized, however, that the model’s accuracy was dependent on the reliability of weather forecasts.

To reduce this variability, additional variables were added to the data indicating the wind speed at a future time (namely 2 hours, 6 hours, and 24 hours in the future). Using this new data set, models for each of the three scenarios were created, using all current weather information, including current wind speed, direction, and gust speed. Models were developed which could predict the numeric wind speed. As research indicated that a cut-in wind speed of 7 MPH is needed, models predicting a Boolean, namely whether the wind would blow above 7 MPH or not, were also produced. Since the model predicting numeric wind speed also indicates whether the wind is blowing over 7 MPH, and was more accurate, the Boolean models were not selected as the final models.

The selected neural networks predict future wind speed with accuracies of 93% for 2 hours ahead, 92% for 6 hours ahead, and 90% for 24 hours ahead. Not only is wind speed predictable, it can be predicted with surprising accuracy. Important variables in predicting wind speed include current wind speed, precipitation, sea level pressure, and humidity. In the model predicting wind speed in 24 hours, two new variables, dew point and wind direction, become important factors in determining wind speed.

Organizations that generate power with wind turbines can use these models to predict future wind speeds and better estimate the amount of power that will be generated.