
= //**__Future Status of Oil Price__** (Proposal)// =

// **Prepared by:** Kendra Kennedy, Minh Nguyen and Akkadet Udomsirithamrong //


__**Introduction and Purpose**__

Trading futures contracts can yield high returns to investors; however, wherever there are high returns, there are also high risks. An incorrect prediction of the direction or magnitude of a commodity’s price change can lead to major losses, so investors need to be confident in their estimates. One of the most commonly traded commodities is crude oil. For our project, our team aims to determine whether the oil price will go up or down based on a set of economic and related indicators. If successful, this model will add value for financial investors by helping them predict the future status of the oil price for futures contracts more accurately.

__**Literature Review**__

Since oil became the main fuel of the global economy in the late 1800s, economists have been forecasting its price. Most recent studies center on the price of oil as a function of the political economy in the Middle East. To illustrate the power of cartels, these studies attempt to capture the share of the oil price that is not determined by economic indicators such as GDP or the interest rate.

One highly developed industry in the econometrics field is predicting the price of commodities and selling this information to futures brokers. Firms use different models with varying levels of accuracy; the team reviewed the results of several firms to gain insight into which models perform best for this type of analysis and what types of data would be optimal. One study comparing the results of the biggest econometrics firms found that for commodities in volatile markets, predicting the price with a longer time horizon was more accurate; however, for commodities in stable markets, predicting the price with a shorter time horizon was more accurate. The crude oil market is generally classified as volatile because of the uncertainty surrounding the situation in the Middle East and supply shocks from other oil sources. The team will capitalize on this information in our model.

The team must take care to distinguish cause from effect in this study. While general economic conditions certainly affect the price of oil, the price of oil affects economic indicators as well. Many articles argue that fluctuations in the oil price have caused recessions, meaning that the oil price could be a leading indicator of an economic downturn; however, results are often not statistically significant because the oil price has multiple relationships with most economic indicator variables. One study examined the price of oil and the eight recessions between WWII and 1983. The author argued that since a dramatic increase in the price of oil preceded seven of the eight recessions, there was more than a random correlation between these variables. The model held other leading indicators of recessions constant to tease out oil’s independent effect. Notably, the author used both real and nominal values for all the variables and compared the results; surprisingly, this did not change the conclusions. Since we are going to make the opposite argument (that the oil price is a function of what is going on in the economy), we will need to keep in mind that part of any correlation between the oil price and economic indicators may reflect the causal direction opposite to the one we are arguing.

Below is a figure taken from one of the reviewed studies illustrating the relationship between major historical events and the price of oil. It shows that there is usually a jump in the oil price around recessions, which suggests that the oil price is tied to GDP growth. The team will incorporate this insight into our model.

The figure below is a graph of the CPI against time with major political events in the Middle East highlighted. This graph indicates that the US economy is highly responsive to the situation in the Middle East; at all major historical events there are jumps in the CPI. Because the price of oil is certainly related to the political situation in the Middle East, perhaps CPI could be used as a proxy to capture that effect since the graph indicates that these variables are correlated.


__**Variable Control within the Purpose**__

The team will use monthly historical data from 1980 through 2009 to build the model. The dependent variable will be a flag variable (called “status”) indicating whether the average monthly oil price went up or down from the previous period. The team will derive this variable by lagging the historical crude oil price by one period and using a lookup table to determine whether the price increased or decreased. These data were collected from the EIA Petroleum Marketing Monthly website.
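The derivation of the status flag can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `price_status` and the sample prices are hypothetical, and coding an unchanged price as 0 is our assumption, since the proposal only defines the "up" and "down" cases.

```python
from typing import List, Optional


def price_status(prices: List[float]) -> List[Optional[int]]:
    """Flag each period as 1 (price rose) or 0 (price did not rise).

    Each price is compared with the price lagged by one period.
    The first period has no predecessor, so its status is None.
    Assumption: an unchanged price is coded as 0.
    """
    if not prices:
        return []
    status: List[Optional[int]] = [None]
    for previous, current in zip(prices, prices[1:]):
        status.append(1 if current > previous else 0)
    return status


# Hypothetical monthly average prices (not real EIA data).
monthly_prices = [30.0, 32.5, 31.0, 31.0, 35.2]
print(price_status(monthly_prices))
```

Because the first month has no lagged value, it carries no status and would have to be dropped before modeling.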

The model will incorporate a variety of economic indicator variables as well as variables that could indicate the prices of substitutes for crude oil. Specifically, the team will use the price of diesel, the inflation rate, the S&P 500 index, the consumer confidence index, GDP, the federal funds rate, the unemployment rate, national population, the dollar index, the price of corn, FedEx's stock price, and Southwest Airlines' stock price while attempting to model the status of the monthly crude oil price.


__**Hypothesis**__

The team hypothesizes that the future price of oil will increase with the general trend of economic indicators. However, since this will be a classification model, a lot of the insights gained from modeling the data will be exploratory in nature and will give financial analysts insights on what indicators they should watch when buying futures contracts.


__**Procedure**__
 * Data collection: the team collected data from websites. The dataset contains 360 records.
 * Change the crude oil price into a dummy variable by first lagging the variable by one period and then using a lookup table to determine if the price went up or down. 1 will mean that the price went up, 0 will mean that it went down.
 * Gain an understanding of the oil industry through literature reviews
 * Visualize the data by computing statistics and viewing histograms of each variable
 * Identify any outliers and research why they are outliers. If the outlier is a data entry error, replace it with the three-sigma control limit value. If the outlier is legitimate, keep it in the dataset.
 * Evaluate initial hypothesis and change it if needed.
 * Data preprocessing: partition the data into 60% training and 40% testing. Transform the variables if needed. Remove any missing values in the data set.
 * Determine variable importance
 * Classify the data under a variety of classification models (K-NN, Naïve Bayes, Logistic Regression, Classification Trees, Neural Nets and Discriminant Analysis).
 * Evaluate the models using accuracy, lift charts, confusion matrices, and mean absolute error.
 * Select the best model based on the above criteria.
 * Interpret model and identify any implications
 * Present findings to other data miners
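
Several of the steps above (the three-sigma outlier rule, the 60/40 partition, and evaluation by confusion matrix and accuracy) can be sketched with the Python standard library. This is a minimal sketch, not the team's implementation: all function names are hypothetical, the split is assumed to be chronological (the procedure does not say whether the partition is random or ordered by time), and the majority-class predictor is only a placeholder for the classifiers listed above.

```python
import statistics
from typing import Dict, List, Sequence, Tuple


def three_sigma_limits(values: Sequence[float]) -> Tuple[float, float]:
    """Return (lower, upper) control limits at mean +/- 3 sample std devs."""
    mean = statistics.mean(values)
    sd = statistics.stdev(values)
    return mean - 3 * sd, mean + 3 * sd


def clamp_to_limits(value: float, limits: Tuple[float, float]) -> float:
    """Replace a value outside the limits with the nearest limit."""
    lower, upper = limits
    return min(max(value, lower), upper)


def chronological_split(records: List, train_fraction: float = 0.6):
    """Assumption: the earliest 60% of records train, the rest test."""
    cut = round(len(records) * train_fraction)
    return records[:cut], records[cut:]


def confusion_matrix(actual: Sequence[int], predicted: Sequence[int]) -> Dict[str, int]:
    """Count true/false positives and negatives, with 1 = price went up."""
    counts = {"tp": 0, "fp": 0, "tn": 0, "fn": 0}
    for a, p in zip(actual, predicted):
        if p == 1:
            counts["tp" if a == 1 else "fp"] += 1
        else:
            counts["tn" if a == 0 else "fn"] += 1
    return counts


def accuracy(cm: Dict[str, int]) -> float:
    """Share of correctly classified records."""
    return (cm["tp"] + cm["tn"]) / sum(cm.values())


# Hypothetical status flags (not real data).
status = [1, 0, 1, 1, 0, 1, 0, 0, 1, 1]
train, test = chronological_split(status)

# Placeholder model: always predict the majority class of the training set.
majority = 1 if sum(train) * 2 >= len(train) else 0
predicted = [majority] * len(test)

cm = confusion_matrix(test, predicted)
print(cm, accuracy(cm))
```

A majority-class baseline like this gives a floor against which the accuracy of the real classifiers can be compared.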

__**Diagram**__

__**Data Interpretation Plan**__

Although we obtained the data from a variety of sources, we were able to find monthly data for each of the variables, so there are no missing values. However, the team will still look for data entry errors by examining outliers.

|| Variable || Type || Direction ||
|| Month || Set || Input ||
|| Change in Crude Oil Price || Flag || Output ||
|| Monthly Inflation Rate || Range || Input ||
|| Monthly S&P 500 Index || Range || Input ||
|| Consumer Confidence || Range || Input ||
|| Consumer Price Index || Range || Input ||
|| Diesel Price || Range || Input ||
|| Population || Range || Input ||
|| Monthly GDP || Range || Input ||
|| Monthly Federal Funds Rate || Range || Input ||
|| Monthly Unemployment Rate || Range || Input ||
|| Monthly Corn Price || Range || Input ||
|| Monthly Dollar Index || Range || Input ||
|| FedEx's Stock Price || Range || Input ||
|| Southwest Airlines' Stock Price || Range || Input ||