
 **Unemployment Factors and Prediction** Final Project Report for GSM 672: Data Mining

 Team E: Mohammad Al Fayez, Sathana Valli Sathya Moorthi, Shirley Rodriguez    Spring 2011

[|Link to project Modeler's stream file.]

==**1. Executive Summary** ==
This data mining project aims to build a model to predict the unemployment rate. Through this analysis, the model also identifies the most important predictors for understanding this economic variable. Investors, governments and citizens in general might find this tool relevant for important strategic decisions. The project’s proposal presents a brief introduction and an exhaustive literature review, which supports the importance of predicting the unemployment rate for future decisions. The proposal also describes in detail the other economic variables that were considered to build and train the model. This paper additionally covers the process of gathering and processing the data, as well as a visualization analysis. After analyzing different alternatives, the Regression and CHAID models were found to be the most appropriate for the predictive objectives of this project. After choosing the models, our goal was to increase accuracy and rank the predictors in order of their importance to the unemployment rate. The model indicated AUD, GBP, Real PPI Commodities, Real Gold Price and Real IEPI Commodities as the most important predictors. In addition, a clustering analysis was conducted using the Two-Step model, which was found to be the most adequate for recognizing three clusters of independent variables with a common influence in predicting the unemployment rate.

==**2. Data Understanding** ==
To achieve the goal of this project, several economic variables had to be analyzed and interpreted to predict the unemployment rate. During the analysis, some of those variables may emerge as more relevant and more strongly correlated with unemployment. The following table lists all variables used to predict unemployment, with a brief explanation of what each variable represents and the source of the data:

|| Variable || Type || Frequency || Description ||
|| Unemployment rate || % || Monthly || The dependent variable to be predicted. Because the model uses historical data, the unemployment rate is more relevant than the number of unemployed people: using the rate eliminates the population growth factor. Source: [|Bureau of Labor Statistics] ||
|| CPI || Number || Monthly || The Consumer Price Index (CPI) is a measure of prices paid by consumers and is used to calculate inflation. Source: [|U.S. Bureau of Labor Statistics] ||
|| Federal (FLSA) || Number || When updated || The U.S. federal minimum wage, in U.S. dollars per hour. Source: [|U.S. Department of Labor] ||
|| Gold Price || Number || Monthly || The gold price can be an indication of consumer confidence and the overall economic condition. Source: [|Gold Information Network] ||
|| Oil Future Contracts || Number || Monthly || Crude oil is the most important source of energy, and an increase in its price can slow economic growth and therefore employment. This variable is the average price of crude oil future contracts in U.S. dollars per barrel. Source: [|U.S. Energy Information Administration] ||
|| IEPI || Number || Monthly || The Import and Export Price Index (IEPI) is a monthly index that measures the average prices of U.S. imported and exported goods and services. Source: [|U.S. Bureau of Labor Statistics] ||
|| IEPI Commodities || Number || Monthly || An import and export price index for commodity products only. Source: [|U.S. Bureau of Labor Statistics] ||
|| PPI Commodities || Number || Monthly || The Producer Price Index (PPI) measures the average prices received by U.S. producers in exchange for their finished commodity products. Source: [|U.S. Bureau of Labor Statistics] ||
|| PPI Food || Number || Monthly || A Producer Price Index (PPI) that measures the average prices received by U.S. producers for their finished food products only. Source: [|U.S. Bureau of Labor Statistics] ||
|| PPI Energy || Number || Monthly || A Producer Price Index (PPI) that measures the average prices received by U.S. producers for their finished energy products only. Source: [|U.S. Bureau of Labor Statistics] ||
|| Change in real GDP || % || Quarterly || Gross Domestic Product (GDP) is the value of all the goods and services produced by a country during a given period. This predictor is the percentage change in real (adjusted for inflation) U.S. GDP from the previous quarter. Source: [|U.S. Bureau of Economic Analysis] ||
|| AUD || Number || Monthly || The value of one U.S. dollar expressed in Australian dollars. Source: [|Board of Governors of the Federal Reserve System] ||
|| GBP || Number || Monthly || The value of one U.S. dollar expressed in British pounds sterling. Source: [|Board of Governors of the Federal Reserve System] ||
|| JPY || Number || Monthly || The value of one U.S. dollar expressed in Japanese yen. Source: [|Board of Governors of the Federal Reserve System] ||
|| CAD || Number || Monthly || The value of one U.S. dollar expressed in Canadian dollars. Source: [|Board of Governors of the Federal Reserve System] ||

==**3. Data Visualization** ==

After collecting the data, the team analyzed some of the variables to understand the relationships between them. One interesting graph shows the relationship between oil prices, gold prices and unemployment. As shown in the graph, there appears to be a relationship between unemployment and real (inflation-adjusted) gold prices: the unemployment rate was 9% or higher when the gold price exceeded $900. On the other hand, unemployment seems to be less correlated with oil prices.

Another interesting graph shows the relationship between the value of the U.S. dollar and the unemployment rate. The next graph shows unemployment alongside the U.S. dollar value expressed in two major currencies, the Japanese Yen (JPY) and the Australian Dollar (AUD). The unemployment rate tends to increase when the value of the U.S. dollar increases. One possible explanation is that a stronger currency can harm U.S. exports and therefore increase unemployment.


===**4.1 Data Preparation** ===

The variables listed in the Data Understanding section represent the raw data. Some of those variables need to be adjusted, and other important variables can be derived from them. Before building the model, the data need to be standardized and cleaned. The following steps were used to prepare the data:

The first step after obtaining the data was to make sure that all variables follow the same frequency as the unemployment rate to be predicted (monthly in this case, since unemployment numbers are released monthly). Where data was missing, i.e. where only yearly or quarterly data was available, the last available value was repeated to fill the gap up to the next available observation.

The second step, after standardizing the frequencies, was to merge all variables into one dataset.

The third step, after merging, was to adjust for inflation the variables that are expressed as prices, e.g. the gold price. This step is important because the value of the U.S. dollar 10 years ago is not the same as it is today; to accurately measure the effect of prices on unemployment rates, those prices should be expressed using the same value.
To do that, all prices need to be expressed in today’s U.S. dollar value, so the last period available was used as the base to calculate the real prices. The formula used to calculate real prices is [ The price / (CPI / Base month CPI) ]. The following variables were adjusted:
 * 1) Federal (FLSA): the minimum wage
 * 2) IEPI: Import and Export Price Index
 * 3) IEPI Commodities: Import and Export Price Index for commodity products
 * 4) PPI Commodities: Producer Price Index for finished commodity products
 * 5) PPI Food: Producer Price Index for finished food products
 * 6) PPI Energy: Producer Price Index for finished energy products
 * 7) Gold Price
 * 8) Oil Price
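
The frequency alignment and the inflation adjustment described above can be sketched in a few lines of Python. This is only a minimal illustration under stated assumptions, not the Modeler stream used in the project: the function names `forward_fill` and `to_real_prices` are hypothetical, and all sample figures are invented for the example.

```python
def forward_fill(values):
    """Repeat the last known value into months with no observation (None)."""
    filled, last = [], None
    for v in values:
        if v is not None:
            last = v
        filled.append(last)
    return filled


def to_real_prices(prices, cpi):
    """Express each price in the dollars of the last available period.

    Implements: real price = price / (CPI / base-month CPI),
    using the last month in the series as the base month.
    """
    base_cpi = cpi[-1]
    return [p / (c / base_cpi) for p, c in zip(prices, cpi)]


# Invented sample: a quarterly series observed only every third month.
quarterly_gdp_change = [2.0, None, None, 3.0, None, None]
monthly_gdp_change = forward_fill(quarterly_gdp_change)

# Invented monthly gold prices and CPI values (not real data).
gold = [400.0, 450.0, 500.0]
cpi = [100.0, 150.0, 200.0]
real_gold = to_real_prices(gold, cpi)
```

With the last month as the base, the final real price equals the nominal price, while earlier prices are scaled up to reflect the lower price level of their period.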

In addition to those variables, the inflation rate was calculated and used instead of the Consumer Price Index (CPI). The formula used to calculate the monthly inflation rate is [ CPI for the month / CPI for previous month ], a ratio in which values above 1 indicate rising prices.

The final step in data preparation was to filter out the original variables that were used to calculate the derived variables above.
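
As a small worked example of the CPI ratio above, the sketch below computes it for consecutive months. The function name `monthly_inflation` and the CPI figures are hypothetical; the conversion to a percentage change is an optional extra shown for readability and is not a step described in the report.

```python
def monthly_inflation(cpi):
    """Return CPI[t] / CPI[t-1] for every month after the first."""
    return [curr / prev for prev, curr in zip(cpi, cpi[1:])]


# Invented CPI values for three consecutive months (not real data).
cpi = [200.0, 202.0, 201.0]
ratios = monthly_inflation(cpi)

# Optional: the same information as a percentage change per month.
pct_change = [(r - 1.0) * 100.0 for r in ratios]
```

The first month has no predecessor, so the resulting series is one observation shorter than the CPI series.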

===**4.2 Modeling** ===

Once the data had been merged and cleaned, it was ready for modeling and, consequently, for addressing the business problem. The data was partitioned in preparation for the analysis. As a result, we had 15 cleaned predictors that were possibly related to the unemployment rate.

1) To find out the importance of each predictor, we ran a feature selection, which showed that some of these factors were important, others were marginally important, and the rest were unimportant.

2) __The important fields were:__
 * CAD: Canadian dollar
 * AUD: Australian dollar
 * Real_Gold price: the gold price adjusted for inflation
 * Real_Federal (FLSA)
 * Real PPI energy transformed

__Marginally important fields were:__
 * Change in real GDP
 * GBP
 * JPY

<span style="background-color: transparent; color: #000000; font-family: Arial; font-size: 11pt; font-style: normal; font-weight: normal; vertical-align: baseline; white-space: pre-wrap;">__Unimportant field not considered were :__
 * <span style="background-color: transparent; color: #000000; font-family: Arial; font-size: 11pt; font-style: normal; font-weight: normal; list-style-type: disc; text-decoration: none; vertical-align: baseline; white-space: pre-wrap;">Real crude oil future contract
 * <span style="background-color: transparent; color: #000000; font-family: Arial; font-size: 11pt; font-style: normal; font-weight: normal; list-style-type: disc; text-decoration: none; vertical-align: baseline; white-space: pre-wrap;">Inflation
 * <span style="background-color: transparent; color: #000000; font-family: Arial; font-size: 11pt; font-style: normal; font-weight: normal; list-style-type: disc; text-decoration: none; vertical-align: baseline; white-space: pre-wrap;">Real_PPI food transformed

3) With this input, we allowed SPSS Modeler to suggest the most accurate models. The software found that the linear regression and CHAID models were the most accurate predictors.

4) The CHAID model used twelve fields and the regression model used fourteen. Some of the fields were weak because they were highly correlated with one another, which made the findings redundant (multicollinearity).
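
One way to spot the redundancy described in step 4 is to compute pairwise correlations between predictors and flag the pairs above a cutoff. The sketch below is a minimal illustration, not the procedure SPSS Modeler applies; the field values and the 0.9 threshold are hypothetical.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def correlated_pairs(fields, threshold=0.9):
    """Return (name_a, name_b, r) for every pair of fields with |r| >= threshold."""
    flagged = []
    for (name_a, col_a), (name_b, col_b) in combinations(fields.items(), 2):
        r = pearson(col_a, col_b)
        if abs(r) >= threshold:
            flagged.append((name_a, name_b, round(r, 3)))
    return flagged


# Hypothetical values, for illustration only (not the project's data).
fields = {
    "CAD": [1.0, 1.1, 1.2, 1.3, 1.4],
    "AUD": [1.3, 1.4, 1.5, 1.6, 1.8],
    "Gold": [5.0, 3.0, 4.0, 1.0, 2.0],
}
flagged = correlated_pairs(fields)
print(flagged)
```

A flagged pair suggests that one of the two fields adds little information once the other is in the model.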

5) An analysis of the testing and validation data was conducted. It found zero error and, on the testing data, a minimum error of 0.245, which is expected since the model was not trained on the testing data. The error is small and acceptable, indicating a good model.

6) After this step, a clustering analysis was conducted to categorize the predictors and understand the association of factors within the different clusters. The K-means and Two-steps models were chosen as the most appropriate.

7) K-means produced five clusters, which made them harder to identify. The Two-steps model produced three clusters, which were easier to identify. Interpretation and analysis can be found in the next sections.

==**5. Model Interpretation** ==

===**Clustering** ===

The Auto Classifier function was used to get the best clustering model to identify the main clusters of predictors for the unemployment rate.

Based on these results, K-means was found to be the most appropriate model. It yields a silhouette greater than 0.5, which indicates good clustering quality, and identifies five clusters in total. However, our analysis is based on the Two-steps model because its three clusters are clearer to label. This model still provides good clustering quality, with an average silhouette of 0.5.
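
For readers unfamiliar with the silhouette measure, the sketch below computes it from scratch: for each point, `a` is the mean distance to the other members of its own cluster, `b` is the smallest mean distance to another cluster, and the score is `(b - a) / max(a, b)`. This is an illustration on hypothetical 2-D points, not the calculation performed inside SPSS Modeler.

```python
from math import dist


def silhouette_scores(points, labels):
    """Return the silhouette coefficient of every point (needs >= 2 clusters)."""
    clusters = {}
    for p, lab in zip(points, labels):
        clusters.setdefault(lab, []).append(p)

    scores = []
    for p, lab in zip(points, labels):
        own = clusters[lab]
        if len(own) == 1:
            scores.append(0.0)  # usual convention for singleton clusters
            continue
        # a: mean distance to the other members of the same cluster
        a = sum(dist(p, q) for q in own) / (len(own) - 1)
        # b: smallest mean distance to the members of any other cluster
        b = min(
            sum(dist(p, q) for q in members) / len(members)
            for other, members in clusters.items()
            if other != lab
        )
        denom = max(a, b)
        scores.append((b - a) / denom if denom else 0.0)
    return scores


# Hypothetical, well-separated points for illustration only.
points = [(0.0, 0.0), (0.0, 1.0), (10.0, 0.0), (10.0, 1.0)]
labels = [0, 0, 1, 1]
scores = silhouette_scores(points, labels)
average = sum(scores) / len(scores)
print(round(average, 3))
```

Scores close to 1 mean tight, well-separated clusters; an average around 0.5, as reported for the Two-steps model, is commonly read as reasonable structure.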

The information obtained with the Two-steps model is as follows:



The three clusters can then be named as follows:

__Cluster #1__: //Energy// This cluster is labeled ‘Energy’ since the Real_PPI Energy_transformed and Real_Oil Future Contract_transformed predictors have the highest medians: 1.43 and 1.24, respectively. Moreover, the unemployment rate within this cluster has a median of 5.11. This information can be found in the box plot analysis.

__Cluster #2__: //Exchange rates// In this case, the predictors CAD_transformed and AUD_transformed present the highest medians in the box plot analysis: 1.35 and 1.07, respectively. Within this cluster, the unemployment rate has a median of 5.50.



__Cluster #3__: //Export and Import// The predictors with the highest medians in this cluster's box plot analysis are Real_IEPI Commodities_transformed and Real_IEPI_transformed, with medians of 1.45 and 1.33, respectively. Moreover, the unemployment rate within this cluster has a median of 6.30.

===**Predicting** ===

In order to find the best prediction model, the team used the Auto Numeric function. This analysis identified the CHAID and Regression models as the ones with the highest correlations, 0.937 and 0.924, respectively.

Validation data was used to confirm the accuracy of the models. The results are presented as follows:
|| Actual Unemployment Rate || Predicted Unemployment Rate || Error ||
|| 9.7 || 9.504399 || 0.097801 ||
|| 9.4 || 9.12446 || 0.11223 ||
|| 5.1 || 6.309502 || 1.345249 ||
|| 4.4 || 4.781327 || 0.090663 ||
|| 4.5 || 4.934281 || 0.16714 ||
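
As a rough sketch of how such validation rows can be summarized, the snippet below computes the absolute error per row, the mean absolute error (MAE) and the mean absolute percentage error (MAPE) from the actual and predicted values in the table. The report does not say how its Error column was derived, so the figures computed here are not expected to reproduce it; MAE and MAPE are assumptions chosen for illustration.

```python
# (actual, predicted) pairs copied from the validation table above.
rows = [
    (9.7, 9.504399),
    (9.4, 9.12446),
    (5.1, 6.309502),
    (4.4, 4.781327),
    (4.5, 4.934281),
]

# Absolute error per row, then two common summary measures.
abs_errors = [abs(actual - predicted) for actual, predicted in rows]
mae = sum(abs_errors) / len(abs_errors)
mape = sum(abs(a - p) / a for a, p in rows) / len(rows)

for (actual, predicted), err in zip(rows, abs_errors):
    print(f"{actual:4.1f}  {predicted:9.6f}  {err:.6f}")
print(f"MAE = {mae:.3f}, MAPE = {mape:.1%}")
```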

We suggest using either of these models to predict the unemployment rate because of their high correlation values and low relative errors: 0.13 for CHAID and 0.15 for Regression.

The regression model’s advanced analysis makes it easier to identify the most important predictors of the unemployment rate. This model yields an R-square of 0.870 and an adjusted R-square of 0.846, which indicates a good fit.
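
The adjusted R-square penalizes R-square for the number of predictors: adjusted R² = 1 − (1 − R²)(n − 1)/(n − p − 1), where n is the number of observations and p the number of predictors. The sketch below applies this standard formula. The R-square of 0.870 and the fourteen regression fields come from the report; the sample size is not stated in this section, so `n_obs = 91` is a hypothetical value used only to show the calculation.

```python
def adjusted_r2(r2, n_obs, n_predictors):
    """Adjusted R^2 = 1 - (1 - R^2) * (n - 1) / (n - p - 1)."""
    if n_obs - n_predictors - 1 <= 0:
        raise ValueError("need more observations than predictors + 1")
    return 1 - (1 - r2) * (n_obs - 1) / (n_obs - n_predictors - 1)


# r2 and n_predictors are taken from the report; n_obs is hypothetical.
value = adjusted_r2(0.870, n_obs=91, n_predictors=14)
print(round(value, 3))
```

The gap between the two statistics widens as predictors are added without a matching gain in fit, which is why the adjusted figure is the safer one to quote for a model with many fields.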

The following table presents those predictors with their p-values (the significance threshold is p-value < 0.05) and their standardized coefficients with respect to the dependent variable (unemployment rate):
|| __**Predictor**__ || __**P-value**__ || __**Standardized Coefficients**__ ||
|| Real_PPI Energy_transformed || 0.00 || -1.412 ||
|| Real_IEPI Commodities_transformed || 0.00 || -2.452 ||
|| Real_PPI Commodities_transformed || 0.00 || 2.133 ||
|| Real_Gold Price_transformed || 0.00 || .923 ||
|| AUD_transformed || 0.10 || -.588 ||
|| GBP_transformed || 0.20 || .229 ||

From this table we can see that if AUD_transformed, Real_PPI Energy_transformed or Real_IEPI Commodities_transformed goes up, the unemployment rate moves in the opposite direction: it goes down.

On the other hand, based on the sign of their standardized coefficients, Real_PPI Commodities_transformed, Real_Gold Price_transformed and GBP_transformed move in the same direction as the unemployment rate: if the predictor goes up, the unemployment rate also goes up.
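
The reading of the coefficient table given above can be reproduced with a few lines: rank the predictors by the absolute value of their standardized coefficient and report the direction implied by the sign. This is only a convenience sketch; the coefficients are copied from the table, and treating absolute size as a measure of importance is an assumption.

```python
# Standardized coefficients copied from the regression table above.
coefficients = {
    "Real_PPI Energy_transformed": -1.412,
    "Real_IEPI Commodities_transformed": -2.452,
    "Real_PPI Commodities_transformed": 2.133,
    "Real_Gold Price_transformed": 0.923,
    "AUD_transformed": -0.588,
    "GBP_transformed": 0.229,
}

# Largest absolute coefficient first.
ranked = sorted(coefficients.items(), key=lambda kv: abs(kv[1]), reverse=True)
for name, beta in ranked:
    direction = "rises" if beta > 0 else "falls"
    print(f"{name:36s} {beta:+.3f}  unemployment {direction} as the predictor rises")
```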

For readers with strong analytical skills, we also suggest using the CHAID model to go deeper in predicting the dependent variable. The five-level tree built by the CHAID (Chi-squared Automatic Interaction Detector) model can be found in Exhibit #1.

==**6. Conclusion** ==

After conducting the analysis, we conclude that this model can be used in the future with new data feeds for gold price, oil price, etc., as it presents high accuracy and low relative error. Note that the model is prepared to handle new data because it standardizes the inputs automatically. Although the model predicts the unemployment rate with high accuracy, we cannot be fully confident about its performance in all instances. The model can be enhanced by adding predictors such as social elements and other economic factors.
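
The report does not specify which transformation produces the `_transformed` fields. A common choice is z-score standardization, where each new value is rescaled with the mean and standard deviation learned from the training data. The sketch below assumes that choice; the function name and the gold prices are hypothetical.

```python
from statistics import mean, pstdev


def fit_standardizer(training_values):
    """Return a function that z-scores values with the training mean and std."""
    mu = mean(training_values)
    sigma = pstdev(training_values)
    if sigma == 0:
        raise ValueError("cannot standardize a constant field")
    return lambda x: (x - mu) / sigma


# Hypothetical gold prices, for illustration only.
training_gold = [900.0, 1000.0, 1100.0, 1200.0, 1300.0]
standardize_gold = fit_standardizer(training_gold)

# New observations are scaled with the statistics of the training data,
# not with statistics recomputed from the new feed.
print(standardize_gold(1100.0))
print(standardize_gold(1300.0))
```

Reusing the training statistics keeps new inputs on the same scale the model was fitted on, which is what allows it to accept a fresh data feed.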

==**Exhibit #1: CHAID 1** ==