
Willamette University - April 2011

= GDP Forecast Project =

//By Annelise Hagmann and Jérémy Faure// //GSM672 – Pr. Paul Dwyer.//


 * **Introduction and Literature Review**

Economists use historical data to determine a country’s Gross Domestic Product (GDP) every year. GDP is a measure of how healthy a country’s economy is, and it is used to rank countries. Countries with high and growing GDPs are seen as markets with many opportunities for investment and business ventures. Annual GDP is also an economic tracking device that influences political and economic decisions for the future. Since this measure feeds into so many strategic decisions, is there a way to predict a country’s GDP from past data? For example, when will China and other emerging markets overtake the US and other developed countries?

PwC uses historical data to predict future country rankings through GDP analysis. They study the “rate at which poorer countries can catch up to the more advanced technologies used in developed nations” [1]. Their projections run as far as 2050, whereas our purpose is shorter-term: we do not try to predict so far into the future. We could use their data to test our own prediction models and see how accurate they are.



//GDP projections to 2050: how the rankings change.// [2]

 * **Purpose**

How can we predict whether a country is on the right economic track? In a world that is flattening [3], the purpose of this study is to see whether the GDP of a country can be predicted with good accuracy [4] from other indicators gathered by the World Bank: 1157 variable series for 213 countries, covering the period from 1960 to 2007.

 * **Hypothesis**

We selected GDP as the indicator that would reflect the standard of living. We acknowledge in advance that it may not be the best aggregate indicator for this purpose; others, such as the Gini coefficient or the Physical Quality of Life Index, may reflect the standard of living better. However, based on the data we were able to collect, GDP appears best suited to the study.

 * **Data collection**

We collected data from the World Bank data catalog, which can be downloaded from their website. The dataset we selected covers 1157 variables, ranging from major financial indicators to non-financial ones (forest area, average precipitation, time required to start a business), over more than 40 years for 213 countries. To assess the causes of GDP, we looked at the variables that directly increase or decrease GDP according to the equation GDP = C (consumption) + G (government spending) + I (investment) + NX (net exports, i.e. exports minus imports).
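The expenditure identity used in the data collection step, GDP = C + G + I + NX, can be illustrated with a minimal Python sketch. The function name and the figures are hypothetical and chosen only for the example; they are not World Bank values.

```python
def gdp_expenditure(consumption, government, investment, exports, imports):
    """Expenditure approach: GDP = C + G + I + NX, with NX = exports - imports."""
    net_exports = exports - imports
    return consumption + government + investment + net_exports


# Hypothetical figures in billions of constant US$ (illustration only).
example_gdp = gdp_expenditure(
    consumption=700.0,
    government=200.0,
    investment=180.0,
    exports=120.0,
    imports=150.0,
)
print(example_gdp)  # 1050.0
```

Any variable that moves one of these four components moves GDP directly, which is why such variables were treated as "causes" of GDP when selecting predictors.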

 * **Procedure: Business Understanding**

Being able to predict GDP helps assess whether a country is “on the right track”. Although we believe that financial variables are not the only things that should determine the standard of living, GDP is still a good measure. We also decided to compare the results obtained with “hard” variables (economic, financial, trade) and with “soft” variables (CO2 emissions, mobile phone use).


 * **Data Understanding and Data Preparation**

Before the variables are used in the model, they are evaluated to see whether they are relevant and usable. Data provided by the World Bank covers social, economic, financial, natural resource, infrastructure, governance and environmental indicators. This gives us a broad variety of data and helps us build a comprehensive model that does not rely on financial variables alone, although our ultimate measurement is GDP. We use GDP in constant 2000 US$ to avoid the currency exchange rate problem. With so many variables (1157), some already include or are used to calculate GDP, some are redundant, and some lack enough observations over time (internet connectivity, for instance), so the data was first cleaned manually. Outliers and extreme values were then handled through the data preparation feature available within SPSS Modeler.

Using the PCA feature in SPSS, we combined variables that load similarly, both to reduce the size of the data and to see which variables are useful for prediction. We ran the PCA factor analysis on one country to narrow the data down to the most important variables and make it more manageable. Many of the variables loaded heavily on the same factors, so we narrowed the data down on the assumption that these variables behave the same way. The first 5 factors explained 92% of the variance.
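A figure such as "5 factors explain 92% of the variance" is a cumulative share of the PCA eigenvalues. The sketch below shows how that share is computed and how many components are needed to reach a given threshold. The eigenvalues are hypothetical and are not the values from our SPSS run; the function names are ours.

```python
def cumulative_explained_variance(eigenvalues):
    """Running share of total variance, components sorted by decreasing eigenvalue."""
    ordered = sorted(eigenvalues, reverse=True)
    total = sum(ordered)
    shares = []
    running = 0.0
    for value in ordered:
        running += value
        shares.append(running / total)
    return shares


def components_needed(eigenvalues, threshold):
    """Smallest number of components whose cumulative share reaches the threshold."""
    shares = cumulative_explained_variance(eigenvalues)
    for count, share in enumerate(shares, start=1):
        if share >= threshold:
            return count
    return len(shares)


# Hypothetical eigenvalues (illustration only, not SPSS output).
eigenvalues = [9.0, 4.0, 3.0, 2.0, 1.0, 0.5, 0.5]
print(components_needed(eigenvalues, 0.92))  # 5
```

With these made-up values the first four components explain 90% of the variance and the first five explain 95%, so five components are needed to pass a 92% threshold.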

//Flowchart: PCA analysis in SPSS Modeler//

//PCA table retrieved from SPSS Modeler//


 * **Modeling**

After running the PCA, we narrowed the field down to 20 variables that describe both the causes of GDP and the factors that accompany it. We used both neural networks and linear regression to find the best model of the variables that drive the GDP of the selected countries (US, China, Brazil, India, Germany, and Morocco). We originally included gross national expenditure as a variable, but when we ran the stream we saw a 100% correlation with GDP. The odd results made us realize our error: gross national expenditure is not a useful predictor, because it is built from the same components as GDP (consumption, government spending and investment) and so tracks it almost exactly.
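A simple way to catch this kind of problem early is to compute the Pearson correlation between each candidate predictor and GDP and to flag anything close to ±1. The sketch below is one possible screen, not the procedure we ran in SPSS Modeler; the 0.999 threshold, the variable names and all numbers are illustrative assumptions.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length series."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def flag_leaky_predictors(predictors, target, threshold=0.999):
    """Names of predictors whose |correlation| with the target reaches the threshold."""
    return sorted(
        name
        for name, values in predictors.items()
        if abs(pearson(values, target)) >= threshold
    )


# Made-up series (illustration only).
gdp = [100.0, 104.0, 109.0, 115.0, 122.0]
predictors = {
    # Moves one-for-one with GDP, as a near-definitional variable would.
    "gross_national_expenditure": [101.0, 105.0, 110.0, 116.0, 123.0],
    # Related to GDP but not a restatement of it.
    "electricity_production": [50.0, 53.0, 52.0, 58.0, 57.0],
}
print(flag_leaky_predictors(predictors, gdp))  # ['gross_national_expenditure']
```

A flagged variable is not necessarily wrong, but it deserves a check of its definition before it is allowed into the model.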





We adjusted our data to correct the error and, after comparing the neural network and the linear model, found that the linear regression model had the best accuracy rate, at 85.5%.







The remaining variables, those that accompany GDP rather than cause it, were then used in a second linear regression to see the difference between “hard” and “soft” factors.




 * **Results**

GDP appears to evolve fairly linearly: as long as no major world event disrupts the trend, our model predicts a smooth path. But when we used the model to predict years that were not in the model and compared the predictions with what really happened, we found a large average difference between our model and reality. This suggests that, although the model reports a high accuracy rate, the results are quite different when it is used as a real predictor of future GDP. We had hoped that the model could serve as a forecasting model; in reality this does not appear to be the case.

One interesting finding was that, among the “soft” predictors, the most important predictor of GDP was electricity production. This does not come as a surprise: the more electricity a country produces, the more likely it is to be an advanced nation with high levels of trade and GDP. Although GDP is only an economic and financial measure of the relative “health” of a nation, we saw that non-economic predictors can also be used to assess the well-being and GDP of a country.
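The gap between a good fit on the modeled years and a poor forecast can be made visible by holding out the most recent years and scoring the model on them separately. The sketch below fits a one-variable linear trend by ordinary least squares and compares the mean absolute percentage error (MAPE) on the fitted years and on the held-out years. It is a simplified stand-in for our SPSS stream, and the series is synthetic: a perfectly linear path followed by a shock.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / sum(
        (x - mean_x) ** 2 for x in xs
    )
    intercept = mean_y - slope * mean_x
    return intercept, slope


def mape(actual, predicted):
    """Mean absolute percentage error, in percent."""
    errors = [abs(a - p) / abs(a) for a, p in zip(actual, predicted)]
    return 100.0 * sum(errors) / len(errors)


# Synthetic series (illustration only): year index 0..9.
years = list(range(10))
gdp = [100.0 + 5.0 * t for t in years[:8]] + [120.0, 110.0]  # shock in the last two years

# Fit on the first eight years, hold out the last two.
train_years, test_years = years[:8], years[8:]
train_gdp, test_gdp = gdp[:8], gdp[8:]

intercept, slope = fit_line(train_years, train_gdp)
train_error = mape(train_gdp, [intercept + slope * t for t in train_years])
test_error = mape(test_gdp, [intercept + slope * t for t in test_years])

print(f"in-sample MAPE: {train_error:.2f}%")
print(f"held-out MAPE:  {test_error:.2f}%")
```

In this toy series the trend fits the first eight years almost exactly, yet it overshoots both held-out years, which mirrors the behavior we observed: a smooth linear model cannot anticipate a break in the trend.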

 * **Links:**

   * Excel spreadsheet - data cleaned for PCA
   * Excel spreadsheet - data cleaned for 6 countries
   * Excel spreadsheet - evaluation data set 1 year
   * Excel spreadsheet - evaluation data set 5 years
   * SPSS Modeler file - project stream (PCA+Linear)

//[1] “GDP Projections from PwC: how China, India, and Brazil will overtake the West by 2050”, Larry Elliot. Jan 7th, 2011. www.guardian.co.uk/news/datablog/2011/jan/07/gdp-projections-china-us-uk-brazil//
//[2] Ibid.//
//[3] “The World Is Flat”, Thomas L. Friedman//
//[4] Above 95%//