Alumni+Donations

**Proposal: Alumni Donations** Prepared by:    Alex Swerzbin, Eric Shin, Rohit Jain **__ Executive Summary __** This project illustrates the process used to better understand important variables in determining, classifying and seeking alumni donations for Willamette University. The Office of Annual Giving provided a data set in excel format for this analysis. The data set used mostly demographic variables with 11,000 rows and 31 variables. This is the current data set they maintain to assist them in seeking donations.

To assist the Office of Annual Giving in seeking donations, two different data mining modeling techniques (predictive & cluster analysis) were utilized. This was done to better understand and to provide insight on the importance of variables currently in use at the university’s Office of Annual Giving, provide ideas and recommendations on variables they should track, and recommend steps to improve their model and data collection strategies.

One of the major insights gained from this analysis is that the variables the Office of Annual Giving tracks are poor in providing insight into the dollar amount to ask for from alumni. After completing this modeling process the best value add we could provide was a set of recommendations for the Office of Annual Giving to move forward with. These recommendations are mostly comprised of ways to track variables and thoughts on which variables to track to enhance the model creation process.

**__ Purpose __** One of the major issues all Universities face is how to ask for donations from their alumni. More specifically, understanding who to ask and how much to ask. Willamette University prides itself in having an excellent relationship with their alumni and receiving quality donations from them. Every year millions of dollars are received in grants and donations to the university, which is a lifeline for the University. Ninety-percent of the students enrolled at Willamette University receive some form of financial aid, which is a huge number. The alumni network, with limited disposable income and various options to donate their money, creates a situation where it is crucial for the university to be careful in who and how much to ask for from alumni.

The ongoing business challenge for the university is ask for the right dollar amount, depending on the individual alumnus’ capacity without violating the strong relationship they have created.

**__ Project Goal __** In order to address the business problem identified above, our team will work to determine and evaluate two things. First the best predictive model to assist in determining what the asking dollar amount to a prospective donor should be. Second perform a cluster analysis to get a better sense of important variables and characteristics of those clusters.

<span style="font-family: 'Times New Roman',Times,serif; font-size: 12pt; line-height: 150%;">Once we come up with these models, we would compare them to the existing model that the Office of Annual Giving currently uses; provide recommendations on how to improve that model and provide any additional knowledge we gain from this process.

<span style="font-family: 'Times New Roman',Times,serif;">**__ Literature review __** <span style="font-family: 'Times New Roman',Times,serif; font-size: 12pt; line-height: 150%;">This is not a new problem; the Office of Annual Giving at Willamette University has been facing this issue for years. They are in fact the ones who have provided the dataset to us. The data set contains variables that help determine the annual giving of alumni and has eleven-thousand rows of data. In the data we will have to deal with a lot of null values and demographic variables. We will also face the challenge of poor variable selection currently tracked by the Office of Annual Giving? This project can add value in providing the Office of Annual Giving insights and recommendations on better variables to track moving forward so that they may further refine the model we present to them.

<span style="font-family: 'Times New Roman',Times,serif; font-size: 12pt; line-height: 150%;">To perform this project we will utilize tools such as XLMiner and SPSS modeler to create our model and analyze the situation. Our model will be finalized in SPSS Modeler. Our team will create a predictive model based on numerous prediction techniques (MLR, K-NN, Regression trees, Neural Nets) with the potential to dive further into our understanding of the issue with classification techniques (Classification trees, Neural Nets, Naïve Bayes).

<span style="font-family: 'Times New Roman',Times,serif;">**__ Current Model __** <span style="font-family: 'Times New Roman',Times,serif; font-size: 12pt; line-height: 150%;">The Office of Annual Giving currently has a model in place for determining the right ask amount from the donors. After getting the first donation from a donor, the Office of Annual Giving uses a formula to ask for money in the coming years. The first donation is totally dependent on donors and no asking amount is provided by the office of annual giving. <span style="font-family: 'Times New Roman',Times,serif;"> <span style="font-family: 'Times New Roman',Times,serif; font-size: 16px; line-height: 24px;">If the last gift is more than $100 and less than $2499, then they use the following criteria:

<span style="font-family: 'Times New Roman',Times,serif;">

<span style="font-family: 'Times New Roman',Times,serif; font-size: 16px; line-height: 24px;">If the last gift is more than $ 2500 and under $ 3333, then they us the following criteria: <span style="font-family: 'Times New Roman',Times,serif;">

<span style="font-family: 'Times New Roman',Times,serif; font-size: 12pt; line-height: 150%;">The Office of Annual Giving also has other criterions. For instance, if a donor is categorized into upper management, the ask amounts are then calculated with different formulas. (To date, we have not yet received information on this method.)

<span style="font-family: 'Times New Roman',Times,serif; line-height: 150%; margin-bottom: 0in;">**__ Data Understanding __** <span style="font-family: 'Times New Roman',Times,serif; font-size: 12pt; line-height: 150%;">The current project will focus on determining what dollar amount could be asked from prospective or existing donors, based on the following variables:

<span style="font-family: 'Times New Roman',Times,serif;">

<span style="font-family: 'Times New Roman',Times,serif;">**__ Hypothesis __** <span style="font-family: 'Times New Roman',Times,serif; font-size: 12pt; line-height: 150%;">Our team hypothesizes that the right dollar amount to ask from donors would be dependent mainly on predictors (Job Title, school of graduation and previous donations). However, since these will be predictive and classifying models, a lot of the insights gained from modeling the data will be exploratory in nature and will give analysts insights on what indicators they should consider when deciding the right dollar amount to ask from alumni.

<span style="font-family: 'Times New Roman',Times,serif;">**__ Project Procedure __**
 * <span style="color: black; font-family: 'Times New Roman',Times,serif; font-size: 12pt; line-height: 150%;">Data collection: the team collected data from the Office of Annual Giving. The dataset contains 11,000 rows of data. The dataset contains 7 variables to determine the annual giving of an alumnus.
 * <span style="color: black; font-family: 'Times New Roman',Times,serif; font-size: 12pt; line-height: 150%;">Gain an understanding of the current model that is being used by the Office of Annual Giving to determine the right dollar amount to ask from alumni.
 * <span style="color: black; font-family: 'Times New Roman',Times,serif; font-size: 12pt; line-height: 150%;">Visualize the data by computing statistics and viewing histograms of each variable.
 * <span style="color: black; font-family: 'Times New Roman',Times,serif; font-size: 12pt; line-height: 150%;">Identify outliers and research why they are outliers. If the outlier is a data entry error, replace it with the three-sigma control limit value. If the outlier is legitimate, keep it in the dataset.
 * <span style="color: black; font-family: 'Times New Roman',Times,serif; font-size: 12pt; line-height: 150%;">Evaluate initial hypothesis and change it if needed.
 * <span style="color: black; font-family: 'Times New Roman',Times,serif; font-size: 12pt; line-height: 150%;">Data pre-processes: partition the data into 40% training and 40% testing and 20% validation. Transform the variables if needed. Remove any missing values in the data set.
 * <span style="color: black; font-family: 'Times New Roman',Times,serif; font-size: 12pt; line-height: 150%;">Determine variable importance.
 * <span style="color: black; font-family: 'Times New Roman',Times,serif; font-size: 12pt; line-height: 150%;">Classify the data under a variety of Prediction Models.
 * <span style="color: black; font-family: 'Times New Roman',Times,serif; font-size: 12pt; line-height: 150%;">Evaluate the models using accuracy, lift charts, confusion matrices, and mean absolute error.
 * <span style="color: black; font-family: 'Times New Roman',Times,serif; font-size: 12pt; line-height: 150%;">Select the best model based on the above criteria.
 * <span style="color: black; font-family: 'Times New Roman',Times,serif; font-size: 12pt; line-height: 150%;">Interpret model and identify any implications.
 * <span style="color: black; font-family: 'Times New Roman',Times,serif; font-size: 12pt; line-height: 150%;">Compare the model with the model being used by the annual giving staff.
 * <span style="color: black; font-family: 'Times New Roman',Times,serif; font-size: 12pt; line-height: 150%;">Present findings to other peers and the Office of Annual Giving staff

<span style="font-family: 'Times New Roman',Times,serif; line-height: 150%;">**__ Data Preparation __** <span style="font-family: 'Times New Roman',Times,serif; font-size: 12pt; line-height: 150%;">In order to accurately create models from the excel data received from the Office of Annual Giving; a certain level of data preparation was necessary to clean the data. The first step involved eliminating redundancy within the data (example reunion class, spouse reunion class, spouse gender). Another variable to be cleaned was the graduate school attended by alumni; we had been given the year in which they had graduated and from which school. This was presented to us in 4 different rows with a lot of null values. We converted those values to categorical values (ordinal) assigning numerical values to different schools (1 = college of liberal arts, 2 = law, 3 = AGSM). Next we adjusted whether his/her spouse is an alum (0 = no, 1 = yes). This project looked at two separate techniques, predictive and cluster analysis; both utilized the same data set and data prep methods. <span style="font-family: 'Times New Roman',Times,serif;">

<span style="font-family: 'Times New Roman',Times,serif; font-size: 12pt; line-height: 150%;">Before cleaning the data | After cleaning the data <span style="font-family: 'Times New Roman',Times,serif; line-height: 150%;">**__ Model Development __** <span style="font-family: 'Times New Roman',Times,serif; font-size: 12pt; line-height: 150%;">We tried to follow the process of modeling that had been described in the class. First we performed a data audit and found a small number of outliers and extremes in the data, which were discarded and a super node was generated to improve model effectiveness. We had enough rows of data to allow us to discard the outliers and extremes. The next step was to flag variables which were nominal in nature; we used a “Set to Flag” node for VAR.COUPLE.LAST.DESIGN.DESC variable, which stated which department alumni donated money to. We did not convert gender as it was already flagged. <span style="font-family: 'Times New Roman',Times,serif; font-size: 12pt; line-height: 150%;">Once we had flagged the required variables the next step was to use auto data prep to standardize required data for improved performance of models. <span style="font-family: 'Times New Roman',Times,serif;">

<span style="font-family: 'Times New Roman',Times,serif; font-size: 12pt; line-height: 150%;">Once the data was standardized, due to a large number of variables, we decided to use feature selection to reduce the number of variables for the analysis. We also reduced the number of variables from 41 to 14. In order ensure that the model would not over-fit the data we created a 60 – 40 partition in the data set; 60% being training and 40% testing.

<span style="font-family: 'Times New Roman',Times,serif;">**__ Results __** <span style="font-family: 'Times New Roman',Times,serif; font-size: 12pt; line-height: 150%;">Predictive model: We tried a variety of predictive models like Regression analysis, Chaid analysis, Neural set analysis, C&R analysis, but because the target field was numerical in nature we could not use logistic regression. All the models yielded poor results, in terms of accuracy and adjusted R-square value. One of the models we ran was a linear regression model. The best R-Square we could get for the model 0.232, which shows how poor the model is. <span style="font-family: 'Times New Roman',Times,serif;">

<span style="font-family: 'Times New Roman',Times,serif; font-size: 16px; line-height: 24px;">The lift charts re-emphasized our results; that models predicting the life time donations are very poor. <span style="font-family: 'Times New Roman',Times,serif;">

<span style="font-family: 'Times New Roman',Times,serif; font-size: 16px; line-height: 24px;">Cluster model: The clustering model was used to understand if we could gain insight in patterns, in regards to clusters among donor groups based on demographics and historical gifts and donations. First we used the 2-Step clustering model because this model self-selects the number of clusters without the use having to input them. It gave us 2 clusters with a fairly good cluster quality. The larger cluster (i.e. cluster 1) was almost 89% in size and was comprised mostly of CLA graduates who were males. To confirm the cluster quality, we used the K-means clustering model in which the user specifies the number of clusters. We ran the model specifying 2 clusters and the resulting model yielded similar results with a cluster quality of more than 0.5, which is considered to be good.

<span style="font-family: 'Times New Roman',Times,serif;">

<span style="font-family: 'Times New Roman',Times,serif; font-size: 12pt; line-height: 150%;">Once the clusters were formed, we wanted to understand how alumni of different graduate schools gave as lifetime donations. This was to provide insight to the Office of Annual Giving to help them better allocate their resources accordingly. So we did a scatter plot with “Lifetime Donation” on the Y axis and “Graduate School Attended” on the X axis. <span style="font-family: 'Times New Roman',Times,serif;">

<span style="font-family: 'Times New Roman',Times,serif; font-size: 16px; line-height: 24px;">The resulting scatter plot showed that the maximum donation was given by College of Liberal Art graduates, which was no surprise as the number of undergraduates graduating each year is much higher than the law, business or education school graduates. <span style="font-family: 'Times New Roman',Times,serif;">**__ Conclusion & Recommendation __** <span style="font-family: 'Times New Roman',Times,serif; font-size: 12pt; line-height: 150%;">The problem of understanding who to ask, how much to ask and why is a difficult task for any university. After conducting our analysis using both methods we have found that the data you collect is just as important, if not more important, than the way you analyze it. By undertaking this project and understanding the process behind annual donation seeking, we found that demographic variables alone do not provide very good insight. From this we have come up with a set of recommendations for the Office of Annual Giving to improve their data collection and process:
 * <span style="font-family: 'Times New Roman',Times,serif; font-size: 12pt; line-height: 150%;">First, invest more time and energy in building relationships with Willamette University’s College of Liberal Arts alumni. Our cluster analysis revealed that they donate more and more frequently compared to alumni from other schools at Willamette. This makes sense because these alumni, in most cases, have spent more time at Willamette University and do not have other undergraduate universities asking for donations as well.
 * <span style="font-family: 'Times New Roman',Times,serif; font-size: 12pt; line-height: 150%;">In our analysis, we also found that historical giving was a strong indicator of future giving or aggregate giving. Because of this the Office of Annual Giving should track all donations not just the past five fiscal years, if possible. They should continue to track future giving and maintain that data in their continuing data collection for future analysis. By doing this they will be able to better understand how much to ask for in the future at certain points in time.
 * <span style="font-family: 'Times New Roman',Times,serif; font-size: 12pt; line-height: 150%;">They should begin classifying job titles as categorical variables to allow for more ease in future analysis. One simple way to do it would be classify all jobs in 4 different categories entry level jobs, managerial jobs, senior level jobs, entrepreneurs or self-employed. This will provide them with the ability to understand how job position plays a part in a donors ability to give, making it easier for the Office of Annual Giving to understand again how much to ask and who to build strong relationships with.
 * <span style="font-family: 'Times New Roman',Times,serif; font-size: 12pt; line-height: 150%;">The Office of Annual Giving should begin tracking preferred methods of communication from alumni, whether it is email, mail, telephone calls or personal visits. Response rate should be measured if preferred method is not indicated by the alumni. This would provide insight into behavioral actions associated with giving donations and could provide efficiency of time in future donation seeking for the staff of the office of annual giving.
 * <span style="font-family: 'Times New Roman',Times,serif; font-size: 12pt; line-height: 150%;">Track the behavior of alumni in terms of Willamette University function attendance. Knowing which alumni continues to be active as an alumnus would provide potential insight into who would be a good candidate to ask large amounts from.
 * <span style="font-family: 'Times New Roman',Times,serif; font-size: 12pt; line-height: 150%;">Lastly create an alumni satisfaction survey to gain a better understanding of who is pleased or not with Willamette University. This would provide a quantified behavioral data-set, which is better in creating both types of models.