Willamette Athlete Group Report


**Willamette Athletics Group**


**Introduction**

At Willamette University, sports are a major part of campus life. Because Willamette is a Division III school, it cannot offer athletic scholarships. Given this constraint, the athletic department wanted to evaluate how well it has used its resources to recruit athletes to play for the University. The University also wants to evaluate how successful recruiting has been over the last four years and which factors play an important role in it. Willamette asked our group to analyze the data it provided and offer insight into its current recruiting activities.

//Hypothesis:// Our goal for this project is to identify predictors that are relevant to Willamette’s current recruiting process. Our hypothesis is that, due to the lack of ranged (numeric) data, the analysis is unlikely to produce findings that will improve Willamette’s recruiting success rate. We hope to use market basket analysis to identify the major factors that determine which recruits enroll at Willamette.


**Enterprise**

The purpose of this project is to evaluate the efficiency of the athletic recruiting process at Willamette University. Currently, no standards or procedures are in place to show whether the university’s recruiting efforts have been effective. Because success has never been defined for these efforts, this project will help the department define it. This matters to the department because athletics play an important role in attracting new students: winning teams raise the school’s profile and bring more athletic students to Willamette University. To build such teams, recruiters must know which traits the most successful athletes possess. There has been little prior research on Division III recruiting, so no established measure of success exists. In addition, the current data set does not allow for a full analysis, so it is vital to focus on determining whether recruiting is effective and how it can be improved using the information provided.

We first searched for existing studies on this topic to help define success for the project; unfortunately, we found none. To become better informed about the recruiting process, we interviewed several coaches in Willamette University’s athletic department about how successful they consider their current efforts.

The reviews were mixed. For example, the baseball coach would like to enroll a recruiting class of 15 this year. To do that, he needs 60 recruits admitted, which means 80 have to apply, and to generate those applicants he is working from an active recruiting list of 130-160 prospective students [1]. Reaching the target number counts as success, but the real question is how efficient the process is: by the numbers, only about 10% of the prospects on the list end up enrolling. The tennis [2] and men’s soccer [3] coaches also do not think their recruiting processes are working as they should. On the other hand, some programs in the athletic department believe they are successful, such as football [4], which enrolls up to 50 prospects a year, and track and field [5], which enrolls up to 100 prospects in good years.
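The baseball coach’s figures can be turned into stage-by-stage conversion rates. The short Python sketch below uses only the numbers quoted above (a list of 130 to 160 prospects, 80 applicants, 60 admits, 15 enrollees); the helper name `funnel_rates` and the stage labels are our own, not part of the department’s process.

```python
def funnel_rates(prospects: int, applied: int, admitted: int, enrolled: int) -> dict:
    """Conversion rate at each stage of a recruiting funnel (hypothetical helper)."""
    return {
        "apply_rate": applied / prospects,     # share of the active list that applies
        "admit_rate": admitted / applied,      # share of applicants who are admitted
        "yield_rate": enrolled / admitted,     # share of admits who enroll
        "overall_rate": enrolled / prospects,  # end-to-end "success rate"
    }

# The active list is quoted as a range, so evaluate both ends of it.
for prospects in (130, 160):
    rates = funnel_rates(prospects, applied=80, admitted=60, enrolled=15)
    print(prospects, {stage: round(rate, 3) for stage, rate in rates.items()})
```

At the two ends of the list size, the end-to-end rate works out to roughly 11.5% and 9.4%, in line with the coach’s figure of about 10%, while the yield on admitted recruits is 25%.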

Another source of outside information was the winning percentage over the last four years for the sports that keep win-loss records. This information might provide another predictor of recruiting success. While it is not technically literature, it does show where the different programs in the department currently stand. The data provided focused on baseball, soccer, football, crew, golf, basketball, track, cross country, swimming, and tennis. These data were cleaned before the market basket model was applied. It is important to define success clearly for this project and then analyze the predictors in the data to see whether they are correlated with it.

**Data Understanding**

Willamette University provided a dataset covering recruits from the last four years in every sport. It contains the following variables:
|| Variable || Description ||
|| Sport and Rank || Two variables combined into one: the sport the individual participated in and the rank the coach assigned to the individual. ||
|| Start Term || The year and term the individual started. ||
|| Fr/Tr || Type of student: freshman or transfer. ||
|| App.Stat || The application status. ||
|| Deposited? || Whether the student paid a deposit. ||
|| Name || The individual’s name. ||
|| Ethnic || The ethnicity of the individual. ||
|| Gender || The gender of the individual. ||

ETHNICITY CODE
 * 10, 11, 12 Asian American
 * 20 African American
 * 30 Native American
 * 40-49 Hispanic American
 * 50 Caucasian
 * 60 Foreign Student (non U.S. citizen)
 * 80 Other (the individual gave something other than an ethnicity, such as Irish or Norwegian)
 * 81 Multiracial
 * 99 No info/unknown
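Because several ethnicity codes share a label (10 to 12, and the whole 40 to 49 band), decoding them is a range lookup rather than a one-to-one table. A minimal Python sketch, assuming the codes arrive as integers; the function name is hypothetical, and treating codes missing from the table as unknown is our assumption.

```python
def decode_ethnicity(code: int) -> str:
    """Map a numeric Ethnic code from the recruit file to the label in the code table."""
    if code in (10, 11, 12):
        return "Asian American"
    if 40 <= code <= 49:
        return "Hispanic American"
    labels = {
        20: "African American",
        30: "Native American",
        50: "Caucasian",
        60: "Foreign Student (non U.S. citizen)",
        80: "Other",
        81: "Multiracial",
        99: "No info/unknown",
    }
    # Assumption: any code not listed in the table is treated as unknown.
    return labels.get(code, "No info/unknown")

print([decode_ethnicity(c) for c in (11, 45, 50, 81)])
```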

APPLICATION STATUSES
 * AD-Admitted Regular
 * AW-Admitted from the Waitlist
 * ND-Admitted Non-degree
 * WA-Withdrawn Admitted
 * WW-Withdrawn from the Waitlist after being admitted
 * WP-Withdrawn deposited student
 * DF-Deferred (withdrawn with the intention of coming in a later year)
 * MS-Moved to Student (admitted, paid and enrolling)
 * WB-Withdrawn before decision (regular)
 * BW-Withdrawn before decision (Waitlist)
 * DW-Denied from the Waitlist
 * DN-Denied Regular
 * HD-Hold for additional info
 * IF-Incomplete file
 * CP-Complete file

Based on the data, the most important variables for the analysis appear to be sport and rank. The application status is the output variable, simplified to two outcomes: enrolled and not enrolled. The data clearly lack ranged (numeric) variables. Because of the quality and type of data received, every analysis had to work with inputs that are mostly categorical. Reorganizing and cleaning the data yielded additional information, which proved helpful in assessing the success of recruiting in the department.
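The simplification of App.Stat into enrolled / not enrolled can be sketched as a small mapping. The report’s exact mapping (Exhibit B) is not reproduced here, so this Python sketch assumes that only MS (“Moved to Student: admitted, paid and enrolling”) counts as enrolled and every other status, including WP (withdrawn deposited student), counts as not enrolled; the names `ENROLLED_STATUSES` and `enrolled_flag` are hypothetical.

```python
# Assumption: only "MS" (admitted, paid and enrolling) counts as enrolled.
ENROLLED_STATUSES = {"MS"}

# Every status code listed in the APPLICATION STATUSES table.
ALL_STATUSES = {"AD", "AW", "ND", "WA", "WW", "WP", "DF", "MS",
                "WB", "BW", "DW", "DN", "HD", "IF", "CP"}

def enrolled_flag(app_stat: str) -> int:
    """Collapse App.Stat into a binary outcome: 1 = enrolled, 0 = not enrolled."""
    code = app_stat.strip().upper()
    if code not in ALL_STATUSES:
        # Fail loudly rather than silently counting a typo as "not enrolled".
        raise ValueError(f"unknown application status: {app_stat!r}")
    return 1 if code in ENROLLED_STATUSES else 0

statuses = ["MS", "AD", "wp", "DN", "MS"]
print([enrolled_flag(s) for s in statuses])
```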

Looking more closely at the data set provided by Willamette University, several things stand out immediately:

Recruiting success by sport and overall

As shown in the table above, the department’s overall success rate is about 32.4%, which suggests that the athletic department is moderately successful but has considerable room for improvement. This department-wide rate was used as the benchmark for individual sports: a sport whose success rate is at or above it is considered successful. This project examines which factors contribute to that success.
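The benchmark rule just described (a sport is successful if its enrollment rate matches or beats the department-wide rate) can be sketched in Python as follows. The `(sport, enrolled)` record shape and the helper name are assumptions, and the toy records are invented for illustration; they are not the Willamette figures.

```python
from collections import defaultdict

def success_by_sport(records):
    """records: iterable of (sport, enrolled) pairs, with enrolled = 1 or 0.

    Returns the department-wide rate and, per sport, (rate, meets_benchmark)."""
    totals = defaultdict(int)
    enrolled = defaultdict(int)
    for sport, stat in records:
        totals[sport] += 1
        enrolled[sport] += stat
    overall = sum(enrolled.values()) / sum(totals.values())
    by_sport = {}
    for sport in totals:
        rate = enrolled[sport] / totals[sport]
        # Successful if the sport matches or beats the department benchmark.
        by_sport[sport] = (rate, rate >= overall)
    return overall, by_sport

# Hypothetical toy data, not the department's actual records.
toy = [("Track", 1), ("Track", 1), ("Track", 0),
       ("Tennis", 0), ("Tennis", 0), ("Tennis", 1),
       ("Baseball", 0), ("Baseball", 0)]
overall, by_sport = success_by_sport(toy)
print(overall, by_sport)
```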

Several assumptions were made in the analysis. First, we assumed that the people who enrolled at Willamette went on to play the sport they were recruited for; therefore, any enrolled recruit was counted as a “success.” A second major assumption, which may not hold in every case, is that anyone recruited adds some value to the team. Third, we assumed that the rank assigned to a recruit reflects the recruit’s statistical value, which in turn assumes that coaches always make informed decisions about whom to recruit and how to rank them.


**Data Manipulation**

The data set that was provided had several issues. First, sport and rank had been combined into a single variable. To make the data usable, this variable was split into separate sport and rank columns using a series of “if…then” statements (See Exhibit A). Second, the application status variable carried more detail than needed, with fourteen different statuses. For this analysis it does not matter how an application was resolved, only whether the individual came to Willamette University, so a new variable was created that records only whether the individual enrolled (See Exhibit B). Finally, a variable was added containing the previous year’s winning percentage for the sport the individual would be playing.
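The exact layout of the combined Sport and Rank values is not shown in the report, so the Python sketch below assumes a value such as `"Football 1"` (sport name, whitespace, numeric rank) and uses a regular expression in place of the report’s “if…then” statements. It also sketches the prior-year winning-percentage lookup, assuming the start year has already been parsed from Start Term as an integer. The function names and the `win_pct` table (with its placeholder value) are hypothetical.

```python
import re
from typing import Optional, Tuple

# Assumed format: sport name, whitespace, numeric rank, e.g. "Women's Soccer 2".
SPORT_RANK = re.compile(r"^(?P<sport>.+?)\s+(?P<rank>\d+)$")

def split_sport_rank(value: str) -> Tuple[str, Optional[int]]:
    """Split a combined 'Sport and Rank' value into (sport, rank).

    Values with no trailing rank are returned as (sport, None)."""
    text = value.strip()
    match = SPORT_RANK.match(text)
    if match is None:
        return text, None
    return match.group("sport"), int(match.group("rank"))

def prior_year_win_pct(sport: str, start_year: int, win_pct: dict) -> Optional[float]:
    """Winning percentage of the recruit's sport in the season before they start."""
    return win_pct.get((sport, start_year - 1))

# Hypothetical lookup table keyed by (sport, season year); the value is a placeholder.
win_pct = {("Football", 2008): 0.5}
sport, rank = split_sport_rank("Football 1")
print(sport, rank, prior_year_win_pct(sport, 2009, win_pct))
```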


**Modeling**

After viewing the data, we decided that the following models would be best suited to predicting whether a recruit would enroll at Willamette University:
 * Neural Networks
 * Clustering (Two-Step and K-Means)
 * Market Basket Analysis (Apriori)

Neural networks were chosen to try to mimic the decision process a recruiter uses. One issue discussed before building this model was that individual recruiters do not follow a uniform approach. Because of this, minimal changes were made to the model for fear of adding more error from differences in individual tactics. The data were partitioned into training, validation, and test sets, with application status as the target. Even so, the model’s predictions were wrong 100% of the time, so it is not discussed further.

Two clustering methods were applied to the data. The first was Two-Step, in which the algorithm chooses the number of clusters. The second was K-Means, with two clusters specified in advance. Given the limited data, outliers were kept, since they may be exactly the people the university is looking for. To check the validity of the models, it is essential to compare the clusters and to tabulate all of the accepted applications to see which cluster contains the successful recruits.
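The validity check described above amounts to cross-tabulating cluster membership against the enrolled outcome. This Python sketch assumes the cluster labels have already been produced by whatever tool ran K-Means or Two-Step; the function name and the toy labels are hypothetical.

```python
from collections import Counter

def enrollment_by_cluster(cluster_labels, enrolled):
    """Cross-tabulate cluster assignments against the binary enrolled outcome.

    Returns {cluster: (n_members, n_enrolled, enrollment_rate)}."""
    if len(cluster_labels) != len(enrolled):
        raise ValueError("cluster_labels and enrolled must be the same length")
    members = Counter(cluster_labels)
    hits = Counter(c for c, y in zip(cluster_labels, enrolled) if y == 1)
    # Counter returns 0 for clusters with no enrolled members.
    return {c: (n, hits[c], hits[c] / n) for c, n in members.items()}

# Hypothetical output of a two-cluster run, paired with each recruit's outcome.
labels = [0, 0, 0, 1, 1, 1, 1, 0]
stat = [1, 0, 0, 1, 0, 0, 1, 0]
print(enrollment_by_cluster(labels, stat))
```

A clustering that captured enrollment would concentrate most enrollees in one cluster, giving very different rates; similar rates across clusters indicate that the clusters do not separate enrollees from non-enrollees.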

The final model was Market Basket Analysis (Apriori), targeting students with an accepted application status to find the common traits of successful recruits. To reduce noise, the more popular sports, such as men’s basketball, women’s soccer, track, and football, were separated and analyzed individually. Rules were kept only if they had a confidence of at least 30 percent and a lift greater than one. Most of the data Willamette University provided for this project consisted of categorical identifiers. With the lack of ranged variables, many other methods could not be used, so these models, which work well with categorical variables, were chosen.
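The report does not describe how its Apriori rules were generated beyond the thresholds, so the following is a minimal pure-Python sketch of the technique: frequent itemsets are found level by level with Apriori pruning, and only rules whose consequent is the enrollment outcome are kept, filtered by the report’s thresholds (confidence of at least 0.3, lift above one). The item labels such as `Gender=M`, the `min_support` default, and the `max_len` cap are our own assumptions, and the toy transactions are invented.

```python
from itertools import combinations

def apriori_rules(transactions, target, min_support=0.05, min_confidence=0.3, max_len=3):
    """Mine rules of the form {items} -> target using Apriori-style level-wise search.

    transactions: list of sets of categorical items (e.g. "Gender=M", "Rank=1"),
    where the target item (e.g. "Enrolled") is present for successful recruits.
    Returns (antecedent, confidence, lift) tuples with confidence >= min_confidence
    and lift > 1, highest lift first."""
    n = len(transactions)

    def support(itemset):
        # Fraction of transactions that contain every item in itemset.
        return sum(1 for t in transactions if itemset <= t) / n

    items = sorted({i for t in transactions for i in t if i != target})
    level = [frozenset([i]) for i in items if support(frozenset([i])) >= min_support]
    frequent = list(level)
    k = 1
    while level and k < max_len:
        k += 1
        prev = set(level)
        # Join step: combine frequent (k-1)-itemsets into k-item candidates.
        candidates = {a | b for a in level for b in level if len(a | b) == k}
        # Prune step (Apriori property): every (k-1)-subset must itself be frequent.
        candidates = {c for c in candidates
                      if all(frozenset(s) in prev for s in combinations(c, k - 1))}
        level = [c for c in candidates if support(c) >= min_support]
        frequent.extend(level)

    target_support = support(frozenset([target]))
    rules = []
    for antecedent in frequent:
        antecedent_support = support(antecedent)
        if antecedent_support == 0 or target_support == 0:
            continue
        confidence = support(antecedent | {target}) / antecedent_support
        lift = confidence / target_support
        if confidence >= min_confidence and lift > 1:
            rules.append((sorted(antecedent), round(confidence, 3), round(lift, 3)))
    return sorted(rules, key=lambda r: (-r[2], r[0]))

# Invented toy data: one set of items per recruit.
toy = [
    {"Gender=M", "Rank=1", "Enrolled"},
    {"Gender=M", "Rank=1", "Enrolled"},
    {"Gender=F", "Rank=1", "Enrolled"},
    {"Gender=M", "Rank=2"},
    {"Gender=F", "Rank=2"},
    {"Gender=F", "Rank=3"},
]
for rule in apriori_rules(toy, "Enrolled"):
    print(rule)
```

In this toy run, `Gender=F` alone reaches the 30 percent confidence threshold but is dropped because its lift is below one, which mirrors how the report discards rules that do no better than the base enrollment rate.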

**Data Analysis**

**__Clustering__**

Two clustering methods were used and evaluated: Two-Step and K-Means. K-Means was set to produce two clusters; however, these clusters did not correspond to whether or not an individual would enroll at Willamette University. The Two-Step model offers no control over the number of clusters, and with the data used it formed only two. Like the K-Means clusters, these did not reflect whether an individual would enroll.


**__Overall Department-Market Basket__**

Market basket analysis results for the overall department

When we analyzed the overall data for Division III sports recruiting, we immediately noticed some overlapping data that needed to be discarded. Across sports, recruits with a rank of 1 are recruited most successfully, especially males ranked 1; females ranked 1 are also recruited successfully. Among individual sports, track is the most successful at recruiting, and several traits stand out among its successful recruits: male, a rank of 1, Caucasian, and entering as a freshman rather than a transfer. The overall results really represent only track and football at ranks 1 and 3. For ranks below 1, the lift falls below one.

**__Overall Department w/o Football and Track-Market Basket__**

Market basket analysis results for the overall department without Football and Track

Track and football together make up approximately half of the data set, which greatly skews the overall results of the market basket analysis. To get a better view of the rest of the department, the overall market basket analysis was run again without football and track. As shown above, basketball is the most successful sport at recruiting in this analysis, and it also has a higher probability of landing recruits ranked 1. The other notable point is that, across the rest of the department, females ranked 1, followed by females ranked 2, are recruited most successfully, which suggests that males are not being recruited as successfully. All of these rules are useful because their lift is greater than one, meaning they do better than picking recruits at random.
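The reasoning that a lift above one beats random selection follows from the standard definitions of the rule measures. In the block below, A is a rule’s antecedent (for example, female with rank 1) and E is the event that a recruit enrolls; P(E) is the enrollment rate expected from picking recruits at random, so a lift above one means recruits matching A enroll more often than that baseline.

```latex
\begin{aligned}
\operatorname{support}(A) &= P(A), \\
\operatorname{confidence}(A \Rightarrow E) &= \frac{P(A \cap E)}{P(A)} = P(E \mid A), \\
\operatorname{lift}(A \Rightarrow E) &= \frac{P(E \mid A)}{P(E)}, \qquad
\operatorname{lift}(A \Rightarrow E) > 1 \iff P(E \mid A) > P(E).
\end{aligned}
```

For instance, a lift of 1.2 means that recruits matching A enroll at 1.2 times the base rate, regardless of how large that base rate is.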

**__Football-Market Basket__**

Market basket analysis results for the Football Department

In the Football market basket model, rank 1 individuals appear to be recruited well at Willamette University, with a lift of 1.19. This success in recruiting top-ranked players may stem from the coaches’ years of recruiting experience and their understanding of what admissions wants from a student, as well as their knowledge of what the team needs. Ethnicity 50 (Caucasian) and incoming freshmen also appear in the rules. These rules may simply reflect that an overwhelming majority of Willamette football recruits share these attributes.

**__Track-Market Basket__**

Market basket analysis results for track

The table above summarizes the attributes shared by many of the track recruits. It shows that over the past four years, Willamette University has been good at attracting top-ranked male athletes, especially freshmen of Caucasian ethnicity. This success in recruiting men may stem from the visibility of Olympian Nick Symmonds, a former Willamette runner. The data also show that Willamette is good at recruiting lower-ranked (rank 3) athletes, both male and female. However, none of these rules has a lift greater than one, so they add no value to the recruiting process.


**__Women’s Soccer-Market Basket__**

Market basket analysis results for women's soccer

Above are the results of the women’s soccer market basket analysis. They show strong success in recruiting women ranked 2, especially freshmen of Caucasian ethnicity. All of these rules have a lift greater than one, which suggests that recruiting efforts should focus on women ranked 2. One reason may be that Willamette University cannot offer athletic scholarships, so higher-ranked players choose schools that do.


**Deployment**

After the analysis is complete, the results from the different models will go to the athletic department at Willamette University. We hope the department will be able to use them to build a better recruiting system. With this report, the Willamette recruiting office should be able to see more clearly where its current system falls short. By combining some current policies with new ones, the department can create a stronger recruiting method that attracts and recruits athletes who will benefit Willamette University’s sports teams.

**Conclusion**

In the world of sports, recruiting has become very important for attracting some of the top athletes in the country. Schools spend money and time talking with the athletes they feel will benefit their programs the most. Some schools have the advantage of being able to offer both academic and athletic scholarships. Finding and attracting high-quality recruits is even more difficult for schools like Willamette, which can offer only academic scholarships, so their recruiting process needs to be that much more effective. After processing the data Willamette collected through four different models (Neural Networks, Two-Step and K-Means Clustering, and Market Basket Analysis (Apriori)), we were able to draw a few conclusions. Overall, Willamette’s recruiting process was not very successful at attracting athletes and getting them to apply and be accepted. One drawback is that the large number of football and track recruits greatly skews the overall findings. The most successful model was market basket analysis; however, its confidence levels were so low that the results carry little weight.

To improve its recruiting in the future, the athletic department will need to make some major changes. It would be beneficial for Willamette to collect more statistics on individual recruits so they can be used in models like the ones we ran, potentially producing better results from more accurate predictors. Willamette would also benefit from collecting data such as income, overall GPA, and SAT scores to determine which athletes have a better chance of being accepted. Geographic location is another variable Willamette should collect to better predict which athletes will be interested in coming to Willamette to play sports. Combined with the current data, this information should give the recruiting office more predictors and produce better results. If the new data were run through new models, it may be possible to find more effective ways to recruit top-tier athletes.

To improve its overall recruiting process, Willamette could identify alumni whose children play sports offered at Willamette. It could also look for commonalities and differences among other Division III schools, and perhaps even Division I and II schools, to find new ideas and concepts to try. Willamette can also check how recruits are ranked nationally to better judge whether it has any chance of landing them: nationally top-ranked athletes are less likely to attend a Division III school than, say, an athlete ranked below the top 100.


**Errors**

This process could contain many errors because of the way the rankings are formed. Coaches and the athletic director often determine player ranks themselves, so human error is a concern: coaches may have preset agendas or favorites that are not best for the team, and they may set their sights too high, setting themselves up to fail. In addition, because the information was limited and categorical, the models that were run showed significant error.


**Recommendations**

We recommend that Willamette University take the following actions to add structure to its recruiting methods and make the fullest possible use of its recruit data. The first is to create a ranking rubric to bring more consistency across all sports. This rubric would also create more overlap between the admissions office and the athletics department by combining both sets of criteria for recruits (See Exhibit C). The Admissions Office requires a certain GPA and SAT score for admission as well as for academic scholarships. If the highest-ranked recruits do not meet these requirements, they are not worth the coach’s time to recruit. Likewise, a recruit who has great scores but will not add much value to the team is not advantageous to the athletic department.
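The rubric’s core logic, combining the coach’s athletic rank with the admissions thresholds, can be sketched as a simple classification. The report does not state the Admissions Office’s GPA and SAT cutoffs, so they are parameters here, and the values in the usage line are placeholders; the `Recruit` type, the tier names, and the `max_rank` cutoff are all hypothetical.

```python
from dataclasses import dataclass

@dataclass
class Recruit:
    name: str
    coach_rank: int  # 1 = highest priority, as in the coaches' current ranking
    gpa: float
    sat: int

def rubric_tier(r: Recruit, min_gpa: float, min_sat: int, max_rank: int = 3) -> str:
    """Classify a recruit using both athletic and admissions criteria (hypothetical rubric).

    min_gpa and min_sat stand in for the Admissions Office thresholds;
    max_rank is the lowest coach rank still considered worth pursuing."""
    meets_admissions = r.gpa >= min_gpa and r.sat >= min_sat
    adds_team_value = r.coach_rank <= max_rank
    if meets_admissions and adds_team_value:
        return "pursue"
    if adds_team_value:
        return "admissions risk"  # strong athlete who may not be admitted
    if meets_admissions:
        return "low team value"   # admissible, but little athletic benefit
    return "do not pursue"

# Placeholder thresholds for illustration only.
print(rubric_tier(Recruit("Recruit A", 1, 3.5, 1250), min_gpa=3.0, min_sat=1100))
```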

Another recommendation is to develop an athletics-wide database that holds more information about recruits, combining athletic and admissions information. For the athletics department, the database would hold the rank, key statistics, and any other information determined to be important. The admissions information would include GPA, income, SAT score, geographic location, and other information.
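One way to picture such a database is a single recruit table that holds the current fields alongside the proposed athletic and admissions fields. The sketch below uses Python’s built-in `sqlite3` with an in-memory database; the table and column names are our assumptions, and the two rows are invented.

```python
import sqlite3

SCHEMA = """
CREATE TABLE recruit (
    recruit_id  INTEGER PRIMARY KEY,
    name        TEXT NOT NULL,
    sport       TEXT NOT NULL,
    coach_rank  INTEGER,                         -- athletics: coach-assigned rank
    key_stats   TEXT,                            -- athletics: sport-specific statistics
    start_term  TEXT,
    fr_tr       TEXT CHECK (fr_tr IN ('FR', 'TR')),
    ethnic_code INTEGER,
    gender      TEXT,
    gpa         REAL,                            -- admissions
    sat         INTEGER,                         -- admissions
    income      INTEGER,                         -- admissions
    home_state  TEXT,                            -- admissions: geographic location
    app_stat    TEXT,
    enrolled    INTEGER CHECK (enrolled IN (0, 1))
);
"""

conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
conn.executemany(
    "INSERT INTO recruit (name, sport, coach_rank, gpa, sat, home_state, app_stat, enrolled) "
    "VALUES (?, ?, ?, ?, ?, ?, ?, ?)",
    [("Recruit A", "Football", 1, 3.4, 1180, "OR", "MS", 1),
     ("Recruit B", "Football", 2, 3.1, 1050, "WA", "DN", 0)],
)
# Enrollment rate by sport and coach rank, the kind of summary the report computes.
rows = conn.execute(
    "SELECT sport, coach_rank, AVG(enrolled) FROM recruit "
    "GROUP BY sport, coach_rank ORDER BY coach_rank"
).fetchall()
print(rows)
```

Keeping athletic and admissions fields in one row lets a single query relate coach rank, academic profile, and enrollment, which the current spreadsheet-style data cannot do.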

If the amount of ranged (numeric) data increases, the athletic department could use different types of analysis. K-nearest neighbors (K-NN), a classification method that predicts a recruit’s outcome from the most similar past recruits, would be more effective because ranged data provide more variation for measuring how similar recruits are. Logistic regression would also help identify which predictors have the most influence on recruiting success. In addition, the more transparent and uniform the recruiting process is, the more useful neural networks would be in interpreting the data.
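To illustrate how logistic regression would show which predictors matter, the Python sketch below fits one by plain batch gradient descent on standardized inputs, so the sign and size of each weight indicate the direction and relative strength of that predictor. The two features (a standardized GPA and a standardized coach rank, where a higher rank number means a lower priority) and the toy data are hypothetical; in practice an established statistics package would be used.

```python
import math

def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))

def fit_logistic(X, y, lr=0.1, epochs=2000):
    """Fit logistic regression by batch gradient descent (minimal sketch, no regularization).

    X: list of feature rows (already standardized); y: list of 0/1 outcomes.
    Returns (weights, bias)."""
    n, d = len(X), len(X[0])
    w = [0.0] * d
    b = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * d
        grad_b = 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j in range(d):
                grad_w[j] += err * xi[j]
            grad_b += err
        # Step against the average gradient of the log-loss.
        w = [wj - lr * g / n for wj, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b

def predict_proba(w, b, x):
    """Estimated probability that a recruit with features x enrolls."""
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)

# Hypothetical toy data: [standardized GPA, standardized coach rank] -> enrolled.
X = [[1.0, -1.0], [0.8, -0.5], [0.5, -1.0], [-0.5, 1.0], [-1.0, 0.5], [-0.8, 1.0]]
y = [1, 1, 1, 0, 0, 0]
w, b = fit_logistic(X, y)
print("weights:", [round(v, 2) for v in w], "bias:", round(b, 2))
```

Comparing weight magnitudes is only meaningful when the features are on the same scale, which is why the inputs are assumed to be standardized first.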

We hope that, with improved information and techniques, modeling can help enhance the recruiting process and achieve higher efficiency and success rates.


**Exhibits**

Exhibit A: Data Manipulation: Separating Rank from Sport

Exhibit B: 14 Admit Statuses → Stat=1 or 0



Exhibit C: Example of a Potential Rubric

[1] Swick, Aaron. Willamette University Head Baseball Coach. Interview. February 12, 2009.
[2] Roberts, Becky. Willamette University Tennis Coach. Interview. March 10, 2009.
[3] Larson, Nelson. Willamette University Men’s Soccer Coach. Interview. February 18, 2009.
[4] Cass, Tony. Willamette University Head Recruiter, Assistant Football Coach. March 12, 2009.
[5] McGuirk, Matt. Willamette University Head Track and Field Coach. February 3, 2009.
[6] McKinney, Robert. Team Sports, Statistics. Willamette.edu/athletics. April 13, 2009.



Elizabeth Giligan - Kody Bentonte - Ken Beatty - Mike Bowers - Kyle McGeeney