NBA Draft--True Value Index Evaluator

NBA Draft Input & Output Data
 * //[[file:NBA Draft.str]] SPSS Modeler Stream

Nick Footer, Brett Hatton & Ahmed Yeslam//

=**Executive Summary**=

 * Late June is always an exciting time for NBA teams as they begin to cement the building blocks for the future. This begins with the NBA draft and the high expectations placed on college players turning professional. As with any new venture, which is what many college players are for NBA teams, a business needs to determine whether an investment is worth the risk it entails. Our data mining project uses a player's True Value Index (TVI) and other college statistics to determine where each player should be drafted. The model predicts a player's draft spot within an average of nine draft spots 68% of the time.
 * Players and agents will also benefit from our experiment. Players who are deciding whether to forfeit their amateur status and turn professional can weigh the decision against the other players in the field to estimate where they could be drafted. Agents are constantly looking to sign players who will be successful in the future, and through the use of draftee NBA statistics the model projects which players could have successful NBA careers.
 * Many studies predict the usefulness of a player based on college statistics, or examine how the NCAA college basketball championship affects where a player is drafted, but none of them takes season statistics into account. Our study not only provides a new statistic (TVI), but also predicts where a player should be drafted within an average of nine draft spots 68% of the time. Some teams will always pass on the best statistical player because of character, injury or position availability. We expected some variation from this when comparing prior drafts and the 2010 draft; the average nine-spot swing accounts for this variation and keeps our model in line with previous drafts.

=**Introduction**=

**What is the True Value Index?**
The NBA Draft--True Value Index (TVI) Evaluator is a project motivated by the goal of more accurately projecting how draftees will perform in the NBA. The NBA draft occurs annually at the end of each NBA season and acts as the entry point for prospects to enter the league. The draft is divided into two rounds, and each team (barring any trades of picks) is given a slot in each round to pick a prospect based on its final regular season record. The first fourteen selections are distinguished as "lottery picks" and are assigned differently from the rest of the selections. Like the other selections, lottery picks are partly based on a team's final regular season record. The lottery differs, however, in that the record gives each team a percentage chance to acquire the top pick, with the team holding the worst regular season record receiving the greatest probability of acquiring the top selection.

This project is centered on the TVI, a value intended to project an NBA draft prospect's true value in the draft hierarchy. The TVI is produced by modeling key variables that measure the effectiveness of a player: game data, position played, the conference the player competes in, and the player's height relative to the position played. The NBA TVI for future draft picks is estimated by comparing data on players from earlier drafts to their performance in the NBA. This is done by comparing the training set (pre-Draft data) to the testing set (post-Draft data) to eventually yield an estimated NBA TVI for the incoming 2010 NBA Draft picks. Comparing a prospect's TVI to his NBA TVI will provide greater insight into which statistics are important when analyzing a potential draft prospect and his expected performance in the NBA.

**Project Relevance**
TVI will provide NBA teams with another tool in their arsenal for making more informed decisions on draft picks, decisions that can affect their performance not only as a team but as an organization. It is important to remember that NBA teams are organizations and thus share the indicators that define success elsewhere in business, such as revenue, ROA and ROI. Draft picks are an investment in the team's future performance, and it is imperative to predict as well as possible the future effect a player will have on the team.

**Fulfilling Course Requirements**
The project is geared toward modeling real-world data to resolve a real-world ambiguity: which statistics are most relevant when selecting a player in the NBA draft. Deriving TVI will require an extensive knowledge of data mining practices, including data cleaning, partitioning data and fitting models, to name a few. The tools learned in the data mining course will be applied to a real-world scenario, an ultimate manifestation of acquired and practiced knowledge.

=**Analysis of Business Solution**=

**Business Challenge**
 * The analysis that we have performed will provide information to a variety of audiences: NBA teams that do not want to lose revenue through poor drafting decisions, as well as NBA agents and potential NBA players. NBA teams need to be sure that their investment in a player will pay off through points scored and tickets sold, but more importantly through championships. On the agent side, draft picks who do well receive bonuses, larger contracts and endorsement deals, all of which an agent gets a percentage of. Agents want to identify successful players and sign them to their agency. NCAA players who are drawn to the allure of the NBA need to realize that they are competing against many other players and that only the top 10 picks receive big contracts. To do a cost-benefit analysis and decide whether or not to stay in school, a player needs to determine where he is likely to be drafted. Our research gives all three of these groups an estimate to weigh: the money a player could make, whether or not to sign a player to an agency, and where and when to draft a player relative to the draft field of the year in question.

**Business Goals**

 * ** NBA teams want to be able to determine the best player to draft that will provide the highest ROI for their team. **
 * ** Agents want to pick players who will have long and successful careers who sign multiple large contracts and endorsements. **
 * ** Players have to make a decision to stay in school or risk the draft and knowing where they will be drafted is the number one factor for most players. **

**Data Mining Goals**

 * ** The project needs to have a high R squared, preferably above 90%. **
 * ** Be able to predict the 2010 draft within an average of 5 players. **
 * ** Identify the variables with the highest importance. **

**Hypothesis**

 * The most important factors in determining draft position will be TVI, points per game, and height, and the model we create will accurately estimate where each player gets drafted within an average of five spots.

=**Data Mining Process**=

**Literature Review**
Determining where an NBA player is drafted is extremely important to the teams drafting, but also to the player, as it will determine how much he gets paid for the next 2-4 years of his life. It also gives him a chance to weigh the risks that come with giving up a collegiate career for the professional ranks of the NBA. Much of the existing data mining and experimentation has focused on how well a drafted player will do after taking the plunge into the professional ranks, but not on where he will be taken.

Where a player gets drafted is also an indicator of the success he will have throughout his career. A player drafted in the top quarter of the draft has a statistically much greater chance of becoming a star or solid starter. There are always busts and players who overperform, but over the past 25 years over 36% of all NBA stars were drafted in the top quarter. The breakdown of each category is interesting, but that model is not forward-looking and is therefore of no use to college stars trying to decide whether or not to declare for the NBA draft.

Studies are readily available that measure all Division I basketball players and rate their chances of being drafted or declaring for the draft. These studies leave out high school, international and junior college players, as they are not the norm when it comes to the draft, and measuring the talent that each competes against was outside the scope of those studies. Many used principal component analysis to reduce the number of variables and found that the top two components measured game stats and body stats. Game stats are variables such as points, minutes, and rebounds; body stats include height, weight and leaping ability. These studies leave out the measurement that we are presenting, the TVI, which is an important risk factor to weigh.

A more recent study confirmed that the NCAA tournament plays a minor role in determining if and where a player gets drafted. If a player's points go up during the NCAA tournament, so does his draft stock, but less so than one might think. Statistically, season numbers and team season success were much stronger factors than the study assumed they would be. Overall this study was the most comprehensive that we found, but it lacks the empirical evidence to help a student decide whether to declare for the draft and, more importantly, where he will be drafted.

=**Data Set**=

Conducting research on NBA draft prospects will require a lot of data collection and statistical models to assess the true value of future NBA prospects. Data will be collected from players entering the NBA draft from 2005 to 2010. The 2005 draft was chosen as the starting point for this project because it was the first draft to prohibit High School seniors from entering the draft—data that would be too difficult to obtain.

Data will be collected from a multitude of sites: NBA.com, NCAA.com, and Google Trends. These sites contain data on players entering the draft as well as the draft placement of each player. Raw game data are often not enough for scouts and NBA teams to decide whom to pick in the draft; an intangible variable such as hype (the player's attractiveness to the media due to recent performance) is more difficult to obtain. There are ways to try to quantify hype, such as running Google searches to count the search results returned, or using Google Trends to track the number of appearances in the media over a period of time.


**Below is an example of our data collection:**


 * Name || Draft || Class || Year || Position || Height || MPG || FG% || 3p% || FT% || RPG || APG || SPG || BPG || PPG || Google || Conf || TVI || Partition ||
 * Bogut, Andrew || 1 || So. || 5.00 || C || 84 || 35.00 || 62.00% || 36.00% || 69.20% || 12.20 || 2.34 || 0.97 || 1.86 || 20.43 || 10 || MWC || 4.71 || 1_Training ||
 * Williams, Marvin || 2 || Fr. || 5.00 || F || 81 || 36.00 || 50.60% || 43.20% || 84.70% || 6.56 || 0.72 || 1.08 || 0.47 || 11.31 || 14 || ACC || 4.18 || 1_Training ||
 * Williams, Deron || 3 || Jr. || 5.00 || G || 75 || 39.00 || 43.30% || 36.40% || 67.70% || 3.64 || 6.77 || 0.97 || 0.21 || 12.54 || 22 || Big Ten || 4.08 || 1_Training ||
 * Paul,Chris || 4 || So. || 5.00 || G || 72 || 32.00 || 45.10% || 47.40% || 83.40% || 4.50 || 6.63 || 2.38 || 0.03 || 15.25 || 1.5 || ACC || 5.42 || 1_Training ||
 * Felton, Raymond || 5 || Jr. || 5.00 || G || 73 || 36.00 || 45.50% || 44.00% || 70.10% || 4.31 || 6.92 || 2.00 || 0.31 || 12.89 || 17.5 || ACC || 5.26 || 1_Training ||
 * Villanueva,Charlie || 7 || So. || 5.00 || F || 83 || 31.00 || 52.10% || 50.00% || 68.80% || 8.29 || 1.29 || 0.65 || 1.84 || 13.58 || 30 || Big East || 4.69 || 1_Training ||
 * Frye,Channing || 8 || Sr. || 5.00 || C || 83 || 37.00 || 55.40% || 17.60% || 83.00% || 7.62 || 1.92 || 0.95 || 2.30 || 15.78 || 34 || Pac 10 || 4.24 || 1_Training ||
 * Diogu, Ike || 9 || Jr. || 5.00 || F || 80 || 32.00 || 57.50% || 40.00% || 79.70% || 9.75 || 1.34 || 0.56 || 2.34 || 22.63 || 50 || Pac 10 || 5.50 || 1_Training ||
 * May, Sean || 13 || Jr. || 5.00 || C || 80 || 37.00 || 56.70% || 0.00% || 75.80% || 10.73 || 1.68 || 1.16 || 1.03 || 17.49 || 3 || ACC || 4.82 || 1_Training ||
 * McCants, Rashad || 14 || Jr. || 5.00 || G || 76 || 33.00 || 48.90% || 42.30% || 72.20% || 3.00 || 2.67 || 1.30 || 0.67 || 16.00 || 10 || ACC || 4.13 || 1_Training ||
 * Wright, Antoine || 15 || Jr. || 5.00 || F || 79 || 31.00 || 50.10% || 44.70% || 69.10% || 5.97 || 2.23 || 1.16 || 0.71 || 17.84 || 52 || Big 12 || 4.23 || 1_Training ||
 * Graham, Joey || 16 || Sr. || 5.00 || F || 79 || 33.00 || 52.90% || 47.30% || 88.70% || 6.18 || 2.03 || 0.91 || 0.15 || 17.73 || 50 || Big 12 || 4.02 || 1_Training ||
 * Granger, Danny || 17 || Sr. || 5.00 || F || 80 || 30.00 || 52.40% || 43.30% || 75.50% || 8.87 || 2.37 || 2.10 || 2.00 || 18.83 || 50 || MWC || 4.76 || 1_Training ||
 * Warrick, Hakim || 19 || Sr. || 5.00 || F || 80 || 34.00 || 54.80% || 29.00% || 68.10% || 8.65 || 1.47 || 0.97 || 0.79 || 21.35 || 30 || Big East || 4.74 || 1_Training ||

**Data Collection Process:**

 * Step 1: Open the Excel document.
 * Step 2: Go to NBA.com and determine the player to be charted (if the player is European, skip him and move to the next player).
 * Step 3: Enter the player's name in NCAA.com and copy his career totals.
 * Step 4: Paste the totals into the document.
 * Step 5: Open Google Trends.
 * Step 6: Type the player's name into Trends and enter the number that corresponds with the last week in June of his draft year (the week of the draft).
 * Step 7: Repeat for all players from 2005-2009.

The key finding in this project will be the True Value Index (TVI), a number that will suggest where a draft prospect should be taken in the draft. TVI will be derived by comparing data available for prior draft picks (dating back to 2005) to how those draft picks are currently performing in the NBA. Thus, the training set will consist of the data from draft picks before they were taken, and the testing set will consist of the data obtained from the draft picks in their NBA careers. TVI will be the output variable. TVI for the training set will be equivalent to the actual draft pick (e.g. Andrew Bogut, who was picked first in the 2005 draft, would have a TVI of 1). TVI for the testing set will be derived from comparing the two.

The final component of the project—assessing a TVI for the incoming 2010 draft class—will result in using the model obtained from the prior draft analysis and extrapolating that to the 2010 draft. The TVI will ultimately provide NBA teams with another parameter to assess a potential player’s worth.


**Description of Draft Variables:**

 * **Draft** || The position at which the player was taken in the draft ||
 * **Class** || The year of college after which the player left for the NBA draft ||
 * **Year** || The year in which the player was drafted ||
 * **Position** || One of three values: G for guard, C for center, and F for forward ||
 * **Height** || The height of the player in inches ||
 * **MPG** || Minutes per game ||
 * **FG%** || Field goal percentage ||
 * **3p%** || Three-point percentage ||
 * **FT%** || Free throw percentage ||
 * **RPG** || Rebounds per game ||
 * **APG** || Assists per game ||
 * **SPG** || Steals per game ||
 * **BPG** || Blocks per game ||
 * **PPG** || Points per game ||
 * **Google** || A measure of how frequently the player was searched for on Google in the season before the draft; an attempt to capture the player's popularity and hype with media and fans. The value is derived from Google Trends, which reports a search volume index gauging how many searches a term receives over time. The lower the value, the more the player was searched for on Google. Larger values generally mean that there were relatively few searches beforehand and that some event, such as the NBA Draft, produced a spike in the data. A player who is already popular, such as Kevin Durant, has a low search volume index value because the draft produces only a slight increase over his usual search volume. ||
 * **Conf** || The conference in which the player plays. Each conference has a number of teams and reflects a region. The conference determines the type of competition that the player has been up against. Examples of major conferences: Big 10: its eleven member institutions are located primarily in the Midwestern United States, stretching from Iowa and Minnesota in the west to Pennsylvania in the east. Big East: a collegiate athletics conference consisting of sixteen universities in the northeastern, southeastern and midwestern United States. ACC: the Atlantic Coast Conference is a highly competitive conference in basketball. PAC 10: the Pacific-10 Conference is a college athletic conference which operates in the western United States. SEC: the Southeastern Conference is a college athletic conference headquartered in Birmingham, Alabama, which operates in the southeastern part of the United States. BIG 12: a college athletic conference of twelve schools located mostly in the central United States. C-USA: Conference USA is a college athletic conference whose member institutions are located within the Southern United States. ||
 * **TVI** || True Value Index--a measure which attempts to gauge a player's effectiveness based on their per game stats, position played, height, conference and Google value. The equation for a Center's TVI = 2*(PPG/MPG)*(FG%*0.6 + 3p%*0.2 + FT%*0.2) + (RPG*0.8 + APG*1.5 + SPG*1.5 + BPG*1.5)/5 + IF(Conf = Major Conference, 1, 0) + IF(Google < 20, 0.5, 0) + IF(Position = C, Height/82 - 1, 0). This equation varies with the position the player plays, as each position has different stats that measure effectiveness. For example, Centers generally rebound the ball more than guards. Thus, in the equation for a Center, RPG was reduced by 20% and APG was increased by 50% to try to measure the versatility of the player at their position. TVI is thus centered on gauging a player's effectiveness at their position and their ability to play other styles of basketball. As originally intended, TVI would have been gleaned from developing a model in SPSS; however, we felt as a group that developing the metric beforehand, and then using it as the target variable for projecting future draft picks, would be the best strategy. ||
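The Center formula in the TVI row can be sketched in a few lines of Python. This is only an illustration of the arithmetic, not the spreadsheet the group actually used: the `Prospect` class, the `center_tvi` function and the `MAJOR_CONFERENCES` set are hypothetical names, and treating exactly the seven conferences described in the Conf row as "major" is an assumption. Under that assumption (so the MWC earns no conference bonus), the sketch reproduces the 4.71 listed for Andrew Bogut in the sample data.

```python
from dataclasses import dataclass

# Assumption: the conferences described in the Conf row are the "major" ones.
MAJOR_CONFERENCES = {"Big Ten", "Big East", "ACC", "Pac 10", "SEC", "Big 12", "C-USA"}


@dataclass
class Prospect:
    name: str
    position: str     # "G", "F" or "C"
    height: float     # inches
    mpg: float
    fg_pct: float     # percentages as fractions, e.g. 0.62 for 62.00%
    three_pct: float
    ft_pct: float
    rpg: float
    apg: float
    spg: float
    bpg: float
    ppg: float
    google: float     # Google Trends search volume index
    conf: str


def center_tvi(p: Prospect) -> float:
    """Evaluate the Center version of the TVI formula for one prospect."""
    # Scoring rate per minute, weighted by shooting efficiency.
    scoring = 2 * (p.ppg / p.mpg) * (
        p.fg_pct * 0.6 + p.three_pct * 0.2 + p.ft_pct * 0.2
    )
    # Weighted per-game box score stats.
    box = (p.rpg * 0.8 + p.apg * 1.5 + p.spg * 1.5 + p.bpg * 1.5) / 5
    conf_bonus = 1.0 if p.conf in MAJOR_CONFERENCES else 0.0
    hype_bonus = 0.5 if p.google < 20 else 0.0
    height_bonus = (p.height / 82 - 1) if p.position == "C" else 0.0
    return scoring + box + conf_bonus + hype_bonus + height_bonus


# Values copied from the Bogut row of the sample data table.
bogut = Prospect(
    name="Bogut, Andrew", position="C", height=84, mpg=35.0,
    fg_pct=0.620, three_pct=0.360, ft_pct=0.692,
    rpg=12.20, apg=2.34, spg=0.97, bpg=1.86, ppg=20.43,
    google=10, conf="MWC",
)
print(round(center_tvi(bogut), 2))  # 4.71
```

The weights for guards and forwards are not given in the text, so the sketch covers only the Center case.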

=**Model Development**=

The model for assessing a future NBA player's draft pick was based around the TVI metric (explained in the "Description of Draft Variables" section). The model was built using linear regression, with the key metrics for assessing the success of the model being the Adjusted R-squared value and the standard deviation between the draft position projected from estimated TVI and the draft position implied by TVI values of current NBA performance. The SPSS Modeler stream is pictured below:



Preparing the data for modeling included reclassifying variables such as Position, Class and Conference into categorical variables. For example, the Position values C, F and G were changed to 1, 2 and 3 respectively. After reclassification, the data was run through the Auto Data Preparation node, which normalized values into z-scores (giving each variable the same weight; a z-score expresses how many standard deviations a record lies from the mean of a given variable). Normalizing the data is important in this model because of the variety of units the variables are measured in. The data was then run through the linear regression node. Partitioning the data occurred in this node, due to the special circumstances of the project. Pre-draft data was partitioned as the training data and post-draft data was partitioned as the testing data, thus comparing the players' NBA performance to how they performed in college. The image below shows how the data was partitioned.
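As a rough illustration of the z-score normalization described above, the following Python sketch rescales two columns measured in different units. It is not the Auto Data Preparation node itself: the `z_scores` helper is a hypothetical name, and the use of the population standard deviation is an assumption, since the text does not say which variant was applied. The sample values are the first five Height and RPG entries from the data collection table.

```python
from statistics import mean, pstdev


def z_scores(values):
    """Rescale a column so it has mean 0 and standard deviation 1."""
    mu = mean(values)
    sigma = pstdev(values)  # assumption: population standard deviation
    if sigma == 0:
        # A constant column carries no information; map it to zeros.
        return [0.0 for _ in values]
    return [(v - mu) / sigma for v in values]


# First five rows of the sample table: height in inches, rebounds per game.
height = [84, 81, 75, 72, 73]
rpg = [12.20, 6.56, 3.64, 4.50, 4.31]

height_z = z_scores(height)
rpg_z = z_scores(rpg)

# After scaling, both columns are on the same unit-free scale.
for h, r in zip(height_z, rpg_z):
    print(f"{h:6.2f} {r:6.2f}")
```

Once every input is on this common scale, a large raw range (such as height in inches) no longer outweighs a small one (such as blocks per game) in the regression.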



The selection of variable types was done in the regression node, with TVI as the target variable and the other data as input variables. Stepwise regression was chosen for this analysis, as it produced the largest Adjusted R-squared value, .769, which indicates that the variables in the data set can explain about 77% of the variance in TVI; 23% of TVI cannot be explained by the model, and perhaps other intangible qualities such as a player's potential can account for this gap. The SPSS regression output is displayed below:
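For readers unfamiliar with the metric, Adjusted R-squared is the ordinary R-squared penalized for the number of predictors. The sketch below uses the standard textbook formula; the function name and the example inputs are hypothetical, because the text does not report the sample size or the number of predictors that stepwise regression retained.

```python
def adjusted_r_squared(r_squared: float, n_samples: int, n_predictors: int) -> float:
    """Standard adjustment: 1 - (1 - R^2) * (n - 1) / (n - p - 1)."""
    dof = n_samples - n_predictors - 1
    if dof <= 0:
        raise ValueError("need more samples than predictors plus one")
    return 1 - (1 - r_squared) * (n_samples - 1) / dof


# Hypothetical inputs, chosen only to show the size of the penalty.
print(round(adjusted_r_squared(0.80, n_samples=100, n_predictors=5), 4))
```

Because the penalty grows with each added predictor, the adjusted value is a fairer basis than plain R-squared for comparing regression variants that keep different numbers of variables.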



The model was run through the 2010 data to produce the estimated TVI values for future NBA performances of 2010 draft prospects. Below is an excel printout of what the model projects.

 * Draft Position || Name || Class || Year || Position || Height || MPG || FG% || 3p% || FT% || RPG || APG || SPG || BPG || PPG || Google || Conf || Est. Draft || $E-TVI_transformed ||
 * 1 || Turner, Evan || Jr. || 10 || G || 79 || 31 || 51.90% || 36.40% || 75.40% || 9.161 || 5.968 || 1.742 || 0.903 || 20.387 || 12.5 || Big 10 || 1.00 || 1.57 ||
 * 2 || Faried, Kenneth || Jr. || 10 || C || 80 || 35 || 56.40% || 25.00% || 59.50% || 13.029 || 0.543 || 1.6 || 1.914 || 16.857 || 55 || OVC || 2.00 || 1.19 ||
 * 3 || Parakhouski, Artsiom || Sr. || 10 || C || 73 || 31 || 58.10% || 25.00% || 56.20% || 13.355 || 1.065 || 0.613 || 2.097 || 21.355 || 55 || BSC || 3.00 || 1.18 ||
 * 4 || Monroe, Greg || So. || 10 || C || 73 || 34 || 52.50% || 25.90% || 66.00% || 9.647 || 3.765 || 1.235 || 1.529 || 16.147 || 12.5 || ACC || 4.00 || 1.13 ||
 * 5 || George, Paul || So. || 10 || F || 80 || 29 || 42.40% || 35.30% || 90.90% || 7.241 || 3.034 || 2.207 || 0.828 || 16.793 || 5 || WCC || 5.00 || 0.98 ||
 * 6 || Johnson, Wes || Jr. || 10 || F || 79 || 35 || 50.20% || 41.50% || 77.20% || 8.543 || 2.229 || 1.657 || 1.829 || 16.486 || 11.25 || Big East || 6.00 || 0.96 ||
 * 7 || Samhan, Omar || Sr. || 10 || C || 73 || 34 || 55.30% || 0.00% || 72.70% || 10.853 || 1 || 0.441 || 2.912 || 21.294 || 30 || WCC || 7.00 || 0.96 ||
 * 8 || Fields, Landry || Sr. || 10 || G || 79 || 32 || 49.00% || 33.70% || 69.60% || 8.75 || 2.781 || 1.594 || 0.781 || 22 || 55 || PAC 10 || 8.00 || 0.89 ||
 * 9 || Koshwal, Mac || Jr. || 10 || C || 82 || 19 || 54.40% || 0.00% || 55.00% || 10.053 || 2 || 1.842 || 0.895 || 16.105 || 55 || Big East || 9.00 || 0.87 ||
 * 10 || Varnado, Jarvis || Sr. || 10 || F || 81 || 36 || 58.20% || 0.00% || 61.00% || 10.25 || 0.889 || 0.667 || 4.722 || 13.778 || 21.25 || SEC || 10.00 || 0.86 ||
 * 11 || Udoh, Ekpe || Jr. || 10 || C || 73 || 36 || 49.00% || 26.90% || 68.50% || 9.75 || 2.694 || 0.806 || 3.694 || 13.889 || 37.5 || Big 12 || 11.00 || 0.86 ||
 * 12 || James, Damion || Sr. || 10 || F || 79 || 34 || 50.10% || 38.30% || 67.40% || 10.294 || 0.971 || 1.676 || 1.176 || 17.971 || 55 || Big 12 || 12.00 || 0.82 ||
 * 13 || Jones, Dominique || Jr. || 10 || G || 76 || 33 || 45.00% || 31.10% || 74.10% || 6.091 || 3.636 || 1.697 || 0.576 || 21.364 || 13.75 || Big East || 13.00 || 0.82 ||
 * 14 || Aminu, Al-Farouq || So. || 10 || F || 81 || 31 || 44.70% || 27.30% || 69.80% || 10.71 || 1.323 || 1.419 || 1.419 || 15.839 || 55 || ACC || 14.00 || 0.80 ||
 * 15 || Vasquez, Greivis || Sr. || 10 || G || 78 || 33 || 42.90% || 35.90% || 85.70% || 4.636 || 6.303 || 1.697 || 0.364 || 19.606 || 22.5 || ACC || 15.00 || 0.78 ||
 * 16 || Wall, John || Fr. || 10 || G || 76 || 37 || 46.10% || 32.50% || 75.40% || 4.297 || 6.514 || 1.784 || 0.514 || 16.649 || 8.75 || SEC || 16.00 || 0.77 ||
 * 17 || Cousins, DeMarcus || Fr. || 10 || F || 73 || 38 || 55.80% || 16.70% || 60.40% || 9.842 || 1 || 0.974 || 1.763 || 15.132 || 11.25 || SEC || 17.00 || 0.75 ||

The estimated TVI values are denoted as $E-TVI_transformed, and the first column, Draft Position, indicates where the model believes the players should be drafted based on their projected NBA performance.

=**Model Characteristics and Findings**=

The defining aspect of this model is the unique use of partitioning between training and testing sets. Partitioning the data to compare against two known instances--pre-Draft performance and post-Draft performance--provides our analysis with the ability to better assess future performance of incoming draft prospects by finding out which variables hold the greatest importance in producing a high NBA TVI. The key variables are as follows:



The key variable in assessing NBA TVI, and thus draft position, is rebounds per game. Generally, a player who averages more rebounds, especially one who doesn't play Center, suggests a couple of things: 1) the player is fundamentally sound and 2) the player is gritty. Rebounding is a fundamental aspect of the game, and is not only the responsibility of Centers. Boxing out (sealing off the offensive player from grabbing the rebound) is fundamental to preventing the other team from rebounding, and thus helps a team get out and run the ball down court on offense. Rebounding may also suggest the grittiness of a player, as rebounds are often collected in the paint, the area right around the rim where a lot of competition over grabbing the ball occurs. Securing the ball from 9 other players is one manifestation of a player's passion and grit for the game. Another variable of importance, the Google value, is an interesting metric. The variable was implemented to try to quantify the "hype" of the player with media and fans. Its inclusion in the variable importance chart suggests that "hype" does have some influence on projecting a draft prospect's future NBA performance.

A unique output from this project is the ReDraft spreadsheets, which were built by taking the estimated TVI values projected from the training (pre-Draft) data set and comparing them to the NBA TVI of players from the testing (post-Draft) data set. These spreadsheets give the analysis a quantifiable measure of the model's ability to predict future NBA performance. The 2006 ReDraft spreadsheet is shown below:



The two columns to pay attention to are the Predicted Draft and the TVI Draft. The Predicted Draft is based on the estimated NBA TVI produced by the model. The model projects Paul Millsap to produce the greatest NBA TVI value. This is interesting because he was actually selected with the 47th pick in the NBA Draft, suggesting that the team that selected him, the Utah Jazz, got a steal. This is checked against his NBA TVI value, which currently ranks him as the 7th best player in the draft based on NBA performance. This instance suggests a good fit, as the model can even find players who were picked late in the draft but performed well in the NBA. There are instances, however, of the reverse, as exhibited by P.J. Tucker. He was selected in the real draft with the 35th pick and is currently performing as the 44th selection based on NBA TVI statistics. The model, however, projects P.J. Tucker to perform as a 5th pick. This suggests that there is a great amount of variance in the model.

Another flag for our team was John Wall being drafted 16th in the 2010 predicted draft. From our knowledge of college basketball we know that he is expected to be a top draft pick. Even though John Wall doesn't have a high TVI, we feel he does have high draft stock potential and deserves to be drafted in one of the top five spots. We see that our model needs a few more variables, such as games won and offensive rebounds, to increase our accuracy and lower the standard deviation between actual draft and predicted draft.

The standard deviation was calculated from the absolute difference in picks between the Predicted Draft and the TVI Draft, yielding a standard deviation of 9.02 picks, or about 27 picks at three standard deviations. This number is slightly paradoxical, as the model's Adjusted R-squared suggests that it explains 77% of the variance. One explanation lies in the number of intangible variables that go into selecting a draft pick. Intangibles such as potential, athletic ability, basketball "IQ", demeanor and chemistry (the ability of a player to fit into the schemes of an NBA team) can all influence the performance of a player in the NBA. The model does, however, perform 2 times better than a random guess with 50% of the data, as exhibited below in the lift chart:
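The pick-difference calculation described above can be sketched as follows. This is a minimal illustration with made-up numbers, not the ReDraft spreadsheet: the player names and TVI values are hypothetical, `rank_descending` is an illustrative helper, and the population standard deviation is an assumption because the text does not state which variant was used.

```python
from statistics import mean, pstdev


def rank_descending(scores):
    """Map each name to its 1-based rank, highest score first."""
    ordered = sorted(scores, key=scores.get, reverse=True)
    return {name: i + 1 for i, name in enumerate(ordered)}


# Hypothetical values: model-estimated TVI (pre-Draft data) and
# observed NBA TVI (post-Draft data) for the same four players.
estimated_tvi = {"Player A": 1.57, "Player B": 1.19, "Player C": 0.98, "Player D": 0.80}
nba_tvi = {"Player A": 3.1, "Player B": 4.6, "Player C": 2.2, "Player D": 3.9}

predicted_draft = rank_descending(estimated_tvi)  # the "Predicted Draft" column
tvi_draft = rank_descending(nba_tvi)              # the "TVI Draft" column

# Absolute difference in picks between the two orderings.
gaps = [abs(predicted_draft[name] - tvi_draft[name]) for name in estimated_tvi]

print(gaps)          # [2, 1, 1, 2]
print(mean(gaps))    # 1.5
print(pstdev(gaps))  # 0.5
```

On the real 2005-2009 data the same steps would be run per draft year, so that each player is ranked only against his own draft class.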



=**Conclusion**=

The True Value Index draft evaluator model took on many forms. The project was initially conceived as a model that would output a TVI based on college statistics. This method was abandoned in favor of a group-derived TVI metric, which was calculated based on intimate knowledge of the statistics that matter in basketball. TVI is an attempt to measure a player's effectiveness in basketball: a value which can explain a player's ability to influence a team's performance. The metric was based on easily gathered data (RPG, PPG, APG, etc.), resulting in an Adjusted R-squared value of 77% and a standard deviation of about 9 draft picks (about 27 picks at three standard deviations).

There were a variety of limitations to our analysis, and we believe that other data could greatly affect the calculation of TVI, resulting in a more accurate forecast of projected NBA TVI and thus of performance in the NBA. These variables include winning percentage, turnovers and offensive rebounds per game, among others. They were either unavailable or too time-consuming to obtain. No model can perfectly predict a player's future performance in the NBA, and thus the appropriate draft pick to allot to the player. With more data on NBA drafts and on how draft picks performed over their entire careers, the model could be finely tuned. Also, intangible variables such as potential, demeanor, fit, chemistry and motivation all influence a player's future performance and are extremely hard to quantify. Thus the outputs from this model should be used alongside other inputs, such as personal interviews, attending the player's games, and combines (practices held with the player to assess his physical prowess), when deciding whether to select a player in the NBA draft.

=**Takeaways**=

The amount of data available in the world today is enormous, allowing for seemingly infinite ways to analyze and use data to influence and support business and real-world decisions. Data can be used in a multitude of ways; for this project specifically, we used historical data and compared it against itself to build a better model of future performance. This is an important takeaway in data mining, as more accurate forecasts can be gleaned by using known data to project similar future scenarios. This is manifest in the partitioning of pre-Draft data and post-Draft data into training and testing sets. This simple partitioning allowed the data to be trained against the performance of the same players in the NBA, to better identify the key variables of a college athlete's performance that influence his performance in the NBA.

Understanding the nuances of a business, industry or scenario is also a great way to increase model accuracy. When building our model, we derived a formula for TVI based on an intimate knowledge of basketball, which we believe provided a more accurate model for predicting the future NBA success of draft prospects. Statistical models are rarely the panacea for business dilemmas, and thus a model built from intimate business knowledge, accompanied by non-quantitative data about the scenario, can result in highly accurate and successful models.