DYSS

**Joe Parisi**

The subject of this project is game-by-game attendance at Major League Baseball (MLB) stadiums. The goal is to examine attendance records at MLB ballparks and identify variables that correlate with fluctuations in game-to-game attendance. By looking at both internal factors, such as team performance statistics, and external factors, such as weather and time of year, we hope to find what team owners, general managers, and event coordinators can do to affect attendance at their ballparks. We hope not only to examine which efforts have worked in the past, but also to offer insight into which efforts can increase attendance in the future. The questions this analysis aims to answer are listed at the end of this proposal.

The audience for this project is MLB owners, general managers, event coordinators, and all other management. A secondary audience is any sports franchise interested in what affects stadium attendance. This project can give insight into not only MLB attendance, but also sporting event attendance at all levels of competition.

The results, an analysis of the 2011 MLB season, will be presented on May 5, 2012. The analysis will focus on six MLB franchises, listed at the end of this proposal. These teams were selected to reflect the diversity of markets within MLB. We chose an equal number of large-market (Dodgers, Yankees, Braves) and small-market (Kansas City, Seattle, Tampa Bay) teams, and within each group there are varying degrees of on-field success. This selection is intended to reduce bias and to reflect the wide range of franchise payrolls and market sizes throughout the league. The results of this project will be used to determine which factors significantly affect attendance at MLB stadiums.
Of the factors that affect attendance, our results will focus on those that franchise management can alter and adjust. External factors that management cannot control (e.g., weather and city population) will still be taken into consideration, but they are not the focus.

The project will be carried out in SPSS Modeler. The data will come from external databases (baseball-reference.com, stats.com) and will be converted into Microsoft Excel. The SPSS Modeler models to be used are listed at the end of this proposal.

Baseball is no stranger to statistical analysis. The mainstream popularity of baseball statistics has recently been highlighted by a best-selling book and a subsequent Academy Award-nominated motion picture. In 2003 Michael Lewis wrote //Moneyball//, which tells the story of a mediocre baseball franchise, the Oakland Athletics, and its rise to fame through the strategies of General Manager Billy Beane. The book may have brought statistical analysis of baseball to public attention, but the field has been around for many years. Bill James began publishing the //Baseball Abstract// in 1977 and developed the approach known as sabermetrics, but it was not until Billy Beane put the theory into practice, turning a team of misfits and outcasts into a playoff team, that the approach gained wide attention. Since then, people have used sabermetrics to try to replicate what Billy Beane did.

Baseball is one of the best arenas for statistical analysis because its statistics date back to the late 1800s. With this large volume of data and James's theories about how to win baseball games, statisticians have been trying to predict how to win more games, draft better players, and spend money more wisely. The practice has even spread into fantasy baseball, where would-be owners try to get the most value for their money when putting together a team.
Considering this, there are two main reasons a baseball franchise wants to gain more talent and build a better team: 1) to win a world championship and 2) to fill the stands and sell more tickets. While it is true that a better team is needed to win a championship, is it really true that a franchise needs a better team to increase attendance? There are many studies of how improving a team leads to an increase in attendance, but not many that take into account variables outside of team performance.

The main requirement for this project is an understanding of baseball statistics and of the newer kinds of analysis being done on them. As fans of the game for 20+ years, we have this arena covered. Secondly, a working knowledge of statistical analysis is needed. To create, test, and re-test this idea, we need a good understanding of how to sort through data, discard irrelevant and unimportant items, and home in on the variables that can either support or refute it.

Today, baseball is a multi-billion-dollar industry, and there is no better time to try to gain a leg up on the competition. Interest in baseball statistics is also at an all-time high. Baseball statistics and statisticians have been around for years, but today there is renewed attention on the subject. This renewed passion came about because of the history of baseball, the sheer amount of data available, and new theories on how to use and apply that data.

Many baseball studies today closely tie attendance to on-field performance. We intend to compare attendance with factors other than on-field performance. We are not discarding on-field performance, since we understand it could have an effect on attendance, but we want to find out what other factors bring people to the ballpark.
Compiling data from baseball-reference.com and stats.com, we will use SPSS Modeler, Excel, and Minitab to run our analysis and reach a conclusion about ballpark attendance.

All of the analysis will be based on ballpark attendance during the 2011 baseball season. Baseball has a rich history of record keeping dating back to the late 1800s, so it will be easy for us to collect data from the Internet. All of this data can be collected online through sites like baseball-reference.com, stats.com, and bobblehead.com.

Basic data mining techniques will be used in the analysis of this data. After handling outliers and extremes and applying normalization and standardization, the data will be analyzed using PCA/Factor and Feature Select. The data will then be analyzed using regression trees to determine the most important factors in whether a game is well attended (more than 30,000 fans) and whether those variables can be used to predict game attendance. Logistic regression will also be used to compare the results. Neural networks will also be run to see whether it is possible to predict attendance at baseball games.

Collection of data, analysis of data, and interpretation of results should take no longer than 2 weeks.

Both team members will share the tasks of creating the project idea, collecting data, analyzing data, and interpreting results.

There are no costs associated with this project.
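The preprocessing steps mentioned above (checking for outliers, standardizing inputs, and labeling a game as well attended when it draws more than 30,000 fans) can be sketched in a few lines of Python. This is only an illustrative sketch, not the SPSS Modeler workflow itself: the game records, the temperature variable, the function names, and the z-score limit of 3 are all assumptions made for the example.

```python
from statistics import mean, pstdev

# Hypothetical game records: (attendance, game-time temperature in degrees F).
# These numbers are invented for illustration; they are not real 2011 data.
games = [
    (41250, 78.0),
    (18930, 55.0),
    (30010, 69.0),
    (12480, 48.0),
    (46120, 84.0),
    (27560, 72.0),
]

ATTENDED_THRESHOLD = 30000  # a game counts as "attended" above 30,000 fans


def z_scores(values):
    """Standardize values to mean 0 and population standard deviation 1."""
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        return [0.0 for _ in values]
    return [(v - mu) / sigma for v in values]


def flag_outliers(values, limit=3.0):
    """Mark values whose absolute z-score exceeds the (assumed) limit."""
    return [abs(z) > limit for z in z_scores(values)]


def label_attended(attendance, threshold=ATTENDED_THRESHOLD):
    """Binary target: True when attendance is strictly above the threshold."""
    return [a > threshold for a in attendance]


attendance = [a for a, _ in games]
temperature = [t for _, t in games]

temp_z = z_scores(temperature)          # standardized input variable
attended = label_attended(attendance)   # target for trees / logistic regression
outliers = flag_outliers(attendance)    # candidate games to inspect by hand
```

The binary `attended` label is the kind of target a classification tree or logistic regression would be trained on, while the standardized columns keep variables on different scales comparable.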
**Casey Morgan**
Questions to be answered by this analysis:
 * How can a small-market team compete in ticket sales with a large-market team?
 * Does team performance trump any external factors affecting attendance?
 * Can a promotion or special event significantly boost ticket sales and be cost-effective?
 * Is it more cost-effective to focus on team performance (player salaries) or on a fantastic stadium experience?
MLB franchises included in the analysis:
 * New York Yankees
 * Atlanta Braves
 * Kansas City Royals
 * Seattle Mariners
 * Los Angeles Dodgers
 * Tampa Bay Rays
Models within SPSS Modeler:
 * C5.0
 * CHAID
 * QUEST
 * Neural Net
 * Feature Select
 * Bayes Net
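To give a sense of what the tree-based models in this list do with the data, the sketch below finds the single best threshold on one input variable using entropy-based information gain. This is a simplified stand-in, not the C5.0, CHAID, or QUEST algorithms as implemented in SPSS Modeler, which use their own split criteria and build full trees. The winning-percentage values, the labels, and the function names are hypothetical.

```python
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a list of boolean labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    p = sum(labels) / n
    if p in (0.0, 1.0):
        return 0.0
    return -(p * log2(p) + (1 - p) * log2(1 - p))


def best_split(values, labels):
    """Return (threshold, information_gain) for the best 'value <= threshold' split."""
    base = entropy(labels)
    n = len(labels)
    best = (None, 0.0)
    distinct = sorted(set(values))
    # Candidate thresholds are midpoints between consecutive distinct values.
    for lo, hi in zip(distinct, distinct[1:]):
        threshold = (lo + hi) / 2
        left = [y for x, y in zip(values, labels) if x <= threshold]
        right = [y for x, y in zip(values, labels) if x > threshold]
        weighted = (len(left) * entropy(left) + len(right) * entropy(right)) / n
        gain = base - weighted
        if gain > best[1]:
            best = (threshold, gain)
    return best


# Hypothetical data: home-team winning percentage on game day, and whether
# the game drew more than 30,000 fans. Invented for illustration only.
win_pct = [0.40, 0.45, 0.48, 0.55, 0.58, 0.62]
attended = [False, False, False, True, True, True]

threshold, gain = best_split(win_pct, attended)
```

A full tree repeats this search across every candidate variable (weather, promotions, day of week, and so on) and then recurses into each branch, which is how such models rank the most important factors.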