moviegroup

**Data Mining Project Proposal**

**Project Background:**

**Business Challenge**
· We are a DVD producer looking to better understand DVD sales trends. With that knowledge, we hope to improve our sales forecasts and predict which types of DVDs will be most profitable.
· Our data set includes top-grossing films from 2005 to 2008, along with indicators such as distributor, genre, release date, rating, and budget. We will use this data to predict DVD sales revenue and units demanded upon release.

**Business Goal**
· Forecast total demand more accurately so that we neither under- nor over-produce, thereby lowering total costs and increasing profit margin.

**Data Mining Goal**
· Identify the characteristics of box office movies that are most likely to spur DVD sales.
· Score and rank movies by probability of unit sales.

**Hypothesis**
· The team hypothesizes that higher-grossing movies in popular genres will sell more DVDs. According to economic studies, “The growth in DVD spending was propelled by the plethora of box office titles that became available in 2006, including 15 films that generated more than $100 million each at the box office.”[1] Moreover, when the box office is doing well, DVD sales often slump, which suggests that box office hits and DVD sales are seasonal and move inversely over the same period.[2]

**Data Set**
· The following data fields will be used for analysis.
· ‘Tickets Sold’ will be our independent variable.
|| Number || Column Name ||
|| 1 || Movie Name ||
|| 2 || Units Sold ||
|| 3 || Sales Revenue ||
|| 4 || Release Date (segmented into multiple dimensions) ||
|| 5 || Distributor Date (segmented into multiple dimensions) ||
|| 6 || Genre ||
|| 7 || MPAA Rating ||
|| 8 || Annual Gross ||
|| 9 || Tickets Sold ||
|| 10 || Inflation-Adjusted Gross ||
|| 11 || Budget ||
|| 12 || U.S. Gross ||
|| 13 || Worldwide Gross ||
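
The table notes that the date columns are "segmented into multiple dimensions". A minimal sketch of one way to do that is shown here. The choice of dimensions (year, quarter, month, weekday) and the function name `segment_date` are our assumptions for illustration, not something the data set prescribes.

```python
from datetime import date

# Weekday names indexed by date.weekday() (Monday == 0).
WEEKDAYS = ("Monday", "Tuesday", "Wednesday", "Thursday",
            "Friday", "Saturday", "Sunday")


def segment_date(d: date) -> dict:
    """Split one date into several dimensions (assumed: year, quarter, month, weekday)."""
    return {
        "year": d.year,
        "quarter": (d.month - 1) // 3 + 1,  # months 1-3 -> 1, ..., 10-12 -> 4
        "month": d.month,
        "weekday": WEEKDAYS[d.weekday()],
    }


# Hypothetical release date, used only to show the output shape.
example = segment_date(date(2006, 11, 21))
```

Each resulting dimension can then be treated as its own categorical field, which makes seasonal patterns (for example, holiday-quarter releases) easier for a model to pick up.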

**Data Mining Process**
1. Preprocess the data to reflect industry understanding (including setting dummy variables and transforming skewed ranges).
2. Partition the data set into training and validation sets at 40% and 60% respectively.
3. Test different models, evaluating how well each one forecasts unit sales.
4. Using the top two models, experiment with advanced features and output configurations to maximize the probability of model success.
5. Evaluate the top model by estimating its effectiveness and the cost savings that result from proper demand forecasting.

**Data Interpretation Plan**
· To evaluate and interpret the models, the team will use lift charts, prediction accuracy, and gains charts.

**Sample Data**
· Sample Data from "The-Numbers.com"
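
Steps 1 and 2 of the data mining process can be sketched roughly as follows. This is only an illustration under our own assumptions: the helper names (`one_hot`, `log_transform`, `partition`), the use of a log transform for skewed dollar amounts, the fixed random seed, and the sample rows are all hypothetical and are not taken from the project data or tooling.

```python
import math
import random


def one_hot(value, categories):
    """Return dummy (0/1) indicator variables for one categorical value."""
    return {f"is_{c}": int(value == c) for c in categories}


def log_transform(x):
    """Compress a right-skewed, non-negative amount (e.g. gross) with log(1 + x)."""
    return math.log1p(x)


def partition(rows, train_fraction=0.4, seed=42):
    """Shuffle rows reproducibly and split them into (training, validation) sets."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    cut = round(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


# Made-up rows for illustration only; these are not values from the data set.
rows = [
    {"genre": "Action", "us_gross": 250e6},
    {"genre": "Comedy", "us_gross": 40e6},
    {"genre": "Drama", "us_gross": 12e6},
    {"genre": "Action", "us_gross": 90e6},
    {"genre": "Comedy", "us_gross": 5e6},
]
genres = sorted({r["genre"] for r in rows})

# Step 1: dummy variables for genre, log scale for the skewed gross.
prepared = [
    {**one_hot(r["genre"], genres), "log_us_gross": log_transform(r["us_gross"])}
    for r in rows
]

# Step 2: 40% training, 60% validation, as stated in the process.
train, valid = partition(prepared, train_fraction=0.4)
```

For regression-style models, one dummy column per categorical field is usually dropped to avoid redundant indicators. The sketch keeps all of them for readability.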

[1]
[2]

**Test Model Results**
Initial stream (test run)

As hypothesized, initial analysis suggests that box office sales are highly correlated with DVD sales.
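
One simple way to quantify a correlation like this is the Pearson correlation coefficient. The sketch below is a plain-Python illustration. The function name `pearson_r` is ours, and the test run itself may have measured the relationship differently.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length numeric sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return sxy / math.sqrt(sxx * syy)
```

Applied to the 'Tickets Sold' and 'Units Sold' columns, a value close to 1 would be consistent with the hypothesis. A value near 0 would suggest little linear relationship.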

This is almost a perfect lift chart. We will continue the analysis with training and validation sets and the inclusion of additional variables.
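
For reference, the numbers behind a gains or lift chart for a numeric target such as unit sales can be computed as in the sketch below. It assumes titles are ranked by predicted units, highest first, and compares the share of actual units captured against a random ordering. The function names and the sample figures are hypothetical.

```python
def cumulative_gains(actual, predicted):
    """Cumulative share of actual units captured, ranking titles by predicted units (descending)."""
    order = sorted(range(len(actual)), key=lambda i: predicted[i], reverse=True)
    total = sum(actual)
    gains, running = [], 0.0
    for i in order:
        running += actual[i]
        gains.append(running / total)
    return gains


def lift_at(actual, predicted, fraction):
    """Lift for the top `fraction` of ranked titles: share of units captured / share of titles."""
    gains = cumulative_gains(actual, predicted)
    k = max(1, round(len(actual) * fraction))
    return gains[k - 1] / (k / len(actual))


# Made-up figures for four titles, for illustration only.
actual_units = [100, 50, 30, 20]
predicted_units = [90, 60, 10, 25]
top_quarter_lift = lift_at(actual_units, predicted_units, 0.25)  # 2.0 for these figures
```

A lift above 1 means the model's ranking finds high-selling titles faster than chance. A perfect model ranks titles exactly in order of actual units sold.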

**Group:** Prakash Achuthan, Kristy Bolsinger, Ben Crop, Frédéric Gojard, Mary Hadley