Search Advertising

By: Miranda Gestrin, Bre’ Greenman, and Sidnee Schaefer

Predict the click-through rate of ads given the query and user information.


**Introduction**

The click-through rate of advertisements is becoming a key evaluation metric in the advertising industry. To better understand consumers, it is important to know which factors influence click-through rates and how they relate to search advertising. In this project, the goal is to evaluate the effect of user queries and user information on predicted click-through rates. While doing so, our team will further its understanding of data mining techniques and analysis.
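As a concrete reference point, the click-through rate of an ad is its number of clicks divided by its number of impressions. The short Python sketch below illustrates that arithmetic only. The row layout `(ad_id, clicks, impressions)` and the sample values are hypothetical and are not taken from the soso.com data.

```python
from collections import defaultdict


def click_through_rates(rows):
    """Aggregate (ad_id, clicks, impressions) rows into one CTR per ad."""
    clicks = defaultdict(int)
    impressions = defaultdict(int)
    for ad_id, n_clicks, n_impressions in rows:
        clicks[ad_id] += n_clicks
        impressions[ad_id] += n_impressions
    # CTR = total clicks / total impressions; ads never shown are skipped.
    return {
        ad_id: clicks[ad_id] / impressions[ad_id]
        for ad_id in impressions
        if impressions[ad_id] > 0
    }


# Hypothetical log rows: (ad_id, clicks, impressions)
rows = [("ad_a", 1, 10), ("ad_a", 2, 30), ("ad_b", 0, 20)]
ctr = click_through_rates(rows)
print(ctr)
```

Here `ad_a` received 3 clicks over 40 impressions, so its click-through rate is 0.075, while `ad_b` was shown 20 times without a click.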

These results will benefit SoSo.com in its efforts to understand search advertising. The predictions will help the company to increase return on investment for its advertisers. Additionally, these results can be generalized for others in the advertising industry. The results will be presented on May 3, 2012 to an audience of approximately 25 MBA candidates. The results will also be posted on both WikiSpaces.com and KDDCup2012.org.

The project will be conducted using SPSS Modeler software to create predictive models, likely including but not limited to hierarchical clustering, logistic regression, and Bayesian networks. From these models, the best predictor will be selected for recommendation.
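The modeling itself will be done in SPSS Modeler. Purely to illustrate what one of the candidate models does, the following is a minimal Python sketch of logistic regression fitted by gradient descent. The function names, the learning rate, the epoch count, and the toy data are all assumptions made for this illustration, not part of the project's actual streams.

```python
import math


def sigmoid(z):
    """Map a real-valued score to a probability between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-z))


def fit_logistic(features, labels, learning_rate=0.5, epochs=2000):
    """Fit weights and a bias by batch gradient descent on the log loss."""
    n_rows = len(features)
    n_features = len(features[0])
    weights = [0.0] * n_features
    bias = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * n_features
        grad_b = 0.0
        for x, y in zip(features, labels):
            score = bias + sum(w * xi for w, xi in zip(weights, x))
            error = sigmoid(score) - y  # gradient of the log loss w.r.t. score
            for j, xi in enumerate(x):
                grad_w[j] += error * xi
            grad_b += error
        weights = [w - learning_rate * g / n_rows for w, g in zip(weights, grad_w)]
        bias -= learning_rate * grad_b / n_rows
    return weights, bias


def predict_click_probability(weights, bias, x):
    """Predicted probability of a click for one feature vector."""
    return sigmoid(bias + sum(w * xi for w, xi in zip(weights, x)))


# Toy data (hypothetical): one binary feature, e.g. "ad shown in top position".
features = [[0.0], [0.0], [0.0], [0.0], [1.0], [1.0], [1.0], [1.0]]
labels = [0, 0, 0, 1, 1, 1, 1, 0]
weights, bias = fit_logistic(features, labels)
p_without = predict_click_probability(weights, bias, [0.0])
p_with = predict_click_probability(weights, bias, [1.0])
```

With a single binary feature, the fitted probabilities approach the observed click rates in each group (one click in four without the feature, three in four with it), which makes the sketch easy to sanity-check by hand.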


**Literature Review**

Advertising click-through rates have been researched from a variety of angles in the past. Models have been built to predict the number of clicks per ad, to identify the factors that contribute to the click-through rate, and to determine which ads should be displayed during web browsing. The approaches used to predict click-through include hierarchical logistic regression, seemingly unrelated regression (SUR), structural equation modeling, maximum likelihood estimation with hierarchical clustering, neural networks, Bayesian networks, feature selection, and model trees.

To study the click-through rates of ads given the query and user information, we need to build a reliable model from the relevant data. This will require data mining skills such as data cleaning, modeling, and interpreting the results. Likely models include hierarchical clustering, logistic regression, and Bayesian networks. When evaluating the models, we will aim to minimize the mean absolute error and maximize the area under the curve (AUC).
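The two evaluation measures named above can be sketched in a few lines of Python. This is an illustration under stated assumptions: the mean absolute error is weighted by impressions (an assumed weighting scheme), the area under the curve is computed as the share of positive/negative pairs that the model ranks correctly, and all sample values are hypothetical.

```python
def weighted_mae(actual_ctr, predicted_ctr, impressions):
    """Mean absolute error of predicted CTR, weighted by impressions (assumed)."""
    total_weight = sum(impressions)
    weighted_errors = sum(
        w * abs(a - p) for a, p, w in zip(actual_ctr, predicted_ctr, impressions)
    )
    return weighted_errors / total_weight


def auc(labels, scores):
    """Probability that a random clicked ad outscores a random unclicked ad.

    Ties count as half a win. Quadratic in the number of rows, so this is
    suitable for small illustrations only.
    """
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one positive and one negative label")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Hypothetical values for three ads and four impressions.
mae = weighted_mae([0.10, 0.00, 0.05], [0.08, 0.02, 0.05], [100, 50, 50])
area = auc([1, 0, 1, 0], [0.9, 0.3, 0.4, 0.6])
```

A lower weighted mean absolute error is better, whereas a higher area under the curve is better: 0.5 corresponds to random ranking and 1.0 to a perfect ranking of clicked ads above unclicked ones.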

Most studies of advertising click-through rates weight the variables within the regression analysis. Weighting is crucial because it allows the analysis to distinguish the relative importance of the variables.

The majority of the work completed in this area has used per-click data samples gathered on a specific collection of ads. Other models have been developed from ads that were tested in focus groups to determine the ads' appeal to web users. Some studies have looked at where ads should be placed within the page, and in what order, to achieve higher click-through rates. This project will analyze data from a search engine with click ratios on real ads. This environment is not simulated and has data on a broader selection of ads that were viewed during live search sessions.


**Procedure**

Objective: Predict the click-through rate of ads using query and user information.

Tasks:
 * Download the file containing the soso.com data from the KDD Cup 2012 website (kddcup2012.org).
 * Import and prepare the data.
 * Conduct a data audit, removing outliers and records with missing values.
 * Analyze the variables, recoding categorical variables as dummy variables as appropriate.
 * Run feature selection on the data and normalize the data in separate streams to identify differences in the results.
 * Following normalization, conduct a principal components analysis and use its results as appropriate.
 * Partition and balance the streams.
 * Create models for comparison, and evaluate them by minimizing the weighted mean absolute error and maximizing the area under the curve.
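Several of the preparation tasks above can be sketched in plain Python. The project itself will carry out these steps with SPSS Modeler nodes, so the helper names, the 70/30 split, the choice of min-max scaling, and the use of undersampling for balancing are all assumptions made for illustration.

```python
import random


def one_hot(values):
    """Recode a categorical column as 0/1 indicator columns, one per category."""
    categories = sorted(set(values))
    encoded = [[1 if v == c else 0 for c in categories] for v in values]
    return encoded, categories


def min_max(values):
    """Rescale a numeric column to the range [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]  # constant column carries no information
    return [(v - lo) / (hi - lo) for v in values]


def partition(records, train_fraction=0.7, seed=0):
    """Shuffle the records and split them into training and testing sets."""
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


def balance(records, label_of, seed=0):
    """Undersample the larger class so clicks and non-clicks are equally common."""
    positives = [r for r in records if label_of(r) == 1]
    negatives = [r for r in records if label_of(r) == 0]
    n = min(len(positives), len(negatives))
    rng = random.Random(seed)
    return rng.sample(positives, n) + rng.sample(negatives, n)


# Hypothetical columns and records: (row_id, clicked)
dummies, categories = one_hot(["top", "side", "top"])
scaled = min_max([1, 2, 3])
records = [(i, 1 if i % 5 == 0 else 0) for i in range(20)]
train, test = partition(records)
balanced_train = balance(train, label_of=lambda r: r[1])
```

The sketch emits one indicator column per category. A regression model would normally drop one of those columns as the reference level to avoid perfect collinearity. Balancing is applied only to the training partition here so that the testing partition keeps the original click ratio.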

Materials:
 * Computer
 * SPSS Modeler
 * Data set
 * Data Mining for Business Intelligence text
 * Calculator
 * Microsoft Office Suite

The data necessary for the project can be found at kddcup2012.org under Track 2.

Timeline:
 * Project Scope: April 12, 2012
 * Data stream and models complete for analysis: April 20, 2012
 * Predictions complete with written analysis: April 26, 2012
 * Presentation prepared: April 30, 2012
 * Final presentation of findings: May 3, 2012

Team formation: Miranda Gestrin, Bre’ Greenman, and Sidnee Schaefer will work together to complete all tasks as a cohesive team. Each person will contribute to all areas of the project, including research, data preparation and modeling, and analyzing the results to make a prediction.

Costs: The project will incur opportunity costs for each team member. There are no additional monetary costs associated with this project.

**Bibliography**

Bucklin, R. E., & Rutz, O. J. (2011). From generic to branded: A model of spillover in paid search advertising. //Journal of marketing research//, //XLVIII//, 87-102.

Chan, D. X., Yuan, Y., Koehler, J., & Kumar, D. (2011). Incremental clicks: The impact of search advertising. //Journal of advertising research//.

Chapelle, O., & Zhang, Y. (2009). A dynamic Bayesian network click model for web search ranking. //Distribution//, //34//(5), 1-10.

Duff, B. R. L., & Faber, R. J. (2011). Missing the mark: Advertising avoidance and distractor devaluation. //Journal of advertising//, //40//(2), 51-62.

Fain, D., & Regelson, M. (2006). Predicting click-through-rates using keyword clusters. //Electronic Commerce//. Retrieved from http://diyhpl.us/~bryan/papers2/marketing /Predicting click-through rate using keyword clusters.pdf

Gopal, R., Li, X., & Sankaranarayanan, R. (2011). Online keyword based advertising: Impact of ad impressions on own-channel and cross-channel click-through rates. //Decision support systems//, //52//, 1-8.

Lin, Y., & Chen, Y. (2008). Effects of ad types, positions, animation lengths and exposure times on the click-through rate of animated online advertisings. //Computer and industrial engineering//, //57//, 580-591.

Richardson, M., Dominowska, E., & Ragno, R. (2007). Predicting clicks: Estimating the click-through rate for new ads. //Framework//, //23//(1), 521-529.

Rosenkrans, G. (2010). Maximizing user interactivity through banner ad design. //Journal of promotion management//, //16//, 265-287.

Weiss, D. (2008). Predicting ads’ click-through rate with decision rules.

Wolk, A., & Theysohn, S. (2007). Factors influencing website traffic in the paid content market. //Journal of marketing management//, //23//(7-8), 769-796.

Yoo, C. Y. (2011). Interplay of message framing, keyword insertion and levels of product involvement in click-through of keyword search ads. //International journal of advertising//, //30//(3), 399-424.

Zhang, Y., Jansen, B., & Spink, A. (2008). Identification of factors predicting click-through in web searching using neural network analysis. //Journal of the American Society for Information Science//, //60//(3), 1-15.