Biological Response Rate

YWF Consulting: Thomas Leveleux, Eliana Penzner, Chris Riley

__Predicting A Biological Response__

INTRODUCTION: YWF Consulting aims to predict the biological responses of molecules from their chemical properties. Using molecular characteristics and their respective biological response rates, we plan to build a model that relates molecular information to biological response. A leader in the pharmaceutical industry, Boehringer Ingelheim is looking to YWF Consulting and Kaggle.com to “drive innovative solutions to this scientific challenge.” Boehringer Ingelheim is a 125-year-old family-owned business that focuses on human pharmaceuticals and animal health. Its success rests on a strong emphasis on research and development: in 2010, Boehringer Ingelheim conducted or funded over 1,320 clinical studies (Boehringer Ingelheim Annual Report 2010, p. 64). YWF Consulting plans to support Boehringer Ingelheim’s efforts by building a model to fit the data provided. YWF Consulting will present its findings and submit its final report to Kaggle.com on May 3, 2012.

BACKGROUND: The field of predicting biological responses from molecular information has been around for about 50 years. Specifically, this activity is known as quantitative structure-activity relationship (QSAR) modeling and is used heavily in agrochemistry, pharmaceutical chemistry, and toxicology. The first QSAR formulation was published in 1962 by Hansch and Muir. In their study they described the “structure-activity relationships of plant growth regulators and their dependency on Hammett constants and hydrophobicity.” More recent developments in structure-activity relationship (SAR) research attempt to derive structures from the function of certain receptors. Currently, the major advances in this field are related to the development of 3-D models coupled with X-ray crystallography to better derive and understand molecules and their reactions. Others in the biomedical field are working on similar problems; however, our dataset is unique and should prove interesting to study. We plan to conduct more research into previously published algorithms in order to guide and enhance our modeling process.

PROCESS: The data will be analyzed using IBM SPSS Modeler v14.2. We will start by using a PCA node and a Feature Selection node in SPSS Modeler to reduce the number of variables, keeping only the most important ones for further, more detailed analysis. This reduction will allow us to run the analysis on a single computer without using a server. As we reduce the data, we will also keep only the most relevant rows at each step, shrinking the dataset without significantly reducing the accuracy of the analysis.
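The variable-screening idea described above can be illustrated outside SPSS Modeler with a minimal Python sketch. It is not the algorithm used by the Modeler Feature Selection node, and it leaves out the PCA step entirely. It simply ranks columns by the absolute value of their correlation with a binary response and keeps the strongest ones. The function names, the synthetic data, and the choice of Pearson correlation as the ranking score are all assumptions made for illustration.

```python
import math
import random


def pearson(xs, ys):
    """Pearson correlation; returns 0.0 when either series is constant."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)


def screen_features(rows, target, keep):
    """Return the indices of the `keep` columns most correlated with target."""
    n_cols = len(rows[0])
    scores = []
    for j in range(n_cols):
        column = [row[j] for row in rows]
        scores.append((abs(pearson(column, target)), j))
    # Highest score first; ties broken by column index.
    scores.sort(key=lambda s: (-s[0], s[1]))
    return [j for _, j in scores[:keep]]


# Synthetic stand-in data (hypothetical, not the contest files).
random.seed(0)
target = [random.randint(0, 1) for _ in range(200)]
rows = [
    [
        y + random.gauss(0, 0.3),  # column 0: informative descriptor
        random.gauss(0, 1),        # column 1: pure noise
        1.0,                       # column 2: constant, carries no information
        random.gauss(0, 1),        # column 3: pure noise
    ]
    for y in target
]

selected = screen_features(rows, target, keep=2)
print("columns kept:", selected)
```

A univariate screen like this is cheap enough to run on a laptop, which matches the constraint described above, but it ignores interactions between variables, so it is best treated as a first pass.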

Materials: Ideally, we would run Modeler on a server, which would handle a dataset of this size most smoothly. No server is available for our project, so we will use only laptops with Modeler installed.

The data comes from kaggle.com, a website that organizes data mining competitions. The dataset we are using is the one provided for the “Predicting a biological response” contest. Several CSV files are provided to competitors for the project. No supplementary data is needed to carry out this project.

Time frame: The duration of this project will vary with the time it takes to run the models, because the original dataset is very large. Reducing the data to keep only the most relevant variables and rows should take a few hours. After that, we plan on at least half an hour to run each model we want to apply to the reduced dataset.

We will each run different models on the data to see which one gives the best analysis. A combination of the best models might work too.
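One way to compare several models and a simple combination of them is sketched below in Python. It is only an illustration under stated assumptions: the held-out labels and predicted probabilities are made-up numbers, log loss is an assumed scoring metric (the proposal does not name one), and the "combination" is a plain average of the predicted probabilities rather than anything SPSS Modeler does.

```python
import math


def log_loss(y_true, p_pred, eps=1e-15):
    """Mean negative log-likelihood of binary labels; lower is better."""
    total = 0.0
    for y, p in zip(y_true, p_pred):
        p = min(max(p, eps), 1 - eps)  # clip to avoid log(0)
        total += y * math.log(p) + (1 - y) * math.log(1 - p)
    return -total / len(y_true)


def average_predictions(*prediction_lists):
    """Blend models by averaging their predicted probabilities row by row."""
    return [sum(ps) / len(ps) for ps in zip(*prediction_lists)]


# Hypothetical held-out labels and model outputs (illustrative only).
y_holdout = [1, 0, 1, 1, 0]
model_a = [0.9, 0.4, 0.6, 0.3, 0.2]
model_b = [0.6, 0.1, 0.9, 0.7, 0.5]
blend = average_predictions(model_a, model_b)

scores = {
    "model_a": log_loss(y_holdout, model_a),
    "model_b": log_loss(y_holdout, model_b),
    "blend": log_loss(y_holdout, blend),
}
best = min(scores, key=scores.get)
print(scores, "best:", best)
```

Scoring every candidate on the same held-out rows keeps the comparison fair. A blend is not guaranteed to beat its best component, so it should be scored like any other model before being chosen.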

We do not anticipate any costs for this project (apart from time spent) because we already have computers and IBM SPSS Modeler.

BIBLIOGRAPHY

Abraham, Donald J. Burger’s Medicinal Chemistry and Drug Discovery, Sixth Edition, Volume 1: Drug Discovery. 2003.

Boehringer Ingelheim Annual Report 2010, p. 64.

http://econsultancy.com/us/blog/9376-boehringer-ingelheim-partners-with-kaggle-to-crowdsource-scientific-problem

www.kaggle.com