Cardiotocography - Predicting Fetal Heart Rate by Venkat Vidyasagar, Mubashir Laskar, Vivek Balasubramanian



**__Introduction__**

Although medical technology has advanced considerably, research and development in the healthcare industry continues and will keep opening up new areas in the years to come. At the center of these innovations is the analytic framework for collecting and analyzing the relevant data, especially in fields such as biotechnology, chiropractic, medicine and nursing. Research in healthcare must treat safety and reliability as its top priority. Researchers must make sure that any model developed or used in healthcare predicts its outcome as accurately as possible, because these models deal with highly sensitive attributes of human life and a diagnosis is based on their output. In this data mining project, our team of three took on this challenge by building a new statistical model on Cardiotocography. Cardiotocography is a means of recording the fetal heartbeat (cardio-) and the uterine contractions (-toco-) during pregnancy. Currently, Cardiotocography readings are printed on paper and/or stored on a computer for later reference.

**__Purpose of this model__**

We found that physicians and specialists spend a considerable amount of time analyzing each Cardiotocography report. To make this analysis more efficient and to support a proper diagnosis for the identified symptoms, we developed a new statistical model that takes 22 predictors from the Cardiotocography reading as input variables (i.e., independent variables) and predicts the fetal heart rate class as the output variable (i.e., the target or dependent variable). The model classifies the target variable into three classes: N-Normal, S-Suspect and P-Pathologic. The treatment provided to the patient is based on this classification of the fetal heart rate.

**__Characteristics of the data set__**

The dataset consists of measurements of fetal heart rate (FHR) and uterine contraction (UC), the important features of Cardiotocograms classified by expert obstetricians.
|| **Data Set Characteristics** || Multivariate ||
|| **Attribute Characteristics** || Real ||
|| **Associated Tasks** || Classification ||
|| **Number of Instances** || 2126 ||
|| **Number of Attributes** || 23 ||

2126 fetal Cardiotocograms (CTGs) were automatically processed and the respective diagnostic features were measured. The CTGs were also classified by expert obstetricians, and a consensus classification label was assigned to each of them. The classification is with respect to a fetal heart rate class code (N-Normal, S-Suspect and P-Pathologic). Therefore, this dataset can be used for 3-class experiments.

**__Sample Image of Cardiotocogram__**



**__Attributes Information__**

The CTG tracing requires both a qualitative and a quantitative description of uterine activity, baseline fetal heart rate, baseline FHR variability, presence of accelerations, and periodic or episodic decelerations.

Ø **Uterine activity**
Several factors are used in assessing uterine activity:
 * **//Frequency//** - the number of contractions in a standard interval.
 * **//Duration//** - the amount of time from the start of a contraction to the end of the same contraction.
 * **//Intensity//** - a measure of how strong a contraction is.
 * **//Resting Tone//** - a measure of how relaxed the uterus is between contractions.
 * **//Interval//** - the amount of time between the end of one contraction and the beginning of the next contraction.

Ø **Baseline fetal heart rate**
The baseline FHR is determined by approximating the mean FHR, rounded to increments of five beats per minute, during a 10-minute window, excluding accelerations, decelerations and periods of marked FHR variability.

**//Abnormal conditions//**
 * A baseline FHR of less than 110 beats per minute is termed **//bradycardia//**.
 * A baseline FHR of greater than 160 beats per minute is termed **//tachycardia//**.
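The baseline rule above can be sketched in a few lines of Python. This is only an illustration: the function names are hypothetical, and it assumes the samples passed in already exclude accelerations, decelerations and periods of marked variability.

```python
def baseline_fhr(samples_bpm):
    """Approximate the baseline as the mean FHR rounded to the nearest 5 bpm.

    Assumption: samples_bpm covers a 10-minute window and already excludes
    accelerations, decelerations and periods of marked variability.
    """
    mean = sum(samples_bpm) / len(samples_bpm)
    # Round half up to the nearest multiple of five (values are positive).
    return 5 * int(mean / 5 + 0.5)


def classify_baseline(baseline_bpm):
    """Label a baseline FHR using the 110 and 160 bpm thresholds."""
    if baseline_bpm < 110:
        return "bradycardia"
    if baseline_bpm > 160:
        return "tachycardia"
    return "normal"


# Made-up readings, for illustration only.
example_baseline = baseline_fhr([132, 138, 141])
example_label = classify_baseline(example_baseline)
```

Note that the thresholds are strict, so a baseline of exactly 110 or 160 bpm falls in the normal range.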

Ø **Baseline FHR variability**
Baseline FHR variability is determined in a 10-minute window, excluding accelerations and decelerations. It is defined as fluctuations in the baseline FHR that are irregular in amplitude and frequency. The fluctuations are visually quantified as the peak-to-trough amplitude in bpm. The different categories are:
 * **//Absent//** - undetectable
 * **//Minimal//** - greater than undetectable, but less than or equal to 5 bpm
 * **//Moderate//** - 6 bpm to 25 bpm
 * **//Marked//** - greater than 25 bpm
Of these categories, absent, minimal and marked variability are the abnormal conditions.
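As a rough sketch, the variability categories can be written as a small lookup. The function names are hypothetical, and two assumptions are made that the text does not state: an undetectable amplitude is modeled as 0 bpm, and any value above 5 bpm and up to 25 bpm is treated as moderate.

```python
def classify_variability(amplitude_bpm):
    """Map a peak-to-trough amplitude (bpm) to a variability category."""
    if amplitude_bpm <= 0:   # assumption: "undetectable" is modeled as 0 bpm
        return "absent"
    if amplitude_bpm <= 5:
        return "minimal"
    if amplitude_bpm <= 25:  # assumption: values between 5 and 6 count as moderate
        return "moderate"
    return "marked"


def is_abnormal_variability(category):
    """Only moderate variability is treated as a normal finding."""
    return category != "moderate"
```

For example, an amplitude of 12 bpm is moderate and therefore not abnormal, while 30 bpm is marked and abnormal.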

Ø **Presence of accelerations**
An acceleration is a visually apparent abrupt increase in FHR. An abrupt increase is one that goes from the onset of the acceleration to its peak in less than or equal to 30 seconds.
**//Note://** To be considered an acceleration, the peak must be greater than or equal to 15 bpm above the baseline.

Ø **Periodic or episodic decelerations**
Periodic decelerations are those associated with contractions; episodic decelerations are those not associated with contractions. There are four types of decelerations:
 * Early deceleration
 * Late deceleration
 * Variable deceleration
 * Prolonged deceleration

The attributes measured for each of the categories described above are:

**//Uterine activity//**
 * UC - number of uterine contractions per second (frequency)

**//Baseline fetal heart rate (FHR)//**
 * LB - fetal heart rate baseline (beats per minute)
 * FM - number of fetal movements per second

**//Baseline FHR variability//**
 * ASTV - percentage of time with abnormal short term variability
 * MSTV - mean value of short term variability
 * ALTV - percentage of time with abnormal long term variability
 * MLTV - mean value of long term variability

**//Presence of accelerations//**
 * AC - number of accelerations per second (frequency)

**//Periodic or episodic decelerations//**
 * DL - number of light decelerations per second (frequency)
 * DS - number of severe decelerations per second (frequency)
 * DP - number of prolonged decelerations per second (frequency)

**//Characteristics of histogram//**
 * Width - width of FHR histogram
 * Min - minimum of FHR histogram
 * Max - maximum of FHR histogram
 * Nmax - number of histogram peaks
 * Nzeros - number of histogram zeros
 * Mode - histogram mode
 * Mean - histogram mean
 * Median - histogram median
 * Variance - histogram variance
 * Tendency - histogram tendency

**//Output variable or target variable//**
 * NSP - fetal heart rate class code (N = Normal, S = Suspect, P = Pathologic)

**__Fetal Heart rate pattern classification__**

**Normal -** Tracings with all of these findings present are strongly predictive of normal fetal acid-base status at the time of observation:
 * Baseline rate of 110-160 bpm
 * Moderate variability
 * Absence of late or variable decelerations
 * Early decelerations and accelerations may or may not be present

**Abnormal -** The tracing is predictive of abnormal fetal acid-base status at the time of observation and requires prompt evaluation and management:
 * Absence of baseline variability with recurrent late or variable decelerations or bradycardia; or
 * Sinusoidal fetal heart rate

__**Hypothesis:**__

The team hypothesized that the fetal heart rate class can be determined from the Cardiotocography reading, which is a measure of several other attributes. As there are three possible outcomes for the state of the fetal heart rate, we developed a classification model and used regression to understand the relative importance of the various factors. Because the model deals with medical decisions, we set a high target accuracy of 95% to make it reliable.

__**Data Procedure:**__
 * **//Source//**: Center for Machine Learning and Intelligent Systems, University of California, Irvine.
 * Remove all other classifications that are not relevant to the fetal heart rate.
 * Perform a data audit to check the quality of the data.
 * Remove outliers and extremes. No missing data is present.
 * Normalize the data.
 * Convert all categorical variables into dummies (Tendency).
 * Apply regression analysis to the data to find the significance of each predictor.
 * Partition the data into training (60%) and validation (40%) sets.
 * Apply Feature Selection or PCA to the data before applying the models.
 * Balance the data by boosting the smaller classes so that the classes are evenly represented.
 * Apply classification models such as K-NN, Logistic Regression, Neural Network, Discriminant Analysis and Classification Trees.
 * Analyze the above models using accuracy, classification matrix, gain chart and RMSE.
 * Select the best model based on the above analysis and report our insights.
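The normalization, partitioning and balancing steps can be sketched in plain Python as follows. This is a minimal illustration on made-up data, not the SPSS stream used in the project. The function names are hypothetical, and three choices are assumptions: min-max scaling, splitting within each class (stratified), and balancing only the training partition.

```python
import random
from collections import Counter, defaultdict


def min_max_normalize(rows):
    """Scale every column to the range [0, 1]."""
    cols = list(zip(*rows))
    lows = [min(c) for c in cols]
    highs = [max(c) for c in cols]
    return [
        [(v - lo) / (hi - lo) if hi > lo else 0.0
         for v, lo, hi in zip(row, lows, highs)]
        for row in rows
    ]


def stratified_split(labels, train_fraction=0.6, seed=0):
    """Return (train, validation) row indices, split within each class."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for i, y in enumerate(labels):
        by_class[y].append(i)
    train, valid = [], []
    for idxs in by_class.values():
        rng.shuffle(idxs)
        cut = round(len(idxs) * train_fraction)
        train += idxs[:cut]
        valid += idxs[cut:]
    return sorted(train), sorted(valid)


def boost_minority(indices, labels, seed=0):
    """Resample smaller classes until each matches the largest class."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for i in indices:
        by_class[labels[i]].append(i)
    target = max(len(v) for v in by_class.values())
    boosted = []
    for idxs in by_class.values():
        extra = [rng.choice(idxs) for _ in range(target - len(idxs))]
        boosted += idxs + extra
    return boosted


# Made-up toy data: 20 rows, 2 predictors, imbalanced N/S/P labels.
rows = [[float(i), float(2 * i)] for i in range(20)]
labels = ["N"] * 10 + ["S"] * 5 + ["P"] * 5

normalized = min_max_normalize(rows)
train_idx, valid_idx = stratified_split(labels, train_fraction=0.6)
balanced = boost_minority(train_idx, labels)
class_counts = Counter(labels[i] for i in balanced)
```

Balancing only the training indices leaves the validation partition with its original class proportions, so validation results are not flattered by duplicated rows.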

__**Graphical representation**__

__**Data Interpretation Plan**__

When we first considered what kind of data set would let us build a model that predicts the condition of a baby and supports the decision on whether subsequent treatment is needed, we searched online resources rigorously and settled on a data set with 22 predictors that we judged to be quite exhaustive and fairly specific to the problem. We decided to spend a significant amount of time up front on understanding the data, as it deals with very technical aspects of the medical field. We researched all the medical terms used and how they affect each other, to build a common understanding of the preliminary data. The raw data included many other classifications and variables that were unrelated to the goal we set for this project, which is classifying the fetal heart rate into three primary states, so we removed all redundant and extraneous data as the project moved along. We ran various regression analyses to determine the significance levels and inter-relationships of the predictors. With a statistical understanding of the data set in hand, we moved on to data preparation as described above in the 'Data Procedure' and 'Graphical representation' sections. From our discussion of the data and the purpose of this model, we expected the Neural Network model to be the best way to obtain the final classification of the fetal heart rate. However, we also decided to apply other methods, namely K-NN, Logistic Regression, Discriminant Analysis and Classification Trees, and set aside the remaining modeling methods that we did not expect to be useful for our purpose. Remaining open to any significant outcome, we decided to analyze each model's performance and accuracy to arrive at the best option. We also paid attention to the insights that each model could offer throughout this process.


__**Model Analysis & Interpretation:**__


**The SPSS stream contains five models in total: K-NN, Cluster, Classification Tree, Neural Network and Logistic Regression**



**1. Decile charts of the various models:**


**Insights:** The decile charts show that the models perform very well on the training set but less well on the validation set. Looking closely, the classification tree model does best on the validation set: its gain curve (the green line) lies above all the other curves in the validation decile chart.


**2. K-NN Predictor Space:**
**Insights:** This graph shows the distribution of our target variable NSP in the lower-dimensional predictor space of the K-NN model. It plots the Normal, Suspect and Pathologic cases in three dimensions against uterine contractions (UC), accelerations (AC) and fetal movements (FM).

**3. Cluster formation**
**Insights -** In cluster 1, the central tendency and variability measures deviate more from the mean, so its cases tend to be classified as Suspect. In cluster 2, the central tendency and variability measures show little variation, so its cases tend to be classified as Normal.

**4. Scatter plot of Cluster**

**Insights -** From this graph we can see that the two clusters are reasonably well separated, and a line can be drawn to separate them with a minimal misclassification rate.


**5. Confusion matrix for Classification tree model**
**Insights -** The rate at which Suspect or Pathologic cases are misclassified as Normal is very low. This shows the strength of the classification tree model and helps ensure that a Suspect or Pathologic child gets immediate attention and the necessary treatment.

**6. Confusion matrix for K-NN model**
**Insights -** The rate at which Suspect or Pathologic cases are misclassified as Normal is very low. This shows the strength of the K-NN model and helps ensure that a Suspect or Pathologic child gets immediate attention and the necessary treatment.


**7. Analysis of the various models**


**Insights -** From the above comparison matrix, we can see that the classification tree model has a low misclassification rate of nearly 8.75%, compared with the results of the other models (K-NN, neural network and logistic regression).
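For reference, both quantities discussed in these insights can be computed directly from a confusion matrix. The sketch below uses hypothetical counts chosen only for illustration (they are not the project's results), and the function names are assumptions.

```python
# Hypothetical confusion matrix: outer key = actual class, inner key = predicted.
confusion = {
    "N": {"N": 70, "S": 3, "P": 2},
    "S": {"N": 2, "S": 12, "P": 1},
    "P": {"N": 1, "S": 1, "P": 8},
}


def misclassification_rate(cm):
    """Share of all cases whose predicted class differs from the actual one."""
    total = sum(sum(row.values()) for row in cm.values())
    correct = sum(cm[c][c] for c in cm)
    return (total - correct) / total


def missed_at_risk_rate(cm, normal="N"):
    """Share of Suspect/Pathologic cases that were predicted as Normal."""
    at_risk = [c for c in cm if c != normal]
    total = sum(sum(cm[c].values()) for c in at_risk)
    missed = sum(cm[c][normal] for c in at_risk)
    return missed / total


overall_error = misclassification_rate(confusion)
missed_rate = missed_at_risk_rate(confusion)
```

The second measure matters most in this setting, because an at-risk case predicted as Normal is the error that delays treatment.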


**8. Classification tree**

The above sample shows part of the classification tree from the C5 model.

After the analysis, we found the classification tree to be the best model for deciding the fetal heart rate class, based on its performance on the validation data set. The most significant variables for predicting the fetal heart rate class are ASTV, MLTV, ASE, DS, DL and DM. Judged by the classification matrix error rate, the K-NN method appears to be the better model, because its misclassification of Suspect and Pathologic cases, the most critical component of our classification, is minimal. However, that holds only for the training data set, and performance on the validation data is far more critical when choosing a model for deployment.

The classification tree reveals one compelling fact: 'percentage of time with abnormal short term variability' determines the fundamental distinction between a normal and a suspect heart rate. In a 10-minute observation, if the tracing shows abnormal short term variability more than 78% of the time, i.e. for more than 7.8 minutes, the child would be Pathologic. However, with less than 7.8 minutes of abnormal short term variability, there is still a considerable likelihood of the child being Suspect or even Pathologic. In this case, if the number of 'light decelerations' per second is less than 4 and the 'minimum of FHR histogram' is greater than 68, then the child would be Pathologic.

Now consider what makes a child Suspect rather than Pathologic. With abnormal short term variability of less than 7.8 minutes, a child could still be under suspicion and should not automatically be termed Normal. Here, if abnormal long term variability is greater than 3.7 minutes and the variance of the histogram is greater than 5, then the child would be Suspect, and the mother should be kept under constant watch to monitor the situation.
__**Conclusion:**__