Group 1

**Jessica Barry, Mohammad Rahman, Sabrina Sourjah, Jianwu "Woody" Weng**

**Abstract**

Salem Hospital, located in Salem, OR, found that specific patient groups were not profitable for the hospital. Our class objective was to classify the top 1% of least profitable patients for the hospital, using the knowledge and resources from our data-mining course. Our literature search turned up research on similar cases, although, as far as we know, we were the first to carry out this research at Salem Hospital. We chose two classification models and one clustering model, as they gave our team the most valid information. The results identify the important predictors (payer group and length of stay) and the specific clusters that contain the largest share of least profitable patients, which shows which variables affect this least profitable group the most. Our results are broadly similar to those found in previous research. In the future, we suggest that the hospital also examine its economic data to determine how those factors might be affecting its profits. This paper describes the research we completed for this project, the models we chose, and our analysis of the data, followed by our conclusions and recommendations for Salem Hospital.

**Introduction**

Salem Hospital, located in Salem, OR, is a non-profit medical center and the only hospital in Oregon's state capital. The hospital began to find that some patient groups were too expensive for it to serve. These patient groups are defined by variables such as the type of service patients receive, the type of insurance they have, and their length of stay. Our goal as students was to build a model that identifies the least profitable patients within these groups, specifically the top 1% of least profitable patients. We began the project hoping to find models that classify patient groups, as well as ways to predict direct income. This objective mattered to Salem Hospital because it was losing too much money on this 1% of least profitable patients and needed a way to classify them in order to save money. To understand the losses from this 1% of patients, we first looked at the overall costs of a public hospital. Identifying which patient groups are most costly gives Salem Hospital a chance to weigh how it spends money on operations. Cost-effectiveness is one measure of how successfully a hospital's operations are managed, that is, whether its money is spent in ways that produce the best results for the expenditure. This project relates to our data mining course in that it let us apply the models, knowledge, and tools from the class to a real business scenario, on a much larger scale than the data we had used previously. By applying this knowledge to a real business problem, our team was able to select the necessary models, produce results, and make informed recommendations to help Salem Hospital classify its least profitable patient groups.
We chose to run one cluster model and two classification models, as we felt that these gave our team the best results to pass on to Salem Hospital.

**Literature Review**

We could find no literature that directly addressed the issue of unprofitable patients in the Salem area. However, we reviewed literature on related subjects: New Jersey hospitals, the US health care system, the non-elderly population, and profitability by payer group.

The New Jersey hospital study, 'Factors Affecting the Economics and Performance of New Jersey Hospitals', summarizes the factors that affect the economics of these hospitals. The main factors it mentions are:

· adequate reimbursement by public payers
· alignment of the hospital-physician relationship to improve efficiency and quality
· transparency of performance data for physicians and hospitals
· smart regulation that is evenly applied and minimizes perverse incentives
· effective and accountable hospital governance and management
· an adequate ambulatory safety net that ensures people get the right care, in the right place, at the right time, minimizing the inefficient use of hospital resources

The research paper 'Family Level Expenditures on Health Care and Insurance Premiums among the Nonelderly Population, 2006' examines health care costs from the point of view of non-elderly patients and their families. According to this paper, "total expenditures on health care services were highest among families with public coverage and lowest among uninsured families. Mean total expenditures were $8,831 among families with public insurance, $6,785 among families with private insurance, and $1,425 among uninsured families." Additionally, "Out-of-pocket expenditures on health care services among families with private coverage ($1,410) were significantly higher compared to out-of-pocket expenditures among families with public insurance ($643) and the uninsured ($663)."

The next article, 'New Evidence on Hospital Profitability by Payer Group and the Effects of Payer Generosity', evaluates health care expenditures from a payer group perspective. Patients in U.S. hospitals have third-party payers that can be classified broadly into four payer groups: Medicare, Medicaid, private insurance, and uninsured. The research found that the profitability of inpatient care for privately insured patients was about 4% lower than for Medicare, 14% higher than for Medicaid, and only 9% higher than for self-pay patients. The study also finds that the Medicare and privately insured groups are more profitable for inpatient care in the sample considered, and specifically notes that both are more profitable than Medicaid patients.

**Data Description**

Salem Hospital provided four master data files: patients, drugs, encounters, and patient diagnoses.
These files contained the following variables:

· Emergency Admit: emergency department admission (admitted or not)
· Payer Group: the payers for the treatment, categorized into groups
· Gender: gender of the patient
· Ethnicity: ethnicity of the patient
· Encounter Count: number of encounters per patient
· Diagnosis Count: number of diagnoses per patient
· Principal Procedure Count: number of principal procedures conducted on the patient
· Primary Procedure Count: number of primary procedures conducted on the patient
· Length of Stay: how long the patient stays in the hospital

**Modeling**

//The objective was to build a model that can identify the least profitable patients for Salem Hospital. To accomplish this, we analyzed the patient data with different models and compared the results to make useful recommendations.//

**Cluster Analysis with Two-Step and K-Means**

We first treated the task of identifying the least profitable patients as a market segmentation problem, in which one of the segments would contain the least profitable patients. We therefore used both the two-step and K-Means models for cluster analysis.

For our first analysis, we ranked all patients into percentiles based on the direct income they generate for the hospital, and treated the patients in the 1st percentile as the least profitable patients. We excluded demographic variables when running our first models and set them aside for later validation.
The variables used were:

· Encounter Count
· Diagnosis Count
· Principal Procedure Count
· Primary Procedure Count
· Emergency Admit
· Payer Group
· Length of Stay

The demographic variables excluded were:

· Gender
· Ethnicity
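The percentile flag described above can be sketched in a few lines of Python. This is a minimal sketch. It assumes direct income is available as one number per patient, and that "least profitable" means the lowest 1% of those values. The function name `flag_least_profitable` and its `pct` parameter are hypothetical. The percentile binning in the tool we used may treat ties at the cut-off differently.

```python
import math


def flag_least_profitable(direct_income, pct=1.0):
    """Return 0/1 flags; 1 marks the bottom `pct` percent of records by direct income."""
    n = len(direct_income)
    # Number of records in the bottom pct percent (always at least one).
    k = max(1, math.ceil(n * pct / 100))
    # Record indices ordered from lowest to highest direct income.
    # Ties at the cut-off are broken by original record order (sorted() is stable).
    order = sorted(range(n), key=lambda i: direct_income[i])
    flags = [0] * n
    for i in order[:k]:
        flags[i] = 1
    return flags


# Hypothetical example: made-up incomes, not Salem Hospital data.
incomes = [1200.0, -8500.0, 300.0, 45.0, -20.0]
print(flag_least_profitable(incomes))  # -> [0, 1, 0, 0, 0]
```

With five records, 1% rounds up to a single record, so only the largest loss is flagged.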

The two-step model produced only 2 clusters. One cluster includes 93 percent of the least profitable patients (i.e. the 1st-percentile patients), and the other consists predominantly (98%) of more profitable patients (i.e. patients in all percentiles except the 1st). Although this is a good separation, the cluster structure is weak, with a silhouette value of only 0.3. When the K-Means model was run with 2 clusters, the silhouette value improved to 0.4, but the patients in both clusters were predominantly least profitable patients. This clustering therefore does not help differentiate the least profitable patients from the more profitable ones.
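This note is for readers unfamiliar with the silhouette value quoted above. For each record, a is its mean distance to the other members of its own cluster, and b is its mean distance to the members of the nearest other cluster. The record's silhouette is (b - a) / max(a, b), and the overall value is the average over all records, ranging from -1 to 1. The sketch below computes this textbook version with Euclidean distance, using a hypothetical function name. It is illustrative only: the tool we used may report a simplified variant, so its numbers need not match exactly.

```python
import math


def silhouette_score(points, labels):
    """Mean silhouette coefficient over all records, using Euclidean distance."""
    clusters = sorted(set(labels))
    if len(clusters) < 2:
        raise ValueError("silhouette needs at least two clusters")
    scores = []
    for i, p in enumerate(points):
        # Sum and count of distances from record i to every other record, per cluster.
        sums = {c: 0.0 for c in clusters}
        counts = {c: 0 for c in clusters}
        for j, q in enumerate(points):
            if i != j:
                sums[labels[j]] += math.dist(p, q)
                counts[labels[j]] += 1
        own = labels[i]
        if counts[own] == 0:
            scores.append(0.0)  # convention: a record alone in its cluster scores 0
            continue
        a = sums[own] / counts[own]  # cohesion: mean distance within own cluster
        # Separation: mean distance to the nearest other cluster.
        b = min(sums[c] / counts[c] for c in clusters if c != own)
        denom = max(a, b)
        scores.append((b - a) / denom if denom > 0 else 0.0)
    return sum(scores) / len(scores)


pts = [(0, 0), (0, 1), (10, 0), (10, 1)]
print(round(silhouette_score(pts, [0, 0, 1, 1]), 2))  # well separated -> 0.9
print(round(silhouette_score(pts, [0, 1, 0, 1]), 2))  # poor split -> -0.45
```

This version compares every pair of records, so on a full patient file it would be run on a sample.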

For our second analysis, we did not rank patients into percentiles. We again excluded demographic data and ran the two-step model, which produced 5 clusters, and then ran the K-Means model with 5 clusters. Both models suggest that the least profitable patients are those who are not admitted to the hospital through the emergency department. In addition, the two-step model suggests that the costs of the least profitable patients are reimbursed by payer group 2, whereas the K-Means model suggests payer group 4. These insights are important because the clusters from both models are reliable, with a silhouette value of 0.6.
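K-Means alternates two steps until the assignments stop changing: assign each record to its nearest centroid, then move each centroid to the mean of its assigned records. The sketch below is a plain Lloyd's algorithm, not the implementation in the tool we used. It assumes the inputs are already numeric and on comparable scales (for example, standardized), with categorical fields such as payer group one-hot encoded. The helper `cluster_means` (a hypothetical name) shows how a cluster can be profiled, for instance by its share of emergency admissions.

```python
import math
import random


def kmeans(points, k, max_iter=100, seed=0):
    """Lloyd's k-means. Returns (labels, centroids)."""
    rng = random.Random(seed)
    # Start from k records picked at random as the initial centroids.
    centroids = [list(p) for p in rng.sample(points, k)]
    labels = None
    for _ in range(max_iter):
        # Assignment step: each record joins its nearest centroid.
        new_labels = [min(range(k), key=lambda c: math.dist(p, centroids[c])) for p in points]
        if new_labels == labels:
            break  # assignments stopped changing: converged
        labels = new_labels
        # Update step: each centroid moves to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # an empty cluster keeps its previous centroid
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids


def cluster_means(labels, values, k):
    """Mean of one field per cluster, e.g. the share of emergency admissions."""
    means = []
    for c in range(k):
        vals = [v for v, lab in zip(values, labels) if lab == c]
        means.append(sum(vals) / len(vals) if vals else float("nan"))
    return means


pts = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
emergency = [0, 0, 0, 1, 1, 1]  # hypothetical 0/1 emergency-admit flags
labels, _ = kmeans(pts, 2)
print(sorted(cluster_means(labels, emergency, 2)))  # -> [0.0, 1.0]
```

K-Means results depend on the starting centroids, which is why the sketch fixes a `seed`. In practice, several restarts are often compared.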

**Snapshot: Two-Step Model**

**Snapshot: K-Means Model**

To further validate our cluster structure, we ran both the two-step and K-Means models again with the demographic variables included. In both models, the costs of the cluster with the least profitable patients were again mostly reimbursed by payer group 4, and these patients were not expected to be admitted through the emergency department. Demographically, the same cluster is also concentrated mostly in gender 1. Neither model shed any light on ethnicity.

**Snapshot: K-Means Model Including Demographic Variables**

**Classification Using Logistic Regression**

Here, we classified patients based on their direct income using a logistic regression model and identified the most important variables in this classification. According to the logistic model, the most important variables are Principal Procedure Count, Payer Group, and Primary Procedure Count.

**Snapshot: Logistic Regression Predictor Importance**
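A logistic regression classifier of the kind described above can be sketched with batch gradient descent. This is a minimal sketch, not the procedure used by our tool, and all function names are hypothetical. It assumes a binary target (for example, 1 for the least profitable patients and 0 otherwise) and numeric, roughly centered inputs. It also reports accuracy per class. If the target is the bottom-1% flag, a model that always predicts class 0 would already be right about 99% of the time, so overall accuracy is best read alongside the accuracy for class 1.

```python
import math


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_logistic(X, y, lr=0.1, epochs=2000):
    """Batch gradient descent on the mean log-loss. Returns (weights, bias)."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            # Per-record log-loss gradient: (predicted probability - label) * x.
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j in range(d):
                grad_w[j] += err * xi[j]
            grad_b += err
        w = [wj - lr * g / n for wj, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def predict(X, w, b, threshold=0.5):
    """0/1 predictions: 1 when the predicted probability reaches the threshold."""
    return [int(sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) >= threshold) for xi in X]


def per_class_accuracy(y_true, y_pred):
    """Share of records of each true class that were predicted correctly."""
    result = {}
    for c in sorted(set(y_true)):
        hits = [p == c for t, p in zip(y_true, y_pred) if t == c]
        result[c] = sum(hits) / len(hits)
    return result


# Hypothetical, already-centered single predictor and 0/1 class labels.
X = [[-2.5], [-1.5], [-0.5], [0.5], [1.5], [2.5]]
y = [0, 0, 0, 1, 1, 1]
w, b = fit_logistic(X, y)
print(per_class_accuracy(y, predict(X, w, b)))  # -> {0: 1.0, 1: 1.0}
```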

The logistic regression model performed with an accuracy of 83 percent.

**Snapshot: Accuracy of Logistic Regression**

**Classification Using C&R Tree**

Here, we again ranked all patients into percentiles based on the direct income they generate for the hospital, and treated the 1st-percentile patients as the least profitable patients. We labeled patients in the 1st percentile as class "1" and all other patients as class "0"; class "1" is our group of interest. The most important predictors in the C&R Tree are payer group and principal procedure count. The tree also shows that when the principal procedure count is between 2 and 9 and the payer group is 2, 94 percent of the patients are classified as 1 (least profitable).

**Snapshot: C & R Tree**

**Snapshot: C & R Tree Predictor Importance**
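C&R Tree follows the CART approach: it grows a tree by repeatedly choosing the split that most reduces node impurity. For a categorical target such as our class 0/1 flag, Gini impurity is the usual measure. The sketch below, with hypothetical function names, finds the best threshold split of the form `value <= t` on a single numeric predictor such as principal procedure count. It does not cover categorical splits like the one on payer group, stopping rules, or pruning.

```python
def gini(labels):
    """Gini impurity of a list of 0/1 class labels."""
    if not labels:
        return 0.0
    p = sum(labels) / len(labels)  # share of class 1
    return 1.0 - p ** 2 - (1.0 - p) ** 2


def best_threshold(values, labels):
    """Best split 'value <= t' on one numeric predictor by weighted Gini impurity.

    Returns (t, impurity); t is None if no split improves on the unsplit node.
    """
    n = len(values)
    best_t, best_score = None, gini(labels)
    # Candidate thresholds: every distinct value except the largest.
    for t in sorted(set(values))[:-1]:
        left = [lab for v, lab in zip(values, labels) if v <= t]
        right = [lab for v, lab in zip(values, labels) if v > t]
        # Child impurities weighted by the share of records in each child.
        score = (len(left) * gini(left) + len(right) * gini(right)) / n
        if score < best_score:
            best_t, best_score = t, score
    return best_t, best_score


# Hypothetical principal procedure counts and 0/1 class flags.
counts = [1, 2, 3, 4, 5, 6]
flags = [0, 0, 0, 1, 1, 1]
print(best_threshold(counts, flags))  # -> (3, 0.0)
```

A full tree applies this search to every predictor at every node and recurses into the two children. A range rule such as "count between 2 and 9" arises from two successive threshold splits on the same predictor.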

The C&R Tree model achieved more than 80 percent accuracy for each of the two classes.
**Snapshot: Accuracy of C & R Tree**

**Other models tested**

**Prediction:** Two models were used for prediction: the auto numeric node and the regression node. For the auto numeric node, the resulting model was a combination of Neural Net, CHAID, and C&R Tree. Payer group and length of stay were important predictor variables, and the model had a mean error of -42.63. In the regression model, payer group was again the most important predictor, and length of stay was the second most important. The mean error for this model is 0. However, a least-squares regression with an intercept always has a mean error of exactly 0 on the data it was fitted to. If this figure was measured on the training data, it therefore says little about predictive accuracy by itself. These findings make practical sense, because income is generally related to length of stay. In addition, payer group (i.e. the type of group that reimburses the fee, e.g. Medicaid, Medicare, and private insurance) has high predictive power.

**Neural Network:** The neural network was used to predict the class that each encounter fell into. To preprocess the data, we binned the Direct Income variable into 6 bins of equal width. We chose the multilayer option, with the number of training cycles set to 250. 98% of the records were categorized into the correct bins. The most important predictors of the model were length of stay, payer group, benefit plan, and sub-service line. This makes practical sense: when the length of stay is longer, the patient ends up paying more. Also, the payer group and benefit plan determine how payments are made to the hospital, and the sub-service line reflects the type of service performed, which affects the cost of the treatment.
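The equal-width binning step used to preprocess Direct Income can be sketched as follows. This is a minimal sketch with hypothetical function names; it divides the range between the minimum and maximum into six intervals of equal width. Equal-width bins follow the range rather than the distribution. A skewed variable can therefore leave most records in one or two bins, so comparing the bin counts with the 98% figure is a cheap sanity check.

```python
def equal_width_bins(values, n_bins=6):
    """Assign each value a bin index 0..n_bins-1 using equal-width intervals."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0] * len(values)  # all values identical: one bin
    width = (hi - lo) / n_bins
    # min() keeps the maximum value in the last bin instead of an extra bin.
    return [min(int((v - lo) / width), n_bins - 1) for v in values]


def bin_counts(bins, n_bins=6):
    """Number of records in each bin."""
    counts = [0] * n_bins
    for b in bins:
        counts[b] += 1
    return counts


# Hypothetical skewed incomes: one large loss and many small positive values.
incomes = [-5000.0] + [100.0] * 99
print(bin_counts(equal_width_bins(incomes)))  # -> [1, 0, 0, 0, 0, 99]
```

In this made-up example, a model that always predicts the last bin would already be 99% correct.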
**Discussion**

Payer groups are the entities that reimburse hospitals for health care costs, and hospitals usually classify them into four groups: Medicare, Medicaid, private insurers, and self-pay. Salem Hospital also has 4 payer groups. According to our models, the least profitable patients belong to either payer group 2 or payer group 4. Our literature review also shows that profits from inpatient care are related to the payer group for that care, so the models' findings carry useful insights. Based on this finding, we recommend that Salem Hospital improve its patient mix with respect to the payer groups that cover its patients.

Our models also show that principal procedure count and primary procedure count are important in classifying patients. From our research on principal and primary procedures, we found that primary procedures are those conducted on the patient first, before the hospital knows whether the patient needs to be admitted. Principal procedures, on the other hand, are those that determine whether the patient needs to be admitted. Once a patient is admitted, the hospital starts earning from that patient. The models' identification of principal procedure count as one of the most important predictors is therefore consistent with how the hospital earns income.

The regression model also indicates that the least profitable patients are concentrated in gender 1. However, we could not find any research or reasons to validate this finding.
**Directions for future work**

The related research literature shows a notable relationship among health care expenditure, payer group, and insurance coverage. This would be an appropriate working theory for the next class to investigate, to determine whether the same relationships exist at Salem Hospital. We also suggest that the hospital examine its economic data to determine whether this information can help it increase its profits.

**References**

"Salem Hospital (Oregon)." //Wikipedia//. http://en.wikipedia.org/wiki/Salem_Hospital_(Oregon)

"Factors Affecting the Economics and Performance of New Jersey Hospitals." //Final Report// (2008): 83-105. Web. 2 May 2013.

Bernard, D., and J. Banthin. //Family Level Expenditures on Health Care and Insurance Premiums among the Nonelderly Population, 2006.// Research Findings No. 29. Rockville, MD: Agency for Healthcare Research and Quality, March 2009. http://www.meps.ahrq.gov/mepsweb/data_files/publications/rf29/rf29.pdf

Friedman, Bernard, Neeraj Sood, Kelly Engstrom, and Diane McKenzie. "New Evidence on Hospital Profitability by Payer Group and the Effects of Payer Generosity." //International Journal of Health Care Finance and Economics// 4.3 (2004): 231-46. Print.

http://www.deschutes.org/Health-Services/Behavioral-Health/Developmental-Disabilities-Program/Adult-Foster-Homecare-in-Oregon-(3)/Physician-visit1.aspx