Project Title: Predict Healthcare Access in the United States using Classification Algorithms



Team Members:



Project Overview:

Motivation:

As per the Constitution of the World Health Organization, “The enjoyment of the highest attainable standard of health is one of the fundamental rights of every human being without distinction of race, religion, political belief, economic or social condition”. Access to healthcare is an important step toward achieving better health that impacts an individual's overall physical, mental, and social health status and quality of life.
According to the Commonwealth Fund, which regularly ranks the health systems of developed countries, the US is the lowest overall performer even though it spends the most on healthcare among them.

Healthcare system performance compared to spending


For specific indicators of Healthcare quality such as Healthcare access and equity, the US is the poorest performer:

Healthcare system performance compared to spending



Therefore, in this project we identify the determinants of healthcare access in the US and explore inequality in that access using classification techniques such as Logistic Regression, Random Forest, and Gradient Boosted Trees, based on independent variables such as Gender, Residing State, Annual Household Income, Race, Educational Attainment, Marital Status, Household Ownership, Veteran Status, and Employment Status.

This project focuses on the following components of access to health care (Dependent variables):

Aims:



Data set/s Description:

We primarily used the 2018 Behavioral Risk Factor Surveillance System (BRFSS) data set from the Centers for Disease Control and Prevention (CDC) to classify healthcare access in the US. The data set covers 437,436 individuals with 276 features. The BRFSS is a cross-sectional telephone survey that is administered annually by state health departments to assess health-related behaviors, medical conditions, and preventive service use among adult residents in all 50 states, Puerto Rico, Guam, the District of Columbia, and the U.S. Virgin Islands.
We also used the mean unemployment rate of each state for calendar year 2018, downloaded from the Bureau of Labor Statistics of the United States Department of Labor. These unemployment rates were used to predict the mean health insurance rate by state of residence.
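
As a rough illustration of how a state-level unemployment rate can be related to a state-level health insurance rate, the sketch below fits an ordinary least-squares line in plain Python. The model used for this step is not stated here, so simple linear regression is an assumption. The names `fit_line` and `predict` are hypothetical, and the numbers are made-up toy values, not BRFSS or BLS figures.

```python
from statistics import mean


def fit_line(x, y):
    """Fit y = intercept + slope * x by ordinary least squares."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    x_bar, y_bar = mean(x), mean(y)
    # Sum of squared deviations of x, and cross-deviations of x and y.
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("x values must not all be identical")
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return slope, intercept


def predict(slope, intercept, x_new):
    """Predicted insurance rate for a given unemployment rate."""
    return intercept + slope * x_new


# Hypothetical toy values (percent), NOT real state data.
unemployment = [3.0, 4.0, 5.0, 6.0]
insured = [94.0, 92.0, 90.0, 88.0]

slope, intercept = fit_line(unemployment, insured)
print(slope, intercept, predict(slope, intercept, 4.5))
```

With real inputs, each element of `unemployment` and `insured` would be one state's value, joined on the state name before fitting.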



Methodology:

We used data from the 2018 Behavioral Risk Factor Surveillance System (BRFSS) survey.


Results:

Top 10 States with the Highest Health Insurance Rates:

Top 10 States by Health Insurance Rate

The mean health insurance rate in 2018 was 91.44%.
The District of Columbia ranked first in the US, with a health insurance rate of 95.88%.

Lowest 10 States by Health Insurance Rates:

Lowest 10 States by Health Insurance Rate

Guam ranked last in the US, with a health insurance rate of 83.15%. Florida ranked fifth from last, with a health insurance rate of 86.05%.
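
A minimal sketch of how state-level insurance rates and the top/bottom rankings could be derived from individual survey records is shown below. It assumes each record reduces to a (state, has_insurance) pair and computes unweighted rates. Whether BRFSS survey weights were applied is not stated here, so that is a simplification. The function names and the toy records are hypothetical.

```python
from collections import defaultdict


def insurance_rate_by_state(records):
    """Return {state: percent insured} from (state, has_insurance) pairs."""
    counts = defaultdict(lambda: [0, 0])  # state -> [insured, total]
    for state, has_insurance in records:
        if has_insurance:
            counts[state][0] += 1
        counts[state][1] += 1
    return {s: 100.0 * ins / tot for s, (ins, tot) in counts.items()}


def rank_states(rates, n=10, highest=True):
    """Return the n (state, rate) pairs with the highest or lowest rate."""
    ordered = sorted(rates.items(), key=lambda kv: kv[1], reverse=highest)
    return ordered[:n]


# Hypothetical toy records, NOT real survey responses.
records = [
    ("A", True), ("A", True), ("A", False), ("A", True),
    ("B", True), ("B", False),
    ("C", True), ("C", True),
]

rates = insurance_rate_by_state(records)
top = rank_states(rates, n=2)                    # highest rates first
bottom = rank_states(rates, n=2, highest=False)  # lowest rates first
print(top, bottom)
```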

Model Accuracy Table for each Healthcare component (Dependent Variables):

Model Accuracy Table for each Healthcare component (Dependent Variables)

The accuracy of a classifier is the proportion of class labels it predicts correctly.
For all the healthcare components, Logistic Regression and Gradient Boosted Trees were more accurate than Random Forest. Of the two, Gradient Boosted Trees was chosen for final model building, following the recommendations in the reviewed literature. All the final Gradient Boosted models classified the class labels with an accuracy of more than 80%.
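
The accuracy measure described above can be sketched in a few lines of Python. The function name `accuracy` and the example labels are hypothetical and only illustrate the calculation; they are not outputs of the project's models.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true class labels."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)


# Hypothetical labels: 1 = has access, 0 = does not.
y_true = [1, 0, 1, 1, 0]
y_pred = [1, 0, 0, 1, 0]

score = accuracy(y_true, y_pred)  # 4 of 5 labels match
print(score)
```

Accuracy alone can look high when one class dominates, so with imbalanced outcomes it is usually worth reading it alongside per-class measures.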

References:

  1. E.C. Schneider, D.O. Sarnak, D. Squires, A. Shah, M.M. Doty, Mirror, Mirror 2017: International Comparison Reflects Flaws and Opportunities for Better U.S. Health Care, The Commonwealth Fund, July 2017
  2. CDC 2018 BRFSS Data
  3. Unemployment Data, Bureau of Labor Statistics, US Department of Labor