Patient Condition Analysis and Prediction using Streamlit

Introduction:

In the dynamic realm of healthcare, where the quality of patient care is paramount, technological innovations continue to redefine the landscape. Among these, the Patient Condition Analysis Dashboard and Prediction stands out as a pioneering solution that leverages data analytics and predictive modelling to enhance patient outcomes and streamline medical decision-making processes. In this blog, we’ll explore this transformative tool, its significance, and how it’s reshaping the way healthcare professionals approach patient care.

Unveiling the dataset:

The dataset powering the Patient Condition Analysis Dashboard and Prediction originates from Kaggle. It comprises patients’ test results (the patients’ condition) and the factors influencing them. It has 10,000 records and 14 attributes essential for comprehensive patient analysis:

  1. Name: The name of the patient associated with the healthcare record.
  2. Age: The age of the patient at the time of admission, expressed in years.
  3. Gender: Indicates the gender of the patient, either “Male” or “Female.”
  4. Blood Type: The patient’s blood type, which can be one of the common blood types (e.g., “A+”, “O-”, etc.).
  5. Medical Condition: Specifies the primary medical condition or diagnosis associated with the patient (e.g., Diabetes, Hypertension, Asthma).
  6. Date of Admission: The date on which the patient was admitted to the healthcare facility.
  7. Doctor: The name of the doctor responsible for the patient’s care during their admission.
  8. Hospital: Identifies the healthcare facility or hospital where the patient was admitted.
  9. Insurance Provider: This column indicates the patient’s insurance provider, which can be one of several options, including “Aetna,” “Blue Cross,” “Cigna,” “UnitedHealthcare,” and “Medicare.”
  10. Billing Amount: The amount of money billed for the patient’s healthcare services during their admission. This is expressed as a floating-point number.
  11. Admission Type: Specifies the type of admission, which can be “Emergency,” “Elective,” or “Urgent,” reflecting the circumstances of the admission.
  12. Discharge Date: Date on which the patient was discharged from the healthcare facility, based on the admission date and a random number of days within a realistic range.
  13. Medication: Medication prescribed or administered to the patient during their admission (e.g., Aspirin, Ibuprofen, Penicillin, Paracetamol, or Lipitor).
  14. Test Results: Describes the results of a medical test conducted during the patient’s admission. Possible values include “Normal,” “Abnormal,” or “Inconclusive,” indicating the outcome of the test.

Deciphering Medical Insights:

Analyzing test results is pivotal in healthcare as it provides critical insights into a patient’s condition, guiding diagnosis and treatment decisions. By scrutinizing test results, healthcare professionals can identify anomalies, track disease progression, and adjust treatment plans accordingly. Moreover, it facilitates early detection of potential complications, enabling proactive intervention to improve patient outcomes and prevent adverse events.

Using Python and Streamlit to develop the App:

Harnessing the power of Python, we can unravel the intricacies of the dataset and glean actionable insights. Below are key snippets illustrating the analytical process:

Importing required libraries:
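The exact import list is not shown here, so the block below is a minimal sketch of the libraries the app relies on (pandas, Streamlit, Plotly, scikit-learn, and LightGBM):

```python
# Minimal set of libraries assumed for the app; install pandas, streamlit,
# plotly, scikit-learn, and lightgbm before running.
import pandas as pd
import streamlit as st
import plotly.express as px
from sklearn.preprocessing import LabelEncoder
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, f1_score
import lightgbm as lgb
```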

Feature Engineering:

Augmenting the dataset with additional variables to facilitate in-depth analysis and model building.

Converting the ‘date_of_admission’ column to datetime format and extracting the year from it, storing it in a new column named ‘admit_year’. This column helps us gain insights based on the year of admission.

  • Categorizing the age column into three groups, so we can study how the age factor influences the patient’s condition.
  • Assigning a numeric value to each patient’s name. (A sketch of these steps follows this list.)
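A minimal sketch of these feature-engineering steps, assuming the data is loaded into a DataFrame named df with the column names listed above (the exact names and age bins in the app may differ):

```python
# Parse the admission date and derive the admission year.
df['date_of_admission'] = pd.to_datetime(df['date_of_admission'])
df['admit_year'] = df['date_of_admission'].dt.year

# Bucket age into three groups to study its influence on patient condition
# (bin edges are illustrative).
df['age_group'] = pd.cut(df['age'], bins=[0, 30, 60, 120],
                         labels=['Young', 'Middle-aged', 'Senior'])

# Assign a numeric value to each patient name.
df['name_encoded'] = LabelEncoder().fit_transform(df['name'])
```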

Creating Interactive Visuals and KPIs:

Utilizing Streamlit components and Plotly to generate interactive visualizations for a deeper understanding of the dataset.
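For example, a KPI row and an interactive chart could be wired up as below; the metrics and chart are illustrative, not the dashboard’s exact layout:

```python
# Illustrative KPI row built with Streamlit columns and metrics.
col1, col2, col3 = st.columns(3)
col1.metric("Total Patients", len(df))
col2.metric("Average Age", round(df['age'].mean(), 1))
col3.metric("Average Billing", f"${df['billing_amount'].mean():,.0f}")

# Interactive Plotly chart: test results broken down by medical condition.
fig = px.histogram(df, x='medical_condition', color='test_results',
                   barmode='group', title='Test Results by Medical Condition')
st.plotly_chart(fig, use_container_width=True)
```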

Encoding the features:

Transforming categorical variables into numerical format to prepare the data for modelling.
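A simple way to do this is label encoding, sketched below; keeping one fitted encoder per column lets the same mappings be reused at prediction time (column names are assumptions):

```python
# Label-encode the categorical columns, keeping one fitted encoder per column
# so the same mapping can be applied to new inputs later.
categorical_cols = ['gender', 'blood_type', 'medical_condition',
                    'admission_type', 'medication']
encoders = {}
for col in categorical_cols:
    encoders[col] = LabelEncoder()
    df[col] = encoders[col].fit_transform(df[col])
```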

Removing Unnecessary Variables:

Streamlining the dataset by eliminating redundant or irrelevant variables.

For model building, we use only seven features: age, gender, blood type, medical condition, admission type, medication, and BMI.
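A sketch of this selection step, assuming the engineered columns above (including a bmi column) exist in df:

```python
# Keep only the seven modelling features plus the target column.
features = ['age', 'gender', 'blood_type', 'medical_condition',
            'admission_type', 'medication', 'bmi']
model_df = df[features + ['test_results']]
```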

Building Predictive Models:

Harnessing the power of machine learning, we can develop predictive models to anticipate patient outcomes based on their characteristics. Here, we’ll build two robust models – LightGBM and Random Forest – to predict patient conditions. These models utilize historical data to forecast potential medical outcomes, enabling healthcare professionals to make informed decisions and tailor treatment plans accordingly.

Random Forest:

Random forest is a versatile and powerful machine learning algorithm that belongs to the ensemble learning family. It operates by constructing multiple decision trees during training and outputs the mode of the classes (classification) or the mean prediction (regression) of the individual trees. What makes random forest unique is its ability to reduce overfitting and increase accuracy by aggregating predictions from multiple trees, thus providing robustness to noisy data and high-dimensional feature spaces. This algorithm is widely used across various domains due to its effectiveness, scalability, and ease of use.

  1. Importing libraries: Import the necessary libraries into the programming environment.

  2. Assigning X and y: Identify the independent variables (features) and the dependent variable (target) in your dataset, and assign them to X and y respectively.

  3. Train-test split: Split the dataset into training and testing sets; this helps in evaluating the model’s performance on unseen data. Here, test_size=0.20 specifies that 20% of the data will be used for testing, and random_state=42 ensures reproducibility.

  4. Fitting the Model – Accuracy and F1 Score:

Train (fit) the Random Forest classifier on the training data using the fit() method.

After fitting the model, evaluate its performance on the test data. One common metric is accuracy, which measures the proportion of correctly predicted instances.
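The four steps above condense to the sketch below; default hyperparameters are assumed, since the app’s exact settings are not shown here:

```python
# Split the data, fit a Random Forest, and evaluate accuracy and F1 score.
X = model_df[features]
y = model_df['test_results']
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.20, random_state=42)

rf_model = RandomForestClassifier(random_state=42)
rf_model.fit(X_train, y_train)

rf_pred = rf_model.predict(X_test)
print("Accuracy:", accuracy_score(y_test, rf_pred))
print("F1 score:", f1_score(y_test, rf_pred, average='weighted'))
```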

With a limited dataset, achieving high accuracy numbers can be challenging. However, by gathering more diverse data and adhering to the methods outlined in the blog, improved accuracy is attainable.

  • Accuracy measures the proportion of correctly classified instances out of the total instances. Here, the model achieves an accuracy of 52.5%, meaning that it correctly predicts the target variable for approximately 52.5% of the instances in the test set.
  • F1 score is the harmonic mean of precision and recall and provides a balanced assessment of the model’s performance. A higher F1 score indicates better performance in terms of both precision and recall. In this case, the model achieves an F1 score of 0.5268, suggesting a reasonable balance between precision and recall.

Age has the highest feature importance score among all the features, indicating that age is the most influential factor in predicting the target variable according to the model. BMI (Body Mass Index) is the second most important feature. Gender has the lowest feature importance score, suggesting it has the least influence on the model’s predictions compared to the other features. The scores of the remaining features likewise indicate their influence in predicting the target value: the higher the score, the greater the influence on the prediction.
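One way to inspect this is to read the importance scores directly from the fitted model, as sketched below:

```python
# Rank features by the Random Forest's importance scores and chart them.
importances = pd.Series(rf_model.feature_importances_, index=features)
st.bar_chart(importances.sort_values(ascending=False))
```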

LightGBM:

LightGBM (Light Gradient Boosting Machine) is a gradient boosting framework that aims to provide a high-performance, distributed, and efficient implementation of gradient boosting algorithms. It is developed by Microsoft and is known for its speed and accuracy. LightGBM uses a novel technique called Gradient-Based One-Side Sampling (GOSS) to filter out the data instances with small gradients during the boosting process, which significantly reduces the training time without sacrificing accuracy. Additionally, it employs Histogram-based algorithms to bucket continuous features into discrete bins, which speeds up the training process and reduces memory usage.

  1. Importing libraries: Import the necessary libraries into the programming environment.

  2. Assigning X and y: Identify the independent variables (features) and the dependent variable (target) in your dataset, and assign them to X and y respectively.

  3. Train-test split: Split the dataset into training and testing sets; this helps in evaluating the model’s performance on unseen data. Here, test_size=0.20 specifies that 20% of the data will be used for testing, and random_state=42 ensures reproducibility.

  4. Fitting the Model – Accuracy and F1 Score:

Train (fit) the LightGBM classifier on the training data using the fit() method.

After fitting the model, evaluate its performance on the test data. One common metric is accuracy, which measures the proportion of correctly predicted instances.
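A corresponding sketch for LightGBM, reusing the same train-test split; default parameters are assumed:

```python
# Fit a LightGBM classifier on the same split and evaluate it.
lgbm_model = lgb.LGBMClassifier(random_state=42)
lgbm_model.fit(X_train, y_train)

lgbm_pred = lgbm_model.predict(X_test)
print("Accuracy:", accuracy_score(y_test, lgbm_pred))
print("F1 score:", f1_score(y_test, lgbm_pred, average='weighted'))
```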

With a limited dataset, achieving high accuracy numbers can be challenging. However, by gathering more diverse data and adhering to the methods outlined in the blog, improved accuracy is attainable.

  • Accuracy measures the proportion of correctly classified instances out of the total instances. Here, the model achieves an accuracy of 54.2%, meaning that it correctly predicts the target variable for approximately 54.2% of the instances in the test set.
  • F1 score is the harmonic mean of precision and recall and provides a balanced assessment of the model’s performance. A higher F1 score indicates better performance in terms of both precision and recall. In this case, the model achieves an F1 score of 0.5435, suggesting a reasonable balance between precision and recall.

Age has the highest feature importance score among all the features, indicating that age is the most influential factor in predicting the target variable according to this model. Blood Type is the second most important feature. BMI has the lowest feature importance score, suggesting it has the least influence on this model’s predictions compared to the other features. The scores of the remaining features likewise indicate their influence in predicting the target value: the higher the score, the greater the influence on the prediction.

Prediction using the models:

Using the models built for prediction involves leveraging the power of ensemble learning to make accurate predictions across various domains. Users can input relevant patient information, such as age, gender, medical condition, etc. The built models then predict the potential outcome, such as the severity of the medical condition or the likelihood of a particular test result.
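A hypothetical Streamlit form for a single-record prediction is sketched below; the widget labels and the use of the Random Forest model here are illustrative assumptions, and the app’s actual interface may differ:

```python
# Collect the seven model features from the user, encoding categorical
# choices with the encoders fitted earlier.
inputs = {
    'age': st.number_input("Age", min_value=0, max_value=120, value=40),
    'bmi': st.number_input("BMI", min_value=10.0, max_value=60.0, value=24.0),
}
for col in ['gender', 'blood_type', 'medical_condition',
            'admission_type', 'medication']:
    choice = st.selectbox(col.replace('_', ' ').title(), encoders[col].classes_)
    inputs[col] = encoders[col].transform([choice])[0]

if st.button("Predict"):
    row = pd.DataFrame([inputs])[features]
    st.write("Predicted test result:", rf_model.predict(row)[0])
```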

Random Forest:

Light Gradient Boosting Machine:

Batch Prediction:

By leveraging the predictive models, we can predict outcomes for all records in the dataset. This comprehensive analysis provides valuable insights into patient conditions and helps healthcare providers prioritize interventions and allocate resources efficiently.

1) Upload a .csv file in the format shown below.

2) Choose a model from the “Select a model” radio button to predict the outcomes of all the records. (A sketch of this flow follows below.)
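A minimal sketch of this batch-prediction flow, assuming the uploaded CSV already contains the encoded feature columns the models expect:

```python
# Upload a CSV, pick a model, and append predictions for every record.
uploaded = st.file_uploader("Upload a CSV file", type="csv")
model_choice = st.radio("Select a model", ["Random Forest", "LightGBM"])

if uploaded is not None:
    batch_df = pd.read_csv(uploaded)
    model = rf_model if model_choice == "Random Forest" else lgbm_model
    batch_df['predicted_test_result'] = model.predict(batch_df[features])
    st.dataframe(batch_df)
```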

Random Forest:

Light Gradient Boosting Machine:

Conclusion:

The Patient Condition Analysis Dashboard and Prediction system heralds a new era in healthcare, empowering practitioners with data-driven insights and predictive capabilities. By amalgamating advanced analytics techniques with comprehensive patient data, this innovative tool enhances diagnostic accuracy, facilitates personalized care, and ultimately improves patient outcomes. As healthcare continues to evolve, leveraging technology to glean actionable insights from vast datasets becomes imperative, ushering in a future where proactive and personalized healthcare is the norm.

Visit the Streamlit App: Patient Condition Analysis and Prediction App to explore these features firsthand and gain a deeper understanding. You can find the app code at the following URL: https://github.com/gitalf96/Healthcare

Please feel free to get in touch with us regarding your Streamlit solution needs. Our Streamlit solutions encompass a range of services and tools designed to streamline your data visualization, rapid prototyping, and dashboard development processes.


