{ "nbformat": 4, "nbformat_minor": 0, "metadata": { "colab": { "provenance": [], "authorship_tag": "ABX9TyP7oVBlkEgg7lPXKoi30T9g", "include_colab_link": true }, "kernelspec": { "name": "python3", "display_name": "Python 3" }, "language_info": { "name": "python" } }, "cells": [ { "cell_type": "markdown", "metadata": { "id": "view-in-github", "colab_type": "text" }, "source": [ "\"Open" ] }, { "cell_type": "markdown", "source": [ "# Exploring EMR Data" ], "metadata": { "id": "aEG5RcohovvE" } }, { "cell_type": "markdown", "source": [ "## Getting Set up\n", "\n", "Running the code below will create a local postgres 13 database for you and\n", "import the sample data. Don't worry about the details of that and you shouldn't need to touch any of the code here aside from running it. Below, we'll talk about how to access the database from within the notebook to run queries." ], "metadata": { "id": "R4ecS26joyRL" } }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "1V6MDIvRq9FI" }, "outputs": [], "source": [ "# Install and start postgresql-13 server\n", "!sudo apt-get -y -qq update\n", "!wget --quiet -O - https://www.postgresql.org/media/keys/ACCC4CF8.asc | sudo apt-key add -\n", "!echo \"deb http://apt.postgresql.org/pub/repos/apt/ `lsb_release -cs`-pgdg main\" |sudo tee /etc/apt/sources.list.d/pgdg.list\n", "!sudo apt-get -y -qq update\n", "!sudo apt-get -y -qq install postgresql-13 postgresql-client-13\n", "!sudo service postgresql start\n", "\n", "# Setup a password `postgres` for username `postgres`\n", "!sudo -u postgres psql -U postgres -c \"ALTER USER postgres PASSWORD 'postgres';\"\n", "!sudo -u postgres psql -U postgres -c \"CREATE USER dssg_admin PASSWORD 'dssg_admin';\"\n", "!sudo -u postgres psql -U postgres -c \"ALTER USER dssg_admin WITH SUPERUSER;\"\n", "!sudo -u postgres psql -U postgres -c \"DROP DATABASE IF EXISTS mimic;\"\n", "!sudo -u postgres psql -U postgres -c \"create database mimic owner dssg_admin;\"\n", "\n", "# Setup a database with name `mimic` to be used\n", "\n", "# Environment variables for connecting to the database\n", "%env DEMO_DATABASE_NAME=mimic\n", "%env DEMO_DATABASE_HOST=localhost\n", "%env DEMO_DATABASE_PORT=5432\n", "%env DEMO_DATABASE_USER=dssg_admin\n", "%env DEMO_DATABASE_PASS=dssg_admin" ] }, { "cell_type": "markdown", "source": [], "metadata": { "id": "0sVHQgxLo_Hq" } }, { "cell_type": "markdown", "source": [ "## Download sample data and load it into our postgresql database server\n", "You might see some warnings or errors but ignore them for now" ], "metadata": { "id": "_INuBYNYpM6Q" } }, { "cell_type": "code", "source": [ "# Download sample data and load it into our postgres server\n", "!curl -s -OL https://dsapp-public-data-migrated.s3.us-west-2.amazonaws.com/mimicsample.dump\n", "!PGPASSWORD=$DEMO_DATABASE_PASS pg_restore -h $DEMO_DATABASE_HOST -p $DEMO_DATABASE_PORT -d $DEMO_DATABASE_NAME -U $DEMO_DATABASE_USER -O -j 8 mimicsample.dump" ], "metadata": { "id": "tGAJf9I3rKgV" }, "execution_count": null, "outputs": [] }, { "cell_type": "code", "source": [ "# Install triage and its dependencies\n", "!pip install triage\n" ], "metadata": { "id": "4v4FcUZD3qpU" }, "execution_count": null, "outputs": [] }, { "cell_type": "code", "source": [], "metadata": { "id": "FDxz75GNOGjD" }, "execution_count": null, "outputs": [] }, { "cell_type": "markdown", "source": [ "# Want other packages?\n", "If you need to install any other python modules for your analysis, you can easily do so from a code block by prefixing your pip install command with an ! character. For instance:\n", "\n", "!pip install PyYAML" ], "metadata": { "id": "LKVfB0W7OC_h" } }, { "cell_type": "markdown", "source": [ "# **You might need to run the cell below twice if you get an error the first time**\n", "\n", "---\n", "\n" ], "metadata": { "id": "zixLrI_-s8fp" } }, { "cell_type": "code", "source": [ "from sqlalchemy.engine.url import URL\n", "from sqlalchemy import create_engine\n", "import pandas as pd\n", "pd.set_option('display.max_columns', None)\n", "\n", "db_url = URL(\n", " 'postgres',\n", " host='localhost',\n", " username='dssg_admin',\n", " database='mimic',\n", " password='dssg_admin',\n", " port=5432,\n", " )\n", "\n", "db_engine = create_engine(db_url)\n" ], "metadata": { "id": "F-XTXuNE3jSP" }, "execution_count": null, "outputs": [] }, { "cell_type": "markdown", "source": [ "# Querying the Database\n", "\n", "The code block above used the sqlalchemy module to create a connection to the database called db_engine. An easy way to run SQL queries against this database is to use the read_sql command provided by pandas. For instance, if you run the example below, it should return the rows fom the **patients** table:" ], "metadata": { "id": "sT5Kk9BQNypD" } }, { "cell_type": "markdown", "source": [ "## Let's look at the patients table" ], "metadata": { "id": "-JcbBy1jt3pW" } }, { "cell_type": "code", "source": [ "pd.read_sql('SELECT * FROM mimiciii.patients', db_engine)" ], "metadata": { "id": "9OmcvuvQ4XJc" }, "execution_count": null, "outputs": [] }, { "cell_type": "markdown", "source": [ "## Now we can do the same for other tables in the data\n", " - admissions\n", " - icustays\n", " - ...\n", "" ], "metadata": { "id": "pc2S2PFXt7AI" } }, { "cell_type": "markdown", "source": [ "## We can load a table into a dataframe in python" ], "metadata": { "id": "0A_q5Tf3uOur" } }, { "cell_type": "code", "source": [ "patients = pd.read_sql('SELECT * FROM mimiciii.patients', db_engine)\n", "patients.head()" ], "metadata": { "id": "z5Xx2to3trD0" }, "execution_count": null, "outputs": [] }, { "cell_type": "code", "source": [ "# prompt: code to join database tables\n", "\n", "import pandas as pd\n", "# Join patients and admissions tables on the subject_id column\n", "joined_table = pd.read_sql('''\n", "SELECT *\n", "FROM mimiciii.patients AS p\n", "INNER JOIN mimiciii.admissions AS a\n", "ON p.subject_id = a.subject_id\n", "''', db_engine)\n", "\n", "# Print the first few rows of the joined table\n", "joined_table.head()\n" ], "metadata": { "id": "q9fsaDumulhP" }, "execution_count": null, "outputs": [] }, { "cell_type": "code", "source": [ "# prompt: plot frequency distribution of admission_type\n", "\n", "import matplotlib.pyplot as plt\n", "\n", "# Create a bar chart of admission type frequencies\n", "plt.bar(joined_table['admission_type'].value_counts().index, joined_table['admission_type'].value_counts().values)\n", "plt.xlabel('Admission Type')\n", "plt.ylabel('Frequency')\n", "plt.title('Frequency Distribution of Admission Type')\n", "plt.show()\n" ], "metadata": { "id": "iPr02lMevAlC" }, "execution_count": null, "outputs": [] }, { "cell_type": "code", "source": [ "# prompt: color by expire_flag\n", "\n", "import matplotlib.pyplot as plt\n", "\n", "# Group the data by admission type and expire_flag\n", "grouped_data = joined_table.groupby(['admission_type', 'expire_flag']).size().unstack()\n", "\n", "# Create a bar chart with different colors for each expire_flag\n", "plt.bar(grouped_data.index, grouped_data[0], label='Not Expired', color='blue')\n", "plt.bar(grouped_data.index, grouped_data[1], bottom=grouped_data[0], label='Expired', color='red')\n", "\n", "plt.xlabel('Admission Type')\n", "plt.ylabel('Frequency')\n", "plt.title('Frequency Distribution of Admission Type by Expire Flag')\n", "plt.legend()\n", "plt.show()\n" ], "metadata": { "id": "cSKbAOofvQyn" }, "execution_count": null, "outputs": [] }, { "cell_type": "markdown", "source": [ "# Homework Assignment\n", "\n", "---\n", "\n" ], "metadata": { "id": "ggqBIwKJvoP4" } }, { "cell_type": "markdown", "source": [ "# **Question 1: Diagnoses**\n", "\n", "1A. What are the 10 most common diagnoses (names of the diagnosis - not just the code) in this sample data?\n", "Hint: you'll need to look at the **icd_diagnoses** table to get diagnosis codes and the **d_icd_diagnoses** table to get the english descriptions." ], "metadata": { "id": "aXKx5Acbw4wN" } }, { "cell_type": "code", "source": [], "metadata": { "id": "uewvEsj2ytPL" }, "execution_count": null, "outputs": [] }, { "cell_type": "markdown", "source": [ "1B. Show top 10 most common diagnoses for all genders?" ], "metadata": { "id": "_6m8Fbrky-ht" } }, { "cell_type": "code", "source": [], "metadata": { "id": "GdFyxqTpzG1R" }, "execution_count": null, "outputs": [] }, { "cell_type": "markdown", "source": [ "1C. What are the 10 most common diagnoses for those patients that have died versus those that have not? Does the answer make medical sense? why or why not?" ], "metadata": { "id": "MtO9qB6czM7Q" } }, { "cell_type": "markdown", "source": [], "metadata": { "id": "ZUme5TThytrk" } }, { "cell_type": "markdown", "source": [ "**1D (Optional Advanced)**: How does the distribution of diagnoses vary across different age groups? Identify the most common diagnoses by age group and discuss any notable trends." ], "metadata": { "id": "mb9nI3zrvsx4" } }, { "cell_type": "code", "source": [], "metadata": { "id": "J7ykKl91vrMG" }, "execution_count": null, "outputs": [] }, { "cell_type": "markdown", "source": [ "Question 2: Length of Stay" ], "metadata": { "id": "yXShVOK30jlJ" } }, { "cell_type": "markdown", "source": [ "2A. Calculate the mean, median minimum, maximum, and standard deviation of length of stay for all patients?" ], "metadata": { "id": "EPUdj34309TR" } }, { "cell_type": "code", "source": [], "metadata": { "id": "RRnZlUkQ0jSC" }, "execution_count": null, "outputs": [] }, { "cell_type": "markdown", "source": [ "2B. Plot the distribution of length of stay over all patients" ], "metadata": { "id": "z_q6eOih1KON" } }, { "cell_type": "code", "source": [], "metadata": { "id": "bMY9KgYr1QKf" }, "execution_count": null, "outputs": [] }, { "cell_type": "markdown", "source": [ "2C. What percentage of patients have stays under 4 days?" ], "metadata": { "id": "wwe_hcMB1Qcn" } }, { "cell_type": "code", "source": [], "metadata": { "id": "I3K7HQhS1U0w" }, "execution_count": null, "outputs": [] }, { "cell_type": "markdown", "source": [ "2D. Compare visits/icu stays that are less thna or equal to 4 days to those that are longer than 4 days. What is the difference between the patient characteristics in each group? what about diagnosis? what about procedures? Prescriptions? Feel free to explore all the data and tables available to you (usiung the documentation) and write a short report on in what ways are those two groups of patients similar and in what ways are they different?" ], "metadata": { "id": "bb4LV8uF1VHk" } }, { "cell_type": "code", "source": [], "metadata": { "id": "1IIFTtHg1ykG" }, "execution_count": null, "outputs": [] }, { "cell_type": "markdown", "source": [ "Question 3. What is a clinical question you would want to ask using this data?\n", " - what is the question\n", " - what action will you take with the answer you get\n", " - how could you use AI to help with this use case?" ], "metadata": { "id": "rt0LenWk19vt" } } ] }