Research Guides: Artificial Intelligence: AI in Medicine and Health Sciences

Institutional Responses

"In the United States, the Food and Drug Administration (FDA) bears responsibility for the regulation of healthcare ML models... Currently, the FDA’s proposed ML-specific modifications to the software as a medical device regulations draw a distinction between models that are trained and then frozen prior to clinical deployment and models that continue to learn on observed outcomes." (Chen 2021)

Health AI Partnership (a health sciences network dedicated to exploring AI in health care)

FDA Digital Health Center of Excellence (including a link to an updated list of AI/Machine Learning-Enabled Devices)

In the News

NEJM AI Grand Rounds Podcast, hosted by Arjun (Raj) Manrai, Ph.D. and Andrew Beam, Ph.D., features informal conversations with a variety of unique experts exploring the deep issues at the intersection of artificial intelligence, machine learning, and medicine.

NEJM AI online collection: "Artificial Intelligence (AI) has tremendous potential to advance clinical practice and the delivery of patient care. A new Review article series, “AI in Medicine,” explores the role of AI technology in clinical medicine and digital health, and examines the promise and pitfalls of its application across the health care continuum."

"Elon Musk's Neuralink has FDA approval to put chips in humans' brains." (Mike Snider, USA Today, June 9th, 2023)

Contribute to this AI Guide

Do you have comments or questions?

This guide to artificial intelligence is very much a work in progress, and we would love to hear your thoughts on other resources we can include or issues to be addressed!

Fill out this form to share your feedback, comments, and/or questions with the library.

AI in Medicine

A lot of AI in medicine right now uses machine learning to parse data:

Examples of image analysis via machine learning (Ravindran 2022)

mapping neural connectivity
virtual histology
cell segmentation (locating cells in microscopic images and highlighting them for analysis)
mapping protein localization

Examples of electronic health records analysis via machine learning (Miotto 2016)

drug targeting
personalized prescription
patient similarity
clinical trial recruitment
disease prediction

Patient feeling about AI integration in medical practice is very mixed:

Large Language Models in Medicine

Historically, medicine has deployed other forms of AI than large language models, but this is quickly changing.

AI passing medical exams:

In their forthcoming study, Singhal et. al. found that Google's Flan-PaLM LLM achieved 67.6% accuracy on MedQA (US Medical License Exam questions), though the authors also state that "human evaluation reveals key gaps in Flan-PaLM responses" (Singhal 2023)

AI Chatbots responding to patient questions:

Harms of AI and decision tools

Documented bias in medical decision-making tools:

The spirometer is a famous example of race-based diagnosis:

""Race correction" is built into the software of the spirometer globally. To evaluate lung function and to make a recording, the operator/clinician must determine a patient's race. For most modern spirometers, this entails selecting a race option from a drop down menu or pressing a button. And the options vary by manufacturer." (Shaban 2014)
- see also
  - Anderson, M. A., Malhotra, A., & Non, A. L. (2021). Could routine race-adjustment of spirometers exacerbate racial disparities in COVID-19 recovery? The Lancet Respiratory Medicine, 9(2), 124–125.
  - Shaban, H. (2014, August 29). How Racism Creeps Into Medicine. The Atlantic.

Algorithms are also susceptible to bias:

In "Dissecting racial bias in an algorithm used to manage the health of populations" (Obermeyer 2019) illustrates how unwittingly inputted bias, (health care cost as a proxy for health care need) impacts health outcomes on the population level

"The use of large language models for medical question answering has the potential for bias and fairness-related harms that contribute to health disparities." (Singhal 2023)

Potential sources of harm:

disparities in funding and problem selection priorities are an ethical violation of principles of justice (Chen 2021)
a focus on convenience sampling and patterns in training data that reflect disparities in health outcomes and access to care, (Chen 2021; Singhal 2023)
capability for medical question answering systems to reproduce racist misconceptions regarding the cause of racial health disparities (Singhal 2023)
algorithmic design choices; evaluating performance on large populations despite different outcomes for sub-populations (Chen 2021; Singhal 2023)
differences in behavior or performance of machine learning systems across populations and groups that introduce downstream harms when used to inform medical decision making (Singhal 2023)
in image-analysis algorithms, labeling errors, measurement biases, spectrum bias, etc. (Varoquaux 2022)
denial of insurance coverage based on AI predictions (Ross 2023)
disparity in where the information is being gathered, with less data provided by countries with limited resources (Palmer 2022)

Modern race-adjusted algorithms in clinical medicine:

the following information is reproduced from:

Vyas, D. A., Eisenstein, L. G., & Jones, D. S. (2020). Hidden in Plain Sight — Reconsidering the Use of Race Correction in Clinical Algorithms. New England Journal of Medicine, 383(9), 874–882.

The American Heart Association’s Get with the Guidelines–Heart Failure. Predicts in-hospital mortality in patients with acute heart failure. Clinicians are advised to use this risk stratification to guide decisions regarding initiating medical therapy.
1. "Use of Race: Adds 3 points to the risk score if the patient is identified as nonblack. This addition increases the estimated probability of death (higher scores predict higher mortality)."
2. "Equity concern: The original study envisioned using this score to “increase the use of recommended medical therapy in high-risk patients and reduce resource utilization in those at low risk.”⁹ The race correction regards black patients as lower risk and may raise the threshold for using clinical resources for black patients."
Estimated glomerular filtration rate (eGFR) MDRD and CKD-EPI equations. Estimates glomerular filtration rate on the basis of a measurement of serum creatinine.
1. "Use o Race: The MDRD equation reports a higher eGFR (by a factor of 1.210) if the patient is identified as black. This adjustment is similar in magnitude to the correction for sex (0.742 if female). The CKD-EPI equation (which included a larger number of black patients in the study population), proposes a more modest race correction (by a factor of 1.159) if the patient is identified as black. This correction is larger than the correction for sex (1.018 if female)."
2. "Equity Concern: Both equations report higher eGFR values (given the same creatinine measurement) for patients identified as black, suggesting better kidney function. These higher eGFR values may delay referral to specialist care or listing for kidney transplantation."
Vaginal Birth after Cesarean (VBAC) Risk Calculator. Estimates the probability of successful vaginal birth after prior cesarean section. Clinicians can use this estimate to counsel people who have to decide whether to attempt a trial of labor rather than undergo a repeat cesarean section.
1. "Use of Race: The African-American and Hispanic correction factors subtract from the estimated success rate for any person identified as black or Hispanic. The decrement for black (0.671) or Hispanic (0.680) is almost as large as the benefit from prior vaginal delivery (0.888) or prior VBAC (1.003)."
2. "Equity concern: The VBAC score predicts a lower chance of success if the person is identified as black or Hispanic. These lower estimates may dissuade clinicians from offering trials of labor to people of color."

Artificial Intelligence