Artificial Intelligence

A guide to artificial intelligence in medicine and health sciences education

Recommended readings on AI in medicine

Alenichev, A., Kingori, P., & Grietens, K. P. (2023). Reflections before the storm: the AI reproduction of biased imagery in global health visuals. The Lancet Global Health, 0(0).

Ayers JW, Poliak A, Dredze M, et al. Comparing Physician and Artificial Intelligence Chatbot Responses to Patient Questions Posted to a Public Social Media Forum. JAMA Intern Med. 2023;183(6):589–596. doi:10.1001/jamainternmed.2023.1838

Singhal, K., Azizi, S., Tu, T., Mahdavi, S. S., Wei, J., Chung, H. W., Scales, N., Tanwani, A., Cole-Lewis, H., Pfohl, S., Payne, P., Seneviratne, M., Gamble, P., Kelly, C., Scharli, N., Chowdhery, A., Mansfield, P., Arcas, B. A. y, Webster, D., … Natarajan, V. (2022). Large Language Models Encode Clinical Knowledge (arXiv:2212.13138). arXiv.

Golan, R., Ripps, S. J., Reddy, R., Loloi, J., Bernstein, A. P., Connelly, Z. M., Golan, N. S., & Ramasamy, R. (2023). ChatGPT’s Ability to Assess Quality and Readability of Online Medical Information: Evidence From a Cross-Sectional Study. Cureus, 15(7), e42214.

Ross, C. (2023, April 27). A research team airs the messy truth about AI in medicine — and gives hospitals a guide to fix it. STAT.

Teng, M., Singla, R., Yau, O., Lamoureux, D., Gupta, A., Hu, Z., Hu, R., Aissiou, A., Eaton, S., Hamm, C., Hu, S., Kelly, D., MacMillan, K. M., Malik, S., Mazzoli, V., Teng, Y.-W., Laricheva, M., Jarus, T., & Field, T. S. (2022). Health Care Students’ Perspectives on Artificial Intelligence: Countrywide Survey in Canada. JMIR Medical Education, 8(1), e33390.

AI in Medicine

Much of the AI currently used in medicine relies on machine learning to parse data:

Examples of image analysis via machine learning (Ravindran 2022)

  • mapping neural connectivity 
  • virtual histology 
  • cell segmentation (locating cells in microscopic images and highlighting them for analysis)
  • mapping protein localization

Examples of electronic health records analysis via machine learning (Miotto 2016)

  • drug targeting
  • personalized prescription
  • patient similarity
  • clinical trial recruitment
  • disease prediction


Patient attitudes toward integrating AI into medical practice are mixed:

Figure from the Pew Research Center's 2022 survey on Americans' perspectives on AI in healthcare. The figure shows a bar chart of the percent of U.S. adults who say the use of artificial intelligence in health and medicine to do things like diagnose diseases and recommend treatments will make various aspects of healthcare worse, better, or stay the same. The highest response rate for AI making things better was in the category "The number of mistakes made by healthcare providers". The highest response rate in the "worse" category was for "Patients relationship with their healthcare providers". The highest response rate in the "same" category was for "The job that healthcare providers do treating people of all races and ethnicities fairly".

Large Language Models in Medicine

Historically, medicine has relied on forms of AI other than large language models, but this is changing quickly.

AI passing medical exams:

In their forthcoming study, Singhal et al. found that Google's Flan-PaLM LLM achieved 67.6% accuracy on MedQA (US Medical License Exam questions), though the authors also state that "human evaluation reveals key gaps in Flan-PaLM responses" (Singhal 2023).

AI Chatbots responding to patient questions:

Ayers JW, Poliak A, Dredze M, et al. Comparing Physician and Artificial Intelligence Chatbot Responses to Patient Questions Posted to a Public Social Media Forum. JAMA Intern Med. 2023;183(6):589–596. doi:10.1001/jamainternmed.2023.1838

Harms of AI and decision tools

Documented bias in medical decision-making tools:

The spirometer is a famous example of race-based diagnosis:

Algorithms are also susceptible to bias:

  • "Dissecting racial bias in an algorithm used to manage the health of populations" (Obermeyer 2019) illustrates how unwittingly introduced bias (using health care cost as a proxy for health care need) affects health outcomes at the population level.
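
The proxy problem described above can be sketched with a toy example. This is not the algorithm studied in Obermeyer 2019; the patient records, the `true_need` scores, and the `top_k` helper are all invented for illustration. It only shows how ranking patients by cost can select a different set of people than ranking them by need when one group spends less for the same level of illness.

```python
# Hypothetical patients: (id, group, true_need, annual_cost). All values are invented.
patients = [
    ("p1", "A", 6, 5000),
    ("p2", "A", 3, 3000),
    ("p3", "B", 5, 2500),  # high need, but lower spending than p2
    ("p4", "B", 2, 1500),
]

def top_k(records, key_index, k):
    """Return the ids of the k records with the largest value at key_index."""
    ranked = sorted(records, key=lambda r: r[key_index], reverse=True)
    return [r[0] for r in ranked[:k]]

# Select two patients for extra care, first by cost (the proxy), then by need.
by_cost = top_k(patients, 3, 2)
by_need = top_k(patients, 2, 2)

print("Selected by cost:", by_cost)
print("Selected by need:", by_need)
```

In this made-up data, the cost ranking passes over patient `p3` even though `p3` has the second-highest need, because `p3` has generated less spending.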

"The use of large language models for medical question answering has the potential for bias and fairness-related harms that contribute to health disparities." (Singhal 2023)

Potential sources of harm:

  • disparities in funding and problem selection priorities are an ethical violation of principles of justice (Chen 2021)
  • a focus on convenience sampling and patterns in training data that reflect disparities in health outcomes and access to care (Chen 2021; Singhal 2023)
  • capability for medical question answering systems to reproduce racist misconceptions regarding the cause of racial health disparities (Singhal 2023)
  • algorithmic design choices; evaluating performance on large populations despite different outcomes for sub-populations (Chen 2021; Singhal 2023)
  • differences in behavior or performance of machine learning systems across populations and groups that introduce downstream harms when used to inform medical decision making (Singhal 2023)
  • in image-analysis algorithms, labeling errors, measurement biases, spectrum bias, etc. (Varoquaux 2022)
  • denial of insurance coverage based on AI predictions (Ross 2023)
  • disparity in where the information is being gathered, with less data provided by countries with limited resources (Palmer 2022)
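
One item in the list above, evaluating performance on a large population despite different outcomes for sub-populations, can be illustrated with a short sketch. The group names and counts below are hypothetical and do not come from any of the cited studies; the point is only that a single overall accuracy figure can hide much weaker performance for a smaller group.

```python
from collections import defaultdict

# Hypothetical (group, prediction_was_correct) pairs; the counts are invented.
results = (
    [("majority", True)] * 90
    + [("majority", False)] * 5
    + [("minority", True)] * 2
    + [("minority", False)] * 3
)

def accuracy(rows):
    """Fraction of rows whose prediction was correct."""
    return sum(correct for _, correct in rows) / len(rows)

def accuracy_by_group(rows):
    """Compute accuracy separately for each group label."""
    grouped = defaultdict(list)
    for group, correct in rows:
        grouped[group].append((group, correct))
    return {group: accuracy(members) for group, members in grouped.items()}

overall = accuracy(results)
per_group = accuracy_by_group(results)

print(f"Overall accuracy: {overall:.2f}")
for group, value in per_group.items():
    print(f"{group}: {value:.2f}")
```

Reporting the per-group figures alongside the overall figure is one simple way to surface this kind of gap.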

Modern race-adjusted algorithms in clinical medicine:

The following information is reproduced from:

Vyas, D. A., Eisenstein, L. G., & Jones, D. S. (2020). Hidden in Plain Sight — Reconsidering the Use of Race Correction in Clinical Algorithms. New England Journal of Medicine, 383(9), 874–882.

  1. The American Heart Association’s Get with the Guidelines–Heart Failure: Predicts in-hospital mortality in patients with acute heart failure. Clinicians are advised to use this risk stratification to guide decisions regarding initiating medical therapy.
    1. "Use of Race: Adds 3 points to the risk score if the patient is identified as nonblack. This addition increases the estimated probability of death (higher scores predict higher mortality)."
    2. "Equity concern: The original study envisioned using this score to “increase the use of recommended medical therapy in high-risk patients and reduce resource utilization in those at low risk.” The race correction regards black patients as lower risk and may raise the threshold for using clinical resources for black patients."
  2. Estimated glomerular filtration rate (eGFR) MDRD and CKD-EPI equations: Estimates glomerular filtration rate on the basis of a measurement of serum creatinine.
    1. "Use of Race: The MDRD equation reports a higher eGFR (by a factor of 1.210) if the patient is identified as black. This adjustment is similar in magnitude to the correction for sex (0.742 if female). The CKD-EPI equation (which included a larger number of black patients in the study population), proposes a more modest race correction (by a factor of 1.159) if the patient is identified as black. This correction is larger than the correction for sex (1.018 if female)."
    2. "Equity Concern: Both equations report higher eGFR values (given the same creatinine measurement) for patients identified as black, suggesting better kidney function. These higher eGFR values may delay referral to specialist care or listing for kidney transplantation."
  3. Vaginal Birth after Cesarean (VBAC) Risk Calculator: Estimates the probability of successful vaginal birth after prior cesarean section. Clinicians can use this estimate to counsel people who have to decide whether to attempt a trial of labor rather than undergo a repeat cesarean section.
    1. "Use of Race: The African-American and Hispanic correction factors subtract from the estimated success rate for any person identified as black or Hispanic. The decrement for black (0.671) or Hispanic (0.680) is almost as large as the benefit from prior vaginal delivery (0.888) or prior VBAC (1.003)."
    2. "Equity concern: The VBAC score predicts a lower chance of success if the person is identified as black or Hispanic. These lower estimates may dissuade clinicians from offering trials of labor to people of color."
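
To make the eGFR equity concern in item 2 concrete, the sketch below applies the two race multipliers quoted above (1.210 for MDRD, 1.159 for CKD-EPI) to an eGFR value computed without the race term. The starting eGFR of 27 and the referral threshold of 30 are hypothetical numbers chosen for illustration, not clinical guidance, and the function names are invented here.

```python
# Race multipliers as quoted from Vyas et al. (2020) in the list above.
MDRD_RACE_FACTOR = 1.210
CKD_EPI_RACE_FACTOR = 1.159

# Hypothetical values for illustration only; not clinical guidance.
HYPOTHETICAL_REFERRAL_THRESHOLD = 30.0  # assume referral when eGFR is below this
uncorrected = 27.0                      # assumed eGFR before any race term

def race_corrected_egfr(uncorrected_egfr, race_factor):
    """Scale an eGFR computed without the race term by the race multiplier."""
    return uncorrected_egfr * race_factor

def meets_referral(egfr):
    """True when the eGFR falls below the hypothetical referral threshold."""
    return egfr < HYPOTHETICAL_REFERRAL_THRESHOLD

mdrd = race_corrected_egfr(uncorrected, MDRD_RACE_FACTOR)
ckd_epi = race_corrected_egfr(uncorrected, CKD_EPI_RACE_FACTOR)

for label, value in [("uncorrected", uncorrected), ("MDRD", mdrd), ("CKD-EPI", ckd_epi)]:
    print(f"{label}: eGFR {value:.1f}, referral: {meets_referral(value)}")
```

Under these assumed numbers, the same creatinine-based estimate falls below the threshold without the race term and above it with either multiplier, which is the mechanism behind the delayed-referral concern quoted above.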


2022 Pew Research Center infographic titled "About the issue: among those who say racial or ethnic bias is a major/minor problem in health and medicine, % who say that if artificial intelligence is used more, the issue of bias and unfair treatment based on a patient's race or ethnicity would..." It depicts three responses, "get better", "stay the same", and "get worse", with 51% of respondents answering "get better", 33% answering "stay the same", and 15% answering "get worse". The figure also depicts the reasons respondents gave for each category of response. Among those who said AI would make things better, 36% said AI was more neutral. Among those who said things would stay the same, 28% said training data and humans are both biased. Among those who said things would get worse, 28% said training data is biased.