Professionals can lose their competency over time if it is not maintained. That’s why most professional associations require continuing education. Experts report that artificial intelligence (AI) also needs continuing education and close monitoring of its competency.
AI is used throughout health care. If you have had a mammogram in the past few years, an algorithm probably assisted the radiologist’s interpretation of the scan. There are AI-powered chatbots for mental health counseling. In hospital settings, AI is often used to predict how patients will respond to treatment, going as far as to provide probabilities of death. The technology even prompts physicians to have difficult conversations with patients to manage expectations and to help patients decide whether to forgo treatment. The following is an excerpt from KFF Health News:
AI is already widespread in health care. Algorithms are used to predict patients’ risk of death or deterioration, to suggest diagnoses or triage patients, to record and summarize visits to save doctors work, and to approve insurance claims. But it’s far from being a set-it-and-forget-it tool. A routine tech checkup revealed the algorithm decayed during the covid-19 pandemic, getting 7 percentage points worse at predicting who would die, according to a 2022 study.
Many physicians use ambient documentation, an AI program built on large language models that documents the doctor’s visit and files the appropriate information in the medical record. Physicians’ time is too valuable for them to manually write down all the information discussed during an exam, but the visit is less beneficial if that exchange of information is not documented in the medical record. Yet, just like humans, AI sometimes fails to record everything correctly.
But, Ehrenfeld said, “There is no standard right now for comparing the output of these tools.” And that’s a problem, when even small errors can be devastating. A team at Stanford University tried using large language models — the technology underlying popular AI tools like ChatGPT — to summarize patients’ medical history. They compared the results with what a physician would write. “Even in the best case, the models had a 35% error rate.”
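One way to make an “error rate” like that concrete is to have clinicians review each machine-generated summary against the physician-written note, flag any errors, and then tally how many summaries contained at least one. The sketch below is a hypothetical illustration of that bookkeeping in Python, not the Stanford team’s actual protocol; the field names and example data are invented.

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class ReviewedSummary:
    """One model-generated summary plus any errors a clinician reviewer flagged in it."""
    patient_id: str
    flagged_errors: List[str] = field(default_factory=list)

def summary_error_rate(reviews: List[ReviewedSummary]) -> float:
    """Share of summaries containing at least one clinician-flagged error."""
    with_errors = sum(1 for r in reviews if r.flagged_errors)
    return with_errors / len(reviews)

# Toy data: three reviewed summaries, one of which had a flagged error.
reviews = [
    ReviewedSummary("pt-001"),
    ReviewedSummary("pt-002", ["omitted medication allergy"]),
    ReviewedSummary("pt-003"),
]
print(f"Summary error rate: {summary_error_rate(reviews):.0%}")  # 33% in this toy example
```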
The idea that AI can degrade and fail over time is easy to understand once you realize that machine learning involves constant learning. That is to say, the algorithm’s dataset is evolving, hopefully making it more accurate, but the opposite can also occur. Let it process and learn from a bad batch of data and the results begin to degrade, often producing inconsistent answers. At the very least, any algorithm should give the same answer to the same question when asked repeatedly in short succession. AI sometimes fails this test:
Sandy Aronson, a tech executive at Mass General Brigham’s personalized medicine program in Boston, said that when his team tested one application meant to help genetic counselors locate relevant literature about DNA variants, the product suffered “nondeterminism” — that is, when asked the same question multiple times in a short period, it gave different results.
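That kind of consistency check is simple to automate. Here is a minimal sketch in Python of how a team might probe for nondeterminism: send the same prompt several times in quick succession to whatever function wraps the model and count the distinct answers. The `ask_model` callable and the `flaky_model` stand-in are hypothetical, not any vendor’s actual API.

```python
import random
from collections import Counter

def check_consistency(ask_model, prompt, trials=5):
    """Send the same prompt several times in a row and count the distinct answers.

    ask_model: any callable that takes a prompt string and returns a response string.
    A well-behaved system should produce one distinct (normalized) answer;
    more than one is the nondeterminism described above.
    """
    answers = [ask_model(prompt).strip().lower() for _ in range(trials)]
    counts = Counter(answers)
    return {
        "distinct_answers": len(counts),
        "is_consistent": len(counts) == 1,
        "answer_counts": dict(counts),
    }

# Stand-in for a real query function, deliberately flaky to show what a failure looks like.
def flaky_model(prompt):
    return random.choice(["The variant is likely benign.",
                          "The variant is of uncertain significance."])

print(check_consistency(flaky_model, "Summarize the published evidence for this DNA variant."))
```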
Humans need to monitor AI, not just while it is being developed but during all phases of use. AI is continually learning, and that means unknown factors can degrade its accuracy. For example, I’ve seen numerous advertisements online for companies selling ineffective snake-oil products and bogus elixirs marketed as questionable medical treatments. Recent news reports have even highlighted academic journal articles with questionable findings and doctored data. Imagine the outcome if an AI training process consumes erroneous, new-age information and incorporates it as medical science. Experts report that once AI learns something incorrect, it is very difficult to make it unlearn the inaccurate data. Now for a scary thought: imagine if advertisers figure out how to influence clinical AI to bias its results in favor of their products.
Evaluating whether these products work is challenging. Evaluating whether they continue to work — or have developed the software equivalent of a blown gasket or leaky engine — is even trickier.
The human cost of monitoring AI is enormous but necessary. Algorithms need to be validated not just before a product is released but throughout its use. That’s something few hospitals are prepared for. Microsoft Edge’s AI Copilot typically provides sources for its answers. Something like that would be useful for retraining an AI to forget inappropriate information, and it is something for AI developers to consider.
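For teams that do commit to ongoing validation, the basic check is conceptually simple even if staffing it is not: periodically re-score the model on recent, real-world cases and compare against the performance measured at deployment. The Python sketch below is a minimal, hypothetical illustration of that loop, with toy data and an illustrative 5-point threshold, in the spirit of the checkup that caught the 7-percentage-point decay mentioned earlier.

```python
def accuracy(predictions, outcomes):
    """Fraction of predictions that matched the observed outcome."""
    correct = sum(1 for p, o in zip(predictions, outcomes) if p == o)
    return correct / len(predictions)

def drift_report(baseline_preds, baseline_outcomes,
                 recent_preds, recent_outcomes,
                 max_drop=0.05):
    """Compare accuracy at deployment against a recent window of real-world use.

    max_drop is an illustrative threshold: flag the model for review if
    accuracy has fallen by more than 5 percentage points.
    """
    baseline = accuracy(baseline_preds, baseline_outcomes)
    recent = accuracy(recent_preds, recent_outcomes)
    drop = baseline - recent
    return {
        "baseline_accuracy": round(baseline, 3),
        "recent_accuracy": round(recent, 3),
        "drop": round(drop, 3),
        "needs_review": drop > max_drop,
    }

# Toy example: a model that was 90% accurate at launch but 82% accurate last quarter.
report = drift_report(
    baseline_preds=[1] * 90 + [0] * 10, baseline_outcomes=[1] * 100,
    recent_preds=[1] * 82 + [0] * 18, recent_outcomes=[1] * 100,
)
print(report)  # needs_review: True, because the drop exceeds 5 percentage points
```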
Read more at KFF Health News: Health Care AI, Intended To Save Money, Turns Out To Require a Lot of Expensive Humans