The New York Times talks to doctors who worry about whether artificial intelligence (AI) is up to the job of assisting in patient care.
In medicine, the cautionary tales about the unintended effects of artificial intelligence are already legendary.
There was the program meant to predict when patients would develop sepsis, a life-threatening complication of infection, that triggered a litany of false alarms. Another, intended to improve follow-up care for the sickest patients, appeared to deepen troubling health disparities.
AI is being tested in various ways. There is no Doctor AI yet, but algorithms are already embedded in decision-support software and even in hardware that analyzes mammograms.
Wary of such flaws, physicians have kept AI working on the sidelines: assisting as a scribe, as a casual second opinion and as a back-office organizer. But the field has gained investment and momentum for uses in medicine and beyond.
The U.S. Food and Drug Administration (FDA) does not have the resources or expertise to regulate AI in ways that harness its potential while avoiding its pitfalls, and the agency is struggling to keep up. The FDA seems more comfortable with firms using AI for specific tasks rather than broader diagnostics. For instance, algorithms have assisted radiologists reading mammograms for a few years now; AI is well suited to making those algorithms even more accurate, and it has made significant inroads in radiology.
Agency officials have only begun to talk about reviewing technology that would continue to “learn” as it processes thousands of diagnostic scans. And the agency’s existing rules encourage developers to focus on one problem at a time, like a heart murmur or a brain aneurysm, in contrast to AI tools used in Europe that scan for a range of problems.
In other words, it is easy to show that AI can read mammograms and improve the productivity of radiologists. It’s harder to validate a primary care AI that could boost the productivity of primary care providers. So far the FDA has approved numerous tools for medical diagnostics and decision-making.
Still, doctors are raising more questions as they attempt to deploy the roughly 350 software tools that the FDA has cleared to help detect clots, tumors or a hole in the lung. They have found few answers to basic questions: How was the program built? How many people was it tested on? Is it likely to identify something a typical doctor would miss?
The lack of publicly available information, perhaps paradoxical in a realm replete with data, is causing doctors to hang back, wary that technology that sounds exciting can lead patients down a path to more biopsies, higher medical bills and toxic drugs without significantly improving care.
Interestingly, the FDA has jurisdiction only over products that are marketed to health care systems. The agency has no say over proprietary AI products that health care systems or health insurers build for their own use. Part of the complaint is that the mathematical models employed by MedTech firms are proprietary: the firms don’t share the secret sauce that powers their decision-support tools. Keeping the models secret protects intellectual property, but it can also conceal weaknesses and shortfalls.
Dr. Jeffrey Shuren, the chief of the FDA’s medical device division, has acknowledged the need for continuing efforts to ensure that AI programs deliver on their promises after his division clears them. While drugs and some devices are tested on patients before approval, the same is not typically required of AI software programs.
One new approach could be building labs where developers could access vast amounts of data and build or test AI programs, Dr. Shuren said during the National Organization for Rare Disorders conference on Oct. 16.
That’s a good idea, but who has the data, and how would the FDA go about facilitating its sharing? AI is also being tested in the background, poring through the medical records of hospital patients.
University of Michigan researchers examined a widely used AI tool in an electronic health-record system meant to predict which patients would develop sepsis. They found that the program fired off alerts on one in five patients, though only 12 percent of those flagged went on to develop sepsis.
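As a back-of-the-envelope check, here is what those two figures imply when taken together. The cohort size below is hypothetical; only the one-in-five alert rate and the 12 percent figure come from the study:

```python
# Back-of-the-envelope math on the sepsis-alert figures reported above.
# The cohort size is hypothetical; only the rates come from the study.
cohort = 10_000                       # hypothetical hospitalized patients
alerted = cohort // 5                 # the tool alerted on 1 in 5 patients
true_positives = int(alerted * 0.12)  # only 12% of alerted patients developed sepsis
false_alarms = alerted - true_positives

print(f"Alerts fired:        {alerted}")         # 2000
print(f"Actual sepsis cases: {true_positives}")  # 240
print(f"False alarms:        {false_alarms}")    # 1760
print(f"False alarms per true case: {false_alarms / true_positives:.1f}")  # ~7.3
```

Roughly seven false alarms for every real case is the kind of ratio that trains clinicians to ignore the alerts.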
Another program, which analyzed health costs as a proxy for medical needs, ended up steering treatment away from Black patients who were just as sick as white ones. The cost data turned out to be a bad stand-in for illness, a study in the journal Science found, since less money is typically spent on Black patients.
Think about that last paragraph. Prior medical spending is a reasonable gauge of health status and medical need for people who see the doctor regularly, but it is not reliable for people who don’t. The algorithm was therefore a better gauge of health status for white patients, who tended to see their doctors regularly, than for Black patients, who may have seen the doctor less often.
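To make that mechanism concrete, here is a minimal simulation of the failure mode. Every number in it (the illness distribution, the assumed spending gap, the cohort sizes) is invented for illustration; only the mechanism, ranking patients by a cost proxy instead of by true need, mirrors the study described above:

```python
import random

random.seed(0)

# Illustrative simulation of the cost-as-proxy failure described above.
# All parameters are made up; only the mechanism mirrors the Science study.
def simulate_patient(group):
    illness = random.gauss(50, 15)             # true (unobserved) medical need
    # Assume less is spent on Black patients at the same level of illness.
    access = 0.6 if group == "Black" else 1.0
    cost = illness * access * random.uniform(0.9, 1.1)
    return illness, cost

patients = [("Black", *simulate_patient("Black")) for _ in range(5000)] + \
           [("white", *simulate_patient("white")) for _ in range(5000)]

# Rank by cost (the proxy) and flag the top 10% for extra care.
by_cost = sorted(patients, key=lambda p: p[2], reverse=True)[:1000]
share_black = sum(1 for g, _, _ in by_cost if g == "Black") / 1000
print(f"Black share of high-cost flags: {share_black:.0%}")  # far below 50%

# Rank by true illness instead: the groups are equally represented.
by_need = sorted(patients, key=lambda p: p[1], reverse=True)[:1000]
share_black_need = sum(1 for g, _, _ in by_need if g == "Black") / 1000
print(f"Black share of high-need flags: {share_black_need:.0%}")  # ~50%
```

The model is "accurate" at predicting cost; the bias enters entirely through the choice of cost as the thing to predict.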
AI is a buzzword commonly used to describe machine learning algorithms. It’s not magic. When Netflix suggests movies you may enjoy, the advice is based on movies you’ve watched: your data is compared with data from millions of other viewers, and titles enjoyed by people with viewing histories like yours are recommended to you. These algorithms must be tested, retested and refined over time. Facebook and LinkedIn suggest people to connect with in the same way, drawing on data from users like you and on whom your contacts have linked to. Medical AI is similar: if a million people with vital signs and blood chemistry like yours had a given disease, there is a chance you have that disease too. In statistical models, you start with a correlation and build from there.
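A minimal sketch of that similar-patients idea, using a nearest-neighbor average. The feature set (heart rate, systolic blood pressure, glucose) and the patient records here are hypothetical toy data, not a clinical model:

```python
import math

# Minimal nearest-neighbor sketch: estimate disease risk from the outcomes
# of the most similar past patients. All records below are hypothetical.
def distance(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

# Each record: ([heart_rate, systolic_bp, glucose], had_disease)
records = [
    ([72, 118, 90], 0),
    ([98, 142, 180], 1),
    ([90, 150, 165], 1),
    ([68, 110, 85], 0),
    ([95, 138, 175], 1),
    ([75, 120, 95], 0),
]

def estimated_risk(patient, records, k=3):
    # Average the outcomes of the k most similar past patients.
    nearest = sorted(records, key=lambda r: distance(patient, r[0]))[:k]
    return sum(outcome for _, outcome in nearest) / k

print(estimated_risk([94, 140, 170], records))  # 1.0: resembles the sick patients
print(estimated_risk([70, 115, 88], records))   # 0.0: resembles the healthy ones
```

Real medical models are far more elaborate, but the core move is the same: your measurements are compared with those of people who came before you, and their outcomes become your predicted risk.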