Stanford Hosts Colloquium On Machine Learning And Causal Inference
Epidemiologists: Spectators Or Participants In The Machine Revolution?
“Is Prediction Enough?” That’s the title given to a recent colloquium organized by Stanford University’s Division of Epidemiology. The half-day event in late April was designed to bring together experts from the worlds of epidemiology, artificial intelligence (AI), machine learning (ML), statistics, and other disciplines to better understand the successes and challenges of using big data to answer health-related questions.
Examples
Examples of potential uses of big data were provided at the meeting in a presentation by Stanford’s Nigam Shah, whose group has developed an informatics consultation service that doctors can use to support medical decision-making at the point of care. This “green button” technology has access to demographics, diagnoses, procedures, medications, laboratory values, clinical notes, mortality, and length-of-stay information for millions of patients. It can analyze millions of records to answer a doctor’s question: “What has happened to other patients like mine?” The Green Button is explained further in a two-minute video at this link:
https://tinyurl.com/y9zgrzbw
According to experts in the field, the potential for AI-assisted health decision-making is enormous, and AI may also help predict and guide the response to disease outbreaks. Some have gone so far as to speculate that AI could make entire professions or specialties, such as radiology and pathology, obsolete. They cite the example of piloting aircraft, which once required extreme human cognition, whereas airplanes can now fly on their own.
Role for Epidemiologists
According to Steve Goodman, Chief of the Division of Epidemiology at Stanford, it was this hyped environment that made obvious the need to discuss the challenges and limitations of analyzing big data algorithmically. Leading that discussion is a natural role for epidemiologists, who, as observational data scientists, understand the inherent limitations of using machine learning algorithms to analyze data, especially medical records, in order to guide treatment or prevention interventions.
Another speaker told the audience that big data is
better described as “cheap data,” and epidemiologists understand the
impact that poor data quality can have on inferences.
Three Tasks
To help participants think more clearly about the meeting’s key question, Harvard University’s Miguel Hernan spoke first and laid out a conceptual framework that identified three tasks data analysts and data scientists can carry out when seeking scientific insights from data: 1) description, 2) prediction, and 3) causal inference.
Prediction vs Causal Inference
A major contention during the colloquium was that the difference between prediction and causal inference is often misunderstood, and that this misunderstanding can lead to false conclusions or misguided actions. According to Hernan, the key difference is that all the information required for a well-defined predictive task is contained in the data, whereas causal inference requires expert knowledge of relationships that cannot be discerned from the data alone. The knowledge needed concerns how the system being analyzed “works,” and it guides analyses in ways that are hard or impossible to program into machine learning algorithms.
Perspective
In comments to the Monitor about the Colloquium, Goodman said, “I think this is a fantastically important issue for epidemiologists, as we seem to be on the sidelines as a tidal wave of uncritical hype about the potential of machine learning techniques and AI to transform healthcare and prevention washes over us. For issues involving pattern recognition with a known truth, like diagnosis or image interpretation, that may be right, but for figuring out which treatments work and for who, where we don’t have an independent way to know the truth, that’s different territory not recognized as such by many, often from the tech world, enthralled by the new technologies and the possibility for ‘apps’ that will replace the need for expertise.”
Online Access
The half-day Colloquium featured three sessions of two speakers each, and each session included a panel that reacted to the presentations. The entire Colloquium was recorded on video, with timestamps for each speaker, and can be watched in full or selectively by speaker at this link:
https://tinyurl.com/y75m6h37
Papers by Miguel Hernan and colleagues on data science, and by Nigam Shah and colleagues on the Green Button, can be found at the following links:
https://tinyurl.com/yacn9hsx and
https://tinyurl.com/yd2tomct
■