1
00:00:15,350 --> 00:00:17,450
PROFESSOR: All right,
everyone, so we

2
00:00:17,450 --> 00:00:21,510
are very happy to have Andy Beck
as our invited speaker today.

3
00:00:21,510 --> 00:00:25,250
Andy has a very
unique background.

4
00:00:25,250 --> 00:00:29,030
He's trained both as a computer
scientist and as a clinician.

5
00:00:29,030 --> 00:00:31,490
His specialty is in pathology.

6
00:00:31,490 --> 00:00:34,860
When he was a
student at Stanford,

7
00:00:34,860 --> 00:00:39,200
his thesis was on how one could
use machine learning algorithms

8
00:00:39,200 --> 00:00:43,730
to really understand a
pathology data set, at the time,

9
00:00:43,730 --> 00:00:47,180
using more traditional
regression-style approaches

10
00:00:47,180 --> 00:00:49,430
to understand
what is now

11
00:00:49,430 --> 00:00:51,260
called computational pathology.

12
00:00:51,260 --> 00:00:53,780
But his work was really at
the forefront of his field.

13
00:00:53,780 --> 00:00:56,390
Since then, he's
come to Boston, where

14
00:00:56,390 --> 00:01:01,280
he was an attending and faculty
at Beth Israel Deaconess

15
00:01:01,280 --> 00:01:02,480
Medical Center.

16
00:01:02,480 --> 00:01:04,010
In the past couple
of years, he's

17
00:01:04,010 --> 00:01:07,160
been running a
company called PathAI,

18
00:01:07,160 --> 00:01:11,960
which is, in my opinion, one
of the most exciting companies

19
00:01:11,960 --> 00:01:13,670
in AI in medicine.

20
00:01:13,670 --> 00:01:15,920
And he is my favorite
invited speaker--

21
00:01:15,920 --> 00:01:16,310
ANDY BECK: He says
that to everyone.

22
00:01:16,310 --> 00:01:18,893
PROFESSOR: --every time I get
an opportunity to invite someone

23
00:01:18,893 --> 00:01:19,870
to speak.

24
00:01:19,870 --> 00:01:21,960
And I think you'll be
really interested in what

25
00:01:21,960 --> 00:01:22,770
he has to say.

26
00:01:22,770 --> 00:01:23,080
ANDY BECK: Great.

27
00:01:23,080 --> 00:01:24,080
Well, thank you so much.

28
00:01:24,080 --> 00:01:25,550
Thanks for having me.

29
00:01:25,550 --> 00:01:28,310
Yeah, I'm really excited
to talk in this course.

30
00:01:28,310 --> 00:01:32,512
It is a super exciting time for
machine learning in pathology.

31
00:01:32,512 --> 00:01:34,220
And if you have any
questions throughout,

32
00:01:34,220 --> 00:01:35,360
please feel free to ask.

33
00:01:38,530 --> 00:01:42,430
And so for some background
on what pathology is--

34
00:01:42,430 --> 00:01:43,860
say you're
a patient.

35
00:01:43,860 --> 00:01:46,890
You go to the
doctor, and AI could

36
00:01:46,890 --> 00:01:49,890
apply in any aspect of
this whole trajectory,

37
00:01:49,890 --> 00:01:52,140
and I'll kind of talk about
specifically in pathology.

38
00:01:52,140 --> 00:01:53,140
So you go to the doctor.

39
00:01:53,140 --> 00:01:54,660
They take a bunch
of data from you.

40
00:01:54,660 --> 00:01:55,540
You talk to them.

41
00:01:55,540 --> 00:01:58,260
They get signs and symptoms.

42
00:01:58,260 --> 00:02:00,590
Typically, if they're
at all concerned,

43
00:02:00,590 --> 00:02:03,510
and it could be something that's
a structural alteration that's

44
00:02:03,510 --> 00:02:05,580
not accessible just
through taking blood work,

45
00:02:05,580 --> 00:02:08,056
say, like a cancer, which is
one of the biggest things,

46
00:02:08,056 --> 00:02:10,139
they'll send you to radiology
where they want to--

47
00:02:10,139 --> 00:02:12,139
the radiology is the best
way for acquiring data

48
00:02:12,139 --> 00:02:14,400
to look for big
structural changes.

49
00:02:14,400 --> 00:02:16,830
So you can't see single
cells in radiology.

50
00:02:16,830 --> 00:02:20,250
But you can see inside the body
and see some large things that

51
00:02:20,250 --> 00:02:22,230
are changing to make
evaluations for,

52
00:02:22,230 --> 00:02:25,410
like, you have a cough, like
are you looking at lung cancer,

53
00:02:25,410 --> 00:02:27,200
or are you looking at pneumonia?

54
00:02:27,200 --> 00:02:29,010
And radiology only
takes you so far.

55
00:02:29,010 --> 00:02:33,030
And people are super excited
about applying AI to radiology,

56
00:02:33,030 --> 00:02:35,100
but I think one thing
they often forget

57
00:02:35,100 --> 00:02:38,280
is these images are not
very data-rich compared

58
00:02:38,280 --> 00:02:41,460
to the core data types.

59
00:02:41,460 --> 00:02:43,257
I mean, this is my
bias from pathology,

60
00:02:43,257 --> 00:02:45,090
but radiology gets you
some part of the way,

61
00:02:45,090 --> 00:02:46,890
where you can sort of
triage normal stuff.

62
00:02:46,890 --> 00:02:48,630
And the radiologist will
have some impression

63
00:02:48,630 --> 00:02:49,410
of what they're looking at.

64
00:02:49,410 --> 00:02:50,785
And often, that's
the bottom line

65
00:02:50,785 --> 00:02:52,890
in the radiology report
is impression-- concerning

66
00:02:52,890 --> 00:02:55,770
for cancer, or impression--
likely benign but not sure,

67
00:02:55,770 --> 00:02:57,823
or impression-- totally benign.

68
00:02:57,823 --> 00:02:59,740
And that will also guide
subsequent decisions.

69
00:02:59,740 --> 00:03:01,410
But if there's some concern that
something serious is going on,

70
00:03:01,410 --> 00:03:03,750
the patient undergoes a
pretty serious procedure,

71
00:03:03,750 --> 00:03:05,130
which is a tissue biopsy.

72
00:03:05,130 --> 00:03:08,292
So pathology
requires tissue to do

73
00:03:08,292 --> 00:03:09,750
what I'm going to
talk about, which

74
00:03:09,750 --> 00:03:11,957
is surgical pathology, which
requires a tissue specimen.

75
00:03:11,957 --> 00:03:13,290
There's also blood-based things.

76
00:03:13,290 --> 00:03:16,475
But then this is the diagnosis
where you're trying to say

77
00:03:16,475 --> 00:03:17,100
is this cancer?

78
00:03:17,100 --> 00:03:18,300
Is this not cancer?

79
00:03:18,300 --> 00:03:20,070
And that report by
itself can really

80
00:03:20,070 --> 00:03:21,780
guide subsequent
decisions, which

81
00:03:21,780 --> 00:03:24,450
could be no further
treatment or a big surgery

82
00:03:24,450 --> 00:03:27,330
or a big decision about
chemotherapy and radiotherapy.

83
00:03:27,330 --> 00:03:28,830
So this is one area
where you really

84
00:03:28,830 --> 00:03:31,260
want to incorporate data
in the most effective way

85
00:03:31,260 --> 00:03:33,330
to reduce errors, to
increase standardization,

86
00:03:33,330 --> 00:03:35,130
and to really inform
the best treatment

87
00:03:35,130 --> 00:03:37,830
decision for each patient
based on the characteristics

88
00:03:37,830 --> 00:03:39,305
of their disease.

89
00:03:39,305 --> 00:03:40,680
And the one thing
about pathology

90
00:03:40,680 --> 00:03:43,470
that's pretty interesting
is it's super visual.

91
00:03:43,470 --> 00:03:46,380
And this is just a
kind of random sampling

92
00:03:46,380 --> 00:03:48,138
of some of the types
of different imagery

93
00:03:48,138 --> 00:03:49,930
that pathologists are
looking at every day.

94
00:03:49,930 --> 00:03:52,770
I think this is one thing that
draws people to this specialty

95
00:03:52,770 --> 00:03:54,960
is-- whereas in
radiology, you're sort

96
00:03:54,960 --> 00:03:57,690
of looking at an impression of
what might be happening based

97
00:03:57,690 --> 00:04:01,862
on sending different
types of images

98
00:04:01,862 --> 00:04:03,570
and acquiring the data
and sort of trying

99
00:04:03,570 --> 00:04:04,737
to estimate what's going on.

100
00:04:04,737 --> 00:04:07,380
Whereas here, you're actually
staining pieces of tissue

101
00:04:07,380 --> 00:04:10,617
and looking by eye at
actual individual cells.

102
00:04:10,617 --> 00:04:11,700
You can look within cells.

103
00:04:11,700 --> 00:04:14,170
You can look at how populations
of cells are being organized.

104
00:04:14,170 --> 00:04:16,003
And for many diseases,
this still represents

105
00:04:16,003 --> 00:04:19,714
sort of the core data type
that defines what's going on,

106
00:04:19,714 --> 00:04:21,839
and is this something with
a serious prognosis that

107
00:04:21,839 --> 00:04:22,797
requires, say, surgery?

108
00:04:22,797 --> 00:04:24,589
Or is this something
that's totally benign?

109
00:04:24,589 --> 00:04:26,980
All of these are different
aspects of benign processes.

110
00:04:26,980 --> 00:04:29,148
And so just the
normal human body

111
00:04:29,148 --> 00:04:30,690
creates all these
different patterns.

112
00:04:30,690 --> 00:04:32,620
And then there's a lot
of patterns of disease.

113
00:04:32,620 --> 00:04:35,730
And these are all different
subtypes of disease that

114
00:04:35,730 --> 00:04:37,345
are all different morphologies.

115
00:04:37,345 --> 00:04:38,970
So there's sort of
an incredible wealth

116
00:04:38,970 --> 00:04:41,003
of different visual imagery
that the pathologist

117
00:04:41,003 --> 00:04:42,670
has to incorporate
into their diagnosis.

118
00:04:42,670 --> 00:04:43,800
And then there's,
on top of that,

119
00:04:43,800 --> 00:04:45,258
things like special
stains that can

120
00:04:45,258 --> 00:04:48,840
stain for specific organisms,
for infectious disease,

121
00:04:48,840 --> 00:04:50,730
or specific patterns
of protein expression,

122
00:04:50,730 --> 00:04:54,180
for subtyping disease based
on expression of drug targets.

123
00:04:54,180 --> 00:04:57,390
And this even more sort of
increases the complexity

124
00:04:57,390 --> 00:04:58,900
of the work.

125
00:04:58,900 --> 00:05:01,290
So for many years,
there's really nothing new

126
00:05:01,290 --> 00:05:03,870
about trying to apply
AI or machine learning

127
00:05:03,870 --> 00:05:05,230
or computation to this field.

128
00:05:05,230 --> 00:05:06,930
It's actually a
very natural field,

129
00:05:06,930 --> 00:05:09,058
because it's sort
of laboratory-based.

130
00:05:09,058 --> 00:05:10,350
It's all about data processing.

131
00:05:10,350 --> 00:05:12,017
You take inputs,
things like images,

132
00:05:12,017 --> 00:05:14,340
and produce outputs,
like a diagnosis.

133
00:05:14,340 --> 00:05:17,020
So people have really been
trying this for 40 years

134
00:05:17,020 --> 00:05:17,520
or so now.

135
00:05:17,520 --> 00:05:19,950
This is one of the very first
studies that sort of just

136
00:05:19,950 --> 00:05:22,230
tried to see, could
we train a computer

137
00:05:22,230 --> 00:05:25,162
to identify the
size of cancer cells

138
00:05:25,162 --> 00:05:27,120
through a process they
called morphometry, here

139
00:05:27,120 --> 00:05:27,990
on the bottom?

140
00:05:27,990 --> 00:05:31,350
And then could we just
use sort of measurements

141
00:05:31,350 --> 00:05:35,220
about the size of cancer
cells in a very simple model

142
00:05:35,220 --> 00:05:36,295
to predict outcome?

143
00:05:36,295 --> 00:05:37,920
And in this study,
they have a learning

144
00:05:37,920 --> 00:05:40,410
set that they're learning
from and then a test set.

145
00:05:40,410 --> 00:05:42,750
And they show that their
system, as every paper that

146
00:05:42,750 --> 00:05:45,840
ever gets published shows, does
better than the two competing

147
00:05:45,840 --> 00:05:46,450
approaches.

148
00:05:46,450 --> 00:05:48,550
Although even in this
best case scenario,

149
00:05:48,550 --> 00:05:51,150
there's significant degradation
from learning to test.

150
00:05:51,150 --> 00:05:52,438
So one, it's super simple.

151
00:05:52,438 --> 00:05:54,480
It's using very simple
methods, and the data sets

152
00:05:54,480 --> 00:05:58,190
are tiny, 38 learning
cases, 40 test cases.

153
00:05:58,190 --> 00:06:01,020
And this is published in The
Lancet, which is the leading

154
00:06:01,020 --> 00:06:04,230
biomedical journal even today.

155
00:06:04,230 --> 00:06:06,480
And then people got
excited about AI

156
00:06:06,480 --> 00:06:08,890
sort of building off
of simple approaches.

157
00:06:08,890 --> 00:06:12,170
And back in 1990, it was thought
artificial neural nets would

158
00:06:12,170 --> 00:06:13,920
be super useful for
quantitative pathology

159
00:06:13,920 --> 00:06:16,080
for sort of obvious reasons.

160
00:06:16,080 --> 00:06:17,520
But at that time,
there was really

161
00:06:17,520 --> 00:06:20,045
no way of digitizing stuff
at any sort of scale,

162
00:06:20,045 --> 00:06:21,920
and that problem's only
recently been solved.

163
00:06:21,920 --> 00:06:24,180
But sort of in 2000, people
were first thinking about

164
00:06:24,180 --> 00:06:25,830
once the slides
are digital, then

165
00:06:25,830 --> 00:06:29,580
you could apply computational
methods effectively.

166
00:06:29,580 --> 00:06:31,590
But kind of nothing
really changed,

167
00:06:31,590 --> 00:06:33,090
and still, to a
large degree, hasn't

168
00:06:33,090 --> 00:06:34,840
changed for the
predominance of pathology,

169
00:06:34,840 --> 00:06:37,710
which I'll talk about.

170
00:06:37,710 --> 00:06:39,660
But as was mentioned
earlier, I was

171
00:06:39,660 --> 00:06:42,390
part of one of the first studies
to really take a more machine

172
00:06:42,390 --> 00:06:43,560
learning approach to this.

173
00:06:43,560 --> 00:06:45,060
And what we mean
by machine learning

174
00:06:45,060 --> 00:06:46,860
versus prior
approaches is the idea

175
00:06:46,860 --> 00:06:51,030
of using data-driven analysis
to figure out the best features.

176
00:06:51,030 --> 00:06:53,280
And now you can do that in
an even more explicit way

177
00:06:53,280 --> 00:06:54,750
with machine
learning, but there's

178
00:06:54,750 --> 00:06:56,708
sort of a progression
from measuring one or two

179
00:06:56,708 --> 00:06:59,060
things in a very tedious way
on very small data sets to,

180
00:06:59,060 --> 00:07:00,560
I'd say, this way,
where we're using

181
00:07:00,560 --> 00:07:02,520
some traditional
regression-based machine

182
00:07:02,520 --> 00:07:05,253
learning to measure larger
numbers of features.

183
00:07:05,253 --> 00:07:07,170
And then using things
like the associations of

184
00:07:07,170 --> 00:07:08,940
those features with
patient outcome

185
00:07:08,940 --> 00:07:11,700
to focus your analyses on
the most important ones.

186
00:07:11,700 --> 00:07:14,370
And the challenging
machine learning task here

187
00:07:14,370 --> 00:07:16,410
and really one of the
core tasks in pathology

188
00:07:16,410 --> 00:07:17,860
is image processing.

189
00:07:17,860 --> 00:07:19,770
So how do we train
computers to sort of

190
00:07:19,770 --> 00:07:21,570
have the knowledge
of what is being

191
00:07:21,570 --> 00:07:23,700
looked at that any pathologist
would want to have?

192
00:07:23,700 --> 00:07:25,200
And there's a few
basic things you'd

193
00:07:25,200 --> 00:07:26,790
want to train the
computer to do,

194
00:07:26,790 --> 00:07:29,070
which is, for example,
identify where's the cancer?

195
00:07:29,070 --> 00:07:30,135
Where's the stroma?

196
00:07:30,135 --> 00:07:31,260
Where are the cancer cells?

197
00:07:31,260 --> 00:07:33,690
Where are the
fibroblasts, et cetera?

198
00:07:33,690 --> 00:07:36,367
And then once you train a
machine learning based system

199
00:07:36,367 --> 00:07:37,950
to identify those
things, you can then

200
00:07:37,950 --> 00:07:40,980
extract lots of quantitative
phenotypes out of the images.

201
00:07:40,980 --> 00:07:43,080
And this is all using
human-engineered features

202
00:07:43,080 --> 00:07:45,330
to measure all the different
characteristics of what's

203
00:07:45,330 --> 00:07:46,540
going on in an image.

204
00:07:46,540 --> 00:07:48,390
And machine learning
is being used here

205
00:07:48,390 --> 00:07:49,960
to create those features.

206
00:07:49,960 --> 00:07:52,110
And then we use other
regression-based methods

207
00:07:52,110 --> 00:07:54,750
to associate these features with
things like clinical outcome.

208
00:07:54,750 --> 00:07:56,417
And in this work, we
show that by taking

209
00:07:56,417 --> 00:07:58,125
a data-driven
approach, sort of, you

210
00:07:58,125 --> 00:07:59,790
begin to focus on
things like what's

211
00:07:59,790 --> 00:08:01,770
happening in the tumor
microenvironment,

212
00:08:01,770 --> 00:08:03,350
not just in the tumor itself.

213
00:08:03,350 --> 00:08:05,910
And it sort of turned
out, over the past decade,

214
00:08:05,910 --> 00:08:07,890
that understanding the way the
tumor interacts with the tumor

215
00:08:07,890 --> 00:08:10,390
microenvironment is sort of one
of the most important things

216
00:08:10,390 --> 00:08:12,200
to do in cancer with
things like fields

217
00:08:12,200 --> 00:08:14,423
like immuno-oncology
being one of the biggest

218
00:08:14,423 --> 00:08:15,840
advances in the
therapy of cancer,

219
00:08:15,840 --> 00:08:17,507
where you're essentially
just regulating

220
00:08:17,507 --> 00:08:21,120
how tumor cells interact
with the cells around them.

221
00:08:21,120 --> 00:08:23,797
And that sort of data
is entirely inaccessible

222
00:08:23,797 --> 00:08:25,380
using traditional
pathology approaches

223
00:08:25,380 --> 00:08:27,450
and really required a
machine learning approach

224
00:08:27,450 --> 00:08:30,720
to extract a bunch of features
and sort of let the data speak

225
00:08:30,720 --> 00:08:32,640
for itself in terms of
which of those features

226
00:08:32,640 --> 00:08:35,679
is most important for survival.
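
To make that feature-to-outcome step concrete, here is a minimal sketch of the kind of analysis being described: take per-patient image-derived features and rank them by strength of association with five-year survival. The feature names and synthetic data are illustrative assumptions, not the study's actual variables or code.

```python
import numpy as np

# Hypothetical image-derived measurements for 200 patients.
rng = np.random.default_rng(0)
n = 200
features = {
    "epithelial_area_fraction": rng.normal(0.4, 0.1, n),
    "stromal_area_fraction":    rng.normal(0.3, 0.1, n),
    "mean_nuclear_diameter":    rng.normal(8.0, 1.5, n),
}
alive_5yr = rng.integers(0, 2, n)  # stand-in outcome labels (1 = alive)

def point_biserial(x, y):
    """Correlation between a continuous feature and a binary outcome."""
    p = y.mean()
    return (x[y == 1].mean() - x[y == 0].mean()) * np.sqrt(p * (1 - p)) / x.std()

# Let the data speak: rank features by absolute association with outcome.
for name in sorted(features, key=lambda f: -abs(point_biserial(features[f], alive_5yr))):
    print(f"{name}: r = {point_biserial(features[name], alive_5yr):+.3f}")
```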

227
00:08:35,679 --> 00:08:37,762
And in this study, we
showed that these things

228
00:08:37,762 --> 00:08:38,970
are associated with survival.

229
00:08:38,970 --> 00:08:40,345
I don't know if
you guys do a lot

230
00:08:40,345 --> 00:08:41,880
of Kaplan-Meier plots in here.

231
00:08:41,880 --> 00:08:43,570
PROFESSOR: They saw it once,
but taking us through it

232
00:08:43,570 --> 00:08:44,890
slowly is never a bad idea.

233
00:08:44,890 --> 00:08:46,182
ANDY BECK: Yeah, so these are--

234
00:08:46,182 --> 00:08:48,280
I feel there's one
type of plot to know

235
00:08:48,280 --> 00:08:50,790
for most of biomedical research,
and it's probably this one.

236
00:08:50,790 --> 00:08:53,020
And it's extremely simple.

237
00:08:53,020 --> 00:08:56,640
So it's really just an
empirical distribution

238
00:08:56,640 --> 00:08:59,350
of how patients are
doing over time.

239
00:08:59,350 --> 00:09:01,290
So the x-axis is time.

240
00:09:01,290 --> 00:09:04,067
And here, the goal is to
build a prognostic model.

241
00:09:04,067 --> 00:09:05,650
I wish I had a
predictive one in here,

242
00:09:05,650 --> 00:09:07,650
but we can talk about
what that would look like.

243
00:09:07,650 --> 00:09:09,840
But a prognostic model,
any sort of prognostic test

244
00:09:09,840 --> 00:09:11,880
in any disease in
medicine is to try

245
00:09:11,880 --> 00:09:14,612
to create subgroups that show
different survival outcomes.

246
00:09:14,612 --> 00:09:16,320
And then by implication,
they may benefit

247
00:09:16,320 --> 00:09:17,362
from different therapies.

248
00:09:17,362 --> 00:09:18,030
They may not.

249
00:09:18,030 --> 00:09:19,170
That doesn't answer
that question,

250
00:09:19,170 --> 00:09:20,545
but it just tells
you if you want

251
00:09:20,545 --> 00:09:22,490
to make an estimate for
how a patient's going

252
00:09:22,490 --> 00:09:25,192
to be doing in five years,
and you can sub-classify them

253
00:09:25,192 --> 00:09:27,150
into two groups, this is
a way to visualize it.

254
00:09:27,150 --> 00:09:28,140
You don't need two groups.

255
00:09:28,140 --> 00:09:29,723
You could do this
with even one group,

256
00:09:29,723 --> 00:09:32,730
but it's frequently used to show
differences between two groups.

257
00:09:32,730 --> 00:09:36,460
So you'll see here, there's
a black line and a red line.

258
00:09:36,460 --> 00:09:38,160
And these are groups
of patients where

259
00:09:38,160 --> 00:09:41,160
a model trained
not on these cases

260
00:09:41,160 --> 00:09:43,470
was trained to separate
high-risk patients

261
00:09:43,470 --> 00:09:44,850
from low-risk patients.

262
00:09:44,850 --> 00:09:47,370
And the way we did that was
we did logistic regression

263
00:09:47,370 --> 00:09:50,520
on a different data set, sort
of trying to classify patients

264
00:09:50,520 --> 00:09:52,980
alive at five years following
diagnosis versus patients

265
00:09:52,980 --> 00:09:54,492
deceased at five years after diagnosis.

266
00:09:54,492 --> 00:09:55,200
We build a model.

267
00:09:55,200 --> 00:09:56,310
We fix the model.

268
00:09:56,310 --> 00:10:00,280
Then we apply it to this
data set of about 250 cases.
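
As a hedged sketch of that workflow (not the study's actual code): fit a logistic regression on one cohort classifying alive versus deceased at five years, freeze it, then only apply it to score a second cohort into high- and low-risk groups. The feature matrices and the 0.5 cutoff here are illustrative assumptions.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(1)
X_train = rng.normal(size=(300, 10))   # image-derived features, training cohort
y_train = rng.integers(0, 2, 300)      # 1 = deceased at five years
X_test = rng.normal(size=(250, 10))    # independent cohort of ~250 cases

model = LogisticRegression(max_iter=1000).fit(X_train, y_train)  # fit once
risk = model.predict_proba(X_test)[:, 1]                         # then only apply
high_risk = risk > 0.5                                           # two groups
print(f"{high_risk.sum()} high-risk, {(~high_risk).sum()} low-risk patients")
```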

269
00:10:00,280 --> 00:10:02,790
And then we just ask, did we
actually effectively create

270
00:10:02,790 --> 00:10:07,290
two different groups of patients
whose survival distribution is

271
00:10:07,290 --> 00:10:08,592
significantly different?

272
00:10:08,592 --> 00:10:10,050
So what this p-value
is telling you

273
00:10:10,050 --> 00:10:12,690
is, roughly, the probability of
seeing curves this different if they

274
00:10:12,690 --> 00:10:14,250
came from the same
underlying distribution--

275
00:10:14,250 --> 00:10:16,740
that is, if there were no difference
between these two curves

276
00:10:16,740 --> 00:10:19,482
across all of the time points.

277
00:10:19,482 --> 00:10:20,940
And what we see
here is there seems

278
00:10:20,940 --> 00:10:23,610
to be a difference between the
black line versus the red line,

279
00:10:23,610 --> 00:10:28,470
where, say, at 10 years, the
probability of survival

280
00:10:28,470 --> 00:10:31,352
is about 80% in the low-risk
group and more like 60%

281
00:10:31,352 --> 00:10:32,310
in the high-risk group.

282
00:10:32,310 --> 00:10:34,650
And overall, the
p-value's very small

283
00:10:34,650 --> 00:10:36,900
for there being a difference
between those two curves.
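
Here is a minimal sketch of how such a figure and p-value are typically produced, using the third-party lifelines package; the synthetic follow-up times below are illustrative, not the study's data.

```python
import numpy as np
import matplotlib.pyplot as plt
from lifelines import KaplanMeierFitter
from lifelines.statistics import logrank_test

rng = np.random.default_rng(2)
t_low = rng.exponential(20, 120)   # follow-up times (years), low-risk group
t_high = rng.exponential(10, 120)  # high-risk group has events sooner
e_low = rng.integers(0, 2, 120)    # 1 = death observed, 0 = censored
e_high = rng.integers(0, 2, 120)

ax = plt.subplot()
KaplanMeierFitter().fit(t_low, e_low, label="low risk").plot_survival_function(ax=ax)
KaplanMeierFitter().fit(t_high, e_high, label="high risk").plot_survival_function(ax=ax)

# Log-rank test: a small p-value means the two survival curves differ.
res = logrank_test(t_low, t_high, event_observed_A=e_low, event_observed_B=e_high)
print(f"log-rank p-value: {res.p_value:.4g}")
plt.xlabel("years since diagnosis")
plt.ylabel("survival probability")
plt.show()
```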

284
00:10:36,900 --> 00:10:39,128
So that's sort of like
what a successful

285
00:10:39,128 --> 00:10:40,920
Kaplan-Meier plot would
look like if you're

286
00:10:40,920 --> 00:10:43,440
trying to create a model
that separates patients

287
00:10:43,440 --> 00:10:45,840
into groups with different
survival distributions.

288
00:10:45,840 --> 00:10:47,670
And then it's always important
for these types of things

289
00:10:47,670 --> 00:10:49,030
to try them on
multiple data sets.

290
00:10:49,030 --> 00:10:51,655
And here we show the same model
applied to a different data set

291
00:10:51,655 --> 00:10:55,700
showed pretty similar overall
effectiveness at stratifying

292
00:10:55,700 --> 00:10:57,820
patients into two groups.

293
00:10:57,820 --> 00:11:01,320
So why do you think doing
this might be useful?

294
00:11:01,320 --> 00:11:02,280
I guess, yeah, anyone?

295
00:11:04,793 --> 00:11:06,960
Because there's actually,
I think this type of curve

296
00:11:06,960 --> 00:11:10,088
is often confused with one that
actually is extremely useful,

297
00:11:10,088 --> 00:11:11,130
which I would say-- yeah?

298
00:11:11,130 --> 00:11:12,380
PROFESSOR: Why don't you wait?

299
00:11:12,380 --> 00:11:13,690
ANDY BECK: Sure.

300
00:11:13,690 --> 00:11:14,690
PROFESSOR: Don't be shy.

301
00:11:18,500 --> 00:11:20,080
You can call them.

302
00:11:20,080 --> 00:11:21,270
ANDY BECK: All right.

303
00:11:21,270 --> 00:11:22,980
AUDIENCE: Probably
you can use

304
00:11:22,980 --> 00:11:27,120
this to start off
when the patient's

305
00:11:27,120 --> 00:11:29,040
high-risk, and
probably at five years,

306
00:11:29,040 --> 00:11:32,907
if the patient is high-risk,
probably do a follow-up.

307
00:11:32,907 --> 00:11:33,990
ANDY BECK: Right, exactly.

308
00:11:33,990 --> 00:11:35,802
Yeah, yeah.

309
00:11:35,802 --> 00:11:37,010
So that would be a great use.

310
00:11:37,010 --> 00:11:38,100
PROFESSOR: Can you repeat the
question for the recording?

311
00:11:38,100 --> 00:11:40,280
ANDY BECK: So it was
saying like if you

312
00:11:40,280 --> 00:11:43,430
know someone's at a high
risk of having an event prior

313
00:11:43,430 --> 00:11:46,190
to five years, an event is
when the curve goes down.

314
00:11:46,190 --> 00:11:52,970
So definitely, the red group
is at 40%, almost double

315
00:11:52,970 --> 00:11:56,150
or something the risk
of the black group.

316
00:11:56,150 --> 00:11:58,220
So if you have
certain interventions

317
00:11:58,220 --> 00:12:01,940
you can do to help prevent
these things, such as giving

318
00:12:01,940 --> 00:12:04,670
an additional treatment or
giving more frequent monitoring

319
00:12:04,670 --> 00:12:05,390
for recurrence.

320
00:12:05,390 --> 00:12:08,420
Like if you can do a follow-up
scan in a month versus six

321
00:12:08,420 --> 00:12:11,090
months, you could make that
decision in a data-driven way

322
00:12:11,090 --> 00:12:13,132
by knowing whether the
patient's on the red curve

323
00:12:13,132 --> 00:12:15,230
or the black curve.

324
00:12:15,230 --> 00:12:16,370
So yeah, exactly right.

325
00:12:16,370 --> 00:12:18,140
It helps you to make therapeutic
decisions when there's

326
00:12:18,140 --> 00:12:19,550
a bunch of things you
can do, either give

327
00:12:19,550 --> 00:12:21,967
more aggressive treatment or
do more aggressive monitoring

328
00:12:21,967 --> 00:12:24,500
of disease, depending on
is it aggressive disease

329
00:12:24,500 --> 00:12:25,930
or a non-aggressive disease.

330
00:12:25,930 --> 00:12:27,680
The other type of curve
that I think often

331
00:12:27,680 --> 00:12:29,222
gets confused with
these that's quite

332
00:12:29,222 --> 00:12:34,050
useful is one that directly
tests that intervention.

333
00:12:34,050 --> 00:12:35,720
So essentially, you
could do a trial

334
00:12:35,720 --> 00:12:39,580
of the usefulness, the clinical
utility of this algorithm,

335
00:12:39,580 --> 00:12:42,140
where on the one hand, you
make the prediction on everyone

336
00:12:42,140 --> 00:12:44,150
and don't do
anything differently.

337
00:12:44,150 --> 00:12:47,065
And then the other one
is you make a prediction

338
00:12:47,065 --> 00:12:48,440
on the patients,
and you actually

339
00:12:48,440 --> 00:12:52,220
use it to make a decision, like
more frequent treatment or more

340
00:12:52,220 --> 00:12:53,270
frequent intervention.

341
00:12:53,270 --> 00:12:55,580
And then you could
do a curve, saying

342
00:12:55,580 --> 00:12:59,120
among the high-risk patients,
where we actually acted on it,

343
00:12:59,120 --> 00:13:00,020
that's black.

344
00:13:00,020 --> 00:13:02,240
And if we didn't
act on it, it's red.

345
00:13:02,240 --> 00:13:04,640
And then, if you do the
experiment in the right way,

346
00:13:04,640 --> 00:13:08,240
you can make the inference
that you're actually

347
00:13:08,240 --> 00:13:12,140
reducing deaths by 50%
if the intervention is

348
00:13:12,140 --> 00:13:13,433
causing black versus red.

349
00:13:13,433 --> 00:13:15,350
Here, we're not doing
anything with causality.

350
00:13:15,350 --> 00:13:18,140
We're just sort of observing
how patients do differently

351
00:13:18,140 --> 00:13:18,890
over time.

352
00:13:18,890 --> 00:13:22,250
But frequently, you see these
as the figure, the key figure

353
00:13:22,250 --> 00:13:25,058
for a randomized
control trial, where

354
00:13:25,058 --> 00:13:27,350
the only thing different
between the groups of patients

355
00:13:27,350 --> 00:13:28,740
is the intervention.

356
00:13:28,740 --> 00:13:30,830
And that really lets you
make a powerful inference

357
00:13:30,830 --> 00:13:32,510
that changes what
care should be.

358
00:13:32,510 --> 00:13:34,340
This one, you're just like, OK,
maybe we should do something

359
00:13:34,340 --> 00:13:35,715
differently, but
not really sure,

360
00:13:35,715 --> 00:13:37,033
but it makes intuitive sense.

361
00:13:37,033 --> 00:13:38,450
But if you actually
have something

362
00:13:38,450 --> 00:13:40,220
from a randomized clinical
trial or something else

363
00:13:40,220 --> 00:13:41,900
that allows you to
infer causality,

364
00:13:41,900 --> 00:13:45,380
this is the most
important figure.

365
00:13:45,380 --> 00:13:48,200
And you can actually infer
how many lives are being saved

366
00:13:48,200 --> 00:13:49,737
or things by doing something.

367
00:13:49,737 --> 00:13:51,320
But this one's not
about intervention.

368
00:13:51,320 --> 00:13:53,090
It's just about
sort of observing

369
00:13:53,090 --> 00:13:56,130
how patients do over time.

370
00:13:56,130 --> 00:13:59,750
So that was some of the
work from eight years ago,

371
00:13:59,750 --> 00:14:02,083
and none of this has
really changed in practice.

372
00:14:02,083 --> 00:14:04,250
Everyone is still using
glass slides and microscopes

373
00:14:04,250 --> 00:14:05,030
in the clinic.

374
00:14:05,030 --> 00:14:07,290
Research is a totally
different story.

375
00:14:07,290 --> 00:14:10,583
But still, 99% of
clinical practice is using

376
00:14:10,583 --> 00:14:12,500
these old-fashioned
technologies-- microscopes

377
00:14:12,500 --> 00:14:15,830
from technology breakthroughs
in the mid-1800s, staining

378
00:14:15,830 --> 00:14:18,020
breakthroughs in the late 1800s.

379
00:14:18,020 --> 00:14:20,480
The H and E (hematoxylin and
eosin) stain is the key stain.

380
00:14:20,480 --> 00:14:23,600
So aspects of pathology
haven't moved forward at all,

381
00:14:23,600 --> 00:14:26,300
and this has pretty
significant consequences.

382
00:14:26,300 --> 00:14:28,040
And here's just
a couple of types

383
00:14:28,040 --> 00:14:30,020
of figures that really
allow you to see

384
00:14:30,020 --> 00:14:32,660
the primary data for what
a problem interobserver

385
00:14:32,660 --> 00:14:34,963
variability really is
in clinical practice.

386
00:14:34,963 --> 00:14:36,380
And this is just
another, I think,

387
00:14:36,380 --> 00:14:40,160
really nice, empirical
way of viewing raw data,

388
00:14:40,160 --> 00:14:44,060
where there is a ground
truth consensus of experts,

389
00:14:44,060 --> 00:14:47,510
who sort of decided what all
these 70 or so cases were,

390
00:14:47,510 --> 00:14:50,210
not that experts always
know the right answer.

391
00:14:50,210 --> 00:14:52,160
And for all of these
70, they called them

392
00:14:52,160 --> 00:14:53,930
all the category
of atypia, which

393
00:14:53,930 --> 00:14:55,530
here is indicated in yellow.

394
00:14:55,530 --> 00:14:57,350
And then they took
all of these 70 cases

395
00:14:57,350 --> 00:14:59,450
that the experts said
were atypia and sent them

396
00:14:59,450 --> 00:15:02,540
to hundreds of pathologists
across the country

397
00:15:02,540 --> 00:15:05,150
and for each one, just
plotted the distribution

398
00:15:05,150 --> 00:15:07,040
of different diagnoses
they were receiving.

399
00:15:07,040 --> 00:15:09,500
And quite strikingly-- and
this was published in JAMA,

400
00:15:09,500 --> 00:15:12,200
a great journal, about
four years ago now--

401
00:15:12,200 --> 00:15:14,150
they show this
incredible distribution

402
00:15:14,150 --> 00:15:16,500
of different diagnoses
among each case.

403
00:15:16,500 --> 00:15:18,112
So this is really
why you might want

404
00:15:18,112 --> 00:15:20,570
a computational approach: these
should all be the same color.

405
00:15:20,570 --> 00:15:22,987
This should just be one big
color or maybe a few outliers,

406
00:15:22,987 --> 00:15:25,670
but for almost any case,
there's a significant proportion

407
00:15:25,670 --> 00:15:27,860
of people calling it
normal, which is yellow--

408
00:15:27,860 --> 00:15:30,350
or sorry, tan, then
atypical, which is yellow,

409
00:15:30,350 --> 00:15:33,170
and then actually cancer,
which is orange or red.

410
00:15:33,170 --> 00:15:35,010
PROFESSOR: What
does atypical mean?

411
00:15:35,010 --> 00:15:37,850
ANDY BECK: Yeah, so atypical
is this border area between

412
00:15:37,850 --> 00:15:41,990
totally normal and cancer,
where the pathologist is saying

413
00:15:41,990 --> 00:15:42,500
it's--

414
00:15:42,500 --> 00:15:45,417
which is actually the
most important diagnosis

415
00:15:45,417 --> 00:15:47,000
because totally
normal you do nothing.

416
00:15:47,000 --> 00:15:50,350
Cancer-- there's well-described
protocols for what to do.

417
00:15:50,350 --> 00:15:52,035
Atypia, they often overtreat.

418
00:15:52,035 --> 00:15:53,660
And that's sort of
the bias in medicine

419
00:15:53,660 --> 00:15:56,580
is always assume the worst when
you get a certain diagnosis

420
00:15:56,580 --> 00:15:57,080
back.

421
00:15:57,080 --> 00:16:01,067
So atypia has nuclear features
of cancer but doesn't fully meet the criteria.

422
00:16:01,067 --> 00:16:02,900
You know, maybe you get
7 of the 10 criteria

423
00:16:02,900 --> 00:16:05,160
or three of the five criteria.

424
00:16:05,160 --> 00:16:07,130
And it has to do
with sort of nuclei

425
00:16:07,130 --> 00:16:10,010
looking a little bigger and a
little weirder than expected

426
00:16:10,010 --> 00:16:12,710
but not enough where the
pathologist feels comfortable

427
00:16:12,710 --> 00:16:13,730
calling it cancer.

428
00:16:13,730 --> 00:16:15,355
And that's part of
the reason that that

429
00:16:15,355 --> 00:16:17,540
shows almost a coin flip.

430
00:16:17,540 --> 00:16:21,620
Of the ones the experts
called atypia, only 48%

431
00:16:21,620 --> 00:16:23,240
was agreed with
in the community.

432
00:16:23,240 --> 00:16:26,090
The other interesting thing the
study showed was intraobserver

433
00:16:26,090 --> 00:16:29,000
variability is just as big
of an issue as interobserver.

434
00:16:29,000 --> 00:16:33,410
So a person disagrees with
themselves after an eight month

435
00:16:33,410 --> 00:16:35,240
washout period
pretty much as often

436
00:16:35,240 --> 00:16:37,340
as they disagree with others.

437
00:16:37,340 --> 00:16:41,690
So another reason why
computational approaches

438
00:16:41,690 --> 00:16:43,998
would be valuable and why
this really is a problem.

439
00:16:43,998 --> 00:16:45,290
And this is in breast biopsies.

440
00:16:45,290 --> 00:16:47,900
The same research group
showed quite similar results.

441
00:16:47,900 --> 00:16:51,457
This was in the British Medical
Journal, in skin biopsies, which

442
00:16:51,457 --> 00:16:53,540
is another super important
area, where, again they

443
00:16:53,540 --> 00:16:55,880
have the same type of
visualization of data.

444
00:16:55,880 --> 00:17:00,470
They have five different classes
of severity of skin lesions,

445
00:17:00,470 --> 00:17:02,690
ranging from a totally
normal benign nevus, like I'm

446
00:17:02,690 --> 00:17:05,690
sure many of us have on
our skin, to a melanoma,

447
00:17:05,690 --> 00:17:09,319
which is a serious, malignant
cancer that needs to be treated

448
00:17:09,319 --> 00:17:11,569
as soon as possible.

449
00:17:11,569 --> 00:17:14,470
And here, the white
color is totally benign.

450
00:17:14,470 --> 00:17:16,790
The darker blue
color is melanoma.

451
00:17:16,790 --> 00:17:19,369
And again, they show lots of
discordance, pretty much as

452
00:17:19,369 --> 00:17:22,790
bad as in the breast biopsies.

453
00:17:22,790 --> 00:17:25,550
And here again, the
intraobserver variability

454
00:17:25,550 --> 00:17:28,303
with an eight-month washout
period was about 33%.

455
00:17:28,303 --> 00:17:29,720
So people disagree
with themselves

456
00:17:29,720 --> 00:17:30,678
one out of three times.

457
00:17:33,210 --> 00:17:36,030
And then these aren't totally
outlier cases or one research

458
00:17:36,030 --> 00:17:36,530
group.

459
00:17:36,530 --> 00:17:38,660
The College of
American Pathologists

460
00:17:38,660 --> 00:17:42,770
did a big summary of 116
studies and showed overall,

461
00:17:42,770 --> 00:17:47,750
an 18.3% median discrepancy
rate across all the studies

462
00:17:47,750 --> 00:17:49,880
and a 6% major
discrepancy rate, which

463
00:17:49,880 --> 00:17:51,740
would mean a major
clinical decision

464
00:17:51,740 --> 00:17:55,045
is the wrong one, like
surgery, no surgery, et cetera.

465
00:17:55,045 --> 00:17:56,420
And those are sort of
in the ballpark

466
00:17:56,420 --> 00:18:00,020
agree with the previously
published findings.

467
00:18:00,020 --> 00:18:02,640
So a lot of reasons
to be pessimistic

468
00:18:02,640 --> 00:18:05,907
but one reason to be very
optimistic is the one area

469
00:18:05,907 --> 00:18:08,240
where AI is not-- not the one
area, but maybe one of two

470
00:18:08,240 --> 00:18:11,990
or three areas where AI is
not total hype is vision.

471
00:18:11,990 --> 00:18:14,690
Vision really started working
well-- I don't know if you've

472
00:18:14,690 --> 00:18:17,480
covered it in this class-- with
deep convolutional neural nets

473
00:18:17,480 --> 00:18:18,410
in 2012.

474
00:18:18,410 --> 00:18:20,300
And then all the
groups sort of just

475
00:18:20,300 --> 00:18:23,480
kept getting incrementally
better year over year.

476
00:18:23,480 --> 00:18:25,640
And now this is an
old graph from 2015,

477
00:18:25,640 --> 00:18:27,860
but there's been a huge
development of methods

478
00:18:27,860 --> 00:18:31,133
even since 2015, where
now I think we really

479
00:18:31,133 --> 00:18:33,800
understand the strengths and the
weaknesses of these approaches.

480
00:18:33,800 --> 00:18:36,270
And pathology sort of has
a lot of the strengths,

481
00:18:36,270 --> 00:18:40,340
which is super well-defined,
very focused questions.

482
00:18:40,340 --> 00:18:42,743
And I think there's lots of
failures whenever you try

483
00:18:42,743 --> 00:18:43,910
to do anything more general.

484
00:18:43,910 --> 00:18:46,452
But for the types of tasks where
you know exactly what you're

485
00:18:46,452 --> 00:18:49,360
looking for and you can
generate the training data,

486
00:18:49,360 --> 00:18:51,660
these systems can
work really well.

487
00:18:51,660 --> 00:18:54,560
So that's a lot of what
we're focused on at PathAI

488
00:18:54,560 --> 00:18:56,640
is how do we extract
the most information out

489
00:18:56,640 --> 00:18:58,480
of pathology images
really doing two things.

490
00:18:58,480 --> 00:19:00,750
One is understanding
what's inside the images

491
00:19:00,750 --> 00:19:03,720
and the second is using deep
learning to sort of directly

492
00:19:03,720 --> 00:19:06,240
try to infer patient
level phenotypes

493
00:19:06,240 --> 00:19:08,750
and outcomes directly
from the images.

494
00:19:08,750 --> 00:19:10,800
And we use both
traditional machine

495
00:19:10,800 --> 00:19:12,270
learning models
for certain things,

496
00:19:12,270 --> 00:19:13,890
like particularly
making inference

497
00:19:13,890 --> 00:19:16,290
at the patient level, where
n is often very small.

498
00:19:16,290 --> 00:19:18,770
But anything that's directly
operating on the image

499
00:19:18,770 --> 00:19:22,860
is almost some variant always of
deep convolutional neural nets,

500
00:19:22,860 --> 00:19:27,930
which really are the state of
the art for image processing.
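
For a sense of what a patch-level deep convolutional net looks like in code, here is a generic sketch using a standard torchvision ResNet adapted to classify tissue patches as tumor versus normal. This is an illustrative stand-in, not PathAI's actual architecture.

```python
import torch
import torch.nn as nn
from torchvision import models

net = models.resnet18(weights=None)       # train from scratch on patches
net.fc = nn.Linear(net.fc.in_features, 2) # two classes: tumor / normal

patches = torch.randn(8, 3, 224, 224)     # a mini-batch of RGB patches
logits = net(patches)                     # per-patch class scores
p_tumor = logits.softmax(dim=1)[:, 1]     # P(tumor) for each patch
print(p_tumor.shape)                      # torch.Size([8])
```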

501
00:19:27,930 --> 00:19:31,010
And we sort of, a lot of what
we think about at PathAI,

502
00:19:31,010 --> 00:19:34,020
and I think what's really
important in this area of ML

503
00:19:34,020 --> 00:19:36,360
for medicine is generating
the right data set

504
00:19:36,360 --> 00:19:38,250
and then using things
like deep learning

505
00:19:38,250 --> 00:19:40,733
to optimize all of the
features in a data-driven way,

506
00:19:40,733 --> 00:19:42,150
and then really
thinking about how

507
00:19:42,150 --> 00:19:43,890
to use the outputs
of these models

508
00:19:43,890 --> 00:19:47,347
intelligently and really
validate them in a robust way,

509
00:19:47,347 --> 00:19:48,930
because there's many
ways to be fooled

510
00:19:48,930 --> 00:19:52,570
by artefacts and other things.

511
00:19:52,570 --> 00:19:54,150
So just some of the--

512
00:19:54,150 --> 00:19:57,255
not to belabor the points, but
why these approaches are really

513
00:19:57,255 --> 00:19:59,130
valuable in this
application is it allows you

514
00:19:59,130 --> 00:20:00,870
to exhaustively analyze slides.

515
00:20:00,870 --> 00:20:02,663
So a pathologist,
the reason they're

516
00:20:02,663 --> 00:20:05,080
making so many errors is they're
just kind of overwhelmed.

517
00:20:05,080 --> 00:20:06,247
I mean, there's two reasons.

518
00:20:06,247 --> 00:20:08,880
One is humans aren't good at
interpreting visual patterns.

519
00:20:08,880 --> 00:20:11,380
Actually, I think that's not
the real reason, because humans

520
00:20:11,380 --> 00:20:12,730
are pretty darn good at that.

521
00:20:12,730 --> 00:20:14,938
And there are difficult
things where we can disagree,

522
00:20:14,938 --> 00:20:18,750
but when people focus on small
images, frequently they agree.

523
00:20:18,750 --> 00:20:21,780
But these images are
enormous, and humans just

524
00:20:21,780 --> 00:20:24,270
don't have enough time to
study carefully every cell

525
00:20:24,270 --> 00:20:25,170
on every slide.

526
00:20:25,170 --> 00:20:27,450
Whereas, the computer,
in a real way,

527
00:20:27,450 --> 00:20:30,000
can be forced to
exhaustively analyze

528
00:20:30,000 --> 00:20:33,940
every cell on every slide, and
that's just a huge difference.

529
00:20:33,940 --> 00:20:34,950
It's quantitative.

530
00:20:34,950 --> 00:20:36,420
I mean, this is one thing
the computer is definitely

531
00:20:36,420 --> 00:20:36,920
better at.

532
00:20:36,920 --> 00:20:39,150
It can compute huge
numerators, huge denominators,

533
00:20:39,150 --> 00:20:40,635
and exactly compute proportions.

534
00:20:40,635 --> 00:20:42,510
Whereas, when a person
is looking at a slide,

535
00:20:42,510 --> 00:20:44,677
they're really just eyeballing
some percentage based

536
00:20:44,677 --> 00:20:46,050
on a very small amount of data.

537
00:20:46,050 --> 00:20:47,110
It's super efficient.

538
00:20:47,110 --> 00:20:49,590
So you can analyze--

539
00:20:49,590 --> 00:20:52,710
this whole process is
massively parallelizable,

540
00:20:52,710 --> 00:20:54,690
so you can almost
do a slide as fast

541
00:20:54,690 --> 00:20:57,978
as you want based on how much
you're willing to spend on it.

542
00:20:57,978 --> 00:21:00,270
And it allows you not only
to do all of these, sort of,

543
00:21:00,270 --> 00:21:02,980
automation tasks exhaustively,
quantitatively, and efficiently

544
00:21:02,980 --> 00:21:05,610
but also discover a lot of new
insights from the data, which

545
00:21:05,610 --> 00:21:07,312
I think we did in
a very early way,

546
00:21:07,312 --> 00:21:09,020
back eight years ago,
when we sort of had

547
00:21:09,020 --> 00:21:11,640
human-extracted features
and correlated those with outcome.

548
00:21:11,640 --> 00:21:13,890
But now you can really
supervise the whole process

549
00:21:13,890 --> 00:21:15,600
with machine learning
of how you go

550
00:21:15,600 --> 00:21:19,200
from the components of an
image to patient outcomes

551
00:21:19,200 --> 00:21:24,330
and learn new biology that
you didn't know going in.

552
00:21:24,330 --> 00:21:25,800
And everyone's
always like, well,

553
00:21:25,800 --> 00:21:27,592
are you just going to
replace pathologists?

554
00:21:27,592 --> 00:21:31,080
And I really don't think this
is, in any way, the future.

555
00:21:31,080 --> 00:21:35,850
In almost every field that's
sort of like where automation

556
00:21:35,850 --> 00:21:38,550
is becoming very
common, the demand

557
00:21:38,550 --> 00:21:41,440
for people who are experts
in that area is increasing.

558
00:21:41,440 --> 00:21:43,470
And like airplane
pilots is one I was just

559
00:21:43,470 --> 00:21:44,792
learning about today.

560
00:21:44,792 --> 00:21:46,500
They just do a completely
different thing

561
00:21:46,500 --> 00:21:48,330
than they did 20 years
ago, and now it's

562
00:21:48,330 --> 00:21:51,450
all about mission control
of this big system

563
00:21:51,450 --> 00:21:53,575
and understanding all the
flight management systems

564
00:21:53,575 --> 00:21:55,533
and understanding all
the data they're getting.

565
00:21:55,533 --> 00:21:57,900
And I think the job has not
gotten necessarily simpler,

566
00:21:57,900 --> 00:21:59,340
but they're much more
effective, and they're doing

567
00:21:59,340 --> 00:22:00,790
much different types of work.

568
00:22:00,790 --> 00:22:01,920
And I do think
the pathologist is

569
00:22:01,920 --> 00:22:03,337
going to move from
sort of staring

570
00:22:03,337 --> 00:22:06,150
into a microscope with a
literally very myopic focus

571
00:22:06,150 --> 00:22:08,368
on very small
things to being more

572
00:22:08,368 --> 00:22:10,410
of a consultant with
physicians, integrating lots

573
00:22:10,410 --> 00:22:13,260
of different types
of data, things

574
00:22:13,260 --> 00:22:15,210
that AI is really bad
at, a lot of reasoning

575
00:22:15,210 --> 00:22:19,033
about specific instances,
and then providing

576
00:22:19,033 --> 00:22:20,200
that guidance to physicians.

577
00:22:20,200 --> 00:22:22,075
So I think the job will
look a lot different,

578
00:22:22,075 --> 00:22:25,485
but we'll need more
diagnosticians in the future

579
00:22:25,485 --> 00:22:28,420
than in the past.

580
00:22:28,420 --> 00:22:30,690
So one example-- I think
we sent out a reading

581
00:22:30,690 --> 00:22:33,900
about this-- is this concept
of breast cancer metastasis

582
00:22:33,900 --> 00:22:36,780
as a good use case
of machine learning.

583
00:22:36,780 --> 00:22:38,520
And this is just
a patient example.

584
00:22:38,520 --> 00:22:41,370
So a primary mass is discovered.

585
00:22:41,370 --> 00:22:44,460
So one of the big
determinants of the prognosis

586
00:22:44,460 --> 00:22:47,088
from a primary tumor is has
it spread to the lymph nodes?

587
00:22:47,088 --> 00:22:48,630
Because that's one
of the first areas

588
00:22:48,630 --> 00:22:51,240
that tumors metastasize to.

589
00:22:51,240 --> 00:22:53,695
And the way to diagnose whether
tumors have metastasized

590
00:22:53,695 --> 00:22:55,950
to lymph nodes is
to take a biopsy

591
00:22:55,950 --> 00:22:58,200
and then evaluate those
for the presence of cancer

592
00:22:58,200 --> 00:23:00,960
where it shouldn't be.

593
00:23:00,960 --> 00:23:04,980
And this is a task that's very
quantitative and very tedious.

594
00:23:04,980 --> 00:23:08,700
So the International Symposium
on Biomedical Imaging

595
00:23:08,700 --> 00:23:12,090
organized this challenge called
the CAMELYON16 challenge,

596
00:23:12,090 --> 00:23:14,910
where they put together almost
300 training slides and about

597
00:23:14,910 --> 00:23:16,740
130 test slides.

598
00:23:16,740 --> 00:23:19,950
And they asked a bunch of teams
to build machine learning based

599
00:23:19,950 --> 00:23:23,940
systems to automate the
evaluation of the test

600
00:23:23,940 --> 00:23:26,940
slides, both to diagnose whether
the slide contained cancer

601
00:23:26,940 --> 00:23:29,490
or not, as well as to actually
identify where in the slides

602
00:23:29,490 --> 00:23:31,560
the cancer was located.

603
00:23:31,560 --> 00:23:34,170
And kind of the big machine
learning challenge here,

604
00:23:34,170 --> 00:23:38,790
why you can't just throw
it into an off-the-shelf

605
00:23:38,790 --> 00:23:42,620
or on the web image
classification tool

606
00:23:42,620 --> 00:23:46,650
is the images are so
large that it's just not

607
00:23:46,650 --> 00:23:50,430
feasible to throw
the whole image

608
00:23:50,430 --> 00:23:53,070
into any kind of neural net.

609
00:23:53,070 --> 00:23:57,150
Because they can be
between 20,000 and 200,000

610
00:23:57,150 --> 00:23:58,260
pixels on a side.

611
00:23:58,260 --> 00:24:03,840
So they can have billions of pixels.
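
A rough back-of-the-envelope makes the scale concrete: a mid-sized slide of 100,000 × 100,000 pixels contains 10^10 pixels, which at 3 bytes per RGB pixel is about 30 GB uncompressed -- orders of magnitude more than fits through a neural net in one forward pass, hence the patch-based approach described next.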

612
00:24:03,840 --> 00:24:06,540
And for that, we do
this process where

613
00:24:06,540 --> 00:24:08,280
we start with a
labeled data set,

614
00:24:08,280 --> 00:24:10,920
where there are these very
large regions labeled either

615
00:24:10,920 --> 00:24:12,960
as normal or tumor.

616
00:24:12,960 --> 00:24:14,970
And then we build
procedures, which is actually

617
00:24:14,970 --> 00:24:17,460
a key component of getting
machine learning to work well,

618
00:24:17,460 --> 00:24:20,550
of sampling patches of images
and putting those patches

619
00:24:20,550 --> 00:24:22,110
into the model.

620
00:24:22,110 --> 00:24:23,760
And this sampling
procedure is actually

621
00:24:23,760 --> 00:24:26,820
incredibly important
for controlling

622
00:24:26,820 --> 00:24:29,160
the behavior of the
system, because you could

623
00:24:29,160 --> 00:24:30,510
sample in all different ways.

624
00:24:30,510 --> 00:24:32,218
You're never going to
sample exhaustively

625
00:24:32,218 --> 00:24:35,160
just because there's far
too many possible patches.

626
00:24:35,160 --> 00:24:37,920
So thinking about the right
examples to show the system

627
00:24:37,920 --> 00:24:40,950
has an enormous effect
on both the performance

628
00:24:40,950 --> 00:24:43,800
and the generalizability of
the systems you're building.

629
00:24:43,800 --> 00:24:45,900
And some of the, sort
of, insights we learned

630
00:24:45,900 --> 00:24:49,290
was how best to do
the, sort of, sampling.

631
00:24:49,290 --> 00:24:51,832
But once you have these samples,
it's all data driven-- sure.

632
00:24:51,832 --> 00:24:54,123
AUDIENCE: Can you talk more
about the sampling strategy

633
00:24:54,123 --> 00:24:54,740
schemes?

634
00:24:54,740 --> 00:24:58,200
ANDY BECK: Yeah, so
from a high level,

635
00:24:58,200 --> 00:25:01,800
you want to go from
random sampling, which

636
00:25:01,800 --> 00:25:06,000
is a reasonable thing to do,
to more intelligent sampling,

637
00:25:06,000 --> 00:25:09,210
based on knowing what
the computer needs

638
00:25:09,210 --> 00:25:12,220
to learn more about.

639
00:25:12,220 --> 00:25:15,413
And one thing we've done and--

640
00:25:15,413 --> 00:25:17,580
so it's sort of like
figuring-- so the first step is

641
00:25:17,580 --> 00:25:19,050
sort of simple.

642
00:25:19,050 --> 00:25:21,158
You can randomly sample.

643
00:25:21,158 --> 00:25:22,950
But then the second
part is a little harder

644
00:25:22,950 --> 00:25:24,810
to figure out what
examples do you

645
00:25:24,810 --> 00:25:27,540
want to enrich your
training set for to make

646
00:25:27,540 --> 00:25:29,180
the system perform even better?

647
00:25:29,180 --> 00:25:31,680
And there's different things
you can optimize for, for that.

648
00:25:31,680 --> 00:25:33,780
So it's sort of like this
whole sampling actually

649
00:25:33,780 --> 00:25:35,197
being part of the
machine learning

650
00:25:35,197 --> 00:25:37,383
procedure is quite useful.

651
00:25:37,383 --> 00:25:39,300
And you're not just going
to be sampling once.

652
00:25:39,300 --> 00:25:41,485
You could iterate on
this and keep providing

653
00:25:41,485 --> 00:25:42,610
different types of samples.

654
00:25:42,610 --> 00:25:45,030
So for example, if
you learn that it's

655
00:25:45,030 --> 00:25:48,090
making certain types
of errors, or it

656
00:25:48,090 --> 00:25:49,602
hasn't seen enough of certain--

657
00:25:49,602 --> 00:25:51,060
there's many ways
of getting at it.

658
00:25:51,060 --> 00:25:54,000
But if you know it hasn't
seen enough types of examples

659
00:25:54,000 --> 00:25:56,640
in your training set, you
can over-sample for that.

660
00:25:56,640 --> 00:25:58,500
Or if you see you have
a confusion matrix

661
00:25:58,500 --> 00:26:00,450
and you see it's failing
on certain types,

662
00:26:00,450 --> 00:26:02,617
you can try to figure out
why is it failing on those

663
00:26:02,617 --> 00:26:04,950
and alter the sampling
procedure to enrich for that.

664
00:26:04,950 --> 00:26:07,740
You could even provide
outputs to humans,

665
00:26:07,740 --> 00:26:11,460
who can point you to the areas
where it's making mistakes.

666
00:26:11,460 --> 00:26:13,860
Because often you don't
have exhaustively labeled data.

667
00:26:13,860 --> 00:26:16,720
In this case, we actually
did have exhaustively labeled

668
00:26:16,720 --> 00:26:17,220
slides.

669
00:26:17,220 --> 00:26:18,695
So it was somewhat easier.

670
00:26:18,695 --> 00:26:20,820
But you can see there's
even a lot of heterogeneity

671
00:26:20,820 --> 00:26:22,270
within the different classes.

672
00:26:22,270 --> 00:26:25,950
So you might do some
clever tricks to figure out

673
00:26:25,950 --> 00:26:28,470
what are the types of the red
class that it's getting wrong,

674
00:26:28,470 --> 00:26:31,330
and how am I going to fix that
by providing it more examples?

675
00:26:31,330 --> 00:26:34,650
So I think, sort of, that's
one of the easier things

676
00:26:34,650 --> 00:26:35,490
to control.

677
00:26:35,490 --> 00:26:38,310
Rather than trying to
tune other parameters

678
00:26:38,310 --> 00:26:41,730
within these super complicated
networks, in our experience,

679
00:26:41,730 --> 00:26:44,787
just playing with the
training, the sampling

680
00:26:44,787 --> 00:26:46,620
piece of the training,
it should almost just

681
00:26:46,620 --> 00:26:48,037
be thought of as
another parameter

682
00:26:48,037 --> 00:26:50,670
to optimize for when you're
dealing with a problem

683
00:26:50,670 --> 00:26:53,040
where you have humongous
slides and you can't

684
00:26:53,040 --> 00:26:56,090
use all the training data.
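
A hedged sketch of that enrichment idea (hard-example mining; the predict function and coarse label mask below are hypothetical stand-ins for a real patch classifier and annotation): sample randomly at first, find where the current model is wrong, then over-sample those regions in the next training round.

```python
import numpy as np

rng = np.random.default_rng(3)
grid = 78                                      # coarse patch grid over a slide
label_mask = rng.integers(0, 2, (grid, grid))  # 1 = tumor patch (toy labels)

def predict(coords):
    """Stand-in for the current model's per-patch tumor calls."""
    return rng.integers(0, 2, len(coords))

coords = rng.integers(0, grid, (5000, 2))      # round 1: random patch sampling
truth = label_mask[coords[:, 0], coords[:, 1]]
wrong = predict(coords) != truth               # where the model errs

# Round 2: keep some random patches but triple-weight the error regions.
round2 = np.concatenate([coords[~wrong][:1000]] + [coords[wrong]] * 3)
print(f"{wrong.mean():.0%} errors; round 2 resamples {len(round2)} patches")
```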

685
00:26:56,090 --> 00:26:59,250
AUDIENCE: So decades ago,
I met some pathologists

686
00:26:59,250 --> 00:27:03,390
who were looking at
cervical cancer screening.

687
00:27:03,390 --> 00:27:08,130
And they thought that you
could detect a gradient

688
00:27:08,130 --> 00:27:11,140
in the degree of atypia.

689
00:27:11,140 --> 00:27:15,210
And so not at training time
but at testing time, what

690
00:27:15,210 --> 00:27:18,300
they were trying to do was to
follow that gradient in order

691
00:27:18,300 --> 00:27:25,450
to find the most atypical
part of the image.

692
00:27:25,450 --> 00:27:27,583
Is that still
believed to be true?

693
00:27:27,583 --> 00:27:28,250
ANDY BECK: Yeah.

694
00:27:28,250 --> 00:27:29,890
That it's a continuum?

695
00:27:29,890 --> 00:27:31,380
Yeah, definitely.

696
00:27:31,380 --> 00:27:35,240
PROFESSOR: You mean within
a sample and in the slides.

697
00:27:35,240 --> 00:27:36,980
ANDY BECK: Yeah, I
mean, you mean just

698
00:27:36,980 --> 00:27:39,050
like a continuum
of aggressiveness.

699
00:27:39,050 --> 00:27:40,570
Yeah, I think it is a continuum.

700
00:27:40,570 --> 00:27:43,460
I mean, this is more
of a binary task,

701
00:27:43,460 --> 00:27:45,590
but there's going
to be continuums

702
00:27:45,590 --> 00:27:47,720
of grade within the cancer.

703
00:27:47,720 --> 00:27:50,000
I mean, that's another
level of adding on.

704
00:27:50,000 --> 00:27:52,130
If we wanted to correlate
this with outcome,

705
00:27:52,130 --> 00:27:54,380
it would definitely be
valuable to do that.

706
00:27:54,380 --> 00:27:56,730
To not just say quantitate
the bulk of tumor

707
00:27:56,730 --> 00:28:00,770
but to estimate the malignancy
of every individual nucleus,

708
00:28:00,770 --> 00:28:02,270
which we can do also.

709
00:28:02,270 --> 00:28:05,180
So you can actually classify,
not just tumor region

710
00:28:05,180 --> 00:28:06,830
but you can classify
individual cells.

711
00:28:06,830 --> 00:28:08,893
And you can classify
them based on malignancy.

712
00:28:08,893 --> 00:28:10,310
And then you can
get the, sort of,

713
00:28:10,310 --> 00:28:12,500
gradient within a population.

714
00:28:12,500 --> 00:28:16,130
In this study, it was just
region-based, not cell-based,

715
00:28:16,130 --> 00:28:18,800
but you can definitely do
that, and definitely, it's

716
00:28:18,800 --> 00:28:19,390
a spectrum.

717
00:28:19,390 --> 00:28:21,140
I mean, it's kind of
like the atypia idea.

718
00:28:21,140 --> 00:28:23,690
Everything in biology is
pretty much on a spectrum,

719
00:28:23,690 --> 00:28:27,080
like from normal to atypical
to low-grade cancer,

720
00:28:27,080 --> 00:28:29,840
medium-grade cancer,
high-grade cancer,

721
00:28:29,840 --> 00:28:31,370
and these sorts of
methods do allow

722
00:28:31,370 --> 00:28:34,250
you to really more
precisely estimate

723
00:28:34,250 --> 00:28:35,827
where you are on that continuum.

724
00:28:38,690 --> 00:28:41,350
And that's the basic approach.

725
00:28:41,350 --> 00:28:42,910
We get the big
whole-slide images.

726
00:28:42,910 --> 00:28:44,667
We figure out how
to sample patches

727
00:28:44,667 --> 00:28:46,750
from the different regions
to optimize performance

728
00:28:46,750 --> 00:28:48,370
of the model during
training time.

729
00:28:48,370 --> 00:28:50,153
And then during
testing time, just we

730
00:28:50,153 --> 00:28:51,570
take a whole big
whole-slide image.

731
00:28:51,570 --> 00:28:53,680
We break it into millions
of little patches.

732
00:28:53,680 --> 00:28:55,662
Send each patch individually.

733
00:28:55,662 --> 00:28:57,370
We don't actually--
you could potentially

734
00:28:57,370 --> 00:28:59,740
use spatial information
about how close they

735
00:28:59,740 --> 00:29:01,240
are to each other,
which would make

736
00:29:01,240 --> 00:29:02,992
the process less efficient.

737
00:29:02,992 --> 00:29:03,700
We don't do that.

738
00:29:03,700 --> 00:29:05,320
We just send them
in individually

739
00:29:05,320 --> 00:29:08,440
and then visualize the
output as a heat map.

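A minimal sketch of this patch-based inference loop, assuming a generic per-patch classifier; the function names, patch size, and stride here are illustrative, not PathAI's actual pipeline:

    import numpy as np

    def tumor_heatmap(slide, model, patch=256, stride=256):
        # Tile the whole-slide image, score each patch independently,
        # and assemble the per-patch tumor probabilities into a heat map.
        h, w = slide.shape[:2]
        heat = np.zeros((h // stride, w // stride), dtype=np.float32)
        for i in range(0, h - patch + 1, stride):
            for j in range(0, w - patch + 1, stride):
                tile = slide[i:i + patch, j:j + patch]
                # No spatial context is shared between patches, which is
                # exactly what makes the process embarrassingly parallel.
                heat[i // stride, j // stride] = model(tile)
        return heat
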
740
00:29:10,960 --> 00:29:13,030
And this, I think,
isn't in the reference

741
00:29:13,030 --> 00:29:15,700
I sent. The one
I sent showed how

742
00:29:15,700 --> 00:29:19,750
you were able to combine the
estimates of the deep learning

743
00:29:19,750 --> 00:29:22,420
system with the human
pathologist's estimate

744
00:29:22,420 --> 00:29:25,870
to make the human pathologist's
error rate go down by 85%

745
00:29:25,870 --> 00:29:28,147
and get to less than 1%.

746
00:29:28,147 --> 00:29:30,730
And the interesting thing about
how these systems keep getting

747
00:29:30,730 --> 00:29:32,647
better over time and
potentially they over-fit

748
00:29:32,647 --> 00:29:34,640
to the competition data set--

749
00:29:34,640 --> 00:29:36,910
because I think we submitted,
maybe, three times,

750
00:29:36,910 --> 00:29:38,020
which isn't that many.

751
00:29:38,020 --> 00:29:41,890
But over the course of six
months after the first closing

752
00:29:41,890 --> 00:29:44,650
of the competition, people kept
competing and making systems

753
00:29:44,650 --> 00:29:45,220
better.

754
00:29:45,220 --> 00:29:46,887
And actually, the
fully automated system

755
00:29:46,887 --> 00:29:49,840
on this data set achieved an
error rate of less than 1%

756
00:29:49,840 --> 00:29:53,260
by the final submission date,
which was significantly better

757
00:29:53,260 --> 00:29:55,810
than both the pathologists
in the competition, which

758
00:29:55,810 --> 00:29:58,720
is the error rate, I believe,
cited in the initial arXiv

759
00:29:58,720 --> 00:30:00,210
paper.

760
00:30:00,210 --> 00:30:01,960
And also, they took
the same set of slides

761
00:30:01,960 --> 00:30:03,760
and sent them out to
pathologists operating

762
00:30:03,760 --> 00:30:06,850
in clinical practice, where
they had really significantly

763
00:30:06,850 --> 00:30:09,110
higher error rates,
mainly due to the fact,

764
00:30:09,110 --> 00:30:11,650
they were more constrained
by time limitations

765
00:30:11,650 --> 00:30:13,840
in clinical practice
than in the competition.

766
00:30:13,840 --> 00:30:15,820
And most of the errors they
are making are false negatives.

767
00:30:15,820 --> 00:30:17,528
Simply, they don't
have the time to focus

768
00:30:17,528 --> 00:30:21,610
on small regions of metastasis
amid these humongous

769
00:30:21,610 --> 00:30:24,432
gigapixel-sized slides.

770
00:30:24,432 --> 00:30:27,780
AUDIENCE: In the paper, you
say you combined the machine

771
00:30:27,780 --> 00:30:29,870
learning outputs with
the pathologists,

772
00:30:29,870 --> 00:30:31,410
but you don't really say how.

773
00:30:31,410 --> 00:30:33,790
Is it that they
look at the heat maps,

774
00:30:33,790 --> 00:30:36,718
or is it just sort of combined?

775
00:30:36,718 --> 00:30:38,510
ANDY BECK: Yeah, no,
it's a great question.

776
00:30:38,510 --> 00:30:41,510
So today, we do it that way.

777
00:30:41,510 --> 00:30:43,150
And that's the way
in clinical practice

778
00:30:43,150 --> 00:30:45,700
we're building it, that the
pathologists will look at both

779
00:30:45,700 --> 00:30:48,610
and then make a diagnosis
based on incorporating both.

780
00:30:48,610 --> 00:30:51,040
For the competition,
it was very simple,

781
00:30:51,040 --> 00:30:52,690
and the organizers
actually did it.

782
00:30:52,690 --> 00:30:54,190
They interpreted
them independently.

783
00:30:54,190 --> 00:30:56,273
So the pathologists just
looked at all the slides.

784
00:30:56,273 --> 00:30:57,620
Our system made a prediction.

785
00:30:57,620 --> 00:31:00,040
It was literally the
average of the probability

786
00:31:00,040 --> 00:31:01,615
that that slide
contained cancer.

787
00:31:01,615 --> 00:31:03,490
That became the final
score, and then the AUC

788
00:31:03,490 --> 00:31:06,100
went to 99% from
whatever it was,

789
00:31:06,100 --> 00:31:08,840
92% by combining
these two scores.

790
00:31:08,840 --> 00:31:10,840
AUDIENCE: I guess they
make uncorrelated errors.

791
00:31:10,840 --> 00:31:11,632
ANDY BECK: Exactly.

792
00:31:11,632 --> 00:31:13,110
They're pretty
much uncorrelated,

793
00:31:13,110 --> 00:31:14,860
particularly because
the pathologists tend

794
00:31:14,860 --> 00:31:16,990
to have almost all
false negatives,

795
00:31:16,990 --> 00:31:20,050
and the deep
learning system tends

796
00:31:20,050 --> 00:31:22,090
to be fooled by a few
things, like artefact.

797
00:31:22,090 --> 00:31:24,190
And they do make
uncorrelated errors,

798
00:31:24,190 --> 00:31:26,275
and that's why there's a
huge bump in performance.

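The combination rule described above is literally an average of two per-slide probabilities, and the gain comes from the two scorers making uncorrelated errors. A hedged sketch (the score arrays are hypothetical; roc_auc_score is scikit-learn's standard AUC):

    import numpy as np
    from sklearn.metrics import roc_auc_score

    y_true = np.array([0, 1, 1, 0, 1, 0])                   # slide contains cancer?
    path_score = np.array([0.1, 0.4, 0.9, 0.2, 0.3, 0.1])   # pathologist's estimate
    model_score = np.array([0.3, 0.8, 0.7, 0.1, 0.9, 0.4])  # deep learning estimate

    combined = (path_score + model_score) / 2               # simple averaging
    for name, s in [("pathologist", path_score),
                    ("model", model_score),
                    ("combined", combined)]:
        print(name, roc_auc_score(y_true, s))
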
799
00:31:31,230 --> 00:31:33,180
So I kind of made a
reference to this,

800
00:31:33,180 --> 00:31:35,280
but any of these
competition data sets

801
00:31:35,280 --> 00:31:38,335
are relatively easy
to get really good at.

802
00:31:38,335 --> 00:31:39,960
People have shown
that you can actually

803
00:31:39,960 --> 00:31:42,757
build models that just predict
which data set an image came from, using deep learning.

804
00:31:42,757 --> 00:31:44,340
Like, deep learning
is almost too good

805
00:31:44,340 --> 00:31:48,217
at finding certain patterns
and can find artefact.

806
00:31:48,217 --> 00:31:49,800
So it's just a caveat
to keep in mind.

807
00:31:49,800 --> 00:31:55,230
We're doing experiments on
lots of real-world testing

808
00:31:55,230 --> 00:31:57,017
of methods like this
across many labs

809
00:31:57,017 --> 00:31:59,100
with many different staining
procedures and tissue

810
00:31:59,100 --> 00:32:00,990
preparation
procedures, et cetera,

811
00:32:00,990 --> 00:32:02,460
to evaluate the robustness.

812
00:32:02,460 --> 00:32:05,250
But that's why competition
results, even ImageNet, always

813
00:32:05,250 --> 00:32:09,890
need to be taken
with a grain of salt.

814
00:32:09,890 --> 00:32:12,073
But we sort
of think the value add

815
00:32:12,073 --> 00:32:13,240
of this is going to be huge.

816
00:32:13,240 --> 00:32:15,290
I mean, it's hard to tell
because it's such a big image,

817
00:32:15,290 --> 00:32:16,420
but this is what a
pathologist today is

818
00:32:16,420 --> 00:32:18,112
looking at under a
microscope, and it's

819
00:32:18,112 --> 00:32:19,195
very hard to see anything.

820
00:32:19,195 --> 00:32:22,540
And with a very simple
visualization, just of

821
00:32:22,540 --> 00:32:25,450
the output of the AI system as
red where cancer looks like it

822
00:32:25,450 --> 00:32:26,350
is,

823
00:32:26,350 --> 00:32:28,720
it's clearly a sort of
great map of the areas

824
00:32:28,720 --> 00:32:30,580
they need to be
sure to focus on.

825
00:32:30,580 --> 00:32:32,860
And this is real data
from this example, where

826
00:32:32,860 --> 00:32:35,800
this bright red area, in fact,
contains this tiny little rim

827
00:32:35,800 --> 00:32:37,960
of metastatic
breast cancer cells

828
00:32:37,960 --> 00:32:41,020
that would be very easy to miss
without that assistant sort

829
00:32:41,020 --> 00:32:43,210
of just pointing you in
the right place to look at,

830
00:32:43,210 --> 00:32:45,670
because it's a tiny
set of 20 cells

831
00:32:45,670 --> 00:32:48,622
amid a big sea of all
these normal lymphocytes.

832
00:32:48,622 --> 00:32:50,080
And here's another
one that, again,

833
00:32:50,080 --> 00:32:51,790
now you can see from low power.

834
00:32:51,790 --> 00:32:53,498
It's like a satellite
image or something,

835
00:32:53,498 --> 00:32:56,420
where you can focus immediately
on this little red area, that,

836
00:32:56,420 --> 00:32:58,960
again, is a tiny pocket
of 10 cancer cells

837
00:32:58,960 --> 00:33:01,780
amid hundreds of thousands
of normal cells that are now

838
00:33:01,780 --> 00:33:05,750
visible from low power.

839
00:33:05,750 --> 00:33:10,010
So this is one application
we're working on,

840
00:33:10,010 --> 00:33:13,490
where the clinical
use case will be

841
00:33:13,490 --> 00:33:15,770
today, people are just
sort of looking at images

842
00:33:15,770 --> 00:33:17,900
without the assistance
of any machine learning.

843
00:33:17,900 --> 00:33:20,450
And they just have to kind
of pick a number of patches

844
00:33:20,450 --> 00:33:22,265
to focus on with no guidance.

845
00:33:22,265 --> 00:33:24,140
So sometimes they focus
on the right patches,

846
00:33:24,140 --> 00:33:26,510
sometimes they don't, but
clearly they don't have time

847
00:33:26,510 --> 00:33:29,030
to look at all of this
at high magnification,

848
00:33:29,030 --> 00:33:30,970
because that would
take an entire day

849
00:33:30,970 --> 00:33:33,020
if you were trying to
look at 40X magnification

850
00:33:33,020 --> 00:33:33,890
at the whole image.

851
00:33:33,890 --> 00:33:35,810
So they sort of use
their intuition to focus.

852
00:33:35,810 --> 00:33:37,185
And for that
reason, they end up,

853
00:33:37,185 --> 00:33:39,878
as we've seen, making
a significant number of mistakes.

854
00:33:39,878 --> 00:33:41,420
It's not reproducible,
because people

855
00:33:41,420 --> 00:33:43,180
focus on different
aspects of the image,

856
00:33:43,180 --> 00:33:44,450
and it's pretty slow.

857
00:33:44,450 --> 00:33:46,240
And they're faced with
this empty report.

858
00:33:46,240 --> 00:33:47,810
So they have to actually
summarize everything

859
00:33:47,810 --> 00:33:49,070
they've looked at in a report.

860
00:33:49,070 --> 00:33:50,240
Like, what's the diagnosis?

861
00:33:50,240 --> 00:33:51,546
What's the size?

862
00:33:51,546 --> 00:33:53,796
So let's say there's cancer
here and cancer here, they

863
00:33:53,796 --> 00:33:56,570
have to manually add the
distances of the cancer

864
00:33:56,570 --> 00:33:57,900
in those two regions.

865
00:33:57,900 --> 00:34:01,580
And then they have to put this
into a staging system that

866
00:34:01,580 --> 00:34:04,010
incorporates how many areas
of metastasis there are

867
00:34:04,010 --> 00:34:05,050
and how big they are.

868
00:34:05,050 --> 00:34:07,217
And all of these things are
pretty much automatable.

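Drafting those report fields is plain aggregation once the model has segmented the metastatic foci. A sketch with illustrative thresholds only (real staging follows AJCC criteria, which this does not implement):

    def summarize_metastases(regions_mm):
        # regions_mm: hypothetical list of largest diameters (mm), one per focus.
        n_foci = len(regions_mm)
        largest = max(regions_mm) if regions_mm else 0.0
        if largest == 0.0:
            category = "no metastasis detected"
        elif largest < 2.0:
            category = "micrometastasis (pathologist review)"
        else:
            category = "macrometastasis (pathologist review)"
        return {"num_foci": n_foci, "largest_mm": largest, "category": category}

    print(summarize_metastases([0.4, 3.1]))  # draft values for the report
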
869
00:34:07,217 --> 00:34:08,675
And this is the
kind of thing we're

870
00:34:08,675 --> 00:34:11,630
building, where the system will
highlight where it sees cancer,

871
00:34:11,630 --> 00:34:13,489
tell the pathologist
to focus there.

872
00:34:13,489 --> 00:34:15,679
And then based on the
input of the AI system

873
00:34:15,679 --> 00:34:18,440
and the input of the pathologist,
it can summarize all of that data,

874
00:34:18,440 --> 00:34:21,620
quantitative as
well as diagnostic

875
00:34:21,620 --> 00:34:23,389
as well as summary staging.

876
00:34:23,389 --> 00:34:25,190
Sort of if the pathologist
then takes this

877
00:34:25,190 --> 00:34:27,080
as their first
version of the report,

878
00:34:27,080 --> 00:34:29,710
they can edit it,
confirm it, sign it out.

879
00:34:29,710 --> 00:34:31,460
That data goes back
into the system, which

880
00:34:31,460 --> 00:34:33,460
can be used for more
training data in the future

881
00:34:33,460 --> 00:34:34,850
and the case is signed out.

882
00:34:34,850 --> 00:34:38,239
So it's much faster, much more
accurate, and standardized

883
00:34:38,239 --> 00:34:43,080
once this thing is fully
developed, which it isn't yet.

884
00:34:43,080 --> 00:34:45,480
So this is a great
application for AI,

885
00:34:45,480 --> 00:34:47,670
because you really do need--

886
00:34:47,670 --> 00:34:49,245
you actually do
have a ton of data,

887
00:34:49,245 --> 00:34:51,120
so you need to do an
exhaustive analysis that

888
00:34:51,120 --> 00:34:54,025
has a lot of value.

889
00:34:54,025 --> 00:34:57,060
It's a task where the local
image data in a patch,

890
00:34:57,060 --> 00:34:59,010
which is really what
this current generation

891
00:34:59,010 --> 00:35:01,350
of deep CNNs are really
good at, is enough.

892
00:35:01,350 --> 00:35:03,600
So we're looking at things
at the cellular level.

893
00:35:03,600 --> 00:35:05,070
Radiology actually
could be harder,

894
00:35:05,070 --> 00:35:07,320
because you often want to
summarize over larger areas.

895
00:35:07,320 --> 00:35:10,800
Here, you really often have
the salient information

896
00:35:10,800 --> 00:35:14,477
in patches that really are
scalable in current ML systems.

897
00:35:14,477 --> 00:35:16,560
And then we can interpret
the output of the model.

898
00:35:16,560 --> 00:35:19,060
So it really isn't-- even though
the model itself is a black

899
00:35:19,060 --> 00:35:22,740
box, we can visualize the
output on top of the image,

900
00:35:22,740 --> 00:35:24,690
which gives us incredible
advantage in terms

901
00:35:24,690 --> 00:35:27,210
of interpretability of what
the models are doing well,

902
00:35:27,210 --> 00:35:29,010
what they're doing poorly on.

903
00:35:29,010 --> 00:35:31,320
And it's a specialty,
pathology, where sort of 80%

904
00:35:31,320 --> 00:35:32,320
is not good enough.

905
00:35:32,320 --> 00:35:37,640
We want to get as close
to 100% as possible.

906
00:35:37,640 --> 00:35:39,850
And that's one sort of
diagnostic application.

907
00:35:39,850 --> 00:35:42,250
The last, or one of the last
examples I'm going to give

908
00:35:42,250 --> 00:35:44,542
has to do with precision
immunotherapy, where we're not

909
00:35:44,542 --> 00:35:47,590
only trying to identify
what the diagnosis is but

910
00:35:47,590 --> 00:35:51,207
to actually subtype patients
to predict the right treatment.

911
00:35:51,207 --> 00:35:52,915
And as I mentioned
earlier, immunotherapy

912
00:35:52,915 --> 00:35:56,273
is a really important and
exciting, relatively new area

913
00:35:56,273 --> 00:35:57,940
of cancer therapy,
which was another one

914
00:35:57,940 --> 00:35:59,770
of the big advances in 2012.

915
00:35:59,770 --> 00:36:02,230
Around the same time that
deep learning came out,

916
00:36:02,230 --> 00:36:04,270
the first studies
came out showing

917
00:36:04,270 --> 00:36:08,410
that targeting a protein
mostly on tumor cells

918
00:36:08,410 --> 00:36:12,010
but also on immune cells, the
PD-1 or the PD-L1 protein,

919
00:36:12,010 --> 00:36:13,720
whose job,
when it's on,

920
00:36:13,720 --> 00:36:15,860
is to inhibit immune response.

921
00:36:15,860 --> 00:36:18,195
But in the setting of
cancer, the inhibition

922
00:36:18,195 --> 00:36:20,320
of immune response is
actually bad for the patient,

923
00:36:20,320 --> 00:36:22,540
because the immune system's
job is to really try

924
00:36:22,540 --> 00:36:24,230
to fight off the cancer.

925
00:36:24,230 --> 00:36:26,610
So they realized a very
simple therapeutic strategy:

926
00:36:26,610 --> 00:36:30,310
just having an antibody that
binds to this inhibitory signal

927
00:36:30,310 --> 00:36:32,650
can sort of unleash the
patient's own immune system

928
00:36:32,650 --> 00:36:36,280
to really end up curing really
serious advanced cancers.

929
00:36:36,280 --> 00:36:38,140
And that image on
the top right sort of

930
00:36:38,140 --> 00:36:40,150
speaks to that, where
this patient had

931
00:36:40,150 --> 00:36:43,030
a very large melanoma.

932
00:36:43,030 --> 00:36:45,310
And then they just got
this antibody to target,

933
00:36:45,310 --> 00:36:47,770
to sort of invigorate
their immune system,

934
00:36:47,770 --> 00:36:50,200
and then the tumor
really shrunk.

935
00:36:50,200 --> 00:36:53,170
And one of the big biomarkers
for assessing which patients

936
00:36:53,170 --> 00:36:55,210
will benefit from
these therapies

937
00:36:55,210 --> 00:36:57,820
is the tumor cell or the
immune cell expressing

938
00:36:57,820 --> 00:37:00,690
this drug target PD-1 or PD-L1.

939
00:37:00,690 --> 00:37:02,440
And the one they
test for is PD-L1,

940
00:37:02,440 --> 00:37:05,990
which is the ligand
for the PD-1 receptor.

941
00:37:05,990 --> 00:37:07,900
So this is often the
key piece of data

942
00:37:07,900 --> 00:37:09,650
used to decide who
gets these therapies.

943
00:37:09,650 --> 00:37:12,460
And it turns out, pathologists
are pretty bad at scoring this,

944
00:37:12,460 --> 00:37:14,377
not surprisingly, because
it's very difficult,

945
00:37:14,377 --> 00:37:17,870
and there's millions of
cells potentially per case.

946
00:37:17,870 --> 00:37:19,720
And they show an
interobserver agreement

947
00:37:19,720 --> 00:37:22,030
of only 0.86 for scoring
on tumor cells, which

948
00:37:22,030 --> 00:37:25,360
isn't bad, but 0.2 for scoring
it on immune cells, which

949
00:37:25,360 --> 00:37:27,260
is super important.

950
00:37:27,260 --> 00:37:28,408
So this is a drug target.

951
00:37:28,408 --> 00:37:30,700
We're trying to measure it to
see which patients might get

952
00:37:30,700 --> 00:37:34,660
this life-saving therapy,
but the diagnostic we have

953
00:37:34,660 --> 00:37:37,323
is super hard to interpret.

954
00:37:37,323 --> 00:37:38,740
And some studies,
for this reason,

955
00:37:38,740 --> 00:37:41,500
have shown sort of mixed results
about how valuable it is.

956
00:37:41,500 --> 00:37:43,810
In some cases, it
appears valuable.

957
00:37:43,810 --> 00:37:46,300
In other cases, it
appears it's not.

958
00:37:46,300 --> 00:37:48,670
So we want to see would this
be a good example of where

959
00:37:48,670 --> 00:37:51,220
we can use machine learning?

960
00:37:51,220 --> 00:37:54,050
And for this type
of application,

961
00:37:54,050 --> 00:37:55,750
this is really hard,
and we want to be

962
00:37:55,750 --> 00:37:58,090
able to apply it across
not just one cancer but 20

963
00:37:58,090 --> 00:37:59,330
different cancers.

964
00:37:59,330 --> 00:38:02,080
So we built a system at
PathAI for generating lots

965
00:38:02,080 --> 00:38:03,910
of training data at scale.

966
00:38:03,910 --> 00:38:06,400
And that's something that a
competition just won't get you.

967
00:38:06,400 --> 00:38:09,550
Like that competition
example had 300 slides.

968
00:38:09,550 --> 00:38:10,720
Once a year, they do it.

969
00:38:10,720 --> 00:38:13,118
But we want to be able to
build these models every week

970
00:38:13,118 --> 00:38:13,660
or something.

971
00:38:13,660 --> 00:38:16,600
So now, we have something
like 500 pathologists signed

972
00:38:16,600 --> 00:38:19,660
into our system that we can use
to label lots of pathology data

973
00:38:19,660 --> 00:38:23,020
for us and to really build
these models quickly and

974
00:38:23,020 --> 00:38:23,590
at really high quality.

975
00:38:23,590 --> 00:38:26,380
So now we have something
like over 2 and 1/2 million

976
00:38:26,380 --> 00:38:28,170
annotations in the system.

977
00:38:28,170 --> 00:38:30,670
And that allows us to
build tissue region models.

978
00:38:30,670 --> 00:38:33,645
And this is immunohistochemistry
in a cancer, where

979
00:38:33,645 --> 00:38:35,020
we've trained a
model to identify

980
00:38:35,020 --> 00:38:37,270
all of the cancer epithelium
in red, the cancer stroma

981
00:38:37,270 --> 00:38:38,420
in green.

982
00:38:38,420 --> 00:38:39,940
So now we know
where the protein is

983
00:38:39,940 --> 00:38:43,690
being expressed, in the
epithelium or in the stroma.

984
00:38:43,690 --> 00:38:46,840
And then we've also trained
cellular classification.

985
00:38:46,840 --> 00:38:49,700
So now, for every single cell,
we classify it as a cell type.

986
00:38:49,700 --> 00:38:52,660
Is it a cancer cell or a
fibroblast or a macrophage

987
00:38:52,660 --> 00:38:53,410
or a lymphocyte?

988
00:38:53,410 --> 00:38:55,120
And is it expressing
the protein,

989
00:38:55,120 --> 00:38:56,613
based on how brown it is?

990
00:38:56,613 --> 00:38:58,780
So while pathologists will
try to make some estimate

991
00:38:58,780 --> 00:39:01,363
across the whole slide, we can
actually compute for every cell

992
00:39:01,363 --> 00:39:03,040
and then compute
exact statistics

993
00:39:03,040 --> 00:39:05,080
about which cells are
expressing this protein

994
00:39:05,080 --> 00:39:07,795
and which patients might be the
best candidates for therapy.

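Given one row per classified cell (cell type plus a stain-positivity call from the brown DAB intensity), the slide-level readout reduces to a grouped proportion. A sketch with pandas; the column names and values are hypothetical:

    import pandas as pd

    cells = pd.DataFrame({
        "cell_type": ["tumor", "tumor", "lymphocyte", "macrophage", "tumor"],
        "pdl1_positive": [True, False, True, True, False],  # from stain intensity
    })

    # Exact per-cell-type positivity rates across the whole slide --
    # the statistic a pathologist can only eyeball from a few fields.
    print(cells.groupby("cell_type")["pdl1_positive"].mean())
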
995
00:39:13,130 --> 00:39:19,370
And then the question is, can
we identify additional things

996
00:39:19,370 --> 00:39:22,160
beyond just PD-L1 protein
expression that are predictive

997
00:39:22,160 --> 00:39:23,780
of response to immunotherapy?

998
00:39:23,780 --> 00:39:26,420
And we've developed some
machine learning approaches

999
00:39:26,420 --> 00:39:29,220
for doing that.

1000
00:39:29,220 --> 00:39:32,010
And part of it's doing
things like quantitating

1001
00:39:32,010 --> 00:39:33,762
different cells and
regions on H and E

1002
00:39:33,762 --> 00:39:35,220
images, which
currently aren't used

1003
00:39:35,220 --> 00:39:36,720
at all in patient subtyping.

1004
00:39:36,720 --> 00:39:38,940
But we can do analyses to
extract new features here

1005
00:39:38,940 --> 00:39:40,590
and to ask, even
though nothing's

1006
00:39:40,590 --> 00:39:43,500
known about these images
and immunotherapy response,

1007
00:39:43,500 --> 00:39:47,360
can we discover
new features here?

1008
00:39:47,360 --> 00:39:48,990
And this would be
an example routinely

1009
00:39:48,990 --> 00:39:51,360
of the types of features
we can quantify now

1010
00:39:51,360 --> 00:39:55,290
using deep learning to extract
these features on any case.

1011
00:39:55,290 --> 00:39:57,690
And this is sort of like
every sort of pathologic

1012
00:39:57,690 --> 00:39:59,610
characteristic you
can sort of imagine.

1013
00:39:59,610 --> 00:40:01,620
And then we correlate
these with drug response

1014
00:40:01,620 --> 00:40:03,270
and can use this
as a discovery tool

1015
00:40:03,270 --> 00:40:05,760
for identifying new aspects
of pathology predictive

1016
00:40:05,760 --> 00:40:08,550
of which patients
will respond best.

1017
00:40:08,550 --> 00:40:10,800
And then we can combine
these features into models.

1018
00:40:10,800 --> 00:40:12,300
This is sort of a
ridiculous example

1019
00:40:12,300 --> 00:40:13,530
because they're so different.

1020
00:40:13,530 --> 00:40:16,320
But this would be
one example where

1021
00:40:16,320 --> 00:40:19,300
the output of the model, and
this is totally fake data

1022
00:40:19,300 --> 00:40:21,270
but I think it just
gets the point across.

1023
00:40:21,270 --> 00:40:23,730
So here, the color
indicates the treatment,

1024
00:40:23,730 --> 00:40:25,800
where green would be
the immunotherapy,

1025
00:40:25,800 --> 00:40:30,090
red would be the
traditional therapy,

1026
00:40:30,090 --> 00:40:32,370
and the goal is to
build a model to predict

1027
00:40:32,370 --> 00:40:34,412
which patients actually
benefit from the therapy.

1028
00:40:34,412 --> 00:40:36,120
So this may be an easy
question, but what

1029
00:40:36,120 --> 00:40:37,950
do you think, if
the model's working,

1030
00:40:37,950 --> 00:40:39,867
what would the title of
the graph on the right

1031
00:40:39,867 --> 00:40:42,360
be versus the graph on
the left if these are

1032
00:40:42,360 --> 00:40:45,028
the ways of classifying
patients with our model,

1033
00:40:45,028 --> 00:40:47,070
and the classifications
are going to be responder

1034
00:40:47,070 --> 00:40:50,850
class or non-responder class?

1035
00:40:50,850 --> 00:40:52,740
And the color
indicates the drug.

1036
00:40:56,450 --> 00:40:58,920
AUDIENCE: The drug works
or it doesn't work.

1037
00:40:58,920 --> 00:41:02,442
ANDY BECK: That's right, but
what's the output of the model?

1038
00:41:02,442 --> 00:41:03,150
But you're right.

1039
00:41:03,150 --> 00:41:05,100
The interpretation of
these graphs is drug works,

1040
00:41:05,100 --> 00:41:05,850
drug doesn't work.

1041
00:41:05,850 --> 00:41:07,920
It's kind of a tricky
question, right?

1042
00:41:07,920 --> 00:41:10,620
But what is our model
trying to predict?

1043
00:41:10,620 --> 00:41:12,870
AUDIENCE: Whether the person
is going to die or not?

1044
00:41:12,870 --> 00:41:14,940
It looks like
likelihood of death

1045
00:41:14,940 --> 00:41:17,312
is just not as
high on the right.

1046
00:41:17,312 --> 00:41:19,020
ANDY BECK: I think
the overall likelihood

1047
00:41:19,020 --> 00:41:22,018
is the same on the two
graphs, right versus left.

1048
00:41:22,018 --> 00:41:24,060
You don't know how many
patients are in each arm.

1049
00:41:24,060 --> 00:41:25,060
But I think the
one piece on it--

1050
00:41:25,060 --> 00:41:26,610
so green is
experimental treatment.

1051
00:41:26,610 --> 00:41:27,960
Red is conventional treatment.

1052
00:41:27,960 --> 00:41:29,070
Maybe I already said that.

1053
00:41:29,070 --> 00:41:31,957
So here, and it's sort of like
a read my mind type question,

1054
00:41:31,957 --> 00:41:33,540
but here the output
of the model would

1055
00:41:33,540 --> 00:41:37,980
be 'responder to the drug' for
the right class of patients.

1056
00:41:37,980 --> 00:41:39,480
And the left class
of patients would

1057
00:41:39,480 --> 00:41:41,478
be non-responder to the drug.

1058
00:41:41,478 --> 00:41:43,770
So you're not actually saying
anything about prognosis,

1059
00:41:43,770 --> 00:41:46,590
but you're saying
that I'm predicting

1060
00:41:46,590 --> 00:41:49,650
that if you're in the right
population of patients,

1061
00:41:49,650 --> 00:41:52,020
you will benefit
from the blue drug.

1062
00:41:52,020 --> 00:41:54,330
And then you actually see
that on this right population

1063
00:41:54,330 --> 00:41:57,060
of patients, the blue
drug does really well.

1064
00:41:57,060 --> 00:41:58,650
And then the red
curve shows patients

1065
00:41:58,650 --> 00:42:01,067
who we thought-- we predicted
would benefit from the drug,

1066
00:42:01,067 --> 00:42:02,580
but because it's
an experiment, we

1067
00:42:02,580 --> 00:42:03,913
didn't give them the right drug.

1068
00:42:03,913 --> 00:42:05,690
And in fact, they did
a whole lot worse.

1069
00:42:05,690 --> 00:42:07,440
Whereas, the one on
the left, we're saying

1070
00:42:07,440 --> 00:42:09,000
you don't benefit from
the drug, and they truly

1071
00:42:09,000 --> 00:42:10,330
don't benefit from the drug.

1072
00:42:10,330 --> 00:42:12,330
So this is the way of
using an output of a model

1073
00:42:12,330 --> 00:42:15,420
to predict drug response
and then visualizing

1074
00:42:15,420 --> 00:42:16,620
whether it actually works.

1075
00:42:16,620 --> 00:42:17,995
And it's kind of
like the example

1076
00:42:17,995 --> 00:42:21,720
I talked about before, but
here's a real version of it.

1077
00:42:21,720 --> 00:42:24,095
And you can learn this
directly using machine learning

1078
00:42:24,095 --> 00:42:26,220
to try to say, I want to
find patients who actually

1079
00:42:26,220 --> 00:42:27,480
benefit the most from a drug.

1080
00:42:33,660 --> 00:42:36,150
And then in terms of
how do we validate

1081
00:42:36,150 --> 00:42:37,192
that our models are correct?

1082
00:42:37,192 --> 00:42:38,650
I mean, we have
two different ways.

1083
00:42:38,650 --> 00:42:40,130
One is do stuff like that.

1084
00:42:40,130 --> 00:42:42,630
So we build a model that
says, respond to drug,

1085
00:42:42,630 --> 00:42:44,312
don't respond to a drug.

1086
00:42:44,312 --> 00:42:46,020
And then we plot the
Kaplan-Meier curves.

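A sketch of that first validation route using the lifelines library: split patients by the model's responder call and plot Kaplan-Meier curves; the data frame and its columns are hypothetical:

    import matplotlib.pyplot as plt
    from lifelines import KaplanMeierFitter

    def plot_km_by_prediction(df):
        # df columns (assumed): time (months), event (1 = death),
        # predicted_responder (bool, from the model).
        ax = plt.subplot(111)
        for label, grp in df.groupby("predicted_responder"):
            kmf = KaplanMeierFitter()
            kmf.fit(grp["time"], grp["event"],
                    label="predicted responder=%s" % label)
            kmf.plot_survival_function(ax=ax)
        plt.show()
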
1087
00:42:46,020 --> 00:42:52,170
If it's image analysis stuff, we
ask pathologists to hand label

1088
00:42:52,170 --> 00:42:53,760
many cells, and we
take the consensus

1089
00:42:53,760 --> 00:42:57,180
of pathologists as our ground
truth and go from there.

1090
00:43:01,340 --> 00:43:03,215
AUDIENCE: The way
you're presenting it,

1091
00:43:03,215 --> 00:43:05,390
it makes it sound like
all the data comes

1092
00:43:05,390 --> 00:43:08,310
from the pathology images.

1093
00:43:08,310 --> 00:43:12,790
But in reality, people look at
single nucleotide polymorphisms

1094
00:43:12,790 --> 00:43:19,410
or gene sequences or all kinds
of clinical data as well.

1095
00:43:19,410 --> 00:43:21,855
So how do you get those?

1096
00:43:21,855 --> 00:43:24,230
ANDY BECK: Yeah, I mean, the
beauty of the pathology data

1097
00:43:24,230 --> 00:43:25,887
is it's always available.

1098
00:43:25,887 --> 00:43:27,470
So that's why a lot
of the stuff we do

1099
00:43:27,470 --> 00:43:31,550
is focused on that, because
every clinical trial

1100
00:43:31,550 --> 00:43:34,955
patient has treatment
data, outcome

1101
00:43:34,955 --> 00:43:36,080
data, and pathology images.

1102
00:43:36,080 --> 00:43:39,830
So it's like, we can really
do this at scale pretty fast.

1103
00:43:39,830 --> 00:43:43,220
A lot of the other stuff is
things like gene expression,

1104
00:43:43,220 --> 00:43:45,840
which many people are collecting.

1105
00:43:45,840 --> 00:43:48,405
And it's important to
compare these to baselines

1106
00:43:48,405 --> 00:43:49,280
or to integrate them.

1107
00:43:49,280 --> 00:43:52,160
I mean, two things-- one is
compare to it as a baseline.

1108
00:43:52,160 --> 00:43:55,220
What can we predict in terms of
responder, non-responder using

1109
00:43:55,220 --> 00:43:58,700
just the pathology images versus
using just gene expression

1110
00:43:58,700 --> 00:44:00,380
data versus combining them?

1111
00:44:00,380 --> 00:44:04,130
And that would just be
increasing the input feature

1112
00:44:04,130 --> 00:44:04,630
space.

1113
00:44:04,630 --> 00:44:06,880
Part of the input feature
space comes from the images.

1114
00:44:06,880 --> 00:44:08,780
Part of it comes from
gene expression data.

1115
00:44:08,780 --> 00:44:10,363
Then you use machine
learning to focus

1116
00:44:10,363 --> 00:44:12,170
on the most important
characteristics

1117
00:44:12,170 --> 00:44:14,120
and predict outcome.

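That "increasing the input feature space" is just concatenating the two blocks of features before fitting the model; a minimal sketch with placeholder arrays (the real features would come from the image and expression pipelines):

    import numpy as np
    from sklearn.linear_model import LogisticRegression

    image_feats = np.random.rand(100, 50)     # e.g., cell/region quantities
    expr_feats = np.random.rand(100, 200)     # e.g., gene expression values
    y = np.random.randint(0, 2, 100)          # responder / non-responder label

    X = np.hstack([image_feats, expr_feats])  # the combined feature space
    clf = LogisticRegression(max_iter=1000).fit(X, y)
    # Refit on image_feats or expr_feats alone to ask how much each
    # modality adds, as described above.
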
1118
00:44:14,120 --> 00:44:16,970
And the other is if you
want to sort of prioritize.

1119
00:44:16,970 --> 00:44:18,895
Use pathology as a
baseline because it's

1120
00:44:18,895 --> 00:44:20,100
available on everyone.

1121
00:44:20,100 --> 00:44:23,480
But then an adjuvant test
that costs another $1,000

1122
00:44:23,480 --> 00:44:25,640
and might take another
two weeks, how much does

1123
00:44:25,640 --> 00:44:28,140
that add to the prediction?

1124
00:44:28,140 --> 00:44:29,390
And that would be another way.

1125
00:44:29,390 --> 00:44:31,427
So I think it is
important, but a lot

1126
00:44:31,427 --> 00:44:33,260
of our technology, in
developing our platform,

1127
00:44:33,260 --> 00:44:35,540
is focused around how
we most effectively use

1128
00:44:35,540 --> 00:44:38,220
pathology, and we can certainly
add in gene expression data.

1129
00:44:38,220 --> 00:44:40,220
I'm actually going to
talk about that next-- one

1130
00:44:40,220 --> 00:44:40,968
way of doing it.

1131
00:44:40,968 --> 00:44:43,010
Because it's a very natural
synergy, because they

1132
00:44:43,010 --> 00:44:44,302
tell you very different things.

1133
00:44:47,250 --> 00:44:49,357
So here's one example of
integrating, just kind

1134
00:44:49,357 --> 00:44:51,440
of relative to that question,
gene expression data

1135
00:44:51,440 --> 00:44:54,320
with image data, using the
Cancer Genome Atlas,

1136
00:44:54,320 --> 00:44:55,280
and this is all public.

1137
00:44:55,280 --> 00:44:58,787
So they have pathology images,
RNA data, clinical outcomes.

1138
00:44:58,787 --> 00:45:00,620
They don't have the
greatest treatment data,

1139
00:45:00,620 --> 00:45:02,495
but it's a great place
for method development

1140
00:45:02,495 --> 00:45:06,110
for sort of ML in
cancer, including

1141
00:45:06,110 --> 00:45:07,850
pathology-type analyses.

1142
00:45:07,850 --> 00:45:09,470
So this is a case of melanoma.

1143
00:45:09,470 --> 00:45:12,140
We've trained a model to
identify cancer and stroma

1144
00:45:12,140 --> 00:45:13,950
and all the different cells.

1145
00:45:13,950 --> 00:45:16,980
And then we extract, as you saw,
sort of hundreds of features.

1146
00:45:16,980 --> 00:45:19,820
And then we can rank
the features here

1147
00:45:19,820 --> 00:45:21,960
by their correlation
with survival.

1148
00:45:21,960 --> 00:45:24,110
So now we're mapping
from pathology images

1149
00:45:24,110 --> 00:45:27,890
to outcome data and we find just
in a totally data-driven way

1150
00:45:27,890 --> 00:45:31,228
that there's some small set
of 15 features or so highly

1151
00:45:31,228 --> 00:45:32,270
associated with survival.

1152
00:45:32,270 --> 00:45:33,510
The rest aren't.

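One way to produce that ranking is a univariate survival model per feature; a sketch with lifelines' Cox model (the data frame is hypothetical, and multiple-testing correction is omitted):

    from lifelines import CoxPHFitter

    def rank_features_by_survival(df, features):
        # df columns (assumed): time, event, plus one column per feature.
        rows = []
        for f in features:
            cph = CoxPHFitter()
            cph.fit(df[["time", "event", f]],
                    duration_col="time", event_col="event")
            rows.append((f, float(cph.summary.loc[f, "p"])))
        return sorted(rows, key=lambda r: r[1])  # strongest association first
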
1153
00:45:33,510 --> 00:45:36,920
And the top ranking one
is an immune cell feature,

1154
00:45:36,920 --> 00:45:38,630
increased area of
stromal plasma cells

1155
00:45:38,630 --> 00:45:40,500
that are associated
with increased survival.

1156
00:45:40,500 --> 00:45:42,708
And this was an analysis
that was really just linking

1157
00:45:42,708 --> 00:45:43,970
the images with outcome.

1158
00:45:43,970 --> 00:45:47,060
And then we can ask, well,
what are the genes underlying

1159
00:45:47,060 --> 00:45:48,050
this pathology?

1160
00:45:48,050 --> 00:45:50,990
So pathology is telling you
about cells and tissues.

1161
00:45:50,990 --> 00:45:53,570
RNAs are telling you about
the actual transcriptional

1162
00:45:53,570 --> 00:45:57,180
landscape of what's
going on underneath.

1163
00:45:57,180 --> 00:45:59,283
And then we can rank all
the genes in the genome

1164
00:45:59,283 --> 00:46:01,700
just by their correlation with
this quantitative phenotype

1165
00:46:01,700 --> 00:46:02,990
we're measuring on
the pathology images.

1166
00:46:02,990 --> 00:46:05,420
And here are all the genes,
ranked from 0 to 20,000.

1167
00:46:05,420 --> 00:46:08,400
And again, we see a small
set that we're thresholding

1168
00:46:08,400 --> 00:46:11,450
at a correlation
of 0.4, strongly

1169
00:46:11,450 --> 00:46:14,720
associated with the pathologic
phenotype we're measuring.

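The gene ranking is a straight correlation screen against the quantitative image phenotype, thresholded here at |r| = 0.4 as on the slide; a sketch assuming an expression matrix with nonconstant genes:

    import numpy as np

    def correlated_genes(expr, phenotype, gene_names, threshold=0.4):
        # expr: (n_samples, n_genes); phenotype: (n_samples,) image feature.
        e = (expr - expr.mean(0)) / expr.std(0)
        p = (phenotype - phenotype.mean()) / phenotype.std()
        r = e.T @ p / len(p)                 # Pearson r for every gene at once
        keep = np.abs(r) >= threshold
        return sorted(zip(np.array(gene_names)[keep], r[keep]),
                      key=lambda t: -abs(t[1]))
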
1170
00:46:14,720 --> 00:46:17,360
And then we sort of
discover these sets

1171
00:46:17,360 --> 00:46:20,480
of genes that are known to be
highly enriched in immune cell

1172
00:46:20,480 --> 00:46:21,230
genes.

1173
00:46:21,230 --> 00:46:23,635
Which is sort of some
form of validation

1174
00:46:23,635 --> 00:46:25,760
that we're measuring what
we think we're measuring,

1175
00:46:25,760 --> 00:46:29,690
but also these sets of genes are
potentially new drug targets,

1176
00:46:29,690 --> 00:46:32,388
new diagnostics, et
cetera, that was uncovered

1177
00:46:32,388 --> 00:46:34,430
by going from clinical
outcomes to pathology data

1178
00:46:34,430 --> 00:46:36,368
to the underlying RNA signature.

1179
00:46:39,440 --> 00:46:41,990
And then kind of the beauty of
the approach we're working on

1180
00:46:41,990 --> 00:46:44,900
is it's super scalable,
and in theory, you

1181
00:46:44,900 --> 00:46:47,000
could apply it to all of
TCGA or other data sets

1182
00:46:47,000 --> 00:46:51,950
and apply it across cancer
types and do things like find--

1183
00:46:51,950 --> 00:46:57,230
automatically find artefacts
in all of the slides

1184
00:46:57,230 --> 00:46:59,720
and kind of do this
in a broad way.

1185
00:46:59,720 --> 00:47:02,700
And then sort of the most
interesting part, potentially,

1186
00:47:02,700 --> 00:47:04,453
is analyzing the
outputs of the models

1187
00:47:04,453 --> 00:47:05,870
and how they
correlate with things

1188
00:47:05,870 --> 00:47:09,930
like drug response or
underlying molecular profiles.

1189
00:47:09,930 --> 00:47:11,930
And this is really the
process we're working on,

1190
00:47:11,930 --> 00:47:15,290
is how do we go from images to
new ways of measuring disease

1191
00:47:15,290 --> 00:47:16,792
pathology?

1192
00:47:16,792 --> 00:47:19,250
And kind of in summary, a lot
of the technology development

1193
00:47:19,250 --> 00:47:21,080
that I think is
most important today

1194
00:47:21,080 --> 00:47:22,700
for getting ML to
work really well

1195
00:47:22,700 --> 00:47:25,490
in the real world for
applications in medicine

1196
00:47:25,490 --> 00:47:28,370
is a lot about being super
thoughtful about building

1197
00:47:28,370 --> 00:47:29,900
the right training data set.

1198
00:47:29,900 --> 00:47:32,040
And how do you do that in
a scalable way and even

1199
00:47:32,040 --> 00:47:33,770
in a way that incorporates
machine learning?

1200
00:47:33,770 --> 00:47:34,670
Which is kind of what
I was talking about

1201
00:47:34,670 --> 00:47:36,410
before-- intelligently
picking patches.

1202
00:47:36,410 --> 00:47:39,180
But that sort of concept
applies everywhere.

1203
00:47:39,180 --> 00:47:41,360
So I think there's almost
more room for innovation

1204
00:47:41,360 --> 00:47:44,210
on defining the
training data set side

1205
00:47:44,210 --> 00:47:46,675
than on the predictive
modeling side,

1206
00:47:46,675 --> 00:47:48,050
and then putting
the two together

1207
00:47:48,050 --> 00:47:50,378
is incredibly important.

1208
00:47:50,378 --> 00:47:51,920
And for the kind of
work we're doing,

1209
00:47:51,920 --> 00:47:54,620
there's already such great
advances in image processing.

1210
00:47:54,620 --> 00:47:57,230
A lot of it's about
engineering and scalability,

1211
00:47:57,230 --> 00:47:59,220
as well as rigorous validation.

1212
00:47:59,220 --> 00:48:01,830
And then how do we connect it
with underlying molecular data

1213
00:48:01,830 --> 00:48:03,720
as well as clinical
outcome data?

1214
00:48:03,720 --> 00:48:08,445
Versus trying to solve a lot
of the core vision tasks, where

1215
00:48:08,445 --> 00:48:10,320
there's already just
been incredible progress

1216
00:48:10,320 --> 00:48:11,900
over the past couple of years.

1217
00:48:11,900 --> 00:48:13,920
And in terms of in
our world, things

1218
00:48:13,920 --> 00:48:15,960
we think a lot about,
not just the technology

1219
00:48:15,960 --> 00:48:17,490
and putting together
our data sets but also,

1220
00:48:17,490 --> 00:48:18,850
how do we work with regulators?

1221
00:48:18,850 --> 00:48:20,547
How do we make
strong business cases

1222
00:48:20,547 --> 00:48:22,380
for partners we're working
with to actually change

1223
00:48:22,380 --> 00:48:24,048
what they're doing
to incorporate some

1224
00:48:24,048 --> 00:48:26,340
of these new approaches that
will really bring benefits

1225
00:48:26,340 --> 00:48:31,340
to patients around quality and
accuracy in their diagnosis?

1226
00:48:31,340 --> 00:48:32,280
So in summary--

1227
00:48:32,280 --> 00:48:34,540
I know you have to
go in four minutes--

1228
00:48:34,540 --> 00:48:36,580
this has been a
longstanding problem.

1229
00:48:36,580 --> 00:48:39,070
There's nothing new
about trying to apply AI

1230
00:48:39,070 --> 00:48:41,500
to diagnostics or
to vision tasks,

1231
00:48:41,500 --> 00:48:44,620
but there are some really big
differences in the past five

1232
00:48:44,620 --> 00:48:46,600
years that, even
in my short career,

1233
00:48:46,600 --> 00:48:49,480
I've seen a sea
change in this field.

1234
00:48:49,480 --> 00:48:51,250
One is availability
of digital data--

1235
00:48:51,250 --> 00:48:53,830
it's now much cheaper to
generate lots of images

1236
00:48:53,830 --> 00:48:55,450
at scale.

1237
00:48:55,450 --> 00:48:56,860
But even more
important, I think,

1238
00:48:56,860 --> 00:48:59,620
are the last two. First,
access to large-scale computing

1239
00:48:59,620 --> 00:49:03,790
resources is a game-changer
for anyone with access

1240
00:49:03,790 --> 00:49:06,790
to cloud computing or
large computing resources.

1241
00:49:06,790 --> 00:49:09,220
Just, we all have access
to a sort of arbitrary

1242
00:49:09,220 --> 00:49:11,800
compute today, and
10 years ago, that

1243
00:49:11,800 --> 00:49:13,755
was a huge limitation
in this field.

1244
00:49:13,755 --> 00:49:15,880
As well as these really
major algorithmic advances,

1245
00:49:15,880 --> 00:49:19,090
particularly deep
CNNs for vision.

1246
00:49:19,090 --> 00:49:21,700
And, in general, AI
works extremely well

1247
00:49:21,700 --> 00:49:25,180
when problems can be defined to
get the right type of training

1248
00:49:25,180 --> 00:49:28,092
data, access to
large-scale computing,

1249
00:49:28,092 --> 00:49:30,550
and implement things
like deep CNNs that work really

1250
00:49:30,550 --> 00:49:31,278
well.

1251
00:49:31,278 --> 00:49:33,070
And it sort of fails
everywhere else, which

1252
00:49:33,070 --> 00:49:34,770
is probably 98% of things.

1253
00:49:34,770 --> 00:49:37,660
But if you can create a problem
where the algorithms actually

1254
00:49:37,660 --> 00:49:40,960
work, you can have lots
of data to train on,

1255
00:49:40,960 --> 00:49:43,330
they can succeed really well.

1256
00:49:43,330 --> 00:49:46,420
And this sort of vision-based
AI-powered pathology

1257
00:49:46,420 --> 00:49:49,060
is broadly applicable across,
really, all image-based tasks

1258
00:49:49,060 --> 00:49:49,660
in pathology.

1259
00:49:49,660 --> 00:49:51,243
It does enable
integration with things

1260
00:49:51,243 --> 00:49:54,010
like omics data--
genomics, transcriptomics,

1261
00:49:54,010 --> 00:49:57,010
SNP data, et cetera.

1262
00:49:57,010 --> 00:49:59,500
And in the near future, we
think this will be incorporated

1263
00:49:59,500 --> 00:50:00,520
into clinical practice.

1264
00:50:00,520 --> 00:50:02,470
And even today, it's
really central to a lot

1265
00:50:02,470 --> 00:50:04,935
of research efforts.

1266
00:50:04,935 --> 00:50:06,310
And I just want
to end on a quote

1267
00:50:06,310 --> 00:50:08,620
from 1987, that
in the future, AI

1268
00:50:08,620 --> 00:50:12,070
can be expected to become
a staple of pathology practice.

1269
00:50:12,070 --> 00:50:17,305
And I think we're much, much
closer than 30 years ago.

1270
00:50:17,305 --> 00:50:18,930
And I want to thank
everyone at PathAI,

1271
00:50:18,930 --> 00:50:20,500
as well as Hunter, who
really helped put together

1272
00:50:20,500 --> 00:50:21,417
a lot of these slides.

1273
00:50:21,417 --> 00:50:23,140
And we do have lots
of opportunities

1274
00:50:23,140 --> 00:50:25,970
for machine learning
engineers, software engineers,

1275
00:50:25,970 --> 00:50:26,940
et cetera, at PathAI.

1276
00:50:26,940 --> 00:50:30,520
So certainly reach out if you're
interested in learning more.

1277
00:50:30,520 --> 00:50:32,990
And I'm happy to take any
questions, if we have time.

1278
00:50:32,990 --> 00:50:35,035
So thank you.

1279
00:50:35,035 --> 00:50:36,430
[APPLAUSE]

1280
00:50:40,150 --> 00:50:42,760
AUDIENCE: Yes, I think generally
very aggressive events.

1281
00:50:42,760 --> 00:50:46,640
I was wondering how close is
this to clinical practice?

1282
00:50:46,640 --> 00:50:48,180
Is there FDA or--

1283
00:50:48,180 --> 00:50:52,590
ANDY BECK: Yeah, so I mean,
actual clinical practice,

1284
00:50:52,590 --> 00:50:57,890
probably 2020, like
early, mid-2020.

1285
00:50:57,890 --> 00:51:01,450
But I mean, today, it's very
active in clinical research,

1286
00:51:01,450 --> 00:51:03,900
so like clinical trials,
et cetera, that do

1287
00:51:03,900 --> 00:51:07,740
involve patients, but it's in a
much more well-defined setting.

1288
00:51:07,740 --> 00:51:09,770
But the first
clinical use cases,

1289
00:51:09,770 --> 00:51:12,000
at least of the types
of stuff we're building,

1290
00:51:12,000 --> 00:51:13,873
will be, I think,
about a year from now.

1291
00:51:13,873 --> 00:51:15,540
And I think it will
start small and then

1292
00:51:15,540 --> 00:51:16,680
get progressively bigger.

1293
00:51:16,680 --> 00:51:18,513
So I don't think it's
going to be everything

1294
00:51:18,513 --> 00:51:20,345
all at once transforms
in the clinic,

1295
00:51:20,345 --> 00:51:21,720
but I do think
we'll start seeing

1296
00:51:21,720 --> 00:51:23,497
the first applications out.

1297
00:51:23,497 --> 00:51:25,830
And they will go-- some of
them will go through the FDA,

1298
00:51:25,830 --> 00:51:27,830
and there'll be some
laboratory-developed tests.

1299
00:51:27,830 --> 00:51:30,450
Ours will go through the
FDA, but labs

1300
00:51:30,450 --> 00:51:33,870
can actually validate
tools themselves.

1301
00:51:33,870 --> 00:51:35,472
And that's another path.

1302
00:51:35,472 --> 00:51:36,180
AUDIENCE: Thanks.

1303
00:51:36,180 --> 00:51:36,847
ANDY BECK: Sure.

1304
00:51:46,698 --> 00:51:51,880
PROFESSOR: So have you been
using observational data sets?

1305
00:51:51,880 --> 00:51:56,200
You gave one example where
you tried to use data

1306
00:51:56,200 --> 00:51:58,540
from a randomized controlled
trial, or both trials,

1307
00:51:58,540 --> 00:52:00,373
you used different
randomized controlled trials

1308
00:52:00,373 --> 00:52:03,820
for different efficacies
of each event.

1309
00:52:03,820 --> 00:52:05,560
The next major segment
of this course,

1310
00:52:05,560 --> 00:52:08,335
starting in about two weeks,
will be about causal inference

1311
00:52:08,335 --> 00:52:10,060
from observational data.

1312
00:52:10,060 --> 00:52:12,120
I'm wondering if
that is something

1313
00:52:12,120 --> 00:52:14,300
PathAI has gotten into yet?

1314
00:52:14,300 --> 00:52:17,410
And if so, what have your
findings been so far?

1315
00:52:17,410 --> 00:52:20,320
ANDY BECK: So we have focused
a lot on randomized controlled

1316
00:52:20,320 --> 00:52:24,160
trial data and have
developed methods

1317
00:52:24,160 --> 00:52:26,650
around that, which sort
of simplifies the problem

1318
00:52:26,650 --> 00:52:30,640
and allows us to do, I think,
pretty clever things around how

1319
00:52:30,640 --> 00:52:33,330
to generate those types
of graphs I was showing,

1320
00:52:33,330 --> 00:52:38,620
where you truly can infer the
treatment is having an effect.

1321
00:52:38,620 --> 00:52:39,910
And we've done far less.

1322
00:52:39,910 --> 00:52:41,230
I'm super interested in that.

1323
00:52:41,230 --> 00:52:42,940
I'd say the
advantages of RCTs are

1324
00:52:42,940 --> 00:52:45,580
people are already investing
hugely in building these very

1325
00:52:45,580 --> 00:52:49,270
well-curated data sets
that include images,

1326
00:52:49,270 --> 00:52:52,170
molecular data, when available,
treatment, and outcome.

1327
00:52:52,170 --> 00:52:53,980
And it's just that's
there, because they've

1328
00:52:53,980 --> 00:52:55,300
invested in the clinical trial.

1329
00:52:55,300 --> 00:52:57,130
They've invested in
generating that data set.

1330
00:52:57,130 --> 00:52:59,110
To me, the big challenge
in observational stuff,

1331
00:52:59,110 --> 00:53:01,443
there are a few, but I'd be
interested in what you guys are

1332
00:53:01,443 --> 00:53:04,120
doing and learn
about it, is getting

1333
00:53:04,120 --> 00:53:06,310
the data is not easy, right?

1334
00:53:06,310 --> 00:53:09,400
The outcome data is not--

1335
00:53:09,400 --> 00:53:11,565
linking the pathology
images with the outcome data

1336
00:53:11,565 --> 00:53:12,940
even is, actually,
in my opinion,

1337
00:53:12,940 --> 00:53:14,865
harder in the observational
setting than in RCTs.

1338
00:53:14,865 --> 00:53:16,990
Because they're actually
doing it and paying for it

1339
00:53:16,990 --> 00:53:18,790
and collecting it in RCTs.

1340
00:53:18,790 --> 00:53:21,270
No one's really done
a very good job of--

1341
00:53:21,270 --> 00:53:23,920
TCGA would be a good place to
play around with because that

1342
00:53:23,920 --> 00:53:26,320
is observational data.

1343
00:53:26,320 --> 00:53:27,700
And we want to
also, we generally

1344
00:53:27,700 --> 00:53:29,860
want to focus on
actionable decisions.

1345
00:53:29,860 --> 00:53:32,050
And RCT is sort of
perfectly set up for that.

1346
00:53:32,050 --> 00:53:35,378
Do I give drug X or not?

1347
00:53:35,378 --> 00:53:37,420
So I think if you put
together the right data set

1348
00:53:37,420 --> 00:53:40,220
and somehow make the
results actionable,

1349
00:53:40,220 --> 00:53:41,770
it could be really,
really useful,

1350
00:53:41,770 --> 00:53:43,062
because there is a lot of data.

1351
00:53:43,062 --> 00:53:45,093
But I think just
collecting the outcomes

1352
00:53:45,093 --> 00:53:47,260
and linking them with images
is actually quite hard.

1353
00:53:47,260 --> 00:53:49,690
And ironically, I think it's
harder for observational

1354
00:53:49,690 --> 00:53:52,600
than for randomized control
trials, where they're already

1355
00:53:52,600 --> 00:53:53,290
collecting it.

1356
00:53:53,290 --> 00:53:55,248
I guess one example would
be the Nurses' Health

1357
00:53:55,248 --> 00:53:58,600
Study or these big epidemiology
cohorts, potentially.

1358
00:53:58,600 --> 00:54:00,765
They are collecting that
data and organizing it.

1359
00:54:00,765 --> 00:54:02,140
But what were you
thinking about?

1360
00:54:02,140 --> 00:54:03,220
Do you have anything
with pathology

1361
00:54:03,220 --> 00:54:05,448
in mind for causal inference
from observational data?

1362
00:54:05,448 --> 00:54:06,990
PROFESSOR: Well, I
think, the example

1363
00:54:06,990 --> 00:54:11,000
you gave, like Nurses' Health
Study or the Framingham study,

1364
00:54:11,000 --> 00:54:13,510
where you're tracking
patients across time.

1365
00:54:13,510 --> 00:54:16,700
They're getting different
interventions across time.

1366
00:54:16,700 --> 00:54:19,370
And because of the way the
study was designed, in fact,

1367
00:54:19,370 --> 00:54:22,040
there are even good outcomes
for patients across time.

1368
00:54:22,040 --> 00:54:23,415
So that problem
in the profession

1369
00:54:23,415 --> 00:54:24,960
doesn't happen there.

1370
00:54:24,960 --> 00:54:27,910
But then suppose you were
to take it from a biobank

1371
00:54:27,910 --> 00:54:28,860
and do pathology?

1372
00:54:28,860 --> 00:54:31,450
You're now getting the samples.

1373
00:54:31,450 --> 00:54:33,442
Then, you can ask
about, well, what

1374
00:54:33,442 --> 00:54:35,650
is the effect of different
interventions or treatment

1375
00:54:35,650 --> 00:54:37,740
plans on outcomes?

1376
00:54:37,740 --> 00:54:39,610
The challenge, of course,
of drawing inferences

1377
00:54:39,610 --> 00:54:41,180
there is that there
was bias in terms

1378
00:54:41,180 --> 00:54:43,153
of who got what treatments.

1379
00:54:43,153 --> 00:54:45,445
That's where the techniques
that we talk about in class

1380
00:54:45,445 --> 00:54:48,612
would become very important.

1381
00:54:48,612 --> 00:54:51,070
I just say, I appreciate the
challenges that you mentioned.

1382
00:54:51,070 --> 00:54:52,903
ANDY BECK: I think it's
incredibly powerful.

1383
00:54:52,903 --> 00:54:55,510
I think the other issue I just
think about is that treatments

1384
00:54:55,510 --> 00:54:57,040
change so quickly over time.

1385
00:54:57,040 --> 00:54:59,248
So you don't want to be like
overfitting to the past.

1386
00:55:01,145 --> 00:55:02,770
But I think there's
certain cases where

1387
00:55:02,770 --> 00:55:04,930
the therapeutic decisions
today are similar to what

1388
00:55:04,930 --> 00:55:05,847
they were in the past.

1389
00:55:05,847 --> 00:55:08,620
There are other areas, like
immuno-oncology, where there's

1390
00:55:08,620 --> 00:55:10,570
just no history to learn from.

1391
00:55:10,570 --> 00:55:12,350
So I think it depends on the--

1392
00:55:12,350 --> 00:55:14,850
PROFESSOR: All right, then with
that, let's thank Andy Beck.

1393
00:55:14,850 --> 00:55:15,350
[APPLAUSE]

1394
00:55:15,350 --> 00:55:16,700
ANDY BECK: Thank you.