1
00:00:00,040 --> 00:00:02,460
The following content is
provided under a Creative

2
00:00:02,460 --> 00:00:03,870
Commons license.

3
00:00:03,870 --> 00:00:06,910
Your support will help MIT
OpenCourseWare continue to

4
00:00:06,910 --> 00:00:10,560
offer high quality educational
resources for free.

5
00:00:10,560 --> 00:00:13,460
To make a donation or view
additional materials from

6
00:00:13,460 --> 00:00:19,290
hundreds of MIT courses, visit
MIT OpenCourseWare at

7
00:00:19,290 --> 00:00:20,540
ocw.mit.edu.

8
00:00:23,130 --> 00:00:25,940
PROFESSOR: So today's agenda
is to say a few more things

9
00:00:25,940 --> 00:00:28,050
about continuous random
variables.

10
00:00:28,050 --> 00:00:32,049
Mainly we're going to talk a
little bit about inference.

11
00:00:32,049 --> 00:00:35,080
This is a topic that we're going
to revisit at the end of

12
00:00:35,080 --> 00:00:36,390
the semester.

13
00:00:36,390 --> 00:00:38,070
But there are a few things
that we can

14
00:00:38,070 --> 00:00:40,180
already say at this point.

15
00:00:40,180 --> 00:00:44,060
And then the new topic for
today is the subject of

16
00:00:44,060 --> 00:00:45,880
derived distributions.

17
00:00:45,880 --> 00:00:48,140
Basically if you know the
distribution of one random

18
00:00:48,140 --> 00:00:50,230
variable, and you have a
function of that random

19
00:00:50,230 --> 00:00:52,010
variable, how to find a

20
00:00:52,010 --> 00:00:54,840
distribution for that function.

21
00:00:54,840 --> 00:00:58,180
And it's a fairly mechanical
skill, but that's an important

22
00:00:58,180 --> 00:01:00,740
one, so we're going
to go through it.

23
00:01:00,740 --> 00:01:02,200
So let's see where we stand.

24
00:01:02,200 --> 00:01:03,540
Here is the big picture.

25
00:01:03,540 --> 00:01:06,720
That's all we have
done so far.

26
00:01:06,720 --> 00:01:09,460
We have talked about discrete
random variables, which we

27
00:01:09,460 --> 00:01:11,970
described by probability
mass functions.

28
00:01:11,970 --> 00:01:14,900
So if we have multiple random
variables, we describe them

29
00:01:14,900 --> 00:01:16,760
with a joint
mass function.

30
00:01:16,760 --> 00:01:19,810
And then we define conditional
probabilities, or conditional

31
00:01:19,810 --> 00:01:24,310
PMFs, and the three are related
according to this

32
00:01:24,310 --> 00:01:27,040
formula, which is, you can
think of it either as the

33
00:01:27,040 --> 00:01:29,300
definition of conditional
probability.

34
00:01:29,300 --> 00:01:32,170
Or as the multiplication rule,
the probability of two things

35
00:01:32,170 --> 00:01:35,870
happening is the product of the
probabilities of the first

36
00:01:35,870 --> 00:01:38,200
thing happening, and then the
second happening, given that

37
00:01:38,200 --> 00:01:39,860
the first has happened.

38
00:01:39,860 --> 00:01:42,830
There's another relation between
these, which is that the

39
00:01:42,830 --> 00:01:46,360
probability of x occurring is
the sum of the different

40
00:01:46,360 --> 00:01:50,560
probabilities of the different
ways that x may occur, which

41
00:01:50,560 --> 00:01:53,700
is in conjunction with different
values of y.
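
As a small illustration of these relations, here is a minimal Python sketch over a made-up joint PMF (the numbers in `joint` are assumptions, not from the lecture). It computes marginals by summing the joint PMF over the other variable, forms a conditional PMF, and checks the multiplication rule for every pair.

```python
# Hypothetical joint PMF p_{X,Y}(x, y), stored as {(x, y): probability}.
joint = {(0, 0): 0.30, (0, 1): 0.10, (1, 0): 0.20, (1, 1): 0.40}


def marginal(joint, axis):
    """Marginal PMF: sum the joint PMF over the other variable (axis 0 -> X, 1 -> Y)."""
    p = {}
    for pair, prob in joint.items():
        p[pair[axis]] = p.get(pair[axis], 0.0) + prob
    return p


def conditional_x_given_y(joint, y):
    """p_{X|Y}(x | y) = p_{X,Y}(x, y) / p_Y(y)."""
    p_y = marginal(joint, 1)[y]
    return {x: prob / p_y for (x, yy), prob in joint.items() if yy == y}


p_x = marginal(joint, 0)  # p_X(x) = sum over y of p_{X,Y}(x, y)
p_y = marginal(joint, 1)

# Multiplication rule: p_{X,Y}(x, y) = p_Y(y) * p_{X|Y}(x | y) for every pair.
for (x, y), prob in joint.items():
    assert abs(prob - p_y[y] * conditional_x_given_y(joint, y)[x]) < 1e-12

print(p_x)
```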

42
00:01:53,700 --> 00:01:57,730
And there's an analog of all
that in the continuous world,

43
00:01:57,730 --> 00:02:02,430
where all you do is to replace
p's by f's, and replace sums

44
00:02:02,430 --> 00:02:03,340
by integrals.

45
00:02:03,340 --> 00:02:05,620
So the formulas all
look the same.

46
00:02:05,620 --> 00:02:09,120
The interpretations are a little
more subtle, so the f's

47
00:02:09,120 --> 00:02:11,720
are not probabilities, they're
probability densities.

48
00:02:11,720 --> 00:02:16,010
So they're probabilities per
unit length, or in the case of

49
00:02:16,010 --> 00:02:20,290
joint PDFs, these are
probabilities per unit area.

50
00:02:20,290 --> 00:02:22,690
So they're densities
of some sort.

51
00:02:22,690 --> 00:02:26,020
Probably the most subtle concept,
in terms of understanding what it

52
00:02:26,020 --> 00:02:29,250
really is, is the conditional
density.

53
00:02:29,250 --> 00:02:30,590
In some sense, it's simple.

54
00:02:30,590 --> 00:02:34,900
It's just the density of X in
a world where you have been

55
00:02:34,900 --> 00:02:40,290
told the value of the random
variable Y. It's a function

56
00:02:40,290 --> 00:02:44,510
that has two arguments, but the
best way to think about it

57
00:02:44,510 --> 00:02:47,050
is to say that we fix y.

58
00:02:47,050 --> 00:02:50,980
We're told the value of the
random variable Y, and we look

59
00:02:50,980 --> 00:02:52,930
at it as a function of x.

60
00:02:52,930 --> 00:02:56,150
So as a function of x, the
denominator is a constant, and

61
00:02:56,150 --> 00:02:59,650
it just looks like the
joint density

62
00:02:59,650 --> 00:03:01,620
when we keep y fixed.

63
00:03:01,620 --> 00:03:05,570
So it's really a function of
one argument, just the

64
00:03:05,570 --> 00:03:06,870
argument x.

65
00:03:06,870 --> 00:03:10,080
And it has the same shape as the
joint density when you

66
00:03:10,080 --> 00:03:11,720
take that slice of it.

67
00:03:11,720 --> 00:03:17,570
So conditional PDFs are just
slices of joint PDFs.
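
A numerical sketch of the slice picture, assuming a hypothetical joint density f_{X,Y}(x, y) = x + y on the unit square (chosen only because it integrates to 1). For a fixed y, dividing by the constant f_Y(y) rescales the slice without changing its shape, and the result integrates to 1 over x.

```python
def joint_pdf(x, y):
    """Hypothetical joint density f_{X,Y}(x, y) = x + y on the unit square."""
    return x + y if 0.0 <= x <= 1.0 and 0.0 <= y <= 1.0 else 0.0


def marginal_y(y, n=1000):
    """f_Y(y) = integral over x of f_{X,Y}(x, y), approximated by the midpoint rule."""
    h = 1.0 / n
    return sum(joint_pdf((i + 0.5) * h, y) for i in range(n)) * h


def conditional_x_given_y(x, y):
    """f_{X|Y}(x | y) = f_{X,Y}(x, y) / f_Y(y): the slice at fixed y, rescaled."""
    return joint_pdf(x, y) / marginal_y(y)


y0 = 0.25
# Same shape as the slice: ratios of conditional values equal ratios of joint values.
ratio_conditional = conditional_x_given_y(0.8, y0) / conditional_x_given_y(0.2, y0)
ratio_slice = joint_pdf(0.8, y0) / joint_pdf(0.2, y0)

# The rescaled slice is a proper density in x: it integrates to 1 (midpoint rule).
n = 1000
total = sum(conditional_x_given_y((i + 0.5) / n, y0) for i in range(n)) / n
print(marginal_y(y0), ratio_conditional, ratio_slice, total)
```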

68
00:03:17,570 --> 00:03:20,810
There's a bunch of concepts,
expectations, variances,

69
00:03:20,810 --> 00:03:23,790
cumulative distribution
functions that apply equally

70
00:03:23,790 --> 00:03:26,260
well to both universes
of discrete or

71
00:03:26,260 --> 00:03:28,800
continuous random variables.

72
00:03:28,800 --> 00:03:31,330
So why is probability useful?

73
00:03:31,330 --> 00:03:36,170
Probability is useful because,
among other things, we use it

74
00:03:36,170 --> 00:03:38,420
to make sense of the
world around us.

75
00:03:38,420 --> 00:03:41,870
We use it to make inferences
about things that we do not

76
00:03:41,870 --> 00:03:43,280
see directly.

77
00:03:43,280 --> 00:03:45,570
And this is done in a
very simple manner

78
00:03:45,570 --> 00:03:46,840
using Bayes' rule.

79
00:03:46,840 --> 00:03:49,730
We've already seen some of that,
and now we're going to

80
00:03:49,730 --> 00:03:55,070
revisit it with a bunch of
different variations.

81
00:03:55,070 --> 00:03:58,240
And the variations come because
sometimes our random

82
00:03:58,240 --> 00:04:01,040
variables are discrete, sometimes
they're continuous,

83
00:04:01,040 --> 00:04:04,390
or we can have a combination
of the two.

84
00:04:04,390 --> 00:04:08,170
So the big picture is that
there's some unknown random

85
00:04:08,170 --> 00:04:11,660
variable out there, and we
know the distribution of that

86
00:04:11,660 --> 00:04:12,550
random variable.

87
00:04:12,550 --> 00:04:16,360
And in the discrete case, it's
going to be given by a PMF.

88
00:04:16,360 --> 00:04:20,269
In the continuous case,
it's given by a PDF.

89
00:04:20,269 --> 00:04:24,060
Then we have some phenomenon,
some noisy phenomenon or some

90
00:04:24,060 --> 00:04:28,380
measuring device, and that
measuring device produces

91
00:04:28,380 --> 00:04:31,260
an observable random variable Y.

92
00:04:31,260 --> 00:04:34,930
We don't know what X is, but we
have some beliefs about how

93
00:04:34,930 --> 00:04:36,310
X is distributed.

94
00:04:36,310 --> 00:04:39,450
We observe the random variable
Y. We need a

95
00:04:39,450 --> 00:04:41,300
model of this box.

96
00:04:41,300 --> 00:04:46,170
And the model of that box is
going to be a conditional PMF

97
00:04:46,170 --> 00:04:52,565
for Y given X. And that
model tells us, if the

98
00:04:52,565 --> 00:04:57,080
true state of the world is X,
how do we expect Y to be

99
00:04:57,080 --> 00:04:58,520
distributed?

100
00:04:58,520 --> 00:05:01,610
That's for the case where
Y is discrete.

101
00:05:01,610 --> 00:05:06,350
If Y is continuous, you might
instead have a density

102
00:05:06,350 --> 00:05:10,820
for Y, or something
of that form.

103
00:05:10,820 --> 00:05:13,980
So in either case, this
should be a function

104
00:05:13,980 --> 00:05:15,520
that's known to us.

105
00:05:15,520 --> 00:05:18,370
This is our model of the
measuring device.

106
00:05:18,370 --> 00:05:20,950
And now having observed
y, we want to make

107
00:05:20,950 --> 00:05:22,680
inferences about x.

108
00:05:22,680 --> 00:05:25,140
What does it mean to
make inferences?

109
00:05:25,140 --> 00:05:29,880
Well the most complete answer in
the inference problem is to

110
00:05:29,880 --> 00:05:32,380
tell me the probability
distribution

111
00:05:32,380 --> 00:05:34,830
of the unknown quantity.

112
00:05:34,830 --> 00:05:36,900
But when I say the probability
distribution, I

113
00:05:36,900 --> 00:05:38,540
don't mean this one.

114
00:05:38,540 --> 00:05:41,280
I mean the probability
distribution that takes into

115
00:05:41,280 --> 00:05:43,760
account the measurements
that you got.

116
00:05:43,760 --> 00:05:48,270
So the output of an inference
problem is to come up with the

117
00:05:48,270 --> 00:05:59,830
distribution of X, the unknown
quantity, given what we have

118
00:05:59,830 --> 00:06:00,980
already observed.

119
00:06:00,980 --> 00:06:04,110
And in the discrete case, it
would be an object like that.

120
00:06:04,110 --> 00:06:08,920
If X is continuous, it would
be an object of this kind.

121
00:06:13,340 --> 00:06:18,080
OK, so we're given conditional
probabilities of this type,

122
00:06:18,080 --> 00:06:21,240
and we want to get conditional
distributions of the opposite

123
00:06:21,240 --> 00:06:23,280
type where the order of the

124
00:06:23,280 --> 00:06:25,580
conditioning is being reversed.

125
00:06:25,580 --> 00:06:28,980
So the starting point
is always a formula

126
00:06:28,980 --> 00:06:30,810
such as this one.

127
00:06:30,810 --> 00:06:33,670
The probability of x happening,
and then y

128
00:06:33,670 --> 00:06:36,280
happening given that
x happens.

129
00:06:36,280 --> 00:06:40,910
This is the probability that
a particular x and y happen

130
00:06:40,910 --> 00:06:42,370
simultaneously.

131
00:06:42,370 --> 00:06:47,240
But this is also equal to the
probability that y happens,

132
00:06:47,240 --> 00:06:50,377
and then that x happens, given
that y has happened.

133
00:06:53,060 --> 00:06:57,140
And you take this expression
and send one term to the

134
00:06:57,140 --> 00:07:00,950
denominator of the other side,
and this gives us Bayes'

135
00:07:00,950 --> 00:07:03,180
rule for the discrete case.

136
00:07:03,180 --> 00:07:05,550
It's the one that you have
already seen, and you

137
00:07:05,550 --> 00:07:07,200
have played with it.

138
00:07:07,200 --> 00:07:10,720
So this is what the formula
looks like in

139
00:07:10,720 --> 00:07:12,030
the discrete case.

140
00:07:12,030 --> 00:07:14,570
And the typical example where
both random variables are

141
00:07:14,570 --> 00:07:18,000
discrete is the one we discussed
some time ago.

142
00:07:18,000 --> 00:07:20,720
X is, let's say, a binary
variable: whether an

143
00:07:20,720 --> 00:07:22,960
airplane is present
up there or not.

144
00:07:22,960 --> 00:07:27,790
Y is a discrete measurement, for
example, whether our radar

145
00:07:27,790 --> 00:07:30,040
beeped or it didn't beep.

146
00:07:30,040 --> 00:07:33,860
And we make inferences and
calculate the probability that

147
00:07:33,860 --> 00:07:37,860
the plane is there, or the
probability that the plane is

148
00:07:37,860 --> 00:07:41,000
not there, given the measurement
that we have made.
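
The discrete Bayes' rule for the radar example can be sketched in a few lines of Python. The prior and the radar's beep probabilities below are made-up numbers for illustration; the denominator p_Y(y) comes from the total probability formula.

```python
def posterior_discrete(prior, likelihood, y):
    """Bayes' rule: p_{X|Y}(x | y) = p_X(x) p_{Y|X}(y | x) / p_Y(y)."""
    unnormalized = {x: prior[x] * likelihood[x][y] for x in prior}
    p_y = sum(unnormalized.values())  # p_Y(y) by the total probability formula
    return {x: w / p_y for x, w in unnormalized.items()}


# Illustrative numbers only (not from the lecture).
prior = {"plane": 0.05, "no plane": 0.95}
likelihood = {
    "plane": {"beep": 0.99, "no beep": 0.01},
    "no plane": {"beep": 0.10, "no beep": 0.90},
}

post = posterior_discrete(prior, likelihood, "beep")
print(post)
```

With these assumed numbers, a beep raises the probability of a plane from 0.05 to about 0.34.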

149
00:07:41,000 --> 00:07:43,940
And of course X and Y do not
need to be just binary.

150
00:07:43,940 --> 00:07:47,480
They could be more general
discrete random variables.

151
00:07:47,480 --> 00:07:50,900
So how does the story change
in the continuous case?

152
00:07:50,900 --> 00:07:53,290
First, what's a possible
application of

153
00:07:53,290 --> 00:07:54,570
the continuous case?

154
00:07:54,570 --> 00:07:59,620
Well, think of X as being some
signal that takes values over

155
00:07:59,620 --> 00:08:00,630
a continuous range.

156
00:08:00,630 --> 00:08:04,730
Let's say X is the current
through a resistor.

157
00:08:04,730 --> 00:08:07,530
And then you have some measuring
device that measures

158
00:08:07,530 --> 00:08:11,530
currents, but that device is
noisy, it gets hit, let's say

159
00:08:11,530 --> 00:08:13,640
for example, by Gaussian
noise.

160
00:08:13,640 --> 00:08:18,340
And the Y that you observe is a
noisy version of X. But your

161
00:08:18,340 --> 00:08:22,410
instruments are analog, so
you measure things on

162
00:08:22,410 --> 00:08:24,750
a continuous scale.

163
00:08:24,750 --> 00:08:26,250
What are you going to
do in that case?

164
00:08:26,250 --> 00:08:29,920
Well the inference problem, the
output of the inference

165
00:08:29,920 --> 00:08:33,360
problem, is going to be the
conditional distribution of X.

166
00:08:33,360 --> 00:08:38,950
What do you think your current
is based on a particular value

167
00:08:38,950 --> 00:08:40,870
of Y that you have observed?

168
00:08:40,870 --> 00:08:44,480
So the output of our inference
problem is, given the specific

169
00:08:44,480 --> 00:08:48,560
value of Y, to calculate this
entire function as a function

170
00:08:48,560 --> 00:08:51,050
of x, and then go and plot it.

171
00:08:51,050 --> 00:08:53,570
How do we calculate it?

172
00:08:53,570 --> 00:08:57,410
You go through the same
calculation as in the discrete

173
00:08:57,410 --> 00:09:01,590
case, except that all of the
p's get replaced by f's.

174
00:09:01,590 --> 00:09:04,630
In the continuous case, it's
equally true that the joint

175
00:09:04,630 --> 00:09:07,790
density is the product of the
marginal density with the

176
00:09:07,790 --> 00:09:09,220
conditional density.

177
00:09:09,220 --> 00:09:11,400
So the formula is still
valid with just a

178
00:09:11,400 --> 00:09:13,160
little change of notation.

179
00:09:13,160 --> 00:09:16,480
So we end up with the same
formula here, except that we

180
00:09:16,480 --> 00:09:18,990
replace p's with f's.

181
00:09:18,990 --> 00:09:23,240
So all of these functions
are known to us.

182
00:09:23,240 --> 00:09:25,500
We have formulas for them.

183
00:09:25,500 --> 00:09:29,400
We fix a specific value of y,
we plug it in, so we're left

184
00:09:29,400 --> 00:09:30,640
with a function of x.

185
00:09:30,640 --> 00:09:33,420
And that gives us the posterior
distribution.

186
00:09:33,420 --> 00:09:38,130
Actually there's also a
denominator term that's not

187
00:09:38,130 --> 00:09:42,340
necessarily given to us, but we
can always calculate it if

188
00:09:42,340 --> 00:09:45,650
we have the marginal of X,
and we have the model for

189
00:09:45,650 --> 00:09:47,250
the measuring device.

190
00:09:47,250 --> 00:09:50,960
Then we can always find the
marginal distribution of Y. So

191
00:09:50,960 --> 00:09:54,630
this quantity, that number, is
in general a known one, as

192
00:09:54,630 --> 00:09:58,490
well, and doesn't give
us any problems.
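
A grid-based sketch of the continuous case, under assumptions the lecture does not fix: the current X has a standard normal prior, and the measurement is Y = X + W with W normal, mean 0, standard deviation 0.5, independent of X. The posterior density is evaluated on a grid of x values, and the denominator f_Y(y) is approximated by a Riemann sum. For this Gaussian model the exact posterior mean is y / (1 + sigma^2), which gives a check.

```python
import math


def normal_pdf(z, mean, std):
    """Density of a normal random variable with the given mean and standard deviation."""
    return math.exp(-0.5 * ((z - mean) / std) ** 2) / (std * math.sqrt(2.0 * math.pi))


def posterior_on_grid(y, prior_pdf, likelihood_pdf, lo=-10.0, hi=10.0, n=4001):
    """Evaluate f_{X|Y}(x | y) = f_X(x) f_{Y|X}(y | x) / f_Y(y) on an evenly spaced grid."""
    h = (hi - lo) / (n - 1)
    xs = [lo + i * h for i in range(n)]
    unnormalized = [prior_pdf(x) * likelihood_pdf(y, x) for x in xs]
    f_y = sum(unnormalized) * h  # Riemann sum for the integral of f_X(x) f_{Y|X}(y | x) dx
    return xs, [u / f_y for u in unnormalized], h, f_y


SIGMA = 0.5  # assumed noise standard deviation


def prior_pdf(x):
    return normal_pdf(x, 0.0, 1.0)  # assumed prior: X is standard normal


def likelihood_pdf(y, x):
    return normal_pdf(y, x, SIGMA)  # measurement model: Y = X + W, W ~ N(0, SIGMA^2)


y_obs = 1.2
xs, post, h, f_y = posterior_on_grid(y_obs, prior_pdf, likelihood_pdf)
post_mean = sum(x * p for x, p in zip(xs, post)) * h
print(post_mean, y_obs / (1 + SIGMA**2))  # grid estimate vs. exact Gaussian answer
```

The grid approach is used here because it mirrors the recipe in the lecture (fix y, plug in, get a function of x) and works for any prior and noise model, not just Gaussian ones.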

193
00:09:58,490 --> 00:10:03,140
So to complicate things a little
bit, we can also look

194
00:10:03,140 --> 00:10:07,610
into situations where our two
random variables are of

195
00:10:07,610 --> 00:10:09,080
different kinds.

196
00:10:09,080 --> 00:10:12,290
For example, one random variable
could be discrete,

197
00:10:12,290 --> 00:10:15,280
and the other might
be continuous.

198
00:10:15,280 --> 00:10:17,340
And there are two versions.

199
00:10:17,340 --> 00:10:22,320
Here one version is when X is
discrete, but Y is continuous.

200
00:10:22,320 --> 00:10:25,130
What's an example of this?

201
00:10:25,130 --> 00:10:30,690
Well suppose that I send a
single bit of information so

202
00:10:30,690 --> 00:10:34,620
my X is 0 or 1.

203
00:10:34,620 --> 00:10:39,710
And what I measure is Y,
which is X plus, let's

204
00:10:39,710 --> 00:10:42,360
say, Gaussian noise.

205
00:10:48,960 --> 00:10:52,550
This is the standard example
that shows up in any textbook

206
00:10:52,550 --> 00:10:55,220
on communication, or
signal processing.

207
00:10:55,220 --> 00:10:58,530
You send a single bit, but what
you observe is a noisy

208
00:10:58,530 --> 00:11:02,120
version of that bit.

209
00:11:02,120 --> 00:11:05,150
You start with a model
of your x's.

210
00:11:05,150 --> 00:11:07,610
These would be your prior
probabilities.

211
00:11:07,610 --> 00:11:11,670
For example, you might
believe that 0 and 1 are

212
00:11:11,670 --> 00:11:16,250
equally likely, in which case
your PMF gives equal weight to

213
00:11:16,250 --> 00:11:18,320
the two possible values.

214
00:11:18,320 --> 00:11:21,840
And then we need a model of
our measuring device.

215
00:11:21,840 --> 00:11:23,990
This is one specific model.

216
00:11:23,990 --> 00:11:28,090
The general model would have
a shape such as follows.

217
00:11:28,090 --> 00:11:37,560
Y has a distribution,
its density.

218
00:11:37,560 --> 00:11:41,590
And that density, however,
depends on the value of X.

219
00:11:41,590 --> 00:11:46,170
So when x is 0, we might get
a density of this kind.

220
00:11:46,170 --> 00:11:50,250
And when x is 1, we might
get a density

221
00:11:50,250 --> 00:11:52,210
of a different kind.

222
00:11:52,210 --> 00:11:57,010
So these are the conditional
densities of y in a universe

223
00:11:57,010 --> 00:11:59,730
that's specified by a particular
value of x.

224
00:12:04,660 --> 00:12:09,040
And then we go ahead and
do our inference.

225
00:12:09,040 --> 00:12:13,520
OK, what's the right formula
for doing this inference?

226
00:12:13,520 --> 00:12:18,270
We need a formula that's sort of
an analog of this one, but

227
00:12:18,270 --> 00:12:22,210
applies to the case where we
have two random variables of

228
00:12:22,210 --> 00:12:23,670
different kinds.

229
00:12:23,670 --> 00:12:29,370
So let me just redo this
calculation here.

230
00:12:29,370 --> 00:12:33,250
Except that I'm not going to
have a probability of taking

231
00:12:33,250 --> 00:12:34,340
specific values.

232
00:12:34,340 --> 00:12:36,800
It will have to be something
a little different.

233
00:12:36,800 --> 00:12:39,250
So here's how it goes.

234
00:12:39,250 --> 00:12:44,340
Let's look at the probability
that X takes a specific value,

235
00:12:44,340 --> 00:12:47,510
which makes sense in the discrete
case, but for the

236
00:12:47,510 --> 00:12:50,040
continuous random variable,
let's look at the probability

237
00:12:50,040 --> 00:12:53,480
that it takes values in
some little interval.

238
00:12:53,480 --> 00:12:55,940
And now this probability of
two things happening, I'm

239
00:12:55,940 --> 00:12:57,520
going to write it
as a product.

240
00:12:57,520 --> 00:12:59,450
And I'm going to write
this as a product in

241
00:12:59,450 --> 00:13:01,350
two different ways.

242
00:13:01,350 --> 00:13:09,360
So one way is to say that this
is the probability that X

243
00:13:09,360 --> 00:13:13,670
takes that value and then given
that X takes that value,

244
00:13:13,670 --> 00:13:19,310
the probability that Y falls
inside that interval.

245
00:13:19,310 --> 00:13:21,810
So this is our usual
multiplication rule for

246
00:13:21,810 --> 00:13:25,330
multiplying probabilities, but
I can use the multiplication

247
00:13:25,330 --> 00:13:27,610
rule also in a different way.

248
00:13:27,610 --> 00:13:30,210
It's the probability
that Y falls in

249
00:13:30,210 --> 00:13:33,460
the range of interest.

250
00:13:33,460 --> 00:13:36,990
And then the probability that X
takes the value of interest

251
00:13:36,990 --> 00:13:41,145
given that Y satisfies
the first condition.

252
00:13:45,960 --> 00:13:53,760
So this is something that's
definitely true.

253
00:13:53,760 --> 00:13:57,410
We're just using the
multiplication rule.

254
00:13:57,410 --> 00:14:02,240
And now let's translate it
into PMF and PDF notation.

255
00:14:02,240 --> 00:14:07,130
So the entry up there is the
PMF of X evaluated at x.

256
00:14:07,130 --> 00:14:10,030
The second entry, what is it?

257
00:14:10,030 --> 00:14:12,230
Well probabilities of
little intervals are

258
00:14:12,230 --> 00:14:13,480
given to us by densities.

259
00:14:16,010 --> 00:14:19,160
But we are in the conditional
universe where X takes on a

260
00:14:19,160 --> 00:14:20,430
particular value.

261
00:14:20,430 --> 00:14:27,450
So it's going to be the density
of Y given the value

262
00:14:27,450 --> 00:14:30,210
of X times delta.

263
00:14:30,210 --> 00:14:32,790
So probabilities of little
intervals are given by the

264
00:14:32,790 --> 00:14:36,430
density times the length of
the little interval, but

265
00:14:36,430 --> 00:14:39,390
because we're working in the
conditional universe, it has

266
00:14:39,390 --> 00:14:41,230
to be the conditional density.

267
00:14:41,230 --> 00:14:43,860
Now let's try the second
expression.

268
00:14:43,860 --> 00:14:46,690
This is the probability
that Y falls

269
00:14:46,690 --> 00:14:48,040
into the little interval.

270
00:14:48,040 --> 00:14:51,160
So that's the density
of Y times delta.

271
00:14:51,160 --> 00:14:53,950
And then here we have an
object which is the

272
00:14:53,950 --> 00:14:59,690
conditional probability of X in a
universe where the value of Y

273
00:14:59,690 --> 00:15:00,940
is given to us.

274
00:15:04,900 --> 00:15:08,830
Now this relation is sort
of approximate.

275
00:15:08,830 --> 00:15:13,630
This is true for very small
delta in the limit.

276
00:15:13,630 --> 00:15:17,880
But we can cancel the deltas
from both sides, and we're

277
00:15:17,880 --> 00:15:21,800
left with a formula that links
together PMFs and PDFs.
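
To see the delta approximation at work, the sketch below compares the exact probability P(X = 1, y <= Y <= y + delta) with p_X(1) f_{Y|X}(y | 1) delta, assuming the single-bit model with equal priors and Gaussian noise of standard deviation 0.5 (illustrative values). The ratio tends to 1 as delta shrinks, which is why the deltas can be cancelled.

```python
import math


def normal_cdf(z, mean, std):
    return 0.5 * (1.0 + math.erf((z - mean) / (std * math.sqrt(2.0))))


def normal_pdf(z, mean, std):
    return math.exp(-0.5 * ((z - mean) / std) ** 2) / (std * math.sqrt(2.0 * math.pi))


p_x = {0: 0.5, 1: 0.5}  # assumed prior on the bit X
sigma = 0.5             # assumed noise std; Y = X + W, W ~ N(0, sigma^2)
x, y = 1, 0.3

for delta in (1e-1, 1e-2, 1e-3, 1e-4):
    # Exact: p_X(x) * P(y <= Y <= y + delta | X = x), from the conditional CDF.
    exact = p_x[x] * (normal_cdf(y + delta, x, sigma) - normal_cdf(y, x, sigma))
    # Approximation: p_X(x) * f_{Y|X}(y | x) * delta.
    approx = p_x[x] * normal_pdf(y, x, sigma) * delta
    print(f"delta={delta:g}  exact/approx={exact / approx:.6f}")
```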

278
00:15:21,800 --> 00:15:25,340
Now this may look terribly
confusing because there's both

279
00:15:25,340 --> 00:15:27,730
p's and f's involved.

280
00:15:27,730 --> 00:15:29,850
But the logic should be clear.

281
00:15:29,850 --> 00:15:32,590
If a random variable
is discrete, it's

282
00:15:32,590 --> 00:15:34,480
described by a PMF.

283
00:15:34,480 --> 00:15:38,120
So here we're talking about
the PMF of X in some

284
00:15:38,120 --> 00:15:39,130
particular universe.

285
00:15:39,130 --> 00:15:41,210
X is discrete, so
it has a PMF.

286
00:15:41,210 --> 00:15:42,320
Similarly here.

287
00:15:42,320 --> 00:15:45,380
Y is continuous so it's
described by a PDF.

288
00:15:45,380 --> 00:15:47,840
And even in the conditional
universe where I tell you the

289
00:15:47,840 --> 00:15:50,900
value of X, Y is still a
continuous random variable, so

290
00:15:50,900 --> 00:15:53,280
it's described by a PDF.

291
00:15:53,280 --> 00:15:55,430
So this is the basic
relation that links

292
00:15:55,430 --> 00:15:57,360
together PMFs and PDFs

293
00:15:57,360 --> 00:15:59,080
in this mixed world.

294
00:15:59,080 --> 00:16:04,270
And now in this equality,
you can take this term and

295
00:16:04,270 --> 00:16:07,830
send it to the denominator
on the other side.

296
00:16:07,830 --> 00:16:10,070
And what you end up with
is the formula

297
00:16:10,070 --> 00:16:11,830
that we have up here.

298
00:16:11,830 --> 00:16:15,640
And this is a formula that we
can use to make inferences

299
00:16:15,640 --> 00:16:18,780
about the discrete random
variable X when we're told the

300
00:16:18,780 --> 00:16:26,540
value of the continuous random
variable Y. The probability

301
00:16:26,540 --> 00:16:29,690
that X takes on a particular
value has something

302
00:16:29,690 --> 00:16:31,330
to do with the prior.

303
00:16:31,330 --> 00:16:36,520
And other than that, it's
proportional to this quantity,

304
00:16:36,520 --> 00:16:41,720
the conditional density of Y given X.
So these are the quantities

305
00:16:41,720 --> 00:16:43,190
that we plotted here.

306
00:16:43,190 --> 00:16:47,550
Suppose that the x's are equally
likely in your prior,

307
00:16:47,550 --> 00:16:50,210
so we don't really care
about that term.

308
00:16:50,210 --> 00:16:55,530
It tells us that the posterior
of X is proportional to that

309
00:16:55,530 --> 00:16:58,520
particular density under
the given x's.

310
00:16:58,520 --> 00:17:03,350
So in this picture, if I were to
get a particular y here, I

311
00:17:03,350 --> 00:17:07,200
would say that x equals 1
has a probability that's

312
00:17:07,200 --> 00:17:09,220
proportional to this quantity.

313
00:17:09,220 --> 00:17:11,470
x equals 0 has a probability
that's

314
00:17:11,470 --> 00:17:13,599
proportional to this quantity.

315
00:17:13,599 --> 00:17:16,910
So the ratio of these two
quantities gives us the

316
00:17:16,910 --> 00:17:21,200
relative odds of the different
x's given the y

317
00:17:21,200 --> 00:17:24,010
that we have observed.
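
A minimal sketch of this mixed-case inference for the single-bit example, assuming Y = X + W with W normal, mean 0, standard deviation sigma (the value 0.5 is illustrative). With equal priors, the posterior odds of x = 1 versus x = 0 reduce to the ratio of the two conditional densities evaluated at the observed y.

```python
import math


def normal_pdf(z, mean, std):
    return math.exp(-0.5 * ((z - mean) / std) ** 2) / (std * math.sqrt(2.0 * math.pi))


def posterior_bit(y, prior, sigma):
    """p_{X|Y}(x | y) = p_X(x) f_{Y|X}(y | x) / f_Y(y), assuming Y = X + N(0, sigma^2) noise."""
    weights = {x: prior[x] * normal_pdf(y, x, sigma) for x in prior}
    f_y = sum(weights.values())  # f_Y(y) = sum over x of p_X(x) f_{Y|X}(y | x)
    return {x: w / f_y for x, w in weights.items()}


prior = {0: 0.5, 1: 0.5}  # equally likely bits, so the prior cancels in the odds
post = posterior_bit(0.8, prior, sigma=0.5)
odds = post[1] / post[0]  # equals f_{Y|X}(0.8 | 1) / f_{Y|X}(0.8 | 0)
print(post, odds)
```

For this Gaussian model the likelihood ratio at y = 0.8 works out to exp(1.2), and at the midpoint y = 0.5 the two bits are equally likely.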

318
00:17:24,010 --> 00:17:28,099
So we're going to come back to
this topic and redo plenty of

319
00:17:28,099 --> 00:17:31,350
examples of these kinds towards
the end of the class,

320
00:17:31,350 --> 00:17:34,480
when we spend some
time dedicated

321
00:17:34,480 --> 00:17:36,130
to inference problems.

322
00:17:36,130 --> 00:17:39,890
But already at this stage, we
sort of have the basic skills

323
00:17:39,890 --> 00:17:42,000
to deal with a lot of that.

324
00:17:42,000 --> 00:17:43,840
And it's useful at this
point to pull all

325
00:17:43,840 --> 00:17:45,610
the formulas together.

326
00:17:45,610 --> 00:17:49,990
So finally let's look at the
last case that's remaining.

327
00:17:49,990 --> 00:17:54,440
Here we have a continuous
phenomenon that we're trying

328
00:17:54,440 --> 00:17:57,770
to measure, but our measurements
are discrete.

329
00:17:57,770 --> 00:18:00,780
What's an example where
this might happen?

330
00:18:00,780 --> 00:18:05,270
So you have some device that
emits light, and you drive it

331
00:18:05,270 --> 00:18:07,500
with a current that has
a certain intensity.

332
00:18:07,500 --> 00:18:09,910
You don't know what that
current is, and it's a

333
00:18:09,910 --> 00:18:12,120
continuous random variable.

334
00:18:12,120 --> 00:18:14,600
But the device emits
light by sending

335
00:18:14,600 --> 00:18:16,580
out individual photons.

336
00:18:16,580 --> 00:18:20,480
And your measurement is some
other device that counts how

337
00:18:20,480 --> 00:18:23,250
many photons you get
in a single second.

338
00:18:23,250 --> 00:18:28,020
So if we have devices that emit
light at very low intensity, you

339
00:18:28,020 --> 00:18:31,720
can actually start counting
individual photons as they're

340
00:18:31,720 --> 00:18:32,980
being observed.

341
00:18:32,980 --> 00:18:35,390
So we have a discrete
measurement, which is the

342
00:18:35,390 --> 00:18:38,920
number of photons, and we
have a continuous hidden

343
00:18:38,920 --> 00:18:43,060
random variable that we're
trying to estimate.

344
00:18:43,060 --> 00:18:45,790
What do we do in this case?

345
00:18:45,790 --> 00:18:52,600
Well we start again with a
formula of this kind, and send

346
00:18:52,600 --> 00:18:55,560
the p term to the denominator.

347
00:18:55,560 --> 00:18:58,180
And that's the formula that we
use there, except that the

348
00:18:58,180 --> 00:19:01,100
roles of x's and y's
are interchanged.

349
00:19:01,100 --> 00:19:06,810
So since here we have Y being
discrete, we should change all

350
00:19:06,810 --> 00:19:07,590
the subscripts.

351
00:19:07,590 --> 00:19:15,490
It would be p_Y(y) f_{X|Y}(x|y)
= f_X(x) p_{Y|X}(y|x).

352
00:19:15,490 --> 00:19:19,230
So just change all
those subscripts.

353
00:19:19,230 --> 00:19:22,740
Because what used to
be continuous became discrete,

354
00:19:22,740 --> 00:19:25,310
and vice versa.

355
00:19:25,310 --> 00:19:27,360
Take that formula, send
the other terms to the

356
00:19:27,360 --> 00:19:32,140
denominator, and we have a
formula for the density of X,

357
00:19:32,140 --> 00:19:34,370
given the particular
measurements for Y that we

358
00:19:34,370 --> 00:19:36,350
have obtained.
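
A sketch of the photon-counting case. The lecture does not specify the models, so both are assumptions here: the unknown intensity X has an exponential prior with rate 1, and the count Y given X = x is Poisson with mean x. The posterior density f_{X|Y}(x | k) is evaluated on a grid, with p_Y(k) computed as the integral of f_X(x) p_{Y|X}(k | x). Under these particular assumptions the exact answers are known (p_Y(k) = 1/2^(k+1), posterior mean (k+1)/2), which makes the sketch easy to check.

```python
import math


def prior_pdf(x):
    """Assumed prior on the unknown intensity X: exponential with rate 1."""
    return math.exp(-x) if x >= 0 else 0.0


def count_pmf(k, x):
    """Assumed model: the photon count Y given X = x is Poisson with mean x."""
    return math.exp(-x) * x**k / math.factorial(k)


def posterior_density(k, hi=30.0, n=30_000):
    """f_{X|Y}(x | k) = f_X(x) p_{Y|X}(k | x) / p_Y(k) on a midpoint grid over [0, hi]."""
    h = hi / n
    xs = [(i + 0.5) * h for i in range(n)]
    unnormalized = [prior_pdf(x) * count_pmf(k, x) for x in xs]
    p_y = sum(unnormalized) * h  # p_Y(k): integral of f_X(x) p_{Y|X}(k | x) dx
    return xs, [u / p_y for u in unnormalized], h, p_y


k_obs = 3
xs, post, h, p_y = posterior_density(k_obs)
post_mean = sum(x * f for x, f in zip(xs, post)) * h
print(p_y, post_mean)
```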

359
00:19:36,350 --> 00:19:41,420
In some sense that's all there
is in Bayesian inference.

360
00:19:41,420 --> 00:19:46,540
It's using these very simple
one line formulas.

361
00:19:46,540 --> 00:19:51,210
But why are there people then
who make their living solving

362
00:19:51,210 --> 00:19:52,550
inference problems?

363
00:19:52,550 --> 00:19:54,990
Well, the devil is
in the details.

364
00:19:54,990 --> 00:19:57,460
As we're going to discuss,
there are some real-world

365
00:19:57,460 --> 00:20:01,150
issues: how exactly you
design your f's, how you

366
00:20:01,150 --> 00:20:04,680
model your system, and then how
you do your calculations.

367
00:20:04,680 --> 00:20:06,940
This might not always be easy.

368
00:20:06,940 --> 00:20:09,710
For example, there's certain
integrals or sums that have to

369
00:20:09,710 --> 00:20:12,900
be evaluated, which may be
hard to do and so on.

370
00:20:12,900 --> 00:20:14,900
So this subject is
a lot richer

371
00:20:14,900 --> 00:20:16,730
than just these formulas.

372
00:20:16,730 --> 00:20:21,270
On the other hand, at the
conceptual level, that's the

373
00:20:21,270 --> 00:20:23,910
basis for Bayesian inference,
that these

374
00:20:23,910 --> 00:20:25,160
are the basic concepts.

375
00:20:27,570 --> 00:20:30,850
All right, so now let's change
gear and move to the new

376
00:20:30,850 --> 00:20:36,180
subject, which is the topic of
finding the distribution of a

377
00:20:36,180 --> 00:20:38,360
function of a random
variable.

378
00:20:38,360 --> 00:20:42,820
We call those distributions
derived distributions, because

379
00:20:42,820 --> 00:20:45,480
we're given the distribution
of X. We're interested in a

380
00:20:45,480 --> 00:20:48,980
function of X. We want to derive
the distribution of

381
00:20:48,980 --> 00:20:51,020
that function based on
the distribution

382
00:20:51,020 --> 00:20:53,060
that we already know.

383
00:20:53,060 --> 00:20:56,610
So it could be a function of
just one random variable.

384
00:20:56,610 --> 00:20:59,170
It could be a function of
several random variables.

385
00:20:59,170 --> 00:21:02,880
So one example that we are going
to solve at some point,

386
00:21:02,880 --> 00:21:05,830
let's say you have two random
variables X and Y. Somebody

387
00:21:05,830 --> 00:21:09,055
tells you their distribution,
for example, that it's uniform on

388
00:21:09,055 --> 00:21:10,000
the square.

389
00:21:10,000 --> 00:21:12,120
For some reason, you're
interested in the ratio of

390
00:21:12,120 --> 00:21:14,660
these two random variables,
and you want to find the

391
00:21:14,660 --> 00:21:16,910
distribution of that ratio.

392
00:21:16,910 --> 00:21:21,810
You can think of lots of cases
where your random variable of

393
00:21:21,810 --> 00:21:25,950
interest is created by taking
some other unknown variables

394
00:21:25,950 --> 00:21:27,570
and taking a function of them.

395
00:21:27,570 --> 00:21:31,170
And so it's legitimate to care
about the distribution of that

396
00:21:31,170 --> 00:21:33,310
random variable.

397
00:21:33,310 --> 00:21:35,560
A caveat, however.

398
00:21:35,560 --> 00:21:39,480
There's an important case where
you don't need to find

399
00:21:39,480 --> 00:21:41,840
the distribution of that
random variable.

400
00:21:41,840 --> 00:21:44,600
And this is when you want to
calculate an expectation.

401
00:21:44,600 --> 00:21:47,750
If all you care about is the
expected value of this

402
00:21:47,750 --> 00:21:50,580
function of the random
variables, you can work

403
00:21:50,580 --> 00:21:53,800
directly with the distribution
of the original random

404
00:21:53,800 --> 00:21:58,490
variables without ever having
to find the PDF of g.
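
For instance, the expected value rule E[g(X)] = sum over x of g(x) p_X(x) works directly with the PMF of X. A minimal sketch, using an illustrative uniform PMF on {-2, ..., 2} and g(x) = x^2, with exact fractions:

```python
from fractions import Fraction


def expected_value(p_x, g):
    """Expected value rule: E[g(X)] = sum over x of g(x) p_X(x)."""
    return sum(g(x) * p for x, p in p_x.items())


p_x = {x: Fraction(1, 5) for x in (-2, -1, 0, 1, 2)}  # illustrative uniform PMF
print(expected_value(p_x, lambda x: x * x))  # prints 2
```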

405
00:21:58,490 --> 00:22:03,790
So you don't do unnecessary work
if it's not needed, but

406
00:22:03,790 --> 00:22:06,290
if it's needed, or if you're
asked to do it,

407
00:22:06,290 --> 00:22:08,470
then you just do it.

408
00:22:08,470 --> 00:22:13,040
So how do we find the
distribution of the function?

409
00:22:13,040 --> 00:22:17,690
As a warm-up, let's look
at the discrete case.

410
00:22:17,690 --> 00:22:21,120
Suppose that X is a discrete
random variable and takes

411
00:22:21,120 --> 00:22:22,550
certain values.

412
00:22:22,550 --> 00:22:27,070
We have a function g that
maps x's into y's.

413
00:22:27,070 --> 00:22:30,430
And we want to find the
probability mass function for

414
00:22:30,430 --> 00:22:31,930
Y.

415
00:22:31,930 --> 00:22:36,780
So for example, if I'm
interested in finding the

416
00:22:36,780 --> 00:22:41,020
probability that Y takes on
this particular value, how

417
00:22:41,020 --> 00:22:42,910
would I find it?

418
00:22:42,910 --> 00:22:46,890
Well I ask, what are the
different ways that this

419
00:22:46,890 --> 00:22:49,390
particular y value can happen?

420
00:22:49,390 --> 00:22:53,390
And the different ways that it
can happen are: either if X

421
00:22:53,390 --> 00:22:56,800
takes this value, or if
X takes that value.

422
00:22:56,800 --> 00:23:02,650
So we identify this event in the
y space with that event in

423
00:23:02,650 --> 00:23:04,220
the x space.

424
00:23:04,220 --> 00:23:06,790
These two events
are identical.

425
00:23:06,790 --> 00:23:12,350
X falls in this set if and only
if Y falls in that set.

426
00:23:12,350 --> 00:23:15,060
Therefore, the probability of
Y falling in that set is the

427
00:23:15,060 --> 00:23:17,540
probability of X falling
in that set.

428
00:23:17,540 --> 00:23:20,890
The probability of X falling in
that set is just the sum of

429
00:23:20,890 --> 00:23:24,650
the individual probabilities
of the x's in this set.

430
00:23:24,650 --> 00:23:27,360
So we just add the probabilities
of the different

431
00:23:27,360 --> 00:23:31,300
x's where the summation is taken
over all x's that lead

432
00:23:31,300 --> 00:23:35,070
to that particular value of y.
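
The discrete recipe just described (add up p_X(x) over every x that maps to the given y) can be sketched in a few lines of Python. The function name and the example PMF below are hypothetical, chosen only for illustration.

```python
from collections import defaultdict
from fractions import Fraction


def pmf_of_function(pmf_x, g):
    """PMF of Y = g(X): p_Y(y) is the sum of p_X(x) over all x with g(x) = y."""
    pmf_y = defaultdict(Fraction)  # Fraction() == 0, so each new y starts at zero
    for x, p in pmf_x.items():
        pmf_y[g(x)] += p
    return dict(pmf_y)


# Hypothetical example: X uniform on {-2, -1, 0, 1, 2} and Y = X**2.
pmf_x = {x: Fraction(1, 5) for x in range(-2, 3)}
pmf_y = pmf_of_function(pmf_x, lambda x: x * x)
print(pmf_y)  # p_Y(0) = 1/5, p_Y(1) = 2/5, p_Y(4) = 2/5
```

Here y = 1 collects the probability of both x = -1 and x = 1, which is exactly the "either this x or that x" argument.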

433
00:23:35,070 --> 00:23:35,860
Very good.

434
00:23:35,860 --> 00:23:39,090
So that's all there is
in the discrete case.

435
00:23:39,090 --> 00:23:41,070
It's very nice and simple.

436
00:23:41,070 --> 00:23:43,460
So let's transfer
these methods to

437
00:23:43,460 --> 00:23:45,810
the continuous case.

438
00:23:45,810 --> 00:23:47,890
Suppose we are in the
continuous case.

439
00:23:47,890 --> 00:23:52,140
Suppose that X and Y now can
take values anywhere.

440
00:23:52,140 --> 00:23:55,440
And I try to use the same methods
and I ask, what is the

441
00:23:55,440 --> 00:24:00,340
probability that Y is going
to take this value?

442
00:24:00,340 --> 00:24:03,100
At least if the diagram is this
way, you would say this

443
00:24:03,100 --> 00:24:06,990
is the same as the probability
that X takes this value.

444
00:24:06,990 --> 00:24:10,220
So I can find the probability
of Y being this in terms of

445
00:24:10,220 --> 00:24:12,600
the probability of
X being that.

446
00:24:12,600 --> 00:24:14,610
Is this useful?

447
00:24:14,610 --> 00:24:16,480
In the continuous
case, it's not.

448
00:24:16,480 --> 00:24:19,830
Because in the continuous case,
any single value has 0

449
00:24:19,830 --> 00:24:21,020
probability.

450
00:24:21,020 --> 00:24:25,450
So what you're going to get out
of this argument is that

451
00:24:25,450 --> 00:24:29,530
the probability Y takes this
value is 0, is equal to the

452
00:24:29,530 --> 00:24:32,800
probability that X takes that
value, which is also 0.

453
00:24:32,800 --> 00:24:34,650
That doesn't help us.

454
00:24:34,650 --> 00:24:36,060
We want to do something more.

455
00:24:36,060 --> 00:24:40,650
We want to actually find,
perhaps, the density of Y, as

456
00:24:40,650 --> 00:24:43,550
opposed to the probabilities
of individual y's.

457
00:24:43,550 --> 00:24:47,620
So to find the density of Y,
you might argue as follows.

458
00:24:47,620 --> 00:24:51,100
I'm looking at an interval for
y, and I ask what's the

459
00:24:51,100 --> 00:24:53,510
probability of falling
in this interval.

460
00:24:53,510 --> 00:24:57,890
And you go back and find the
corresponding set of x's that

461
00:24:57,890 --> 00:25:02,090
leads to those y's, and equate
those two probabilities.

462
00:25:02,090 --> 00:25:04,960
The probability of all of those
y's collectively should

463
00:25:04,960 --> 00:25:09,710
be equal to the probability of
all of the x's that map into

464
00:25:09,710 --> 00:25:11,930
that interval collectively.

465
00:25:11,930 --> 00:25:16,010
And this way you can
relate the two.

466
00:25:16,010 --> 00:25:22,870
As far as the mechanics go, in
many cases it's easier not

467
00:25:22,870 --> 00:25:26,670
to work with little intervals,
but instead to work with

468
00:25:26,670 --> 00:25:30,110
cumulative distribution
functions, which let us work

469
00:25:30,110 --> 00:25:32,600
with sort of big intervals.

470
00:25:32,600 --> 00:25:35,460
So you can instead do
a different picture.

471
00:25:35,460 --> 00:25:38,250
Look at this set of y's.

472
00:25:38,250 --> 00:25:41,690
This is the set of y's
that are smaller

473
00:25:41,690 --> 00:25:43,200
than a certain value.

474
00:25:43,200 --> 00:25:46,990
The probability of this set
is given by the cumulative

475
00:25:46,990 --> 00:25:49,740
distribution of the
random variable Y.

476
00:25:49,740 --> 00:25:54,450
Now this set of y's gets
produced by some corresponding

477
00:25:54,450 --> 00:25:56,850
set of x's.

478
00:25:56,850 --> 00:26:04,120
Maybe these are the x's that
map into y's in that set.

479
00:26:04,120 --> 00:26:06,040
And then we argue as follows.

480
00:26:06,040 --> 00:26:08,870
The probability that Y falls
in this interval is the

481
00:26:08,870 --> 00:26:12,600
same as the probability that
X falls in that interval.

482
00:26:12,600 --> 00:26:15,810
So the event of Y falling here
and the event of X falling

483
00:26:15,810 --> 00:26:19,330
there are the same, so their
probabilities must be equal.

484
00:26:19,330 --> 00:26:22,010
And then I do the calculations
here.

485
00:26:22,010 --> 00:26:25,050
And I end up getting the
cumulative distribution

486
00:26:25,050 --> 00:26:28,760
function of Y. Once I have the
cumulative, I can get the

487
00:26:28,760 --> 00:26:31,670
density by just differentiating.

488
00:26:31,670 --> 00:26:34,900
So this is the general cookbook
procedure that we

489
00:26:34,900 --> 00:26:37,886
will be using to calculate
derived distributions.

490
00:26:40,450 --> 00:26:43,500
We're interested in a random
variable Y, which is a

491
00:26:43,500 --> 00:26:45,320
function of the x's.

492
00:26:45,320 --> 00:26:50,070
We will aim at obtaining the
cumulative distribution of Y.

493
00:26:50,070 --> 00:26:54,040
Somehow, we manage to calculate the
probability of this event.

494
00:26:54,040 --> 00:26:58,120
Once we get it, and what I mean
by get it, I don't mean

495
00:26:58,120 --> 00:27:00,980
getting it for a single
value of little y.

496
00:27:00,980 --> 00:27:04,640
You need to get this
for all little y's.

497
00:27:04,640 --> 00:27:07,930
So you need to get the
function itself, the

498
00:27:07,930 --> 00:27:09,480
cumulative distribution.

499
00:27:09,480 --> 00:27:12,750
Once you get it in that form,
then you can calculate the

500
00:27:12,750 --> 00:27:15,260
derivative at any particular
point.

501
00:27:15,260 --> 00:27:18,000
And this is going to give
you the density of Y.

502
00:27:18,000 --> 00:27:19,690
So a simple two-step
procedure.
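
Written out as formulas (with F_Y and f_Y denoting the CDF and PDF of Y = g(X), notation assumed here), the two-step procedure reads:

```latex
\begin{aligned}
\text{Step 1:}\quad & F_Y(y) = \mathbf{P}(Y \le y) = \mathbf{P}\big(g(X) \le y\big), \quad \text{for every } y,\\
\text{Step 2:}\quad & f_Y(y) = \frac{dF_Y}{dy}(y).
\end{aligned}
```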

503
00:27:19,690 --> 00:27:24,050
The devil is in the details of
how you carry out the mechanics.

504
00:27:24,050 --> 00:27:27,580
So let's do one first example.

505
00:27:27,580 --> 00:27:31,020
Suppose that X is a uniform
random variable, takes values

506
00:27:31,020 --> 00:27:32,660
between 0 and 2.

507
00:27:32,660 --> 00:27:35,605
We're interested in the random
variable Y, which is the cube

508
00:27:35,605 --> 00:27:37,500
of X. What kind of distribution

509
00:27:37,500 --> 00:27:38,840
is it going to have?

510
00:27:38,840 --> 00:27:44,960
Now first notice that Y takes
values between 0 and 8.

511
00:27:44,960 --> 00:27:48,810
So X is uniform, so all the
x's are equally likely.

512
00:27:51,680 --> 00:27:55,340
You might then say, well, in
that case, all the y's should

513
00:27:55,340 --> 00:27:56,740
be equally likely.

514
00:27:56,740 --> 00:28:00,630
So Y might also have a
uniform distribution.

515
00:28:00,630 --> 00:28:02,210
Is this true?

516
00:28:02,210 --> 00:28:04,040
We'll find out.

517
00:28:04,040 --> 00:28:06,990
So let's start applying the
cookbook procedure.

518
00:28:06,990 --> 00:28:10,410
We want to find first the
cumulative distribution of the

519
00:28:10,410 --> 00:28:14,890
random variable Y, which by
definition is the probability

520
00:28:14,890 --> 00:28:17,370
that the random variable is
less than or equal to a

521
00:28:17,370 --> 00:28:18,850
certain number.

522
00:28:18,850 --> 00:28:20,680
That's what we want to find.

523
00:28:20,680 --> 00:28:24,440
What we have in our hands is the
distribution of X. That's

524
00:28:24,440 --> 00:28:26,320
what we need to work with.

525
00:28:26,320 --> 00:28:30,090
So the first step that you need
to do is to look at this

526
00:28:30,090 --> 00:28:33,680
event and translate it, and
write it in terms of the

527
00:28:33,680 --> 00:28:39,040
random variable about which you
have information.

528
00:28:39,040 --> 00:28:44,320
So Y is X cubed, so this event
is the same as that event.

529
00:28:44,320 --> 00:28:46,760
So now we can forget
about the y's.

530
00:28:46,760 --> 00:28:49,860
It's just an exercise involving
a single random

531
00:28:49,860 --> 00:28:52,750
variable with a known
distribution and we want to

532
00:28:52,750 --> 00:28:56,610
calculate the probability
of some event.

533
00:28:56,610 --> 00:28:58,780
So we're looking
at this event.

534
00:28:58,780 --> 00:29:02,230
X cubed being less than or equal
to y. We massage that

535
00:29:02,230 --> 00:29:06,130
expression so that it involves
X directly, so let's

536
00:29:06,130 --> 00:29:08,960
take cubic roots of both sides
of this inequality.

537
00:29:08,960 --> 00:29:12,130
This event is the same as the
event that X is less than or

538
00:29:12,130 --> 00:29:14,820
equal to y to the 1/3.

539
00:29:14,820 --> 00:29:19,300
Now with a uniform distribution
on [0,2], what is

540
00:29:19,300 --> 00:29:22,070
that probability going to be?

541
00:29:22,070 --> 00:29:27,710
It's the probability of being in
the interval from 0 to y to

542
00:29:27,710 --> 00:29:34,680
the 1/3, so it's going to be
the area under the uniform

543
00:29:34,680 --> 00:29:37,010
going up to that point.

544
00:29:37,010 --> 00:29:39,315
And what's the area under
that uniform?

545
00:29:42,650 --> 00:29:44,290
So here's x.

546
00:29:44,290 --> 00:29:50,810
Here is the distribution
of X. It goes up to 2.

547
00:29:50,810 --> 00:29:53,330
The distribution of
X is this one.

548
00:29:53,330 --> 00:29:56,860
We want to go up to
y to the 1/3.

549
00:29:56,860 --> 00:30:02,390
So the probability for this
event happening is this area.

550
00:30:02,390 --> 00:30:06,590
And the area is equal to the
base, which is y to the 1/3

551
00:30:06,590 --> 00:30:08,250
times the height.

552
00:30:08,250 --> 00:30:09,720
What is the height?

553
00:30:09,720 --> 00:30:13,480
Well since the density must
integrate to 1, the total area

554
00:30:13,480 --> 00:30:15,340
under the curve has to be 1.

555
00:30:15,340 --> 00:30:19,660
So the height here is 1/2, and
that explains why we get the

556
00:30:19,660 --> 00:30:22,530
1/2 factor down there.

557
00:30:22,530 --> 00:30:24,900
So that's the formula for the
cumulative distribution.

558
00:30:24,900 --> 00:30:26,070
And then the rest is easy.

559
00:30:26,070 --> 00:30:28,340
You just take derivatives.

560
00:30:28,340 --> 00:30:32,650
You differentiate this
expression with respect to y

561
00:30:32,650 --> 00:30:36,240
you get 1/2 times 1/3, and the power
of y drops by one.

562
00:30:36,240 --> 00:30:39,670
So you get y to the 2/3 in
the denominator.

563
00:30:39,670 --> 00:30:55,490
So if you wish to plot this,
it's proportional to 1 over y to the 2/3.

564
00:30:55,490 --> 00:31:00,480
So when y goes to 0, it sort
of blows up and it

565
00:31:00,480 --> 00:31:03,090
goes on this way.

566
00:31:03,090 --> 00:31:06,090
Is this picture correct
the way I've drawn it?

567
00:31:08,900 --> 00:31:11,256
What's wrong with it?

568
00:31:11,256 --> 00:31:12,630
[? AUDIENCE:  Something. ?]

569
00:31:12,630 --> 00:31:13,420
PROFESSOR: Yes.

570
00:31:13,420 --> 00:31:17,610
Y only takes values
from 0 to 8.

571
00:31:17,610 --> 00:31:21,890
This formula that I wrote here
is only correct when the

572
00:31:21,890 --> 00:31:25,000
previous picture applies.

573
00:31:25,000 --> 00:31:31,070
I took my y to the 1/3 to
be between 0 and 2.

574
00:31:31,070 --> 00:31:40,650
So this formula here is only
correct for y between 0 and 8.

575
00:31:43,770 --> 00:31:46,610
And for that reason, the formula
for the derivative is

576
00:31:46,610 --> 00:31:50,700
also true only for
y between 0 and 8.

577
00:31:50,700 --> 00:31:55,630
And any other values of y are
impossible, so they get

578
00:31:55,630 --> 00:31:57,880
zero density.

579
00:31:57,880 --> 00:32:04,070
So to complete the picture
here, the PDF of Y has a

580
00:32:04,070 --> 00:32:09,290
cut-off at 8, and it's also
0 everywhere else.

581
00:32:13,330 --> 00:32:16,640
And one thing that we see is
that the distribution of Y is

582
00:32:16,640 --> 00:32:17,980
not uniform.

583
00:32:17,980 --> 00:32:24,240
Certain y's are more likely than
others, even though we

584
00:32:24,240 --> 00:32:26,130
started with a uniform random

585
00:32:26,130 --> 00:32:32,240
variable X. All right.
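
A sketch, in Python, of the formulas from this example: F_Y(y) = y^(1/3) / 2 and f_Y(y) = (1/6) y^(-2/3) for y in (0, 8], zero outside, plus a Monte Carlo check that cubes simulated uniform samples. The function names, seed, and sample size are arbitrary choices for illustration.

```python
import random


def cdf_y(y):
    """CDF of Y = X**3 for X uniform on [0, 2]: P(X <= y**(1/3)) = y**(1/3) / 2."""
    if y <= 0:
        return 0.0
    if y >= 8:
        return 1.0
    return y ** (1 / 3) / 2


def pdf_y(y):
    """Derivative of cdf_y: (1/2) * (1/3) * y**(-2/3) on (0, 8], zero elsewhere."""
    if 0 < y <= 8:
        return y ** (-2 / 3) / 6
    return 0.0


# Monte Carlo check: cube simulated uniform samples and compare empirical CDFs.
random.seed(0)
samples = [random.uniform(0.0, 2.0) ** 3 for _ in range(200_000)]
for y in (0.1, 1.0, 4.0):
    empirical = sum(s <= y for s in samples) / len(samples)
    print(y, round(empirical, 3), round(cdf_y(y), 3))
```

The empirical and exact columns should agree to about two decimal places, and the density is clearly not flat: it blows up near y = 0.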

586
00:32:32,240 --> 00:32:36,530
So we will keep doing examples
of this kind, a sequence of

587
00:32:36,530 --> 00:32:40,350
progressively more interesting
or more complicated ones.

588
00:32:40,350 --> 00:32:42,530
So that's going to continue
in the next lecture.

589
00:32:42,530 --> 00:32:45,930
You're going to see plenty of
examples in your recitations

590
00:32:45,930 --> 00:32:48,060
and tutorials and so on.

591
00:32:48,060 --> 00:32:52,420
So let's do one that's pretty
similar to the one that we

592
00:32:52,420 --> 00:32:57,730
did, but it's going to add
just a small twist in how we

593
00:32:57,730 --> 00:33:00,470
do the mechanics.

594
00:33:00,470 --> 00:33:02,780
OK so you set your
cruise control

595
00:33:02,780 --> 00:33:04,010
when you start driving.

596
00:33:04,010 --> 00:33:06,310
And you keep driving
at that

597
00:33:06,310 --> 00:33:07,870
constant speed.

598
00:33:07,870 --> 00:33:09,980
Where you set your cruise
control is somewhere

599
00:33:09,980 --> 00:33:11,660
between 30 and 60.

600
00:33:11,660 --> 00:33:14,520
You're going to drive
a distance of 200.

601
00:33:14,520 --> 00:33:18,660
And so the time it's going to
take for your trip is 200 over

602
00:33:18,660 --> 00:33:20,530
the setting of your
cruise control.

603
00:33:20,530 --> 00:33:22,610
So it's 200/V.

604
00:33:22,610 --> 00:33:26,210
Somebody gives you the
distribution of V, and they

605
00:33:26,210 --> 00:33:29,490
tell you not only that it's between
30 and 60, but that it's roughly

606
00:33:29,490 --> 00:33:33,530
equally likely to be anything
between 30 and 60, so we have

607
00:33:33,530 --> 00:33:36,280
a uniform distribution
over that range.

608
00:33:36,280 --> 00:33:40,060
So we have a distribution of
V. We want to find the

609
00:33:40,060 --> 00:33:43,460
distribution of the random
variable T, which is the time

610
00:33:43,460 --> 00:33:46,540
it takes till your trip ends.

611
00:33:49,200 --> 00:33:51,790
So how are we going
to proceed?

612
00:33:51,790 --> 00:33:55,170
We'll use the exact same
cookbook procedure.

613
00:33:55,170 --> 00:33:57,360
We're going to start by
finding the cumulative

614
00:33:57,360 --> 00:34:02,920
distribution of T.
What is this?

615
00:34:02,920 --> 00:34:05,730
By definition, the cumulative
distribution is the

616
00:34:05,730 --> 00:34:10,230
probability that T is less than
or equal to a certain number.

617
00:34:10,230 --> 00:34:12,070
OK.

618
00:34:12,070 --> 00:34:15,340
Now we don't know the
distribution of T, so we

619
00:34:15,340 --> 00:34:17,989
cannot work with this
event directly.

620
00:34:17,989 --> 00:34:21,960
But we take that event and
translate it into V-space.

621
00:34:21,960 --> 00:34:28,205
So we replace T by what we
know T to be in terms of V,

622
00:34:28,205 --> 00:34:28,271
namely 200/V, so the event becomes

623
00:34:28,271 --> 00:34:33,565
200/V less than or equal to t. All right.

624
00:34:36,230 --> 00:34:39,659
So we have the distribution
of V. So now let's

625
00:34:39,659 --> 00:34:41,739
calculate this quantity.

626
00:34:41,739 --> 00:34:42,179
OK.

627
00:34:42,179 --> 00:34:46,210
Let's massage this event and
rewrite it as the probability

628
00:34:46,210 --> 00:35:06,880
that V is larger than or
equal to 200/t.

629
00:35:06,880 --> 00:35:10,870
So what is this going to be?

630
00:35:10,870 --> 00:35:14,400
So let's say that 200/t
is some number that

631
00:35:14,400 --> 00:35:16,015
falls inside the range.

632
00:35:19,150 --> 00:35:24,630
So that's going to be true if
200/t is bigger than 30, and

633
00:35:24,630 --> 00:35:26,610
less than 60.

634
00:35:26,610 --> 00:35:37,110
Which means that t is
less than 30/200.

635
00:35:37,110 --> 00:35:38,360
No, 200/30.

636
00:35:41,300 --> 00:35:44,570
And bigger than 200/60.

637
00:35:44,570 --> 00:35:51,360
So for t's inside that range,
this number 200/t falls inside

638
00:35:51,360 --> 00:35:52,230
that range.

639
00:35:52,230 --> 00:35:55,960
This is the range of t's that
are possible, given the

640
00:35:55,960 --> 00:35:59,240
description of the problem
that we have set up.

641
00:35:59,240 --> 00:36:04,940
So for t's in that range, what
is the probability that V is

642
00:36:04,940 --> 00:36:07,900
bigger than this number?

643
00:36:07,900 --> 00:36:11,550
So V being bigger than that
number is the probability of

644
00:36:11,550 --> 00:36:17,000
this event, so it's going to be
the area under this curve.

645
00:36:17,000 --> 00:36:22,880
So the area under that curve
is the height of the curve,

646
00:36:22,880 --> 00:36:27,300
which is 1 over 30,
times the base.

647
00:36:27,300 --> 00:36:28,910
How big is the base?

648
00:36:28,910 --> 00:36:33,060
Well it's from that point to 60,
so the base has a length

649
00:36:33,060 --> 00:36:36,500
of 60 minus 200/t.

650
00:36:45,470 --> 00:36:50,580
And this is a formula which is
valid for those t's for which

651
00:36:50,580 --> 00:36:52,420
this picture is correct.

652
00:36:52,420 --> 00:36:57,410
And this picture is correct if
200/t happens to fall in this

653
00:36:57,410 --> 00:37:01,540
interval, which is the same as
t falling in that interval,

654
00:37:01,540 --> 00:37:03,980
which are the t's that
are possible.

655
00:37:03,980 --> 00:37:07,390
So finally let's find the
density of T, which is what

656
00:37:07,390 --> 00:37:09,430
we're looking for.

657
00:37:09,430 --> 00:37:12,450
We find this by taking the
derivative of this expression

658
00:37:12,450 --> 00:37:14,370
with respect to t.

659
00:37:14,370 --> 00:37:18,150
We only get one term
from here.

660
00:37:18,150 --> 00:37:26,045
And this is going to be 200/30
times 1 over t squared.

661
00:37:30,820 --> 00:37:34,020
And this is the formula for
the density for t's in the

662
00:37:34,020 --> 00:37:35,270
allowed range.

663
00:37:46,890 --> 00:37:51,130
OK, so that's the end of the
solution to this particular

664
00:37:51,130 --> 00:37:52,880
problem as well.
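
A Python sketch of the trip-time answer, assuming V uniform on [30, 60] as in the lecture: F_T(t) = (60 - 200/t) / 30 and f_T(t) = (200/30) / t^2 for t between 200/60 and 200/30. The names, seed, and sample size are illustrative choices; the simulation just divides 200 by simulated speeds.

```python
import random

T_MIN, T_MAX = 200 / 60, 200 / 30  # range of possible trip times


def cdf_t(t):
    """P(T <= t) = P(V >= 200/t) for V uniform on [30, 60] (density 1/30)."""
    if t <= T_MIN:
        return 0.0
    if t >= T_MAX:
        return 1.0
    return (60 - 200 / t) / 30  # height 1/30 times base (60 - 200/t)


def pdf_t(t):
    """Derivative of cdf_t: (200/30) / t**2 inside the allowed range, 0 outside."""
    if T_MIN <= t <= T_MAX:
        return (200 / 30) / t ** 2
    return 0.0


# Monte Carlo check with simulated cruise-control settings.
random.seed(1)
times = [200 / random.uniform(30, 60) for _ in range(200_000)]
t = 5.0
print(sum(x <= t for x in times) / len(times), cdf_t(t))  # both near 2/3
```

At t = 5 the trip is at most 5 time units exactly when V is at least 40, which has probability 20/30 = 2/3.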

665
00:37:52,880 --> 00:37:55,640
I said that there was a little
twist compared to

666
00:37:55,640 --> 00:37:57,130
the previous one.

667
00:37:57,130 --> 00:37:58,410
What was the twist?

668
00:37:58,410 --> 00:38:01,380
Well the twist was that in the
previous problem we dealt with

669
00:38:01,380 --> 00:38:05,580
the X cubed function, which was
monotonically increasing.

670
00:38:05,580 --> 00:38:07,760
Here we dealt with the
function that was

671
00:38:07,760 --> 00:38:09,850
monotonically decreasing.

672
00:38:09,850 --> 00:38:13,850
So when we had to find the
probability that T is less

673
00:38:13,850 --> 00:38:17,220
than something, that translated
into an event that

674
00:38:17,220 --> 00:38:19,640
V was bigger than something.

675
00:38:19,640 --> 00:38:22,410
Your time is less than something
if and only if your

676
00:38:22,410 --> 00:38:25,090
velocity is bigger
than something.

677
00:38:25,090 --> 00:38:27,510
So when you're dealing
with a monotonically

678
00:38:27,510 --> 00:38:31,950
decreasing function, at some
point some inequalities will

679
00:38:31,950 --> 00:38:33,200
have to get reversed.
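
In symbols, for the trip-time example (F_V and f_V denote the CDF and PDF of V; this notation is assumed here), the reversed inequality and the resulting density are:

```latex
\begin{aligned}
F_T(t) &= \mathbf{P}\!\left(\frac{200}{V} \le t\right) = \mathbf{P}\!\left(V \ge \frac{200}{t}\right) = 1 - F_V\!\left(\frac{200}{t}\right),\\
f_T(t) &= \frac{dF_T}{dt}(t) = f_V\!\left(\frac{200}{t}\right)\frac{200}{t^2} = \frac{1}{30}\cdot\frac{200}{t^2}, \qquad \frac{200}{60} \le t \le \frac{200}{30}.
\end{aligned}
```

The two minus signs (one from the complement, one from differentiating 200/t) cancel, so the density comes out positive.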

680
00:38:38,540 --> 00:38:43,700
Finally let's look at
a very useful one.

681
00:38:43,700 --> 00:38:47,990
Which is the case where we take
a linear function of a

682
00:38:47,990 --> 00:38:49,700
random variable.

683
00:38:49,700 --> 00:38:55,810
So X is a random variable with
a given distribution, and we

684
00:38:55,810 --> 00:38:57,110
consider a linear
function of it, Y = aX + b.

685
00:38:57,110 --> 00:38:59,920
So in this particular instance,
we take a to be

686
00:38:59,920 --> 00:39:03,590
equal to 2 and b equal to 5.

687
00:39:03,590 --> 00:39:08,680
And let us first argue
just by picture.

688
00:39:08,680 --> 00:39:13,920
So X is a random variable that
has a given distribution.

689
00:39:13,920 --> 00:39:16,150
Let's say it's this
weird shape here.

690
00:39:16,150 --> 00:39:20,170
And x ranges from -1 to +2.

691
00:39:20,170 --> 00:39:22,140
Let's do things one
step at a time.

692
00:39:22,140 --> 00:39:26,190
Let's first find the
distribution of 2X.

693
00:39:26,190 --> 00:39:28,960
What do you think you
know about 2X?

694
00:39:28,960 --> 00:39:35,330
Well if x ranges from -1 to 2,
then the random variable 2X is

695
00:39:35,330 --> 00:39:36,580
going to range from -2 to +4.

696
00:39:39,560 --> 00:39:42,360
So that's what the range
is going to be.

697
00:39:42,360 --> 00:39:48,840
Now dealing with the random
variable 2X, as opposed to the

698
00:39:48,840 --> 00:39:52,520
random variable X, in some sense
it's just changing the

699
00:39:52,520 --> 00:39:55,270
units in which we measure
that random variable.

700
00:39:55,270 --> 00:39:58,130
It's just changing the
scale on which we

701
00:39:58,130 --> 00:39:59,730
draw and plot things.

702
00:39:59,730 --> 00:40:03,180
So if it's just a scale change,
then intuition should

703
00:40:03,180 --> 00:40:08,120
tell you that the random
variable 2X should have a PDF

704
00:40:08,120 --> 00:40:12,850
of the same shape, except that
it's stretched out by a factor of

705
00:40:12,850 --> 00:40:16,540
2, because our random variable
2X now has a range that's

706
00:40:16,540 --> 00:40:18,570
twice as large.

707
00:40:18,570 --> 00:40:23,720
So we take the same PDF and
scale it up by stretching the

708
00:40:23,720 --> 00:40:26,790
x-axis by a factor of 2.

709
00:40:26,790 --> 00:40:30,330
So what does scaling
correspond to

710
00:40:30,330 --> 00:40:33,870
in terms of a formula?

711
00:40:33,870 --> 00:40:39,500
So the distribution of 2X as a
function, let's say, a generic

712
00:40:39,500 --> 00:40:45,760
argument z, is going to be the
distribution of X, but scaled

713
00:40:45,760 --> 00:40:47,010
by a factor of 2.

714
00:40:50,060 --> 00:40:54,100
So taking a function and
replacing its arguments by the

715
00:40:54,100 --> 00:40:58,740
argument over 2, what it
does is it stretches it

716
00:40:58,740 --> 00:41:00,430
by a factor of 2.

717
00:41:00,430 --> 00:41:04,410
You have probably been tortured
ever since middle

718
00:41:04,410 --> 00:41:08,150
school to figure out, when you need
to stretch a function, whether

719
00:41:08,150 --> 00:41:12,470
you need to put 2z or z/2.

720
00:41:12,470 --> 00:41:15,450
And the one that actually does
the stretching is to put the

721
00:41:15,450 --> 00:41:18,000
z/2 in that place.

722
00:41:18,000 --> 00:41:21,180
So that's what the
stretching does.

723
00:41:21,180 --> 00:41:23,670
Could that be the
full answer?

724
00:41:23,670 --> 00:41:24,930
Well there's a catch.

725
00:41:24,930 --> 00:41:29,730
If you stretch this function by
a factor of 2, what happens

726
00:41:29,730 --> 00:41:32,100
to the area under
the function?

727
00:41:32,100 --> 00:41:34,120
It's going to get doubled.

728
00:41:34,120 --> 00:41:38,670
But the total probability must
add up to 1, so we need to do

729
00:41:38,670 --> 00:41:41,840
something else to make sure that
the area under the curve

730
00:41:41,840 --> 00:41:44,300
stays equal to 1.

731
00:41:44,300 --> 00:41:47,980
So we need to take that function
and scale it down by

732
00:41:47,980 --> 00:41:51,720
this factor of 2.

733
00:41:51,720 --> 00:41:55,580
So when you're dealing with a
multiple of a random variable,

734
00:41:55,580 --> 00:42:00,580
what happens to the PDF is you
stretch it according to the

735
00:42:00,580 --> 00:42:04,320
multiple, and then scale it
down by the same number so

736
00:42:04,320 --> 00:42:07,460
that you preserve the area
under that curve.

737
00:42:07,460 --> 00:42:10,800
So now we found the distribution
of 2X.

738
00:42:10,800 --> 00:42:14,910
How about the distribution
of 2X + 5?

739
00:42:14,910 --> 00:42:18,560
Well what does adding 5
to a random variable do?

740
00:42:18,560 --> 00:42:20,940
You're going to get essentially
the same values

741
00:42:20,940 --> 00:42:23,720
with the same probability,
except that those values all

742
00:42:23,720 --> 00:42:26,260
get shifted by 5.

743
00:42:26,260 --> 00:42:30,650
So all that you need to do is
to take this PDF here, and

744
00:42:30,650 --> 00:42:32,690
shift it by 5 units.

745
00:42:32,690 --> 00:42:35,530
So the range used to
be from -2 to 4.

746
00:42:35,530 --> 00:42:38,750
The new range is going
to be from 3 to 9.

747
00:42:38,750 --> 00:42:40,390
And that's the final answer.

748
00:42:40,390 --> 00:42:44,900
This is the distribution of
2X + 5, starting with this

749
00:42:44,900 --> 00:42:48,240
particular distribution of X.

750
00:42:48,240 --> 00:42:53,600
Now shifting to the
right by b, what

751
00:42:53,600 --> 00:42:55,700
does it do to a function?

752
00:42:55,700 --> 00:42:58,620
Shifting to the right
by a certain amount,

753
00:42:58,620 --> 00:43:04,960
mathematically, it corresponds
to putting -b in the argument

754
00:43:04,960 --> 00:43:06,000
of the function.

755
00:43:06,000 --> 00:43:09,750
So I'm taking the formula that
I had here, which is the

756
00:43:09,750 --> 00:43:12,220
scaling by a factor of a.

757
00:43:12,220 --> 00:43:17,200
The scaling down to keep the
total area equal to 1.

758
00:43:17,200 --> 00:43:19,740
And then I need to introduce
this extra

759
00:43:19,740 --> 00:43:20,990
term to do the shifting.

760
00:43:23,300 --> 00:43:26,200
So this is a plausible
argument.

761
00:43:26,200 --> 00:43:31,080
The proof by picture that this
should be the right answer.

762
00:43:31,080 --> 00:43:38,295
But just in order to keep our
skills tuned and refined, let

763
00:43:38,295 --> 00:43:42,950
us do this derivation in a
more formal way using our

764
00:43:42,950 --> 00:43:45,135
two-step cookbook procedure.

765
00:43:48,000 --> 00:43:51,010
And I'm going to do it under
the assumption that a is

766
00:43:51,010 --> 00:43:54,910
positive, as in the example
that we just did.

767
00:43:54,910 --> 00:43:59,090
So what's the two-step
procedure?

768
00:43:59,090 --> 00:44:03,700
We want to find the cumulative
of Y, and after that we're

769
00:44:03,700 --> 00:44:05,720
going to differentiate.

770
00:44:05,720 --> 00:44:09,220
By definition the cumulative
is the probability that the

771
00:44:09,220 --> 00:44:13,280
random variable takes values
less than or equal to a certain number.

772
00:44:13,280 --> 00:44:17,190
And now we need to take this
event and translate it, and

773
00:44:17,190 --> 00:44:21,110
express it in terms of the
original random variables.

774
00:44:21,110 --> 00:44:24,970
So Y is, by definition,
aX + b, so we're

775
00:44:24,970 --> 00:44:28,970
looking at this event.

776
00:44:28,970 --> 00:44:33,580
And now we want to express this
event in a clean form

777
00:44:33,580 --> 00:44:39,730
where X shows up
by itself.

778
00:44:39,730 --> 00:44:42,740
Let's say I'm going to massage
this event and

779
00:44:42,740 --> 00:44:44,640
write it in this form.

780
00:44:44,640 --> 00:44:48,070
For this inequality to be true,
X should be less than or

781
00:44:48,070 --> 00:44:53,820
equal to (y minus
b) divided by a.

782
00:44:53,820 --> 00:44:56,820
OK, now what is this?

783
00:44:56,820 --> 00:45:01,330
This is the cumulative
distribution of X evaluated at

784
00:45:01,330 --> 00:45:02,580
that particular point.

785
00:45:07,850 --> 00:45:14,760
So we got a formula for the
cumulative of Y based on the

786
00:45:14,760 --> 00:45:17,880
cumulative of X. What's
the next step?

787
00:45:17,880 --> 00:45:21,550
Next step is to take derivatives
of both sides.

788
00:45:21,550 --> 00:45:28,810
So the density of Y is going to
be the derivative of this

789
00:45:28,810 --> 00:45:31,270
expression with respect to y.

790
00:45:31,270 --> 00:45:36,830
OK, so now here we need
to use the chain rule.

791
00:45:36,830 --> 00:45:40,670
It's going to be the derivative
of the F function

792
00:45:40,670 --> 00:45:43,080
with respect to its argument.

793
00:45:43,080 --> 00:45:46,930
And then we need to take the
derivative of the argument

794
00:45:46,930 --> 00:45:48,780
with respect to y.

795
00:45:48,780 --> 00:45:51,530
What is the derivative
of the cumulative?

796
00:45:51,530 --> 00:45:53,190
The derivative of
the cumulative

797
00:45:53,190 --> 00:45:56,290
is the density itself.

798
00:45:56,290 --> 00:45:59,578
And we evaluate it at the
point of interest.

799
00:46:02,180 --> 00:46:05,340
And then the chain rule tells
us that we need to take the

800
00:46:05,340 --> 00:46:08,800
derivative of this with
respect to y, and the

801
00:46:08,800 --> 00:46:11,370
derivative of this with
respect to y is 1/a.

802
00:46:14,290 --> 00:46:18,330
And this gives us the formula
which is consistent with what

803
00:46:18,330 --> 00:46:21,810
I had written down here,
for the case where a

804
00:46:21,810 --> 00:46:25,030
is a positive number.

805
00:46:25,030 --> 00:46:27,915
What if a was a negative
number?

806
00:46:30,570 --> 00:46:31,910
Could this formula be true?

807
00:46:35,120 --> 00:46:36,140
Of course not.

808
00:46:36,140 --> 00:46:39,000
Densities cannot be
negative, right?

809
00:46:39,000 --> 00:46:41,180
So that formula cannot
be true.

810
00:46:41,180 --> 00:46:43,750
Something needs to change.

811
00:46:43,750 --> 00:46:45,140
What should change?

812
00:46:45,140 --> 00:46:50,970
Where does this argument break
down when a is negative?

813
00:46:56,470 --> 00:47:01,570
So when I write this inequality
in this form, I

814
00:47:01,570 --> 00:47:03,940
divide by a.

815
00:47:03,940 --> 00:47:07,730
But when you divide by a
negative number, the direction

816
00:47:07,730 --> 00:47:10,390
of an inequality is
going to change.

817
00:47:10,390 --> 00:47:14,520
So when a is negative, this
inequality becomes larger than

818
00:47:14,520 --> 00:47:16,190
or equal to.

819
00:47:16,190 --> 00:47:18,770
And in that case, the expression
that I have up

820
00:47:18,770 --> 00:47:24,360
there would change, since the event is now
that X is larger than or equal to (y minus b) divided by a.

821
00:47:24,360 --> 00:47:27,900
Instead of getting the
cumulative, I would get 1

822
00:47:27,900 --> 00:47:32,350
minus the cumulative of (y
minus b) divided by a.

823
00:47:35,240 --> 00:47:39,890
So this is the probability that
X is bigger than this

824
00:47:39,890 --> 00:47:41,170
particular number.

825
00:47:41,170 --> 00:47:44,000
And now when you take the
derivatives, there's going to

826
00:47:44,000 --> 00:47:46,570
be a minus sign that shows up.

827
00:47:46,570 --> 00:47:49,810
And that minus sign will
end up being here.

828
00:47:49,810 --> 00:47:53,730
And so we're taking the negative
of a negative number,

829
00:47:53,730 --> 00:47:56,420
and that basically is equivalent
to taking the

830
00:47:56,420 --> 00:47:58,660
absolute value of that number.

831
00:47:58,660 --> 00:48:03,830
So all that happens when we have
a negative a is that we

832
00:48:03,830 --> 00:48:07,010
have to take the absolute value
of the scaling factor

833
00:48:07,010 --> 00:48:10,250
instead of the factor itself.
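
A Python sketch that checks the general formula f_Y(y) = f_X((y - b)/a) / |a| numerically with a negative a. The triangular density for X on [-1, 2] is a made-up stand-in for the "weird shape" drawn in lecture, and the helper names, seed, and sample size are hypothetical.

```python
import random


def pdf_linear(pdf_x, a, b):
    """PDF of Y = a*X + b for a != 0: f_Y(y) = f_X((y - b) / a) / |a|."""
    if a == 0:
        raise ValueError("a must be nonzero")
    return lambda y: pdf_x((y - b) / a) / abs(a)


def pdf_x(x):
    """Hypothetical density of X on [-1, 2]: triangular, rising to x = 2."""
    return 2 * (x + 1) / 9 if -1 <= x <= 2 else 0.0


def integrate(f, lo, hi, n=10_000):
    """Midpoint-rule approximation of the integral of f over [lo, hi]."""
    h = (hi - lo) / n
    return h * sum(f(lo + (i + 0.5) * h) for i in range(n))


a, b = -2.0, 5.0  # a negative scale factor, so |a| matters; Y ranges over [1, 7]
pdf_y = pdf_linear(pdf_x, a, b)
print(integrate(pdf_y, 1.0, 7.0))  # total probability, approximately 1
print(integrate(pdf_y, 1.0, 4.0))  # P(Y <= 4) = P(X >= 0.5), approximately 0.75

# Monte Carlo check: random.triangular(-1, 2, 2) has exactly the density pdf_x.
random.seed(2)
ys = [a * random.triangular(-1, 2, 2) + b for _ in range(200_000)]
print(sum(y <= 4 for y in ys) / len(ys))
```

Dividing by `a` instead of `abs(a)` would make every density value negative here, which is the failure the lecture points out.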

834
00:48:10,250 --> 00:48:14,020
All right, so this general
formula is quite useful for

835
00:48:14,020 --> 00:48:16,690
dealing with linear functions
of random variables.

836
00:48:16,690 --> 00:48:21,330
And one nice application of it
is to take the formula for a

837
00:48:21,330 --> 00:48:25,460
normal random variable, consider
a linear function of

838
00:48:25,460 --> 00:48:29,600
a normal random variable, plug
into this formula, and what

839
00:48:29,600 --> 00:48:34,000
you will find is that Y also
has a normal distribution.

840
00:48:34,000 --> 00:48:37,310
So using this formula, now we
can prove a statement that I

841
00:48:37,310 --> 00:48:40,565
had made a couple of lectures
ago, that a linear function of

842
00:48:40,565 --> 00:48:43,900
a normal random variable
is also normal.

843
00:48:43,900 --> 00:48:47,600
That's how you would prove it.
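
Carrying out that plug-in, with X normal with mean mu and variance sigma^2 (symbols assumed here) and a nonzero a:

```latex
\begin{aligned}
f_X(x) &= \frac{1}{\sqrt{2\pi}\,\sigma}\, e^{-(x-\mu)^2/(2\sigma^2)},\\
f_Y(y) &= \frac{1}{|a|}\, f_X\!\left(\frac{y-b}{a}\right)
        = \frac{1}{\sqrt{2\pi}\,|a|\sigma}\, e^{-(y-b-a\mu)^2/(2a^2\sigma^2)},
\end{aligned}
```

which is the normal density with mean a mu + b and variance a^2 sigma^2.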

844
00:48:47,600 --> 00:48:51,190
I think this is it
for today.