1
00:00:00,499 --> 00:00:03,220
PROFESSOR: Our topic is
deviation from the mean,

2
00:00:03,220 --> 00:00:06,550
meaning the probability that
a random variable returns

3
00:00:06,550 --> 00:00:10,030
a value that differs
significantly from its mean.

4
00:00:10,030 --> 00:00:14,930
Now, the Markov bound gave you a
coarse bound on the probability

5
00:00:14,930 --> 00:00:19,580
that R was overly large
using very little information

6
00:00:19,580 --> 00:00:22,300
about R. Not surprisingly,
if you know a little bit more

7
00:00:22,300 --> 00:00:25,930
about the distribution of R
than simply that it's not negative,

8
00:00:25,930 --> 00:00:28,560
you can state tighter bounds.

9
00:00:28,560 --> 00:00:32,870
And this was noticed by a
mathematician named Chebyshev.

10
00:00:32,870 --> 00:00:35,820
And he has a bound called
the Chebyshev bound.

11
00:00:35,820 --> 00:00:39,060
Now, it's interesting that,
even though the Markov bound is

12
00:00:39,060 --> 00:00:42,920
very weak and seems
not very useful,

13
00:00:42,920 --> 00:00:44,720
the Chebyshev bound,
which generally

14
00:00:44,720 --> 00:00:47,920
gives you a significantly
stronger and more useful

15
00:00:47,920 --> 00:00:50,970
bound on the probability that
a random variable differs much

16
00:00:50,970 --> 00:00:54,580
from its mean, is actually
a trivial corollary

17
00:00:54,580 --> 00:00:55,630
of Markov's theorem.

18
00:00:55,630 --> 00:00:57,860
So that's just a very
simple, ingenious way

19
00:00:57,860 --> 00:01:02,340
to use Markov's bound to
derive the Chebyshev bound.

20
00:01:02,340 --> 00:01:04,519
And let's look at how.

21
00:01:04,519 --> 00:01:07,390
So we're interested
in the probability

22
00:01:07,390 --> 00:01:11,200
that a random variable R differs
from its mean by at least an amount x.

23
00:01:11,200 --> 00:01:12,970
The distance between
R and its mean,

24
00:01:12,970 --> 00:01:14,940
the absolute value
of R minus mu,

25
00:01:14,940 --> 00:01:16,680
is greater than or equal to x.

26
00:01:16,680 --> 00:01:19,850
We're trying to get a
grip on that probability

27
00:01:19,850 --> 00:01:21,180
as a function of x.

28
00:01:21,180 --> 00:01:26,260
Now, the point is that the event
that the distance between R

29
00:01:26,260 --> 00:01:29,830
and its mean is greater than
or equal to x, another way

30
00:01:29,830 --> 00:01:32,820
to say that is to square both
sides of this inequality.

31
00:01:32,820 --> 00:01:37,670
It says that the event that
R minus mu squared is greater

32
00:01:37,670 --> 00:01:40,404
than or equal to x squared happens.

33
00:01:40,404 --> 00:01:42,070
These two events are
just different ways

34
00:01:42,070 --> 00:01:43,156
of describing the same set of outcomes.

35
00:01:43,156 --> 00:01:44,530
So therefore,
their probabilities

36
00:01:44,530 --> 00:01:46,470
are equal trivially.

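In symbols, the event identity just described (a reconstruction, not verbatim from the board) is:

```latex
\Pr\bigl[\,|R - \mu| \ge x\,\bigr] \;=\; \Pr\bigl[\,(R - \mu)^2 \ge x^2\,\bigr] \qquad (x > 0),
```

since for positive x the two inequalities describe exactly the same set of outcomes.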
37
00:01:46,470 --> 00:01:51,370
Now, what's nice about this
is, of course, that R minus mu

38
00:01:51,370 --> 00:01:55,150
squared is a non-negative random
variable to which Markov's

39
00:01:55,150 --> 00:01:57,040
theorem applies.

40
00:01:57,040 --> 00:01:58,930
The square of a real
number is always

41
00:01:58,930 --> 00:02:00,900
going to be non-negative.

42
00:02:00,900 --> 00:02:03,390
So let's just apply
Markov's theorem

43
00:02:03,390 --> 00:02:08,130
to this new random variable,
R minus mu squared.

44
00:02:08,130 --> 00:02:12,650
And what does Markov's bound
tell us about this probability,

45
00:02:12,650 --> 00:02:14,940
that the squared
variable is greater

46
00:02:14,940 --> 00:02:17,030
than or equal to an
amount x squared?

47
00:02:17,030 --> 00:02:18,706
Well, just plug in Markov.

48
00:02:18,706 --> 00:02:22,100
And it tells you
that this probability

49
00:02:22,100 --> 00:02:31,540
that the squared variable
is as big as x squared,

50
00:02:31,540 --> 00:02:35,520
is at most the expectation
of that squared variable

51
00:02:35,520 --> 00:02:36,600
divided by x squared.

52
00:02:36,600 --> 00:02:40,620
This is just applying Markov's
bound to this variable, R

53
00:02:40,620 --> 00:02:42,680
minus mu squared.

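Written out, this application of Markov's theorem to the non-negative variable (R − mu) squared (again a reconstruction) is:

```latex
\Pr\bigl[\,|R - \mu| \ge x\,\bigr]
  = \Pr\bigl[\,(R - \mu)^2 \ge x^2\,\bigr]
  \le \frac{\operatorname{E}\bigl[(R - \mu)^2\bigr]}{x^2}.
```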
54
00:02:42,680 --> 00:02:47,160
Now, this numerator is a
weird thing to stare at,

55
00:02:47,160 --> 00:02:50,640
expectation of R minus
mu squared, and it may not

56
00:02:50,640 --> 00:02:51,557
seem very memorable.

57
00:02:51,557 --> 00:02:53,640
But you should remember it,
because it's so important

58
00:02:53,640 --> 00:02:55,230
that it has a name all its own.

59
00:02:55,230 --> 00:02:58,080
It's called the
variance of R. And this

60
00:02:58,080 --> 00:03:01,180
is an extra bit of
information about the shape

61
00:03:01,180 --> 00:03:03,620
of the distribution
of R that turns out

62
00:03:03,620 --> 00:03:08,230
to allow you to state much more
powerful theorems in general

63
00:03:08,230 --> 00:03:12,390
about the probability that
R deviates from its mean

64
00:03:12,390 --> 00:03:14,560
by a given amount.

65
00:03:14,560 --> 00:03:17,220
So we could just restate
the Chebyshev bound.

66
00:03:17,220 --> 00:03:19,690
Just replacing that
expectation formula

67
00:03:19,690 --> 00:03:21,960
in terms of its
name, variance of R,

68
00:03:21,960 --> 00:03:24,050
this is what the
Chebyshev bound says.

69
00:03:24,050 --> 00:03:27,470
The probability that the
distance between R and its mean

70
00:03:27,470 --> 00:03:30,650
is greater than or equal to x
is the variance of R divided

71
00:03:30,650 --> 00:03:33,480
by x squared,
where variance of R

72
00:03:33,480 --> 00:03:37,150
is the expectation of
the square of R minus mu.

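Compactly, the Chebyshev bound as just restated reads:

```latex
\Pr\bigl[\,|R - \mu| \ge x\,\bigr] \le \frac{\operatorname{Var}[R]}{x^2},
\qquad \operatorname{Var}[R] := \operatorname{E}\bigl[(R - \mu)^2\bigr].
```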
73
00:03:37,150 --> 00:03:40,260
Now, the very important
technical aspect

74
00:03:40,260 --> 00:03:41,840
of the Chebyshev
bound is that we're

75
00:03:41,840 --> 00:03:46,730
getting an inverse square
reduction in the probability.

76
00:03:46,730 --> 00:03:49,650
Remember, with Markov,
the denominator

77
00:03:49,650 --> 00:03:52,340
was behaving linearly.

78
00:03:52,340 --> 00:03:54,180
And here, it behaves
quadratically.

79
00:03:54,180 --> 00:03:57,770
So these bounds get
smaller much more

80
00:03:57,770 --> 00:04:01,480
rapidly as we ask about the
probability of differing

81
00:04:01,480 --> 00:04:04,090
by a larger amount.

82
00:04:04,090 --> 00:04:07,280
For the variance of R, maybe
what will help you

83
00:04:07,280 --> 00:04:10,150
remember it is to remember
another name that it has.

84
00:04:10,150 --> 00:04:12,060
It's called the
mean square error.

85
00:04:12,060 --> 00:04:15,620
Think of R
minus mu as the error

86
00:04:15,620 --> 00:04:17,959
that R is making in
how much it differs

87
00:04:17,959 --> 00:04:21,250
from what it ought to
be, and we square it,

88
00:04:21,250 --> 00:04:24,550
and then we take the
average, so we're taking

89
00:04:24,550 --> 00:04:29,520
the mean of the squared errors.

90
00:04:29,520 --> 00:04:31,730
And here, we're back
to restating the Chebyshev

91
00:04:31,730 --> 00:04:35,130
bound in terms of the variance.

92
00:04:35,130 --> 00:04:38,220
The variance has one
difficulty with it.

93
00:04:38,220 --> 00:04:41,324
And that leads us to want to
look at another object, which

94
00:04:41,324 --> 00:04:42,990
is just the square
root of the variance,

95
00:04:42,990 --> 00:04:44,970
called the standard deviation.

96
00:04:44,970 --> 00:04:47,810
So you wonder why-- I mean,
if you understand variance,

97
00:04:47,810 --> 00:04:49,560
what's the point of
taking the square root

98
00:04:49,560 --> 00:04:50,650
and working with that?

99
00:04:50,650 --> 00:04:53,660
And the answer is
simply that if you

100
00:04:53,660 --> 00:04:56,840
think of R as a random
variable whose values have

101
00:04:56,840 --> 00:05:02,300
some dimension, like seconds or
dollars, then the variance of R

102
00:05:02,300 --> 00:05:06,150
is the expectation of the
squared variable, R minus mu

103
00:05:06,150 --> 00:05:09,710
squared, which means its
units are seconds squared

104
00:05:09,710 --> 00:05:11,840
or dollars squared or whatever.

105
00:05:11,840 --> 00:05:15,570
And the variance of R
itself is a squared value,

106
00:05:15,570 --> 00:05:22,270
which is not reflecting the
magnitude of the distance

107
00:05:22,270 --> 00:05:23,895
that you expect-- of
the kind of errors

108
00:05:23,895 --> 00:05:26,660
that you expect R to make,
the distance that you expect

109
00:05:26,660 --> 00:05:28,530
R to be from its mean.

110
00:05:28,530 --> 00:05:32,940
So we can get the
units of this quantity

111
00:05:32,940 --> 00:05:35,300
back into matching
the units of R

112
00:05:35,300 --> 00:05:38,265
and also get a number that's
closer to the kind of deviation

113
00:05:38,265 --> 00:05:41,390
that you'd expect to observe
by just taking the square root.

114
00:05:41,390 --> 00:05:44,545
And it's called the
standard deviation of R.

115
00:05:44,545 --> 00:05:46,670
If it helps you any, the
standard deviation is also

116
00:05:46,670 --> 00:05:49,020
called the root
mean square error.

117
00:05:49,020 --> 00:05:50,690
And you might have
heard that phrase.

118
00:05:50,690 --> 00:05:52,490
It comes up all the
time in discussions

119
00:05:52,490 --> 00:05:55,080
of experimental error.

120
00:05:55,080 --> 00:05:56,870
So again, we're
taking the error--

121
00:05:56,870 --> 00:06:00,500
meaning the distance between the
random variable and its mean.

122
00:06:00,500 --> 00:06:01,450
We're squaring it.

123
00:06:01,450 --> 00:06:05,510
We're taking the expectation
of that squared error.

124
00:06:05,510 --> 00:06:08,220
And then we're taking
the square root of it.

125
00:06:08,220 --> 00:06:10,580
It's the standard deviation.

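As a formula, the standard deviation, the root mean square error, is:

```latex
\sigma_R := \sqrt{\operatorname{Var}[R]}
          = \sqrt{\operatorname{E}\bigl[(R - \mu)^2\bigr]},
```

which carries the same units as R itself.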
126
00:06:10,580 --> 00:06:15,120
So going back to understand what
the standard deviation means

127
00:06:15,120 --> 00:06:19,230
intuitively in terms of a
familiar-shaped distribution

128
00:06:19,230 --> 00:06:21,060
function for a
random variable R,

129
00:06:21,060 --> 00:06:23,670
suppose that R is a
random variable that

130
00:06:23,670 --> 00:06:25,850
has this fairly
standard kind of bell

131
00:06:25,850 --> 00:06:30,200
curve shape, or Gaussian
shape: it's got one hump.

132
00:06:30,200 --> 00:06:31,480
It's unimodal.

133
00:06:31,480 --> 00:06:35,920
And it kind of trails off
at some moderate rate,

134
00:06:35,920 --> 00:06:38,530
as you get further and
further away from the mean.

135
00:06:38,530 --> 00:06:41,830
Well, a distribution
that's shaped like this

136
00:06:41,830 --> 00:06:46,590
is symmetric around that
high point, so that's going

137
00:06:46,590 --> 00:06:48,080
to be the mean by symmetry.

138
00:06:48,080 --> 00:06:50,585
It's equally likely
to be-- well,

139
00:06:50,585 --> 00:06:53,570
the values average out
to this middle value.

140
00:06:53,570 --> 00:06:56,560
A standard deviation
for a curve like this

141
00:06:56,560 --> 00:07:00,160
is going to give you a length
that you can interpret

142
00:07:00,160 --> 00:07:01,980
as an interval around the mean.

143
00:07:01,980 --> 00:07:07,100
And the probability that
you're within that interval

144
00:07:07,100 --> 00:07:09,952
is fairly high for
standard distributions.

145
00:07:09,952 --> 00:07:13,170
Now, we'll see that the
Chebyshev bound is not

146
00:07:13,170 --> 00:07:15,990
going to tell us much for an
arbitrary unknown distribution.

147
00:07:15,990 --> 00:07:18,100
But in general, for the
typical distributions,

148
00:07:18,100 --> 00:07:21,330
you expect to find that the
standard deviation tells you

149
00:07:21,330 --> 00:07:24,560
where you're most
likely to land when you take

150
00:07:24,560 --> 00:07:27,480
a random value of the variable.

151
00:07:27,480 --> 00:07:31,755
So let's return to the Chebyshev
bound, as we've stated it.

152
00:07:31,755 --> 00:07:34,380
And here I'm just
restating the Chebyshev bound,

153
00:07:34,380 --> 00:07:37,230
just replacing the variance
of R in the numerator

154
00:07:37,230 --> 00:07:39,010
by the square of
its square root,

155
00:07:39,010 --> 00:07:43,260
by sigma squared of R. It's a
useful way to restate it.

156
00:07:43,260 --> 00:07:45,210
Because by restating
it this way,

157
00:07:45,210 --> 00:07:47,700
it motivates another
reformulation

158
00:07:47,700 --> 00:07:49,860
of the Chebyshev bound
as we reformulated

159
00:07:49,860 --> 00:07:52,180
the Markov bound
previously in terms

160
00:07:52,180 --> 00:07:53,740
of a multiple of the mean.

161
00:07:53,740 --> 00:07:56,850
I'm going to replace
x by a constant times

162
00:07:56,850 --> 00:07:58,680
the standard deviation.

163
00:07:58,680 --> 00:08:01,422
So I'm going to look at the
probability that the error is

164
00:08:01,422 --> 00:08:03,130
greater than or equal
to a constant times

165
00:08:03,130 --> 00:08:04,560
the standard deviation.

166
00:08:04,560 --> 00:08:06,220
And this term is
going to simplify.

167
00:08:06,220 --> 00:08:10,430
Once x is a constant times
the standard deviation,

168
00:08:10,430 --> 00:08:12,930
the standard deviations
are going to cancel out.

169
00:08:12,930 --> 00:08:15,660
And I'm just going to wind
up with 1 over c squared.

170
00:08:15,660 --> 00:08:18,500
So let's just do that.

171
00:08:18,500 --> 00:08:24,510
And there's the
formula-- the probability

172
00:08:24,510 --> 00:08:27,270
that the distance
of R from its mean

173
00:08:27,270 --> 00:08:29,506
is greater than or
equal to a multiple c

174
00:08:29,506 --> 00:08:33,030
of its standard deviation
is less than or equal to 1

175
00:08:33,030 --> 00:08:34,559
over c squared.

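Substituting x = c times sigma_R into the Chebyshev bound makes the cancellation explicit:

```latex
\Pr\bigl[\,|R - \mu| \ge c\,\sigma_R\,\bigr]
  \le \frac{\sigma_R^2}{(c\,\sigma_R)^2}
  = \frac{1}{c^2}.
```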
176
00:08:34,559 --> 00:08:41,039
So it's getting much more
rapidly smaller as c grows.

177
00:08:41,039 --> 00:08:44,680
Let's look at what that
means for just some numbers,

178
00:08:44,680 --> 00:08:47,210
to make the thing a
little bit more real.

179
00:08:47,210 --> 00:08:49,120
What this assertion
is telling us

180
00:08:49,120 --> 00:08:51,930
is that R is probably
not going to return

181
00:08:51,930 --> 00:08:55,420
a value that's a
significant multiple

182
00:08:55,420 --> 00:08:57,500
of its standard deviation away from its mean.

183
00:08:57,500 --> 00:08:59,680
For example, what
does this formula

184
00:08:59,680 --> 00:09:02,220
tell us about the
probability that R

185
00:09:02,220 --> 00:09:06,790
is going to be greater than or
equal to one standard deviation

186
00:09:06,790 --> 00:09:07,850
away from its mean?

187
00:09:07,850 --> 00:09:08,800
Well, it actually
tells us nothing.

188
00:09:08,800 --> 00:09:10,383
That's the case in
which it's no good.

189
00:09:10,383 --> 00:09:12,117
Because c is 1,
it's just telling us

190
00:09:12,117 --> 00:09:14,450
that the probability is at
most 1, which we always know,

191
00:09:14,450 --> 00:09:16,350
because probabilities
are at most 1.

192
00:09:16,350 --> 00:09:22,450
But if I ask, what's the
probability that the error of R

193
00:09:22,450 --> 00:09:25,302
is greater than or equal to
twice the standard deviation,

194
00:09:25,302 --> 00:09:27,510
then this theorem is telling
me something nontrivial.

195
00:09:27,510 --> 00:09:29,880
It's telling me that the
probability that it's at least twice

196
00:09:29,880 --> 00:09:33,240
the standard deviation is at most 1
over 2 squared, or 1/4.

197
00:09:33,240 --> 00:09:36,890
An arbitrary random variable
with standard deviation sigma

198
00:09:36,890 --> 00:09:39,560
is going to exceed
twice-- the error

199
00:09:39,560 --> 00:09:42,850
is going to exceed twice the
standard deviation at most 1/4

200
00:09:42,850 --> 00:09:46,140
of the time, three times at
most 1/9 of the time, four times

201
00:09:46,140 --> 00:09:48,080
at most 1/16 of the time.

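A minimal numerical sketch of these numbers (not from the lecture; it assumes NumPy, and the exponential distribution is an arbitrary illustrative choice), comparing empirical tail probabilities against the 1 over c squared guarantee:

```python
import numpy as np

rng = np.random.default_rng(0)

# Draw many samples from some distribution -- exponential here, purely
# for illustration -- and estimate Pr[|R - mu| >= c * sigma] for several c.
samples = rng.exponential(scale=1.0, size=1_000_000)
mu = samples.mean()
sigma = samples.std()

for c in [1, 2, 3, 4]:
    # Fraction of samples at least c standard deviations from the mean.
    empirical = np.mean(np.abs(samples - mu) >= c * sigma)
    print(f"c = {c}: empirical tail = {empirical:.4f}, "
          f"Chebyshev bound 1/c^2 = {1 / c**2:.4f}")
```

For every c, the empirical tail comes out below 1 over c squared, as Chebyshev guarantees for any distribution with a finite standard deviation.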
202
00:09:48,080 --> 00:09:51,080
So the qualitative
message to take away

203
00:09:51,080 --> 00:09:54,570
is that, for any random
variable whatsoever, as long

204
00:09:54,570 --> 00:09:57,450
as it has a standard
deviation sigma,

205
00:09:57,450 --> 00:10:01,340
then you can say some definite
things about the probability

206
00:10:01,340 --> 00:10:03,030
that the random
variable is going

207
00:10:03,030 --> 00:10:08,140
to take a value that
differs by a large multiple

208
00:10:08,140 --> 00:10:11,700
of the standard
deviation from its mean.

209
00:10:11,700 --> 00:10:13,570
That probability is
going to be small

210
00:10:13,570 --> 00:10:16,620
and get smaller
and rapidly smaller

211
00:10:16,620 --> 00:10:20,970
as the multiple of the
standard deviation grows.