1
00:00:11,000 --> 00:00:17,000
So, we're going to talk today
about binary search trees.

2
00:00:17,000 --> 00:00:23,000
It's something called randomly
built binary search trees.

3
00:00:23,000 --> 00:00:29,000
And, I'll abbreviate binary
search trees as BSTs throughout

4
00:00:29,000 --> 00:00:33,000
the lecture.
And, you have all seen binary

5
00:00:33,000 --> 00:00:39,000
search trees in one place or
another, in particular,

6
00:00:39,000 --> 00:00:45,000
recitation on Friday.
So, we're going to build up the

7
00:00:45,000 --> 00:00:49,000
basic ideas presented there,
and talk about how to randomize

8
00:00:49,000 --> 00:00:54,000
them, and make them good.
So, you know that there are

9
00:00:54,000 --> 00:00:58,000
good binary search trees,
which are relatively balanced,

10
00:00:58,000 --> 00:01:02,000
something like this.
The height is log n.

11
00:01:02,000 --> 00:01:04,000
We call that balanced,
and that's good.

12
00:01:04,000 --> 00:01:06,000
Anything order log n will be
fine.

13
00:01:06,000 --> 00:01:10,000
In terms of searching,
it will then cost order log n.

14
00:01:10,000 --> 00:01:14,000
And, there are bad binary
search trees which have really

15
00:01:14,000 --> 00:01:16,000
large height,
possibly as big as n.

16
00:01:16,000 --> 00:01:19,000
So, this is good,
and this is bad.

17
00:01:19,000 --> 00:01:22,000
We'd sort of like to know,
we'd like to build binary

18
00:01:22,000 --> 00:01:26,000
search trees in such a way that
they are good all the time,

19
00:01:26,000 --> 00:01:31,000
or at least most of the time.
There are lots of ways to do

20
00:01:31,000 --> 00:01:36,000
this, and in the next couple of
weeks, we will see four of them,

21
00:01:36,000 --> 00:01:39,000
if you count the problem set,
I believe.

22
00:01:39,000 --> 00:01:42,000
Today, we are going to use
randomization to make them

23
00:01:42,000 --> 00:01:45,000
balanced most of the time in a
certain sense.

24
00:01:45,000 --> 00:01:49,000
And then, in your problem set,
you will make that in a broader

25
00:01:49,000 --> 00:01:52,000
sense.
But, one way to motivate this

26
00:01:52,000 --> 00:01:56,000
topic, so I'm not going to
define randomly built binary

27
00:01:56,000 --> 00:02:00,000
search trees for a little bit.
One way to motivate the topic

28
00:02:00,000 --> 00:02:04,000
is through sorting,
our good friend.

29
00:02:04,000 --> 00:02:09,000
So, there's a natural way to
sort n numbers using binary

30
00:02:09,000 --> 00:02:13,000
search trees.
So, if I give you an array,

31
00:02:13,000 --> 00:02:18,000
A, how would you sort that
array using binary search tree

32
00:02:18,000 --> 00:02:23,000
operations as a black box?
Build the binary search tree,

33
00:02:23,000 --> 00:02:27,000
and then traverse it in order.
Exactly.

34
00:02:27,000 --> 00:02:30,000
So, let's say we have some
initial tree,

35
00:02:30,000 --> 00:02:35,000
which is empty,
and then for each element of

36
00:02:35,000 --> 00:02:40,000
the array, we insert it into the
tree.

37
00:02:40,000 --> 00:02:46,000
That's what you meant by
building the search tree.

38
00:02:46,000 --> 00:02:53,000
So, we insert A[i] into the tree.
This is the binary search tree

39
00:02:53,000 --> 00:03:00,000
insertion, standard insertion.
And then, we do an in-order

40
00:03:00,000 --> 00:03:09,000
traversal, which in the book is
called in-order tree walk.

41
00:03:09,000 --> 00:03:11,000
OK, you should know what these
algorithms are,

42
00:03:11,000 --> 00:03:14,000
but just for a very quick
reminder, tree insert basically

43
00:03:14,000 --> 00:03:18,000
searches for that element A[i]
until it finds the place where

44
00:03:18,000 --> 00:03:21,000
it should have been if it was in
the tree already,

45
00:03:21,000 --> 00:03:24,000
and then adds a new leaf there
to insert that value.

46
00:03:24,000 --> 00:03:27,000
Tree walk recursively walks the
left subtree,

47
00:03:27,000 --> 00:03:30,000
then prints out the root,
and then recursively walks the

48
00:03:30,000 --> 00:03:33,000
right subtree.
And, by the binary search tree

49
00:03:33,000 --> 00:03:38,000
property, that will print the
elements out in sorted order.
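
The procedure just described can be sketched in Python. This is a minimal illustration, not the course's own code: the names `Node`, `tree_insert`, `inorder_walk`, and `bst_sort` are hypothetical, and keys equal to an existing key are assumed to go to the right.

```python
from typing import List, Optional


class Node:
    """One BST node (illustrative class, not from the lecture)."""

    def __init__(self, key: int) -> None:
        self.key = key
        self.left: Optional["Node"] = None
        self.right: Optional["Node"] = None


def tree_insert(root: Optional[Node], key: int) -> Node:
    """Search for key, then add a new leaf where the search falls off the tree."""
    if root is None:
        return Node(key)
    cur = root
    while True:
        if key < cur.key:
            if cur.left is None:
                cur.left = Node(key)
                return root
            cur = cur.left
        else:
            if cur.right is None:
                cur.right = Node(key)
                return root
            cur = cur.right


def inorder_walk(node: Optional[Node], out: List[int]) -> None:
    """Walk the left subtree, output the root, then walk the right subtree."""
    if node is not None:
        inorder_walk(node.left, out)
        out.append(node.key)
        inorder_walk(node.right, out)


def bst_sort(a: List[int]) -> List[int]:
    """Insert every element into an initially empty tree, then walk it in order."""
    root: Optional[Node] = None
    for x in a:
        root = tree_insert(root, x)
    out: List[int] = []
    inorder_walk(root, out)
    return out
```

The walk is recursive, so on a very deep tree it would need a large recursion limit; an explicit stack would avoid that.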

50
00:03:38,000 --> 00:03:43,000
So, let's do a quick example
because this turns out to be

51
00:03:43,000 --> 00:03:48,000
related to another sorting
algorithm we've seen already.

52
00:03:48,000 --> 00:03:52,000
So, while the example is
probably pretty trivial,

53
00:03:52,000 --> 00:03:55,000
the connection is pretty
surprising.

54
00:03:55,000 --> 00:04:02,000
At least, it was to me the
first time I taught this class.

55
00:04:02,000 --> 00:04:04,000
So, my array is three,
one, eight, two,

56
00:04:04,000 --> 00:04:08,000
six, seven, five.
And, I'm going to visit these

57
00:04:08,000 --> 00:04:12,000
elements in order from left to
right, and just build a tree.

58
00:04:12,000 --> 00:04:15,000
So, the first element I see is
three.

59
00:04:15,000 --> 00:04:18,000
So, I insert three into an
empty tree.

60
00:04:18,000 --> 00:04:21,000
That requires no comparisons.
Then I insert one.

61
00:04:21,000 --> 00:04:24,000
I see, is one bigger or less
than three?

62
00:04:24,000 --> 00:04:27,000
It's smaller.
So, I put it over here.

63
00:04:27,000 --> 00:04:31,000
Then I insert eight.
That's bigger than three,

64
00:04:31,000 --> 00:04:35,000
so it gets a new leaf over
here.

65
00:04:35,000 --> 00:04:38,000
Then I insert two.
That sits between one and

66
00:04:38,000 --> 00:04:41,000
three.
And so, it would fall off this

67
00:04:41,000 --> 00:04:44,000
right child of one.
So, I add two there.

68
00:04:44,000 --> 00:04:48,000
Six is bigger than three,
and less than eight.

69
00:04:48,000 --> 00:04:51,000
So, it goes here.
Seven is bigger than three,

70
00:04:51,000 --> 00:04:54,000
and less than eight,
bigger than six.

71
00:04:54,000 --> 00:04:58,000
So, it goes here,
and five fits in between three

72
00:04:58,000 --> 00:05:03,000
and five, three and six rather.
And so, that's the binary

73
00:05:03,000 --> 00:05:06,000
search tree that we get.
Then I run an in-order

74
00:05:06,000 --> 00:05:10,000
traversal, which will print one,
two, three, five,

75
00:05:10,000 --> 00:05:13,000
six, seven, eight.
OK, I can run it quickly in my

76
00:05:13,000 --> 00:05:15,000
head because I've got a big
stack.

77
00:05:15,000 --> 00:05:18,000
I've got to be a little bit
careful.

78
00:05:18,000 --> 00:05:22,000
Of course, you should check
that they come out in sorted

79
00:05:22,000 --> 00:05:24,000
order: one, two,
three, five,

80
00:05:24,000 --> 00:05:27,000
six, seven, eight.
And, if you don't have a big

81
00:05:27,000 --> 00:05:32,000
stack, you can go and buy one.
That's always useful.

82
00:05:32,000 --> 00:05:36,000
Memory costs are going up a bit
these days, or going down.

83
00:05:36,000 --> 00:05:40,000
They should be because of
politics, but price-fixing,

84
00:05:40,000 --> 00:05:43,000
or whatever.
So, the question is,

85
00:05:43,000 --> 00:05:46,000
what's the running time of the
algorithm?

86
00:05:46,000 --> 00:05:50,000
Here, this is one of those
answers where it depends.

87
00:05:50,000 --> 00:05:53,000
The parts that are easy to
analyze are, well,

88
00:05:53,000 --> 00:05:56,000
initialization.
The in-order tree walk,

89
00:05:56,000 --> 00:06:00,000
how long does that take?
n, good.

90
00:06:00,000 --> 00:06:05,000
So, it's order n for the walk,
and for the initialization,

91
00:06:05,000 --> 00:06:08,000
which is constant.
The question is,

92
00:06:08,000 --> 00:06:13,000
how long does it take me to do
n tree inserts?

93
00:06:21,000 --> 00:06:26,000
Anyone want to guess any kind
of answer to that question,

94
00:06:26,000 --> 00:06:32,000
other than it depends?
I've already stolen the thunder

95
00:06:32,000 --> 00:06:34,000
there.
Yeah?

96
00:06:34,000 --> 00:06:38,000
Big Omega of n log n,
that's good.

97
00:06:38,000 --> 00:06:42,000
It's at least n log n.
Why?

98
00:06:56,000 --> 00:06:58,000
Right.
So, you gave two reasons.

99
00:06:58,000 --> 00:07:02,000
The first one is because of the
decision tree lower bound.

100
00:07:02,000 --> 00:07:04,000
That doesn't actually prove
this.

101
00:07:04,000 --> 00:07:07,000
You have to be a little bit
careful.

102
00:07:07,000 --> 00:07:10,000
This is a claim that it's omega
n log n all the time.

103
00:07:10,000 --> 00:07:14,000
It's certainly omega n log n in
the worst case.

104
00:07:14,000 --> 00:07:18,000
Every comparison-based sorting
algorithm is omega n log n in

105
00:07:18,000 --> 00:07:21,000
the worst case.
It's also n log n every single

106
00:07:21,000 --> 00:07:25,000
time, omega n log n because of
the second reason you gave,

107
00:07:25,000 --> 00:07:29,000
which is the best thing that
could happen is we have a

108
00:07:29,000 --> 00:07:33,000
perfectly balanced tree.
So, this is the figure that I

109
00:07:33,000 --> 00:07:36,000
have drawn the most on a
blackboard in my life,

110
00:07:36,000 --> 00:07:41,000
the perfect tree on 15 nodes,
I guess.

111
00:07:41,000 --> 00:07:42,000
So, if we're lucky,
we have this.

112
00:07:42,000 --> 00:07:45,000
And if you add up all the
depths of the nodes here,

113
00:07:45,000 --> 00:07:48,000
which gives you the search tree
cost, in particular,

114
00:07:48,000 --> 00:07:52,000
these n over two nodes in the
bottom, each have depth log n.

115
00:07:52,000 --> 00:07:54,000
And, therefore,
you're going to have to pay at

116
00:07:54,000 --> 00:07:57,000
least n log n for those.
And, if you're less balanced,

117
00:07:57,000 --> 00:08:02,000
it's going to be even worse.
That takes some proving,

118
00:08:02,000 --> 00:08:08,000
but it's true.
So, it's actually omega n log n

119
00:08:08,000 --> 00:08:13,000
all the time.
OK, there are some cases,

120
00:08:13,000 --> 00:08:19,000
like if you know that the
elements are almost already in

121
00:08:19,000 --> 00:08:25,000
order, you can do it in a linear
number of comparisons.

122
00:08:25,000 --> 00:08:32,000
But here, you can't.
Any other guesses at an answer

123
00:08:32,000 --> 00:08:34,000
to this question?
Yeah?

124
00:08:34,000 --> 00:08:39,000
Big O n^2?
Good, why?

125
00:08:39,000 --> 00:08:41,000
Right.
We are doing n things,

126
00:08:41,000 --> 00:08:44,000
and each node has depth,
at most, n.

127
00:08:44,000 --> 00:08:49,000
So, the number of comparisons
we're making per element we

128
00:08:49,000 --> 00:08:51,000
insert, is, at most,
n.

129
00:08:51,000 --> 00:08:53,000
So that's, at most,
n^2.

130
00:08:53,000 --> 00:08:56,000
Any other answers?
Is it possible for this

131
00:08:56,000 --> 00:09:03,000
algorithm to take n^2 time?
Are there instances where it

132
00:09:03,000 --> 00:09:08,000
takes theta n^2?
If it's already sorted,

133
00:09:08,000 --> 00:09:14,000
that would be pretty bad.
So, if it's already sorted or

134
00:09:14,000 --> 00:09:21,000
if it's reverse sorted,
you are in bad shape because

135
00:09:21,000 --> 00:09:27,000
then you get a tree like this.
This is the sorted case.

136
00:09:27,000 --> 00:09:32,000
And, you compute.
So, the total cost,

137
00:09:32,000 --> 00:09:38,000
the time in general is going to
be the sum of the depths of the

138
00:09:38,000 --> 00:09:41,000
nodes for each node,
X, in the tree.

139
00:09:41,000 --> 00:09:45,000
And in this case,
it's one plus two plus three

140
00:09:45,000 --> 00:09:48,000
plus four, this arithmetic
series.

141
00:09:48,000 --> 00:09:52,000
There's n of them,
so this is theta n squared.

142
00:09:52,000 --> 00:09:56,000
It's like n^2 over two.
So, that's bad news.

143
00:09:56,000 --> 00:10:03,000
The worst-case running time of
this algorithm is n^2.
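
As a quick check of the arithmetic series, the sketch below counts the comparisons made by n tree inserts, which equals the sum of the node depths. The function name `insertion_comparisons` and the dict-based tree are assumptions made for illustration, and keys are assumed distinct.

```python
def insertion_comparisons(keys):
    """Insert distinct keys into an initially empty BST and return the total
    number of key comparisons, i.e. the sum of the depths of all nodes."""
    left, right = {}, {}   # child pointers, indexed by the parent's key
    root = None
    total = 0
    for k in keys:
        if root is None:
            root = k       # the root is inserted with no comparisons
            continue
        cur = root
        while True:
            total += 1     # one comparison against the current node
            child = left if k < cur else right
            if cur not in child:
                child[cur] = k   # fell off the tree: add the new leaf here
                break
            cur = child[cur]
    return total
```

On already-sorted input of n keys every insert walks the whole path, so the count is 0 + 1 + ... + (n - 1) = n(n - 1)/2.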

144
00:10:03,000 --> 00:10:08,000
Does that sound familiar at
all, an algorithm whose worst-case

145
00:10:08,000 --> 00:10:11,000
running time is n^2,
in particular,

146
00:10:11,000 --> 00:10:16,000
in the already-sorted case?
But if we're lucky,

147
00:10:16,000 --> 00:10:20,000
in the lucky case,
as we said, it's a balanced

148
00:10:20,000 --> 00:10:23,000
tree.
Wouldn't that be great?

149
00:10:23,000 --> 00:10:28,000
Anything with order log n
height would give us a sorting

150
00:10:28,000 --> 00:10:36,000
algorithm that runs in n log n.
So, in the lucky case,

151
00:10:36,000 --> 00:10:43,000
we are n log n.
But in the unlucky case,

152
00:10:43,000 --> 00:10:48,000
we are n^2, and unlucky is
sorted.

153
00:10:48,000 --> 00:10:57,000
Does it remind you of any
algorithm we've seen before?

154
00:10:57,000 --> 00:11:02,000
Quicksort.
It turns out the running time

155
00:11:02,000 --> 00:11:09,000
of this algorithm is the same as
the running time of quicksort in

156
00:11:09,000 --> 00:11:13,000
a very strong sense.
It turns out the comparisons

157
00:11:13,000 --> 00:11:19,000
that this algorithm makes are
exactly the same comparisons

158
00:11:19,000 --> 00:11:24,000
that quicksort makes.
It makes them in a different

159
00:11:24,000 --> 00:11:29,000
order, but it's really the same
algorithm in disguise.

160
00:11:29,000 --> 00:11:34,000
That's the surprise here.
So, in particular,

161
00:11:34,000 --> 00:11:36,000
we've already analyzed
quicksort.

162
00:11:36,000 --> 00:11:40,000
We should get something for
free out of that analysis.

163
00:11:54,000 --> 00:12:05,000
So, the relation is,
BST sort and quicksort make the

164
00:12:05,000 --> 00:12:15,000
same comparisons but in a
different order.

165
00:12:25,000 --> 00:12:29,000
So, let me walk through the
same example we did before:

166
00:12:29,000 --> 00:12:33,000
three, one, eight,
two, six, seven,

167
00:12:33,000 --> 00:12:35,000
five.
So, there is an array.

168
00:12:35,000 --> 00:12:40,000
We are going to run a
particular version of quicksort.

169
00:12:40,000 --> 00:12:43,000
I have to be a little bit
careful here.

170
00:12:43,000 --> 00:12:47,000
It's sort of the obvious
version of quicksort.

171
00:12:47,000 --> 00:12:52,000
Remember, our standard,
boring quicksort is you take

172
00:12:52,000 --> 00:12:56,000
the first element as the
partition element.

173
00:12:56,000 --> 00:13:01,000
So, I'll take three here.
And, I split into the elements

174
00:13:01,000 --> 00:13:04,000
less than three,
which is one and two.

175
00:13:04,000 --> 00:13:07,000
And, the elements bigger than
three, which is eight,

176
00:13:07,000 --> 00:13:09,000
six, seven, five.
And, in this version of

177
00:13:09,000 --> 00:13:12,000
quicksort, I don't change the
order of the elements,

178
00:13:12,000 --> 00:13:13,000
eight, six, seven,
five.

179
00:13:13,000 --> 00:13:17,000
So, let's say the order is
preserved because only then will

180
00:13:17,000 --> 00:13:20,000
this equivalence hold.
So, this is sort of a stable

181
00:13:20,000 --> 00:13:22,000
partition algorithm.
It's easy enough to do.

182
00:13:22,000 --> 00:13:25,000
It's a particular version of
quicksort.

183
00:13:25,000 --> 00:13:27,000
And soon, we're going to
randomize it.

184
00:13:27,000 --> 00:13:32,000
And after we randomize,
this difference doesn't matter.

185
00:13:32,000 --> 00:13:35,000
OK, then on the left recursion,
we split in the partition

186
00:13:35,000 --> 00:13:38,000
element.
There are things less than one,

187
00:13:38,000 --> 00:13:41,000
which is nothing,
things bigger than one,

188
00:13:41,000 --> 00:13:44,000
which is two.
And then, that's our partition

189
00:13:44,000 --> 00:13:45,000
element.
We are done.

190
00:13:45,000 --> 00:13:48,000
Over here, we partition on
eight.

191
00:13:48,000 --> 00:13:51,000
Everything is less than eight.
So, we get six,

192
00:13:51,000 --> 00:13:53,000
seven, five,
nothing on the right.

193
00:13:53,000 --> 00:13:57,000
Then we partition at six.
We get things less than six,

194
00:13:57,000 --> 00:13:59,000
namely five,
things bigger than six,

195
00:13:59,000 --> 00:14:03,000
namely seven.
And, those are sort of

196
00:14:03,000 --> 00:14:06,000
partition elements in a trivial
way.

197
00:14:06,000 --> 00:14:11,000
Now, this tree that we get on
the partition elements looks an

198
00:14:11,000 --> 00:14:16,000
awful lot like this tree.
OK, it should be exactly the

199
00:14:16,000 --> 00:14:19,000
same tree.
And, you can walk through,

200
00:14:19,000 --> 00:14:22,000
what comparisons does quicksort
make?

201
00:14:22,000 --> 00:14:25,000
Well, first,
it compares everything to

202
00:14:25,000 --> 00:14:30,000
three, OK, except three itself.
Now, if you look over here,

203
00:14:30,000 --> 00:14:32,000
what happens when we are
inserting elements?

204
00:14:32,000 --> 00:14:35,000
Well, each time we insert an
element, the first thing we do

205
00:14:35,000 --> 00:14:37,000
is compare with three.
If it's less than,

206
00:14:37,000 --> 00:14:40,000
we go to the left branch.
If it's greater than,

207
00:14:40,000 --> 00:14:43,000
we go to the right branch.
So, we are making all these

208
00:14:43,000 --> 00:14:44,000
comparisons with three in both
cases.

209
00:14:44,000 --> 00:14:47,000
Then, if we have an element
less than three,

210
00:14:47,000 --> 00:14:49,000
it's either one or two.
If it's one,

211
00:14:49,000 --> 00:14:51,000
we're done.
No comparisons happen here one

212
00:14:51,000 --> 00:14:52,000
to one.
But, we compare two to one.

213
00:14:52,000 --> 00:14:56,000
And indeed, when we insert two
over there after comparing it to

214
00:14:56,000 --> 00:14:59,000
three, we compare it to one.
And then we figure out that it

215
00:14:59,000 --> 00:15:01,000
happens here.
Same thing happens in

216
00:15:01,000 --> 00:15:04,000
quicksort.
For elements greater than

217
00:15:04,000 --> 00:15:08,000
three, we compare everyone to
eight here because we are

218
00:15:08,000 --> 00:15:12,000
partitioning with respect to
eight, and here because that's

219
00:15:12,000 --> 00:15:16,000
the next node after three.
As soon as eight is inserted,

220
00:15:16,000 --> 00:15:20,000
we compare everything with
eight to see that in fact it's less

221
00:15:20,000 --> 00:15:23,000
than eight, and so on:
so, all of the same

222
00:15:23,000 --> 00:15:25,000
comparisons, just in a different
order.
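
The equivalence can be checked mechanically. The sketch below is an illustration under stated assumptions, not the lecture's code: it records every pair of keys that each algorithm compares, using a first-element pivot and an order-preserving (stable) partition for quicksort, and assumes distinct keys. The function names are hypothetical.

```python
def bst_comparisons(keys):
    """Set of (smaller, larger) key pairs compared while building the BST."""
    pairs = set()
    left, right = {}, {}   # child pointers, indexed by the parent's key
    root = None
    for k in keys:
        if root is None:
            root = k
            continue
        cur = root
        while True:
            pairs.add((min(k, cur), max(k, cur)))
            child = left if k < cur else right
            if cur not in child:
                child[cur] = k
                break
            cur = child[cur]
    return pairs


def quicksort_comparisons(keys):
    """Set of (smaller, larger) key pairs compared by quicksort with a
    first-element pivot and an order-preserving partition."""
    pairs = set()

    def sort(items):
        if len(items) <= 1:
            return
        pivot, rest = items[0], items[1:]
        for x in rest:
            pairs.add((min(x, pivot), max(x, pivot)))
        # List comprehensions keep the original relative order of elements.
        sort([x for x in rest if x < pivot])
        sort([x for x in rest if x > pivot])

    sort(list(keys))
    return pairs
```

For the array three, one, eight, two, six, seven, five, both functions return the same set of pairs.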

223
00:15:25,000 --> 00:15:29,000
So, we turn 90°.
Kind of cool.

224
00:15:29,000 --> 00:15:34,000
So, this has various
consequences in the analysis.

225
00:15:50,000 --> 00:15:54,000
So, in particular,
the worst-case running time is

226
00:15:54,000 --> 00:15:58,000
theta n^2, which is not so
exciting.

227
00:15:58,000 --> 00:16:04,000
What we really care about is
the randomized version because

228
00:16:04,000 --> 00:16:10,000
that's what performs well.
So, randomized BST sort is just

229
00:16:10,000 --> 00:16:16,000
like randomized quicksort.
So, the first thing you do is

230
00:16:16,000 --> 00:16:21,000
randomly permute the array
uniformly, picking all

231
00:16:21,000 --> 00:16:24,000
permutations with equal
probability.

232
00:16:24,000 --> 00:16:31,000
And then, we call BST sort.
OK, this is basically what

233
00:16:31,000 --> 00:16:35,000
randomized quicksort could be
formulated as.

234
00:16:35,000 --> 00:16:40,000
And then, randomized BST sort
is going to make exactly the

235
00:16:40,000 --> 00:16:43,000
same comparisons as randomized
quicksort.

236
00:16:43,000 --> 00:16:48,000
Here, we are picking the root
essentially randomly.

237
00:16:48,000 --> 00:16:52,000
And here in quicksort,
you are picking the partition

238
00:16:52,000 --> 00:16:56,000
elements randomly.
It's the same difference.

239
00:16:56,000 --> 00:17:00,000
OK, so the time of this
algorithm equals the time of

240
00:17:00,000 --> 00:17:08,000
randomized quicksort because we
are making the same comparisons.

241
00:17:08,000 --> 00:17:10,000
So, the number of comparisons
is equal.

242
00:17:10,000 --> 00:17:11,000
And this is true as random
variables.

243
00:17:11,000 --> 00:17:13,000
The random variable,
the running time,

244
00:17:13,000 --> 00:17:16,000
of this algorithm, is equal to the
random variable of this

245
00:17:16,000 --> 00:17:17,000
algorithm.
In particular,

246
00:17:17,000 --> 00:17:20,000
the expectations are the same.

247
00:17:33,000 --> 00:17:37,000
OK, and we know that the
expected running time of

248
00:17:37,000 --> 00:17:40,000
randomized quicksort on n
elements is?

249
00:17:40,000 --> 00:17:42,000
Oh boy.
n log n.

250
00:17:42,000 --> 00:17:45,000
Good.
I was a little worried there.

251
00:17:45,000 --> 00:17:49,000
OK, so in particular,
the expected running time of

252
00:17:49,000 --> 00:17:53,000
BST sort is n log n.
Obviously, this is not too

253
00:17:53,000 --> 00:17:57,000
exciting from a sorting point of
view.

254
00:17:57,000 --> 00:18:03,000
Sorting was just sort of to see
this connection.

255
00:18:03,000 --> 00:18:05,000
What we actually care about,
and the reason I've introduced

256
00:18:05,000 --> 00:18:08,000
this BST sort is what the tree
looks like.

257
00:18:08,000 --> 00:18:10,000
What we really want is that
search tree.

258
00:18:10,000 --> 00:18:11,000
The search tree can do more
than sort.

259
00:18:11,000 --> 00:18:14,000
In-order traversals are a pretty
boring thing to do with the

260
00:18:14,000 --> 00:18:16,000
search tree.
You can search in a search

261
00:18:16,000 --> 00:18:18,000
tree.
So, OK, that's still not so

262
00:18:18,000 --> 00:18:20,000
exciting.
You could sort the elements and

263
00:18:20,000 --> 00:18:22,000
then put them in an array and do
binary search.

264
00:18:22,000 --> 00:18:26,000
But, the point of binary search
trees, instead of binary search

265
00:18:26,000 --> 00:18:28,000
arrays, is that you can update
them dynamically.

266
00:18:28,000 --> 00:18:31,000
We won't be updating them
dynamically in this lecture,

267
00:18:31,000 --> 00:18:35,000
but we will on Wednesday and on
your problem set.

268
00:18:35,000 --> 00:18:36,000
For now, it's just sort of
warm-up.

269
00:18:36,000 --> 00:18:39,000
Let's say that the elements
aren't changing.

270
00:18:39,000 --> 00:18:41,000
We are building one tree from
the beginning.

271
00:18:41,000 --> 00:18:43,000
We have all n elements ahead of
time.

272
00:18:43,000 --> 00:18:45,000
We are going to build it
randomly.

273
00:18:45,000 --> 00:18:49,000
We randomly permute that array.
Then we throw all the elements

274
00:18:49,000 --> 00:18:52,000
into a binary search tree.
That's what BST sort does.

275
00:18:52,000 --> 00:18:54,000
Then it calls in-order
traversal.

276
00:18:54,000 --> 00:18:56,000
I don't really care about
in-order traversal.

277
00:18:56,000 --> 00:19:00,000
What I want,
because we've just analyzed it.

278
00:19:00,000 --> 00:19:04,000
It would be a short lecture if
I were done.

279
00:19:04,000 --> 00:19:11,000
What we want is this randomly
built BST, which is what we get

280
00:19:11,000 --> 00:19:18,000
out of this algorithm.
So, this is the tree resulting

281
00:19:18,000 --> 00:19:24,000
from randomized BST sort,
OK, resulting from randomly

282
00:19:24,000 --> 00:19:30,000
permuting the array and just
inserting those elements using

283
00:19:30,000 --> 00:19:36,000
the simple tree insert
algorithm.
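
To get a feel for what such a tree looks like, one can build it and measure its height. This is a small experiment sketch, assuming distinct keys 0..n-1 and n >= 1; the function name `random_bst_height` and the dict-based tree are illustrative choices, not part of the lecture.

```python
import random


def random_bst_height(n, rng):
    """Randomly permute the keys 0..n-1, insert them one by one with plain
    tree insertion, and return the height (maximum depth of any node)."""
    keys = list(range(n))
    rng.shuffle(keys)          # uniform random permutation of the input
    left, right = {}, {}       # child pointers, indexed by the parent's key
    root = keys[0]
    height = 0
    for k in keys[1:]:
        cur, depth = root, 0
        while True:
            depth += 1         # the new node sits one level below cur
            child = left if k < cur else right
            if cur not in child:
                child[cur] = k
                break
            cur = child[cur]
        height = max(height, depth)
    return height
```

Calling it with a seeded generator, for example `random_bst_height(1023, random.Random(0))`, makes a run repeatable; averaging over many seeds estimates the expected height.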

284
00:19:36,000 --> 00:19:40,000
The question is,
what does that tree look like?

285
00:19:40,000 --> 00:19:45,000
And in particular,
is there anything we can

286
00:19:45,000 --> 00:19:50,000
conclude out of this fact?
The expected running time of

287
00:19:50,000 --> 00:19:55,000
BST sort is n log n.
OK, I've mentioned cursorily

288
00:19:55,000 --> 00:20:02,000
what the running time of BST
sort is, several times.

289
00:20:02,000 --> 00:20:06,000
It was the sum.
So, this is the time of BST

290
00:20:06,000 --> 00:20:11,000
sort on n elements.
It's the sum over all nodes,

291
00:20:11,000 --> 00:20:17,000
X, of the depth of that node.
OK, depth starts at zero and

292
00:20:17,000 --> 00:20:21,000
works its way down because the
root element,

293
00:20:21,000 --> 00:20:27,000
you don't make any comparisons.
Beyond that, you are making

294
00:20:27,000 --> 00:20:32,000
whatever the depth is
comparisons.

295
00:20:32,000 --> 00:20:40,000
OK, so we know that this thing
is, in expectation we know that

296
00:20:40,000 --> 00:20:47,000
this is n log n.
What does that tell us about

297
00:20:47,000 --> 00:20:52,000
the tree?
This is for all nodes,

298
00:20:52,000 --> 00:20:58,000
X, in the tree.
Does it tell us anything about

299
00:20:58,000 --> 00:21:03,000
the height of the tree,
for example?

300
00:21:03,000 --> 00:21:07,000
Yeah?
Right, intuitively,

301
00:21:07,000 --> 00:21:11,000
it says that the height of the
tree is theta log n,

302
00:21:11,000 --> 00:21:13,000
and not n.
But, in fact,

303
00:21:13,000 --> 00:21:17,000
it doesn't show that.
And that's why if you feel that

304
00:21:17,000 --> 00:21:21,000
that's just intuition,
but it may not be quite right.

305
00:21:21,000 --> 00:21:24,000
Indeed it's not.
Let me tell you what it does

306
00:21:24,000 --> 00:21:27,000
say.
So, if we take expectation of

307
00:21:27,000 --> 00:21:31,000
both sides, here we get n log n.
So, the expected value of that

308
00:21:31,000 --> 00:21:35,000
is n log n.
So, over here,

309
00:21:35,000 --> 00:21:41,000
well, we get the expected total
depth, which is not so exciting.

310
00:21:41,000 --> 00:21:45,000
Let's look at the expected
average depth.

311
00:21:45,000 --> 00:21:51,000
So, if I look at one over n,
the sum over all n nodes in the

312
00:21:51,000 --> 00:21:57,000
tree of the depth of X,
that would be the average depth

313
00:21:57,000 --> 00:22:02,000
over all the nodes.
And what I should get is theta

314
00:22:02,000 --> 00:22:06,000
n log n over n because I divided by
n on both sides.

315
00:22:06,000 --> 00:22:10,000
And, I'm using,
here, linearity of expectation,

316
00:22:10,000 --> 00:22:14,000
which is log n.
So, what this fact about the

317
00:22:14,000 --> 00:22:19,000
expected running time tells me
is that the average depth in the

318
00:22:19,000 --> 00:22:23,000
tree is log n,
which is not quite the height

319
00:22:23,000 --> 00:22:26,000
of the tree being log n.
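
Written out, the step from expected total depth to expected average depth is the following. This is a worked restatement of the argument above, with depth(x) denoting the depth of node x in the randomly built tree T (notation assumed here for illustration):

```latex
\[
E\!\left[\frac{1}{n}\sum_{x \in T} \mathrm{depth}(x)\right]
= \frac{1}{n}\,E\!\left[\sum_{x \in T} \mathrm{depth}(x)\right]
= \frac{\Theta(n \log n)}{n}
= \Theta(\log n).
\]
```

The first equality is linearity of expectation; the second uses the expected running time of BST sort.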

320
00:22:35,000 --> 00:22:39,000
OK, remember the height of the
tree is the maximum depth of any

321
00:22:39,000 --> 00:22:41,000
node.
Here, we are just bounding the

322
00:22:41,000 --> 00:22:43,000
average depth.

323
00:23:04,000 --> 00:23:08,000
Let's look at an example of a
tree.

324
00:23:08,000 --> 00:23:14,000
I'll draw my favorite picture.
So, here we have a nice

325
00:23:14,000 --> 00:23:20,000
balanced tree,
let's say, on half of the nodes

326
00:23:20,000 --> 00:23:25,000
or a little more.
And then, I have one really

327
00:23:25,000 --> 00:23:30,000
long path hanging off one
particular leaf.

328
00:23:30,000 --> 00:23:37,000
It doesn't matter which one.
And, I'm going to say that this

329
00:23:37,000 --> 00:23:41,000
path has length,
with a total height here,

330
00:23:41,000 --> 00:23:45,000
I want to make root n,
which is a lot bigger than log

331
00:23:45,000 --> 00:23:47,000
n.
This is roughly log n.

332
00:23:47,000 --> 00:23:51,000
It's going to be log of n minus
root n, or so,

333
00:23:51,000 --> 00:23:54,000
roughly.
So, most of the nodes have

334
00:23:54,000 --> 00:23:58,000
logarithmic height and,
sorry, logarithmic depth.

335
00:23:58,000 --> 00:24:03,000
If you compute the average
depth in this particular tree,

336
00:24:03,000 --> 00:24:06,000
for most of the nodes,
let's say it's,

337
00:24:06,000 --> 00:24:12,000
at most, n of the nodes have
depth log n.

338
00:24:12,000 --> 00:24:15,000
And then, there are root n
nodes, at most,

339
00:24:15,000 --> 00:24:19,000
down here, which have depth,
at most, root n.

340
00:24:19,000 --> 00:24:22,000
So, it's, at most,
root n times root n.

341
00:24:22,000 --> 00:24:26,000
In fact, it's like half that,
but not a big deal.

342
00:24:26,000 --> 00:24:29,000
So, this is n.
So, this is n log n,

343
00:24:29,000 --> 00:24:34,000
or, sorry, average depth:
I have to divide everything by

344
00:24:34,000 --> 00:24:38,000
n.
n log n would be rather large

345
00:24:38,000 --> 00:24:42,000
for an average height,
average depth.

346
00:24:42,000 --> 00:24:48,000
So, the average depth here is
log n, but the height of the

347
00:24:48,000 --> 00:24:53,000
tree is square root of n.
So, this is not enough.
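
The arithmetic for this example tree, written as one line (a restatement of the bounds just given, using the same rough upper estimates):

```latex
\[
\text{average depth}
\;\le\; \frac{1}{n}\Bigl(\underbrace{n \cdot \log n}_{\text{balanced part}}
      + \underbrace{\sqrt{n}\cdot\sqrt{n}}_{\text{long path}}\Bigr)
\;=\; \log n + 1
\;=\; O(\log n),
\qquad\text{while}\qquad
\text{height} \;=\; \sqrt{n}.
\]
```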

348
00:24:53,000 --> 00:24:59,000
Just to know that the average
depth is log n doesn't mean that

349
00:24:59,000 --> 00:25:04,000
the height is log n.
OK, but the claim, the

350
00:25:04,000 --> 00:25:10,000
theorem for today, is that the
expected height of a randomly

351
00:25:10,000 --> 00:25:16,000
built binary search tree is
indeed log n.

352
00:25:16,000 --> 00:25:21,000
BST is order log n.
This is what we like to know

353
00:25:21,000 --> 00:25:26,000
because that tells us,
if we just build a binary

354
00:25:26,000 --> 00:25:31,000
search tree randomly,
then we can search in it in log

355
00:25:31,000 --> 00:25:34,000
n time.
OK, for sorting,

356
00:25:34,000 --> 00:25:38,000
it's not as big a deal.
We just care about the expected

357
00:25:38,000 --> 00:25:41,000
running time of creating the
thing.

358
00:25:41,000 --> 00:25:44,000
Here, now we know that once we
prove this theorem,

359
00:25:44,000 --> 00:25:48,000
we know that we can search
quickly in expectation,

360
00:25:48,000 --> 00:25:53,000
in fact, most of the time.
So, the rest of today's lecture

361
00:25:53,000 --> 00:25:56,000
will be proving this theorem.
It's quite tricky,

362
00:25:56,000 --> 00:26:00,000
as you might imagine.
It's another big probability

363
00:26:00,000 --> 00:26:06,000
analysis along the lines of
quicksort and everything.

364
00:26:22,000 --> 00:26:26,000
So, I'm going to start with an
outline of the proof,

365
00:26:26,000 --> 00:26:31,000
unless there are any questions
about the theorem.

366
00:26:31,000 --> 00:26:35,000
It should be pretty clear what
we want to prove.

367
00:26:35,000 --> 00:26:40,000
This is even weirder than most
of the analyses we've seen.

368
00:26:40,000 --> 00:26:45,000
It's going to use a fancy
trick, which is exponentiating a

369
00:26:45,000 --> 00:26:50,000
random variable.
And to do that we need a tool

370
00:26:50,000 --> 00:26:54,000
called Jensen's inequality.
We are going to prove that

371
00:26:54,000 --> 00:26:57,000
tool.
Usually, we don't prove

372
00:26:57,000 --> 00:27:01,000
probability tools.
But this one we are going to

373
00:27:01,000 --> 00:27:03,000
prove.
It's not too hard.

374
00:27:03,000 --> 00:27:09,000
It's also basic analysis.
So, the lemma,

375
00:27:09,000 --> 00:27:13,000
says that if we have what's
called a convex function,

376
00:27:13,000 --> 00:27:17,000
f, and you should all know what
that means, but I'll define it

377
00:27:17,000 --> 00:27:21,000
soon in case you have forgotten.
If you have a convex function,

378
00:27:21,000 --> 00:27:25,000
f, and you have a random
variable, X, you take f of the

379
00:27:25,000 --> 00:27:27,000
expectation.
That's, at most,

380
00:27:27,000 --> 00:27:32,000
the expectation of f of that
random variable.
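
In symbols, the lemma reads as follows. The small numeric check after it is an illustrative example chosen here, not one from the lecture:

```latex
\[
f\bigl(E[X]\bigr) \;\le\; E\bigl[f(X)\bigr]
\qquad \text{for every convex function } f.
\]
% Check with the convex function f(x) = 2^x and X uniform on {0, 2}:
\[
f\bigl(E[X]\bigr) = 2^{1} = 2
\;\le\;
E\bigl[f(X)\bigr] = \tfrac{1}{2}\bigl(2^{0} + 2^{2}\bigr) = 2.5 .
\]
```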

381
00:27:32,000 --> 00:27:40,000
Think about it enough and draw
a convex function, and it is fairly

382
00:27:40,000 --> 00:27:46,000
intuitive, I guess.
But we will prove it.

383
00:27:46,000 --> 00:27:54,000
What that allows us to do is
instead of analyzing the random

384
00:27:54,000 --> 00:28:00,000
variable that tells us the
height of a tree,

385
00:28:00,000 --> 00:28:06,000
so, X_n I'll call the random
variable, RV,

386
00:28:06,000 --> 00:28:13,000
of the height of a BST,
randomly constructed BST on n

387
00:28:13,000 --> 00:28:21,000
nodes we will analyze.
Well, instead of analyzing this

388
00:28:21,000 --> 00:28:27,000
desired random variable,
X_n, sorry, this should have

389
00:28:27,000 --> 00:28:32,000
been in capital X.
We can analyze any convex

390
00:28:32,000 --> 00:28:35,000
function of X_n.
And, we're going to analyze the

391
00:28:35,000 --> 00:28:39,000
exponentiation.
So, I'm going to define Y_n to

392
00:28:39,000 --> 00:28:43,000
be two to the power of X_n.
OK, the big question here is

393
00:28:43,000 --> 00:28:47,000
why bother doing this?
The answer is because it works

394
00:28:47,000 --> 00:28:50,000
and it wouldn't work if we
analyzed X_n directly.

395
00:28:50,000 --> 00:28:54,000
We will see some intuition of
that later on,

396
00:28:54,000 --> 00:28:59,000
but it's not very intuitive.
This is our analysis where you

397
00:28:59,000 --> 00:29:03,000
need this extra trick.
So, we're going to bound the

398
00:29:03,000 --> 00:29:05,000
expectation of Y_n,
and from that,

399
00:29:05,000 --> 00:29:09,000
and using Jensen's inequality,
we're going to get a bound on

400
00:29:09,000 --> 00:29:12,000
the expectation of X_n,
a pretty tight bound,

401
00:29:12,000 --> 00:29:16,000
actually, because if we can
bound the exponent up to

402
00:29:16,000 --> 00:29:18,000
constant factors,
the exponentiation up to

403
00:29:18,000 --> 00:29:21,000
constant factors,
we can bound X_n even better

404
00:29:21,000 --> 00:29:23,000
because you take logs to get
X_n.

405
00:29:23,000 --> 00:29:28,000
So, we will even figure out
what the constant is.

406
00:29:28,000 --> 00:29:33,000
So, what we will prove,
this is the heart of the proof,

407
00:29:33,000 --> 00:29:37,000
is that the expected value of
Y_n is order n^3.

408
00:29:37,000 --> 00:29:42,000
Here, we won't really know what
the constant is.

409
00:29:42,000 --> 00:29:46,000
We don't need to.
And then, we put these pieces

410
00:29:46,000 --> 00:29:49,000
together.
So, let's do that.

411
00:29:49,000 --> 00:29:54,000
What we really care about is
the expectation of X_n,

412
00:29:54,000 --> 00:29:57,000
which is the height of our
tree.

413
00:29:57,000 --> 00:30:02,000
What we find out about is this
fact.

414
00:30:02,000 --> 00:30:05,000
So, leave some horizontal space
here.

415
00:30:05,000 --> 00:30:09,000
We get the expectation of two
to the X_n.

416
00:30:09,000 --> 00:30:14,000
That's the expectation of Y_n.
So, we learned that that's

417
00:30:14,000 --> 00:30:18,000
order n^3.
And, Jensen's inequality tells

418
00:30:18,000 --> 00:30:23,000
us that if we take this
function, two to the X,

419
00:30:23,000 --> 00:30:27,000
we plug it in here,
that on the left-hand side we

420
00:30:27,000 --> 00:30:33,000
get two to the E of X.
So, we get two to the E of X_n

421
00:30:33,000 --> 00:30:38,000
is at most E of two to the X_n.
So, that's where we use

422
00:30:38,000 --> 00:30:43,000
Jensen's inequality,
because what we care about is E

423
00:30:43,000 --> 00:30:46,000
of X_n.
So now, we have a bound.

424
00:30:46,000 --> 00:30:50,000
We say, well,
two to the E of X_n is,

425
00:30:50,000 --> 00:30:54,000
at most, n^3.
So, if we take the log of both

426
00:30:54,000 --> 00:31:00,000
sides, we get E of X_n is,
at most, the log of n^3.

427
00:31:00,000 --> 00:31:05,000
OK, I will write it in this
funny way, log of order n^3,

428
00:31:05,000 --> 00:31:09,000
which will actually tell us the
constant.

429
00:31:09,000 --> 00:31:12,000
This is three log n plus order
one.
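
Written out, the chain of steps just described looks roughly like this. Here lg means log base 2, and c is a hypothetical name, used only in this sketch, for the constant hidden inside the order n^3 bound:

```latex
\begin{align*}
2^{E[X_n]} &\le E\!\left[2^{X_n}\right]
  && \text{(Jensen's inequality; } f(x) = 2^x \text{ is convex)} \\
&= E[Y_n] \le c\,n^3
  && \text{(the } O(n^3) \text{ bound, still to be proved)} \\
\Longrightarrow\quad E[X_n] &\le \lg\!\left(c\,n^3\right)
  = 3 \lg n + \lg c = 3 \lg n + O(1).
\end{align*}
```

Taking logs is what turns an unknown constant factor on E[Y_n] into a harmless additive O(1) term on E[X_n].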

430
00:31:12,000 --> 00:31:18,000
So, we will prove that the
expected height of a randomly

431
00:31:18,000 --> 00:31:24,000
constructed binary search tree
on n nodes is roughly three log

432
00:31:24,000 --> 00:31:28,000
n, at most.
OK, I will say more about that

433
00:31:28,000 --> 00:31:31,000
later.
So, you've now seen the end of

434
00:31:31,000 --> 00:31:35,000
the proof.
That's the foreshadowing.

435
00:31:35,000 --> 00:31:38,000
And now, this is the top-down
approach.

436
00:31:38,000 --> 00:31:41,000
So, you sort of see what the
steps are.

437
00:31:41,000 --> 00:31:44,000
Now, we just have to do the
steps.

438
00:31:44,000 --> 00:31:46,000
OK, step one will
take a bit of work,

439
00:31:46,000 --> 00:31:50,000
but it's easy because it's
pretty basic stuff.

440
00:31:50,000 --> 00:31:54,000
Step two is just a definition
and we are done.

441
00:31:54,000 --> 00:31:57,000
Step three is probably the
hardest part.

442
00:31:57,000 --> 00:32:03,000
Step four, we've already done.
So, let's start with step one.

443
00:32:16,000 --> 00:32:22,000
So, the first thing I need to
do is define a convex function

444
00:32:22,000 --> 00:32:29,000
because we are going to
manipulate the definition a fair

445
00:32:29,000 --> 00:32:33,000
amount.
So, this is a notion from real

446
00:32:33,000 --> 00:32:36,000
analysis.
Analysis is a fancy word for

447
00:32:36,000 --> 00:32:40,000
calculus if you haven't taken
the proper analysis class.

448
00:32:40,000 --> 00:32:44,000
You should have seen convexity
in any calculus class.

449
00:32:44,000 --> 00:32:47,000
A convex function is one that
looks like this.

450
00:32:47,000 --> 00:32:50,000
OK, good.
One way to formalize that

451
00:32:50,000 --> 00:32:53,000
notion is to consider any two
points on this curve.

452
00:32:53,000 --> 00:32:57,000
So, I'm only interested in
functions from reals to reals.

453
00:32:57,000 --> 00:33:01,000
So, it looks like this.
This is f of something.

454
00:33:01,000 --> 00:33:05,000
And, this is the something.
If I take two points on this

455
00:33:05,000 --> 00:33:08,000
curve, and I draw a line segment
connecting them,

456
00:33:08,000 --> 00:33:11,000
that line segment is always
above the curve.

457
00:33:11,000 --> 00:33:13,000
That's the meaning of
convexity.

458
00:33:13,000 --> 00:33:16,000
It has a geometric notion,
which is basically the same.

459
00:33:16,000 --> 00:33:19,000
But for functions,
this line segment should stay

460
00:33:19,000 --> 00:33:22,000
above the curve.
The line does not stay above

461
00:33:22,000 --> 00:33:24,000
the curve.
If I extended it farther,

462
00:33:24,000 --> 00:33:26,000
it goes beneath the curve,
of course.

463
00:33:26,000 --> 00:33:31,000
But, that segment should.
So, I'm going to formalize that

464
00:33:31,000 --> 00:33:33,000
a little bit.
I'll call this x,

465
00:33:33,000 --> 00:33:37,000
and then this is f of x.
And, I'll call this y,

466
00:33:37,000 --> 00:33:41,000
and this is f of y.
So, the claim is that I take

467
00:33:41,000 --> 00:33:44,000
any number between x and y,
and I look up,

468
00:33:44,000 --> 00:33:48,000
and I say, OK,
here's the point on the curve.

469
00:33:48,000 --> 00:33:50,000
Here's the point on the line
segment.

470
00:33:50,000 --> 00:33:54,000
The y value of that point on the
line segment, here,

471
00:33:54,000 --> 00:33:58,000
should be greater than or equal
to the y value here,

472
00:33:58,000 --> 00:34:01,000
OK?
To figure out what the point

473
00:34:01,000 --> 00:34:06,000
is, we need some,
I would call it geometry.

474
00:34:06,000 --> 00:34:08,000
I'm sure it's an analysis
concept, too.

475
00:34:08,000 --> 00:34:12,000
But, I'm a geometer,
so I get to call it geometry.

476
00:34:12,000 --> 00:34:16,000
If you have two points,
p and q, and you want to

477
00:34:16,000 --> 00:34:19,000
parameterize this line segment
between them,

478
00:34:19,000 --> 00:34:24,000
so, I want to parameterize some
points here, the way to do it is

479
00:34:24,000 --> 00:34:29,000
to take a linear combination.
And, you should have taken

480
00:34:29,000 --> 00:34:32,000
some linear algebra,
linear combinations look

481
00:34:32,000 --> 00:34:35,000
something like this.
And, in fact,

482
00:34:35,000 --> 00:34:39,000
we're going to take something
called an affine combination

483
00:34:39,000 --> 00:34:41,000
where alpha plus beta equals
one.

484
00:34:41,000 --> 00:34:43,000
It turns out,
if you take all such points,

485
00:34:43,000 --> 00:34:45,000
some number,
alpha, times the point,

486
00:34:45,000 --> 00:34:48,000
p, plus some number,
beta times the point,

487
00:34:48,000 --> 00:34:50,000
q, where alpha plus beta equals
one.

488
00:34:50,000 --> 00:34:53,000
If you take all those points,
you get the entire line here,

489
00:34:53,000 --> 00:34:56,000
which is nifty.
But, we don't want the entire

490
00:34:56,000 --> 00:34:58,000
line.
If you also constrained alpha

491
00:34:58,000 --> 00:35:01,000
and beta to be nonnegative,
you just get this line segment.

492
00:35:01,000 --> 00:35:05,000
So, this forces alpha and beta
to be between zero and one

493
00:35:05,000 --> 00:35:10,000
because they have to sum to one,
and they are nonnegative.

494
00:35:10,000 --> 00:35:14,000
So, what we are going to do
here is take alpha times x plus

495
00:35:14,000 --> 00:35:17,000
beta times y.
That's going to be our point

496
00:35:17,000 --> 00:35:22,000
in between, with these constraints:
alpha plus beta equals one.

497
00:35:22,000 --> 00:35:26,000
Alpha and beta are greater than
or equal to zero.

498
00:35:26,000 --> 00:35:31,000
Then, this point is f of that.
This is f of alpha x plus beta

499
00:35:31,000 --> 00:35:34,000
y.
And, this point is the linear

500
00:35:34,000 --> 00:35:38,000
interpolation between f of x and
f of y, the same one.

501
00:35:38,000 --> 00:35:42,000
So, it's alpha times f of x
plus beta times f of y.

502
00:35:42,000 --> 00:35:46,000
OK, that's the intuition.
If you didn't follow it,

503
00:35:46,000 --> 00:35:51,000
it's not too big a deal because
all we care about is the

504
00:35:51,000 --> 00:35:54,000
symbolic answer for proving
things.

505
00:35:54,000 --> 00:35:56,000
But, that's where this comes
from.

506
00:35:56,000 --> 00:36:03,000
So, here's the definition.
A function f is convex

507
00:36:03,000 --> 00:36:09,000
if, for all x and y,
and all alpha and beta

508
00:36:09,000 --> 00:36:16,000
greater than or equal to zero,
whose sum is one,

509
00:36:16,000 --> 00:36:25,000
we have f of alpha x plus beta
y is less than or equal to alpha

510
00:36:25,000 --> 00:36:32,000
f of x plus beta f of y.
So, that's just saying that

511
00:36:32,000 --> 00:36:38,000
this y coordinate here is less
than or equal to this y

512
00:36:38,000 --> 00:36:41,000
coordinate.
OK, but that's the symbolism

513
00:36:41,000 --> 00:36:46,000
behind that picture.
OK, so now we want to prove

514
00:36:46,000 --> 00:36:51,000
Jensen's inequality.
OK, we're not quite there yet.

515
00:36:51,000 --> 00:36:57,000
We are going to prove a simple
lemma, from which it will be

516
00:36:57,000 --> 00:37:02,000
easy to derive Jensen's
inequality.
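
As a sanity check on the definition of convexity, here is a minimal Python sketch that samples random x, y, and alpha and tests the two-point inequality numerically. The function name `is_convex_on_samples`, the sampling interval, and the tolerance are all assumptions made for illustration. A passing check is only evidence, not a proof.

```python
import random

def is_convex_on_samples(f, lo, hi, trials=10_000, tol=1e-9, seed=0):
    """Test f(alpha*x + beta*y) <= alpha*f(x) + beta*f(y) on random samples.

    alpha and beta are nonnegative and sum to one, as in the definition.
    Returns False as soon as one sampled triple violates the inequality.
    """
    rng = random.Random(seed)
    for _ in range(trials):
        x = rng.uniform(lo, hi)
        y = rng.uniform(lo, hi)
        alpha = rng.random()      # in [0, 1)
        beta = 1.0 - alpha        # so alpha + beta == 1
        lhs = f(alpha * x + beta * y)        # point on the curve
        rhs = alpha * f(x) + beta * f(y)     # point on the line segment
        # Small relative tolerance to absorb floating-point rounding.
        if lhs > rhs + tol * max(1.0, abs(rhs)):
            return False
    return True

# 2^x is the convex function used later in the lecture.
print(is_convex_on_samples(lambda x: 2.0 ** x, -10.0, 10.0))
# -(x^2) is concave, so the check should find a violation.
print(is_convex_on_samples(lambda x: -(x * x), -10.0, 10.0))
```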

517
00:37:02,000 --> 00:37:07,000
So, this is the theorem we are
proving.

518
00:37:07,000 --> 00:37:13,000
So, here's a lemma about convex
functions.

519
00:37:13,000 --> 00:37:22,000
You may have seen it before.
It will be crucial to Jensen's

520
00:37:22,000 --> 00:37:25,000
inequality.
So, suppose,

521
00:37:25,000 --> 00:37:34,000
this is a statement about
affine combinations of n things

522
00:37:34,000 --> 00:37:41,000
instead of two things.
So, this will say that

523
00:37:41,000 --> 00:37:46,000
convexity can be generalized to
taking n things.

524
00:37:46,000 --> 00:37:52,000
So, suppose we have n real
numbers, and we have n values

525
00:37:52,000 --> 00:37:55,000
alpha i, alpha one up to alpha
n.

526
00:37:55,000 --> 00:38:00,000
They are all nonnegative.
And, their sum is one.

527
00:38:00,000 --> 00:38:06,000
So, the sum of alpha k,
I guess, k equals one to n,

528
00:38:06,000 --> 00:38:11,000
is one.
So, those are the assumptions.

529
00:38:11,000 --> 00:38:18,000
The conclusion is the same
thing, but summing over all k.

530
00:38:18,000 --> 00:38:22,000
So, k equals one to n,
alpha_k * x_k.

531
00:38:22,000 --> 00:38:29,000
Take f of that versus taking
the sum of the alphas times the

532
00:38:29,000 --> 00:38:32,000
f's.
k equals one to n.
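
In symbols, the lemma as stated on the board would read something like this:

```latex
\textbf{Lemma.} Let $f : \mathbb{R} \to \mathbb{R}$ be convex,
let $x_1, \dots, x_n \in \mathbb{R}$, and let
$\alpha_1, \dots, \alpha_n \ge 0$ with $\sum_{k=1}^{n} \alpha_k = 1$. Then
\[
  f\!\left(\sum_{k=1}^{n} \alpha_k x_k\right)
  \;\le\; \sum_{k=1}^{n} \alpha_k f(x_k).
\]
```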

533
00:38:32,000 --> 00:38:37,000
So, the definition of convexity
is exactly that statement,

534
00:38:37,000 --> 00:38:42,000
but where n equals two.
OK, alpha one and alpha two are

535
00:38:42,000 --> 00:38:46,000
alpha and beta.
This is just a statement for

536
00:38:46,000 --> 00:38:50,000
general n.
And, you can interpret this in

537
00:38:50,000 --> 00:38:53,000
some funnier way,
which I won't get into.

538
00:38:53,000 --> 00:38:56,000
Oh, sure, why not?
I'm a geometer.

539
00:38:56,000 --> 00:39:03,000
So, this is saying you take
several points on this curve.

540
00:39:03,000 --> 00:39:05,000
You take the polygon that they
define.

541
00:39:05,000 --> 00:39:07,000
So, these are straight-line
segments.

542
00:39:07,000 --> 00:39:10,000
You take the interior.
If you take an affine

543
00:39:10,000 --> 00:39:13,000
combination like that,
you will get a point inside

544
00:39:13,000 --> 00:39:16,000
that polygon,
or possibly on the boundary.

545
00:39:16,000 --> 00:39:20,000
The claim is that all those
points are above the curve.

546
00:39:20,000 --> 00:39:23,000
Again, intuitively:
true if you draw a nice,

547
00:39:23,000 --> 00:39:25,000
canonical convex curve,
but in fact,

548
00:39:25,000 --> 00:39:27,000
it's true algebraically,
too.

549
00:39:27,000 --> 00:39:33,000
It's always a good thing.
Any suggestions on how we might

550
00:39:33,000 --> 00:39:36,000
prove this theorem,
this lemma?

551
00:39:36,000 --> 00:39:40,000
It's pretty easy.
So, what technique might we use

552
00:39:40,000 --> 00:39:44,000
to prove it?
One word: induction.

553
00:39:44,000 --> 00:39:46,000
Always a good answer,
yeah.

554
00:39:46,000 --> 00:39:52,000
Induction should shout out at
you here because we already know

555
00:39:52,000 --> 00:40:00,000
that this is true by definition
of convexity for n equals two.

556
00:40:00,000 --> 00:40:04,000
So, the base case is clear.
In fact, there's an even

557
00:40:04,000 --> 00:40:08,000
simpler base case,
which is when n equals one.

558
00:40:08,000 --> 00:40:13,000
If n equals one,
then you have one number that

559
00:40:13,000 --> 00:40:16,000
sums to one.
So, alpha one is one.

560
00:40:16,000 --> 00:40:19,000
And so, nothing is going on
here.

561
00:40:19,000 --> 00:40:23,000
This is just saying that f of
one times x_1 is,

562
00:40:23,000 --> 00:40:28,000
at most, one times f of x_1:
so, not terribly exciting

563
00:40:28,000 --> 00:40:33,000
because that holds with
equality.

564
00:40:33,000 --> 00:40:37,000
OK, so we don't even need the n
equals two base case.

565
00:40:37,000 --> 00:40:42,000
So, the interesting part,
although still not terribly

566
00:40:42,000 --> 00:40:45,000
interesting, is the induction
step.

567
00:40:45,000 --> 00:40:48,000
This is good practice in
induction.

568
00:40:48,000 --> 00:40:53,000
So, what we care about is this
f of this linear combination,

569
00:40:53,000 --> 00:40:57,000
an affine combination,
alpha_k times x_k summed over all

570
00:40:57,000 --> 00:41:01,000
k.
Now, what I would like to do is

571
00:41:01,000 --> 00:41:05,000
apply induction.
What I know about inductively,

572
00:41:05,000 --> 00:41:09,000
is say f of this sum,
if it's summed only up to n

573
00:41:09,000 --> 00:41:12,000
minus one instead of all the way
up to n.

574
00:41:12,000 --> 00:41:16,000
Any smaller sum I can deal with
by induction.

575
00:41:16,000 --> 00:41:20,000
So, I'm going to try and get
rid of the nth term.

576
00:41:20,000 --> 00:41:24,000
I want to separate it out.
And, this is fairly natural if

577
00:41:24,000 --> 00:41:28,000
you've played with affine
combinations before.

578
00:41:28,000 --> 00:41:35,000
But it's just some algebra.
So, I want to separate out the

579
00:41:35,000 --> 00:41:40,000
alpha_n*x_n term.
And, I'd also like to make it

580
00:41:40,000 --> 00:41:45,000
an affine combination.
This is the trick.

581
00:41:45,000 --> 00:41:50,000
Sorry, no f here.
If I just removed the last

582
00:41:50,000 --> 00:41:57,000
term, the alpha k's from one up
to n minus one wouldn't sum to

583
00:41:57,000 --> 00:42:02,000
one anymore.
They'd sum to something

584
00:42:02,000 --> 00:42:05,000
smaller.
So, I can't just take out this

585
00:42:05,000 --> 00:42:08,000
term.
I'm going to have to do some

586
00:42:08,000 --> 00:42:10,000
trickery here,
x_k plus the f.

587
00:42:10,000 --> 00:42:13,000
Good.
So, you should see why this is

588
00:42:13,000 --> 00:42:17,000
true, because the one minus
alpha n's cancel.

589
00:42:17,000 --> 00:42:22,000
And then, I'm just getting the
sum of alpha_k*x_k,

590
00:42:22,000 --> 00:42:28,000
k equals one to n minus one,
plus the alpha_n*x_n term.

591
00:42:28,000 --> 00:42:30,000
So, I haven't done anything
here.
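
The regrouping on the board can be written as the identity below. It assumes alpha_n is strictly less than one so that the division makes sense. That assumption is not spelled out in the lecture, but if alpha_n equals one, all the other alphas are zero and the lemma is immediate.

```latex
\[
  \sum_{k=1}^{n} \alpha_k x_k
  \;=\; \alpha_n x_n
  \;+\; (1 - \alpha_n) \sum_{k=1}^{n-1} \frac{\alpha_k}{1 - \alpha_n}\, x_k ,
  \qquad
  \sum_{k=1}^{n-1} \frac{\alpha_k}{1 - \alpha_n}
  \;=\; \frac{1 - \alpha_n}{1 - \alpha_n} \;=\; 1 .
\]
```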

592
00:42:30,000 --> 00:42:32,000
These are equal.
But now, I have this nifty

593
00:42:32,000 --> 00:42:36,000
feature, that on the one hand,
these two numbers,

594
00:42:36,000 --> 00:42:38,000
alpha n and one minus alpha n
sum to one.

595
00:42:38,000 --> 00:42:41,000
And on the other hand,
if I did it right,

596
00:42:41,000 --> 00:42:45,000
these numbers should sum up to
one just going from one up to n

597
00:42:45,000 --> 00:42:47,000
minus one.
Why do they sum up to one?

598
00:42:47,000 --> 00:42:51,000
Well, these numbers summed up
to one minus alpha n.

599
00:42:51,000 --> 00:42:54,000
And so, I'm dividing everything
by one minus alpha n.

600
00:42:54,000 --> 00:42:57,000
So, they will sum to one.
So now, I have two affine

601
00:42:57,000 --> 00:43:02,000
combinations.
I just apply the two things

602
00:43:02,000 --> 00:43:07,000
that I know.
I know this affine combination

603
00:43:07,000 --> 00:43:10,000
will work because,
well, why?

604
00:43:10,000 --> 00:43:16,000
Why can I say that this is
alpha n f of x_n plus one minus

605
00:43:16,000 --> 00:43:20,000
alpha n f of this crazy sum?

606
00:43:35,000 --> 00:43:41,000
Shout it out.
There are two possible answers.

607
00:43:41,000 --> 00:43:47,000
One is correct,
and one is incorrect.

608
00:43:47,000 --> 00:43:55,000
So, which will it be?
This should have been less than

609
00:43:55,000 --> 00:44:01,000
or equal to.
That's important.

610
00:44:01,000 --> 00:44:04,000
It's on the board.
It can't be too difficult.

611
00:44:17,000 --> 00:44:21,000
So, I'm treating this as just
one big X value.

612
00:44:21,000 --> 00:44:26,000
So, I have some x_n,
and I have some crazy X.

613
00:44:26,000 --> 00:44:31,000
I want f of the affine
combination of those two X

614
00:44:31,000 --> 00:44:36,000
values is, at most,
the affine combinations of the

615
00:44:36,000 --> 00:44:40,000
f's of those X values.
This is?

616
00:44:40,000 --> 00:44:43,000
It is the inductive hypothesis
where n equals two.

617
00:44:43,000 --> 00:44:45,000
Unfortunately,
we didn't prove the n equals

618
00:44:45,000 --> 00:44:49,000
two case as a special base case.
So, we can't use induction here

619
00:44:49,000 --> 00:44:52,000
the way that I've stated the
base case.

620
00:44:52,000 --> 00:44:55,000
If you did n equals two base
case, you can do that.

621
00:44:55,000 --> 00:44:58,000
Here, we can't.
So, the other answer is by

622
00:44:58,000 --> 00:45:02,000
convexity, good.
That's right here.

623
00:45:02,000 --> 00:45:08,000
So, f is convex.
We know that this is true for

624
00:45:08,000 --> 00:45:15,000
any two X values,
and provided these two sum to

625
00:45:15,000 --> 00:45:20,000
one.
So, we know that this is true.

626
00:45:20,000 --> 00:45:28,000
Now is when we apply induction.
So, now we are going to

627
00:45:28,000 --> 00:45:35,000
manipulate this right term by
induction.

628
00:45:35,000 --> 00:45:40,000
See, before we didn't
necessarily know that n was

629
00:45:40,000 --> 00:45:44,000
bigger than two.
But, we know that n is bigger

630
00:45:44,000 --> 00:45:49,000
than n minus one.
That much, I can be sure of.

631
00:45:49,000 --> 00:45:53,000
So, this is one minus alpha n
times the sum,

632
00:45:53,000 --> 00:46:00,000
k equals one to n minus one of
alpha k over one minus alpha n

633
00:46:00,000 --> 00:46:05,000
times f of x_k,
if I got that right.

634
00:46:05,000 --> 00:46:09,000
This is by induction,
the induction hypothesis,

635
00:46:09,000 --> 00:46:16,000
because these alpha k's over
one minus alpha n sum to one.

636
00:46:16,000 --> 00:46:22,000
Now, these one minus alpha n's
cancel, and we just get what we

637
00:46:22,000 --> 00:46:26,000
want.
This is sum k equals one to n

638
00:46:26,000 --> 00:46:31,000
of alpha k, f of x_k.
So, we get f of the sum is,

639
00:46:31,000 --> 00:46:37,000
at most, sum of the f's.
That proves the lemma.
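
Collected in one place, the induction step reads roughly as follows. S is a shorthand introduced only for this sketch, and alpha_n < 1 is assumed:

```latex
\begin{align*}
\text{Let } S &= \sum_{k=1}^{n-1} \frac{\alpha_k}{1 - \alpha_n}\, x_k . \\
f\bigl(\alpha_n x_n + (1 - \alpha_n) S\bigr)
  &\le \alpha_n f(x_n) + (1 - \alpha_n) f(S)
  && \text{(convexity, two points)} \\
  &\le \alpha_n f(x_n)
     + (1 - \alpha_n) \sum_{k=1}^{n-1} \frac{\alpha_k}{1 - \alpha_n} f(x_k)
  && \text{(induction hypothesis, } n - 1 \text{ points)} \\
  &= \sum_{k=1}^{n} \alpha_k f(x_k) .
\end{align*}
```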

640
00:46:37,000 --> 00:46:43,000
OK, a bit tedious,
but each step is pretty

641
00:46:43,000 --> 00:46:46,000
straightforward.
Do you agree?

642
00:46:46,000 --> 00:46:53,000
Now, it turns out to be
relatively straightforward to

643
00:46:53,000 --> 00:47:00,000
prove Jensen's inequality.
That's the magic.

644
00:47:00,000 --> 00:47:04,000
And then, we get to do the
expectation analysis.

645
00:47:04,000 --> 00:47:09,000
So, we use our good friends,
indicator random variables.

646
00:47:09,000 --> 00:47:13,000
OK, but for now,
we just want to prove this

647
00:47:13,000 --> 00:47:16,000
statement.
If we have a convex function,

648
00:47:16,000 --> 00:47:21,000
f of the expectation is,
at most, expectation of f of

649
00:47:21,000 --> 00:47:26,000
that random variable.
OK, this is a random variable,

650
00:47:26,000 --> 00:47:29,000
right?
If you want to sample from this

651
00:47:29,000 --> 00:47:33,000
random variable,
you sample from X,

652
00:47:33,000 --> 00:47:39,000
and then you apply f to it.
That's the meaning of this

653
00:47:39,000 --> 00:47:45,000
notation, f of X because X is a
random variable.
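
To make the notation concrete, here is a small Python sketch that computes both sides of Jensen's inequality for one made-up integer distribution. The distribution `pmf` and the helper name `expectation` are assumptions chosen for illustration. The function f is 2^x, the one the lecture uses later.

```python
from fractions import Fraction

def expectation(pmf, g=lambda x: x):
    """E[g(X)] for a finite distribution given as {value: probability}."""
    return sum(p * g(x) for x, p in pmf.items())

# Hypothetical integer random variable X: probabilities sum to one.
pmf = {0: Fraction(1, 2), 1: Fraction(1, 4), 4: Fraction(1, 4)}

def f(x):
    """The convex function 2^x, evaluated in floating point."""
    return 2.0 ** float(x)

mean = expectation(pmf)          # E[X]
lhs = f(mean)                    # f(E[X])
rhs = expectation(pmf, f)        # E[f(X)]: apply f to each value, then average
print(f"E[X] = {mean}, f(E[X]) = {lhs:.4f}, E[f(X)] = {rhs:.4f}")
print("Jensen holds:", lhs <= rhs)
```

Note that E[f(X)] is computed by applying f to each sampled value first and averaging afterwards, which is exactly the reading of f of X described here.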

654
00:47:45,000 --> 00:47:51,000
We get to use that f is convex.
OK, it turns out this is not

655
00:47:51,000 --> 00:47:57,000
hard, if you remember the
definition of expectation,

656
00:47:57,000 --> 00:48:01,000
oh, I want to make one more
assumption here,

657
00:48:01,000 --> 00:48:08,000
which is that X is integral.
So, it's an integer random

658
00:48:08,000 --> 00:48:11,000
variable, meaning it takes
integer values.

659
00:48:11,000 --> 00:48:16,000
OK, that's all we care about
because we're looking at running

660
00:48:16,000 --> 00:48:19,000
times.
This statement is true for

661
00:48:19,000 --> 00:48:24,000
continuous random variables,
too, but I would like to do the

662
00:48:24,000 --> 00:48:29,000
discrete case because then I get
to write down what E of X is.

663
00:48:29,000 --> 00:48:34,000
So, what is the definition of E
of X?

664
00:48:34,000 --> 00:48:40,000
X only takes on integer values.
This is easy,

665
00:48:40,000 --> 00:48:47,000
but you have to remember it.
It's a good drill.

666
00:48:47,000 --> 00:48:55,000
I don't really know much about
X except that it takes on

667
00:48:55,000 --> 00:49:02,000
integer values.
Any suggestions on how I should

668
00:49:02,000 --> 00:49:10,000
expand the expectation of X?
How many people know this by

669
00:49:10,000 --> 00:49:14,000
heart?
OK, it's not too easy then.

670
00:49:14,000 --> 00:49:20,000
Well, expectation has something
to do with probability,

671
00:49:20,000 --> 00:49:23,000
right?
So, I should be looking at

672
00:49:23,000 --> 00:49:29,000
something like the probability
that X equals some value,

673
00:49:29,000 --> 00:49:32,000
x.
That would seem like a good

674
00:49:32,000 --> 00:49:36,000
thing to do.
What else goes here?

675
00:49:36,000 --> 00:49:39,000
A sum, yeah.
The sum, well,

676
00:49:39,000 --> 00:49:44,000
X could be somewhere between
minus infinity and infinity.

677
00:49:44,000 --> 00:49:49,000
That's certainly true.
And, we have some more.

678
00:49:49,000 --> 00:49:54,000
There's something missing here.
What is this sum?

679
00:49:54,000 --> 00:49:58,000
What does it come out to for
any random variable,

680
00:49:58,000 --> 00:50:03,000
X, that takes on integer
values?

681
00:50:03,000 --> 00:50:06,000
One, good.
So, I need to add in something

682
00:50:06,000 --> 00:50:10,000
here, namely X.
OK, that's the definition of

683
00:50:10,000 --> 00:50:13,000
the expectation.
Now, f of a sum of things,

684
00:50:13,000 --> 00:50:18,000
where these coefficients sum to
one looks an awful lot like the

685
00:50:18,000 --> 00:50:23,000
lemma that we just proved.
OK, we proved it in the finite

686
00:50:23,000 --> 00:50:25,000
case.
It turns out,

687
00:50:25,000 --> 00:50:30,000
it holds just as well if you
take all integers.

688
00:50:30,000 --> 00:50:33,000
So, I'm just going to assume
that.

689
00:50:33,000 --> 00:50:39,000
So, I have these probabilities,
these alpha values sum to one.

690
00:50:39,000 --> 00:50:44,000
Therefore, I can use this
inequality, that this is,

691
00:50:44,000 --> 00:50:49,000
at most, let me get this right,
I have the alphas,

692
00:50:49,000 --> 00:50:53,000
so I have a sum,
x equals minus infinity to

693
00:50:53,000 --> 00:50:58,000
infinity of the alphas,
which are a probability;

694
00:50:58,000 --> 00:51:03,000
capital X equals little x times
f of the value,

695
00:51:03,000 --> 00:51:09,000
f of little x.
OK, so there it is.

696
00:51:09,000 --> 00:51:16,000
I've used the lemma.
So, maybe now I'll erase the

697
00:51:16,000 --> 00:51:21,000
lemma.
OK, I cheated by using the

698
00:51:21,000 --> 00:51:31,000
countable version of the lemma
while only proving the finite

699
00:51:31,000 --> 00:51:36,000
case.
It's all I can do in lecture.

700
00:51:36,000 --> 00:51:42,000
So, this is by a lemma.
Now, what I'd like to prove and

701
00:51:42,000 --> 00:51:47,000
leave some blank space here is
this is, at most,

702
00:51:47,000 --> 00:51:51,000
E of f of X,
so that this summation is,

703
00:51:51,000 --> 00:51:56,000
at most, E of f of X.
Actually, it's equal to E of f

704
00:51:56,000 --> 00:52:00,000
of X.
And, it really looks kind of

705
00:52:00,000 --> 00:52:05,000
equal, right?
You've got sum of some

706
00:52:05,000 --> 00:52:09,000
probabilities times f of X.
It almost looks like the

707
00:52:09,000 --> 00:52:13,000
definition of E of f of X,
but it isn't.

708
00:52:13,000 --> 00:52:18,000
You've got to be a little bit
careful because E of f of X

709
00:52:18,000 --> 00:52:23,000
should talk about the
probability that f of X equals a

710
00:52:23,000 --> 00:52:28,000
particular value.
We can relate these as follows.

711
00:52:28,000 --> 00:52:32,000
It's not too hard.
You can look at each value that

712
00:52:32,000 --> 00:52:37,000
f takes on, and then look at all
the values, k,

713
00:52:37,000 --> 00:52:41,000
that map to that value,
x.

714
00:52:41,000 --> 00:52:48,000
So all the k's where f of k
equals x, the probability that X

715
00:52:48,000 --> 00:52:54,000
equals k, OK,
this is another way of writing

716
00:52:54,000 --> 00:53:00,000
the probability that f of X
equals x.

717
00:53:00,000 --> 00:53:04,000
OK, so, in other words,
I'm grouping the terms in a

718
00:53:04,000 --> 00:53:07,000
particular way.
I'm saying, well,

719
00:53:07,000 --> 00:53:12,000
f of X takes on various values.
Clever me to switch.

720
00:53:12,000 --> 00:53:18,000
I used to use k's unannounced,
so I better call this something

721
00:53:18,000 --> 00:53:20,000
else.
Let's call this Y,

722
00:53:20,000 --> 00:53:25,000
sorry, switch notation here.
It makes sense.

723
00:53:25,000 --> 00:53:31,000
I should look at the
probability that X equals x.

724
00:53:31,000 --> 00:53:35,000
So, what I really care about is
what this f of X value takes on.

725
00:53:35,000 --> 00:53:38,000
Let's just call it Y,
look at all the values,

726
00:53:38,000 --> 00:53:41,000
Y, that f could take on.
That's the range of f.

727
00:53:41,000 --> 00:53:46,000
And then, I'll look at all the
different values of X where f of

728
00:53:46,000 --> 00:53:47,000
X equals Y.
If I add up those

729
00:53:47,000 --> 00:53:50,000
probabilities,
because these are different

730
00:53:50,000 --> 00:53:53,000
values of X.
Those are mutually exclusive

731
00:53:53,000 --> 00:53:56,000
events.
So, this summation will be the

732
00:53:56,000 --> 00:53:58,000
probability that f of X equals
Y.

733
00:53:58,000 --> 00:54:02,000
This is capital X.
This is little y.

734
00:54:02,000 --> 00:54:09,000
And then, if I multiply that by
y, I'm getting the expectation

735
00:54:09,000 --> 00:54:12,000
of f of X.
So, think about this,

736
00:54:12,000 --> 00:54:18,000
these two equalities hold.
This may be a bit bizarre here

737
00:54:18,000 --> 00:54:22,000
because these sums are
potentially infinite.

738
00:54:22,000 --> 00:54:26,000
But, it's true.
OK, this proves Jensen's

739
00:54:26,000 --> 00:54:30,000
inequality.
So, it wasn't very hard,

740
00:54:30,000 --> 00:54:35,000
just a couple of boards,
once we had this powerful

741
00:54:35,000 --> 00:54:41,000
convexity lemma.
So, we just used convexity.

742
00:54:41,000 --> 00:54:43,000
We used the definition of E of
X.

743
00:54:43,000 --> 00:54:47,000
We used convexity.
That lets us put the f's

744
00:54:47,000 --> 00:54:50,000
inside.
Then we do this regrouping of

745
00:54:50,000 --> 00:54:54,000
terms, and we figure out,
oh, that's just E of f of X.

746
00:54:54,000 --> 00:54:58,000
So, the only inequality here is
coming from convexity.
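
The whole argument for an integer-valued X fits in one chain. This is a sketch of what was on the board, with the countable version of the lemma assumed rather than proved, as noted in the lecture:

```latex
\begin{align*}
f\bigl(E[X]\bigr)
  &= f\!\left(\sum_{x=-\infty}^{\infty} x \,\Pr\{X = x\}\right)
  && \text{(definition of } E[X] \text{)} \\
  &\le \sum_{x=-\infty}^{\infty} \Pr\{X = x\}\, f(x)
  && \text{(convexity lemma, countable version)} \\
  &= \sum_{y} \, y \sum_{x \,:\, f(x) = y} \Pr\{X = x\}
  && \text{(regroup terms by the value } y = f(x) \text{)} \\
  &= \sum_{y} \, y \,\Pr\{f(X) = y\}
   \;=\; E\bigl[f(X)\bigr] .
\end{align*}
```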

747
00:54:58,000 --> 00:55:01,000
All right, now comes the
algorithms.

748
00:55:01,000 --> 00:55:05,000
So, this was just some basic
probability stuff,

749
00:55:05,000 --> 00:55:10,000
which is good to practice.
OK, we could see in the quiz,

750
00:55:10,000 --> 00:55:13,000
which is not surprising.
This is the case for me,

751
00:55:13,000 --> 00:55:15,000
too.
You have a lot of intuition

752
00:55:15,000 --> 00:55:17,000
with algorithms.
Whenever it's algorithmic,

753
00:55:17,000 --> 00:55:21,000
it makes a lot of sense because
you're sort of grounded in some

754
00:55:21,000 --> 00:55:24,000
things that you know because you
are computer scientists,

755
00:55:24,000 --> 00:55:27,000
or something of that ilk.
For the purposes of this class,

756
00:55:27,000 --> 00:55:32,000
you are computer scientists.
But, with sort of the basic

757
00:55:32,000 --> 00:55:36,000
probability, unless you happen
to be a mathematician,

758
00:55:36,000 --> 00:55:40,000
it's less intuitive,
and therefore harder to get

759
00:55:40,000 --> 00:55:42,000
fast.
And, in quiz one,

760
00:55:42,000 --> 00:55:45,000
speed is pretty important.
On the final,

761
00:55:45,000 --> 00:55:50,000
speed will also be important.
The take home certainly doesn't

762
00:55:50,000 --> 00:55:53,000
hurt.
So, the take home is more

763
00:55:53,000 --> 00:55:56,000
interesting because it requires
being clever.

764
00:55:56,000 --> 00:56:01,000
You have to actually be
creative.

765
00:56:01,000 --> 00:56:03,000
And, that really tests
algorithmic design.

766
00:56:03,000 --> 00:56:06,000
So far, we've mainly tested
analysis, and just,

767
00:56:06,000 --> 00:56:09,000
can you work through
probability?

768
00:56:09,000 --> 00:56:12,000
Can you figure out what the,
can you remember what your

769
00:56:12,000 --> 00:56:15,000
running time of randomized
quicksort is,

770
00:56:15,000 --> 00:56:17,000
and so on?
Quiz two will actually test

771
00:56:17,000 --> 00:56:20,000
creativity because you have more
time.

772
00:56:20,000 --> 00:56:22,000
It's hard to be creative in two
hours.

773
00:56:22,000 --> 00:56:26,000
OK, so we want to analyze the
expected height of a randomly

774
00:56:26,000 --> 00:56:32,000
constructed binary search tree.
So, I've defined this before,

775
00:56:32,000 --> 00:56:38,000
but let me repeat it because it
was a while ago almost at the

776
00:56:38,000 --> 00:56:42,000
beginning of lecture.
I'm going to take the random

777
00:56:42,000 --> 00:56:48,000
variable of the height of a
randomly built binary search

778
00:56:48,000 --> 00:56:51,000
tree on n nodes.
So, that was: randomize

779
00:56:51,000 --> 00:56:55,000
the n values.
Take a random permutation,

780
00:56:55,000 --> 00:57:02,000
insert them one by one from
left to right with tree insert.

781
00:57:02,000 --> 00:57:05,000
What is the height of the tree
that you get?

782
00:57:05,000 --> 00:57:08,000
What is the maximum depth of
any node?
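
For a feel of what X_n looks like in practice, here is a rough Python simulation of that process: shuffle n distinct keys, insert them one by one into an unbalanced BST, and record the maximum depth. The names `bst_height` and `average_random_height`, the dictionary-based tree, and the choice of n and trial count are all assumptions of this sketch, not anything from the lecture.

```python
import math
import random

def bst_height(keys):
    """Insert distinct keys in the given order with plain tree insert.

    Returns the height: the maximum depth of any node, root at depth 0.
    The tree is stored as two dicts mapping a key to its left/right child.
    """
    it = iter(keys)
    root = next(it)               # the first key becomes the root and stays there
    left, right = {}, {}
    height = 0
    for key in it:
        node, depth = root, 0
        while True:
            child = left if key < node else right
            depth += 1
            if node in child:     # keep walking down
                node = child[node]
            else:                 # empty slot found: attach the new key here
                child[node] = key
                break
        height = max(height, depth)
    return height

def average_random_height(n, trials, seed=0):
    """Average height over `trials` uniformly random insertion orders."""
    rng = random.Random(seed)
    total = 0
    for _ in range(trials):
        keys = list(range(n))
        rng.shuffle(keys)         # uniformly random permutation
        total += bst_height(keys)
    return total / trials

n = 1024
avg = average_random_height(n, trials=50)
print(f"n = {n}: average height {avg:.1f}, 3 lg n = {3 * math.log2(n):.1f}")
print("sorted input height:", bst_height(range(n)))   # the bad case, n - 1
```

The sorted-order call at the end reproduces the bad tree from the start of the lecture, for comparison with the random average.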

783
00:57:08,000 --> 00:57:11,000
I'm not going to look so much
at X_n.

784
00:57:11,000 --> 00:57:14,000
I'm going to look at the
exponentiation of X_n.

785
00:57:14,000 --> 00:57:17,000
And, still we have no intuition
why.

786
00:57:17,000 --> 00:57:20,000
But, two to the X is a convex
function.

787
00:57:20,000 --> 00:57:23,000
OK, it looks like that.
It's very sharp.

788
00:57:23,000 --> 00:57:27,000
That's the best I can do for
drawing, two to the X.

789
00:57:27,000 --> 00:57:31,000
You saw how I drew my
histogram.

790
00:57:31,000 --> 00:57:34,000
So, we want to somehow write
this random variable as

791
00:57:34,000 --> 00:57:36,000
something, OK,
in some algebra.

792
00:57:36,000 --> 00:57:39,000
The main thing here is to split
into cases.

793
00:57:39,000 --> 00:57:42,000
That's how we usually go
because there's lots of

794
00:57:42,000 --> 00:57:45,000
different scenarios on what
happens.

795
00:57:45,000 --> 00:57:48,000
So, I mean, how do we construct
a tree from the beginning?

796
00:57:48,000 --> 00:57:51,000
First thing we do is we take
the first node.

797
00:57:51,000 --> 00:57:54,000
We throw it in,
make it the root.

798
00:57:54,000 --> 00:57:58,000
OK, so whatever the first value
happens to be in the array,

799
00:57:58,000 --> 00:58:02,000
which we don't really know how
that falls into sorted order,

800
00:58:02,000 --> 00:58:06,000
we put it at the root.
And, it stays the root.

801
00:58:06,000 --> 00:58:08,000
We never change the root from
then on.

802
00:58:08,000 --> 00:58:12,000
Now, of all the remaining
elements, some of them are less

803
00:58:12,000 --> 00:58:14,000
than this value,
and they go over here.

804
00:58:14,000 --> 00:58:17,000
So, let's call this r at the
root.

805
00:58:17,000 --> 00:58:19,000
And, some of them are greater
than r.

806
00:58:19,000 --> 00:58:22,000
So, they go over here.
Maybe there's more over here.

807
00:58:22,000 --> 00:58:25,000
Maybe there's more over here.
Who knows?

808
00:58:25,000 --> 00:58:28,000
Arbitrary partition,
in fact, uniformly random

809
00:58:28,000 --> 00:58:31,000
partition, which should sound
familiar, whether there are k

810
00:58:31,000 --> 00:58:34,000
elements over here,
and n minus k minus one

811
00:58:34,000 --> 00:58:36,000
elements over here,
for any value of k,

812
00:58:36,000 --> 00:58:42,000
that's equally likely because
this is chosen uniformly.

813
00:58:42,000 --> 00:58:44,000
The root is chosen uniformly.
It's the first element in a

814
00:58:44,000 --> 00:58:47,000
random permutation.
So, what I'm going to do is

815
00:58:47,000 --> 00:58:49,000
parameterize by that.
How many elements are over

816
00:58:49,000 --> 00:58:51,000
here, and how many elements are
over here?

817
00:58:51,000 --> 00:58:54,000
Because this thing is,
again, a randomly built binary

818
00:58:54,000 --> 00:58:57,000
search tree on however many
nodes are in there because after

819
00:58:57,000 --> 00:59:00,000
I pick r, it's determined who is
to the left and who is to the

820
00:59:00,000 --> 00:59:03,000
right.
And so, I can just partition.

821
00:59:03,000 --> 00:59:07,000
It's like running quicksort.
I partition the elements left

822
00:59:07,000 --> 00:59:11,000
of r, the elements right of r,
and I'm sort of recursively

823
00:59:11,000 --> 00:59:15,000
constructing a randomly built
binary search tree on those two

824
00:59:15,000 --> 00:59:18,000
sub-permutations because
sub-permutations of uniform

825
00:59:18,000 --> 00:59:22,000
permutations are uniform.
OK, so these are essentially

826
00:59:22,000 --> 00:59:25,000
recursive problems.
And, we know how to analyze

827
00:59:25,000 --> 00:59:28,000
recursive problems.
All we need to know is that

828
00:59:28,000 --> 00:59:31,000
there are k minus one elements
over here, and n minus k

829
00:59:31,000 --> 00:59:38,000
elements over here.
And, that would mean that r has

830
00:59:38,000 --> 00:59:45,000
rank k, remember,
rank in the sense of the index

831
00:59:45,000 --> 00:59:52,000
in sorted order.
So, where should I go?

832
01:00:08,000 --> 01:00:11,034
So, if the root,
r, has rank,

833
01:00:11,034 --> 01:00:17,318
k, so this is a statement
conditioned on this event,

834
01:00:17,318 --> 01:00:23,278
which is a random event,
then what we have is X_n equals

835
01:00:23,278 --> 01:00:29,888
one plus the max of X_(k minus
one), X_(n minus k) because the

836
01:00:29,888 --> 01:00:35,848
height of this tree is the max
of the heights of the two

837
01:00:35,848 --> 01:00:43,000
subtrees plus one because we
have one more level up top.

838
01:00:43,000 --> 01:00:46,728
OK, so that's the natural thing
to do.

839
01:00:46,728 --> 01:00:51,263
What we are trying to analyze,
though, is Y_n.

840
01:00:51,263 --> 01:00:55,193
So, for Y_n,
we have to take two to this

841
01:00:55,193 --> 01:00:58,720
power.
So, it's two times the max of

842
01:00:58,720 --> 01:01:03,961
two to the X_(k minus one),
which is Y_(k minus one),

843
01:01:03,961 --> 01:01:09,000
and two to this,
which is Y_(n minus k).
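
[Editor's note] Written out, the two conditional statements read as follows. This is a transcription of the spoken formulas, with Y_n = 2^{X_n} as defined earlier in the lecture.

```latex
% Conditioned on the event that the root r has rank k:
\[
  X_n = 1 + \max\bigl(X_{k-1},\, X_{n-k}\bigr)
\]
\[
  Y_n = 2^{X_n}
      = 2 \cdot \max\bigl(2^{X_{k-1}},\, 2^{X_{n-k}}\bigr)
      = 2 \cdot \max\bigl(Y_{k-1},\, Y_{n-k}\bigr)
\]
```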

844
01:01:09,000 --> 01:01:12,536
And, now you start to see,
maybe, why we are interested in

845
01:01:12,536 --> 01:01:16,260
Y's instead of X's in the sense
that it's what we know how to

846
01:01:16,260 --> 01:01:18,059
do.
When we solve a recursion,

847
01:01:18,059 --> 01:01:20,541
when we solve,
like, the expected running

848
01:01:20,541 --> 01:01:22,713
time, we haven't taken
expectations,

849
01:01:22,713 --> 01:01:24,823
yet, here.
But, when we compute the

850
01:01:24,823 --> 01:01:28,050
expected running time of
quicksort, we have something

851
01:01:28,050 --> 01:01:30,656
like two times,
I mean, we have a couple of

852
01:01:30,656 --> 01:01:35,000
recursive subproblems,
which are being added together.

853
01:01:35,000 --> 01:01:37,015
OK, here, we have a factor of
two.

854
01:01:37,015 --> 01:01:39,276
Here, we have a max.
But, intuitively,

855
01:01:39,276 --> 01:01:43,002
we know how to multiply random
variables by a constant because

856
01:01:43,002 --> 01:01:45,079
that's, like,
there's two recursive

857
01:01:45,079 --> 01:01:48,500
subproblems whose size is equal
to the max of these two,

858
01:01:48,500 --> 01:01:50,576
which we don't happen to know
here.

859
01:01:50,576 --> 01:01:52,653
But, there it is,
whereas one plus,

860
01:01:52,653 --> 01:01:54,791
we don't know how to handle so
well.

861
01:01:54,791 --> 01:01:57,357
And, indeed,
our techniques are really good

862
01:01:57,357 --> 01:02:00,289
at solving recurrences,
except up to the constant

863
01:02:00,289 --> 01:02:03,355
factors.
And, this one plus really

864
01:02:03,355 --> 01:02:05,685
doesn't affect the constant
factor too much,

865
01:02:05,685 --> 01:02:07,745
it would seem.
OK, but it's a big deal.

866
01:02:07,745 --> 01:02:09,859
In exponentiation,
it's a factor of two.

867
01:02:09,859 --> 01:02:13,112
So here, it's really hard to
see what this one plus is doing.

868
01:02:13,112 --> 01:02:14,900
And, our analysis,
if we tried it,

869
01:02:14,900 --> 01:02:18,099
and it's a good idea to try it
at home and see what happens,

870
01:02:18,099 --> 01:02:20,700
if you tried to do what I'm
about to do with X_n,

871
01:02:20,700 --> 01:02:24,007
the one plus will sort of get
lost, and you won't get a bound.

872
01:02:24,007 --> 01:02:26,771
You just can't prove anything.
With a factor of two,

873
01:02:26,771 --> 01:02:29,319
we're in good shape.
We sort of know how to deal

874
01:02:29,319 --> 01:02:33,980
with that.
We'll say more when we've

875
01:02:33,980 --> 01:02:41,015
actually done the proof about
why we use Y_n instead of X_n.

876
01:02:41,015 --> 01:02:44,353
But for now,
we're using Y_n.

877
01:02:44,353 --> 01:02:49,480
So, this is sort of a
recursion, except it's

878
01:02:49,480 --> 01:02:56,038
conditioned on this event.
So, how do I turn this into a

879
01:02:56,038 --> 01:02:59,973
statement that holds all the
time?

880
01:02:59,973 --> 01:03:04,896
Sorry?
Divide by the probability of

881
01:03:04,896 --> 01:03:07,275
the event?
More or less.

882
01:03:07,275 --> 01:03:11,000
Indeed, these events are
independent.

883
01:03:11,000 --> 01:03:15,551
Or, they're all equally likely,
I should say.

884
01:03:15,551 --> 01:03:21,241
They're not independent.
In fact, one determines all the

885
01:03:21,241 --> 01:03:24,241
others.
So, how do I generally

886
01:03:24,241 --> 01:03:30,137
represent an event in algebra?
Indicator random variables:

887
01:03:30,137 --> 01:03:34,995
good.
Remember your friends,

888
01:03:34,995 --> 01:03:42,076
indicator random variables.
All of these analyses use

889
01:03:42,076 --> 01:03:49,565
indicator random variables.
So, they will just represent

890
01:03:49,565 --> 01:03:54,195
this event, and we'll call it
Z_nk.

891
01:03:54,195 --> 01:03:59,778
It's going to be one if the
root has rank,

892
01:03:59,778 --> 01:04:05,415
k, and zero otherwise.
So, in particular,

893
01:04:05,415 --> 01:04:09,110
the probability of,
these things are all equally

894
01:04:09,110 --> 01:04:13,828
likely for, a particular value
of n if you try all the values

895
01:04:13,828 --> 01:04:16,186
of k.
The probability that this

896
01:04:16,186 --> 01:04:20,746
equals one, which is also the
expectation of that indicator

897
01:04:20,746 --> 01:04:23,734
random variable,
which you should know,

898
01:04:23,734 --> 01:04:26,486
is it only takes values one or
zero.

899
01:04:26,486 --> 01:04:29,788
The zero doesn't matter in the
expectation.

900
01:04:29,788 --> 01:04:34,034
So, this is going to be,
hopefully, one over n if I got it

901
01:04:34,034 --> 01:04:36,000
right.

902
01:04:36,000 --> 01:04:43,013
So, there are n possibilities for
what the rank of the root could

903
01:04:43,013 --> 01:04:46,922
be.
Each of them is equally likely

904
01:04:46,922 --> 01:04:51,176
because we have a uniform
permutation.

905
01:04:51,176 --> 01:04:57,040
So, now, I can rewrite this
conditional statement as a

906
01:04:57,040 --> 01:05:04,168
summation where the Z_nk's will
let me choose what case I'm in.

907
01:05:04,168 --> 01:05:10,836
So, we have Y_n is the sum,
k equals one to n of Z_nk times

908
01:05:10,836 --> 01:05:16,010
two times the max of X,
sorry, Y, k minus one,

909
01:05:16,010 --> 01:05:20,478
Y_n minus k.
So, now we have our good

910
01:05:20,478 --> 01:05:23,126
friend, the recurrence.
We need to solve it.

911
01:05:23,126 --> 01:05:26,329
OK, we can't really solve it
because this is a random

912
01:05:26,329 --> 01:05:29,963
variable, and it's talking about
recursive random variables.

913
01:05:29,963 --> 01:05:32,858
So, we first take the
expectation of both sides.

914
01:05:32,858 --> 01:05:36,000
That's the only thing we can
really bound.

915
01:05:36,000 --> 01:05:40,074
Y_n could be n^2 in an unlucky
case, sorry, not n^2.

916
01:05:40,074 --> 01:05:43,190
It could be n^2.
It could be two to the,

917
01:05:43,190 --> 01:05:47,903
boy, two to the n if you are
unlucky because X_n could be as

918
01:05:47,903 --> 01:05:50,460
big as n, the height of the
tree.

919
01:05:50,460 --> 01:05:54,694
And, Y_n is two to that.
So, it could be two to the n.

920
01:05:54,694 --> 01:05:58,688
What we want to prove is that
it's polynomial in n.

921
01:05:58,688 --> 01:06:02,203
If it's n to some constant,
and we take logs,

922
01:06:02,203 --> 01:06:07,341
it'll be order log n.
OK, so we'll take the

923
01:06:07,341 --> 01:06:14,254
expectation, and hopefully that
will guarantee that this holds.

924
01:06:14,254 --> 01:06:20,163
OK, so we have expectation of
this summation of random

925
01:06:20,163 --> 01:06:24,846
variables times recursive random
variables.

926
01:06:24,846 --> 01:06:30,198
So, what is the first,
woops, I forgot a bracket.

927
01:06:30,198 --> 01:06:37,000
What is the first thing that we
do in this analysis?

928
01:06:37,000 --> 01:06:41,300
This should,
yeah, linearity of expectation.

929
01:06:41,300 --> 01:06:45,900
That one's easy to remember.
OK, we have a sum.

930
01:06:45,900 --> 01:06:49,000
So, let's put the E inside.

931
01:07:04,000 --> 01:07:08,842
OK, now we have the expectation
of our product.

932
01:07:08,842 --> 01:07:12,210
What should we use?
Independence.

933
01:07:12,210 --> 01:07:15,684
Hopefully, things are
independent.

934
01:07:15,684 --> 01:07:21,052
And then, we could write this.
Then, it would be the

935
01:07:21,052 --> 01:07:26,842
product of the expectations.
And, heck, let's put the two

936
01:07:26,842 --> 01:07:34,000
outside, because it's not,
no sense in keeping it in here.

937
01:07:34,000 --> 01:07:37,956
The Y's there are starting to look
like X's?

938
01:07:37,956 --> 01:07:42,351
I can't even read them.
Sorry about that.

939
01:07:42,351 --> 01:07:46,417
This should all be Y's.
OK, very wise,

940
01:07:46,417 --> 01:07:48,615
random variables.
So.

941
01:07:48,615 --> 01:07:54,769
Why are these independent?
So, here we are looking at the

942
01:07:54,769 --> 01:08:00,703
choice of what the root is,
what rank the root has in a

943
01:08:00,703 --> 01:08:05,608
problem of size n.
In here, we're looking at what

944
01:08:05,608 --> 01:08:08,020
the root, I mean,
there are various choices of

945
01:08:08,020 --> 01:08:11,290
what the search tree looks like
in the stuff left of the root,

946
01:08:11,290 --> 01:08:13,112
and in the stuff right of the
root.

947
01:08:13,112 --> 01:08:16,220
Those are independent choices
because everything is uniform

948
01:08:16,220 --> 01:08:18,096
here.
So, the choice of this guy was

949
01:08:18,096 --> 01:08:20,081
uniform.
And then, that determines who

950
01:08:20,081 --> 01:08:22,011
partitions in the left and the
right.

951
01:08:22,011 --> 01:08:24,798
Those are completely
independent recursive choices of

952
01:08:24,798 --> 01:08:26,621
who's the root in the left
subtree?

953
01:08:26,621 --> 01:08:29,086
Who's the root in the left of
the left subtree,

954
01:08:29,086 --> 01:08:31,176
and so on?
So, this is a little trickier

955
01:08:31,176 --> 01:08:36,385
than usual.
Before, it was random choices

956
01:08:36,385 --> 01:08:41,871
in the algorithm.
Now, it's in some construction

957
01:08:41,871 --> 01:08:47,474
where we choose the random
numbers ahead of time.

958
01:08:47,474 --> 01:08:52,961
It's a bit funny,
but this is still independent.

959
01:08:52,961 --> 01:08:58,214
So, we get this just like we
did in quicksort,

960
01:08:58,214 --> 01:08:59,731
and so on.
OK.

961
01:08:59,731 --> 01:09:05,374
Now, we continue.
And, now it's time to be a bit

962
01:09:05,374 --> 01:09:08,143
sloppy.
Well, one of these things we

963
01:09:08,143 --> 01:09:09,568
know.
OK, E of Z_nk,

964
01:09:09,568 --> 01:09:12,812
that, we wrote over here.
It's one over n.

965
01:09:12,812 --> 01:09:15,899
So, that's cool.
So, we get a two over n

966
01:09:15,899 --> 01:09:20,488
outside, and we get this sum of
the expectation of a max of

967
01:09:20,488 --> 01:09:23,812
these two things.
Normally, we would write,

968
01:09:23,812 --> 01:09:27,136
well, I think sometimes you
write T of max,

969
01:09:27,136 --> 01:09:30,143
or Y of the max of the two
things here.

970
01:09:30,143 --> 01:09:36,000
You've got to write it as the
max of these two variables.

971
01:09:36,000 --> 01:09:41,547
And, the trick,
I mean, it's not too much of a

972
01:09:41,547 --> 01:09:46,849
trick, is that the max is,
at most, the sum.

973
01:09:46,849 --> 01:09:53,506
So, we have nonnegative things.
So, we have two over n,

974
01:09:53,506 --> 01:10:00,657
sum k equals one to n of the
expectation of the sum instead

975
01:10:00,657 --> 01:10:03,943
of the max.
OK, this is,

976
01:10:03,943 --> 01:10:07,014
in some sense,
the key step where we are

977
01:10:07,014 --> 01:10:11,344
losing something in our bound.
So far, we've been exact.

978
01:10:11,344 --> 01:10:15,437
Now, we're being pretty sloppy.
It's true the max is,

979
01:10:15,437 --> 01:10:19,137
at most, the sum.
But, it's a pretty loose upper

980
01:10:19,137 --> 01:10:22,758
bound as things go.
We'll keep that in mind for

981
01:10:22,758 --> 01:10:25,434
later.
What else can we do with the

982
01:10:25,434 --> 01:10:27,166
summation?
This should,

983
01:10:27,166 --> 01:10:33,470
again, look familiar.
Now that we have a sum of a sum

984
01:10:33,470 --> 01:10:38,283
of two things,
I'd like it to be a

985
01:10:38,283 --> 01:10:40,858
sum of one thing.
Sorry?

986
01:10:40,858 --> 01:10:45,559
You can use linearity of
expectation, good.

987
01:10:45,559 --> 01:10:49,813
So, that's the first thing I
should do.

988
01:10:49,813 --> 01:10:55,410
So, linearity of expectation
lets me separate that.

989
01:10:55,410 --> 01:11:02,079
Now I have a sum of 2n things.
Right, I could break that into

990
01:11:02,079 --> 01:11:05,405
the sum of these guys,
and the sum of these guys.

991
01:11:05,405 --> 01:11:08,247
Do you know anything about
those two sums?

992
01:11:08,247 --> 01:11:11,019
Do we know anything about those
two sums?

993
01:11:11,019 --> 01:11:14,068
They're the same.
In fact, every term here is

994
01:11:14,068 --> 01:11:17,326
appearing exactly twice.
Once as k minus one,

995
01:11:17,326 --> 01:11:20,722
once as n minus k,
and that even works if it's

996
01:11:20,722 --> 01:11:22,455
odd, I think.
So, in fact,

997
01:11:22,455 --> 01:11:26,267
we can just take one of the
sums and multiply it by two.

998
01:11:26,267 --> 01:11:30,356
So, this is four over n times
the sum, and I'll rewrite it a

999
01:11:30,356 --> 01:11:35,000
little bit from zero to n minus
one of E of Y_k.
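
[Editor's note] The chain of steps spoken so far can be collected in one place. This is a reconstruction from the lecture's own steps (linearity, independence, E[Z_nk] = 1/n, max at most sum, and the reindexing of the two sums), not an additional argument.

```latex
\[
\begin{aligned}
E[Y_n]
  &= E\Bigl[\sum_{k=1}^{n} Z_{nk}\cdot 2\max\bigl(Y_{k-1}, Y_{n-k}\bigr)\Bigr] \\
  &= \sum_{k=1}^{n} E\bigl[Z_{nk}\cdot 2\max\bigl(Y_{k-1}, Y_{n-k}\bigr)\bigr]
     && \text{(linearity of expectation)} \\
  &= 2\sum_{k=1}^{n} E[Z_{nk}]\; E\bigl[\max\bigl(Y_{k-1}, Y_{n-k}\bigr)\bigr]
     && \text{(independence)} \\
  &= \frac{2}{n}\sum_{k=1}^{n} E\bigl[\max\bigl(Y_{k-1}, Y_{n-k}\bigr)\bigr]
     && \bigl(E[Z_{nk}] = 1/n\bigr) \\
  &\le \frac{2}{n}\sum_{k=1}^{n} E\bigl[Y_{k-1} + Y_{n-k}\bigr]
     && \text{(max is at most the sum)} \\
  &= \frac{2}{n}\sum_{k=1}^{n} \bigl(E[Y_{k-1}] + E[Y_{n-k}]\bigr)
     && \text{(linearity of expectation)} \\
  &= \frac{4}{n}\sum_{k=0}^{n-1} E[Y_k]
     && \text{(each term appears twice)}
\end{aligned}
\]
```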

1000
01:11:35,000 --> 01:11:40,425
Just check the number of times
each Y_k appears from zero up to

1001
01:11:40,425 --> 01:11:45,237
n minus one is exactly two.
So, now I have a recurrence.

1002
01:11:45,237 --> 01:11:48,649
I have E of Y_n is,
at most, this thing.

1003
01:11:48,649 --> 01:11:51,800
Let's just write that for our
memory.

1004
01:11:51,800 --> 01:11:53,550
So, how's that?
Cool.
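
[Editor's sketch] For small n the recurrence can be checked against exact values. The sketch below is an illustration under stated assumptions, not part of the lecture: it takes the empty tree to have height -1 (so Y_0 = 1/2 and Y_1 = 1), uses the independence of the two subtrees to compute the exact distribution of X_n, and then tests E[Y_n] <= (4/n) * sum of E[Y_k] for k from 0 to n-1. The function names are hypothetical.

```python
from fractions import Fraction


def expected_exponential_heights(n_max):
    """Return [E[Y_0], ..., E[Y_n_max]] exactly, where Y_n = 2**X_n.

    Assumed convention: the empty tree has height -1, so Y_0 = 1/2.
    """
    # cdf[n][j] = P(X_n <= j - 1); index j = 0 corresponds to height -1.
    cdf = [[Fraction(1)] * (n_max + 1)]
    for n in range(1, n_max + 1):
        row = [Fraction(0)]  # a nonempty tree has height at least 0
        for j in range(1, n_max + 1):
            # Root has rank k with probability 1/n; both subtrees must then
            # have height at most (j - 1) - 1, and they are independent.
            total = sum(cdf[k - 1][j - 1] * cdf[n - k][j - 1]
                        for k in range(1, n + 1))
            row.append(total / n)
        cdf.append(row)

    expected = []
    for n in range(n_max + 1):
        value = Fraction(0)
        previous = Fraction(0)
        for j in range(n_max + 1):
            # P(X_n = j - 1) times 2**(j - 1)
            value += (cdf[n][j] - previous) * Fraction(2) ** (j - 1)
            previous = cdf[n][j]
        expected.append(value)
    return expected


def recurrence_holds(expected):
    """Check E[Y_n] <= (4/n) * sum_{k<n} E[Y_k] for every n >= 1."""
    return all(expected[n] <= Fraction(4, n) * sum(expected[:n])
               for n in range(1, len(expected)))


if __name__ == "__main__":
    values = expected_exponential_heights(12)
    for n, value in enumerate(values):
        print(n, float(value))
    print("recurrence holds:", recurrence_holds(values))
```

Exact rational arithmetic is used so that the inequality is tested without floating-point error.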

1005
01:11:53,550 --> 01:11:57,050
Now, I just have to solve the
recurrence.

1006
01:11:57,050 --> 01:12:03,000
How should I solve an ugly,
hairy, recurrence like this?

1007
01:12:03,000 --> 01:12:05,125
Substitution:
yea!

1008
01:12:05,125 --> 01:12:10,750
Not the master method.
OK, it's a pretty nasty

1009
01:12:10,750 --> 01:12:15,875
recurrence.
So, I'm going to make a guess,

1010
01:12:15,875 --> 01:12:22,125
and I've already told you the
guess, that it's n^3.

1011
01:12:22,125 --> 01:12:29,375
I think n^3 is pretty much
exactly where this proof will be

1012
01:12:29,375 --> 01:12:34,239
obtainable.
So, substitution method,

1013
01:12:34,239 --> 01:12:38,720
substitution method is just a
proof by induction.

1014
01:12:38,720 --> 01:12:44,506
And, there are two things every
proof by induction should have,

1015
01:12:44,506 --> 01:12:49,826
well, almost every proof by
induction, unless you're being

1016
01:12:49,826 --> 01:12:52,906
fancy.
It should have a base case,

1017
01:12:52,906 --> 01:12:57,013
and the base case here is n
equals order one.

1018
01:12:57,013 --> 01:13:00,093
I didn't write it,
but, of course,

1019
01:13:00,093 --> 01:13:05,318
if you have a constant size
tree, it has constant height.

1020
01:13:05,318 --> 01:13:10,640
So, this thing will be true as
long as we set c to be

1021
01:13:10,640 --> 01:13:15,684
sufficiently large.
OK, so, don't forget that.

1022
01:13:15,684 --> 01:13:18,080
A lot of people forgot it on
the quiz.

1023
01:13:18,080 --> 01:13:20,089
We even mentioned the base
case.

1024
01:13:20,089 --> 01:13:22,939
Usually, we don't even mention
the base case.

1025
01:13:22,939 --> 01:13:25,854
And, you should assume that
there's one there.

1026
01:13:25,854 --> 01:13:30,000
And, you have to say this in
any proof by substitution.

1027
01:13:30,000 --> 01:13:33,107
OK, now, we have the induction
step.

1028
01:13:33,107 --> 01:13:37,279
So, I claim that E of Y_n is,
at most, c times n^3,

1029
01:13:37,279 --> 01:13:40,563
assuming that it's true for
smaller n.

1030
01:13:40,563 --> 01:13:44,647
You should write the induction
hypothesis here,

1031
01:13:44,647 --> 01:13:49,618
but I'm going to skip it
because I'm running out of time.

1032
01:13:49,618 --> 01:13:53,613
Now, we have this recurrence
that E of Y_n is,

1033
01:13:53,613 --> 01:13:56,809
at most, this thing.
So, E of Y_n is,

1034
01:13:56,809 --> 01:14:01,159
at most, four over n,
sum k equals zero to n minus

1035
01:14:01,159 --> 01:14:07,223
one of E of Y_k.
Now, notice that k is always

1036
01:14:07,223 --> 01:14:12,059
smaller than n.
So, we can apply induction.

1037
01:14:12,059 --> 01:14:15,858
So, this is,
at most, four over n,

1038
01:14:15,858 --> 01:14:21,269
sum k equals zero to n minus
one of c times k^3.

1039
01:14:21,269 --> 01:14:24,838
That's the induction
hypothesis.

1040
01:14:24,838 --> 01:14:28,753
Cool.
Now, I need an upper bound on

1041
01:14:28,753 --> 01:14:35,430
this sum, if you have a good
memory, then you know a closed

1042
01:14:35,430 --> 01:14:40,801
form for this sum.
But, I don't have such a good

1043
01:14:40,801 --> 01:14:43,970
memory as I used to.
I never memorized this sum when

1044
01:14:43,970 --> 01:14:47,884
I was a kid, and I only remember
the things I memorized when

1045
01:14:47,884 --> 01:14:51,612
I was less than 12 years old.
I still remember all the digits

1046
01:14:51,612 --> 01:14:54,532
of pi, whatever.
But, anything I try to memorize

1047
01:14:54,532 --> 01:14:57,079
now just doesn't quite stick the
same way.

1048
01:14:57,079 --> 01:15:00,000
So, I don't happen to know this
sum.

1049
01:15:00,000 --> 01:15:03,169
What's a good way to
approximate this sum?

1050
01:15:03,169 --> 01:15:05,256
Integral: good.
So, in fact,

1051
01:15:05,256 --> 01:15:07,653
I'm going to take the c
outside.

1052
01:15:07,653 --> 01:15:10,900
So, this is 4c over n.
The sum is, at most,

1053
01:15:10,900 --> 01:15:13,992
the integral.
If you get the range right,

1054
01:15:13,992 --> 01:15:18,089
so, you have to go one larger.
Instead of n minus one,

1055
01:15:18,089 --> 01:15:21,104
you go up to n.
This is in the textbook.

1056
01:15:21,104 --> 01:15:24,274
It's intuitive,
too, as long as you have a

1057
01:15:24,274 --> 01:15:26,516
monotone function.
That's key.

1058
01:15:26,516 --> 01:15:31,000
So, you have something that's
like this.

1059
01:15:31,000 --> 01:15:34,075
And, you know,
the sum is taking each of these

1060
01:15:34,075 --> 01:15:36,671
and weighting them with a value
of one.

1061
01:15:36,671 --> 01:15:40,157
The integral is computing the
area under this curve.

1062
01:15:40,157 --> 01:15:42,684
So, in particular,
if you look at this

1063
01:15:42,684 --> 01:15:45,624
approximation of the integral,
then, I mean,

1064
01:15:45,624 --> 01:15:49,382
this thing is certainly,
this would be the sum if you go

1065
01:15:49,382 --> 01:15:52,252
one larger at the end,
and that's, at most,

1066
01:15:52,252 --> 01:15:55,054
the integral.
So, that's proof by picture.

1067
01:15:55,054 --> 01:15:57,309
But, you can see this in the
book.

1068
01:15:57,309 --> 01:16:01,000
You should know it from 042 I
guess.
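
[Editor's sketch] The sum-versus-integral bound can be checked numerically. The following small Python sketch is an illustration, not lecture material; the function names are made up. It compares the sum of k^3 for k from 0 to n-1 with the integral of x^3 from 0 to n, which is n^4/4, using integers to avoid rounding.

```python
def sum_of_cubes(n):
    """Sum of k**3 for k = 0, 1, ..., n - 1."""
    return sum(k ** 3 for k in range(n))


def sum_is_below_integral(n):
    """True if sum_{k=0}^{n-1} k^3 <= n^4 / 4 (compared in integers)."""
    return 4 * sum_of_cubes(n) <= n ** 4


if __name__ == "__main__":
    for n in (1, 2, 10, 100):
        print(n, sum_of_cubes(n), n ** 4 / 4, sum_is_below_integral(n))
```

The bound relies on x^3 being monotonically increasing: each term k^3 is at most the area under the curve between k and k + 1.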

1069
01:16:01,000 --> 01:16:04,448
Now, integrals,
hopefully, you can solve.

1070
01:16:04,448 --> 01:16:07,206
Integral of x^3 is x^4 over
four.

1071
01:16:07,206 --> 01:16:11,172
I got it right.
And then, we're evaluating that at

1072
01:16:11,172 --> 01:16:12,637
n
and at zero.

1073
01:16:12,637 --> 01:16:17,293
Subtracting the zero doesn't
matter because zero to the

1074
01:16:17,293 --> 01:16:21,517
fourth power is zero.
So, it's just n^4 over four.

1075
01:16:21,517 --> 01:16:25,051
So, this is 4c over n times n^4
over four.

1076
01:16:25,051 --> 01:16:28,931
And, conveniently,
this four cancels with this

1077
01:16:28,931 --> 01:16:31,689
four.
The four turns into a three

1078
01:16:31,689 --> 01:16:36,000
because of this,
and we get n^3.

1079
01:16:36,000 --> 01:16:38,159
We get cn^3.
Damn convenient,

1080
01:16:38,159 --> 01:16:41,089
because that's what we wanted
to prove.
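
[Editor's note] The induction step as spoken, written in one line. Here c is the constant from the guess E[Y_n] <= c n^3.

```latex
\[
E[Y_n]
  \;\le\; \frac{4}{n}\sum_{k=0}^{n-1} E[Y_k]
  \;\le\; \frac{4}{n}\sum_{k=0}^{n-1} c\,k^3
  \;\le\; \frac{4c}{n}\int_{0}^{n} x^3\,dx
  \;=\; \frac{4c}{n}\cdot\frac{n^4}{4}
  \;=\; c\,n^3
\]
```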

1081
01:16:41,089 --> 01:16:44,404
OK, so this proof is just
barely snaking by:

1082
01:16:44,404 --> 01:16:48,028
no residual term.
We've been sloppy all over the

1083
01:16:48,028 --> 01:16:50,727
place, and yet we were really
lucky.

1084
01:16:50,727 --> 01:16:54,120
And, we were just sloppy in the
right places.

1085
01:16:54,120 --> 01:16:56,510
So, this is a very tricky
proof.

1086
01:16:56,510 --> 01:17:01,214
If you just tried to do it by
hand, it's pretty easy to be too

1087
01:17:01,214 --> 01:17:04,452
sloppy, and not get quite the
right answer.

1088
01:17:04,452 --> 01:17:09,869
But, this just barely works.
So, let me say a couple of

1089
01:17:09,869 --> 01:17:12,890
things about it in my remaining
one minute.

1090
01:17:12,890 --> 01:17:15,407
So, we can do the conclusion,
again.

1091
01:17:15,407 --> 01:17:18,428
I won't write it because I
don't have time,

1092
01:17:18,428 --> 01:17:21,664
but here it is.
We just proved a bound on Y_n,

1093
01:17:21,664 --> 01:17:25,907
which was two to the power X_n.
What we cared about was X_n.

1094
01:17:25,907 --> 01:17:29,000
So, we used Jensen's
inequality.

1095
01:17:29,000 --> 01:17:32,350
We get that two to the E of X_n
is, at most, E of two to the

1096
01:17:32,350 --> 01:17:34,083
X_n.
This is what we know about

1097
01:17:34,083 --> 01:17:36,740
because that's Y_n.
So, we know E of Y_n is now

1098
01:17:36,740 --> 01:17:39,108
order n^3.
OK, we had to set this constant

1099
01:17:39,108 --> 01:17:41,187
sufficiently large for the base
case.

1100
01:17:41,187 --> 01:17:44,306
We didn't really figure out
what the constant was here.

1101
01:17:44,306 --> 01:17:47,599
It didn't matter because now
we're taking the logs of both

1102
01:17:47,599 --> 01:17:49,043
sides.
We get E of X_n is,

1103
01:17:49,043 --> 01:17:51,584
at most, log of order n^3.
This constant is a

1104
01:17:51,584 --> 01:17:54,241
multiplicative constant.
So, you take the logs.

1105
01:17:54,241 --> 01:17:57,072
It becomes additive.
This constant is an exponent.

1106
01:17:57,072 --> 01:18:01,000
So, when you take logs,
it becomes a multiplier.

1107
01:18:01,000 --> 01:18:07,361
Three log n plus order one.
This is a pretty damn tight

1108
01:18:07,361 --> 01:18:13,486
bound on the height of a
randomly built binary search

1109
01:18:13,486 --> 01:18:18,081
tree, the expected height,
I should say.
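
[Editor's note] The conclusion that was spoken but not written on the board, assuming "log" means logarithm base 2 (written lg here) and c is the constant from the induction:

```latex
\[
2^{E[X_n]} \;\le\; E\bigl[2^{X_n}\bigr] \;=\; E[Y_n] \;\le\; c\,n^3
\qquad \text{(Jensen's inequality, since } 2^x \text{ is convex)}
\]
\[
E[X_n] \;\le\; \lg\bigl(c\,n^3\bigr) \;=\; 3\lg n + \lg c \;=\; 3\lg n + O(1)
\]
```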

1110
01:18:18,081 --> 01:18:23,617
In fact, the expected height of
X_n is equal to,

1111
01:18:23,617 --> 01:18:28,447
well, roughly,
I'll just say it's roughly,

1112
01:18:28,447 --> 01:18:34,925
I don't want to be too precise
here, 2.9882 times log n.

1113
01:18:34,925 --> 01:18:40,934
This is the result by a friend
of mine, Luc Devroye,

1114
01:18:40,934 --> 01:18:46,000
if I spell it right,
in 1986.

1115
01:18:46,000 --> 01:18:49,572
He's a professor at McGill
University in Montreal.

1116
01:18:49,572 --> 01:18:52,270
So, we're pretty close,
three to 2.98.

1117
01:18:52,270 --> 01:18:56,572
And, I won't prove this here.
The hard part here is actually

1118
01:18:56,572 --> 01:19:00,000
the lower bound,
but it's only that much.

1119
01:19:00,000 --> 01:19:04,273
I should say a little bit more
about why we use Y_n instead of

1120
01:19:04,273 --> 01:19:06,166
X_n.
And, it's all about the

1121
01:19:06,166 --> 01:19:08,268
sloppiness.
And, in particular,

1122
01:19:08,268 --> 01:19:12,193
this step, where we said that
the max of these two random

1123
01:19:12,193 --> 01:19:14,295
variables is,
at most, the sum.

1124
01:19:14,295 --> 01:19:18,359
And, while that's true for X
just as well as it is true for

1125
01:19:18,359 --> 01:19:21,653
Y, it's more true for Y.
OK, this is a bit weird

1126
01:19:21,653 --> 01:19:24,876
because, remember,
what we're analyzing here is

1127
01:19:24,876 --> 01:19:28,800
all possible values of k.
This has to work no matter what

1128
01:19:28,800 --> 01:19:32,234
k is, in some sense.
I mean, we're bounding all of

1129
01:19:32,234 --> 01:19:37,000
those cases simultaneously,
the sum of them all.

1130
01:19:37,000 --> 01:19:41,576
So, here we're looking at k
minus one versus n minus k.

1131
01:19:41,576 --> 01:19:44,881
And, in fact,
here, there's a polynomial

1132
01:19:44,881 --> 01:19:48,186
version.
But, so, if you take two values

1133
01:19:48,186 --> 01:19:51,576
a and b, and you say,
well, max of a and b is,

1134
01:19:51,576 --> 01:19:55,728
at most, a plus b.
And, on the other hand you say,

1135
01:19:55,728 --> 01:19:59,541
well, max of two to the a and
two to the b is,

1136
01:19:59,541 --> 01:20:02,847
at most, two to the a plus two
to the b.

1137
01:20:02,847 --> 01:20:07,000
Doesn't this feel better than
that?

1138
01:20:07,000 --> 01:20:09,820
Well, they are,
of course, the same.

1139
01:20:09,820 --> 01:20:13,367
But, if you look at a minus b,
as that grows,

1140
01:20:13,367 --> 01:20:17,719
this becomes a tighter bound
faster than this becomes a

1141
01:20:17,719 --> 01:20:22,716
tighter bound because here we're
looking at absolute difference

1142
01:20:22,716 --> 01:20:26,504
between a and b.
So, that's why this is pretty

1143
01:20:26,504 --> 01:20:31,259
good and this is pretty bad.
We're still really bad if a and

1144
01:20:31,259 --> 01:20:35,812
b are almost the same.
But, we're trying to solve this

1145
01:20:35,812 --> 01:20:38,677
for all partitions into k minus
one and n minus k.

1146
01:20:38,677 --> 01:20:42,127
So, it's OK if we get a few of
the cases wrong in the middle

1147
01:20:42,127 --> 01:20:45,284
where it evenly partitions.
But, as soon as we get some

1148
01:20:45,284 --> 01:20:49,026
skew, this will be very close to
this, whereas this will be still

1149
01:20:49,026 --> 01:20:52,066
pretty far from this.
You have to get pretty close to

1150
01:20:52,066 --> 01:20:54,580
the edge before you're not
losing much here,

1151
01:20:54,580 --> 01:20:57,504
whereas pretty quickly you're
not losing much here.
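
[Editor's sketch] The intuition can be made concrete with a small table. The Python sketch below is illustrative only, with a hypothetical function name. For each split a = k - 1, b = n - k, it computes how much "max is at most the sum" overestimates, as the ratio (a + b) / max(a, b) for the X version and (2^a + 2^b) / max(2^a, 2^b) for the Y version. A ratio of 1 means nothing is lost; a ratio of 2 is the worst case.

```python
def overestimate_factors(n):
    """For each root rank k, return (k, linear_ratio, exponential_ratio).

    linear_ratio      = (a + b) / max(a, b)
    exponential_ratio = (2**a + 2**b) / max(2**a, 2**b) = 1 + 2**(-|a - b|)
    with a = k - 1 and b = n - k. Requires n >= 2 so that max(a, b) >= 1.
    """
    rows = []
    for k in range(1, n + 1):
        a, b = k - 1, n - k
        linear_ratio = (a + b) / max(a, b)
        exponential_ratio = 1 + 2.0 ** (-abs(a - b))
        rows.append((k, linear_ratio, exponential_ratio))
    return rows


if __name__ == "__main__":
    for k, linear_ratio, exponential_ratio in overestimate_factors(101)[::10]:
        print(f"k={k:3d}  X-version={linear_ratio:.4f}  "
              f"Y-version={exponential_ratio:.8f}")
```

With n = 101, the even split (a = b = 50) loses a factor of 2 in both versions, but a mildly skewed split such as a = 40, b = 60 still loses a factor of about 1.67 for X while the Y version is already within one part in a million of exact.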

1152
01:20:57,504 --> 01:21:00,368
That's the intuition.
Try it, and see what happens

1153
01:21:00,368 --> 01:21:03,000
with X_n, and it won't work.
See you Wednesday.