1
00:00:00,090 --> 00:00:02,490
The following content is
provided under a Creative

2
00:00:02,490 --> 00:00:04,030
Commons license.

3
00:00:04,030 --> 00:00:06,360
Your support will help
MIT OpenCourseWare

4
00:00:06,360 --> 00:00:10,720
continue to offer high-quality
educational resources for free.

5
00:00:10,720 --> 00:00:13,320
To make a donation or
view additional materials

6
00:00:13,320 --> 00:00:17,280
from hundreds of MIT courses,
visit MIT OpenCourseWare

7
00:00:17,280 --> 00:00:18,450
at ocw.mit.edu.

8
00:00:20,860 --> 00:00:21,860
ERIK DEMAINE: All right.

9
00:00:21,860 --> 00:00:26,480
Today, we resume our theme of
memory hierarchy efficient data

10
00:00:26,480 --> 00:00:27,680
structures.

11
00:00:27,680 --> 00:00:30,920
And last time, we saw
cache-oblivious b-trees,

12
00:00:30,920 --> 00:00:37,400
which achieve log base B
of N for all operations--

13
00:00:40,350 --> 00:00:42,410
insert, delete, search.

14
00:00:46,160 --> 00:00:48,110
And the cool part is
that we could do that

15
00:00:48,110 --> 00:00:51,770
without knowing what B was.

16
00:00:51,770 --> 00:00:55,160
And it was basically
a binary search tree

17
00:00:55,160 --> 00:00:58,310
stored in a funny order,
this van Emde Boas order,

18
00:00:58,310 --> 00:01:01,040
with an ordered file
on the bottom, which

19
00:01:01,040 --> 00:01:02,150
we left as a black box.

20
00:01:02,150 --> 00:01:03,830
And today, we're
going to see how

21
00:01:03,830 --> 00:01:05,810
to actually do ordered files--

22
00:01:05,810 --> 00:01:11,960
in log squared N data
moves per insert and delete.

23
00:01:11,960 --> 00:01:14,060
And then as a little
diversion, we'll

24
00:01:14,060 --> 00:01:15,860
see a closely related
problem to this

25
00:01:15,860 --> 00:01:19,160
called list labeling,
which we needed in Lecture 1

26
00:01:19,160 --> 00:01:22,100
and left it as a black
box for full persistence.

27
00:01:22,100 --> 00:01:26,030
We had this version tree
with full persistence,

28
00:01:26,030 --> 00:01:30,140
and we needed to linearize
that version tree

29
00:01:30,140 --> 00:01:32,810
into a bunch of numbers
so that we could then

30
00:01:32,810 --> 00:01:35,824
compare whether one version
was an ancestor of another.

31
00:01:35,824 --> 00:01:38,240
And for that, we needed to be
able to store a linked list,

32
00:01:38,240 --> 00:01:40,400
and insert and delete
in the linked list,

33
00:01:40,400 --> 00:01:44,930
and be able to query, who is
this node in the linked list,

34
00:01:44,930 --> 00:01:47,690
precede this node
in the linked list

35
00:01:47,690 --> 00:01:49,035
in constant time per operation.

36
00:01:49,035 --> 00:01:54,109
So we'll also do that
today because it's time.

37
00:01:54,109 --> 00:01:56,150
And then we're going to
do a completely different

38
00:01:56,150 --> 00:01:58,070
cache-oblivious data structure

39
00:01:58,070 --> 00:02:01,670
that's interesting mainly in
the way that it adapts to M,

40
00:02:01,670 --> 00:02:06,350
not just B. So remember, B was
the size of a memory block.

41
00:02:06,350 --> 00:02:07,850
When we fetch
something from memory,

42
00:02:07,850 --> 00:02:12,050
we get the entire block of size
B. M was the size of the cache.

43
00:02:12,050 --> 00:02:14,810
And so there were M over B
blocks in the cache of size

44
00:02:14,810 --> 00:02:20,180
B. So that's what
we'll do today.

45
00:02:20,180 --> 00:02:24,395
I'm also going to need a claim--

46
00:02:27,332 --> 00:02:29,710
which we won't prove here--

47
00:02:29,710 --> 00:02:33,270
that you can sort
cache-obliviously in N

48
00:02:33,270 --> 00:02:36,920
over B log base M
over B of N over B.

49
00:02:36,920 --> 00:02:39,120
So I'm going to use this
as a black box today.

50
00:02:39,120 --> 00:02:41,203
And we're not going to
fill it in because it's not

51
00:02:41,203 --> 00:02:43,480
a data structure, and it's
a data structures class.

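For reference, here is the claimed sorting bound written out in the notation above (M the cache size, B the block size):

```latex
\[
  \mathrm{sort}(N) \;=\; O\!\left(\frac{N}{B}\,\log_{M/B}\frac{N}{B}\right)
  \quad\text{memory transfers.}
\]
```
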
52
00:02:43,480 --> 00:02:46,340
To give you some feeling for
why this is the right bound

53
00:02:46,340 --> 00:02:49,970
for sorting, if
you know M and B,

54
00:02:49,970 --> 00:02:55,040
then the answer is M
over B way mergesort.

55
00:03:01,480 --> 00:03:04,290
So you all know
binary mergesort,

56
00:03:04,290 --> 00:03:06,060
where you split into two parts.

57
00:03:06,060 --> 00:03:08,160
If you split into M
over B parts and then do

58
00:03:08,160 --> 00:03:10,950
an M over B way merge,
that's exactly what

59
00:03:10,950 --> 00:03:12,390
a cache can handle.

60
00:03:12,390 --> 00:03:15,320
It can read one block
from each of the lists

61
00:03:15,320 --> 00:03:17,470
that it's trying to merge.

62
00:03:17,470 --> 00:03:21,000
It has just enough
cache blocks for that.

63
00:03:21,000 --> 00:03:24,210
And then you do the merge
block by block, load new blocks

64
00:03:24,210 --> 00:03:25,166
as necessary.

65
00:03:25,166 --> 00:03:26,790
That will give you
a linear time merge.

66
00:03:26,790 --> 00:03:29,850
And so you'll get N over
B times log base M over B.

67
00:03:29,850 --> 00:03:32,460
And it turns out the right
thing in here is N over B-- or N

68
00:03:32,460 --> 00:03:35,520
is basically the same,
because it's inside the log.

69
00:03:35,520 --> 00:03:37,530
It's not a big deal.

70
00:03:37,530 --> 00:03:40,090
So external memory wise,
that's how you do it.

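As a rough illustration of that external-memory algorithm, here is a minimal Python sketch of M-over-B-way mergesort. The parameters M and B are assumptions standing in for the cache and block sizes, and heapq.merge stands in for the block-by-block multiway merge described above; this shows the recursion shape, not actual memory transfers.

```python
import heapq

def multiway_mergesort(arr, M=1024, B=16):
    # Fan-out M/B: the cache can hold one block from each run being
    # merged, so each k-way merge pass reads and writes the data once.
    k = max(2, M // B)
    if len(arr) <= M:                 # base case: fits in cache
        return sorted(arr)
    chunk = -(-len(arr) // k)         # ceiling division
    runs = [multiway_mergesort(arr[i:i + chunk], M, B)
            for i in range(0, len(arr), chunk)]
    return list(heapq.merge(*runs))   # one linear-time k-way merge
```

Each level of recursion scans everything once, and there are about log base M over B of N over B levels, which is where the bound above comes from.
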
71
00:03:40,090 --> 00:03:42,450
You can do this
cache-obliviously in a similar

72
00:03:42,450 --> 00:03:43,125
way to--

73
00:03:45,930 --> 00:03:47,850
roughly speaking, in a
similar way to the way

74
00:03:47,850 --> 00:03:50,070
we do b-trees,
where you're binary

75
00:03:50,070 --> 00:03:54,060
searching in the
number of ways you

76
00:03:54,060 --> 00:03:57,401
should divide your array into.

77
00:03:57,401 --> 00:03:59,150
I'm not going to get
into details on that.

78
00:03:59,150 --> 00:04:01,860
We'll focus on
cache-oblivious priority

79
00:04:01,860 --> 00:04:06,131
queues, which do a similar kind
of thing, but make it dynamic.

80
00:04:06,131 --> 00:04:06,630
All right.

81
00:04:06,630 --> 00:04:10,740
But before we go there, let's
do ordered file maintenance.

82
00:04:20,070 --> 00:04:23,640
So let me first remind
you of the problem.

83
00:04:23,640 --> 00:04:32,520
We want to store N
items in a file, which you can

84
00:04:32,520 --> 00:04:41,040
think of as an array of size
order N. This constant's

85
00:04:41,040 --> 00:04:46,170
bigger than 1, with
constant-sized gaps.

86
00:04:53,970 --> 00:05:06,030
And then-- I should
say in specified order,

87
00:05:06,030 --> 00:05:09,330
subject to inserting and
deleting items in that order.

88
00:05:29,420 --> 00:05:34,240
So this was the picture.

89
00:05:34,240 --> 00:05:35,070
We have an array.

90
00:05:37,820 --> 00:05:42,680
We get to store some
objects in the array

91
00:05:42,680 --> 00:05:45,180
and have these blank
cells in between.

92
00:05:45,180 --> 00:05:47,040
But each of these gaps
has constant size.

93
00:05:52,960 --> 00:05:55,420
Maybe these data items
are sorted, maybe not.

94
00:05:58,540 --> 00:06:00,310
And then we're able
to say things like,

95
00:06:00,310 --> 00:06:05,550
OK, insert a new
item 8 right after 7.

96
00:06:05,550 --> 00:06:07,540
And so then you'd
like to do that.

97
00:06:07,540 --> 00:06:12,910
Then you'd also, then, like to
say OK, now insert new item 9,

98
00:06:12,910 --> 00:06:14,530
here.

99
00:06:14,530 --> 00:06:17,500
And then this guy will
maybe get shifted over.

100
00:06:17,500 --> 00:06:19,570
So 12 is over here.

101
00:06:19,570 --> 00:06:22,257
This becomes blank, and then
you can fit the 9, and so on.

102
00:06:22,257 --> 00:06:24,340
You want to be able to do
insertions and deletions

103
00:06:24,340 --> 00:06:25,810
like that quickly.

104
00:06:33,970 --> 00:06:35,890
And quickly, here,
means whenever

105
00:06:35,890 --> 00:06:38,440
we do an insert or delete,
we're going to rearrange items

106
00:06:38,440 --> 00:06:39,180
in an interval.

107
00:06:42,530 --> 00:06:46,360
And that interval is
going to be small--

108
00:06:46,360 --> 00:06:50,060
log squared N amortized.

109
00:06:55,542 --> 00:06:57,300
That's all I need to say here.

110
00:07:00,689 --> 00:07:02,480
I guess we also want
to say that when we're

111
00:07:02,480 --> 00:07:03,938
moving these items
in the interval,

112
00:07:03,938 --> 00:07:06,230
we can do it efficiently
cache-obliviously, because we

113
00:07:06,230 --> 00:07:09,800
really want log
squared N divided by B.

114
00:07:09,800 --> 00:07:19,810
And we say that via constant
number of interleaved scans.

115
00:07:19,810 --> 00:07:23,000
Scans, we know, as long as
there's a constant number of them

116
00:07:23,000 --> 00:07:26,319
and your cache has at least a
constant number of blocks,

117
00:07:26,319 --> 00:07:28,610
then interleaved scans are
always going to be efficient.

118
00:07:28,610 --> 00:07:32,150
You always get to divide
by B. But the focus

119
00:07:32,150 --> 00:07:34,370
will be on making sure
the interval is small.

120
00:07:34,370 --> 00:07:37,040
The rearrangement will actually
be very simple, so not too hard

121
00:07:37,040 --> 00:07:37,540
to do.

122
00:07:39,920 --> 00:07:49,310
So this will give us log squared
N over B amortized memory

123
00:07:49,310 --> 00:07:51,080
transfers.

124
00:07:51,080 --> 00:07:53,030
So that was the
black box we needed

125
00:07:53,030 --> 00:07:56,120
to get cache-oblivious b-trees.

126
00:07:56,120 --> 00:07:58,100
Remember, we got rid of
the square in the log

127
00:07:58,100 --> 00:08:01,310
by using a level of indirection
that removed one of the logs.

128
00:08:01,310 --> 00:08:03,260
So we got log N
over B. So we were

129
00:08:03,260 --> 00:08:05,000
dominated by log
base B of N, which

130
00:08:05,000 --> 00:08:09,240
is what we had for
the search over here.

131
00:08:09,240 --> 00:08:10,670
So this is the step we need.

132
00:08:10,670 --> 00:08:12,517
And this is a general
tool used in a bunch

133
00:08:12,517 --> 00:08:14,600
of different cache-oblivious
data structures, sort

134
00:08:14,600 --> 00:08:16,910
of one of the first
cache-oblivious data structure

135
00:08:16,910 --> 00:08:18,690
tools.

136
00:08:18,690 --> 00:08:21,090
It's pretty handy.

137
00:08:21,090 --> 00:08:23,170
It's actually much older
than cache-oblivious

138
00:08:23,170 --> 00:08:25,200
or external memory models.

139
00:08:25,200 --> 00:08:28,790
This result-- removing the
last line and this part,

140
00:08:28,790 --> 00:08:30,920
which makes it
efficient in this model--

141
00:08:30,920 --> 00:08:34,159
just thinking about moving
around intervals in a file

142
00:08:34,159 --> 00:08:38,360
goes back to Itai,
Konheim, and Rodeh in 1981.

143
00:08:38,360 --> 00:08:40,070
So it's pretty old.

144
00:08:40,070 --> 00:08:44,430
And then it was brought to the
cache-oblivious world in 2000,

145
00:08:44,430 --> 00:08:48,740
right when this model
was getting started.

146
00:08:48,740 --> 00:08:52,550
So that's the goal.

147
00:08:52,550 --> 00:08:56,730
Now let me tell you how
this is going to work.

148
00:08:56,730 --> 00:08:58,980
So the rough idea is very simple.

149
00:08:58,980 --> 00:09:01,370
You have your array.

150
00:09:01,370 --> 00:09:05,550
And when you insert an
item, what we want to do

151
00:09:05,550 --> 00:09:09,330
is find an interval
containing that item

152
00:09:09,330 --> 00:09:12,860
of some reasonable
size that's not

153
00:09:12,860 --> 00:09:15,800
too full and not too sparse.

154
00:09:15,800 --> 00:09:18,089
If we can find--

155
00:09:18,089 --> 00:09:19,880
so like right here,
when we're inserting 9,

156
00:09:19,880 --> 00:09:21,525
it looks really bad
right around there.

157
00:09:21,525 --> 00:09:23,150
And so there are, like,
too many elements

158
00:09:23,150 --> 00:09:24,524
packed right around
that element.

159
00:09:24,524 --> 00:09:25,940
And that feels bad to us.

160
00:09:25,940 --> 00:09:29,120
So we grow an interval around
it until we've got enough gaps.

161
00:09:29,120 --> 00:09:31,400
And then we just evenly
redistribute the items

162
00:09:31,400 --> 00:09:33,669
in that interval.

163
00:09:33,669 --> 00:09:35,460
So that's basically
what we're going to do.

164
00:09:35,460 --> 00:09:37,550
We just have to
find the right size

165
00:09:37,550 --> 00:09:39,510
interval to rearrange items in.

166
00:09:39,510 --> 00:09:41,060
Then when we do
the rearrangement,

167
00:09:41,060 --> 00:09:43,610
it's always going to be
evenly redistributing

168
00:09:43,610 --> 00:09:45,380
within the interval.

169
00:09:45,380 --> 00:09:46,640
So that strategy is simple.

170
00:09:46,640 --> 00:09:49,760
And to think about intervals
in a nice controlled way,

171
00:09:49,760 --> 00:09:52,620
we're going to build
a binary tree--

172
00:09:52,620 --> 00:09:54,020
our good friend.

173
00:09:54,020 --> 00:10:02,510
So let me just draw
this binary tree.

174
00:10:07,550 --> 00:10:09,320
Now I need to do
something a little bit

175
00:10:09,320 --> 00:10:12,470
special at the leaves.

176
00:10:12,470 --> 00:10:17,750
I'm going to cluster
together log N items.

177
00:10:17,750 --> 00:10:20,870
So down here is the array,
and all of this stuff

178
00:10:20,870 --> 00:10:22,250
up here is conceptual.

179
00:10:22,250 --> 00:10:23,580
We don't really build it.

180
00:10:28,370 --> 00:10:30,770
In my lecture notes, I
can just copy and paste

181
00:10:30,770 --> 00:10:33,960
and this is a lot easier.

182
00:10:33,960 --> 00:10:37,565
So we have these chunks of
size theta log N at the bottom.

183
00:10:41,610 --> 00:10:45,050
I don't really care
what the constant is.

184
00:10:45,050 --> 00:10:46,250
1 is probably fine.

185
00:10:54,840 --> 00:10:56,640
So this is the array
down here, and we're

186
00:10:56,640 --> 00:11:03,810
splitting every log N items,
or log N cells in the array.

187
00:11:03,810 --> 00:11:06,520
And then we say, OK,
well conceptually build

188
00:11:06,520 --> 00:11:07,770
a binary tree structure, here.

189
00:11:07,770 --> 00:11:11,100
And then this node
represents this interval.

190
00:11:11,100 --> 00:11:14,930
And this node represents
this interval.

191
00:11:14,930 --> 00:11:17,730
Every node just represents the
interval of all its descendant

192
00:11:17,730 --> 00:11:19,080
leaves.

193
00:11:19,080 --> 00:11:23,040
We've seen this trick
over and over again.

194
00:11:23,040 --> 00:11:25,030
But we're not going to
build any data structure

195
00:11:25,030 --> 00:11:27,930
or have some augmentation
for each of these nodes.

196
00:11:27,930 --> 00:11:30,650
This is how we're going
to build the intervals.

197
00:11:30,650 --> 00:11:32,025
We're going to
start at the leaf.

198
00:11:32,025 --> 00:11:34,740
Let's say we want to
insert an item in here.

199
00:11:34,740 --> 00:11:35,650
So we insert it here.

200
00:11:35,650 --> 00:11:38,155
If there's not room for it,
we're going to walk up the tree

201
00:11:38,155 --> 00:11:41,610
and say, OK, if this
interval is too dense,

202
00:11:41,610 --> 00:11:44,580
I'll look at this node and
its corresponding interval

203
00:11:44,580 --> 00:11:45,840
here to here.

204
00:11:45,840 --> 00:11:49,170
If that's still too dense,
I'll walk up to the parent,

205
00:11:49,170 --> 00:11:52,020
and so look at this
interval from here to here,

206
00:11:52,020 --> 00:11:55,320
and so on, until I find
that in the end, at most,

207
00:11:55,320 --> 00:11:59,280
I redistribute the entire array.

208
00:11:59,280 --> 00:12:02,610
And when I do that, I'll
just evenly redistribute.

209
00:12:02,610 --> 00:12:09,340
Let me write down the
algorithm for update.

210
00:12:09,340 --> 00:12:14,330
So for insert or
delete, same algorithm,

211
00:12:14,330 --> 00:12:21,275
you update the leaf log N chunk.

212
00:12:25,220 --> 00:12:28,990
You can do that just by
rewriting the entire chunk.

213
00:12:28,990 --> 00:12:30,740
We're trying to get a
log squared N bound,

214
00:12:30,740 --> 00:12:32,698
so we can afford to
rewrite an interval of size

215
00:12:32,698 --> 00:12:35,270
log N. So that's for free.

216
00:12:38,270 --> 00:12:41,210
So whatever leaf
contains the element

217
00:12:41,210 --> 00:12:42,560
you want to insert or delete.

218
00:12:47,480 --> 00:12:50,510
And then we're going to
walk up the tree until we

219
00:12:50,510 --> 00:12:51,965
find a suitable interval.

220
00:13:03,640 --> 00:13:10,470
And we're going to call
that node, or that interval,

221
00:13:10,470 --> 00:13:11,610
within threshold.

222
00:13:20,550 --> 00:13:22,650
So let me define
within threshold.

223
00:13:22,650 --> 00:13:27,300
We're going to look at
the density of a node,

224
00:13:27,300 --> 00:13:30,000
or an interval.

225
00:13:30,000 --> 00:13:33,840
And that's just going to be the
ratio of the number of elements

226
00:13:33,840 --> 00:13:37,524
that are actually down there
versus the number of slots

227
00:13:37,524 --> 00:13:38,940
in the array that
are down there--

228
00:13:42,052 --> 00:13:43,260
so just how much is occupied.

229
00:14:00,794 --> 00:14:01,710
So look at that ratio.

230
00:14:01,710 --> 00:14:04,180
If it's 100%, then there are
no blank cells down there.

231
00:14:04,180 --> 00:14:08,150
If it's 0%, then
everybody is blank.

232
00:14:08,150 --> 00:14:09,900
So we don't want either
of those extremes.

233
00:14:09,900 --> 00:14:11,670
We want something in between.

234
00:14:11,670 --> 00:14:14,010
And we're going to do that
by specifying thresholds

235
00:14:14,010 --> 00:14:16,230
on this density and
try to keep the density

236
00:14:16,230 --> 00:14:18,710
within those thresholds.

237
00:14:18,710 --> 00:14:21,604
Let me define
those thresholds.

238
00:14:30,800 --> 00:14:35,870
The fun part is that the density
thresholds that you maintain

239
00:14:35,870 --> 00:14:37,970
depend on which level you are.

240
00:14:41,120 --> 00:14:45,530
Not, like, experience points,
but in which height of the tree

241
00:14:45,530 --> 00:14:46,190
you are.

242
00:14:46,190 --> 00:14:49,550
So down here, we don't really
care how well-distributed

243
00:14:49,550 --> 00:14:50,570
the leaves are.

244
00:14:50,570 --> 00:14:53,300
I mean, it can't
be 0% because then

245
00:14:53,300 --> 00:14:54,920
that would be a really big gap.

246
00:14:54,920 --> 00:14:57,620
But it could be say
between 50% and 100%.

247
00:14:57,620 --> 00:14:59,300
It could be totally full.

248
00:14:59,300 --> 00:15:02,150
And then once it's overflowing,
then we've got to go up.

249
00:15:02,150 --> 00:15:06,940
And the higher we go,
the stricter we get--

250
00:15:06,940 --> 00:15:12,570
hopefully, yes-- strictest
at the top of the tree.

251
00:15:12,570 --> 00:15:24,800
So in general, if we have
a node at depth d,

252
00:15:24,800 --> 00:15:32,810
then we want the density
to be at least 1/2

253
00:15:32,810 --> 00:15:36,530
minus 1/4 d over h.

254
00:15:39,590 --> 00:15:52,690
And we want the density to be at
most 3/4 plus 1/4 d over h.

255
00:15:52,690 --> 00:15:59,165
So h, here, is the height
of this tree, which

256
00:15:59,165 --> 00:16:02,410
is going to be something
like log N minus log-log N.

257
00:16:02,410 --> 00:16:04,230
But it doesn't really matter.

258
00:16:04,230 --> 00:16:06,620
This is depth 0,
depth 1, depth h.

259
00:16:11,850 --> 00:16:15,840
We're just linearly
interpolating between--

260
00:16:15,840 --> 00:16:19,905
let's see, this is always
between 1/4 and 1/2--

261
00:16:22,790 --> 00:16:26,330
1/2 when this is 0,
1/4 when this is h.

262
00:16:26,330 --> 00:16:28,790
So you get a 1/2 minus 1/4.

263
00:16:28,790 --> 00:16:35,960
And this one is always
in the range 3/4 to 1.

264
00:16:35,960 --> 00:16:40,310
It's 3/4 when this is
0, and 1 when this is h.

265
00:16:40,310 --> 00:16:42,622
So at the bottom--

266
00:16:42,622 --> 00:16:43,580
I was a little bit off.

267
00:16:43,580 --> 00:16:49,400
At the bottom, the leaf
level, when these are both h--

268
00:16:49,400 --> 00:16:53,600
the density has to be at
least a 1/4 and at most, 100%.

269
00:16:53,600 --> 00:16:55,430
And then at the root,
it's going to have

270
00:16:55,430 --> 00:17:00,410
to be between 1/2 and 3/4.

271
00:17:00,410 --> 00:17:03,110
So it's a narrower range.

272
00:17:03,110 --> 00:17:06,260
And the higher you go up,
the more narrow the range

273
00:17:06,260 --> 00:17:08,459
on the density gets.

274
00:17:08,459 --> 00:17:11,000
And we do it just sort of in
the obvious linear interpolation

275
00:17:11,000 --> 00:17:12,530
way.

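A minimal sketch of these thresholds in Python (the function names are mine; root at depth 0, leaves at depth h):

```python
def density_thresholds(d, h):
    # Linear interpolation in the depth d: the root (d = 0) must stay
    # within [1/2, 3/4], while leaves (d = h) may be anywhere in [1/4, 1].
    low = 1/2 - (1/4) * d / h
    high = 3/4 + (1/4) * d / h
    return low, high

def within_threshold(elements, slots, d, h):
    low, high = density_thresholds(d, h)
    return low <= elements / slots <= high
```
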
276
00:17:12,530 --> 00:17:15,609
The not so obvious thing is that
this is the right way to do it.

277
00:17:15,609 --> 00:17:17,900
There's a lot of choices
for how to set these density

278
00:17:17,900 --> 00:17:18,550
thresholds.

279
00:17:18,550 --> 00:17:22,010
But we have to basically
maintain constant density

280
00:17:22,010 --> 00:17:24,170
everywhere because
we're trying to maintain

281
00:17:24,170 --> 00:17:25,859
gaps of constant size.

282
00:17:25,859 --> 00:17:28,334
So we don't have a
lot of flexibility.

283
00:17:28,334 --> 00:17:29,750
But it turns out,
this flexibility

284
00:17:29,750 --> 00:17:32,630
between two constants,
like 1/4 and 1/2

285
00:17:32,630 --> 00:17:37,020
is enough to give us
the performance we need.

286
00:17:37,020 --> 00:17:38,780
So let's see why.

287
00:17:41,662 --> 00:17:42,870
Let me finish this algorithm.

288
00:17:42,870 --> 00:17:45,680
We walk up the tree
until reaching a node

289
00:17:45,680 --> 00:17:46,670
within threshold.

290
00:17:46,670 --> 00:17:48,260
Density is this.

291
00:17:48,260 --> 00:17:49,780
Density threshold is this.

292
00:17:49,780 --> 00:17:51,830
So now we know what within
threshold means.

293
00:17:51,830 --> 00:17:59,840
And then we evenly
rebalance or redistribute

294
00:17:59,840 --> 00:18:08,270
all the descendant
elements in that interval

295
00:18:08,270 --> 00:18:09,410
that is within threshold.

296
00:18:18,629 --> 00:18:20,170
So what you need to
check is that you

297
00:18:20,170 --> 00:18:22,420
can do this with a
constant number of scans.

298
00:18:22,420 --> 00:18:24,220
It's not that hard.

299
00:18:24,220 --> 00:18:25,570
Just read the elements in order.

300
00:18:25,570 --> 00:18:27,109
Write them out to
a temporary array,

301
00:18:27,109 --> 00:18:28,150
and then write them back.

302
00:18:28,150 --> 00:18:30,980
Or if you're fancy,
you can do it in place.

303
00:18:30,980 --> 00:18:33,430
But you can just do it by
a constant number of scans

304
00:18:33,430 --> 00:18:34,180
through the array.

305
00:18:37,390 --> 00:18:41,920
Just compute what should
be the average gap

306
00:18:41,920 --> 00:18:43,230
between the elements.

307
00:18:43,230 --> 00:18:44,230
Leave that many gaps.

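Putting those pieces together, here is a hedged Python sketch of the whole insert (my own illustration, not the lecture's literal pseudocode). Here slots is the array with None marking gaps, leaf_size plays the role of theta log N, the new item goes right after the occupied index pos, and for simplicity len(slots) is assumed to be leaf_size times a power of 2.

```python
def ofm_insert(slots, leaf_size, pos, new_item):
    # Assumes slots[pos] is occupied and len(slots) = leaf_size * 2**h.
    n_leaves = max(1, len(slots) // leaf_size)
    h = max(1, (n_leaves - 1).bit_length())  # height of the implicit tree
    lo = pos - pos % leaf_size               # leaf chunk containing pos
    hi = lo + leaf_size
    d = h
    while d > 0:
        count = 1 + sum(x is not None for x in slots[lo:hi])
        low = 1/2 - (1/4) * d / h            # density thresholds at depth d
        high = 3/4 + (1/4) * d / h
        if low <= count / (hi - lo) <= high:
            break                            # interval is within threshold
        width = hi - lo                      # otherwise grow to the parent
        lo -= lo % (2 * width)
        hi = min(lo + 2 * width, len(slots))
        d -= 1
    items = []                               # evenly redistribute, using a
    for i in range(lo, hi):                  # constant number of scans
        if slots[i] is not None:
            items.append(slots[i])
            if i == pos:
                items.append(new_item)       # new item right after pos
        slots[i] = None
    for j, x in enumerate(items):
        slots[lo + j * (hi - lo) // len(items)] = x
```
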
308
00:18:47,070 --> 00:18:50,880
So the algorithm
is pretty simple

309
00:18:50,880 --> 00:18:54,090
once you say, OK, I'm
going to grow intervals.

310
00:18:54,090 --> 00:18:56,776
Then maybe, you think OK,
I guess I'll grow intervals

311
00:18:56,776 --> 00:18:57,900
according to a binary tree.

312
00:18:57,900 --> 00:18:59,316
It's a little bit
more controlled.

313
00:18:59,316 --> 00:19:01,420
Probably don't have
to do it this way.

314
00:19:01,420 --> 00:19:04,800
You could just grow
them by a factor of 2,

315
00:19:04,800 --> 00:19:06,090
just around your point.

316
00:19:06,090 --> 00:19:10,121
But it's easier to analyze in
the setting of a binary tree.

317
00:19:10,121 --> 00:19:12,120
And then once you're doing
that, the tricky part

318
00:19:12,120 --> 00:19:13,600
is to set the
density thresholds.

319
00:19:13,600 --> 00:19:15,210
But you fool around
and this seems

320
00:19:15,210 --> 00:19:18,870
to be the best way to do it.

321
00:19:18,870 --> 00:19:20,920
Now the question is,
why does this work?

322
00:19:20,920 --> 00:19:27,310
How do we prove log squared
amortized interval size when we

323
00:19:27,310 --> 00:19:28,560
follow these density thresholds?

324
00:19:28,560 --> 00:19:30,990
Notice that we're
not keeping intervals

325
00:19:30,990 --> 00:19:34,350
within density at all times.

326
00:19:34,350 --> 00:19:38,100
I mean, the whole problem
is that things are not

327
00:19:38,100 --> 00:19:40,220
within threshold
right at the start.

328
00:19:40,220 --> 00:19:44,340
And we have to walk up the tree
quite a ways, potentially--

329
00:19:44,340 --> 00:19:48,480
the claim is only about log-log
N levels up the tree--

330
00:19:48,480 --> 00:19:50,160
to find something
that's within density.

331
00:19:50,160 --> 00:19:51,570
And then we can redistribute.

332
00:19:51,570 --> 00:19:55,470
And then we fix
everything below us.

333
00:19:55,470 --> 00:19:55,970
All right.

334
00:19:55,970 --> 00:19:57,450
Well, let's get to that.

335
00:20:11,864 --> 00:20:13,405
This is really the
direction we want.

336
00:20:13,405 --> 00:20:15,460
The thresholds are getting
tighter and tighter,

337
00:20:15,460 --> 00:20:16,950
more constrained as we go up.

338
00:20:16,950 --> 00:20:19,590
Because it means if
we walk up a lot,

339
00:20:19,590 --> 00:20:23,640
we, essentially, can pay for it
because we bring that interval

340
00:20:23,640 --> 00:20:27,500
even farther within threshold.

341
00:20:27,500 --> 00:20:33,400
So we have some node,
which is within threshold.

342
00:20:33,400 --> 00:20:35,640
So we bring it into the
density thresholds here.

343
00:20:35,640 --> 00:20:38,280
If we look at the
children of that node,

344
00:20:38,280 --> 00:20:42,930
their density
thresholds are smaller--

345
00:20:42,930 --> 00:20:44,460
sorry, are more relaxed.

346
00:20:44,460 --> 00:20:46,350
So if we bring this
node into threshold

347
00:20:46,350 --> 00:20:48,450
by rewriting all the
leaves down here,

348
00:20:48,450 --> 00:20:50,820
these nodes will not
only be within threshold,

349
00:20:50,820 --> 00:20:54,090
they'll be far within threshold.

350
00:20:54,090 --> 00:20:58,660
If you look at their ratios,
you know, their densities--

351
00:20:58,660 --> 00:21:01,714
the number of elements in there,
divided by the array slots.

352
00:21:01,714 --> 00:21:03,130
It's going to be
exactly the same.

353
00:21:03,130 --> 00:21:07,050
The density is equal
because we're uniformly

354
00:21:07,050 --> 00:21:08,226
distributing the items here.

355
00:21:08,226 --> 00:21:09,600
And there's some
rounding errors,

356
00:21:09,600 --> 00:21:12,070
but other than rounding.

357
00:21:12,070 --> 00:21:14,670
And that's actually why we have
these leaves of size theta log

358
00:21:14,670 --> 00:21:16,350
N, so the rounding
doesn't bite us.

359
00:21:18,990 --> 00:21:21,450
We're evenly redistributing
so the density is equal

360
00:21:21,450 --> 00:21:22,550
everywhere.

361
00:21:22,550 --> 00:21:25,320
Left child had the same
density as the parent.

362
00:21:25,320 --> 00:21:28,890
But if you look at the density
thresholds of the child,

363
00:21:28,890 --> 00:21:31,270
they will be more relaxed
compared to the parent.

364
00:21:31,270 --> 00:21:32,850
So if the parent is
within threshold,

365
00:21:32,850 --> 00:21:35,460
the child will be
far within threshold

366
00:21:35,460 --> 00:21:40,790
by at least a d over
h additive amount.

367
00:21:40,790 --> 00:21:43,860
Sorry, 1 over h, because
their depths differ by 1.

368
00:21:43,860 --> 00:21:45,960
If this is d, this
would be d plus 1.

369
00:21:51,480 --> 00:22:08,400
When we rebalance a
node, we put the children

370
00:22:08,400 --> 00:22:09,900
far within threshold.

371
00:22:15,610 --> 00:22:22,380
Meaning, if we look at
the absolute difference

372
00:22:22,380 --> 00:22:26,280
between the density and
either the upper threshold

373
00:22:26,280 --> 00:22:35,580
or the lower threshold, that
will be, I guess, at least 1

374
00:22:35,580 --> 00:22:43,960
over 4h because we're
increasing d by 1.

375
00:22:43,960 --> 00:22:49,470
And it's 1 over 4h
for each step we take.

376
00:22:49,470 --> 00:22:53,460
OK so the children
are extra happy.

377
00:22:53,460 --> 00:22:55,890
We walked up here
because before--

378
00:22:55,890 --> 00:22:57,510
let's say we walked
up this path.

379
00:22:57,510 --> 00:23:00,150
So we walked from
the right child.

380
00:23:00,150 --> 00:23:03,390
We didn't stop here,
which means this node

381
00:23:03,390 --> 00:23:06,420
was beyond threshold.

382
00:23:06,420 --> 00:23:13,020
But now we walked up and now
we fixed this entire interval.

383
00:23:13,020 --> 00:23:14,700
And now it's far
within threshold.

384
00:23:14,700 --> 00:23:18,360
So before, you know, the
density minus the threshold

385
00:23:18,360 --> 00:23:20,610
went the wrong way,
had the wrong sign.

386
00:23:20,610 --> 00:23:24,370
Now we're good, and we're
good by at least a 1 over 4h.

387
00:23:24,370 --> 00:23:27,420
Now h, here, was the
height of the tree.

388
00:23:27,420 --> 00:23:32,940
It's log N minus log-log N. All
we need is that this is theta

389
00:23:32,940 --> 00:23:35,180
log N--

390
00:23:35,180 --> 00:23:44,220
sorry, theta 1 over log
N. h is theta log N.

391
00:23:44,220 --> 00:23:47,930
And this is a ratio--

392
00:23:47,930 --> 00:23:50,330
1 over log N--

393
00:23:50,330 --> 00:23:53,979
of the number of items
versus the number of slots.

394
00:23:53,979 --> 00:23:55,520
But we know the
number of slots we're

395
00:23:55,520 --> 00:24:02,030
dealing with is theta log N. And
so this is at least one item.

396
00:24:02,030 --> 00:24:07,340
This log N is designed
to balance the h, here.

397
00:24:07,340 --> 00:24:08,310
OK, cool.

398
00:24:11,260 --> 00:24:12,790
Let's go over here.

399
00:24:19,270 --> 00:24:21,870
So the idea is if we're
far within threshold,

400
00:24:21,870 --> 00:24:23,714
we can charge to those items.

401
00:24:23,714 --> 00:24:24,380
That's our goal.

402
00:24:50,950 --> 00:24:52,540
What we're interested
in-- if we just

403
00:24:52,540 --> 00:24:54,820
rebalanced this node, say x.

404
00:24:54,820 --> 00:24:58,390
We want to know when is the
next time x can be rebalanced?

405
00:24:58,390 --> 00:25:00,610
For x to have to be
rebalanced, that means,

406
00:25:00,610 --> 00:25:03,880
again, one of its children will
have to be out of threshold.

407
00:25:03,880 --> 00:25:06,520
And then we insert or
delete within that child.

408
00:25:06,520 --> 00:25:09,130
And then that
propagates up to x.

409
00:25:09,130 --> 00:25:11,459
But right now, the children
are far within threshold.

410
00:25:11,459 --> 00:25:13,000
So the question is,
how long would it

411
00:25:13,000 --> 00:25:16,060
take for them to get
out of threshold again?

412
00:25:16,060 --> 00:25:18,610
Well, you'd have to change
the density by at least 1

413
00:25:18,610 --> 00:25:20,380
over an additive--

414
00:25:20,380 --> 00:25:23,800
1 over log N. If you
multiply by the size,

415
00:25:23,800 --> 00:25:26,664
it's the size of
the interval divided

416
00:25:26,664 --> 00:25:29,080
by log N. You've got to have
at least that many insertions

417
00:25:29,080 --> 00:25:29,800
or deletions.

418
00:25:32,650 --> 00:25:41,131
Before this node rebalances
again, one of its children

419
00:25:41,131 --> 00:25:42,130
must get out of balance.

420
00:25:47,890 --> 00:25:54,400
And so you must have
done at least the size

421
00:25:54,400 --> 00:26:03,460
of the interval
divided by theta log N

422
00:26:03,460 --> 00:26:10,570
updates for one of the children
to become out of balance again.

423
00:26:10,570 --> 00:26:11,890
Boom.

424
00:26:11,890 --> 00:26:14,290
So when this rebalance
happens again,

425
00:26:14,290 --> 00:26:18,350
we're going to charge to
those updates, which is good

426
00:26:18,350 --> 00:26:20,580
because the time it takes
us to do the rebalance

427
00:26:20,580 --> 00:26:21,860
is the size of the interval.

428
00:26:24,680 --> 00:26:30,340
We need to charge each of
these items log N times.

429
00:26:30,340 --> 00:26:43,130
So charge the
rebalance cost, which

430
00:26:43,130 --> 00:26:52,930
is the size of the
interval to these updates.

431
00:26:57,430 --> 00:26:58,930
And what we know
is that the updates

432
00:26:58,930 --> 00:27:00,070
are within the interval.

433
00:27:11,360 --> 00:27:16,460
So this looks like a log N
bound, which is not right.

434
00:27:16,460 --> 00:27:19,040
It should be a log
squared N bound.

435
00:27:19,040 --> 00:27:27,650
The idea is when we insert
into one of these leaves,

436
00:27:27,650 --> 00:27:32,180
we're simultaneously making this
node worse and this node worse

437
00:27:32,180 --> 00:27:34,760
and this node worse.

438
00:27:34,760 --> 00:27:36,500
Whenever we insert
a node, it belongs

439
00:27:36,500 --> 00:27:39,609
to log N intervals
that we care about.

440
00:27:39,609 --> 00:27:41,400
So in fact, not only
are we losing this log

441
00:27:41,400 --> 00:27:44,915
N, because there aren't quite
enough items to charge to--

442
00:27:44,915 --> 00:27:46,790
a log N factor less.

443
00:27:46,790 --> 00:27:50,960
We're also charging to each item
another factor of log N times

444
00:27:50,960 --> 00:27:53,898
because it lives in all
these different intervals.

445
00:28:03,610 --> 00:28:14,260
Each update gets charged at
most h, which is order log

446
00:28:14,260 --> 00:28:14,990
N times.

447
00:28:20,770 --> 00:28:24,250
We looked at what
happens for node x,

448
00:28:24,250 --> 00:28:26,440
but we have to apply this
argument simultaneously

449
00:28:26,440 --> 00:28:27,940
for all nodes x.

450
00:28:27,940 --> 00:28:30,250
Fortunately, this
node versus this node,

451
00:28:30,250 --> 00:28:31,750
they don't share
any descendants.

452
00:28:31,750 --> 00:28:33,650
And so there's no
multiple charging.

453
00:28:33,650 --> 00:28:36,005
But node x and its
parent and grandparent

454
00:28:36,005 --> 00:28:38,890
and all its
ancestors, they're all

455
00:28:38,890 --> 00:28:40,240
talking about the same nodes.

456
00:28:40,240 --> 00:28:41,740
And so they will
multiple charge,

457
00:28:41,740 --> 00:28:43,540
but only by a factor
of log N. That's

458
00:28:43,540 --> 00:28:46,870
something we've seen a few
times, charging log N times,

459
00:28:46,870 --> 00:28:49,870
for every node in the
tree, like range trees

460
00:28:49,870 --> 00:28:52,720
having N log N space
in two dimensions.

461
00:28:52,720 --> 00:28:55,470
Same deal.

462
00:28:55,470 --> 00:28:58,960
So we've got size of the
interval divided by log N guys

463
00:28:58,960 --> 00:29:02,530
to charge to, which we multiply
charge log N times, so we get

464
00:29:02,530 --> 00:29:04,290
a log squared amortized bound.

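In symbols, the two logs multiply out like this (a summary of the charging argument above, not a full proof):

```latex
\[
  \underbrace{\Theta(\log N)}_{\substack{\text{charge per update from one interval:}\\
      |I| \text{ cost spread over } |I|/\Theta(\log N) \text{ updates}}}
  \;\times\;
  \underbrace{O(\log N)}_{\substack{\text{ancestor intervals}\\
      \text{that charge each update}}}
  \;=\; O(\log^2 N)\ \text{amortized per update.}
\]
```
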
465
00:29:15,180 --> 00:29:20,490
So this log N is hard to avoid
because we have a binary tree

466
00:29:20,490 --> 00:29:22,890
that's pretty natural.

467
00:29:22,890 --> 00:29:28,050
This log N, essentially,
comes from this h.

468
00:29:28,050 --> 00:29:30,461
The fact that we can only
go from one constant factor

469
00:29:30,461 --> 00:29:30,960
to another.

470
00:29:30,960 --> 00:29:33,570
And we've got log N
different steps to make.

471
00:29:33,570 --> 00:29:37,860
We have to do a 1 over log
N increment every step.

472
00:29:37,860 --> 00:29:39,630
That's the best we could afford.

473
00:29:39,630 --> 00:29:45,880
That's why these are evenly
spaced out in this linear way.

474
00:29:45,880 --> 00:29:49,200
But if we had a little more
space, we could do better.

475
00:29:49,200 --> 00:29:52,194
So that's going to lead us to
this list labeling problem.

476
00:29:52,194 --> 00:29:53,610
But first, are
there any questions

477
00:29:53,610 --> 00:29:54,818
about ordered file maintenance?

478
00:29:54,818 --> 00:29:56,370
At this point, we are done.

479
00:29:56,370 --> 00:29:57,034
Yeah.

480
00:29:57,034 --> 00:29:59,492
AUDIENCE: Can you explain again
how is it that you get from

481
00:29:59,492 --> 00:30:02,547
the size of the interval
[INAUDIBLE] and that each--

482
00:30:02,547 --> 00:30:03,880
ERIK DEMAINE: This amortization?

483
00:30:03,880 --> 00:30:05,899
AUDIENCE: Yeah, how you
got to the amortized.

484
00:30:05,899 --> 00:30:06,690
ERIK DEMAINE: Yeah.

485
00:30:06,690 --> 00:30:07,689
So let me explain again.

486
00:30:07,689 --> 00:30:10,710
So when we do a rebalance
of an interval,

487
00:30:10,710 --> 00:30:12,252
the cost is the size
of the interval.

488
00:30:12,252 --> 00:30:14,543
We're trying to analyze, what
is the size of the interval?

489
00:30:14,543 --> 00:30:16,270
Prove that it is log
squared N. So we have

490
00:30:16,270 --> 00:30:18,420
this cost of size of interval.

491
00:30:18,420 --> 00:30:23,430
We're charging it to the
items which just got inserted

492
00:30:23,430 --> 00:30:25,350
or deleted into that interval.

493
00:30:25,350 --> 00:30:30,000
Before this node rebalances
again, but in general,

494
00:30:30,000 --> 00:30:31,950
we're interested in the--

495
00:30:31,950 --> 00:30:34,710
we can afford to rebalance
every node at the beginning.

496
00:30:34,710 --> 00:30:38,010
And then whenever
a node rebalances--

497
00:30:38,010 --> 00:30:41,040
before it rebalances,
one of its children

498
00:30:41,040 --> 00:30:42,390
had to be out of whack.

499
00:30:42,390 --> 00:30:46,380
For one of its children
to be out of whack,

500
00:30:46,380 --> 00:30:47,880
there had to have
been an insertion

501
00:30:47,880 --> 00:30:52,740
of at least the size of the
interval divided by log N,

502
00:30:52,740 --> 00:30:54,432
because log N was h.

503
00:30:54,432 --> 00:30:55,890
There's a slight
discrepancy, here.

504
00:30:55,890 --> 00:30:57,570
We're talking about
the size of the parent

505
00:30:57,570 --> 00:30:59,528
interval versus the size
of the child interval,

506
00:30:59,528 --> 00:31:01,560
but that's just a factor of 2.

507
00:31:01,560 --> 00:31:04,380
So that's incorporated
by this theta.

508
00:31:04,380 --> 00:31:07,830
So you could have a little
theta here, too, if you like.

509
00:31:07,830 --> 00:31:09,960
OK, so for the child
to be out of whack,

510
00:31:09,960 --> 00:31:12,870
we had to have done updates of
the interval size divided by log N.

511
00:31:12,870 --> 00:31:15,610
So we charge this cost to them.

512
00:31:15,610 --> 00:31:18,690
And so we have to charge log
N to each of those items,

513
00:31:18,690 --> 00:31:20,644
each of those updates.

514
00:31:20,644 --> 00:31:22,060
And then there's
a second problem,

515
00:31:22,060 --> 00:31:26,190
which is everybody gets charged
by all of its ancestors.

516
00:31:26,190 --> 00:31:28,930
And it has log N ancestors.

517
00:31:28,930 --> 00:31:32,380
So in all, each update gets
charged at most log squared N

518
00:31:32,380 --> 00:31:33,170
times.

519
00:31:33,170 --> 00:31:36,130
So you get amortized
log squared per update.

520
00:31:41,400 --> 00:31:44,710
Other questions?

521
00:31:44,710 --> 00:31:45,310
Cool.

522
00:31:45,310 --> 00:31:46,480
So that's ordered files.

523
00:31:46,480 --> 00:31:48,620
Now we have b-trees.

524
00:31:48,620 --> 00:31:53,230
We can handle this log squared
N. That was OK for b-trees,

525
00:31:53,230 --> 00:31:54,910
using a layer of indirection.

526
00:31:54,910 --> 00:31:59,470
But it's natural to wonder
whether you can do better.

527
00:31:59,470 --> 00:32:01,990
In general, the conjecture
is that for ordered files,

528
00:32:01,990 --> 00:32:03,220
you cannot do better.

529
00:32:03,220 --> 00:32:05,740
If this is your problem
setup, if you can only

530
00:32:05,740 --> 00:32:08,920
have constant-sized gaps
and a linear-size array--

531
00:32:08,920 --> 00:32:11,110
you need log squared
N interval updates,

532
00:32:11,110 --> 00:32:13,320
but no one's proved
that lower bound.

533
00:32:13,320 --> 00:32:15,670
So that's one open question.

534
00:32:15,670 --> 00:32:19,000
Another fun fact is
we did amortized,

535
00:32:19,000 --> 00:32:23,080
but it's actually
possible to do worst case.

536
00:32:23,080 --> 00:32:24,250
It's complicated.

537
00:32:24,250 --> 00:32:31,690
Willard did it in 1992.

538
00:32:31,690 --> 00:32:34,330
But let's talk about
relaxing the problem.

539
00:32:34,330 --> 00:32:37,230
So instead of saying, well, the
array has to be linear size,

540
00:32:37,230 --> 00:32:39,090
what if we let it be bigger?

541
00:32:39,090 --> 00:32:40,440
And then you can do better.

542
00:32:44,890 --> 00:32:47,830
And this is a problem
called list labeling.

543
00:32:53,740 --> 00:32:57,870
So I'm going to rephrase it,
but it's essentially the same

544
00:32:57,870 --> 00:32:59,550
as ordered file maintenance.

545
00:32:59,550 --> 00:33:04,959
It's a little bit
less restrictive.

546
00:33:34,310 --> 00:33:36,170
So it's a dynamic
linked list problem,

547
00:33:36,170 --> 00:33:37,580
like we've seen before.

548
00:33:37,580 --> 00:33:48,580
And each node at all
times stores a label such

549
00:33:48,580 --> 00:33:59,290
that labels are always
monotone down the list.

550
00:34:02,650 --> 00:34:04,130
So think of it this way.

551
00:34:04,130 --> 00:34:06,070
We have a linked list.

552
00:34:11,355 --> 00:34:13,730
And all we're interested in is
maintaining this linked list.

553
00:34:13,730 --> 00:34:16,780
So we want to be able to
say, OK, delete this item

554
00:34:16,780 --> 00:34:19,030
and update these pointers.

555
00:34:19,030 --> 00:34:23,290
Or maybe, insert a
new item over here.

556
00:34:23,290 --> 00:34:25,330
And so it's going to
be linked like that.

557
00:34:25,330 --> 00:34:28,389
And at all times in
this cell here, we're

558
00:34:28,389 --> 00:34:35,274
storing a number, 3, 7,
12, 14, 42, whatever.

559
00:34:35,274 --> 00:34:38,739
It could be any
integer, let's say.

560
00:34:38,739 --> 00:34:40,790
And we need that the
numbers are in increasing order

561
00:34:40,790 --> 00:34:43,239
down the linked list.

562
00:34:43,239 --> 00:34:47,000
Now, I claim this is basically
the same as an ordered file

563
00:34:47,000 --> 00:34:48,130
problem.

564
00:34:48,130 --> 00:34:50,620
Just think of this number as
being the index in the array

565
00:34:50,620 --> 00:34:52,250
where you store it.

566
00:34:52,250 --> 00:34:54,502
So this should be
strictly monotone--

567
00:34:57,280 --> 00:35:03,670
I guess, increasing
down the list.

568
00:35:03,670 --> 00:35:05,650
That means none of these
numbers are the same.

569
00:35:05,650 --> 00:35:08,350
So we can store the items--

570
00:35:08,350 --> 00:35:10,930
whatever data is
associated with this node--

571
00:35:10,930 --> 00:35:13,720
we can store that in
the array position 3,

572
00:35:13,720 --> 00:35:16,690
in the array position 7,
in the array position 12.

573
00:35:16,690 --> 00:35:22,114
When we insert, we have to find
a new label between 3 and 7.

574
00:35:22,114 --> 00:35:24,280
Now we're allowed to change
the labels dynamically--

575
00:35:24,280 --> 00:35:25,690
that's what makes
this possible--

576
00:35:25,690 --> 00:35:29,410
which corresponds to
moving items in the array.

577
00:35:29,410 --> 00:35:34,420
And the only difference is about
how this is physically done.

578
00:35:34,420 --> 00:35:37,150
With ordered files, you have
to physically move items

579
00:35:37,150 --> 00:35:37,900
in the array.

580
00:35:37,900 --> 00:35:43,030
Here, you just change a number
and that's moving the item.

581
00:35:43,030 --> 00:35:46,210
Where this gets interesting is
if you allow the label space--

582
00:35:46,210 --> 00:35:48,850
which is the storage
space of the array--

583
00:35:48,850 --> 00:35:50,770
to become super linear.

584
00:35:50,770 --> 00:35:53,650
With ordered files, it doesn't
really make a lot of sense

585
00:35:53,650 --> 00:35:56,530
to go super linear, at least,
not by more than a couple log

586
00:35:56,530 --> 00:36:00,340
factors because time is
always at least space.

587
00:36:00,340 --> 00:36:03,130
If you have a giant array,
you have to initialize it.

588
00:36:03,130 --> 00:36:05,380
And you can't afford to
initialize an array of, say,

589
00:36:05,380 --> 00:36:08,960
N squared size, if you're
trying to maintain N items.

590
00:36:08,960 --> 00:36:10,420
But in list labeling, you can.

591
00:36:10,420 --> 00:36:14,380
I mean, if you say all these
numbers are between 0 and N

592
00:36:14,380 --> 00:36:17,500
squared, that's no
big deal because you

593
00:36:17,500 --> 00:36:21,490
can represent a number up to
N squared, which is two log N

594
00:36:21,490 --> 00:36:22,400
bits.

595
00:36:22,400 --> 00:36:25,207
So squaring the space is no
big deal for list labeling.

596
00:36:25,207 --> 00:36:27,290
It would be a big deal for
ordered file maintenance,

597
00:36:27,290 --> 00:36:28,600
so that's why we rephrase.

598
00:36:28,600 --> 00:36:31,840
But they're basically
the same problem--

599
00:36:31,840 --> 00:36:35,140
just a little less useful
for cache-oblivious stuff.

600
00:36:42,740 --> 00:36:45,089
Let me tell you what's
known about list labeling.

601
00:36:56,825 --> 00:37:11,220
In terms of the amount of
label space you're given,

602
00:37:11,220 --> 00:37:13,810
and how good a running
time per update

603
00:37:13,810 --> 00:37:17,460
we can get, in terms
of best known results.

604
00:37:17,460 --> 00:37:21,390
So what we just saw
is that if you do--

605
00:37:21,390 --> 00:37:22,804
we just said linear space.

606
00:37:22,804 --> 00:37:24,720
But in fact, you can get
1 plus epsilon space.

607
00:37:24,720 --> 00:37:28,380
You can say, oh, I wouldn't want
to waste 1% of my storage space

608
00:37:28,380 --> 00:37:29,560
in the array.

609
00:37:29,560 --> 00:37:32,370
And if you set the
theta at the leaves,

610
00:37:32,370 --> 00:37:36,612
here, to the right value,
then you can just waste 1%

611
00:37:36,612 --> 00:37:38,070
and still maintain
an ordered file.

612
00:37:38,070 --> 00:37:40,910
I think this is cool from
a file system perspective.

613
00:37:40,910 --> 00:37:44,790
But in particular, it
gives us what we need.

614
00:37:44,790 --> 00:37:47,940
And if you go up
to N log N space,

615
00:37:47,940 --> 00:37:50,490
it doesn't seem
to help you much.

616
00:37:50,490 --> 00:37:54,015
The best we know is log
squared N. As I mentioned,

617
00:37:54,015 --> 00:37:58,830
it could be amortized
or worst case.

618
00:37:58,830 --> 00:38:01,520
If you bump up the space to
N to the 1 plus epsilon--

619
00:38:01,520 --> 00:38:04,056
so a little bit super linear--

620
00:38:04,056 --> 00:38:10,122
and anything polynomial, then
the best we know is log N.

621
00:38:10,122 --> 00:38:11,580
And there is actually
a lower bound

622
00:38:11,580 --> 00:38:15,130
for this result in
a particular model.

623
00:38:15,130 --> 00:38:18,960
So it seems pretty clear that--

624
00:38:18,960 --> 00:38:21,630
at least for this style
of data structure,

625
00:38:21,630 --> 00:38:24,150
the best you can do in
this situation is log N.

626
00:38:24,150 --> 00:38:27,370
But hey, log N is
better than log squared.

627
00:38:27,370 --> 00:38:31,050
And the other obvious bound is
if you have exponential space,

628
00:38:31,050 --> 00:38:34,122
you can do constant,
essentially.

629
00:38:34,122 --> 00:38:35,580
Because with
exponential space, you

630
00:38:35,580 --> 00:38:37,110
can just keep
bisecting the interval

631
00:38:37,110 --> 00:38:39,630
between two items
in constant time

632
00:38:39,630 --> 00:38:41,487
until you've inserted N items.

633
00:38:41,487 --> 00:38:43,570
And then you can rebuild
the whole data structure.

634
00:38:43,570 --> 00:38:46,560
So that's sort of
the trivial result.

635
00:38:46,560 --> 00:38:49,590
If you don't care about
how big these labels get--

636
00:38:49,590 --> 00:38:52,151
and how big they would
get is 2 to the N--

637
00:38:52,151 --> 00:38:53,400
then you can do constant time.

638
00:38:53,400 --> 00:38:55,690
That's really the only way
we know how to do constant.

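A minimal sketch of that trivial scheme (the class and the finite stand-in for exponential space are mine):

```python
class BisectLabeler:
    # List labeling with a huge label space: a new label bisects the gap
    # between its neighbors; when a gap closes, rebuild all labels evenly.
    def __init__(self, space=1 << 60):       # stands in for 2-to-the-N space
        self.space = space
        self.labels = []                      # labels, in list order

    def insert(self, rank):
        """Insert a node at position rank; return its new label."""
        lo = self.labels[rank - 1] if rank > 0 else 0
        hi = self.labels[rank] if rank < len(self.labels) else self.space
        if hi - lo < 2:                       # no integer strictly between
            step = self.space // (len(self.labels) + 2)
            self.labels = [step * (k + 1)     # rebuild: spread out evenly
                           for k in range(len(self.labels))]
            return self.insert(rank)
        label = (lo + hi) // 2                # bisect the gap
        self.labels.insert(rank, label)
        return label
```

With space 2 to the N, the rebuild only triggers after about N insertions, so each insert is constant time, essentially.
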
639
00:38:58,650 --> 00:39:00,480
So the interesting
new result, here,

640
00:39:00,480 --> 00:39:03,360
that I'm talking about
is for polynomial space,

641
00:39:03,360 --> 00:39:05,370
we can get log N.

642
00:39:05,370 --> 00:39:09,260
And the rough idea is you just
fix these density intervals.

643
00:39:09,260 --> 00:39:12,480
Now you don't have to make it--

644
00:39:12,480 --> 00:39:15,090
I mean, you're still going
to be divided by h, here,

645
00:39:15,090 --> 00:39:17,530
but your spread
can be much bigger.

646
00:39:17,530 --> 00:39:19,680
So your densities no longer
have to be constants.

647
00:39:19,680 --> 00:39:21,720
Now we can afford a density--

648
00:39:21,720 --> 00:39:25,050
near the root, we can get
a density of like 1 over--

649
00:39:25,050 --> 00:39:27,660
actually, anywhere, we
can afford a density of 1

650
00:39:27,660 --> 00:39:31,980
over N. Because if we have N
squared slots and only N items

651
00:39:31,980 --> 00:39:35,810
to put in them, then a decent
density is 1 over N, in fact.

652
00:39:35,810 --> 00:39:37,340
It could also be constant.

653
00:39:37,340 --> 00:39:39,330
Constant would be all right.

654
00:39:39,330 --> 00:39:42,180
So we've got a big spread there,
from 1 over N to constant.

655
00:39:42,180 --> 00:39:45,260
And so we can afford to
take much bigger jumps here,

656
00:39:45,260 --> 00:39:53,650
of like 1 over N. And so that
gets rid of this log N factor,

657
00:39:53,650 --> 00:39:55,650
essentially.

658
00:39:55,650 --> 00:39:58,450
That was the rough sketch.

659
00:39:58,450 --> 00:40:05,950
I have written, here,
that the densities we use

660
00:40:05,950 --> 00:40:08,060
are no longer uniformly spaced.

661
00:40:08,060 --> 00:40:11,110
They're 1 over alpha to the d.

662
00:40:11,110 --> 00:40:19,600
Alpha, here, is some constant
in the interval between 1 and 2.

663
00:40:19,600 --> 00:40:21,070
And d is the depth.

664
00:40:21,070 --> 00:40:24,130
So now we have exponentially
increasing densities

665
00:40:24,130 --> 00:40:28,660
which give you a big gap--

666
00:40:28,660 --> 00:40:32,320
no longer lose
this log N factor.

667
00:40:32,320 --> 00:40:34,990
So you do exactly the
same data structure,

668
00:40:34,990 --> 00:40:36,650
just with these different densities.

669
00:40:36,650 --> 00:40:39,010
Now you've got room to
fill in a whole bunch

670
00:40:39,010 --> 00:40:42,840
more densities when you
have, say, N squared space.

671
00:40:42,840 --> 00:40:46,420
And so you only get the log N
factor because of the number

672
00:40:46,420 --> 00:40:47,471
of ancestors of a node.

673
00:40:47,471 --> 00:40:48,970
And you lose the
other log N factor.

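As a sketch of the change: the lecture writes the thresholds as 1 over alpha to the d; in the snippet below I index by height above the leaves, which gives the same geometric decay (alpha is an assumed constant strictly between 1 and 2).

```python
def relaxed_threshold(height, alpha=1.5):
    # With polynomial label space, densities may range from about 1/N up
    # to a constant, so the upper thresholds can shrink geometrically per
    # level instead of by 1/(4h): leaves (height 0) may be totally full,
    # and the root's threshold is polynomially small, which is fine with,
    # say, N**2 slots for N items.
    return alpha ** (-height)
```

Adjacent levels now differ by a constant factor alpha rather than an additive 1 over 4h, so a freshly rebalanced child sits a constant fraction below its own threshold, and the 1-over-log-N charging loss disappears.
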
674
00:40:55,470 --> 00:40:59,640
Now let me tell you
about another problem

675
00:40:59,640 --> 00:41:00,620
building on this.

676
00:41:11,672 --> 00:41:15,324
So this is the list order
maintenance problem.

677
00:41:15,324 --> 00:41:17,240
And this is the problem
we saw in Lecture 1.

678
00:41:22,520 --> 00:41:32,540
So here, same as before,
maintain a linked list,

679
00:41:32,540 --> 00:41:40,880
subject to, insert a
node at this location,

680
00:41:40,880 --> 00:41:46,835
delete a node at this
location, and order queries.

681
00:41:53,720 --> 00:42:00,110
Is node x before
node y in the list?

682
00:42:03,110 --> 00:42:06,410
This is what we needed to
support full persistence.

683
00:42:06,410 --> 00:42:07,535
You have a big linked list.

684
00:42:14,430 --> 00:42:16,760
And then if I give you
this node and this node,

685
00:42:16,760 --> 00:42:18,950
I want to know whether this
node is before that node

686
00:42:18,950 --> 00:42:21,710
in the list in constant time.

687
00:42:21,710 --> 00:42:25,250
I claim that we can solve this
problem given our solutions

688
00:42:25,250 --> 00:42:28,250
to list labeling--

689
00:42:28,250 --> 00:42:31,840
not so obvious, how.

690
00:42:31,840 --> 00:42:33,784
List labeling is
great because it lets

691
00:42:33,784 --> 00:42:34,950
you do order queries, right?

692
00:42:34,950 --> 00:42:37,130
You just compare the two
labels and you instantly

693
00:42:37,130 --> 00:42:39,680
discover which is before which.

694
00:42:39,680 --> 00:42:41,840
And so if we could afford
log N time over there,

695
00:42:41,840 --> 00:42:43,820
we could just use this solution.

696
00:42:43,820 --> 00:42:45,420
And this is reasonable.

697
00:42:45,420 --> 00:42:48,140
But we really want constant
time per operation.

698
00:42:48,140 --> 00:42:49,880
Now if we do constant
time per operation

699
00:42:49,880 --> 00:42:52,410
and we use exponential
label space,

700
00:42:52,410 --> 00:42:54,310
this is not so good
because it means you need

701
00:42:54,310 --> 00:42:56,540
N bits to write down a label.

702
00:42:56,540 --> 00:42:58,940
So it's going to take,
like, linear time to modify

703
00:42:58,940 --> 00:43:02,000
or to compare two labels.

704
00:43:02,000 --> 00:43:04,540
This doesn't save you.

705
00:43:04,540 --> 00:43:05,870
We can afford to do this.

706
00:43:05,870 --> 00:43:08,900
This is only log
N bits per label.

707
00:43:08,900 --> 00:43:10,760
And we assume all
of our integers

708
00:43:10,760 --> 00:43:13,190
can store at least log N bits--

709
00:43:13,190 --> 00:43:16,740
something called
the Word RAM model.

710
00:43:16,740 --> 00:43:19,970
And so we can afford
to store these labels,

711
00:43:19,970 --> 00:43:22,140
but we pay this log N time.

712
00:43:22,140 --> 00:43:24,560
So you've got to
remove a log N factor.

713
00:43:24,560 --> 00:43:25,750
How do we do that?

714
00:43:25,750 --> 00:43:28,610
Indirection-- just
like last class.

715
00:43:34,230 --> 00:43:35,710
Let's do that on this board.

716
00:43:48,860 --> 00:43:55,670
On the top, we're going to
store N over log N items

717
00:43:55,670 --> 00:44:11,680
using this list labeling with
label space, say, N squared.

718
00:44:11,680 --> 00:44:13,420
Any polynomial will do.

719
00:44:13,420 --> 00:44:15,560
And so this takes
log N per operation

720
00:44:15,560 --> 00:44:18,480
to do anything on these
N over log N items.

721
00:44:18,480 --> 00:44:20,030
And then at the
bottom, we have lots

722
00:44:20,030 --> 00:44:23,000
of little structures
of size log N.

723
00:44:23,000 --> 00:44:25,910
And that's supposed to eat up
a factor of log N in our update

724
00:44:25,910 --> 00:44:27,130
time if we do it right.

725
00:44:35,030 --> 00:44:38,670
Actually, I can just do list
labeling down here as well.

726
00:44:38,670 --> 00:44:41,840
So if I'm only
storing log N items,

727
00:44:41,840 --> 00:44:44,960
then I can afford to use
the exponential solution--

728
00:44:44,960 --> 00:44:48,770
the trivial thing where I just
bisect all the labels until all

729
00:44:48,770 --> 00:44:50,480
the log N items have changed.

730
00:44:50,480 --> 00:44:51,680
Then, rewrite them.

731
00:44:51,680 --> 00:44:56,000
Because 2 to the log N is only
N. So here, in each of these,

732
00:44:56,000 --> 00:45:04,370
I do a list labeling with
space 2 to the log N,

733
00:45:04,370 --> 00:45:06,800
also known as N.

734
00:45:06,800 --> 00:45:10,920
So these guys are constant
time to do anything

735
00:45:10,920 --> 00:45:12,500
to maintain the labels.

736
00:45:12,500 --> 00:45:14,420
And to maintain labels up here--

737
00:45:14,420 --> 00:45:17,750
so to maintain, basically, each
of these N over log N guys,

738
00:45:17,750 --> 00:45:22,210
each one is a representative element
representing this entire group

739
00:45:22,210 --> 00:45:24,050
and there are N over log N of these groups.

740
00:45:27,320 --> 00:45:28,790
So this label
structure is supposed

741
00:45:28,790 --> 00:45:30,770
to distinguish the
different groups.

742
00:45:30,770 --> 00:45:34,340
And then the labels in here
are distinguishing the items

743
00:45:34,340 --> 00:45:35,270
within the group.

744
00:45:35,270 --> 00:45:41,620
Now this costs log N amortized
to keep what we need up here.

745
00:45:41,620 --> 00:45:44,180
But now if I want
the label of an item,

746
00:45:44,180 --> 00:45:49,030
I just take this label,
comma, this label.

747
00:45:49,030 --> 00:45:52,290
So if I have an item
here, for example,

748
00:45:52,290 --> 00:45:54,620
at first, I look at
the label of this block

749
00:45:54,620 --> 00:45:57,000
as stored by this
data structure.

750
00:45:57,000 --> 00:45:59,180
And then I look at the
label within the block.

751
00:45:59,180 --> 00:46:05,690
And so my composite
label is going

752
00:46:05,690 --> 00:46:12,020
to be an ordered pair of top
label, comma, the bottom label.

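A tiny sketch of the composite comparison (the field names are mine):

```python
from dataclasses import dataclass

@dataclass
class Node:
    top: int      # label of the node's block in the top structure
    bottom: int   # label within the block's own little labeler

def precedes(x: Node, y: Node) -> bool:
    # O(1) order query: compare composite labels lexicographically.
    return (x.top, x.bottom) < (y.top, y.bottom)
```
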
753
00:46:18,017 --> 00:46:19,600
This is kind of funny
because it looks

754
00:46:19,600 --> 00:46:22,030
like we're solving the list
labeling problem again,

755
00:46:22,030 --> 00:46:25,030
with now a space of N cubed.

756
00:46:25,030 --> 00:46:27,760
We've got N squared for
the first coordinate,

757
00:46:27,760 --> 00:46:29,471
and then N for the
second coordinate.

758
00:46:29,471 --> 00:46:30,970
So if you just
concatenate those two

759
00:46:30,970 --> 00:46:35,740
labels, that lives in a bigger
label space of size N cubed.

760
00:46:35,740 --> 00:46:38,520
And yet, I claim this
takes constant amortized time

761
00:46:38,520 --> 00:46:40,750
and is not a solution
to list labeling.

762
00:46:40,750 --> 00:46:44,560
It's just a matter of, again,
how updates are performed.

763
00:46:44,560 --> 00:46:47,260
With ordered file maintenance, we
had to physically move items.

764
00:46:47,260 --> 00:46:49,330
With the list labeling
problem, we had

765
00:46:49,330 --> 00:46:52,450
to modify the number
stored with each node.

766
00:46:52,450 --> 00:46:54,010
That's expensive.

767
00:46:54,010 --> 00:46:58,570
In this world, if we
change a label up here,

768
00:46:58,570 --> 00:47:00,970
we are basically
instantly-- say,

769
00:47:00,970 --> 00:47:04,450
we changed the label
corresponding to this group

770
00:47:04,450 --> 00:47:05,500
in here.

771
00:47:05,500 --> 00:47:09,070
We change the labels of all
of these items in one operation.

772
00:47:09,070 --> 00:47:11,170
Or actually, it
takes log N time.

773
00:47:11,170 --> 00:47:14,060
And then we change the label
of all log N of these items.

774
00:47:14,060 --> 00:47:17,080
So that's why we get
constant amortized.

775
00:47:17,080 --> 00:47:18,440
So how does this work?

776
00:47:18,440 --> 00:47:20,230
If we insert a new
item, we stick it

777
00:47:20,230 --> 00:47:22,240
into one of these blocks.

778
00:47:22,240 --> 00:47:23,920
The block that it fits into.

779
00:47:23,920 --> 00:47:27,850
If this block gets too full,
more than, say, 1 times log N,

780
00:47:27,850 --> 00:47:29,260
then we split it in half.

781
00:47:29,260 --> 00:47:32,410
If it gets too sparse by
deletion, say, less than a 1/4

782
00:47:32,410 --> 00:47:35,080
log N, then we'll merge it
with one of its neighbors,

783
00:47:35,080 --> 00:47:36,550
and then possibly split.

784
00:47:36,550 --> 00:47:38,800
Each of those triggers a
constant number of operations

785
00:47:38,800 --> 00:47:39,900
up here.

786
00:47:39,900 --> 00:47:41,920
And so we pay log N,
but we only pay it

787
00:47:41,920 --> 00:47:45,850
when we've made theta log N
changes to one of these blocks.

788
00:47:45,850 --> 00:47:49,150
So we can charge this log N
cost to those log N updates.

789
00:47:49,150 --> 00:47:50,770
And so this turns into constant.

790
00:47:50,770 --> 00:47:52,670
This down here is
always constant.

791
00:47:52,670 --> 00:47:54,671
And so it's constant amortized.
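
To make the two-level scheme concrete, here is a minimal Python sketch of the structure just described. Everything in it is illustrative, not from the lecture: the names (OrderMaintenance, Node) are invented, groups hold about log N items, bottom labels use the trivial bisect-and-rewrite solution over space N, and _relabel_tops is a simplified stand-in for the O(log N) list-labeling black box on the top level (it just respreads all group labels over space N squared).

import math

class Node:
    # one item in the ordered list; knows its group and its bottom label
    __slots__ = ("group", "blabel", "value")
    def __init__(self, value=None):
        self.group = None
        self.blabel = 0
        self.value = value

class OrderMaintenance:
    def __init__(self, capacity=1024):
        self.gsize = max(4, math.ceil(math.log2(capacity)))  # ~log N per group
        self.top_space = capacity ** 2   # top label space: N^2 (any polynomial)
        self.bot_space = capacity        # bottom label space: 2^(log N) = N
        first = Node()
        g = {"tlabel": self.top_space // 2, "items": [first]}
        first.group = g
        first.blabel = self.bot_space // 2
        self.groups = [g]                # the groups, in list order
        self.head = first

    def label(self, x):
        # composite label: (group's top label, item's bottom label)
        return (x.group["tlabel"], x.blabel)

    def precedes(self, x, y):            # O(1) order query
        return self.label(x) < self.label(y)

    def _relabel_group(self, g):
        # trivial "exponential" solution: respread the group over [0, N)
        step = self.bot_space // (len(g["items"]) + 1)
        for i, it in enumerate(g["items"], 1):
            it.blabel = i * step

    def _relabel_tops(self):
        # stand-in for the O(log N) list-labeling black box on the top level
        step = self.top_space // (len(self.groups) + 1)
        for i, g in enumerate(self.groups, 1):
            g["tlabel"] = i * step

    def insert_after(self, x, value=None):
        g, items = x.group, x.group["items"]
        i = items.index(x)               # groups are small: O(log N) scan
        y = Node(value)
        y.group = g
        items.insert(i + 1, y)
        lo = x.blabel
        hi = items[i + 2].blabel if i + 2 < len(items) else self.bot_space
        if hi - lo >= 2:
            y.blabel = (lo + hi) // 2    # bisect the gap between neighbors
        else:
            self._relabel_group(g)       # out of gaps: rewrite the group
        if len(items) > 2 * self.gsize:  # group too full: split it in half
            half = items[len(items) // 2:]
            del items[len(items) // 2:]
            ng = {"tlabel": 0, "items": half}
            for it in half:
                it.group = ng
            j = self.groups.index(g)
            self.groups.insert(j + 1, ng)
            lo = g["tlabel"]
            hi = (self.groups[j + 2]["tlabel"]
                  if j + 2 < len(self.groups) else self.top_space)
            if hi - lo >= 2:
                ng["tlabel"] = (lo + hi) // 2
            else:
                self._relabel_tops()     # the amortized log N black-box charge
            self._relabel_group(g)
            self._relabel_group(ng)
        return y

om = OrderMaintenance()
a = om.head
b = om.insert_after(a, "b")
c = om.insert_after(a, "c")              # order is now a, c, b
assert om.precedes(a, c) and om.precedes(c, b)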

792
00:48:02,530 --> 00:48:05,410
And if each of these
blocks remembers

793
00:48:05,410 --> 00:48:07,480
what the corresponding--

794
00:48:07,480 --> 00:48:09,370
and, basically, this
is a linked list, here.

795
00:48:09,370 --> 00:48:10,994
And then we have
linked list down here,

796
00:48:10,994 --> 00:48:12,400
and they have labels.

797
00:48:12,400 --> 00:48:15,940
And it's like a, what do you
call, a skip list with just two

798
00:48:15,940 --> 00:48:17,110
levels.

799
00:48:17,110 --> 00:48:18,820
And every node
here just remembers

800
00:48:18,820 --> 00:48:20,527
what its parent is up here.

801
00:48:20,527 --> 00:48:22,360
So if you want to know
your composite label,

802
00:48:22,360 --> 00:48:25,660
you just look at the bottom
label and then walk up.

803
00:48:25,660 --> 00:48:27,010
Look at the top label.

804
00:48:27,010 --> 00:48:29,140
Those are just stored
right there in the nodes.

805
00:48:29,140 --> 00:48:31,348
And so in constant time,
you can find your top label,

806
00:48:31,348 --> 00:48:32,200
your bottom label.

807
00:48:32,200 --> 00:48:35,210
Therefore, you can compare
two items in constant time.

808
00:48:35,210 --> 00:48:37,380
So this solves the problem
we needed in Lecture 1.

809
00:48:37,380 --> 00:48:37,879
Question?

810
00:48:37,879 --> 00:48:41,708
AUDIENCE: Sorry, why isn't it
the same as the N cubed label

811
00:48:41,708 --> 00:48:42,210
space?

812
00:48:42,210 --> 00:48:42,918
ERIK DEMAINE: OK.

813
00:48:42,918 --> 00:48:45,880
Why is it not the same as N cubed
label space, which I claim

814
00:48:45,880 --> 00:48:47,770
has a lower bound of log N?

815
00:48:47,770 --> 00:48:50,650
The difference is
with list labeling,

816
00:48:50,650 --> 00:48:53,590
you have to explicitly change
the label of each item.

817
00:48:53,590 --> 00:48:55,990
And here, we're
basically saying that computing

818
00:48:55,990 --> 00:48:58,492
the label of a node becomes
a constant-time algorithm.

819
00:48:58,492 --> 00:48:59,950
We're allowed to
look at the label,

820
00:48:59,950 --> 00:49:02,014
here, then walk up, then
look at the label, here.

821
00:49:02,014 --> 00:49:03,430
And by changing
the label up here,

822
00:49:03,430 --> 00:49:06,550
we change it simultaneously
for log N guys down there.

823
00:49:06,550 --> 00:49:09,400
So that's the big difference
between this list order

824
00:49:09,400 --> 00:49:13,000
maintenance problem from
the list labeling problem.

825
00:49:13,000 --> 00:49:15,010
These were actually all
solved in the same paper

826
00:49:15,010 --> 00:49:16,540
by Dietz and Slater.

827
00:49:16,540 --> 00:49:20,200
But sort of successively--

828
00:49:20,200 --> 00:49:21,731
slightly relaxing
the problem makes

829
00:49:21,731 --> 00:49:23,980
a huge difference in the
running time you can achieve.

830
00:49:23,980 --> 00:49:26,021
Obviously, we can't get
any better than constant.

831
00:49:26,021 --> 00:49:28,407
So we're done in--

832
00:49:28,407 --> 00:49:30,490
of course, then there's
external memory versions--

833
00:49:30,490 --> 00:49:36,350
but in terms of regular
data structures.

834
00:49:36,350 --> 00:49:41,200
And again, Willard made
it worst case constant.

835
00:49:41,200 --> 00:49:43,800
That's a lot harder.

836
00:49:43,800 --> 00:49:46,661
Other questions?

837
00:49:46,661 --> 00:49:47,160
Cool.

838
00:49:47,160 --> 00:49:52,260
So that does ordered file
maintenance and list labeling.

839
00:49:52,260 --> 00:49:55,210
And so next, we're going to
move to a very different data

840
00:49:55,210 --> 00:49:57,520
structure, which is
cache-oblivious priority queue.

841
00:50:09,912 --> 00:50:11,870
We haven't really done
any cache-oblivious data

842
00:50:11,870 --> 00:50:12,536
structures, yet.

843
00:50:12,536 --> 00:50:15,710
So it's time to return
to our original goal.

844
00:50:19,640 --> 00:50:21,820
We're not actually going
to use ordered files.

845
00:50:21,820 --> 00:50:23,800
Sadly, that was last lecture.

846
00:50:26,644 --> 00:50:30,550
So all of this was a
continuation of last lecture.

847
00:50:30,550 --> 00:50:32,770
Now we're going to do a
different data structure.

848
00:50:32,770 --> 00:50:34,270
It's going to adapt
to B, it's going

849
00:50:34,270 --> 00:50:36,170
to adapt to M. It's
cache-oblivious.

850
00:50:36,170 --> 00:50:38,080
And it achieves priority queue.

851
00:50:38,080 --> 00:50:40,120
Now remember, this
sorting bound--

852
00:50:44,830 --> 00:50:49,120
N over B log base M over B of N over
B. This is our sorting bound.

853
00:50:49,120 --> 00:50:53,270
Then the priority queue bound
we want is this divided by N.

854
00:50:53,270 --> 00:50:57,520
So we want to be able to
do 1 over B log base M

855
00:50:57,520 --> 00:51:01,990
over B of N over B.

856
00:51:01,990 --> 00:51:05,920
Insert and delete
min, let's say.
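
For reference, here are those two bounds written out (per-operation costs in memory transfers):

\[
\mathrm{sort}(N) = \Theta\!\left(\frac{N}{B}\,\log_{M/B}\frac{N}{B}\right),
\qquad
\text{insert, delete-min} = O\!\left(\frac{1}{B}\,\log_{M/B}\frac{N}{B}\right)
\text{ amortized.}
\]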

857
00:51:05,920 --> 00:51:08,290
And this is an interesting
bound because it's usually

858
00:51:08,290 --> 00:51:10,060
much less than 1.

859
00:51:10,060 --> 00:51:11,990
It's a sub constant bound.

860
00:51:11,990 --> 00:51:14,740
This is going to
be a little o of 1,

861
00:51:14,740 --> 00:51:20,530
assuming this log is
smaller than this B.

862
00:51:20,530 --> 00:51:30,190
So it's a little o of
1 if, let's say, B is

863
00:51:30,190 --> 00:51:33,620
bigger than the log N, roughly.

864
00:51:33,620 --> 00:51:37,390
This would, in
particular, be enough.

865
00:51:37,390 --> 00:51:39,820
No pun intended-- B enough.

866
00:51:39,820 --> 00:51:43,030
So if our block size
is reasonably big--

867
00:51:43,030 --> 00:51:45,035
cache line is bigger
than log N, which

868
00:51:45,035 --> 00:51:48,550
is probably most likely
in typical caches

869
00:51:48,550 --> 00:51:49,900
and architectures--

870
00:51:49,900 --> 00:51:54,420
then this is more like 1
over B, never mind the log.

871
00:51:54,420 --> 00:51:57,970
And so we have to do B
operations in one step.

872
00:51:57,970 --> 00:52:00,250
And then to really
get the log right,

873
00:52:00,250 --> 00:52:10,690
we have to depend
on M, not just B.

874
00:52:10,690 --> 00:52:16,300
As I mentioned, that's the
bound we're going to achieve.

875
00:52:16,300 --> 00:52:18,700
We do need to make an
assumption about M and B

876
00:52:18,700 --> 00:52:20,420
and how they relate.

877
00:52:20,420 --> 00:52:31,250
So we're going to assume a
tall cache, which is M is,

878
00:52:31,250 --> 00:52:35,230
let's say, B to
the 1 plus epsilon.

879
00:52:35,230 --> 00:52:38,380
So it has to be substantially
bigger than B, not

880
00:52:38,380 --> 00:52:39,475
by a huge amount.

881
00:52:39,475 --> 00:52:40,600
It could be B squared.

882
00:52:40,600 --> 00:52:42,910
It could be B to the 1.1 power.

883
00:52:42,910 --> 00:52:45,242
But M is definitely
at least B. And I want

884
00:52:45,242 --> 00:52:46,450
it to be a little bit bigger.

885
00:52:46,450 --> 00:52:49,420
I want the number of
blocks to be not so tiny.

886
00:52:53,537 --> 00:52:54,370
Here's how we do it.

887
00:53:19,940 --> 00:53:22,670
I think I need a big picture.

888
00:53:29,260 --> 00:53:32,010
So the kind of funny thing about
cache-oblivious priority queues

889
00:53:32,010 --> 00:53:33,445
is we're not going to use trees.

890
00:53:36,190 --> 00:53:40,090
It's basically a bunch of
arrays in a linear order.

891
00:53:42,814 --> 00:53:45,670
In some ways, it's an
easier data structure.

892
00:53:45,670 --> 00:53:48,056
But it's definitely
more complicated.

893
00:53:52,030 --> 00:53:54,786
But, hey, it's
faster than b-trees.

894
00:53:54,786 --> 00:53:57,160
If you don't need to be able
to do searches-- if you just

895
00:53:57,160 --> 00:53:59,770
need to be able to delete
min, then priority queues

896
00:53:59,770 --> 00:54:01,810
are a lot faster than b-trees.

897
00:54:10,206 --> 00:54:11,330
Let me draw this.

898
00:54:24,420 --> 00:54:28,500
I want to try to draw
this pseudo-accurate size.

899
00:54:51,132 --> 00:54:53,230
The only part that's
inaccurate about this picture

900
00:54:53,230 --> 00:54:55,050
is the dot dot dots.

901
00:54:55,050 --> 00:54:56,100
That lets me cheat a little.

902
00:55:09,160 --> 00:55:19,440
So we have x to the
9/4, x to the 3/2, and x.

903
00:55:19,440 --> 00:55:24,260
This is sort of what the
levels are going to look like.

904
00:55:24,260 --> 00:55:26,510
So we have a linear
sequence of levels.

905
00:55:26,510 --> 00:55:32,090
They are increasing doubly
exponentially in size.

906
00:55:32,090 --> 00:55:34,840
If you look at what's
going on here--

907
00:55:38,137 --> 00:55:38,970
double exponential--

908
00:55:38,970 --> 00:55:39,926
AUDIENCE: Are those
exponentials the same as those?

909
00:55:39,926 --> 00:55:41,014
Or are they inverted?

910
00:55:41,014 --> 00:55:42,680
ERIK DEMAINE: It's
supposed to be this--

911
00:55:42,680 --> 00:55:47,400
so this is the top level,
and this is the bottom level.

912
00:55:51,100 --> 00:55:54,060
So at the top, we're going
to have something of size N.

913
00:55:54,060 --> 00:55:56,620
And at the bottom, we're going
to have something of a size C,

914
00:55:56,620 --> 00:55:57,770
constant.

915
00:55:57,770 --> 00:56:00,370
And what I've drawn is
the generic middle part.

916
00:56:00,370 --> 00:56:03,400
So we go from x, to x
to 3/2, to x to the 9/4.

917
00:56:03,400 --> 00:56:06,520
Or you could say x to the
9/4, down to x to the 3/2,

918
00:56:06,520 --> 00:56:12,040
down to x, according to
this exponential geometric

919
00:56:12,040 --> 00:56:13,610
progression, I guess.

920
00:56:13,610 --> 00:56:16,610
So that's how they go.

921
00:56:16,610 --> 00:56:18,970
So I mean, if it was
just exponential,

922
00:56:18,970 --> 00:56:21,580
it would be, like, N,
N over 2, N over 4.

923
00:56:21,580 --> 00:56:23,950
But we're doing--
in logarithms, we're

924
00:56:23,950 --> 00:56:25,200
changing by constant factors.

925
00:56:25,200 --> 00:56:28,120
So it's doubly exponential.

926
00:56:28,120 --> 00:56:30,700
And then each of the
levels is decomposed

927
00:56:30,700 --> 00:56:33,810
into top buffers and bottom--

928
00:56:33,810 --> 00:56:37,120
or, sorry, up buffers
and down buffers.

929
00:56:37,120 --> 00:56:39,820
And as you see, the
size of the up buffer

930
00:56:39,820 --> 00:56:43,600
is equal to the sum of the
sizes of the down buffers.

931
00:56:43,600 --> 00:56:48,920
So let me elaborate
on how this works.

932
00:56:48,920 --> 00:56:53,770
It's level x to the 3/2.

933
00:56:53,770 --> 00:56:56,500
We're going to, generically,
be looking at x to the 3/2.

934
00:56:56,500 --> 00:56:57,550
So I can easily go down.

935
00:56:57,550 --> 00:56:58,390
That's of size x.

936
00:56:58,390 --> 00:57:00,945
And up is x to the 9/4--

937
00:57:00,945 --> 00:57:01,445
easily.

938
00:57:04,660 --> 00:57:15,325
There's always one up
buffer of size x to the 3/2.

939
00:57:17,840 --> 00:57:19,850
So maybe some color.

940
00:57:22,410 --> 00:57:27,420
So this buffer, here,
is x to the 3/2.

941
00:57:27,420 --> 00:57:43,200
And then we also have up to x
to the 1/2 down buffers, each

942
00:57:43,200 --> 00:57:46,185
of size theta x.

943
00:57:49,750 --> 00:57:51,920
Each of these guys
has size theta x.

944
00:57:55,700 --> 00:58:00,382
And then the number of them
is, at most, x to the 1/2.

945
00:58:00,382 --> 00:58:01,840
And so you take
the product, that's

946
00:58:01,840 --> 00:58:04,810
x to 3/2, which is the
same as the up buffer.

947
00:58:04,810 --> 00:58:08,260
So that's the way
this cookie crumbles.

948
00:58:08,260 --> 00:58:12,460
That's how each of these guys
decomposes into little pieces.

949
00:58:12,460 --> 00:58:15,550
And so, in particular, this
up buffer, if you work it out,

950
00:58:15,550 --> 00:58:19,180
should be exactly the same
size as this down buffer.

951
00:58:19,180 --> 00:58:21,250
Maybe it's easier
to do these two.

952
00:58:21,250 --> 00:58:24,250
So this down buffer has size x.

953
00:58:24,250 --> 00:58:28,070
And if you have a structure size
x, the up buffer has size x.

954
00:58:28,070 --> 00:58:30,850
x to the 3/2 has an up
buffer of size x to the 3/2.

955
00:58:30,850 --> 00:58:32,230
So this will be size x.

956
00:58:32,230 --> 00:58:35,220
And so these match.

957
00:58:35,220 --> 00:58:36,170
These are equal size.

958
00:58:36,170 --> 00:58:37,420
That's why I drew it this way.

959
00:58:40,630 --> 00:58:43,500
Now the dot dot dot hides the
fact that they're x to the 1/2

960
00:58:43,500 --> 00:58:44,000
of these.

961
00:58:44,000 --> 00:58:45,422
So there's a lot
of down buffers.

962
00:58:45,422 --> 00:58:47,380
They're actually a lot
smaller than up buffers.

963
00:58:47,380 --> 00:58:50,040
But these two match in
size and these two match

964
00:58:50,040 --> 00:58:50,860
in size, and so on.

965
00:58:59,370 --> 00:59:02,040
OK, there's a small exception.

966
00:59:02,040 --> 00:59:04,940
I'd put theta x here.

967
00:59:04,940 --> 00:59:08,220
The exception is that the
very first down buffer

968
00:59:08,220 --> 00:59:10,890
might be mostly empty.

969
00:59:10,890 --> 00:59:13,890
So this one is not
actually theta x.

970
00:59:13,890 --> 00:59:16,380
Each of these will be theta x.

971
00:59:16,380 --> 00:59:19,490
And this one over on the
left will be big O of x.

972
00:59:22,300 --> 00:59:25,139
Small typo in the notes, here.

973
00:59:31,010 --> 00:59:35,599
So if that's not sufficiently
messy, let me redraw it--

974
00:59:35,599 --> 00:59:36,640
make it a little cleaner.

975
01:00:00,060 --> 01:00:03,360
I want to look at two
consecutive levels

976
01:00:03,360 --> 01:00:06,260
and specify invariants.

977
01:00:12,001 --> 01:00:14,250
So after all, I'm trying to
maintain a priority queue.

978
01:00:14,250 --> 01:00:16,500
What the heck is this thing?

979
01:00:16,500 --> 01:00:18,630
The idea is that
towards the bottom,

980
01:00:18,630 --> 01:00:20,760
that's where the min is.

981
01:00:20,760 --> 01:00:22,950
That would seem good,
because at the bottom,

982
01:00:22,950 --> 01:00:24,330
you're constant size.

983
01:00:24,330 --> 01:00:26,175
I can diddle with the
thing of constant size

984
01:00:26,175 --> 01:00:28,500
if it fits in a block
or if it's in cache.

985
01:00:28,500 --> 01:00:31,890
It takes me zero time to
touch things near the bottom.

986
01:00:31,890 --> 01:00:34,530
So as long as I always
keep the min down there,

987
01:00:34,530 --> 01:00:36,960
I can do find-min in zero time.

988
01:00:36,960 --> 01:00:39,450
And delete min will
also be pretty fast.

989
01:00:39,450 --> 01:00:41,910
I'm also going to insert
there, because I don't

990
01:00:41,910 --> 01:00:43,200
know where else to put it.

991
01:00:43,200 --> 01:00:45,630
I'd like the larger items
to be near the top.

992
01:00:45,630 --> 01:00:48,060
I'd like the smaller items
to be near the bottom,

993
01:00:48,060 --> 01:00:51,480
but kind of hard to know,
when I insert an item,

994
01:00:51,480 --> 01:00:52,500
where it belongs.

995
01:00:52,500 --> 01:00:54,420
So I'll just start by
inserting at the bottom.

996
01:00:54,420 --> 01:00:58,500
And the idea is as I insert,
insert, insert down here,

997
01:00:58,500 --> 01:01:00,030
things will start to trickle up.

998
01:01:00,030 --> 01:01:03,970
That's what the up arrows mean,
that these items are moving up.

999
01:01:03,970 --> 01:01:08,570
And then down arrow items are
items that are moving down.

1000
01:01:08,570 --> 01:01:09,820
Somehow this is going to work.

1001
01:01:09,820 --> 01:01:11,930
So let me tell
you the invariants

1002
01:01:11,930 --> 01:01:14,190
that will make this work.

1003
01:01:14,190 --> 01:01:18,300
If you look at the down
buffers, they are sorted.

1004
01:01:18,300 --> 01:01:19,920
Or I should say, all
the items in here

1005
01:01:19,920 --> 01:01:23,010
are less than all the
items in here, and so on.

1006
01:01:23,010 --> 01:01:27,270
But within each down
buffer, it's disordered.

1007
01:01:27,270 --> 01:01:36,290
Then we also know this.

1008
01:01:36,290 --> 01:01:38,580
All the items in the up
buffer in a given level--

1009
01:01:38,580 --> 01:01:42,175
this is x to the 3/2, let's say.

1010
01:01:42,175 --> 01:01:44,960
And this is x.

1011
01:01:44,960 --> 01:01:50,930
All of the up items are larger
than all of the down items.

1012
01:01:50,930 --> 01:01:53,805
I mean, it's just
an inequality here.

1013
01:01:53,805 --> 01:01:55,430
So these guys are
basically in a chain.

1014
01:01:55,430 --> 01:01:58,006
Again, the items in
here are not sorted.

1015
01:01:58,006 --> 01:02:00,380
But all the items here are
bigger than all the items here

1016
01:02:00,380 --> 01:02:01,610
are bigger than all
of the items here are

1017
01:02:01,610 --> 01:02:02,901
bigger than all the items here.

1018
01:02:02,901 --> 01:02:05,510
Now what about from
level to level?

1019
01:02:05,510 --> 01:02:07,730
This is a little more subtle.

1020
01:02:07,730 --> 01:02:10,880
What we know is this.

1021
01:02:14,089 --> 01:02:16,380
So again, we know that all
the down buffers are sorted.

1022
01:02:16,380 --> 01:02:17,796
And we know that
this down buffer,

1023
01:02:17,796 --> 01:02:20,820
all these items come before
all of this down buffer's items.

1024
01:02:20,820 --> 01:02:23,140
But these up buffer
items, we don't know yet,

1025
01:02:23,140 --> 01:02:24,780
because they're still moving up.

1026
01:02:24,780 --> 01:02:27,130
Now we know this.

1027
01:02:27,130 --> 01:02:28,362
They need to move up.

1028
01:02:28,362 --> 01:02:29,820
But we still don't
know how far up.

1029
01:02:29,820 --> 01:02:32,028
We don't know how these
items compare to these items.

1030
01:02:32,028 --> 01:02:34,590
Or these items could
have to go higher,

1031
01:02:34,590 --> 01:02:38,260
could be they belong here,
or are here, who knows.

1032
01:02:38,260 --> 01:02:41,280
So basically, the down buffers
are more or less sorted.

1033
01:02:41,280 --> 01:02:43,440
And so the mins are going
to be down at the bottom.

1034
01:02:43,440 --> 01:02:45,889
And the up buffer items, I
mean, they're still moving up.

1035
01:02:45,889 --> 01:02:47,430
We don't know where
they belong, yet.

1036
01:02:47,430 --> 01:02:49,054
But eventually they'll
find their place

1037
01:02:49,054 --> 01:02:50,730
and start trickling down.

1038
01:02:50,730 --> 01:02:53,001
Roughly speaking, an item
will go up for a while

1039
01:02:53,001 --> 01:02:54,000
and then come back down.

1040
01:02:54,000 --> 01:02:55,320
Although, that's
not literally true.

1041
01:02:55,320 --> 01:02:56,070
It's roughly true.

1042
01:03:01,100 --> 01:03:04,760
I should say something
about how we actually

1043
01:03:04,760 --> 01:03:08,240
store this in memory, because
the whole name of the game

1044
01:03:08,240 --> 01:03:10,720
is how you lay
things out in memory.

1045
01:03:10,720 --> 01:03:13,220
In cache-oblivious, that's all
you get to choose, basically.

1046
01:03:13,220 --> 01:03:16,700
The rest is algorithm,
regular RAM algorithm.

1047
01:03:16,700 --> 01:03:18,680
And all we do is store
the items in order,

1048
01:03:18,680 --> 01:03:20,150
say, from bottom to top.

1049
01:03:20,150 --> 01:03:22,400
So store the entire C
level, then the next level

1050
01:03:22,400 --> 01:03:25,835
up-- storing these items
as consecutive arrays.

1051
01:03:25,835 --> 01:03:28,190
That's what we need to do.

1052
01:03:28,190 --> 01:03:30,800
Leave enough space for x to
the 1/2 down buffers, each

1053
01:03:30,800 --> 01:03:32,260
at size theta x.

1054
01:03:54,430 --> 01:03:59,840
So how do we do
inserts and deletes?

1055
01:04:06,814 --> 01:04:10,330
Let's start with insert.

1056
01:04:10,330 --> 01:04:13,695
As I mentioned, we want to start
by inserting in the bottom,

1057
01:04:13,695 --> 01:04:15,320
until we run out of
room in the bottom.

1058
01:04:15,320 --> 01:04:17,780
And then things
have to trickle up.

1059
01:04:17,780 --> 01:04:19,583
So here's the basic algorithm.

1060
01:04:32,770 --> 01:04:34,540
You look at the bottom level.

1061
01:04:34,540 --> 01:04:38,030
You stick the item
into the up buffer.

1062
01:04:38,030 --> 01:04:40,052
This is not necessarily
the right thing to do.

1063
01:04:40,052 --> 01:04:41,760
So here, you stick it
into the up buffer.

1064
01:04:41,760 --> 01:04:44,390
But these things are supposed
to be roughly sorted.

1065
01:04:44,390 --> 01:04:46,780
So once you stick it there,
you say, oh, well maybe I

1066
01:04:46,780 --> 01:04:48,080
have to go down here.

1067
01:04:48,080 --> 01:04:48,901
Go down here.

1068
01:04:48,901 --> 01:04:50,650
The point is the up
buffer is the only one

1069
01:04:50,650 --> 01:04:52,750
I want to be growing in size.

1070
01:04:52,750 --> 01:04:54,430
So I insert into the up buffer.

1071
01:04:54,430 --> 01:04:57,370
And say, oh, potentially
I have to swap down here.

1072
01:04:57,370 --> 01:04:58,490
Swap down here.

1073
01:04:58,490 --> 01:04:59,740
I mean, this is constant size.

1074
01:04:59,740 --> 01:05:03,370
I can afford to look at all
the items here in zero time

1075
01:05:03,370 --> 01:05:05,620
and find out which
buffer it belongs to.

1076
01:05:05,620 --> 01:05:07,840
As I move the item
down here, I swap.

1077
01:05:07,840 --> 01:05:11,060
So I take the max item here
and move it up to here.

1078
01:05:11,060 --> 01:05:13,510
Move the max item from
the next down buffer

1079
01:05:13,510 --> 01:05:16,630
and propagate it up, so that
I keep these things in order.

1080
01:05:22,530 --> 01:05:34,030
Swap into bottom down
buffers if necessary--

1081
01:05:36,730 --> 01:05:38,110
or maybe, as necessary.

1082
01:05:38,110 --> 01:05:39,909
You might have to
do a bunch of swaps.

1083
01:05:39,909 --> 01:05:40,825
But it's all constant.

1084
01:05:44,560 --> 01:05:47,230
And then the point
is the only buffer

1085
01:05:47,230 --> 01:05:50,260
that got larger by one item was
the up buffer, because, here,

1086
01:05:50,260 --> 01:05:52,480
we did swaps to preserve size.

1087
01:05:52,480 --> 01:05:55,668
And if that overflows, then
we do something interesting.
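
Here is a rough Python sketch of that bottom-level insert (hypothetical structure, not from the lecture: bot.downs is the ordered list of down buffers, bot.up is the up buffer, bot.capacity is the up buffer's size limit, and push is the routine defined next; the bottom level is constant size, so all these scans are free in the external-memory sense):

def insert_bottom(bot, key, push):
    # key belongs in the first down buffer whose max exceeds it; placing it
    # there displaces that buffer's max, which cascades upward buffer by
    # buffer, preserving every down buffer's size and the sorted invariant
    for d in bot.downs:
        if d and key < max(d):
            d.append(key)
            key = d.pop(d.index(max(d)))   # evicted max keeps moving up
    bot.up.append(key)                     # net growth lands in the up buffer
    if len(bot.up) > bot.capacity:         # up buffer overflow: push it up
        push(bot.up)
        bot.up.clear()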

1088
01:05:59,580 --> 01:06:02,250
And something interesting
is called push.

1089
01:06:06,200 --> 01:06:11,720
This is a separate routine,
which I'll define now.

1090
01:06:29,000 --> 01:06:31,040
So this is the
bottom level push.

1091
01:06:31,040 --> 01:06:33,770
But I want to define the
generic level push, which

1092
01:06:33,770 --> 01:06:41,150
is we're going to be pushing
x elements into level x

1093
01:06:41,150 --> 01:06:41,715
to the 3/2.

1094
01:06:47,500 --> 01:06:51,150
So why is that?

1095
01:06:51,150 --> 01:06:52,320
Check it out.

1096
01:06:52,320 --> 01:06:56,104
If we're at level x and
our up buffer overflows,

1097
01:06:56,104 --> 01:06:58,020
and we're trying to push
into the next level--

1098
01:06:58,020 --> 01:06:59,970
that's level x to 3/2--

1099
01:06:59,970 --> 01:07:05,435
then the up buffer that we're
trying to push has size x.

1100
01:07:05,435 --> 01:07:09,180
The up buffer has the same
size as the level name.

1101
01:07:09,180 --> 01:07:11,190
So if we're at level
x, this has size x.

1102
01:07:11,190 --> 01:07:13,780
We're trying to push x items
up into the next thing.

1103
01:07:18,215 --> 01:07:19,840
We're going to empty
out the up buffer.

1104
01:07:19,840 --> 01:07:21,230
Send all those items up there.

1105
01:07:24,330 --> 01:07:24,990
Cool.

1106
01:07:24,990 --> 01:07:25,770
So what do we do?

1107
01:07:25,770 --> 01:07:27,390
How do we do this push?

1108
01:07:27,390 --> 01:07:31,590
First thing we do
is sort the items--

1109
01:07:31,590 --> 01:07:33,570
those x items.

1110
01:07:33,570 --> 01:07:36,300
This is where I need
the black box, which

1111
01:07:36,300 --> 01:07:39,000
is that we can sort
N over B log base M

1112
01:07:39,000 --> 01:07:41,340
over B of N over B
cache-obliviously.

1113
01:07:41,340 --> 01:07:45,480
So we're going to use that here.

1114
01:07:45,480 --> 01:07:48,810
It's not hard, but I don't
want to spend time on it.

1115
01:07:48,810 --> 01:07:54,480
Now the interesting bit
is how we do the push.

1116
01:07:54,480 --> 01:07:57,000
The tricky part is we
have all these items,

1117
01:07:57,000 --> 01:07:59,479
we know that they're bigger
than all the items below them.

1118
01:07:59,479 --> 01:08:00,770
Maybe, here's a better picture.

1119
01:08:00,770 --> 01:08:02,370
We know these guys
need to go up.

1120
01:08:02,370 --> 01:08:06,300
They're bigger than everything,
all the down items below us.

1121
01:08:06,300 --> 01:08:07,800
But we don't know,
does it fit here?

1122
01:08:07,800 --> 01:08:08,450
Here?

1123
01:08:08,450 --> 01:08:08,950
Here?

1124
01:08:08,950 --> 01:08:10,605
Or here?

1125
01:08:10,605 --> 01:08:12,480
But we do know that
these things are ordered.

1126
01:08:12,480 --> 01:08:14,850
And so if we sort these
items, then we can say,

1127
01:08:14,850 --> 01:08:16,859
well, let's start
by looking here.

1128
01:08:16,859 --> 01:08:18,990
Do any of these items
fit in this block?

1129
01:08:18,990 --> 01:08:20,790
Just look at the max, here.

1130
01:08:20,790 --> 01:08:23,850
And if these guys are
smaller than the max, here,

1131
01:08:23,850 --> 01:08:24,819
then they belong here.

1132
01:08:24,819 --> 01:08:26,430
So keep inserting there.

1133
01:08:26,430 --> 01:08:29,279
Eventually, we'll get bigger
than the max, then we go here.

1134
01:08:29,279 --> 01:08:30,649
Look at the max item, here.

1135
01:08:30,649 --> 01:08:33,107
As long as we have items here
that are smaller than the max

1136
01:08:33,107 --> 01:08:36,810
here, insert, insert,
insert, insert, and so on.

1137
01:08:36,810 --> 01:08:39,396
And then when we're beyond
all of these-- maybe

1138
01:08:39,396 --> 01:08:40,979
we're immediately
beyond all of these.

1139
01:08:40,979 --> 01:08:42,359
We have to check, oh,
bigger than the max,

1140
01:08:42,359 --> 01:08:44,100
bigger than the max,
bigger than the max.

1141
01:08:44,100 --> 01:08:48,370
Then we put all the remaining
items into the up buffer.

1142
01:08:48,370 --> 01:08:50,025
This is called distribution.

1143
01:09:11,390 --> 01:09:14,670
So we're looking at
level x to the 3/2,

1144
01:09:14,670 --> 01:09:17,103
and just scanning sequentially.

1145
01:09:21,740 --> 01:09:25,340
Because we just sorted them,
we're scanning them in order.

1146
01:09:25,340 --> 01:09:28,640
We visit the down
buffers in order.

1147
01:09:38,460 --> 01:09:41,160
And insert into the appropriate
down buffer as we go.

1148
01:09:43,740 --> 01:09:46,950
Now there's a little bit
that can happen, here.

1149
01:09:46,950 --> 01:09:49,710
Our down buffers
have a limit in size.

1150
01:09:49,710 --> 01:09:53,010
Down buffers are supposed to
have size theta x at level

1151
01:09:53,010 --> 01:09:54,279
x to the 3/2.

1152
01:09:54,279 --> 01:09:57,480
So as we're inserting into here,
a down buffer might overflow.

1153
01:09:57,480 --> 01:09:58,680
I've got theta slop, here.

1154
01:09:58,680 --> 01:10:02,100
So when a down buffer overflows,
I just split it in half.

1155
01:10:02,100 --> 01:10:04,254
I then make two down buffers.

1156
01:10:04,254 --> 01:10:05,670
Well, actually,
it would be, like,

1157
01:10:05,670 --> 01:10:07,440
here and right next to it.

1158
01:10:07,440 --> 01:10:10,920
I'm going to maintain a
linked list of down buffers.

1159
01:10:10,920 --> 01:10:14,470
And each of them will have
space for say, 2x items.

1160
01:10:14,470 --> 01:10:19,140
But once I do a split,
they'll both be half full.

1161
01:10:19,140 --> 01:10:40,670
So when a down buffer
overflows, split in half

1162
01:10:40,670 --> 01:10:43,700
and maintain a linked
list of down buffers.

1163
01:10:48,840 --> 01:10:51,700
Another thing that can happen
is when you increase the number of

1164
01:10:51,700 --> 01:10:54,230
down buffers, we have a limit
on how many down buffers

1165
01:10:54,230 --> 01:10:55,190
we can have--

1166
01:10:55,190 --> 01:10:57,680
up to x to the 1/2 of them.

1167
01:10:57,680 --> 01:11:00,380
So if we run out of down
buffers by splitting,

1168
01:11:00,380 --> 01:11:03,000
then we need to start
using the up buffer.

1169
01:11:03,000 --> 01:11:08,860
So maybe here, we split,
maybe, this buffer--

1170
01:11:08,860 --> 01:11:10,805
is now too many
down buffers total.

1171
01:11:10,805 --> 01:11:12,680
Then we'll just take
all the elements in here

1172
01:11:12,680 --> 01:11:15,290
and stick them into the
up buffer, because that's

1173
01:11:15,290 --> 01:11:18,030
where they belong.

1174
01:11:18,030 --> 01:11:20,840
So when the number
of down buffers

1175
01:11:20,840 --> 01:11:31,600
is too big, when that
number overflows,

1176
01:11:31,600 --> 01:11:42,720
move the last down buffer
into the up buffer.

1177
01:11:46,972 --> 01:11:49,180
So there's basically two
ways that elements are going

1178
01:11:49,180 --> 01:11:50,470
to get into the up buffer.

1179
01:11:50,470 --> 01:11:52,300
One way is that we run
out of down buffers,

1180
01:11:52,300 --> 01:11:54,220
and so the last down
buffer starts getting

1181
01:11:54,220 --> 01:11:56,632
promoted into the up buffer.

1182
01:11:56,632 --> 01:11:58,090
The other possibility
is that we're

1183
01:11:58,090 --> 01:12:01,460
inserting items that are
just really big in value.

1184
01:12:01,460 --> 01:12:03,460
And if the items
that are getting

1185
01:12:03,460 --> 01:12:05,230
promoted from here
into the next level

1186
01:12:05,230 --> 01:12:07,430
just happen to be larger
than all these items,

1187
01:12:07,430 --> 01:12:09,790
we will immediately start
inserting into the up buffer.

1188
01:12:09,790 --> 01:12:11,830
But in general, we have to
look at this down buffer.

1189
01:12:11,830 --> 01:12:12,538
Look at this one.

1190
01:12:12,538 --> 01:12:16,300
Look at this one, then that one.

1191
01:12:16,300 --> 01:12:18,070
That is insert and push.

1192
01:12:18,070 --> 01:12:21,010
I think before I
talk about delete,

1193
01:12:21,010 --> 01:12:23,170
I'd like to talk about the
analysis of just insert

1194
01:12:23,170 --> 01:12:24,040
and push.

1195
01:12:24,040 --> 01:12:25,720
Keep it simple.

1196
01:12:25,720 --> 01:12:28,120
And then I'll briefly
tell you about deletion.

1197
01:12:51,620 --> 01:12:52,550
Oh, I didn't say.

1198
01:12:52,550 --> 01:12:53,210
Sorry.

1199
01:12:53,210 --> 01:12:55,307
There's one more step,
which is the recursion.

1200
01:12:57,990 --> 01:12:59,450
Running out of room, here.

1201
01:12:59,450 --> 01:13:01,394
Maybe I'll go over here.

1202
01:13:01,394 --> 01:13:02,435
This is nothing relevant.

1203
01:13:11,112 --> 01:13:12,320
So I need to recurse somehow.

1204
01:13:15,890 --> 01:13:18,740
At some point, inserting things
into the up buffer, the up

1205
01:13:18,740 --> 01:13:19,760
buffer might overflow.

1206
01:13:19,760 --> 01:13:21,980
That's the one last thing
that could overflow.

1207
01:13:21,980 --> 01:13:24,680
When that happens, I just
push it to the next level up.

1208
01:13:52,030 --> 01:13:55,487
So as we do
insertions in here, we

1209
01:13:55,487 --> 01:13:57,070
might start inserting
a lot into here.

1210
01:13:57,070 --> 01:13:58,486
Eventually, this
will get too big.

1211
01:13:58,486 --> 01:14:00,200
It's supposed to
have size x to the 3/2.

1212
01:14:00,200 --> 01:14:02,350
If it gets bigger than
that, take all these items

1213
01:14:02,350 --> 01:14:05,620
and just recursively push
them up to the next level.

1214
01:14:05,620 --> 01:14:08,010
And conveniently, that's
exactly the same size

1215
01:14:08,010 --> 01:14:09,010
as we were doing before.

1216
01:14:09,010 --> 01:14:11,690
Here, we did x into
level x to the 3/2.

1217
01:14:11,690 --> 01:14:16,646
Next will be x to the 3/2 into
level x to the 9/4, and so on.

1218
01:14:16,646 --> 01:14:18,020
Always the size
of the up buffer.
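
Putting it together, here is a sketch of the push in the same hypothetical Python structure. Python's sorted() stands in for the cache-oblivious sorting black box; for simplicity a splitting buffer is sorted to find its median (the real structure keeps buffers internally unsorted and only tracks their maxes), down buffers are assumed non-empty, and the level above is assumed to exist:

import math

def push(levels, i, elems):
    # push x = len(elems) items from level i into level i + 1, of size x**(3/2)
    nxt = levels[i + 1]
    x = len(elems)
    cap = 2 * x                          # down buffers hold Theta(x) items
    max_downs = max(1, math.isqrt(x))    # at most x**(1/2) down buffers
    j = 0
    for e in sorted(elems):              # step 1: sort the pushed items
        while j < len(nxt.downs) and nxt.downs[j] and e >= max(nxt.downs[j]):
            j += 1                       # past this buffer's max: move right
        if j == len(nxt.downs):
            nxt.up.append(e)             # bigger than all down buffers: go up
            continue
        d = nxt.downs[j]                 # step 2: distribute into down buffers
        d.append(e)
        if len(d) > cap:                 # down buffer overflow: split in half
            d.sort()
            nxt.downs[j:j + 1] = [d[:len(d) // 2], d[len(d) // 2:]]
        if len(nxt.downs) > max_downs:   # too many buffers: last one moves up
            nxt.up.extend(nxt.downs.pop())
    if len(nxt.up) > x ** 1.5:           # up buffer overflow: recurse upward
        push(levels, i + 1, list(nxt.up))
        nxt.up.clear()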

1219
01:14:22,040 --> 01:14:25,300
So the claim is if we
look at the push at level

1220
01:14:25,300 --> 01:14:26,260
x to the 3/2--

1221
01:14:26,260 --> 01:14:29,680
which is what we
just described--

1222
01:14:29,680 --> 01:14:31,990
and we ignore the recursion.

1223
01:14:40,910 --> 01:14:51,560
Then the cost is x over B log
base M over B of x over B--

1224
01:14:51,560 --> 01:14:54,590
so sorting bound on x items.

1225
01:14:54,590 --> 01:14:57,800
So we spend that right away
in the very first step.

1226
01:14:57,800 --> 01:15:00,050
We sort x items.

1227
01:15:00,050 --> 01:15:04,592
So that costs x over B times
log base M over B of x over B.

1228
01:15:04,592 --> 01:15:06,050
And the whole point
of the analysis

1229
01:15:06,050 --> 01:15:09,380
is to show that this
distribution step doesn't cost

1230
01:15:09,380 --> 01:15:11,120
any more than the sorting step.

1231
01:15:14,180 --> 01:15:19,220
So let's prove this claim.

1232
01:15:19,220 --> 01:15:23,120
And the first
observation-- maybe,

1233
01:15:23,120 --> 01:15:24,830
don't even need to
write this down.

1234
01:15:24,830 --> 01:15:27,392
So remember, with
cache-oblivious b-trees,

1235
01:15:27,392 --> 01:15:28,850
we looked at a
level of detail that

1236
01:15:28,850 --> 01:15:30,224
was sort of the
relevant one that

1237
01:15:30,224 --> 01:15:33,830
straddled B. Now we
have this data structure

1238
01:15:33,830 --> 01:15:36,800
and there are no longer recursive
levels in this picture.

1239
01:15:36,800 --> 01:15:38,297
It's just a list.

1240
01:15:38,297 --> 01:15:40,130
But one of the things
we said is that if you

1241
01:15:40,130 --> 01:15:41,570
look at the very
small structures,

1242
01:15:41,570 --> 01:15:44,120
those are free because
they just stay in cache.

1243
01:15:44,120 --> 01:15:45,140
I'm going to assume--

1244
01:15:45,140 --> 01:15:47,780
it's a little bit of a
bastardization of notation--

1245
01:15:47,780 --> 01:15:52,310
assume that all the levels
up to M fit in cache.

1246
01:15:52,310 --> 01:15:54,800
It's really up to, like,
size M over 2 or something,

1247
01:15:54,800 --> 01:15:56,090
but let's just call it all.

1248
01:15:56,090 --> 01:15:58,550
If x is less than
M, then you know

1249
01:15:58,550 --> 01:16:01,610
all this stuff has size order
M. And so let's just say--

1250
01:16:01,610 --> 01:16:04,400
by redefining what M is
by a constant factor--

1251
01:16:04,400 --> 01:16:06,210
that just fits in cache.

1252
01:16:06,210 --> 01:16:08,210
Because we can assume
whatever cache replacement

1253
01:16:08,210 --> 01:16:11,090
strategy we want,
assume that these things

1254
01:16:11,090 --> 01:16:13,530
stay in cache forever.

1255
01:16:13,530 --> 01:16:20,600
So all of that bottom
stuff, size up to M,

1256
01:16:20,600 --> 01:16:23,930
is permanently in cache
and costs zero to access.

1257
01:16:23,930 --> 01:16:25,790
So those levels are for free.

1258
01:16:25,790 --> 01:16:28,704
So it's all about when we
touch the upper levels.

1259
01:16:28,704 --> 01:16:30,245
And, really, the
most important level

1260
01:16:30,245 --> 01:16:32,180
would be the transition
from the things that

1261
01:16:32,180 --> 01:16:35,270
fit in cache to the next level
up that does not fit in cache.

1262
01:16:35,270 --> 01:16:36,920
That's going to
be the key level.

1263
01:16:36,920 --> 01:16:44,020
But in general,
let's look at push.

1264
01:16:44,020 --> 01:16:57,500
Or let's just assume x to
the 3/2 is bigger than M.

1265
01:16:57,500 --> 01:17:00,530
Because we're looking at a
push at level x to the 3/2.

1266
01:17:00,530 --> 01:17:04,970
And just by definition, if
it's less than M, it's free.

1267
01:17:04,970 --> 01:17:06,701
Question?

1268
01:17:06,701 --> 01:17:08,728
AUDIENCE: So how can
you assume that all

1269
01:17:08,728 --> 01:17:11,471
of the things below
that particular size

1270
01:17:11,471 --> 01:17:12,902
always stay in cache?

1271
01:17:12,902 --> 01:17:15,630
So you're making assumptions
on the cache replacement?

1272
01:17:15,630 --> 01:17:16,520
ERIK DEMAINE: I'm
making assumption

1273
01:17:16,520 --> 01:17:18,645
on the cache replacement
strategy, which I actually

1274
01:17:18,645 --> 01:17:19,940
made last lecture.

1275
01:17:19,940 --> 01:17:22,850
So I said, magically
assume that we

1276
01:17:22,850 --> 01:17:25,640
use optimal cache replacement.

1277
01:17:25,640 --> 01:17:29,820
So whatever I choose, opt is
going to be better than that.

1278
01:17:29,820 --> 01:17:30,740
So you're right.

1279
01:17:30,740 --> 01:17:33,320
The algorithm doesn't get to
choose what stays in cache.

1280
01:17:33,320 --> 01:17:35,660
But for analysis purposes,
I can say well, suppose

1281
01:17:35,660 --> 01:17:36,951
all these things stay in cache.

1282
01:17:36,951 --> 01:17:39,440
If I prove an upper
bound in that world,

1283
01:17:39,440 --> 01:17:42,790
then the optimal
replacement will do better.

1284
01:17:42,790 --> 01:17:46,850
And the LRU or FIFO
replacement will

1285
01:17:46,850 --> 01:17:49,940
do within a constant
factor of that,

1286
01:17:49,940 --> 01:17:52,280
again, by changing M
by a constant factor.

1287
01:17:52,280 --> 01:17:54,830
So I'm freely throwing away
constant factors in my cache

1288
01:17:54,830 --> 01:17:58,080
size, but then I will get
that these effectively

1289
01:17:58,080 --> 01:17:58,960
stay in cache.

1290
01:17:58,960 --> 01:18:04,010
FIFO or LRU will do that just
as well, or almost as well.

1291
01:18:04,010 --> 01:18:05,740
Good question.

1292
01:18:05,740 --> 01:18:08,465
So now we can just look at
the pushes above that level.

1293
01:18:08,465 --> 01:18:10,590
And I really want to look
at this transition level,

1294
01:18:10,590 --> 01:18:12,548
but we're going to have
to look at all of them.

1295
01:18:12,548 --> 01:18:13,500
So let's do this.

1296
01:18:13,500 --> 01:18:19,850
Now I also have the tall
cache assumption.

1297
01:18:19,850 --> 01:18:22,460
M is at least B to
the 1 plus epsilon.

1298
01:18:22,460 --> 01:18:25,170
I actually want a somewhat
bigger assumption,

1299
01:18:25,170 --> 01:18:29,120
which is that M is greater
than or equal to B squared.

1300
01:18:29,120 --> 01:18:31,940
If it's not B squared,
you have to change

1301
01:18:31,940 --> 01:18:35,780
this 3/2 and 9/4 and stuff
to be something a little bit

1302
01:18:35,780 --> 01:18:36,860
bigger than 1.

1303
01:18:36,860 --> 01:18:38,700
And it gets really messy
to write that down.

1304
01:18:38,700 --> 01:18:41,579
So this data structure with
appropriate modification

1305
01:18:41,579 --> 01:18:43,370
does work for other
tall cache assumptions,

1306
01:18:43,370 --> 01:18:44,780
but let's just assume this one.

1307
01:18:44,780 --> 01:18:47,750
So this means M over B
is at least B. That's

1308
01:18:47,750 --> 01:18:49,940
sort of the clean statement.

1309
01:18:49,940 --> 01:18:56,330
It's true for most caches also,
but this will be my assumption.

1310
01:18:56,330 --> 01:18:57,640
OK.

1311
01:18:57,640 --> 01:18:58,190
Cool.

1312
01:18:58,190 --> 01:19:03,050
So if we just do some algebra.

1313
01:19:03,050 --> 01:19:08,370
The claim is this means x is
greater than or equal to B to the 4/3.

1314
01:19:08,370 --> 01:19:11,930
So that's just taking
this inequality,

1315
01:19:11,930 --> 01:19:14,420
x to the 3/2 is greater than
or equal to B squared

1316
01:19:14,420 --> 01:19:16,670
and raising to
the exponent, 2/3.

1317
01:19:16,670 --> 01:19:18,680
And so this turns
into x and this

1318
01:19:18,680 --> 01:19:21,570
turns into B to the 2 times
2/3, which would be the 4/3.

1319
01:19:21,570 --> 01:19:26,120
So x is quite a
bit bigger than B.
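
In symbols, that step is just:

\[
x^{3/2} \;\ge\; M \;\ge\; B^2
\quad\Longrightarrow\quad
x \;\ge\; \left(B^2\right)^{2/3} \;=\; B^{4/3}.
\]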

1320
01:19:26,120 --> 01:19:30,800
In particular, this means that
x over B is bigger than 1,

1321
01:19:30,800 --> 01:19:31,460
by a lot.

1322
01:19:31,460 --> 01:19:35,120
But if we take ceilings
on this, no big deal.

1323
01:19:35,120 --> 01:19:37,751
So we don't have to
worry about the ceilings.

1324
01:19:37,751 --> 01:19:38,250
All right.

1325
01:19:38,250 --> 01:19:46,380
Now the claim is that the
distribution step costs--

1326
01:19:46,380 --> 01:19:48,600
this is really the
interesting part--

1327
01:19:48,600 --> 01:19:54,030
x over B plus x to the
1/2 memory transfers.

1328
01:19:57,060 --> 01:19:58,050
OK.

1329
01:19:58,050 --> 01:19:59,730
Why?

1330
01:19:59,730 --> 01:20:03,590
Remember, up here we have
x to the 1/2 down buffers

1331
01:20:03,590 --> 01:20:04,565
that we're looking at.

1332
01:20:04,565 --> 01:20:05,940
And we're visiting
them in order.

1333
01:20:05,940 --> 01:20:08,100
We touch this down buffer.

1334
01:20:08,100 --> 01:20:12,990
And really, we just care
about the max, here.

1335
01:20:12,990 --> 01:20:15,540
And then we start writing
elements one by one.

1336
01:20:15,540 --> 01:20:17,700
But if we write
elements one by one,

1337
01:20:17,700 --> 01:20:20,231
the very first element we
write costs an entire memory

1338
01:20:20,231 --> 01:20:20,730
transfer.

1339
01:20:20,730 --> 01:20:22,300
We have to load in a block.

1340
01:20:22,300 --> 01:20:25,300
But then we can fill that block
and then write out that block.

1341
01:20:25,300 --> 01:20:28,270
So we get to write out
B items in one step.

1342
01:20:28,270 --> 01:20:31,050
So we pay x to
the 1/2 because we

1343
01:20:31,050 --> 01:20:35,862
have to touch the last
block of this down buffer,

1344
01:20:35,862 --> 01:20:37,320
this down buffer,
this down buffer.

1345
01:20:37,320 --> 01:20:39,840
And there are x to the
1/2 down buffers.

1346
01:20:39,840 --> 01:20:41,400
So each one, we have to pay 1.

1347
01:20:41,400 --> 01:20:42,150
That's this part.

1348
01:20:42,150 --> 01:20:43,377
That's the expensive part.

1349
01:20:43,377 --> 01:20:45,210
But then, once we're
actually writing items,

1350
01:20:45,210 --> 01:20:48,900
if we stay here for a while,
that's basically for free.

1351
01:20:48,900 --> 01:20:51,000
Overall, we're writing x items.

1352
01:20:51,000 --> 01:20:55,920
And so to write them all will
only take x over B ceiling,

1353
01:20:55,920 --> 01:20:58,290
sort of, over the
entire summation.

1354
01:20:58,290 --> 01:21:02,700
But we have to pay 1 to start
out here, here, and here.

1355
01:21:02,700 --> 01:21:05,610
So this is the real
amount of time.

1356
01:21:05,610 --> 01:21:08,000
And I want to amortize
that, essentially.
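
So the claimed distribution cost decomposes as one block touch per down buffer plus sequential writing of the x items:

\[
O\!\left(x^{1/2} \;+\; \left\lceil \frac{x}{B} \right\rceil\right)
\quad\text{memory transfers.}
\]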

1357
01:21:10,800 --> 01:21:14,610
So there's two cases now.

1358
01:21:14,610 --> 01:21:20,490
If x is greater than or equal
to B squared, then we're happy.

1359
01:21:20,490 --> 01:21:23,100
x is greater than or
equal to B squared,

1360
01:21:23,100 --> 01:21:27,510
then this term dominates.

1361
01:21:27,510 --> 01:21:29,550
And so this is tinier.

1362
01:21:29,550 --> 01:21:30,280
All right.

1363
01:21:30,280 --> 01:21:35,010
Say, x is B cubed, then
this is x to the 3/2--

1364
01:21:35,010 --> 01:21:36,360
sorry, this is B to the 3/2.

1365
01:21:36,360 --> 01:21:38,880
This is B squared,
so this is bigger.

1366
01:21:38,880 --> 01:21:44,610
And so then the cost
is just x over B.

1367
01:21:44,610 --> 01:21:46,800
And we're done, because
we needed to prove

1368
01:21:46,800 --> 01:21:48,080
it's x over B times log.

1369
01:21:48,080 --> 01:21:51,690
We had to do this to sort, but
this distribution will be free.

1370
01:21:51,690 --> 01:21:55,740
Now that says all the
high levels are free.

1371
01:21:55,740 --> 01:21:56,990
All the low levels are free.

1372
01:21:59,394 --> 01:22:01,560
If you're less than B
squared, then you're less than M

1373
01:22:01,560 --> 01:22:02,354
and you're free.

1374
01:22:02,354 --> 01:22:04,770
It's saying, if you're bigger
than B squared, you're free.

1375
01:22:04,770 --> 01:22:07,710
But there's going to be one
level right in between where

1376
01:22:07,710 --> 01:22:09,435
you're not really
bigger or smaller.

1377
01:22:13,100 --> 01:22:20,770
So there's one level where it's
going to be B to the 4/3 less than

1378
01:22:20,770 --> 01:22:23,850
or equal to x, which is less than
B squared--

1379
01:22:23,850 --> 01:22:25,860
strictly less than.

1380
01:22:25,860 --> 01:22:29,280
Because you're jumping in
this doubly exponential way,

1381
01:22:29,280 --> 01:22:31,180
you might miss slightly.

1382
01:22:31,180 --> 01:22:33,530
And so you're in between
these two levels.

1383
01:22:36,920 --> 01:22:37,440
Why is this?

1384
01:22:37,440 --> 01:22:40,530
Because we only know that x
to the 3/2 is bigger than M.

1385
01:22:40,530 --> 01:22:44,400
We don't know that
x is bigger than M.

1386
01:22:44,400 --> 01:22:46,662
So we have a bit of slack there.

1387
01:22:52,790 --> 01:22:59,720
So then at this transition
point, what we do

1388
01:22:59,720 --> 01:23:06,140
is say, OK, we can't afford
to store this whole-- x to the 3/2

1389
01:23:06,140 --> 01:23:07,760
is not less than M.
So we can't afford

1390
01:23:07,760 --> 01:23:09,290
to store all this in the cache.

1391
01:23:09,290 --> 01:23:12,825
But we can afford to store
the last block of this guy

1392
01:23:12,825 --> 01:23:14,450
and the last block
of this guy-- for all the

1393
01:23:14,450 --> 01:23:17,450
down buffers we can store
the last block in cache.

1394
01:23:17,450 --> 01:23:19,340
Why?

1395
01:23:19,340 --> 01:23:25,410
Because there's only
x to the 1/2 of them.

1396
01:23:25,410 --> 01:23:28,230
And we know that x is
less than B squared.

1397
01:23:28,230 --> 01:23:32,690
So if x is less than B
squared, x to the 1/2

1398
01:23:32,690 --> 01:23:36,320
is less than B, which
is less than M over B

1399
01:23:36,320 --> 01:23:40,560
because M is at least B squared,
by tall cache assumption.

1400
01:23:40,560 --> 01:23:43,505
So this is the number of blocks
that we can afford in cache.

1401
01:23:43,505 --> 01:23:44,880
This is the number of
blocks we want.

1402
01:23:44,880 --> 01:23:47,220
I want 1 block
for each of these.

1403
01:23:47,220 --> 01:23:49,610
So basically then,
this x to the 1/2

1404
01:23:49,610 --> 01:23:53,520
term disappears for the
one transition level.

1405
01:23:53,520 --> 01:23:56,380
So this is free.

1406
01:23:56,380 --> 01:24:11,720
Because you can afford 1 block
per down buffer in cache.

1407
01:24:11,720 --> 01:24:16,620
And so, again, we
get an x over B cost.

1408
01:24:16,620 --> 01:24:19,100
So the distribution
is basically free.

1409
01:24:19,100 --> 01:24:21,770
The hard part is the sorting.

1410
01:24:21,770 --> 01:24:24,200
And then the idea
is that, well, OK--

1411
01:24:24,200 --> 01:24:26,360
now this is one push.

1412
01:24:26,360 --> 01:24:29,930
When we do an insert, that item
might get pushed many times,

1413
01:24:29,930 --> 01:24:33,200
but basically can only
get pushed once per level.

1414
01:24:33,200 --> 01:24:37,970
So you end up taking this cost
and summing it over all x.

1415
01:24:37,970 --> 01:24:42,500
So if you look at an insertion,
amortize what you pay--

1416
01:24:42,500 --> 01:24:45,170
or you look at the sum
over all these things.

1417
01:24:48,680 --> 01:24:52,310
You get the sorting bound
on x-- summed over x where x

1418
01:24:52,310 --> 01:24:54,470
is growing doubly exponentially.

1419
01:24:54,470 --> 01:24:56,900
And so this becomes a
geometric series, or even

1420
01:24:56,900 --> 01:24:58,310
super geometric.

1421
01:24:58,310 --> 01:25:01,190
And so you get--

1422
01:25:01,190 --> 01:25:03,650
I guess I should look
at it per element.

1423
01:25:03,650 --> 01:25:06,560
You look at the amortized
cost per element,

1424
01:25:06,560 --> 01:25:08,780
so I get to divide by x.

1425
01:25:08,780 --> 01:25:11,290
Because when I do a
push, I push x elements.

1426
01:25:11,290 --> 01:25:12,980
So I get to divide by x, here.

1427
01:25:12,980 --> 01:25:15,450
And then if an element
gets pushed to all levels,

1428
01:25:15,450 --> 01:25:17,500
I have to sum over
all these x's.

1429
01:25:17,500 --> 01:25:20,920
But you do this summation
and you get order--

1430
01:25:20,920 --> 01:25:22,640
it's dominated by the last term.

1431
01:25:27,360 --> 01:25:32,330
Sorry, this should be N.

1432
01:25:32,330 --> 01:25:33,350
This is the fun part.

1433
01:25:33,350 --> 01:25:35,630
We're taking logs, here,
but conveniently this

1434
01:25:35,630 --> 01:25:38,660
was growing doubly
exponentially, the x over B.

1435
01:25:38,660 --> 01:25:41,900
So when we take logs, these
are singly exponential.

1436
01:25:41,900 --> 01:25:44,020
So it's a geometric series.

1437
01:25:44,020 --> 01:25:46,850
And so we are just dominated
by the last term, which

1438
01:25:46,850 --> 01:25:48,890
is where we get log base
M over B of N over B.

1439
01:25:48,890 --> 01:25:54,320
And this is our amortized
cost per insertion.
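
Written out: per element, a push at level x to the 3/2 costs the sorting bound divided by x, and summing over the doubly exponentially growing level sizes gives a geometric series in the logs, dominated by its last term:

\[
\sum_{x \,=\, c,\; c^{3/2},\; c^{9/4},\;\dots}^{N}
O\!\left(\frac{1}{B}\,\log_{M/B}\frac{x}{B}\right)
\;=\;
O\!\left(\frac{1}{B}\,\log_{M/B}\frac{N}{B}\right).
\]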

1440
01:25:54,320 --> 01:25:56,460
Deletions are
basically the same.

1441
01:25:56,460 --> 01:25:58,274
You just do pulls
instead of pushes.

1442
01:25:58,274 --> 01:26:00,440
And you have, sort of, the
reverse of a distribution

1443
01:26:00,440 --> 01:26:02,675
step-- in some
ways, even simpler.

1444
01:26:02,675 --> 01:26:05,840
You don't have to do
this clever analysis.

1445
01:26:05,840 --> 01:26:06,340
Question?

1446
01:26:06,340 --> 01:26:07,974
AUDIENCE: Can you
explain, once again,

1447
01:26:07,974 --> 01:26:11,173
how is it that you got the
distribution cost of x over B?

1448
01:26:11,173 --> 01:26:12,891
So I understand
that every time, you

1449
01:26:12,891 --> 01:26:15,703
need to pay that x to the
1/2 because you need to load.

1450
01:26:15,703 --> 01:26:17,170
But where did the x over B come from?

1451
01:26:17,170 --> 01:26:17,900
ERIK DEMAINE: This
is essentially

1452
01:26:17,900 --> 01:26:19,190
amortized per element.

1453
01:26:19,190 --> 01:26:22,580
We're just paying 1 over B.
Once the block has been loaded--

1454
01:26:22,580 --> 01:26:24,590
it's only after we
insert B items that we

1455
01:26:24,590 --> 01:26:26,520
have to load another block.

1456
01:26:26,520 --> 01:26:29,940
So that's why it's
x over B, here.

1457
01:26:29,940 --> 01:26:31,021
Good.

1458
01:26:31,021 --> 01:26:31,520
Question?

1459
01:26:31,520 --> 01:26:34,364
AUDIENCE: Which buffers do
we keep sorted at all times?

1460
01:26:34,364 --> 01:26:35,530
ERIK DEMAINE: Which buffer--

1461
01:26:35,530 --> 01:26:36,350
AUDIENCE: Which buffers
do we keep sorted?

1462
01:26:36,350 --> 01:26:38,891
ERIK DEMAINE: We're only going
to keep the very bottom buffer

1463
01:26:38,891 --> 01:26:41,330
sorted at all times.

1464
01:26:41,330 --> 01:26:42,830
So I mean, it doesn't
really matter.

1465
01:26:42,830 --> 01:26:43,970
You don't even have
to keep those sorted,

1466
01:26:43,970 --> 01:26:45,410
because you can afford
to look at all of them.

1467
01:26:45,410 --> 01:26:47,048
AUDIENCE: [INAUDIBLE] I
think the upper levels when

1468
01:26:47,048 --> 01:26:49,160
we're trying to figure
out where it goes?

1469
01:26:49,160 --> 01:26:50,630
ERIK DEMAINE: What we do
need-- we don't need them

1470
01:26:50,630 --> 01:26:51,350
in sorted order.

1471
01:26:51,350 --> 01:26:55,130
But we need to know
where the max is, yeah.

1472
01:26:55,130 --> 01:26:58,520
So I guess, maybe, maintain
a linked list of the items

1473
01:26:58,520 --> 01:26:59,640
would be one way to do it.

1474
01:26:59,640 --> 01:27:01,880
So the sort order is in there,
but they're not physically

1475
01:27:01,880 --> 01:27:02,900
stored in sorted order.

1476
01:27:02,900 --> 01:27:05,921
That would be a little bit
too much to hope for, I think.

1477
01:27:05,921 --> 01:27:06,420
Yeah.

1478
01:27:06,420 --> 01:27:07,919
We do need to keep
track of the max,

1479
01:27:07,919 --> 01:27:11,110
but that's easy to do
as you're inserting.

1480
01:27:11,110 --> 01:27:12,680
Cool.

1481
01:27:12,680 --> 01:27:13,790
So that's priority queues.

1482
01:27:13,790 --> 01:27:16,670
You can look at the
notes for deletions.