1
00:00:00,090 --> 00:00:01,800
The following
content is provided

2
00:00:01,800 --> 00:00:04,040
under a Creative
Commons license.

3
00:00:04,040 --> 00:00:06,880
Your support will help MIT
OpenCourseWare continue

4
00:00:06,880 --> 00:00:10,740
to offer high quality
educational resources for free.

5
00:00:10,740 --> 00:00:13,350
To make a donation or
view additional materials

6
00:00:13,350 --> 00:00:15,800
from hundreds of
MIT courses, visit

7
00:00:15,800 --> 00:00:21,994
MIT OpenCourseWare
at ocw.mit.edu

8
00:00:21,994 --> 00:00:24,850
PROFESSOR: All right,
let's get started.

9
00:00:24,850 --> 00:00:27,810
We return today to graph search.

10
00:00:27,810 --> 00:00:29,950
Last time we saw breadth-first
search, today we're

11
00:00:29,950 --> 00:00:31,672
going to do depth-first search.

12
00:00:31,672 --> 00:00:34,130
It's a simple algorithm, but
you can do lots of cool things

13
00:00:34,130 --> 00:00:34,570
with it.

14
00:00:34,570 --> 00:00:36,236
And that's what I'll
spend most of today

15
00:00:36,236 --> 00:00:39,680
on, in particular, telling
whether your graph has a cycle,

16
00:00:39,680 --> 00:00:42,680
and something called
topological sort.

17
00:00:42,680 --> 00:00:45,970
As usual, basically in
all graph algorithms

18
00:00:45,970 --> 00:00:48,790
in this class, the input, the
way the graph is specified

19
00:00:48,790 --> 00:00:52,840
is as an adjacency list, or I
guess adjacency list plural.

20
00:00:52,840 --> 00:00:56,460
So you have a bunch of lists,
each one says for each vertex,

21
00:00:56,460 --> 00:00:58,550
what are the vertices
I'm connected to?

22
00:00:58,550 --> 00:01:02,900
What are the vertices I can
get to in one step via an edge?

23
00:01:02,900 --> 00:01:05,040
So that's our
input and our goal,

24
00:01:05,040 --> 00:01:08,630
in general, with graph search
is to explore the graph.

25
00:01:08,630 --> 00:01:10,390
In particular, the
kind of exploration

26
00:01:10,390 --> 00:01:13,430
we're going to be doing today
is to visit all the vertices,

27
00:01:13,430 --> 00:01:17,416
in some order, and visit
each vertex only once.

28
00:01:17,416 --> 00:01:19,040
So the way we did
breadth-first search,

29
00:01:19,040 --> 00:01:20,581
breadth-first search
was really good.

30
00:01:20,581 --> 00:01:22,830
It explored things
layer by layer,

31
00:01:22,830 --> 00:01:25,340
and that was nice because
it gave us shortest paths,

32
00:01:25,340 --> 00:01:28,830
it gave us the fastest
way to get to everywhere,

33
00:01:28,830 --> 00:01:31,490
from a particular
source, vertex s.

34
00:01:31,490 --> 00:01:34,210
But if you can't get
from s to your vertex,

35
00:01:34,210 --> 00:01:37,410
then the shortest way to
get there is infinity,

36
00:01:37,410 --> 00:01:39,440
there's no way to get there.

37
00:01:39,440 --> 00:01:41,790
And BFS is good for detecting
that, it can tell you

38
00:01:41,790 --> 00:01:46,490
which vertices are
unreachable from s.

39
00:01:46,490 --> 00:01:50,130
DFS can do that as
well, but it's often

40
00:01:50,130 --> 00:01:52,390
used to explore the
whole graph, not just

41
00:01:52,390 --> 00:01:54,134
the part reachable
from s, and so

42
00:01:54,134 --> 00:01:55,800
we're going to see
how to do that today.

43
00:01:55,800 --> 00:01:58,580
This trick could be used
for BFS or for DFS,

44
00:01:58,580 --> 00:02:01,590
but we're going to do it
here for DFS, because that's

45
00:02:01,590 --> 00:02:02,840
more common, let's say.

46
00:02:07,080 --> 00:02:09,014
So DFS.

47
00:02:21,110 --> 00:02:24,930
So depth-first search is kind
of like how you solve a maze.

48
00:02:24,930 --> 00:02:27,350
Like, the other weekend
I was at the big corn

49
00:02:27,350 --> 00:02:32,050
maze in central
Massachusetts, and it's

50
00:02:32,050 --> 00:02:34,584
easy to get lost in
there, in particular,

51
00:02:34,584 --> 00:02:36,250
because I didn't bring
any bread crumbs.

52
00:02:36,250 --> 00:02:39,160
The proper way to solve a
maze, if you're in there

53
00:02:39,160 --> 00:02:41,950
and all you can do is see which
way to go next and then walk

54
00:02:41,950 --> 00:02:43,730
a little bit to
the next junction,

55
00:02:43,730 --> 00:02:45,970
and then you have to
keep making decisions.

56
00:02:45,970 --> 00:02:49,490
Unless you have a really
good memory, which I do not,

57
00:02:49,490 --> 00:02:53,780
teaching staff can attest to
that, then an easy way to do it

58
00:02:53,780 --> 00:02:55,720
is to leave bread
crumbs behind, say,

59
00:02:55,720 --> 00:02:58,710
this is the last way
I went from this node,

60
00:02:58,710 --> 00:03:00,974
so that when I
reach a dead end, I

61
00:03:00,974 --> 00:03:02,390
have to turn around
and backtrack.

62
00:03:02,390 --> 00:03:04,450
I reach a breadcrumb that
says, oh, last time you

63
00:03:04,450 --> 00:03:07,160
went this way, next time
you should go this way,

64
00:03:07,160 --> 00:03:10,570
and in particular, keep track
at each node, which of the edges

65
00:03:10,570 --> 00:03:14,890
have I visited, which ones
are still left to visit.

66
00:03:14,890 --> 00:03:18,910
And this can be done very easily
on a computer using recursion.

67
00:03:30,520 --> 00:03:32,140
So high-level
description is we're

68
00:03:32,140 --> 00:03:37,400
going to just recursively
explore the graph,

69
00:03:37,400 --> 00:03:42,495
backtracking as necessary, kind
of like how you solve a maze.

70
00:03:54,980 --> 00:03:59,210
In fact, when I was
seven years old,

71
00:03:59,210 --> 00:04:00,960
one of the first
computer programs I wrote

72
00:04:00,960 --> 00:04:01,918
was for solving a maze.

73
00:04:01,918 --> 00:04:04,140
I didn't know it was
depth-first search at the time,

74
00:04:04,140 --> 00:04:04,810
but now I know.

75
00:04:11,050 --> 00:04:12,690
It was so much harder
doing algorithms

76
00:04:12,690 --> 00:04:15,540
when I didn't know
what they were.

77
00:04:15,540 --> 00:04:20,779
Anyway, I'm going to write some
code for depth-first search,

78
00:04:20,779 --> 00:04:26,900
it is super simple code, the
simplest graph algorithm.

79
00:04:49,175 --> 00:04:50,255
It's four lines.

80
00:05:05,590 --> 00:05:06,090
That's it.

81
00:05:06,090 --> 00:05:08,280
I'm going to write a little
bit of code after this,

82
00:05:08,280 --> 00:05:11,500
but this is basic
depth-first search.

83
00:05:11,500 --> 00:05:13,580
This will visit all
the vertices reachable

84
00:05:13,580 --> 00:05:16,480
from a given source, vertex s.

85
00:05:16,480 --> 00:05:19,780
So we're given the
adjacency list.

86
00:05:19,780 --> 00:05:22,130
I don't know why I put v
here, you could erase it,

87
00:05:22,130 --> 00:05:24,380
it's not necessary.

88
00:05:24,380 --> 00:05:29,030
And all we do is, we
have our vertex v, sorry,

89
00:05:29,030 --> 00:05:31,300
we have our vertex s.

90
00:05:31,300 --> 00:05:35,930
We look at all of the
outgoing edges from s.

91
00:05:35,930 --> 00:05:40,950
For each one, we'll
call it v, we check,

92
00:05:40,950 --> 00:05:42,770
have I visited this
vertex already?

93
00:05:45,422 --> 00:05:46,880
A place where we
need to be careful

94
00:05:46,880 --> 00:05:49,160
is to not repeat vertices.

95
00:05:49,160 --> 00:05:50,980
We need to do this
in BFS as well.

96
00:05:56,110 --> 00:05:58,430
So, the way we're
going to do that

97
00:05:58,430 --> 00:06:00,450
is by setting the
parent of a node,

98
00:06:00,450 --> 00:06:03,210
we'll see what that
actually means later.

99
00:06:03,210 --> 00:06:05,940
But for now, it's just, are you
in the parent structure or not?

100
00:06:05,940 --> 00:06:09,600
This is initially, we've
seen s, so we give it

101
00:06:09,600 --> 00:06:14,250
a parent of nothing, but it
exists in this dictionary.

102
00:06:14,250 --> 00:06:16,830
If the vertex v that
we're looking at

103
00:06:16,830 --> 00:06:19,300
is not in our dictionary,
we haven't seen it yet,

104
00:06:19,300 --> 00:06:23,190
we mark it as seen by
setting its parent to s,

105
00:06:23,190 --> 00:06:25,060
and then we
recursively visit it.

106
00:06:25,060 --> 00:06:26,310
That's it.

107
00:06:26,310 --> 00:06:29,120
Super simple, just recurse.

108
00:06:29,120 --> 00:06:32,130
Sort of the magical part
is the preventing yourself

109
00:06:32,130 --> 00:06:34,070
from repeating.

110
00:06:34,070 --> 00:06:37,250
As you explore the graph,
if you reach something

111
00:06:37,250 --> 00:06:39,950
you've already seen before
you just skip it again.

112
00:06:39,950 --> 00:06:45,010
So you only visit every
vertex once, at most once.

113
00:06:45,010 --> 00:06:47,260
This will not visit
the entire graph,

114
00:06:47,260 --> 00:06:50,840
it will only visit the
vertices reachable from s.
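The recursive visit described above can be sketched in Python roughly as follows. This is a minimal sketch, not the code from the board: the name dfs_visit follows the textbook's "DFS visit", while the dictionary-of-lists adjacency format and the small example graph are assumptions made for illustration.

```python
def dfs_visit(adj, s, parent):
    """Recursively visit every vertex reachable from s.

    adj maps each vertex to the list of vertices it has an edge to.
    parent doubles as the "already seen" set: a vertex is seen
    exactly when it is a key of parent.
    """
    for v in adj[s]:               # look at each outgoing edge (s, v)
        if v not in parent:        # v has not been seen yet
            parent[v] = s          # mark v as seen, remember how we got there
            dfs_visit(adj, v, parent)

# Hypothetical example graph (directed), given as adjacency lists.
adj = {
    "a": ["b", "d"],
    "b": ["e"],
    "c": ["f", "e"],
    "d": ["b"],
    "e": ["d"],
    "f": ["f"],
}

parent = {"a": None}               # the source has a parent of nothing
dfs_visit(adj, "a", parent)
print(parent)
```

Only the vertices reachable from the source end up in parent; in this example c and f are never visited.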

115
00:06:50,840 --> 00:06:52,940
The next part of the
code I'd like to give you

116
00:06:52,940 --> 00:06:56,920
is for visiting all the
vertices, and in the textbook

117
00:06:56,920 --> 00:06:58,820
this is called the DFS,
whereas this is just

118
00:06:58,820 --> 00:07:02,180
called DFS visit, that's
sort of the recursive part,

119
00:07:02,180 --> 00:07:08,330
and this is sort of a
top level algorithm.

120
00:07:08,330 --> 00:07:19,840
Here we are going to use
the set of vertices, V,

121
00:07:19,840 --> 00:07:22,040
and here we're just going
to iterate over the s's.

122
00:07:47,960 --> 00:07:51,150
So it looks almost the same,
but what we're iterating over

123
00:07:51,150 --> 00:07:52,200
is different.

124
00:07:52,200 --> 00:07:55,720
Here we're iterating over
the outgoing edges from s,

125
00:07:55,720 --> 00:07:57,855
here we're iterating
over the choices of s.

126
00:08:03,190 --> 00:08:05,239
So the idea here
is we don't really

127
00:08:05,239 --> 00:08:06,530
know where to start our search.

128
00:08:06,530 --> 00:08:09,154
If it's a disconnected graph or
not a strongly connected graph,

129
00:08:09,154 --> 00:08:12,330
we might have to start
our search multiple times.

130
00:08:12,330 --> 00:08:15,520
This DFS algorithm is finding
all the possible places

131
00:08:15,520 --> 00:08:19,290
you might start the search
and trying them all.

132
00:08:19,290 --> 00:08:21,320
So it's like, OK, let's
try the first vertex.

133
00:08:21,320 --> 00:08:23,778
If that hasn't been visited,
which initially nothing's been

134
00:08:23,778 --> 00:08:27,380
visited, then visit it,
recursively, everything

135
00:08:27,380 --> 00:08:29,010
reachable from s.

136
00:08:29,010 --> 00:08:30,630
Then you go on to
the second vertex.

137
00:08:30,630 --> 00:08:32,480
Now, you may have already
visited it, then you skip it.

138
00:08:32,480 --> 00:08:34,271
Third vertex, maybe
you visited it already.

139
00:08:34,271 --> 00:08:36,250
Third, fourth
vertex, keep going,

140
00:08:36,250 --> 00:08:39,049
until you find some vertex
you haven't visited at all.

141
00:08:39,049 --> 00:08:42,990
And then you recursively visit
everything reachable from it,

142
00:08:42,990 --> 00:08:45,400
and you repeat.

143
00:08:45,400 --> 00:08:48,400
This will find all the
different clusters,

144
00:08:48,400 --> 00:08:50,480
all the different strongly
connected components

145
00:08:50,480 --> 00:08:51,630
of your graph.

146
00:08:51,630 --> 00:08:54,190
Most of the work is being
done by this recursion,

147
00:08:54,190 --> 00:08:55,970
but then there's
this top level, just

148
00:08:55,970 --> 00:08:59,090
to make sure that all
the vertices get visited.
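As a rough sketch of the top-level loop described above, assuming the graph is stored as a Python dictionary of adjacency lists (the function names and the example graph are illustrative assumptions, not the exact code from the lecture):

```python
def dfs_visit(adj, s, parent):
    """Recursive part: visit everything reachable from s."""
    for v in adj[s]:
        if v not in parent:
            parent[v] = s
            dfs_visit(adj, v, parent)

def dfs(vertices, adj):
    """Top level: try every vertex as a possible starting point."""
    parent = {}
    for s in vertices:             # iterate over the choices of s
        if s not in parent:        # s has not been visited by an earlier search
            parent[s] = None       # s becomes the root of a new tree
            dfs_visit(adj, s, parent)
    return parent

# Hypothetical directed example graph, vertices tried in alphabetical order.
adj = {
    "a": ["b", "d"],
    "b": ["e"],
    "c": ["f", "e"],
    "d": ["b"],
    "e": ["d"],
    "f": ["f"],
}
parent = dfs(sorted(adj), adj)
print(parent)
```

Every vertex ends up in parent, and the vertices whose parent is None are the places where the outer loop had to start a fresh search.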

149
00:08:59,090 --> 00:09:03,380
Let's do a little example,
so this is super clear,

150
00:09:03,380 --> 00:09:07,410
and then it will also
let me do something

151
00:09:07,410 --> 00:09:09,480
called edge classification.

152
00:09:09,480 --> 00:09:13,340
Once we see every
edge in the graph

153
00:09:13,340 --> 00:09:15,870
gets visited by DFS
in one way or another,

154
00:09:15,870 --> 00:09:18,820
and it's really helpful to
think about the different ways

155
00:09:18,820 --> 00:09:20,910
they can be visited.

156
00:09:20,910 --> 00:09:25,810
So here's a graph.

157
00:09:25,810 --> 00:09:29,010
I think it's similar
to one from last class.

158
00:09:46,160 --> 00:09:50,220
It's not strongly
connected, I don't think,

159
00:09:50,220 --> 00:09:53,960
so you can't get from
these vertices to c.

160
00:09:53,960 --> 00:09:55,510
You can get from
c to everywhere,

161
00:09:55,510 --> 00:10:00,110
it looks like, but not
strongly connected.

162
00:10:00,110 --> 00:10:02,820
And we're going to run
DFS, and I think, basically

163
00:10:02,820 --> 00:10:06,480
in alphabetical order
is how we're imagining--

164
00:10:06,480 --> 00:10:08,230
these vertices have
to be ordered somehow,

165
00:10:08,230 --> 00:10:12,680
we don't really care how, but
for sake of example I care.

166
00:10:12,680 --> 00:10:15,610
So we're going to
start with a, that's

167
00:10:15,610 --> 00:10:17,029
the first vertex in here.

168
00:10:17,029 --> 00:10:19,570
We're going to recursively visit
everything reachable from a,

169
00:10:19,570 --> 00:10:22,750
so we enter here
with s equals a.

170
00:10:22,750 --> 00:10:30,275
So I'll mark this s1, to be the
first value of s at this level.

171
00:10:33,070 --> 00:10:37,180
So we consider-- I'm going
to check the order here--

172
00:10:37,180 --> 00:10:39,345
first edge we look at,
there's two outgoing edges,

173
00:10:39,345 --> 00:10:40,845
let's say we look
at this one first.

174
00:10:46,230 --> 00:10:48,950
We look at b, b has
not been visited yet,

175
00:10:48,950 --> 00:10:50,570
has no parent pointer.

176
00:10:50,570 --> 00:10:54,040
This one has a
parent pointer of 0.

177
00:10:54,040 --> 00:10:59,560
B we're going to give a parent
pointer of a, that's here.

178
00:10:59,560 --> 00:11:01,970
Then we recursively
visit everything for b.

179
00:11:01,970 --> 00:11:04,670
So we look at all the outgoing
edges from b, there's only one.

180
00:11:04,670 --> 00:11:05,750
So we visit this edge

181
00:11:09,230 --> 00:11:11,160
from b to e. e has
not been visited,

182
00:11:11,160 --> 00:11:15,200
so we set its parent pointer to
b, and now we recursively visit

183
00:11:15,200 --> 00:11:16,451
e.

184
00:11:16,451 --> 00:11:22,590
e has only one outgoing edge, so
we look at it, over here to d.

185
00:11:25,230 --> 00:11:29,286
d has not been visited, so
we set its parent pointer to e,

186
00:11:29,286 --> 00:11:31,160
and we look at all the
outgoing edges from d.

187
00:11:31,160 --> 00:11:33,170
d has one outgoing
edge, which is

188
00:11:33,170 --> 00:11:35,760
to b. b has already
been visited,

189
00:11:35,760 --> 00:11:38,530
so we skip that
one, nothing to do.

190
00:11:38,530 --> 00:11:42,720
That's the else case
of this if, so we

191
00:11:42,720 --> 00:11:45,730
do nothing in the else case,
we just go to the next edge.

192
00:11:45,730 --> 00:11:48,450
But there's no next edge
for d, so we're done.

193
00:11:48,450 --> 00:11:52,440
So this algorithm returns
to the next level up.

194
00:11:52,440 --> 00:11:54,220
Next level up was
e, we were iterating

195
00:11:54,220 --> 00:11:55,690
over the outgoing edges from e.

196
00:11:55,690 --> 00:11:59,870
But there was only one, so
we're done, so e finishes.

197
00:11:59,870 --> 00:12:05,340
Then we backtrack to b,
which is always going back

198
00:12:05,340 --> 00:12:07,420
along the parent pointer,
but it's also just

199
00:12:07,420 --> 00:12:08,500
in the recursion.

200
00:12:08,500 --> 00:12:10,915
We know where to go back to.

201
00:12:10,915 --> 00:12:13,540
We were going over the outgoing
edges from b, there's only one,

202
00:12:13,540 --> 00:12:15,610
we're done.

203
00:12:15,610 --> 00:12:16,960
So we go back to a.

204
00:12:16,960 --> 00:12:18,910
We only looked at one
outgoing edge from a.

205
00:12:18,910 --> 00:12:22,130
There's another outgoing
edge, which is this one,

206
00:12:22,130 --> 00:12:24,880
but we've already visited
d, so we skip over that one,

207
00:12:24,880 --> 00:12:27,240
too, so we're done
recursively visiting

208
00:12:27,240 --> 00:12:30,970
everything reachable from a.

209
00:12:30,970 --> 00:12:34,190
Now we go back to this
loop, the outer loop.

210
00:12:34,190 --> 00:12:38,310
So we did a, next we look at b,
we say, oh b has been visited,

211
00:12:38,310 --> 00:12:40,000
we don't need to do
anything from there.

212
00:12:40,000 --> 00:12:42,430
Then we go to c, c
hasn't been visited

213
00:12:42,430 --> 00:12:46,210
so we're going to loop
from c, and so this

214
00:12:46,210 --> 00:12:50,390
is our second choice
of s in this recursion,

215
00:12:50,390 --> 00:12:53,460
or in this outer loop.

216
00:12:53,460 --> 00:12:56,200
And so we look at the
outgoing edges from s2,

217
00:12:56,200 --> 00:12:59,210
let me match the
order in the notes.

218
00:12:59,210 --> 00:13:03,516
Let's say first we go to f.

219
00:13:03,516 --> 00:13:08,150
f has not been visited, so we
set its parent pointer to c.

220
00:13:08,150 --> 00:13:10,130
Then we look at all the
outgoing edges from f.

221
00:13:10,130 --> 00:13:13,710
There's one outgoing edge
from f, it goes to f.

222
00:13:13,710 --> 00:13:18,860
I guess I shouldn't
really bold this, sorry.

223
00:13:18,860 --> 00:13:21,040
I'll say what the bold
edges mean in a moment.

224
00:13:23,570 --> 00:13:25,300
This is just a regular edge.

225
00:13:25,300 --> 00:13:27,570
We follow the edge from f to f.

226
00:13:27,570 --> 00:13:29,385
We see, oh, f has
already been visited,

227
00:13:29,385 --> 00:13:31,400
it already has a parent
pointer, so there's

228
00:13:31,400 --> 00:13:33,389
no point going down there.

229
00:13:33,389 --> 00:13:35,430
We're done with f, that's
the only outgoing edge.

230
00:13:35,430 --> 00:13:37,650
We go back to c, there's
one other outgoing edge,

231
00:13:37,650 --> 00:13:40,900
but it leads to a vertex we've
already visited, namely e,

232
00:13:40,900 --> 00:13:44,600
and so we're done with visiting
everything reachable from c.

233
00:13:44,600 --> 00:13:46,100
We didn't visit
everything reachable

234
00:13:46,100 --> 00:13:49,250
from c, because some of it
was already visited from a.

235
00:13:49,250 --> 00:13:51,685
Then we go back to the outer
loop, say, OK, what about d?

236
00:13:51,685 --> 00:13:53,060
D has been visited,
what about e?

237
00:13:53,060 --> 00:13:54,351
E's been visited, what about f?

238
00:13:54,351 --> 00:13:55,590
F's been visited.

239
00:13:55,590 --> 00:13:57,790
So we're visiting
these vertices again,

240
00:13:57,790 --> 00:14:03,640
but should only be twice
in total, and in the end

241
00:14:03,640 --> 00:14:06,230
we visit all the vertices,
and, in a certain sense,

242
00:14:06,230 --> 00:14:07,170
all the edges as well.

243
00:14:12,440 --> 00:14:18,070
Let's talk about running time.

244
00:14:27,597 --> 00:14:29,930
What do you think the running
time of this algorithm is?

245
00:14:38,120 --> 00:14:39,590
Anyone?

246
00:14:39,590 --> 00:14:42,935
Time to wake up.

247
00:14:42,935 --> 00:14:43,897
AUDIENCE: Upper bound?

248
00:14:43,897 --> 00:14:45,810
PROFESSOR: Upper bound, sure.

249
00:14:45,810 --> 00:14:46,310
AUDIENCE: V?

250
00:14:46,310 --> 00:14:46,851
PROFESSOR: V?

251
00:14:46,851 --> 00:14:48,690
AUDIENCE: [INAUDIBLE].

252
00:14:48,690 --> 00:14:55,690
PROFESSOR: V is a little bit
optimistic, plus e, good,

253
00:14:55,690 --> 00:14:57,720
collaborative effort.

254
00:14:57,720 --> 00:15:00,070
It's linear time, just like BFS.

255
00:15:00,070 --> 00:15:02,520
This is what we
call linear time,

256
00:15:02,520 --> 00:15:07,550
because this is the
size of the input.

257
00:15:07,550 --> 00:15:11,342
It's theta V plus E
for the whole thing.

258
00:15:11,342 --> 00:15:12,800
The size of the
input was v plus e.

259
00:15:12,800 --> 00:15:15,300
We needed v slots
in an array, plus we

260
00:15:15,300 --> 00:15:20,400
needed e items in these linked
lists, one for each edge.

261
00:15:20,400 --> 00:15:22,560
We have to traverse
that whole structure.

262
00:15:22,560 --> 00:15:27,030
The reason it's order v plus e
is-- first, as you were saying,

263
00:15:27,030 --> 00:15:30,320
you're visiting every vertex
once in this outer loop,

264
00:15:30,320 --> 00:15:46,160
so not worrying about the
recursion in DFS alone,

265
00:15:46,160 --> 00:15:48,480
so that's order V.

266
00:15:48,480 --> 00:15:51,040
Then we have to worry
about this recursion,

267
00:15:51,040 --> 00:15:56,160
but we know that whenever we
call DFS visit on a vertex,

268
00:15:56,160 --> 00:15:58,961
that it did not have
a parent before.

269
00:15:58,961 --> 00:16:00,830
Right before we
called DFS visit,

270
00:16:00,830 --> 00:16:03,170
we set its parent
for the first time.

271
00:16:03,170 --> 00:16:05,590
Right before we called
DFS visit on v here,

272
00:16:05,590 --> 00:16:07,580
we set its parent
for the first time,

273
00:16:07,580 --> 00:16:09,930
because it wasn't set before.

274
00:16:09,930 --> 00:16:17,880
So DFS visit, and I'm
going to just write of v,

275
00:16:17,880 --> 00:16:19,840
meaning the last argument here.

276
00:16:25,520 --> 00:16:32,660
It's called once, at
most once, per vertex v.

277
00:16:35,800 --> 00:16:37,580
But it does not
take constant time.

278
00:16:37,580 --> 00:16:41,310
This takes constant time per
vertex, plus a recursive call.

279
00:16:41,310 --> 00:16:44,320
This thing, this takes constant
time, but there's a for loop

280
00:16:44,320 --> 00:16:44,820
here.

281
00:16:44,820 --> 00:16:47,140
We have to pay for however
many outgoing edges

282
00:16:47,140 --> 00:16:49,300
there are from v, that's
the part you're missing.

283
00:16:52,880 --> 00:17:00,560
And we pay length of adjacency
of v for that vertex.

284
00:17:00,560 --> 00:17:03,046
So the total in
addition to this v

285
00:17:03,046 --> 00:17:08,300
is going to be order the sum
over all vertices, v in capital

286
00:17:08,300 --> 00:17:13,400
V, of the length of the
adjacency list for v,

287
00:17:13,400 --> 00:17:22,150
which is E. This
is the handshaking

288
00:17:22,150 --> 00:17:24,592
lemma from last time.

289
00:17:24,592 --> 00:17:27,010
It's twice e for
undirected graphs,

290
00:17:27,010 --> 00:17:29,550
it's e for directed graphs.
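Written out, the running-time argument above amounts to the following, where Adj[v] stands for the adjacency list of v (the notation is an assumption chosen to match the spoken description):

\[
T_{\mathrm{DFS}} \;=\; \Theta\Big(|V| + \sum_{v \in V} |\mathrm{Adj}[v]|\Big) \;=\; \Theta(|V| + |E|),
\qquad
\sum_{v \in V} |\mathrm{Adj}[v]| \;=\;
\begin{cases}
|E| & \text{for directed graphs,} \\
2|E| & \text{for undirected graphs.}
\end{cases}
\]

The first term pays for the outer loop over all vertices, and the sum pays for the for loop inside each call to DFS visit, which runs at most once per vertex.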

291
00:17:29,550 --> 00:17:33,970
I've drawn directed graphs here,
it's a little more interesting.

292
00:17:33,970 --> 00:17:37,560
OK, so it's linear time, just
like the BFS, so you could say,

293
00:17:37,560 --> 00:17:42,240
who cares, but DFS offers a
lot of different properties

294
00:17:42,240 --> 00:17:42,870
than BFS.

295
00:17:42,870 --> 00:17:44,660
They each have their niche.

296
00:17:44,660 --> 00:17:46,250
BFS is great for shortest paths.

297
00:17:46,250 --> 00:17:49,080
You want to know the fastest
way to solve the Rubik's cube,

298
00:17:49,080 --> 00:17:50,560
BFS will find it.

299
00:17:50,560 --> 00:17:53,330
You want to find the fastest
way to solve the Rubik's cube,

300
00:17:53,330 --> 00:17:55,150
DFS will not find it.

301
00:17:55,150 --> 00:17:57,090
It's not following
shortest paths here.

302
00:17:57,090 --> 00:17:59,300
Going from a to
d, we use the path

303
00:17:59,300 --> 00:18:01,324
of length 3, that's
the bold edges.

304
00:18:01,324 --> 00:18:02,740
We could have gone
directly from a

305
00:18:02,740 --> 00:18:05,170
to d, so it's a
different kind of search,

306
00:18:05,170 --> 00:18:07,340
but sort of the inverse.

307
00:18:07,340 --> 00:18:10,560
But it's extremely useful,
in particular, in the way

308
00:18:10,560 --> 00:18:13,082
that it classifies edges.

309
00:18:13,082 --> 00:18:14,790
So let me talk about
edge classification.

310
00:18:27,630 --> 00:18:31,540
You can check every edge
in this graph gets visited.

311
00:18:31,540 --> 00:18:34,060
In a directed graph every
edge gets visited once,

312
00:18:34,060 --> 00:18:35,740
in an undirected
graph, every edge

313
00:18:35,740 --> 00:18:37,660
gets visited twice,
once from each side.

314
00:18:40,200 --> 00:18:42,240
And when you visit
that edge, there's

315
00:18:42,240 --> 00:18:45,710
sort of different categories
of what could happen to it.

316
00:18:45,710 --> 00:18:50,920
Maybe the edge led to something
unvisited, when you went there.

317
00:18:50,920 --> 00:18:52,190
We call those tree edges.

318
00:19:10,360 --> 00:19:12,920
That's what the parent
pointers are specifying

319
00:19:12,920 --> 00:19:16,420
and all the bold edges here
are called tree edges.

320
00:19:16,420 --> 00:19:27,410
This is when we visit a
new vertex via that edge.

321
00:19:29,832 --> 00:19:31,540
So we look at the
other side of the edge,

322
00:19:31,540 --> 00:19:33,024
we discover a new vertex.

323
00:19:33,024 --> 00:19:34,440
Those are what we
call tree edges,

324
00:19:34,440 --> 00:19:37,830
it turns out they form
a tree, a directed tree.

325
00:19:37,830 --> 00:19:39,930
That's a lemma you can prove.

326
00:19:39,930 --> 00:19:40,810
You can see it here.

327
00:19:40,810 --> 00:19:44,650
We just have a path, actually a
forest would be more accurate.

328
00:19:44,650 --> 00:19:48,916
We have a path abed,
and we have an edge cf,

329
00:19:48,916 --> 00:19:51,209
but, in general, it's a forest.

330
00:19:51,209 --> 00:19:53,250
So for example, if there
was another thing coming

331
00:19:53,250 --> 00:19:57,540
from e here, let's modify my
graph, we would, at some point,

332
00:19:57,540 --> 00:19:59,720
visit that edge and say,
oh, here's a new way to go,

333
00:19:59,720 --> 00:20:04,250
and now that bold structure
forms an actual tree.

334
00:20:04,250 --> 00:20:06,850
These are called tree edges,
you can call them forest edges

335
00:20:06,850 --> 00:20:10,080
if you feel like it.

336
00:20:10,080 --> 00:20:13,120
There are other edges in
there, the nonbold edges,

337
00:20:13,120 --> 00:20:17,260
and the textbook distinguishes
three types, three types?

338
00:20:17,260 --> 00:20:19,950
Three types, so many types.

339
00:20:22,500 --> 00:20:40,580
They are forward edges,
backward edges, and cross edges.

340
00:20:44,720 --> 00:20:47,740
Some of these are more useful
to distinguish than others,

341
00:20:47,740 --> 00:20:51,490
but it doesn't hurt
to have them all.

342
00:20:51,490 --> 00:20:57,590
So, for example, this edge I'm
going to call a forward edge,

343
00:20:57,590 --> 00:21:01,260
just write f,
that's unambiguous,

344
00:21:01,260 --> 00:21:04,430
because it goes, in some
sense, forward along the tree.

345
00:21:04,430 --> 00:21:09,730
It goes from the root of
this tree to a descendant.

346
00:21:09,730 --> 00:21:12,130
There is a path
in the tree from a

347
00:21:12,130 --> 00:21:14,720
to d, so we call
it a forward edge.

348
00:21:14,720 --> 00:21:20,770
By contrast, this edge I'm
going to call a backward edge,

349
00:21:20,770 --> 00:21:24,570
because it goes from
a node in the tree

350
00:21:24,570 --> 00:21:26,390
to an ancestor in the tree.

351
00:21:26,390 --> 00:21:28,914
If you think of parents, I
can go from d to its parent

352
00:21:28,914 --> 00:21:30,830
to its parent, and that's
where the edge goes,

353
00:21:30,830 --> 00:21:33,460
so that's a backward
edge-- double check I

354
00:21:33,460 --> 00:21:36,870
got these not reversed,
yeah, that's right.

355
00:21:36,870 --> 00:21:39,334
Forward edge because I could
go from d to its parent

356
00:21:39,334 --> 00:21:41,000
to its parent to its
parent and the edge

357
00:21:41,000 --> 00:21:44,220
went the other way,
that's a forward edge.

358
00:21:44,220 --> 00:21:49,170
So forward edge goes from a node
to a descendant in the tree.

359
00:21:52,540 --> 00:21:56,660
Backward edge goes from a node
to an ancestor in the tree.

360
00:22:02,670 --> 00:22:04,180
And when I say,
tree, I mean forest.

361
00:22:07,080 --> 00:22:10,170
And then all the other
edges are cross edges.

362
00:22:12,940 --> 00:22:17,670
So I guess, here,
this is a cross edge.

363
00:22:17,670 --> 00:22:20,840
In this case, it goes from
one tree to another, doesn't

364
00:22:20,840 --> 00:22:22,540
have to go between
different trees.

365
00:22:22,540 --> 00:22:28,540
For example, let's say
I'm visiting d, then

366
00:22:28,540 --> 00:22:32,942
I go back to e, I visit g,
or there could be this edge.

367
00:22:32,942 --> 00:22:37,720
If this edge existed, it
would be a cross edge,

368
00:22:37,720 --> 00:22:40,970
because g and d are
not ancestor related,

369
00:22:40,970 --> 00:22:42,980
neither one is an
ancestor of the other,

370
00:22:42,980 --> 00:22:46,329
they are siblings actually.

371
00:22:46,329 --> 00:22:47,870
So there's, in
general, there's going

372
00:22:47,870 --> 00:22:51,210
to be some subtree over
here, some subtree over here,

373
00:22:51,210 --> 00:22:55,760
and this is a cross edge
between two different subtrees.

374
00:22:55,760 --> 00:23:07,960
This cross edge is between two,
sort of, non ancestor related,

375
00:23:07,960 --> 00:23:16,955
I think is the shortest way to
write this, subtrees or nodes.

376
00:23:26,520 --> 00:23:29,065
A little puzzle for
you, well, I guess

377
00:23:29,065 --> 00:23:31,620
the first question is, how do
you compute this structure?

378
00:23:31,620 --> 00:23:34,212
How do you compute
which edges are which?

379
00:23:34,212 --> 00:23:36,670
This is not hard, although I
haven't written it in the code

380
00:23:36,670 --> 00:23:37,200
here.

381
00:23:37,200 --> 00:23:42,290
You can check the textbook
for one way to do it.

382
00:23:42,290 --> 00:23:45,800
The parent structure tells you
which edges are tree edges.

383
00:23:45,800 --> 00:23:47,980
So that part we have done.

384
00:23:47,980 --> 00:23:52,670
Every parent pointer corresponds
to the reverse of a tree edge,

385
00:23:52,670 --> 00:23:55,250
so at the same time you could
mark that edge a tree edge,

386
00:23:55,250 --> 00:23:56,958
and you'd know which
edges are tree edges

387
00:23:56,958 --> 00:23:58,874
and which edges
are nontree edges.

388
00:23:58,874 --> 00:24:01,290
If you want to know which are
forward, which are backward,

389
00:24:01,290 --> 00:24:06,130
which are cross edges, the
key thing you need to know

390
00:24:06,130 --> 00:24:14,140
is, well, in particular,
for backward edges, one way

391
00:24:14,140 --> 00:24:16,850
to compute them is
to mark which nodes

392
00:24:16,850 --> 00:24:19,880
you are currently exploring.

393
00:24:19,880 --> 00:24:22,660
So when we do a DFS
visit on a node,

394
00:24:22,660 --> 00:24:25,160
we could say at
the beginning here,

395
00:24:25,160 --> 00:24:31,230
basically, we're starting
to visit s, say, start s,

396
00:24:31,230 --> 00:24:33,569
and then at the end of
this for loop, we write,

397
00:24:33,569 --> 00:24:34,485
we're finished with s.

398
00:24:38,190 --> 00:24:40,130
And you could mark that
in the s structure.

399
00:24:40,130 --> 00:24:43,720
You could say s dot in
process is true up here,

400
00:24:43,720 --> 00:24:46,730
s dot in process
equals false down here.

401
00:24:46,730 --> 00:24:49,470
Keep track of which nodes are
currently in the recursion

402
00:24:49,470 --> 00:24:53,120
stack, just by marking
them and unmarking them

403
00:24:53,120 --> 00:24:55,430
at the beginning and the end.

404
00:24:55,430 --> 00:24:58,210
Then we'll know, if we follow
an edge and it's an edge

405
00:24:58,210 --> 00:25:01,220
to somebody who's
already in the stack,

406
00:25:01,220 --> 00:25:06,020
then it's a backward edge,
because that's-- everyone

407
00:25:06,020 --> 00:25:10,690
in the stack is an ancestor
of our current node.
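The marking idea just described might look roughly like this in Python. It is only a sketch under assumptions: the lecture talks about a flag on each node (s dot in process), whereas here a set named in_process plays that role, and the function name and example graph are made up for illustration.

```python
def backward_edges(vertices, adj):
    """Return the edges that lead to a vertex still on the recursion stack."""
    parent = {}
    in_process = set()             # vertices whose visit has started but not finished
    found = []

    def visit(s):
        in_process.add(s)          # "start s"
        for v in adj[s]:
            if v not in parent:
                parent[v] = s
                visit(v)
            elif v in in_process:  # v is an ancestor of s (or s itself)
                found.append((s, v))
        in_process.discard(s)      # "finish s"

    for s in vertices:
        if s not in parent:
            parent[s] = None
            visit(s)
    return found

# Hypothetical directed example graph.
adj = {
    "a": ["b", "d"],
    "b": ["e"],
    "c": ["f", "e"],
    "d": ["b"],
    "e": ["d"],
    "f": ["f"],
}
back = backward_edges(sorted(adj), adj)
print(back)
```

In this sketch a self-loop such as f to f also counts as a backward edge, since f is still in process when the edge is followed.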

408
00:25:10,690 --> 00:25:15,400
Detecting forward edges,
it's a little trickier.

409
00:25:18,940 --> 00:25:23,330
Forward edges
versus cross edges,

410
00:25:23,330 --> 00:25:25,220
any suggestions on an
easy way to do that?

411
00:25:28,480 --> 00:25:31,840
I don't think I know
an easy way to do that.

412
00:25:31,840 --> 00:25:33,560
It can be done.

413
00:25:33,560 --> 00:25:35,750
The way the textbook does
it is a little bit more

414
00:25:35,750 --> 00:25:41,030
sophisticated, in that when
they start visiting a vertex,

415
00:25:41,030 --> 00:25:44,890
they record the time
that it got visited.

416
00:25:44,890 --> 00:25:46,620
What's time?

417
00:25:46,620 --> 00:25:49,220
You could think of it as
the clock on your computer,

418
00:25:49,220 --> 00:25:51,140
another way to do
it is, every time

419
00:25:51,140 --> 00:25:55,000
you do a step in this algorithm,
you increment a counter.

420
00:25:55,000 --> 00:25:58,351
So every time anything happens,
you increment a counter,

421
00:25:58,351 --> 00:25:59,850
and then you store
the value of that

422
00:25:59,850 --> 00:26:02,910
counter here for s, that
would be the start time for s,

423
00:26:02,910 --> 00:26:06,100
you store the finish
time for s down here,

424
00:26:06,100 --> 00:26:08,040
and then this gives
you, this tells you

425
00:26:08,040 --> 00:26:09,970
when a node was
visited, and you can

426
00:26:09,970 --> 00:26:12,520
use that to compute when
an edge is a forward edge

427
00:26:12,520 --> 00:26:14,924
and otherwise it's a cross edge.
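
For readers who do want the detail, here is a hedged Python sketch of the counter-based approach: record a start and finish time for every vertex and use them to tell the four edge types apart. It assumes a directed graph given as a dict of adjacency lists with no parallel edges; `classify_edges`, `start`, `finish`, and `clock` are names made up for this sketch.

```python
def classify_edges(adj):
    """Label each directed edge as 'tree', 'back', 'forward', or 'cross'."""
    start, finish = {}, {}
    kinds = {}           # (u, v) -> edge type
    clock = 0            # incremented every time a vertex starts or finishes

    def visit(u):
        nonlocal clock
        clock += 1
        start[u] = clock
        for v in adj[u]:
            if v not in start:
                kinds[(u, v)] = "tree"       # first time we reach v
                visit(v)
            elif v not in finish:
                kinds[(u, v)] = "back"       # v is still in process: an ancestor
            elif start[u] < start[v]:
                kinds[(u, v)] = "forward"    # v started after u and already finished
            else:
                kinds[(u, v)] = "cross"      # v finished before u even started
        clock += 1
        finish[u] = clock

    for s in adj:
        if s not in start:
            visit(s)
    return kinds
```

The forward-versus-cross test relies only on comparing start times once we know the target has already finished.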

428
00:26:14,924 --> 00:26:16,840
It's not terribly exciting,
though, so I'm not

429
00:26:16,840 --> 00:26:18,810
going to detail that.

430
00:26:18,810 --> 00:26:22,450
You can look at the textbook
if you're interested.

431
00:26:22,450 --> 00:26:24,140
But here's a fun puzzle.

432
00:26:24,140 --> 00:26:32,920
In an undirected graph, which
of these edges can exist?

433
00:26:32,920 --> 00:26:38,790
We can have a vote, do some
democratic mathematics.

434
00:26:38,790 --> 00:26:41,910
How many people think tree edges
exist in undirected graphs?

435
00:26:44,510 --> 00:26:46,170
You, OK.

436
00:26:46,170 --> 00:26:46,670
Sarini does.

437
00:26:46,670 --> 00:26:47,740
That's a good sign.

438
00:26:47,740 --> 00:26:49,340
How many people
think forward edges

439
00:26:49,340 --> 00:26:50,920
exist in an undirected graph?

440
00:26:54,310 --> 00:26:54,870
A couple.

441
00:26:54,870 --> 00:26:56,370
How many people
think backward edges

442
00:26:56,370 --> 00:26:59,500
exist in an undirected graph?

443
00:26:59,500 --> 00:27:00,000
Couple.

444
00:27:00,000 --> 00:27:01,850
How many people
think cross edges

445
00:27:01,850 --> 00:27:03,980
exist in an undirected graph?

446
00:27:03,980 --> 00:27:05,250
More people, OK.

447
00:27:05,250 --> 00:27:07,870
I think voting worked.

448
00:27:07,870 --> 00:27:10,830
They all exist, no,
that's not true.

449
00:27:10,830 --> 00:27:13,217
This one can exist and
this one can exist.

450
00:27:13,217 --> 00:27:15,050
I actually wrote the
wrong ones in my notes,

451
00:27:15,050 --> 00:27:19,020
so it's good to trick you,
no, it's I made a mistake.

452
00:27:19,020 --> 00:27:20,870
It's very easy to
get these mixed up

453
00:27:20,870 --> 00:27:24,360
and you can think
about why this is true,

454
00:27:24,360 --> 00:27:26,200
maybe I'll draw some
pictures to clarify.

455
00:27:30,080 --> 00:27:35,570
This is something, you remember
the-- there was a BFS diagram,

456
00:27:35,570 --> 00:27:38,460
I talked a little bit
about this last class.

457
00:27:38,460 --> 00:27:40,650
Tree edges better exist,
those are the things

458
00:27:40,650 --> 00:27:42,370
you use to visit new vertices.

459
00:27:42,370 --> 00:27:45,640
So that always happens,
undirected or otherwise.

460
00:27:45,640 --> 00:27:47,640
Forward edges, though,
a forward edge

461
00:27:47,640 --> 00:27:51,590
would be, OK, I visited
this, then I visited this.

462
00:27:51,590 --> 00:27:52,770
Those were tree edges.

463
00:27:55,370 --> 00:27:58,552
Then I backtrack and I
follow an edge like this.

464
00:27:58,552 --> 00:27:59,760
This would be a forward edge.

465
00:27:59,760 --> 00:28:03,470
And in a directed
graph that can happen.

466
00:28:03,470 --> 00:28:11,320
In an undirected graph,
it can also happen, right?

467
00:28:11,320 --> 00:28:12,540
Oh, no, it can't, it can't.

468
00:28:12,540 --> 00:28:14,530
OK.

469
00:28:14,530 --> 00:28:15,720
So confusing.

470
00:28:15,720 --> 00:28:17,970
In an undirected graph, if
you look at this,

471
00:28:17,970 --> 00:28:20,300
you start-- let's say this is s.

472
00:28:20,300 --> 00:28:24,000
You start here, and suppose
we follow this edge.

473
00:28:24,000 --> 00:28:27,180
We get to here, then we follow
this edge, we get to here.

474
00:28:27,180 --> 00:28:31,390
Then we will follow this
edge in the other direction,

475
00:28:31,390 --> 00:28:35,240
and that's guaranteed to
finish before we get back to s.

476
00:28:35,240 --> 00:28:36,970
So, in order to
be a forward edge,

477
00:28:36,970 --> 00:28:39,110
this one has to be
visited after this one,

478
00:28:39,110 --> 00:28:43,030
from s, but in this scenario,
if you follow this one first,

479
00:28:43,030 --> 00:28:44,530
you'll eventually
get to this vertex

480
00:28:44,530 --> 00:28:47,440
and then you will come back,
and then that will be classified

481
00:28:47,440 --> 00:28:49,670
as a backward edge in
an undirected graph.

482
00:28:49,670 --> 00:28:53,335
So you can never have forward
edges in an undirected graph.

483
00:29:00,900 --> 00:29:04,490
But I have a backward edge
here, that would suggest

484
00:29:04,490 --> 00:29:08,190
I can have backward edges
here, and no cross edges.

485
00:29:08,190 --> 00:29:14,410
Well, democracy did not work, I
was swayed by the popular vote.

486
00:29:14,410 --> 00:29:17,700
So I claim, apparently,
cross edges do not exist.

487
00:29:17,700 --> 00:29:18,660
Let's try to draw this.

488
00:29:18,660 --> 00:29:26,240
So a cross edge typical
scenario would be either here,

489
00:29:26,240 --> 00:29:29,900
you follow this
edge, you backtrack,

490
00:29:29,900 --> 00:29:31,950
you follow another
edge, and then

491
00:29:31,950 --> 00:29:34,670
you discover there was an
edge back to some other subtree

492
00:29:34,670 --> 00:29:36,020
that you've already visited.

493
00:29:36,020 --> 00:29:38,365
That can't happen in
an undirected graph.

494
00:29:38,365 --> 00:29:41,930
For the same reason, if
I follow this one first,

495
00:29:41,930 --> 00:29:46,240
and this edge exists undirected,
then I will go down that way.

496
00:29:46,240 --> 00:29:50,260
So it will actually be a tree
edge, not a cross edge.
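
The conclusion that an undirected graph has only tree edges and backward edges can be checked experimentally. The Python sketch below is an illustration under assumptions: the graph is a symmetric dict of adjacency lists with no parallel edges, and each undirected edge is classified the first time DFS looks at it. The function name and variables are hypothetical.

```python
def classify_undirected(adj):
    """Classify each undirected edge the first time DFS examines it.

    adj must be symmetric: v is in adj[u] exactly when u is in adj[v].
    """
    start, finish = {}, {}
    kinds = {}                      # frozenset({u, v}) -> edge type
    clock = 0

    def visit(u):
        nonlocal clock
        clock += 1
        start[u] = clock
        for v in adj[u]:
            edge = frozenset((u, v))
            if edge in kinds:       # already examined from the other endpoint
                continue
            if v not in start:
                kinds[edge] = "tree"
                visit(v)
            elif v not in finish:
                kinds[edge] = "back"
            elif start[u] < start[v]:
                kinds[edge] = "forward"
            else:
                kinds[edge] = "cross"
        clock += 1
        finish[u] = clock

    for s in adj:
        if s not in start:
            visit(s)
    return kinds
```

The forward and cross branches are never reached: if v had already finished, its visit would have examined the edge to u first, so the edge would already be classified.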

497
00:29:50,260 --> 00:29:51,670
OK, phew.

498
00:29:51,670 --> 00:29:56,494
That means my
notes were correct.

499
00:29:56,494 --> 00:29:57,910
I was surprised,
because they were

500
00:29:57,910 --> 00:30:04,355
copied from the textbook,
uncorrect my correction.

501
00:30:04,355 --> 00:30:04,855
Good.

502
00:30:10,080 --> 00:30:13,140
So what?

503
00:30:13,140 --> 00:30:15,930
Why do I care about these
edge classifications?

504
00:30:15,930 --> 00:30:21,970
I claim they're super handy for
two problems, cycle detection,

505
00:30:21,970 --> 00:30:24,140
which is a pretty
intuitive problem.

506
00:30:24,140 --> 00:30:26,760
Does my graph have any cycles?

507
00:30:26,760 --> 00:30:29,890
In the directed case, this
is particularly interesting.

508
00:30:29,890 --> 00:30:33,390
I want to know, does a graph
have any directed cycles?

509
00:30:33,390 --> 00:30:35,360
And another problem
called topological sort,

510
00:30:35,360 --> 00:30:36,390
which we will get to.

511
00:30:41,500 --> 00:30:45,360
So let's start with
cycle detection.

512
00:30:45,360 --> 00:30:48,870
This is actually a warmup
for topological sort.

513
00:30:52,760 --> 00:30:55,680
So does my graph
have any cycles?

514
00:30:55,680 --> 00:31:00,600
G has a cycle, I claim.

515
00:31:00,600 --> 00:31:10,660
This happens, if and only if, G
has a back edge, or let's say,

516
00:31:10,660 --> 00:31:13,940
a depth-first search of
that graph has a back edge.

517
00:31:17,250 --> 00:31:19,840
So it doesn't matter
where I start from

518
00:31:19,840 --> 00:31:22,944
or how this algorithm-- I run
this top level DFS algorithm,

519
00:31:22,944 --> 00:31:24,360
explore the whole
graph, because I

520
00:31:24,360 --> 00:31:26,970
want to know in the whole
graph is there a cycle?

521
00:31:26,970 --> 00:31:29,580
I claim, if there's a back
edge, then there's a cycle.

522
00:31:33,030 --> 00:31:35,729
So it all comes
down to back edges.

523
00:31:35,729 --> 00:31:38,020
This will work for both
directed and undirected graphs.

524
00:31:38,020 --> 00:31:41,070
Detecting cycles is pretty
easy in undirected graphs.

525
00:31:41,070 --> 00:31:43,370
It's a little more subtle
with directed graphs,

526
00:31:43,370 --> 00:31:46,750
because you have to worry
about the edge directions.

527
00:31:46,750 --> 00:31:49,610
So let's prove this.

528
00:31:49,610 --> 00:31:52,770
We haven't done a
serious proof in a while,

529
00:31:52,770 --> 00:31:57,110
so this is still a pretty easy
one, let's think about it.

530
00:31:57,110 --> 00:31:58,880
What do you think is
the easier direction

531
00:31:58,880 --> 00:32:02,780
to prove here, left or right?

532
00:32:02,780 --> 00:32:03,720
So, more democracy.

533
00:32:03,720 --> 00:32:07,292
How many people
think left is easy?

534
00:32:07,292 --> 00:32:08,360
A couple.

535
00:32:08,360 --> 00:32:10,240
How many people
think right is easy?

536
00:32:10,240 --> 00:32:12,410
A whole bunch more.

537
00:32:12,410 --> 00:32:14,890
I disagree with you.

538
00:32:14,890 --> 00:32:18,320
I guess it depends
what you consider easy.

539
00:32:18,320 --> 00:32:21,210
Let me show you
how easy left is.

540
00:32:21,210 --> 00:32:25,780
Left is, I have a back edge, I
want to claim there's a cycle.

541
00:32:25,780 --> 00:32:27,610
What does the back edge look like?

542
00:32:27,610 --> 00:32:34,050
Well, it's an edge to
an ancestor in the tree.

543
00:32:34,050 --> 00:32:35,796
If this node is a
descendant of this node

544
00:32:35,796 --> 00:32:39,920
and this node is an ancestor
of this node, that's

545
00:32:39,920 --> 00:32:42,860
saying there are
tree edges, there's

546
00:32:42,860 --> 00:32:45,820
a path, a tree path, that
connects one to the other.

547
00:32:49,340 --> 00:32:54,160
So these are tree
edges, because this

548
00:32:54,160 --> 00:32:57,859
is supposed to be an
ancestor, and this

549
00:32:57,859 --> 00:32:59,150
is supposed to be a descendant.

550
00:33:03,670 --> 00:33:08,770
And that's the definition
of a back edge.

551
00:33:08,770 --> 00:33:11,540
Do you see a cycle?

552
00:33:11,540 --> 00:33:12,820
I see a cycle.

553
00:33:12,820 --> 00:33:17,550
This is a cycle, directed cycle.

554
00:33:17,550 --> 00:33:21,970
So if there's a back edge, by
definition, it makes a cycle.

555
00:33:21,970 --> 00:33:24,290
Now, it's harder to say
if I have 10 back edges,

556
00:33:24,290 --> 00:33:25,400
how many cycles are there?

557
00:33:25,400 --> 00:33:26,560
Could be many.

558
00:33:26,560 --> 00:33:28,880
But if there's a
back edge, there's

559
00:33:28,880 --> 00:33:30,410
definitely at least one cycle.

560
00:33:34,082 --> 00:33:35,790
The other direction
is also not too hard,

561
00:33:35,790 --> 00:33:38,600
but I would hesitate
to call it easy.

562
00:33:38,600 --> 00:33:42,690
Any suggestions if, I
know there is a cycle,

563
00:33:42,690 --> 00:33:46,910
how do I prove that there's
a back edge somewhere?

564
00:33:46,910 --> 00:33:49,110
Think about that,
let me draw a cycle.

565
00:34:11,439 --> 00:34:12,480
There's a length k cycle.

566
00:34:16,214 --> 00:34:17,880
Where do you think,
which of these edges

567
00:34:17,880 --> 00:34:19,260
do you think is going
to be a back edge?

568
00:34:19,260 --> 00:34:20,835
Let's hope it's
one of these edges.

569
00:34:23,350 --> 00:34:24,190
Sorry?

570
00:34:24,190 --> 00:34:25,420
AUDIENCE: Vk to v zero.

571
00:34:25,420 --> 00:34:26,560
PROFESSOR: Vk to v zero.

572
00:34:26,560 --> 00:34:31,000
That's a good idea, maybe
this is a back edge.

573
00:34:31,000 --> 00:34:34,670
Of course, this is
symmetric, why that edge?

574
00:34:34,670 --> 00:34:36,780
I labeled it in
a suggestive way,

575
00:34:36,780 --> 00:34:39,389
but I need to say something
before I know actually which

576
00:34:39,389 --> 00:34:42,404
edge is going to
be the back edge.

577
00:34:42,404 --> 00:34:44,320
AUDIENCE: You have to
say you start at v zero?

578
00:34:44,320 --> 00:34:45,850
PROFESSOR: Start at v zero.

579
00:34:45,850 --> 00:34:48,460
If I started a search
at v zero, that

580
00:34:48,460 --> 00:34:49,839
looks good, because
the search is

581
00:34:49,839 --> 00:34:51,719
kind of going to go
in this direction.

582
00:34:51,719 --> 00:34:53,949
vk will maybe be the
last thing to be visited,

583
00:34:53,949 --> 00:34:55,480
that's not actually true.

584
00:34:55,480 --> 00:34:57,710
Could be there's an edge
directly from v zero to vk,

585
00:34:57,710 --> 00:35:00,700
but intuitively vk
will kind of be later,

586
00:35:00,700 --> 00:35:02,470
and then when this
edge gets visited,

587
00:35:02,470 --> 00:35:05,350
this will be an ancestor
and it will be a back edge.

588
00:35:05,350 --> 00:35:10,270
Of course, we may not
start a search here,

589
00:35:10,270 --> 00:35:12,240
so calling it the
start of the search

590
00:35:12,240 --> 00:35:16,079
is not quite right,
a little different.

591
00:35:16,079 --> 00:35:18,800
AUDIENCE: First vertex
that gets hit [INAUDIBLE].

592
00:35:18,800 --> 00:35:21,550
PROFESSOR: First vertex
that gets hit, good.

593
00:35:21,550 --> 00:35:24,820
I'm going to start the
numbering, v zero,

594
00:35:24,820 --> 00:35:38,460
let's assume v 0 is the
first vertex in the cycle,

595
00:35:38,460 --> 00:35:40,040
visited by the
depth-first search.

596
00:35:47,100 --> 00:35:54,060
Together, if you want some
pillows if you like them,

597
00:35:54,060 --> 00:35:56,640
especially convenient
that they're in front.

598
00:35:56,640 --> 00:35:59,130
So right, if it's
not v zero, say

599
00:35:59,130 --> 00:36:00,470
v3 was the first one visited.

600
00:36:00,470 --> 00:36:01,845
We will just change
the labeling,

601
00:36:01,845 --> 00:36:06,260
so that's v zero, that's
v1, that's v2, and so on.

602
00:36:06,260 --> 00:36:09,340
So set this labeling,
so that v0 is the first one,

603
00:36:09,340 --> 00:36:12,430
first vertex that gets visited.

604
00:36:12,430 --> 00:36:20,230
Then, I claim that-- let me
just write the claim first.

605
00:36:20,230 --> 00:36:23,610
This edge (vk, v0) will
be a back edge.

606
00:36:26,350 --> 00:36:29,252
We'll just say, is back edge.

607
00:36:29,252 --> 00:36:32,780
And I would say this is not
obvious, be a little careful.

608
00:36:50,420 --> 00:36:54,460
We have to somehow exploit
the depth-first nature of DFS,

609
00:36:54,460 --> 00:36:58,820
the fact that it goes deep-- it
goes as deep as it can before

610
00:36:58,820 --> 00:37:00,396
backtracking.

611
00:37:00,396 --> 00:37:02,820
If you think about
it, we're starting,

612
00:37:02,820 --> 00:37:05,690
at this point we are starting a
search relative to this cycle.

613
00:37:05,690 --> 00:37:08,550
No one has been visited,
except v zero just

614
00:37:08,550 --> 00:37:10,930
got visited, has a parent
pointer off somewhere else.

615
00:37:15,990 --> 00:37:16,880
What do we do next?

616
00:37:16,880 --> 00:37:19,309
Well, we visit all the
outgoing edges from v zero,

617
00:37:19,309 --> 00:37:20,850
there might be many
of them. It could

618
00:37:20,850 --> 00:37:23,480
be an edge from v zero to v1,
it could be an edge from v zero

619
00:37:23,480 --> 00:37:28,750
to v3, it could be an edge
from v zero to something else.

620
00:37:28,750 --> 00:37:31,980
We don't know which one's
going to happen first.

621
00:37:31,980 --> 00:37:39,760
But the one thing I
can claim is that v1

622
00:37:39,760 --> 00:37:46,610
will be visited before we
finish visiting v zero.

623
00:37:52,124 --> 00:37:53,790
From v zero, we might
go somewhere else,

624
00:37:53,790 --> 00:37:55,790
we might go somewhere
else that might eventually

625
00:37:55,790 --> 00:37:58,130
lead to v1 by some other
route, but in particular, we

626
00:37:58,130 --> 00:38:01,440
look at that edge
from v zero to v1.

627
00:38:01,440 --> 00:38:03,730
And so, at some point,
we're searching,

628
00:38:03,730 --> 00:38:06,580
we're visiting all the things
reachable from v zero, that

629
00:38:06,580 --> 00:38:09,830
includes v1, and
that will happen,

630
00:38:09,830 --> 00:38:11,950
we will touch v1
for the first time,

631
00:38:11,950 --> 00:38:13,800
because it hasn't
been touched yet.

632
00:38:13,800 --> 00:38:17,932
We will visit it before
we finish visiting v zero.

633
00:38:17,932 --> 00:38:21,660
The same goes actually for all
of the vi's, because they're all

634
00:38:21,660 --> 00:38:23,510
reachable from v zero.

635
00:38:23,510 --> 00:38:25,760
You can prove this by induction.

636
00:38:25,760 --> 00:38:29,860
You'll have to visit v1 before
you finish visiting v zero.

637
00:38:29,860 --> 00:38:32,480
You'll have to visit v2
before you finish visiting

638
00:38:32,480 --> 00:38:35,592
v1, although you might
actually visit v2 before v1.

639
00:38:35,592 --> 00:38:37,050
You would definitely
finish, you'll

640
00:38:37,050 --> 00:38:41,880
finish v2 before you
finish v1, and so on.

641
00:38:41,880 --> 00:38:47,424
So vi will be visited before
you finish vi minus 1,

642
00:38:47,424 --> 00:38:49,090
but in particular,
what we care about is

643
00:38:49,090 --> 00:38:58,760
that vk is visited
before we finish v zero.

644
00:39:02,040 --> 00:39:03,670
And it will be entirely visited.

645
00:39:03,670 --> 00:39:05,930
We will finish
visiting vk before we

646
00:39:05,930 --> 00:39:07,570
finish visiting v zero.

647
00:39:07,570 --> 00:39:10,280
We will start vk
after we start v zero,

648
00:39:10,280 --> 00:39:12,330
because v zero is first.

649
00:39:12,330 --> 00:39:16,580
So the order is going to
look like, start v zero,

650
00:39:16,580 --> 00:39:20,940
at some point we will start vk.

651
00:39:20,940 --> 00:39:27,950
Then we'll finish vk,
then we'll finish v zero.

652
00:39:27,950 --> 00:39:30,340
This is something the
textbook likes to call,

653
00:39:30,340 --> 00:39:33,200
and I like to call,
balanced parentheses.

654
00:39:33,200 --> 00:39:38,690
You can think of it as, we
start v zero, then we start vk,

655
00:39:38,690 --> 00:39:42,390
then we finish vk,
then we finish v zero.

656
00:39:42,390 --> 00:39:44,290
And these match up
and they're balanced.

657
00:39:46,970 --> 00:39:48,720
Depth-first search
always looks like that,

658
00:39:48,720 --> 00:39:50,630
because once you
start a vertex, you

659
00:39:50,630 --> 00:39:53,060
keep chugging until you visited
all the things reachable

660
00:39:53,060 --> 00:39:54,460
from it.

661
00:39:54,460 --> 00:39:55,500
Then you finish it.

662
00:39:55,500 --> 00:39:57,560
You won't finish v zero
before you finish vk,

663
00:39:57,560 --> 00:40:00,114
because it's part
of the recursion.

664
00:40:00,114 --> 00:40:01,530
You can't return
at a higher level

665
00:40:01,530 --> 00:40:04,942
before you return
at the lower levels.
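
The balanced-parentheses picture can be checked in a few lines of Python. This is a sketch under assumptions: the graph is a dict of adjacency lists, and `dfs_times` and `properly_nested` are names invented here. It records start and finish times with a counter and then verifies that no two intervals partially overlap.

```python
def dfs_times(adj):
    """Run DFS over the whole graph; return (start, finish) time dicts."""
    start, finish = {}, {}
    clock = 0

    def visit(u):
        nonlocal clock
        clock += 1
        start[u] = clock               # open parenthesis for u
        for v in adj[u]:
            if v not in start:
                visit(v)
        clock += 1
        finish[u] = clock              # close parenthesis for u

    for s in adj:
        if s not in start:
            visit(s)
    return start, finish


def properly_nested(start, finish):
    """True if every pair of [start, finish] intervals is nested or disjoint."""
    for u in start:
        for v in start:
            # A partial overlap would look like: ( u ... ( v ... ) u ... ) v
            if start[u] < start[v] < finish[u] < finish[v]:
                return False
    return True
```

On a directed cycle v0, v1, ..., vk visited from v0, the recorded times show start v0, start vk, finish vk, finish v0, matching the order argued above.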

666
00:40:04,942 --> 00:40:06,400
So we've just argued
that the order

667
00:40:06,400 --> 00:40:08,025
is like this, because
v zero was first,

668
00:40:08,025 --> 00:40:11,600
so vk starts after v zero, and
also we're going to finish vk

669
00:40:11,600 --> 00:40:14,550
before we finish v zero, because
it's reachable, and hasn't

670
00:40:14,550 --> 00:40:17,000
been visited before.

671
00:40:17,000 --> 00:40:25,200
So, in here, we
consider vkv zero.

672
00:40:28,000 --> 00:40:32,070
When we consider that edge,
it will be a back edge.

673
00:40:34,750 --> 00:40:35,710
Why?

674
00:40:35,710 --> 00:40:39,640
Because v zero is currently
on the recursion stack,

675
00:40:39,640 --> 00:40:42,427
and so you will have marked v
zero as currently in process.

676
00:40:42,427 --> 00:40:44,760
So when you look at that edge,
you see it's a back edge,

677
00:40:44,760 --> 00:40:47,660
it's an edge to your ancestor.

678
00:40:47,660 --> 00:40:48,430
That's the proof.

679
00:40:51,700 --> 00:40:52,790
Any questions about that?

680
00:40:55,490 --> 00:40:59,460
It's pretty easy once you set
up the starting point, which

681
00:40:59,460 --> 00:41:01,470
is look at the first
time you visit the cycle,

682
00:41:01,470 --> 00:41:03,732
then just think about how
you walk around the cycle.

683
00:41:03,732 --> 00:41:05,940
There's lots of ways you
might walk around the cycle,

684
00:41:05,940 --> 00:41:08,579
but it's guaranteed you'll
visit vk at some point,

685
00:41:08,579 --> 00:41:10,870
then you'll look at the edge.
v0 is still in the stack,

686
00:41:10,870 --> 00:41:12,730
so it's a back edge.

687
00:41:12,730 --> 00:41:14,575
And so this proves
that having a cycle

688
00:41:14,575 --> 00:41:16,260
is equivalent to
having a back edge.

689
00:41:16,260 --> 00:41:18,980
This gives you an easy linear
time algorithm to tell,

690
00:41:18,980 --> 00:41:20,902
does my graph have a cycle?

691
00:41:20,902 --> 00:41:22,860
And if it does, it's
actually easy to find one,

692
00:41:22,860 --> 00:41:26,102
because we find a back edge,
just follow the tree edges,

693
00:41:26,102 --> 00:41:27,060
and you get your cycle.
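
Here is one possible Python sketch of "find a back edge, then follow the tree edges". It assumes a directed graph stored as a dict of adjacency lists; `find_cycle` and its variables are hypothetical names, not code from the lecture.

```python
def find_cycle(adj):
    """Return one directed cycle as a list of vertices, or None if acyclic."""
    parent = {}          # vertex -> parent in the DFS tree
    in_process = set()   # vertices currently on the recursion stack

    def visit(u):
        in_process.add(u)
        for v in adj[u]:
            if v not in parent:
                parent[v] = u
                cycle = visit(v)
                if cycle is not None:
                    return cycle
            elif v in in_process:          # back edge (u, v)
                # Follow parent pointers from u up to its ancestor v.
                path = [u]
                while path[-1] != v:
                    path.append(parent[path[-1]])
                path.reverse()             # v, ..., u; the edge (u, v) closes it
                return path
        in_process.remove(u)
        return None

    for s in adj:
        if s not in parent:
            parent[s] = None
            cycle = visit(s)
            if cycle is not None:
                return cycle
    return None
```

Each vertex and edge is handled a constant number of times, so the search stays linear in the size of the graph; walking the parent pointers adds at most the length of the cycle.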

694
00:41:29,564 --> 00:41:31,230
So if someone gives
you a graph and says,

695
00:41:31,230 --> 00:41:34,350
hey, I think this is acyclic,
you can very quickly say,

696
00:41:34,350 --> 00:41:36,590
no, it's not, here's
a cycle, or say,

697
00:41:36,590 --> 00:41:40,490
yeah, I agree, no back edges,
I only have tree, forward,

698
00:41:40,490 --> 00:41:41,611
and cross edges.

699
00:41:49,150 --> 00:41:50,545
OK, that was application 1.

700
00:41:56,610 --> 00:41:58,990
Application 2 is
topological sort,

701
00:41:58,990 --> 00:42:02,790
which we're going to
think about in the setting

702
00:42:02,790 --> 00:42:04,320
of a problem called
job scheduling.

703
00:42:07,700 --> 00:42:14,860
So job scheduling, we are
given a directed acyclic graph.

704
00:42:21,770 --> 00:42:39,090
I want to order the vertices
so that all edges point

705
00:42:39,090 --> 00:42:46,090
from lower order to higher order.

706
00:42:52,520 --> 00:42:54,405
Directed acyclic
graph is called a DAG,

707
00:42:54,405 --> 00:42:59,830
you should know that from 042.

708
00:42:59,830 --> 00:43:02,790
And maybe I'll
draw one for kicks.

709
00:43:32,030 --> 00:43:34,760
Now, I've drawn the graph so
all the edges go left to right,

710
00:43:34,760 --> 00:43:37,110
so you can see that
there's no cycles here,

711
00:43:37,110 --> 00:43:41,090
but generally you'd run DFS and
you'd detect there's no cycles.

712
00:43:41,090 --> 00:43:43,170
And now, imagine these
vertices represent

713
00:43:43,170 --> 00:43:45,746
things you need to do.

714
00:43:45,746 --> 00:43:49,080
The textbook has a funny example
where you're getting dressed,

715
00:43:49,080 --> 00:43:50,820
so you have these
constraints that say,

716
00:43:50,820 --> 00:43:53,579
well, I've got to put my socks
on before I put my shoes on.

717
00:43:53,579 --> 00:43:55,620
And then I've got to put
my underwear on before I

718
00:43:55,620 --> 00:43:59,350
put my pants on, and all
these kinds of things.

719
00:43:59,350 --> 00:44:01,460
You would code that as a
directed acyclic graph.

720
00:44:01,460 --> 00:44:03,293
You hope there's no
cycles, because then you

721
00:44:03,293 --> 00:44:05,100
can't get dressed.

722
00:44:05,100 --> 00:44:06,830
And there's some
things, like, well, I

723
00:44:06,830 --> 00:44:09,050
could put my glasses on
whenever, although actually I

724
00:44:09,050 --> 00:44:11,174
should put my glasses on
before I do anything else,

725
00:44:11,174 --> 00:44:12,730
otherwise there's problems.

726
00:44:12,730 --> 00:44:14,980
I don't know, you could put
your watch on at any time,

727
00:44:14,980 --> 00:44:17,110
unless you need to
know what time it is.

728
00:44:17,110 --> 00:44:20,287
So there's some disconnected
parts, whatever.

729
00:44:20,287 --> 00:44:21,870
There's some unrelated
things, like, I

730
00:44:21,870 --> 00:44:24,955
don't care the order between
my shirt and my pants

731
00:44:24,955 --> 00:44:28,780
or whatever, some things
aren't constrained.

732
00:44:28,780 --> 00:44:31,760
What you'd like to do is choose
an actual order to do things.

733
00:44:31,760 --> 00:44:33,275
Say you're a
sequential being, you

734
00:44:33,275 --> 00:44:35,630
can only do one
thing at a time, so I

735
00:44:35,630 --> 00:44:37,050
want to compute a total order.

736
00:44:37,050 --> 00:44:39,510
First I'll do g,
then I'll do a, then

737
00:44:39,510 --> 00:44:42,900
I can do h, because I've done
both of the predecessors.

738
00:44:42,900 --> 00:44:45,160
Then I can't do b,
because I haven't done d,

739
00:44:45,160 --> 00:44:49,040
so maybe I'll do d first, and
then b, and then e, then c,

740
00:44:49,040 --> 00:44:50,090
then f, then i.

741
00:44:50,090 --> 00:44:53,180
That would be a valid order,
because all edges point

742
00:44:53,180 --> 00:44:55,580
from an earlier number
to a later number.
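
The condition "all edges point from an earlier number to a later number" is easy to test mechanically. The Python sketch below is illustrative only: the graph drawn on the board is not recoverable from the transcript, so any graph passed in is the caller's own, and `is_valid_order` is a hypothetical name. It assumes every vertex appears as a key of the adjacency dict.

```python
def is_valid_order(adj, order):
    """True if `order` lists every vertex once and every edge points forward."""
    position = {v: i for i, v in enumerate(order)}
    # Reject repeated vertices and orders that miss or invent vertices.
    if len(position) != len(order) or set(position) != set(adj):
        return False
    # Every edge (u, v) must go from an earlier position to a later one.
    return all(position[u] < position[v] for u in adj for v in adj[u])
```

The check takes time linear in the number of vertices plus edges.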

743
00:44:55,580 --> 00:44:56,930
So that's the goal.

744
00:44:56,930 --> 00:44:59,300
And these are real job
scheduling problems

745
00:44:59,300 --> 00:45:01,670
that come up, you'll
see more applications

746
00:45:01,670 --> 00:45:04,710
in your problem set.

747
00:45:04,710 --> 00:45:07,199
How do we do this?

748
00:45:07,199 --> 00:45:08,990
Well, at this point we
have two algorithms,

749
00:45:08,990 --> 00:45:10,880
and I pretty much
revealed it is DFS.

750
00:45:10,880 --> 00:45:13,100
DFS will do this.

751
00:45:13,100 --> 00:45:16,650
It's a topological sort, is
what this algorithm is usually

752
00:45:16,650 --> 00:45:17,150
called.

753
00:45:20,010 --> 00:45:23,280
Topological sort because
you're given a graph, which

754
00:45:23,280 --> 00:45:25,070
you could think
of as a topology.

755
00:45:25,070 --> 00:45:26,912
You want to sort it,
in a certain sense.

756
00:45:26,912 --> 00:45:28,370
It's not like
sorting numbers, it's

757
00:45:28,370 --> 00:45:32,370
sorting vertices in a graph,
so, hence, topological sort.

758
00:45:32,370 --> 00:45:34,150
That's the name
of the algorithm.

759
00:45:34,150 --> 00:45:46,250
And it's run DFS, and
output the reverse

760
00:45:46,250 --> 00:45:55,192
of the finishing
times of vertices.

761
00:45:55,192 --> 00:45:57,150
So this is another
application where you really

762
00:45:57,150 --> 00:45:58,983
want to visit all the
vertices in the graph,

763
00:45:58,983 --> 00:46:05,100
so we use this top level DFS,
so everybody gets visited.

764
00:46:05,100 --> 00:46:07,350
And there are these
finishing times,

765
00:46:07,350 --> 00:46:11,470
so every time I finish a vertex,
I could add it to a list.

766
00:46:11,470 --> 00:46:13,294
Say OK, that one
was finished next,

767
00:46:13,294 --> 00:46:15,460
then this one is finished,
then this one's finished.

768
00:46:15,460 --> 00:46:18,320
I take that order
and I reverse it.

769
00:46:18,320 --> 00:46:21,588
That will be a
topological order.
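
As a rough Python sketch of this recipe (run the top-level DFS, append each vertex to a list when it finishes, reverse the list), assuming the input is a DAG given as a dict of adjacency lists; the function name and the getting-dressed data in the usage note are made up for illustration.

```python
def topological_sort(adj):
    """Return the vertices of a DAG in reverse order of DFS finishing time."""
    visited = set()
    finished = []                   # vertices in the order they finish

    def visit(u):
        visited.add(u)
        for v in adj[u]:
            if v not in visited:
                visit(v)
        finished.append(u)          # u finishes after everything reachable from it

    for s in adj:                   # top-level loop so every vertex gets visited
        if s not in visited:
            visit(s)
    finished.reverse()
    return finished
```

For example, with `{"socks": ["shoes"], "underwear": ["pants"], "pants": ["shoes"], "shoes": [], "watch": []}` the result puts socks and pants before shoes and underwear before pants. The sketch does not check for cycles; on a graph with a cycle it still returns a list, but that list is not a valid schedule. Very deep graphs may also hit Python's default recursion limit.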

770
00:46:21,588 --> 00:46:22,900
Why?

771
00:46:22,900 --> 00:46:24,190
Who knows.

772
00:46:24,190 --> 00:46:24,880
Let's prove it.

773
00:46:34,440 --> 00:46:38,610
We've actually done pretty
much the hard work, which

774
00:46:38,610 --> 00:46:42,560
is to say-- we're assuming
our graph has no cycles,

775
00:46:42,560 --> 00:46:46,150
so that tells us by
this cycle detection

776
00:46:46,150 --> 00:46:47,410
that there are no back edges.

777
00:46:47,410 --> 00:46:49,780
Back edges are kind
of the annoying part.

778
00:46:49,780 --> 00:46:51,500
Now they don't exist here.

779
00:46:51,500 --> 00:46:56,970
So all the edges are tree edges,
forward edges, and cross edges,

780
00:46:56,970 --> 00:47:01,765
and we use that to
prove the theorem.

781
00:47:05,020 --> 00:47:10,570
So we want to prove that all
the edges point from an earlier

782
00:47:10,570 --> 00:47:12,170
number to a later number.

783
00:47:15,320 --> 00:47:17,080
So what that means
is for an edge,

784
00:47:17,080 --> 00:47:22,830
uv, we want to show that
v finishes before u.

785
00:47:32,010 --> 00:47:34,750
That's the reverse,
because what we're taking

786
00:47:34,750 --> 00:47:38,610
is the reverse of
the finishing order.

787
00:47:38,610 --> 00:47:41,790
So edge uv, I want to make
sure v finishes first,

788
00:47:41,790 --> 00:47:43,595
so that u will be ordered first.

789
00:47:45,917 --> 00:47:47,000
Well, there are two cases.

790
00:47:51,290 --> 00:47:59,010
Case 1 is that u
starts before v. Case 2

791
00:47:59,010 --> 00:48:01,460
is that v starts before u.

792
00:48:06,690 --> 00:48:08,220
At some point they
start, because we

793
00:48:08,220 --> 00:48:09,136
visit the whole graph.

794
00:48:13,160 --> 00:48:16,400
This top loop guarantees that.

795
00:48:16,400 --> 00:48:21,440
So consider what order we visit
them first, at the beginning,

796
00:48:21,440 --> 00:48:23,960
and then we'll think
about how they finish.

797
00:48:23,960 --> 00:48:27,400
Well, this case is kind of
something we've seen before.

798
00:48:27,400 --> 00:48:31,480
We visit u, we have
not yet visited v,

799
00:48:31,480 --> 00:48:35,440
but v is reachable from
u, so maybe via this edge,

800
00:48:35,440 --> 00:48:38,320
or maybe via some other
path, we will eventually

801
00:48:38,320 --> 00:48:41,190
visit v in the recursion for u.

802
00:48:41,190 --> 00:48:48,950
So before u finishes,
we will visit v, visit v

803
00:48:48,950 --> 00:48:53,070
before u finishes.

804
00:48:53,070 --> 00:48:58,560
That sentence is just
like this sentence,

805
00:48:58,560 --> 00:48:59,849
so same kind of argument.

806
00:48:59,849 --> 00:49:01,640
We won't go into detail,
because we already

807
00:49:01,640 --> 00:49:04,470
did that several times.

808
00:49:04,470 --> 00:49:07,710
So that means we'll visit v,
we will completely visit v,

809
00:49:07,710 --> 00:49:10,040
we will finish v
before we finish u

810
00:49:10,040 --> 00:49:12,100
and that's what we
wanted to prove.

811
00:49:12,100 --> 00:49:14,580
So that case is good.

812
00:49:14,580 --> 00:49:18,820
The other case is
that v starts before u.

813
00:49:18,820 --> 00:49:21,764
Here, you might get
slightly worried.

814
00:49:21,764 --> 00:49:24,810
So we have an edge, uv,
still, same direction.

815
00:49:24,810 --> 00:49:29,930
But now we start at v, u
has not yet been visited.

816
00:49:29,930 --> 00:49:35,646
Well, now we worry
that we visit u.

817
00:49:35,646 --> 00:49:38,510
If we visit u, we're going to
finish u before we finish v,

818
00:49:38,510 --> 00:49:40,640
but we want it to be
the other way around.

819
00:49:40,640 --> 00:49:43,096
Why can't that happen?

820
00:49:43,096 --> 00:49:44,013
AUDIENCE: [INAUDIBLE].

821
00:49:44,013 --> 00:49:46,262
PROFESSOR: Because there's
a back edge somewhere here.

822
00:49:46,262 --> 00:49:48,610
In particular, the graph
would have to be cyclic.

823
00:49:48,610 --> 00:49:54,830
This is a cycle, so this
can't happen, a contradiction.

824
00:49:54,830 --> 00:50:00,350
So v will finish before
we visit u at all.

825
00:50:04,690 --> 00:50:07,830
So v will still finish first,
because we don't even touch u,

826
00:50:07,830 --> 00:50:10,080
because there's no cycles.

827
00:50:10,080 --> 00:50:13,280
So that's actually the proof
that topological sort gives you

828
00:50:13,280 --> 00:50:18,195
a valid job schedule,
and it's kind of-- there

829
00:50:18,195 --> 00:50:21,200
are even more things
you can do with DFS.

830
00:50:21,200 --> 00:50:24,520
We'll see some in recitations,
more in the textbook.

831
00:50:24,520 --> 00:50:28,280
But simple algorithm, can do
a lot of nifty things with it,

832
00:50:28,280 --> 00:50:30,930
very fast, linear time.