1
00:00:17,470 --> 00:00:20,000
MICHALE FEE: OK, let's
go ahead and get started.

2
00:00:20,000 --> 00:00:22,630
All right, so today, we're
going to continue talking about

3
00:00:22,630 --> 00:00:26,350
feed-forward neural networks,
and we're going to keep working

4
00:00:26,350 --> 00:00:31,180
on some interesting
aspects of linear algebra--

5
00:00:31,180 --> 00:00:32,950
matrix transformations.

6
00:00:32,950 --> 00:00:37,990
We're going to introduce a
new idea from linear algebra,

7
00:00:37,990 --> 00:00:40,240
the idea of basis sets.

8
00:00:40,240 --> 00:00:42,700
We're going to describe some
interesting and important

9
00:00:42,700 --> 00:00:46,390
properties of basis sets,
such as linear independence.

10
00:00:46,390 --> 00:00:49,480
And then we're going to end with
just a very simple formulation

11
00:00:49,480 --> 00:00:55,090
of how to change between
different basis sets.

12
00:00:55,090 --> 00:00:58,210
So let me explain
a little bit more,

13
00:00:58,210 --> 00:01:03,850
motivate a little bit more
why we're doing these things.

14
00:01:03,850 --> 00:01:08,770
So as people, as animals,
looking out at the world,

15
00:01:08,770 --> 00:01:11,740
we are looking at
high-dimensional data.

16
00:01:11,740 --> 00:01:16,330
We have hundreds of millions of
photoreceptors in our retina.

17
00:01:16,330 --> 00:01:22,660
Those data get compressed
down into about a million

18
00:01:22,660 --> 00:01:26,050
nerve fibers that go through
our optic nerve up to our brain.

19
00:01:26,050 --> 00:01:28,120
So it's a very
high-dimensional data set.

20
00:01:28,120 --> 00:01:31,030
And then our brain unpacks
that data and tries

21
00:01:31,030 --> 00:01:32,020
to make sense of it.

22
00:01:32,020 --> 00:01:34,720
And it does that by
passing that data

23
00:01:34,720 --> 00:01:36,970
through layers of
neural circuits

24
00:01:36,970 --> 00:01:38,500
that make transformations.

25
00:01:38,500 --> 00:01:42,790
And we've talked about how in
going from one layer of neurons

26
00:01:42,790 --> 00:01:44,440
to another layer
of neurons, there's

27
00:01:44,440 --> 00:01:47,080
a feed-forward projection
that essentially

28
00:01:47,080 --> 00:01:50,507
does what looks like a
matrix multiplication, OK?

29
00:01:50,507 --> 00:01:52,090
So that's one of the
reasons why we're

30
00:01:52,090 --> 00:01:55,940
trying to understand what
matrix multiplications do.

31
00:01:55,940 --> 00:01:58,930
Now, we talked about some of
the matrix transformations

32
00:01:58,930 --> 00:02:01,240
that you can see when you
do a matrix multiplication.

33
00:02:01,240 --> 00:02:04,630
And one of those was a rotation.

34
00:02:04,630 --> 00:02:07,690
Matrix multiplications
can implement rotations.

35
00:02:07,690 --> 00:02:11,980
And rotations are very
important for visualizing

36
00:02:11,980 --> 00:02:13,160
high-dimensional data.

37
00:02:13,160 --> 00:02:17,740
So this is from a website
at Google research,

38
00:02:17,740 --> 00:02:21,640
where they've implemented
different viewers

39
00:02:21,640 --> 00:02:24,790
for high-dimensional data, ways
of taking high-dimensional data

40
00:02:24,790 --> 00:02:29,590
and reducing the dimensionality
and then visualizing

41
00:02:29,590 --> 00:02:31,000
what that data looks like.

42
00:02:31,000 --> 00:02:33,100
And one of the
most important ways

43
00:02:33,100 --> 00:02:36,310
that you visualize
high-dimensional data

44
00:02:36,310 --> 00:02:39,945
is by rotating it and looking
at it from different angles.

45
00:02:39,945 --> 00:02:41,320
And what you're
doing when you do

46
00:02:41,320 --> 00:02:43,440
that is you take this
high-dimensional data,

47
00:02:43,440 --> 00:02:45,580
you rotate it,
and you project it

48
00:02:45,580 --> 00:02:49,190
into a plane, which is what
you're seeing on the screen.

49
00:02:49,190 --> 00:02:52,630
And you can see that
you get a lot out

50
00:02:52,630 --> 00:02:56,590
of looking at
different projections

51
00:02:56,590 --> 00:02:58,780
and different
rotations of data sets.

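A minimal Python sketch of this rotate-and-project operation. The 3D points, the choice of rotation axis, and the angle are all made-up examples for illustration, not anything from the lecture:

```python
import math

# A hypothetical 3D "data set": a few points in space.
points = [(1.0, 0.0, 2.0), (0.0, 1.0, -1.0), (1.0, 1.0, 0.5)]

def rotate_z(p, phi):
    """Rotate a 3D point by angle phi about the z-axis."""
    x, y, z = p
    return (x * math.cos(phi) - y * math.sin(phi),
            x * math.sin(phi) + y * math.cos(phi),
            z)

def project_xy(p):
    """Project a 3D point into the xy-plane (what you see on screen)."""
    return (p[0], p[1])

phi = math.pi / 2  # view the cloud from a 90-degree rotated angle
view = [project_xy(rotate_z(p, phi)) for p in points]
```

Sweeping phi over different angles gives the different 2D views of the same cloud, which is what the interactive viewer is doing.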
52
00:02:58,780 --> 00:03:01,750
Also, when you're
zooming in on the data,

53
00:03:01,750 --> 00:03:04,270
that's another matrix
transformation.

54
00:03:04,270 --> 00:03:07,570
You can stretch
and compress and do

55
00:03:07,570 --> 00:03:09,910
all sorts of different
things to data.

56
00:03:09,910 --> 00:03:13,810
Now, one of the cool
things is that when

57
00:03:13,810 --> 00:03:17,800
we study the brain
to try to figure out

58
00:03:17,800 --> 00:03:24,730
how it does this really cool
process of rotating data

59
00:03:24,730 --> 00:03:26,980
through its
transformations that are

60
00:03:26,980 --> 00:03:30,370
produced by neural networks,
we record from lots of neurons.

61
00:03:30,370 --> 00:03:32,230
There's technology
now where you can

62
00:03:32,230 --> 00:03:35,800
image from thousands, or even
tens of thousands, of neurons

63
00:03:35,800 --> 00:03:37,120
simultaneously.

64
00:03:37,120 --> 00:03:40,000
And again, it's this really
high-dimensional data

65
00:03:40,000 --> 00:03:42,040
set that we're looking
at to try to figure out

66
00:03:42,040 --> 00:03:43,570
how the brain works.

67
00:03:43,570 --> 00:03:46,330
And so in order to
analyze those data,

68
00:03:46,330 --> 00:03:50,140
we try to build programs
or machines that

69
00:03:50,140 --> 00:03:54,280
act like the brain in order
to understand the data that we

70
00:03:54,280 --> 00:03:56,260
collect from the brain.

71
00:03:56,260 --> 00:03:59,350
It's really cool.

72
00:03:59,350 --> 00:04:00,560
So it's kind of fun.

73
00:04:00,560 --> 00:04:04,030
As neuroscientists, we're
trying to build a brain

74
00:04:04,030 --> 00:04:08,160
to analyze the data that
we collect from the brain.

75
00:04:08,160 --> 00:04:12,100
All right, so the cool thing
is that the math that we're

76
00:04:12,100 --> 00:04:16,060
looking at right now and
the kinds of neural networks

77
00:04:16,060 --> 00:04:18,190
that we're looking at
right now are exactly

78
00:04:18,190 --> 00:04:19,959
the kinds of math
and neural networks

79
00:04:19,959 --> 00:04:24,790
that you use to
explain the brain

80
00:04:24,790 --> 00:04:30,220
and to look at data in very
powerful ways, all right?

81
00:04:30,220 --> 00:04:32,230
So that's what
we're trying to do.

82
00:04:32,230 --> 00:04:35,800
So let's start by coming back
to our two-layer feed-forward

83
00:04:35,800 --> 00:04:38,410
network and looking
in a little bit more

84
00:04:38,410 --> 00:04:39,790
detail about what it does.

85
00:04:39,790 --> 00:04:42,850
OK, so I introduced the idea,
this two-layer feed-forward

86
00:04:42,850 --> 00:04:43,630
network.

87
00:04:43,630 --> 00:04:46,540
We have an input layer that
has a vector of firing rates,

88
00:04:46,540 --> 00:04:49,600
a firing rate that describes
each of those input neurons,

89
00:04:49,600 --> 00:04:51,040
a vector of firing rates.

90
00:04:51,040 --> 00:04:52,690
And the output layer, again, has
a list of numbers

91
00:04:52,690 --> 00:04:55,840
that describes the firing rate
of each neuron in the output

92
00:04:55,840 --> 00:04:57,930
layer.

93
00:04:57,930 --> 00:05:00,030
And the connections
between these two layers

94
00:05:00,030 --> 00:05:03,870
are a bunch of synapses,
synaptic weights,

95
00:05:03,870 --> 00:05:08,310
that we can use
to transform the firing

96
00:05:08,310 --> 00:05:11,010
rates at the input layer into
the firing rates at the output

97
00:05:11,010 --> 00:05:11,970
layer.

98
00:05:11,970 --> 00:05:15,330
So let's look in a little
bit more detail now

99
00:05:15,330 --> 00:05:20,460
at what that collection
of weights looks like.

100
00:05:20,460 --> 00:05:22,820
So we describe it as a matrix.

101
00:05:22,820 --> 00:05:24,850
That's called the weight matrix.

102
00:05:24,850 --> 00:05:27,750
The matrix has in it a
number for the weight

103
00:05:27,750 --> 00:05:31,860
from each of the input neurons
to each of the output neurons.

104
00:05:31,860 --> 00:05:35,790
The rows are a vector
of weights onto each

105
00:05:35,790 --> 00:05:36,870
of the output neurons.

106
00:05:36,870 --> 00:05:39,660
And we'll see in
a couple of slides

107
00:05:39,660 --> 00:05:45,150
that the columns are the set of
weights from each input neuron

108
00:05:45,150 --> 00:05:47,790
to all the output neurons.

109
00:05:47,790 --> 00:05:51,990
A row of this weight matrix
is a vector of weights

110
00:05:51,990 --> 00:05:54,480
onto one of the output neurons.

111
00:05:57,460 --> 00:06:01,630
All right, so we can
compute the firing rates

112
00:06:01,630 --> 00:06:04,630
of the neurons in
our output layer

113
00:06:04,630 --> 00:06:08,590
for the case of linear
neurons in the output layer

114
00:06:08,590 --> 00:06:11,890
simply as a matrix
product of this weight

115
00:06:11,890 --> 00:06:15,550
vector times the vector
of input firing rates.

116
00:06:15,550 --> 00:06:18,640
And that matrix
multiplication gives us

117
00:06:18,640 --> 00:06:21,400
a vector that describes the
firing rates of the output

118
00:06:21,400 --> 00:06:22,460
layer.

119
00:06:22,460 --> 00:06:24,650
So let me just go through
what that looks like.

120
00:06:24,650 --> 00:06:28,780
If we define a column vector
of firing rates of each

121
00:06:28,780 --> 00:06:31,510
of the output neurons,
we can write that

122
00:06:31,510 --> 00:06:36,400
as the weight matrix times the
column vector of the firing

123
00:06:36,400 --> 00:06:38,740
rates of the input layer.

124
00:06:38,740 --> 00:06:42,550
We can calculate the firing
rate of the first neuron

125
00:06:42,550 --> 00:06:45,070
in the output layer
as the dot product

126
00:06:45,070 --> 00:06:48,790
of that row of the weight
matrix with that vector

127
00:06:48,790 --> 00:06:50,410
of firing rates, OK?

128
00:06:50,410 --> 00:06:54,460
And that gives us the
firing rate. v1 is then

129
00:06:54,460 --> 00:06:58,570
w(a = 1) dot u.

130
00:06:58,570 --> 00:07:02,380
That is one particular
way of thinking

131
00:07:02,380 --> 00:07:07,220
about how you're calculating
the firing rates in the output

132
00:07:07,220 --> 00:07:07,720
layer.

133
00:07:07,720 --> 00:07:10,600
And it's called the dot
product interpretation

134
00:07:10,600 --> 00:07:12,880
of matrix multiplication,
all right?

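A minimal Python sketch of this dot-product interpretation; the weight matrix and input rates here are made-up numbers, not values from the lecture:

```python
# Dot-product view of v = W u: each output rate v_a is the dot product
# of row a of the weight matrix with the input rate vector u.
W = [[1.0, 0.5, 0.0],   # weights onto output neuron 1
     [0.0, 1.0, 0.5],   # weights onto output neuron 2
     [0.5, 0.0, 1.0]]   # weights onto output neuron 3
u = [2.0, 1.0, 0.0]     # input-layer firing rates

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

# v_a = w_a . u, computed one output neuron at a time
v = [dot(row, u) for row in W]
```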
135
00:07:12,880 --> 00:07:17,140
Now, there's a different
sort of complementary way

136
00:07:17,140 --> 00:07:18,850
of thinking about
what happens when

137
00:07:18,850 --> 00:07:21,970
you do this matrix
product that's also

138
00:07:21,970 --> 00:07:23,650
important to
understand, because it's

139
00:07:23,650 --> 00:07:27,910
a different way of thinking
about what's going on.

140
00:07:27,910 --> 00:07:32,600
We can also think about the
columns of this weight matrix.

141
00:07:32,600 --> 00:07:35,200
And we can think about
the weight matrix

142
00:07:35,200 --> 00:07:39,130
as a collection
of column vectors

143
00:07:39,130 --> 00:07:43,270
that we put together
into matrix form.

144
00:07:43,270 --> 00:07:45,930
So in this particular
network here,

145
00:07:45,930 --> 00:07:48,600
we can write down this
weight matrix, all right?

146
00:07:48,600 --> 00:07:51,510
And you can see that
this first input

147
00:07:51,510 --> 00:07:57,180
neuron connects to output neuron
one, so there's a one there.

148
00:07:57,180 --> 00:08:00,090
The first input neuron
connects to output neuron two,

149
00:08:00,090 --> 00:08:01,500
so there's a one there.

150
00:08:01,500 --> 00:08:05,070
The first input neuron does not
connect to output neuron three,

151
00:08:05,070 --> 00:08:08,870
so there's a zero there, OK?

152
00:08:08,870 --> 00:08:09,480
All right.

153
00:08:09,480 --> 00:08:12,360
So the columns of
the weight matrix

154
00:08:12,360 --> 00:08:16,800
represent the pattern
of projections from one

155
00:08:16,800 --> 00:08:20,280
of the input neurons to
all of the output neurons.

156
00:08:23,570 --> 00:08:27,200
All right, so let's
just take a look at what

157
00:08:27,200 --> 00:08:30,800
would happen if only one of
our input neurons was active

158
00:08:30,800 --> 00:08:33,650
and all the others were silent.

159
00:08:33,650 --> 00:08:35,110
So this neuron is active.

160
00:08:35,110 --> 00:08:39,716
What would the output
vector look like?

161
00:08:39,716 --> 00:08:41,299
What would the pattern
of firing rates

162
00:08:41,299 --> 00:08:46,630
look like for the output
neurons in this case?

163
00:08:46,630 --> 00:08:47,130
Anybody?

164
00:08:47,130 --> 00:08:48,290
It's straightforward.

165
00:08:48,290 --> 00:08:49,373
It's not a trick question.

166
00:08:53,110 --> 00:08:53,885
[INAUDIBLE]?

167
00:09:00,535 --> 00:09:01,428
AUDIENCE: So--

168
00:09:01,428 --> 00:09:03,720
MICHALE FEE: If this neuron
is firing and these weights

169
00:09:03,720 --> 00:09:04,645
are all one or zero.

170
00:09:07,293 --> 00:09:09,482
AUDIENCE: The one neuron, a--

171
00:09:09,482 --> 00:09:10,190
MICHALE FEE: Yes?

172
00:09:10,190 --> 00:09:10,730
This--

173
00:09:10,730 --> 00:09:11,240
AUDIENCE: Yeah, [INAUDIBLE].

174
00:09:11,240 --> 00:09:13,032
MICHALE FEE: --would
fire, this would fire,

175
00:09:13,032 --> 00:09:14,440
and that would not fire, right?

176
00:09:14,440 --> 00:09:17,060
Good.

177
00:09:17,060 --> 00:09:20,460
So you can write that out
as a matrix multiplication.

178
00:09:20,460 --> 00:09:23,300
So the firing rate
vector, in this case,

179
00:09:23,300 --> 00:09:28,310
would be the dot product of
this with this, this with this,

180
00:09:28,310 --> 00:09:29,570
and that with that.

181
00:09:29,570 --> 00:09:33,200
And what you would see is
that the output firing rate

182
00:09:33,200 --> 00:09:42,390
vector would look like this
first column of the weight

183
00:09:42,390 --> 00:09:43,430
matrix.

184
00:09:43,430 --> 00:09:45,860
So the output vector
would look like 1,

185
00:09:45,860 --> 00:09:50,040
1, 0 if only the first
neuron were active.

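This column-selection behavior is easy to check in Python. The first column [1, 1, 0] is the connectivity just described (input neuron one connects to output neurons one and two, but not three); the other two columns are hypothetical placeholders:

```python
# With only input neuron 1 active (u = [1, 0, 0]), the output vector
# is just the first column of W.
W = [[1, 0, 1],   # first column from the lecture's network;
     [1, 1, 0],   # the remaining entries are made up
     [0, 1, 1]]
u = [1, 0, 0]

# v = W u, written out elementwise
v = [sum(W[i][j] * u[j] for j in range(3)) for i in range(3)]
```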
186
00:09:50,040 --> 00:09:55,320
So you can think of the
output firing rate vector

187
00:09:55,320 --> 00:09:59,940
as being a contribution
from neuron one--

188
00:09:59,940 --> 00:10:01,980
and that contribution
from neuron one

189
00:10:01,980 --> 00:10:05,910
is simply the first column
of the weight matrix--

190
00:10:05,910 --> 00:10:09,330
plus a contribution
from neuron two,

191
00:10:09,330 --> 00:10:12,720
which is given by the
second column of the weight

192
00:10:12,720 --> 00:10:16,510
matrix, and a contribution
from input neuron three,

193
00:10:16,510 --> 00:10:20,490
which is given by the third
column of the weight matrix,

194
00:10:20,490 --> 00:10:21,120
OK?

195
00:10:21,120 --> 00:10:26,010
So you can think of the
output firing rate vector

196
00:10:26,010 --> 00:10:29,670
as being a linear
combination of a contribution

197
00:10:29,670 --> 00:10:32,910
from the first
neuron, a contribution

198
00:10:32,910 --> 00:10:34,980
from the second neuron,
and a contribution

199
00:10:34,980 --> 00:10:36,060
from the third neuron.

200
00:10:36,060 --> 00:10:38,760
Does that make sense?

201
00:10:38,760 --> 00:10:41,970
It's a different way
of thinking about it.

202
00:10:41,970 --> 00:10:45,060
In the dot product
interpretation,

203
00:10:45,060 --> 00:10:51,210
we're asking, what is the--

204
00:10:51,210 --> 00:10:53,190
we're summing up
all of the weights

205
00:10:53,190 --> 00:10:57,150
onto neuron one
from those synapses.

206
00:10:57,150 --> 00:10:59,940
We're summing up all the
weights onto neuron two

207
00:10:59,940 --> 00:11:02,610
from those synapses and summing
up all the weights onto neuron

208
00:11:02,610 --> 00:11:04,240
three from those synapses.

209
00:11:04,240 --> 00:11:08,900
So we're doing it one
output neuron at a time.

210
00:11:08,900 --> 00:11:12,230
In this other interpretation
of this matrix multiplication,

211
00:11:12,230 --> 00:11:13,850
we're doing something different.

212
00:11:13,850 --> 00:11:17,780
We're asking, what is the
contribution to the output

213
00:11:17,780 --> 00:11:20,240
from one of the input neurons?

214
00:11:20,240 --> 00:11:24,620
What is the contribution to
the output from another input

215
00:11:24,620 --> 00:11:25,260
neuron?

216
00:11:25,260 --> 00:11:27,110
And what is the
contribution to the output

217
00:11:27,110 --> 00:11:29,270
from yet another input neuron?

218
00:11:29,270 --> 00:11:31,110
Does that make sense?

219
00:11:31,110 --> 00:11:32,900
OK.

220
00:11:32,900 --> 00:11:35,510
All right, so we have
a linear combination

221
00:11:35,510 --> 00:11:38,650
of contributions from each
of those input neurons.

222
00:11:41,910 --> 00:11:45,390
And that's called the outer
product interpretation.

223
00:11:45,390 --> 00:11:47,880
I'm not going to explain right
now why it's called that,

224
00:11:47,880 --> 00:11:50,470
but that's how
that's referred to.

225
00:11:50,470 --> 00:11:53,700
So the output pattern
is a linear combination

226
00:11:53,700 --> 00:11:55,360
of contributions.

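A minimal Python sketch of the outer product interpretation, with made-up weights and rates: the output is accumulated one input neuron at a time, as a sum of scaled columns, and matches the row-by-row dot-product computation:

```python
W = [[1, 0, 1],
     [1, 1, 0],
     [0, 1, 1]]
u = [2, 3, 1]

n_out, n_in = len(W), len(u)
v = [0] * n_out
for j in range(n_in):
    # contribution of input neuron j: its rate times column j of W
    column_j = [W[i][j] for i in range(n_out)]
    v = [v_i + u[j] * c_i for v_i, c_i in zip(v, column_j)]

# Same answer as the dot-product (one output neuron at a time) view:
v_rows = [sum(W[i][j] * u[j] for j in range(n_in)) for i in range(n_out)]
```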
227
00:11:55,360 --> 00:12:03,660
OK, so let's take a look at
the effect of some very simple

228
00:12:03,660 --> 00:12:04,830
feed-forward networks, OK?

229
00:12:04,830 --> 00:12:08,210
So let's just look
at a few examples.

230
00:12:08,210 --> 00:12:10,270
So if we have a
feed forward-- this

231
00:12:10,270 --> 00:12:12,940
is sort of the simplest
feed-forward network.

232
00:12:12,940 --> 00:12:16,710
Each neuron in the
input layer connects

233
00:12:16,710 --> 00:12:20,230
to one neuron in the output
layer with a weight of one.

234
00:12:20,230 --> 00:12:24,330
So what is the weight
matrix of this network?

235
00:12:24,330 --> 00:12:25,190
AUDIENCE: Identity.

236
00:12:25,190 --> 00:12:26,920
MICHALE FEE: It's
the identity matrix.

237
00:12:26,920 --> 00:12:29,820
And so the firing rate
of the output layer

238
00:12:29,820 --> 00:12:33,030
will be exactly the same as
the firing rates in the input

239
00:12:33,030 --> 00:12:33,720
layer, OK?

240
00:12:33,720 --> 00:12:37,920
So there's the weight matrix,
which is just the identity

241
00:12:37,920 --> 00:12:40,380
matrix.

242
00:12:40,380 --> 00:12:43,710
And the firing rate of the output
layer is just the identity matrix

243
00:12:43,710 --> 00:12:45,690
times the firing rate
of the input layer.

244
00:12:45,690 --> 00:12:49,650
And so that's equal to
the input firing rate, OK?

245
00:12:49,650 --> 00:12:55,100
All right, let's take a
slightly more complex network,

246
00:12:55,100 --> 00:12:58,280
and let's make each one of
those weights independent.

247
00:12:58,280 --> 00:12:59,810
They're not all
just equal to one,

248
00:12:59,810 --> 00:13:02,480
but they're scaled
by some constant--

249
00:13:02,480 --> 00:13:05,780
lambda 1, lambda
2, and lambda 3.

250
00:13:05,780 --> 00:13:08,040
The weight matrix
looks like this.

251
00:13:08,040 --> 00:13:10,400
It's a diagonal matrix,
where each of those weights

252
00:13:10,400 --> 00:13:13,070
is on the diagonal.

253
00:13:13,070 --> 00:13:16,250
And in that case, you can
see that the output firing

254
00:13:16,250 --> 00:13:19,160
rate is just this diagonal
matrix times the input firing

255
00:13:19,160 --> 00:13:20,190
rate.

256
00:13:20,190 --> 00:13:24,260
And you can see that the
output firing rate is just

257
00:13:24,260 --> 00:13:27,680
the input firing rate where each
component of the input firing

258
00:13:27,680 --> 00:13:29,855
rate is scaled by some constant.

259
00:13:32,360 --> 00:13:35,960
Pretty straightforward.

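Both of these networks can be sketched in a few lines of Python; the input rates and the lambda scaling constants below are made-up values:

```python
# Identity vs. diagonal weight matrices: the identity passes the input
# rates through unchanged; a diagonal matrix scales each component of
# the input by its own constant lambda_i.
u = [2.0, 4.0, 1.0]

def matvec(W, u):
    return [sum(w * x for w, x in zip(row, u)) for row in W]

identity = [[1, 0, 0],
            [0, 1, 0],
            [0, 0, 1]]

lam = [0.5, 2.0, 3.0]
diagonal = [[lam[0], 0, 0],
            [0, lam[1], 0],
            [0, 0, lam[2]]]

v_identity = matvec(identity, u)   # equals u
v_diagonal = matvec(diagonal, u)   # each component scaled by lam_i
```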
260
00:13:35,960 --> 00:13:42,230
Let's take a look at a case
where the weight matrix now

261
00:13:42,230 --> 00:13:44,660
corresponds to a
rotation matrix, OK?

262
00:13:44,660 --> 00:13:49,430
So we're going to let the weight
matrix look like this rotation

263
00:13:49,430 --> 00:13:52,100
matrix that we talked
about on Tuesday, where

264
00:13:52,100 --> 00:13:56,930
the diagonal elements are
the cosine of some rotation angle,

265
00:13:56,930 --> 00:13:59,780
and the off-diagonal elements
are plus and minus sine

266
00:13:59,780 --> 00:14:01,850
of the rotation angle.

267
00:14:01,850 --> 00:14:05,150
So you can see that this
weight matrix corresponds

268
00:14:05,150 --> 00:14:10,280
to this network, where the
projection from input neuron

269
00:14:10,280 --> 00:14:13,820
one to output neuron
one is cosine phi.

270
00:14:13,820 --> 00:14:16,490
Input neuron two to output
neuron two is cosine phi.

271
00:14:16,490 --> 00:14:21,110
And then these cross-connections
are a plus and minus sine phi.

272
00:14:21,110 --> 00:14:23,540
OK, so what does that do?

273
00:14:23,540 --> 00:14:28,950
So we can see that the output
firing rate vector is just

274
00:14:28,950 --> 00:14:32,100
a product of this rotation
matrix times the input firing

275
00:14:32,100 --> 00:14:32,970
rate vector.

276
00:14:32,970 --> 00:14:36,860
And you can write down
each component like that.

277
00:14:36,860 --> 00:14:39,340
All right, so what does that do?

278
00:14:39,340 --> 00:14:41,310
So let's take a
particular rotation angle.

279
00:14:41,310 --> 00:14:44,280
We're going to take a
rotation angle of pi

280
00:14:44,280 --> 00:14:46,930
over 4, which is 45 degrees.

281
00:14:46,930 --> 00:14:49,930
That's what the weight
matrix looks like.

282
00:14:49,930 --> 00:14:53,010
And we can do that
multiplication

283
00:14:53,010 --> 00:14:58,880
to find that the output firing
rate vector looks like--

284
00:14:58,880 --> 00:15:01,880
one of the neurons
has a firing rate

285
00:15:01,880 --> 00:15:06,500
that looks like the sum of
the two input firing rates,

286
00:15:06,500 --> 00:15:10,310
and the other output
neuron has a firing rate

287
00:15:10,310 --> 00:15:14,220
that looks like the difference
between the two input firing

288
00:15:14,220 --> 00:15:15,463
rates.

289
00:15:15,463 --> 00:15:16,880
And if you look
at what this looks

290
00:15:16,880 --> 00:15:22,310
like in the space of
firing rates of the input

291
00:15:22,310 --> 00:15:26,620
layer and the output layer,
we can see what happens, OK?

292
00:15:26,620 --> 00:15:29,720
So what we'll often
do when we look

293
00:15:29,720 --> 00:15:31,970
at the behavior
of neural networks

294
00:15:31,970 --> 00:15:35,930
is we'll make a
plot of the firing

295
00:15:35,930 --> 00:15:38,600
rates of the different
neurons in the network.

296
00:15:38,600 --> 00:15:42,620
And what we'll often do for
simple feed-forward networks,

297
00:15:42,620 --> 00:15:45,230
and we'll also do this
for recurrent networks,

298
00:15:45,230 --> 00:15:53,890
is we'll plot the input firing
rates in the plane of u1

299
00:15:53,890 --> 00:15:56,010
and u2.

300
00:15:56,010 --> 00:16:00,670
And then we can plot the output
firing rates in the same plane.

301
00:16:00,670 --> 00:16:05,370
So, for example, if we
have an input state that

302
00:16:05,370 --> 00:16:10,080
looks like u1 equals u2,
it will be some point

303
00:16:10,080 --> 00:16:11,520
on this diagonal line.

304
00:16:14,100 --> 00:16:17,850
We can then plot the
output firing rate

305
00:16:17,850 --> 00:16:21,270
on this plane, v1 versus v2.

306
00:16:21,270 --> 00:16:26,540
And what will the output
firing rate look like?

307
00:16:26,540 --> 00:16:30,190
What will the firing rate of
v1 look like in this case?

308
00:16:34,910 --> 00:16:36,810
AUDIENCE: [INAUDIBLE]

309
00:16:36,810 --> 00:16:41,410
MICHALE FEE: Yeah, let's
say this is one and one.

310
00:16:41,410 --> 00:16:44,140
So what will the firing rate
of this neuron look like?

311
00:16:44,140 --> 00:16:44,852
[INAUDIBLE]?

312
00:16:44,852 --> 00:16:46,398
AUDIENCE: [INAUDIBLE]

313
00:16:46,398 --> 00:16:47,440
MICHALE FEE: What's that?

314
00:16:47,440 --> 00:16:48,340
AUDIENCE: [INAUDIBLE]

315
00:16:48,340 --> 00:16:50,410
MICHALE FEE: So the
firing rate of v1

316
00:16:50,410 --> 00:16:52,720
is just this quantity
right here, right?

317
00:16:52,720 --> 00:16:56,690
So it's u1 plus u2, right?

318
00:16:56,690 --> 00:17:00,920
So it's like 1
plus 1 over root 2.

319
00:17:00,920 --> 00:17:02,450
So it will be big.

320
00:17:02,450 --> 00:17:06,109
What will the firing rate
of neuron v2 look like?

321
00:17:06,109 --> 00:17:09,079
It'll be u2 minus u1, which is?

322
00:17:09,079 --> 00:17:09,810
AUDIENCE: Zero.

323
00:17:09,810 --> 00:17:10,700
MICHALE FEE: Zero.

324
00:17:10,700 --> 00:17:15,300
So it will be over here, right?

325
00:17:15,300 --> 00:17:20,579
So it will be that input
rotated by 45 degrees.

326
00:17:24,290 --> 00:17:26,089
And input down here--

327
00:17:26,089 --> 00:17:30,110
so the firing rate of the one
will be the sum of those two.

328
00:17:30,110 --> 00:17:32,220
Those two inputs
are both negative.

329
00:17:32,220 --> 00:17:37,940
So v1 for this input
will be big and negative.

330
00:17:37,940 --> 00:17:41,960
And v2 will be the
difference of u1 and u2,

331
00:17:41,960 --> 00:17:44,785
which for anything
on this line is?

332
00:17:44,785 --> 00:17:45,410
AUDIENCE: Zero.

333
00:17:45,410 --> 00:17:46,160
MICHALE FEE: Zero.

334
00:17:48,430 --> 00:17:49,840
OK.

335
00:17:49,840 --> 00:17:54,680
And so that input will
be rotated over to here.

336
00:17:54,680 --> 00:17:58,160
So you can think
of it this way--

337
00:17:58,160 --> 00:18:05,110
any input in this space of
u1 and u2, in the output

338
00:18:05,110 --> 00:18:10,880
will just be rotated by, in this
case, minus 45 degrees.

339
00:18:10,880 --> 00:18:15,220
So minus rotations
are clockwise.

340
00:18:15,220 --> 00:18:19,240
So you can just predict the
output firing rates simply

341
00:18:19,240 --> 00:18:23,170
by taking the input
firing rates in this plane

342
00:18:23,170 --> 00:18:26,740
and rotating them
by minus 45 degrees.

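A minimal Python sketch of this rotation network with phi = pi/4: the input [1, 1], which sits on the diagonal line u1 = u2, lands on the v1 axis, rotated by minus 45 degrees:

```python
import math

# Rotation weight matrix from the lecture: diagonal entries cos(phi),
# off-diagonal entries plus and minus sin(phi). Output neuron 1 computes
# a scaled sum of the inputs, output neuron 2 a scaled difference.
phi = math.pi / 4
W = [[math.cos(phi),  math.sin(phi)],
     [-math.sin(phi), math.cos(phi)]]

def matvec(W, u):
    return [sum(w * x for w, x in zip(row, u)) for row in W]

u = [1.0, 1.0]     # input on the diagonal line u1 = u2
v = matvec(W, u)   # v1 = (u1 + u2)/sqrt(2), v2 = (u2 - u1)/sqrt(2)
```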
343
00:18:26,740 --> 00:18:28,520
All right, any
questions about that?

344
00:18:28,520 --> 00:18:31,610
It's very simple.

345
00:18:31,610 --> 00:18:34,990
So this little neural
network implements

346
00:18:34,990 --> 00:18:41,280
rotations of this input space.

347
00:18:41,280 --> 00:18:42,170
That's pretty cool.

348
00:18:51,370 --> 00:18:54,190
Why would you want a
network to do rotations?

349
00:18:54,190 --> 00:18:57,040
Well, this solves
exactly the problem

350
00:18:57,040 --> 00:18:59,890
that we were
working on last time

351
00:18:59,890 --> 00:19:02,440
when we were talking about
our perceptron, where we were

352
00:19:02,440 --> 00:19:07,450
trying to classify stimuli
that could not be separated

353
00:19:07,450 --> 00:19:10,330
in one dimension,
but rather, can

354
00:19:10,330 --> 00:19:12,050
be separated in two dimensions.

355
00:19:12,050 --> 00:19:14,890
So if we have
different categories--

356
00:19:14,890 --> 00:19:17,050
dogs and non-dogs--

357
00:19:17,050 --> 00:19:22,190
that can be viewed along
different dimensions--

358
00:19:22,190 --> 00:19:24,820
how furry they are--

359
00:19:24,820 --> 00:19:26,810
but can't be separated--

360
00:19:26,810 --> 00:19:30,640
the two categories can't be
separated from each other

361
00:19:30,640 --> 00:19:35,260
on the basis of just one
dimension of observation.

362
00:19:35,260 --> 00:19:42,580
So in this case, what we want to
do is take this space of inputs

363
00:19:42,580 --> 00:19:49,120
and rotate it into what
we'll call a new basis set

364
00:19:49,120 --> 00:19:53,620
so that now we can take the
firing rates of these output

365
00:19:53,620 --> 00:19:59,090
neurons and use
those to separate

366
00:19:59,090 --> 00:20:01,935
these different categories
from each other.

367
00:20:01,935 --> 00:20:02,810
Does that make sense?

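A small Python sketch of this idea, with two made-up categories that overlap completely along either input dimension alone but are cleanly separated by the rotated coordinate v2, which is proportional to u2 minus u1:

```python
import math

# Two hypothetical categories of stimuli in the (u1, u2) plane.
cat_a = [(0.0, 1.0), (1.0, 2.0), (2.0, 3.0)]   # e.g. "dogs"
cat_b = [(1.0, 0.0), (2.0, 1.0), (3.0, 2.0)]   # e.g. "non-dogs"

phi = math.pi / 4
def v2(u):
    # second output of the 45-degree rotation network
    u1, u2 = u
    return -math.sin(phi) * u1 + math.cos(phi) * u2

a_scores = [v2(u) for u in cat_a]   # all positive
b_scores = [v2(u) for u in cat_b]   # all negative

# A single threshold at zero on v2 now separates the two categories,
# even though neither u1 nor u2 alone could.
```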
368
00:20:07,660 --> 00:20:09,980
OK, so let me show you a
few more examples of that.

369
00:20:12,670 --> 00:20:15,490
So this is one way to
think about what we

370
00:20:15,490 --> 00:20:18,120
do when we do color vision, OK?

371
00:20:18,120 --> 00:20:23,320
So you know that we have
different cones in our retina

372
00:20:23,320 --> 00:20:26,590
that are sensitive to
different wavelengths.

373
00:20:26,590 --> 00:20:31,020
Most colors are combinations
of those wavelengths.

374
00:20:31,020 --> 00:20:35,040
So if we look at the
activity of, let's say,

375
00:20:35,040 --> 00:20:39,060
a cone that's sensitive to
wavelength one and the activity

376
00:20:39,060 --> 00:20:45,485
in a cone that's sensitive to
wavelength two, we might see--

377
00:20:45,485 --> 00:20:47,180
and then we look
around the world.

378
00:20:47,180 --> 00:20:49,370
We'll see a bunch
of different objects

379
00:20:49,370 --> 00:20:51,590
or a bunch of
different stimuli that

380
00:20:51,590 --> 00:20:54,815
activate those two different
cones in different ratios.

381
00:20:57,750 --> 00:21:03,120
And you might imagine that this
axis corresponds to, let's say,

382
00:21:03,120 --> 00:21:05,290
how much red there
is in a stimulus.

383
00:21:05,290 --> 00:21:07,530
This axis corresponds
to how much green

384
00:21:07,530 --> 00:21:08,970
there is in a stimulus.

385
00:21:08,970 --> 00:21:11,190
But let's say that you're
in an environment where

386
00:21:11,190 --> 00:21:15,460
there's some cloud of
contribution of red and green.

387
00:21:15,460 --> 00:21:19,530
So what would this direction
correspond to in this cloud?

388
00:21:24,870 --> 00:21:28,860
This direction corresponds
to more red and more green.

389
00:21:28,860 --> 00:21:32,465
What would that correspond to?

390
00:21:32,465 --> 00:21:34,693
AUDIENCE: Brown.

391
00:21:34,693 --> 00:21:36,610
MICHALE FEE: So what I'm
trying to get at here

392
00:21:36,610 --> 00:21:39,520
is that the sum of
those two is sort

393
00:21:39,520 --> 00:21:42,040
of the brightness of
the object, right?

394
00:21:42,040 --> 00:21:45,540
Something that has little
red and little green

395
00:21:45,540 --> 00:21:47,890
will look the same color
as something that has

396
00:21:47,890 --> 00:21:50,620
more red and more green, right?

397
00:21:50,620 --> 00:21:52,510
But what's different
about those two stimuli

398
00:21:52,510 --> 00:21:55,120
is that the one's
brighter than the other.

399
00:21:55,120 --> 00:21:57,860
The second one is brighter
than the first one.

400
00:21:57,860 --> 00:22:02,810
But this dimension
corresponds to what?

401
00:22:02,810 --> 00:22:06,560
Differences in the ratio
of those two colors, right?

402
00:22:06,560 --> 00:22:11,870
Sort of changes in the different
[AUDIO OUT] wavelengths,

403
00:22:11,870 --> 00:22:14,460
and that corresponds to color.

404
00:22:14,460 --> 00:22:17,780
So if we can take
this space of stimuli

405
00:22:17,780 --> 00:22:21,230
and rotate it such that
one axis corresponds

406
00:22:21,230 --> 00:22:25,490
to the sum of the two colors
and the other axis corresponds

407
00:22:25,490 --> 00:22:27,920
to the difference
of the two colors,

408
00:22:27,920 --> 00:22:31,310
then this axis will tell
you how bright it is,

409
00:22:31,310 --> 00:22:35,560
and this axis will tell you what
the hue is, what the color is.

410
00:22:35,560 --> 00:22:36,660
Does that make sense?

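A minimal Python sketch of this sum/difference rotation; the cone activation numbers are invented for illustration:

```python
import math

# Made-up activations of two cone types (wavelength 1, wavelength 2).
dim_gray    = [0.2, 0.2]   # little red, little green
bright_gray = [0.8, 0.8]   # more red, more green, same ratio
reddish     = [0.8, 0.2]   # more red than green

s = 1.0 / math.sqrt(2.0)
def rotate(u):
    brightness = s * (u[0] + u[1])   # sum axis
    hue        = s * (u[0] - u[1])   # difference axis
    return brightness, hue

b1, h1 = rotate(dim_gray)
b2, h2 = rotate(bright_gray)
b3, h3 = rotate(reddish)
# The two grays share a hue but differ in brightness;
# the reddish stimulus stands out on the hue axis.
```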
411
00:22:36,660 --> 00:22:38,750
So there's a simple
case of where

412
00:22:38,750 --> 00:22:46,220
taking a rotation of an input
space, of a set of sensors,

413
00:22:46,220 --> 00:22:48,470
will give you
different information

414
00:22:48,470 --> 00:22:53,060
than you would get if you
just had one of those stimuli.

415
00:22:53,060 --> 00:22:58,070
If you were to just look at
the activity of the cone that's

416
00:22:58,070 --> 00:23:01,670
giving you a red
signal, if one object

417
00:23:01,670 --> 00:23:03,830
has more activity
in that cone, you

418
00:23:03,830 --> 00:23:07,220
don't know whether that
other object is just brighter

419
00:23:07,220 --> 00:23:11,220
or if it's actually more
red, that looked red.

420
00:23:11,220 --> 00:23:12,740
Does that make sense?

421
00:23:12,740 --> 00:23:18,770
So doing a rotation gives
us signals in single neurons

422
00:23:18,770 --> 00:23:20,480
that carry useful information.

423
00:23:20,480 --> 00:23:24,520
It can disambiguate different
kinds of information.

424
00:23:24,520 --> 00:23:27,740
All right, so we can use
that simple rotation matrix

425
00:23:27,740 --> 00:23:32,360
to perform that
kind of separation.

426
00:23:32,360 --> 00:23:36,100
So brightness and color.

427
00:23:36,100 --> 00:23:38,860
Here's another example.

428
00:23:38,860 --> 00:23:45,070
I didn't get to talk about this
in this class, but there are--

429
00:23:45,070 --> 00:23:53,400
so barn owls, they can very
exquisitely localize objects

430
00:23:53,400 --> 00:23:56,100
by sound.

431
00:23:56,100 --> 00:23:59,100
So they hunt, essentially,
at night in the dark.

432
00:23:59,100 --> 00:24:04,800
They can hear a mouse
scurrying around in the grass.

433
00:24:04,800 --> 00:24:07,710
They just listen to
that sound, and they

434
00:24:07,710 --> 00:24:10,950
can tell exactly where it
is, and then they dive down

435
00:24:10,950 --> 00:24:13,500
and catch the mouse.

436
00:24:13,500 --> 00:24:15,160
So how did they do that?

437
00:24:15,160 --> 00:24:17,130
Well, they used
timing differences

438
00:24:17,130 --> 00:24:21,750
to tell which way the sound
is coming from side to side,

439
00:24:21,750 --> 00:24:23,730
and they use
intensity differences

440
00:24:23,730 --> 00:24:26,200
to tell which way the sound
is coming from up and down.

441
00:24:26,200 --> 00:24:27,950
Now, how do you use
intensity differences?

442
00:24:27,950 --> 00:24:30,570
Well, one of their
ears, their right ear

443
00:24:30,570 --> 00:24:32,950
is pointed slightly upwards.

444
00:24:32,950 --> 00:24:35,710
And their left ear is
pointed slightly downwards.

445
00:24:35,710 --> 00:24:38,770
So when they hear a sound
that's slightly louder

446
00:24:38,770 --> 00:24:42,040
in the right ear and slightly
softer in the left ear,

447
00:24:42,040 --> 00:24:45,550
they know that it's coming
from up above, right?

448
00:24:45,550 --> 00:24:47,195
And if it's the
other way around,

449
00:24:47,195 --> 00:24:49,480
if it's slightly louder
in the left ear and softer

450
00:24:49,480 --> 00:24:55,550
in the right ear, they know it's
coming from below horizontal.

451
00:24:55,550 --> 00:24:59,060
And it's an extremely
precise system, OK?

452
00:24:59,060 --> 00:25:00,450
So here's an example.

453
00:25:00,450 --> 00:25:02,390
So if they're sitting
there listening

454
00:25:02,390 --> 00:25:06,030
to the intensity, the amplitude
of the sound in the left ear

455
00:25:06,030 --> 00:25:10,040
and the amplitude of the
sound in the right ear,

456
00:25:10,040 --> 00:25:14,390
some sounds will be up here with
high amplitude in both ears.

457
00:25:14,390 --> 00:25:16,970
Some sounds will be
over here, with more

458
00:25:16,970 --> 00:25:22,070
amplitude in the right ear and
less amplitude in the left ear.

459
00:25:22,070 --> 00:25:24,800
What does this
dimension correspond to?

460
00:25:24,800 --> 00:25:27,764
That dimension corresponds to?

461
00:25:27,764 --> 00:25:28,620
AUDIENCE: Proximity.

462
00:25:28,620 --> 00:25:30,390
MICHALE FEE:
Proximity or, overall,

463
00:25:30,390 --> 00:25:32,610
the loudness of
the sound, right?

464
00:25:32,610 --> 00:25:36,060
And what does this
dimension correspond to?

465
00:25:36,060 --> 00:25:37,650
AUDIENCE: Direction.

466
00:25:37,650 --> 00:25:40,060
MICHALE FEE: The difference
in intensity corresponds

467
00:25:40,060 --> 00:25:47,830
to the elevation of the sound
relative to the horizontal.

468
00:25:47,830 --> 00:25:48,330
All right?

469
00:25:48,330 --> 00:25:52,200
So, in fact, what happens
in the owl's brain

470
00:25:52,200 --> 00:25:57,540
is that these two signals
undergo a rotation to produce

471
00:25:57,540 --> 00:26:01,080
activity in some neurons
that's sensitive to the overall

472
00:26:01,080 --> 00:26:04,830
loudness and activity
in other neurons that's

473
00:26:04,830 --> 00:26:09,190
sensitive to the difference
between the intensity

474
00:26:09,190 --> 00:26:10,570
of the two sounds.

475
00:26:10,570 --> 00:26:14,560
It's a measure of the
elevation of the sounds.

476
00:26:14,560 --> 00:26:17,590
All right, so this
kind of rotation matrix

477
00:26:17,590 --> 00:26:22,060
is very useful for
projecting stimuli

478
00:26:22,060 --> 00:26:26,530
into the right dimension so
that they give useful signals.
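As an aside from the transcript: this sum/difference rotation is easy to sketch in NumPy. The intensity values below are made up purely for illustration.

```python
import numpy as np

# Sketch (not from the lecture): a 45-degree rotation turns two sensor
# channels (right/left ear intensity, or two cone signals) into a sum
# axis and a difference axis.
theta = np.pi / 4
phi = np.array([[np.cos(theta), -np.sin(theta)],
                [np.sin(theta),  np.cos(theta)]])

x = np.array([3.0, 1.0])   # hypothetical right/left intensities
y = phi.T @ x              # rotate by -45 degrees

# y[0] is proportional to the sum (overall loudness or brightness);
# y[1] is proportional to the difference (elevation or hue).
```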

479
00:26:33,320 --> 00:26:38,600
All right, so let's come back
to our matrix transformations

480
00:26:38,600 --> 00:26:41,060
and look in a little
bit more detail

481
00:26:41,060 --> 00:26:43,640
about what kinds
of transformations

482
00:26:43,640 --> 00:26:45,270
you can do with matrices.

483
00:26:45,270 --> 00:26:50,720
So we talked about
how matrices can do

484
00:26:50,720 --> 00:26:53,630
stretch, compression, rotation.

485
00:26:53,630 --> 00:26:56,660
And we're going to talk about
a new kind of transformation

486
00:26:56,660 --> 00:26:59,880
that they can do.

487
00:26:59,880 --> 00:27:05,070
So you remember we talked about
how a matrix multiplication

488
00:27:05,070 --> 00:27:08,580
implements a transformation
from one set of vectors

489
00:27:08,580 --> 00:27:10,500
into another set of vectors?

490
00:27:10,500 --> 00:27:14,250
And the inverse of that
matrix transforms back

491
00:27:14,250 --> 00:27:17,820
to the original
set of vectors, OK?

492
00:27:17,820 --> 00:27:19,870
So you can make
a transformation,

493
00:27:19,870 --> 00:27:21,900
and then you can undo
that transformation

494
00:27:21,900 --> 00:27:25,680
by multiplying by the
inverse of the matrix.

495
00:27:25,680 --> 00:27:29,760
OK, so we talked about different
kinds of transformations

496
00:27:29,760 --> 00:27:31,380
that you can do.

497
00:27:31,380 --> 00:27:33,790
So if you take the
identity matrix

498
00:27:33,790 --> 00:27:35,580
and you make a
small perturbation

499
00:27:35,580 --> 00:27:38,730
to both of the diagonal
elements, the same perturbation

500
00:27:38,730 --> 00:27:40,860
to both diagonal
elements, you're basically

501
00:27:40,860 --> 00:27:43,620
taking a set of vectors
and you're stretching them

502
00:27:43,620 --> 00:27:45,810
uniformly in all directions.

503
00:27:45,810 --> 00:27:48,900
If you make a perturbation
to just one of the components

504
00:27:48,900 --> 00:27:51,540
of the identity matrix,
you can take the data

505
00:27:51,540 --> 00:27:55,200
and stretch it in one
direction or stretch it

506
00:27:55,200 --> 00:27:57,060
in the other direction.

507
00:27:57,060 --> 00:28:01,740
If you add something
to the first component

508
00:28:01,740 --> 00:28:04,020
and subtract something
from the second component,

509
00:28:04,020 --> 00:28:06,630
you can stretch in one
direction and compress

510
00:28:06,630 --> 00:28:08,710
in another direction.

511
00:28:08,710 --> 00:28:13,447
We talked about reflections and
inversions through the origin.

512
00:28:13,447 --> 00:28:15,030
These are all
transformations that are

513
00:28:15,030 --> 00:28:18,840
produced by diagonal matrices.

514
00:28:18,840 --> 00:28:22,260
And the inverse of
those diagonal matrices

515
00:28:22,260 --> 00:28:25,800
is just one over the
diagonal elements.

516
00:28:25,800 --> 00:28:28,170
OK, we also talked
about rotations

517
00:28:28,170 --> 00:28:30,960
that you can do with
this rotation matrix.

518
00:28:30,960 --> 00:28:34,380
And then the inverse
of the rotation matrix

519
00:28:34,380 --> 00:28:39,420
is, basically, you compute the
inverse of a rotation matrix

520
00:28:39,420 --> 00:28:41,610
simply by computing
the rotation matrix

521
00:28:41,610 --> 00:28:46,260
with a minus sign-- that is,
using the negative

522
00:28:46,260 --> 00:28:47,550
of the rotation angle.

523
00:28:50,750 --> 00:28:54,440
And we also talked about
how a rotation matrix--

524
00:28:54,440 --> 00:28:56,480
for a rotation
matrix, the inverse

525
00:28:56,480 --> 00:28:58,280
is also equal to the transpose.

526
00:28:58,280 --> 00:29:01,220
And the reason is
that rotation matrices

527
00:29:01,220 --> 00:29:04,460
have this antisymmetry, where
the off-diagonal elements have

528
00:29:04,460 --> 00:29:06,720
the opposite sign.

529
00:29:06,720 --> 00:29:09,680
One of the things we
haven't talked about is--

530
00:29:09,680 --> 00:29:15,950
so we talked about how
this kind of matrix

531
00:29:15,950 --> 00:29:20,960
can produce a stretch along
one dimension or a stretch

532
00:29:20,960 --> 00:29:24,560
along the other
dimension of the vectors.

533
00:29:24,560 --> 00:29:30,380
But one really important
kind of transformation

534
00:29:30,380 --> 00:29:35,180
that we need to understand is
how you can produce stretches

535
00:29:35,180 --> 00:29:37,460
in an arbitrary direction, OK?

536
00:29:37,460 --> 00:29:42,380
So not just along the x-axis or
along the y-axis, but along any

537
00:29:42,380 --> 00:29:44,990
arbitrary direction.

538
00:29:44,990 --> 00:29:48,410
And the reason we need
to know how that works

539
00:29:48,410 --> 00:29:53,240
is because that formulation
of how you write down a matrix

540
00:29:53,240 --> 00:29:56,780
to stretch data in any
arbitrary direction

541
00:29:56,780 --> 00:30:01,520
is the basis of a lot of
really important data analysis

542
00:30:01,520 --> 00:30:04,490
methods, including
principal component

543
00:30:04,490 --> 00:30:07,910
analysis and other methods.

544
00:30:07,910 --> 00:30:09,980
So I'm going to
walk you through how

545
00:30:09,980 --> 00:30:12,630
to think about making
stretches in data

546
00:30:12,630 --> 00:30:14,180
in arbitrary dimensions.

547
00:30:14,180 --> 00:30:18,380
OK, so here's what we're
going to walk through.

548
00:30:18,380 --> 00:30:20,000
Let's say we have
a set of vectors.

549
00:30:20,000 --> 00:30:21,093
I just picked--

550
00:30:21,093 --> 00:30:22,260
I don't know, what is that--

551
00:30:22,260 --> 00:30:25,940
20 or so random vectors.

552
00:30:25,940 --> 00:30:29,840
So I just called a random
number generator 20 times

553
00:30:29,840 --> 00:30:33,720
and just picked
20 random vectors.

554
00:30:33,720 --> 00:30:40,280
And we're going to figure out
how to write down a matrix that

555
00:30:40,280 --> 00:30:43,640
will transform
that set of vectors

556
00:30:43,640 --> 00:30:48,560
into another set of vectors that
is stretched along some arbitrary

557
00:30:48,560 --> 00:30:50,870
axis.

558
00:30:50,870 --> 00:30:52,700
Does that make sense?

559
00:30:52,700 --> 00:30:55,450
So how do we do that?

560
00:30:55,450 --> 00:30:59,210
And remember, we know
how to do two things.

561
00:30:59,210 --> 00:31:03,070
We know how to stretch a set
of vectors along the x-axis.

562
00:31:03,070 --> 00:31:06,680
We know how to stretch
vectors along the y-axis,

563
00:31:06,680 --> 00:31:09,020
and we know how to
rotate a set of vectors.

564
00:31:09,020 --> 00:31:11,140
So we're just going
to combine those two

565
00:31:11,140 --> 00:31:14,500
ingredients to produce this
stretch in an arbitrary

566
00:31:14,500 --> 00:31:15,560
direction.

567
00:31:15,560 --> 00:31:18,290
So now I've given
you the recipe--

568
00:31:18,290 --> 00:31:20,200
or I've given you
the ingredients.

569
00:31:20,200 --> 00:31:21,900
The recipe's pretty
obvious, right?

570
00:31:21,900 --> 00:31:25,450
We're going to take this
set of initial vectors.

571
00:31:25,450 --> 00:31:26,150
Good.

572
00:31:26,150 --> 00:31:26,650
Lina?

573
00:31:26,650 --> 00:31:28,960
AUDIENCE: You [INAUDIBLE].

574
00:31:28,960 --> 00:31:29,890
That's it.

575
00:31:29,890 --> 00:31:30,880
MICHALE FEE: Bingo.

576
00:31:30,880 --> 00:31:32,170
That's it.

577
00:31:32,170 --> 00:31:33,630
OK, so we're going to take--

578
00:31:33,630 --> 00:31:36,400
all right, so we're going to
rotate this thing 45 degrees.

579
00:31:36,400 --> 00:31:38,500
We take this original
set of vectors.

580
00:31:38,500 --> 00:31:39,550
We're going to--

581
00:31:39,550 --> 00:31:41,830
OK, so first of
all, the first thing

582
00:31:41,830 --> 00:31:45,010
we do when we want to
take a set of points

583
00:31:45,010 --> 00:31:47,530
and stretch it along
an arbitrary direction,

584
00:31:47,530 --> 00:31:49,960
we pick that angle that
we want to stretch it

585
00:31:49,960 --> 00:31:51,880
on-- in this case, 45 degrees.

586
00:31:51,880 --> 00:31:55,150
And we write down a rotation
matrix corresponding

587
00:31:55,150 --> 00:31:58,490
to that rotation,
corresponding to that angle.

588
00:31:58,490 --> 00:32:01,780
So that's the first thing we do.

589
00:32:01,780 --> 00:32:04,550
So we've chosen 45
degrees as the angle

590
00:32:04,550 --> 00:32:06,040
we want to stretch on.

591
00:32:06,040 --> 00:32:08,080
So now we write down
a rotation matrix

592
00:32:08,080 --> 00:32:11,230
for a 45-degree rotation.

593
00:32:11,230 --> 00:32:12,730
Then what we're
going to do is we're

594
00:32:12,730 --> 00:32:15,820
going to take that set
of points and we're

595
00:32:15,820 --> 00:32:19,540
going to rotate it
by minus 45 degrees.

596
00:32:24,190 --> 00:32:27,210
So how do we do that?

597
00:32:27,210 --> 00:32:33,000
How do we take any one of those
vectors x and rotate it by--

598
00:32:33,000 --> 00:32:36,800
so this rotation
matrix is for plus 45.

599
00:32:36,800 --> 00:32:41,322
How do we rotate that
vector by minus 45?

600
00:32:41,322 --> 00:32:44,150
AUDIENCE: [INAUDIBLE] multiply
it by the [INAUDIBLE]..

601
00:32:44,150 --> 00:32:44,900
MICHALE FEE: Good.

602
00:32:44,900 --> 00:32:45,760
Say it.

603
00:32:45,760 --> 00:32:47,510
AUDIENCE: Multiply by
the inverse of that.

604
00:32:47,510 --> 00:32:49,170
MICHALE FEE: Yeah, and
what's the inverse of a--

605
00:32:49,170 --> 00:32:49,770
AUDIENCE: Transpose.

606
00:32:49,770 --> 00:32:50,728
MICHALE FEE: Transpose.

607
00:32:50,728 --> 00:32:54,360
So we don't have to go to Matlab
and use the matrix

608
00:32:54,360 --> 00:32:55,470
inversion function.

609
00:32:55,470 --> 00:32:58,560
We can just do the transpose.

610
00:32:58,560 --> 00:33:03,570
OK, so we take that vector and
we multiply it by the transpose.

611
00:33:03,570 --> 00:33:06,060
So that does a minus
45-degree rotation

612
00:33:06,060 --> 00:33:08,082
of all of those points.

613
00:33:08,082 --> 00:33:09,040
And then what do we do?

614
00:33:13,290 --> 00:33:14,250
Lina, you said it.

615
00:33:14,250 --> 00:33:14,760
Stretch it.

616
00:33:14,760 --> 00:33:16,708
Stretch it along?

617
00:33:16,708 --> 00:33:19,180
AUDIENCE: The x-axis?

618
00:33:19,180 --> 00:33:21,520
MICHALE FEE: The x-axis, good.

619
00:33:21,520 --> 00:33:25,040
What does that matrix
look like that does that?

620
00:33:25,040 --> 00:33:27,234
Just give me-- yup?

621
00:33:27,234 --> 00:33:29,165
AUDIENCE: 5, 0, 0, 1.

622
00:33:29,165 --> 00:33:30,040
MICHALE FEE: Awesome.

623
00:33:30,040 --> 00:33:30,580
That's it.

624
00:33:30,580 --> 00:33:36,480
So we're going to stretch
using a stretch matrix.

625
00:33:36,480 --> 00:33:39,220
So I use phi for
a rotation matrix,

626
00:33:39,220 --> 00:33:42,830
and I use lambda for a
stretch matrix, a stretch

627
00:33:42,830 --> 00:33:45,940
matrix along x or y.

628
00:33:45,940 --> 00:33:48,790
Lambda is a diagonal
matrix, which always just

629
00:33:48,790 --> 00:33:52,910
stretches or compresses
along the x or y direction.

630
00:33:52,910 --> 00:33:55,045
And then what do we do?

631
00:33:55,045 --> 00:33:56,320
AUDIENCE: [INAUDIBLE]

632
00:33:56,320 --> 00:33:57,070
MICHALE FEE: Good.

633
00:33:57,070 --> 00:34:00,220
By multiplying by?

634
00:34:00,220 --> 00:34:01,690
By this.

635
00:34:01,690 --> 00:34:03,700
Excellent.

636
00:34:03,700 --> 00:34:04,210
That's all.

637
00:34:04,210 --> 00:34:05,890
So how do we write this down?

638
00:34:05,890 --> 00:34:09,370
So, remember, here, we're sort
of marching through the recipe

639
00:34:09,370 --> 00:34:12,520
from left to right.

640
00:34:12,520 --> 00:34:16,070
When you write down matrices,
you go the other way.

641
00:34:16,070 --> 00:34:18,070
So when you do matrix
multiplication,

642
00:34:18,070 --> 00:34:22,300
you take your vector x and you
multiply it on the left side

643
00:34:22,300 --> 00:34:25,929
by phi transpose.

644
00:34:25,929 --> 00:34:28,810
And then you take that and you
multiply that on the left side

645
00:34:28,810 --> 00:34:30,969
by lambda.

646
00:34:30,969 --> 00:34:33,020
And then you take that.

647
00:34:33,020 --> 00:34:34,719
That now gives you these.

648
00:34:34,719 --> 00:34:38,210
And now to get the
final answer here,

649
00:34:38,210 --> 00:34:42,565
you multiply again on
the left side by phi.

650
00:34:42,565 --> 00:34:44,409
That's it.

651
00:34:44,409 --> 00:34:47,230
That's how you produce
an arbitrary stretch--

652
00:34:47,230 --> 00:34:49,630
a stretch or a
compression of a data

653
00:34:49,630 --> 00:34:52,480
in an arbitrary
direction, all right?

654
00:34:52,480 --> 00:34:55,449
You take the data, the vector.

655
00:34:55,449 --> 00:34:59,200
You multiply it by a
rotation matrix transpose,

656
00:34:59,200 --> 00:35:02,560
multiply it by a stretch
matrix, a diagonal matrix,

657
00:35:02,560 --> 00:35:07,150
and you multiply it
by a rotation matrix.

658
00:35:07,150 --> 00:35:09,376
Rotate, stretch, unrotate.
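That rotate-stretch-unrotate recipe can be sketched in a few lines of NumPy (a minimal illustration, not the lecture's MATLAB):

```python
import numpy as np

def rotated_stretch(theta, sx, sy=1.0):
    """Stretch by sx along direction theta (and sy perpendicular to it),
    built as M = phi @ lam @ phi.T: rotate by -theta, stretch along the
    coordinate axes, then rotate back by +theta."""
    c, s = np.cos(theta), np.sin(theta)
    phi = np.array([[c, -s], [s, c]])   # rotation by +theta
    lam = np.diag([sx, sy])             # diagonal stretch matrix
    return phi @ lam @ phi.T

# Stretch by a factor of 2 along the 45-degree direction:
M = rotated_stretch(np.pi / 4, 2.0)
# A vector along 45 degrees is doubled: M @ [1, 1] -> [2, 2],
# while the perpendicular vector [1, -1] is left unchanged.
```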

659
00:35:14,860 --> 00:35:18,610
So let's actually do
this for 45 degrees.

660
00:35:18,610 --> 00:35:22,850
So there's our rotation matrix--

661
00:35:22,850 --> 00:35:27,470
1, minus 1, 1, 1.

662
00:35:27,470 --> 00:35:31,590
The transpose is
1, 1, minus 1, 1.

663
00:35:31,590 --> 00:35:33,670
And here's our stretch matrix.

664
00:35:33,670 --> 00:35:37,320
In this case, it was
stretched by a factor of two.

665
00:35:37,320 --> 00:35:44,250
So we multiply x by phi
transpose, multiply by lambda,

666
00:35:44,250 --> 00:35:46,680
and then multiply by phi.

667
00:35:46,680 --> 00:35:49,320
So we can now write that down.

668
00:35:49,320 --> 00:35:51,600
If you just do
those three matrix

669
00:35:51,600 --> 00:35:54,580
multiplications-- those two
matrix multiplications, sorry,

670
00:35:54,580 --> 00:35:55,080
yes?

671
00:35:55,080 --> 00:35:56,370
One, two.

672
00:35:56,370 --> 00:35:58,920
Two matrix multiplications.

673
00:35:58,920 --> 00:36:02,797
You get a single matrix
that when you multiply it by

674
00:36:02,797 --> 00:36:05,175
x implements this stretch.

675
00:36:08,040 --> 00:36:10,000
Any questions about that?

676
00:36:10,000 --> 00:36:12,910
You should ask me now
if you don't understand,

677
00:36:12,910 --> 00:36:16,780
because I want you to be able
to do this for an arbitrary--

678
00:36:16,780 --> 00:36:20,980
so I'm going to
give you some angle,

679
00:36:20,980 --> 00:36:25,090
and I'll tell you,
construct a matrix that

680
00:36:25,090 --> 00:36:32,290
stretches data along a 30-degree
axis by a factor of five.

681
00:36:32,290 --> 00:36:35,410
You should be able to
write down that matrix.

682
00:36:35,410 --> 00:36:37,510
All right, so this is
what you're going to do,

683
00:36:37,510 --> 00:36:41,710
and that's what that matrix will
look like, something like that.

684
00:36:41,710 --> 00:36:48,880
Now, we can stretch these
data along a 45-degree axis

685
00:36:48,880 --> 00:36:50,350
by some factor.

686
00:36:50,350 --> 00:36:52,430
It's a factor of two here.

687
00:36:52,430 --> 00:36:53,500
How do we go back?

688
00:36:53,500 --> 00:36:57,340
How do we undo that stretch?

689
00:36:57,340 --> 00:37:01,780
So how do you take the inverse
of a product of a bunch

690
00:37:01,780 --> 00:37:03,480
of matrices like this?

691
00:37:03,480 --> 00:37:05,600
So the answer is very simple.

692
00:37:05,600 --> 00:37:10,420
If we want to take the inverse
of a product of three matrices,

693
00:37:10,420 --> 00:37:13,570
what we do is we just--

694
00:37:13,570 --> 00:37:16,790
it's, again, a product
of three matrices.

695
00:37:16,790 --> 00:37:20,830
It's a product of the inverse
of those three matrices,

696
00:37:20,830 --> 00:37:22,880
but you have to
reverse the order.

697
00:37:22,880 --> 00:37:25,660
So if you want to find the
inverse of matrix A times B

698
00:37:25,660 --> 00:37:31,270
times C, it's C inverse times
B inverse times A inverse.

699
00:37:31,270 --> 00:37:34,970
And you can prove that that's
the right form as follows.

700
00:37:34,970 --> 00:37:42,520
So ABC inverse times ABC should
be the identity matrix, right?

701
00:37:42,520 --> 00:37:49,150
So let's replace this
by this result here.

702
00:37:49,150 --> 00:37:52,020
So C inverse B inverse
A inverse times

703
00:37:52,020 --> 00:37:54,790
ABC would be the
identity matrix.

704
00:37:54,790 --> 00:38:01,340
And you can see that right
here, A inverse times A is I.

705
00:38:01,340 --> 00:38:03,290
So you can get rid of that.

706
00:38:03,290 --> 00:38:06,740
B inverse times B is I.

707
00:38:06,740 --> 00:38:11,480
C inverse times C is I.

708
00:38:11,480 --> 00:38:14,900
So we just proved that
that is the correct way

709
00:38:14,900 --> 00:38:18,070
of taking the inverse of a
product of matrices, all right?
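A quick numerical check of this identity, sketched with arbitrary random matrices:

```python
import numpy as np

rng = np.random.default_rng(0)   # arbitrary seed, just for illustration
A, B, C = (rng.standard_normal((2, 2)) for _ in range(3))

# (ABC)^-1 equals C^-1 B^-1 A^-1 -- note the reversed order.
lhs = np.linalg.inv(A @ B @ C)
rhs = np.linalg.inv(C) @ np.linalg.inv(B) @ np.linalg.inv(A)
```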

710
00:38:20,650 --> 00:38:26,050
So the inverse of
this kind of matrix

711
00:38:26,050 --> 00:38:30,210
that stretches data along
an arbitrary direction

712
00:38:30,210 --> 00:38:31,080
looks like this.

713
00:38:31,080 --> 00:38:37,050
It's phi transpose inverse
lambda inverse phi inverse.

714
00:38:37,050 --> 00:38:40,320
So let's figure out what
each one of those things is.

715
00:38:40,320 --> 00:38:44,100
So what is phi
transpose inverse,

716
00:38:44,100 --> 00:38:46,228
where phi is a rotation matrix?

717
00:38:46,228 --> 00:38:47,020
AUDIENCE: Just phi.

718
00:38:47,020 --> 00:38:48,940
MICHALE FEE: Phi, good.

719
00:38:48,940 --> 00:38:51,171
And what is phi inverse?

720
00:38:51,171 --> 00:38:52,464
AUDIENCE: [INAUDIBLE]

721
00:38:52,464 --> 00:38:53,760
MICHALE FEE: [INAUDIBLE].

722
00:38:53,760 --> 00:38:55,190
Good.

723
00:38:55,190 --> 00:38:58,320
And lambda inverse we'll
get to in a second.

724
00:38:58,320 --> 00:39:05,300
So the inverse of this
arbitrary rotated stretch matrix

725
00:39:05,300 --> 00:39:12,450
is just another rotated
stretch matrix, right?

726
00:39:12,450 --> 00:39:17,860
Where the lambda now has--

727
00:39:17,860 --> 00:39:21,370
lambda inverse is just
given by the inverse of each

728
00:39:21,370 --> 00:39:24,240
of those diagonal elements.

729
00:39:24,240 --> 00:39:28,815
So it's super easy to
find the inverse of one

730
00:39:28,815 --> 00:39:33,200
of these matrices that computes
this stretch in an arbitrary

731
00:39:33,200 --> 00:39:34,550
direction.

732
00:39:34,550 --> 00:39:36,800
You just keep the same phi.

733
00:39:36,800 --> 00:39:40,940
It's just phi times some
diagonal matrix times

734
00:39:40,940 --> 00:39:45,963
phi transpose, but the
diagonals are inverted.
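In code, the inverse really is the same phi with the diagonal entries flipped (a small sketch):

```python
import numpy as np

theta = np.pi / 4
c, s = np.cos(theta), np.sin(theta)
phi = np.array([[c, -s], [s, c]])

M     = phi @ np.diag([2.0, 1.0]) @ phi.T  # stretch by 2 along 45 degrees
M_inv = phi @ np.diag([0.5, 1.0]) @ phi.T  # same phi, inverted diagonals

# M_inv undoes M: their product is the identity matrix.
```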

735
00:39:45,963 --> 00:39:46,880
Does that make sense?

736
00:39:49,700 --> 00:39:51,110
All right, so
let's write it out.

737
00:39:51,110 --> 00:39:55,550
We're going to undo this
45-degree stretch that we just

738
00:39:55,550 --> 00:39:56,520
did.

739
00:39:56,520 --> 00:40:02,060
We're going to do it by
rotating, stretching by 1/2

740
00:40:02,060 --> 00:40:04,520
instead of stretching by two.

741
00:40:04,520 --> 00:40:09,060
So you can see that compresses
now along the x-axis.

742
00:40:09,060 --> 00:40:10,790
And then we rotate
back, and we're back

743
00:40:10,790 --> 00:40:14,380
to our original data.

744
00:40:14,380 --> 00:40:17,110
Any questions about that?

745
00:40:17,110 --> 00:40:19,570
It's really easy,
as long as you just

746
00:40:19,570 --> 00:40:25,000
think through what you're doing
as you go through those steps,

747
00:40:25,000 --> 00:40:25,870
all right?

748
00:40:25,870 --> 00:40:26,970
Any questions about that?

749
00:40:31,200 --> 00:40:32,410
OK.

750
00:40:32,410 --> 00:40:32,910
Wow.

751
00:40:37,910 --> 00:40:38,690
All right.

752
00:40:38,690 --> 00:40:41,090
So you can actually
just write those down

753
00:40:41,090 --> 00:40:46,100
and compute the
single matrix that

754
00:40:46,100 --> 00:40:55,710
implements this compression
along that 45-degree axis, OK?

755
00:40:55,710 --> 00:40:56,210
All right.

756
00:41:00,040 --> 00:41:02,470
So let me just show
you one other example.

757
00:41:02,470 --> 00:41:04,930
And I'll show you
something interesting

758
00:41:04,930 --> 00:41:09,680
that happens if you construct
a matrix that instead

759
00:41:09,680 --> 00:41:13,400
of stretching along a
45-degree axis does compression

760
00:41:13,400 --> 00:41:16,130
along a 45-degree axis.

761
00:41:16,130 --> 00:41:18,690
So here's our original data.

762
00:41:18,690 --> 00:41:25,155
Let's take that data and
rotate it by plus 45 degrees.

763
00:41:28,100 --> 00:41:33,720
Multiplied by lambda, that
compresses along the x-axis

764
00:41:33,720 --> 00:41:39,150
and then rotates by
minus 45 degrees.

765
00:41:39,150 --> 00:41:44,670
So here's an example where we
can take data and compress it

766
00:41:44,670 --> 00:41:48,750
along an axis of minus
45 degrees, all right?

767
00:41:48,750 --> 00:41:50,130
So you can write this down.

768
00:41:50,130 --> 00:41:52,440
So we're going to say
we're going to compress

769
00:41:52,440 --> 00:41:54,630
along a minus 45 degree axis.

770
00:41:54,630 --> 00:41:57,450
We write down phi of minus 45.

771
00:41:57,450 --> 00:42:00,453
Notice that when you do this
compression or stretching,

772
00:42:00,453 --> 00:42:02,370
there are different ways
you can do it, right?

773
00:42:02,370 --> 00:42:03,930
You can take the data.

774
00:42:03,930 --> 00:42:08,190
You can rotate it this way and
then squish along this axis.

775
00:42:08,190 --> 00:42:12,090
Or you could rotate it this
way and squish along this axis,

776
00:42:12,090 --> 00:42:12,590
right?

777
00:42:15,770 --> 00:42:18,277
So there are choices
for how you do it.

778
00:42:18,277 --> 00:42:19,860
But in the end,
you're going to end up

779
00:42:19,860 --> 00:42:23,460
with the same matrix that
does all of those equivalent

780
00:42:23,460 --> 00:42:24,420
transformations.

781
00:42:24,420 --> 00:42:25,750
OK, so here we are.

782
00:42:25,750 --> 00:42:27,000
We're going to write this out.

783
00:42:27,000 --> 00:42:28,458
So we're writing
down a matrix that

784
00:42:28,458 --> 00:42:32,730
produces this compression
along a minus 45-degree axis.

785
00:42:32,730 --> 00:42:34,770
So there's phi of minus 45.

786
00:42:34,770 --> 00:42:37,720
There's lambda, a
compression along the x-axis.

787
00:42:37,720 --> 00:42:41,950
So here, it's 0.2, 0, 0, 1.

788
00:42:41,950 --> 00:42:44,250
And here's the phi transpose.

789
00:42:44,250 --> 00:42:52,310
So you write all that out, and
you get 0.6, 0.4, 0.4, 0.6.
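Working that product out numerically (note the result is symmetric, with 0.6 on both diagonal entries):

```python
import numpy as np

theta = -np.pi / 4                  # minus 45 degrees
c, s = np.cos(theta), np.sin(theta)
phi = np.array([[c, -s], [s, c]])
lam = np.diag([0.2, 1.0])           # compress along x by a factor of 5

M = phi @ lam @ phi.T               # comes out to [[0.6, 0.4], [0.4, 0.6]]
```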

790
00:42:52,310 --> 00:42:53,540
Let me show you one more.

791
00:42:56,240 --> 00:43:03,980
What happens if we accidentally
take this data, we rotate it,

792
00:43:03,980 --> 00:43:09,068
and then we squish
the data to zero?

793
00:43:09,068 --> 00:43:10,496
Yes?

794
00:43:10,496 --> 00:43:16,210
AUDIENCE: [INAUDIBLE]

795
00:43:16,210 --> 00:43:17,350
MICHALE FEE: It doesn't.

796
00:43:17,350 --> 00:43:18,340
You can do either one.

797
00:43:21,780 --> 00:43:22,490
Let me go back.

798
00:43:32,940 --> 00:43:34,690
Let me just go back
to the very first one.

799
00:43:37,680 --> 00:43:42,390
So here, we rotated
clockwise and then

800
00:43:42,390 --> 00:43:46,020
stretched along the
x-axis and then unrotated.

801
00:43:46,020 --> 00:43:51,930
We could have taken these
data, rotated counterclockwise,

802
00:43:51,930 --> 00:43:56,695
stretched along the y-axis,
and then rotated back, right?

803
00:43:56,695 --> 00:43:57,570
Does that make sense?

804
00:44:01,240 --> 00:44:03,070
You'll still get
the same answer.

805
00:44:03,070 --> 00:44:07,750
You'll still get the same
answer for this matrix here.

806
00:44:11,940 --> 00:44:13,230
OK, now watch this.

807
00:44:19,560 --> 00:44:23,120
What happens if we take
these data, we rotate them,

808
00:44:23,120 --> 00:44:29,650
and then we compress
data all the way to zero?

809
00:44:29,650 --> 00:44:32,660
So by compressing
the data to a line,

810
00:44:32,660 --> 00:44:34,820
we're multiplying it by zero.

811
00:44:34,820 --> 00:44:40,440
We put a zero in this element of
the stretch matrix, all right?

812
00:44:40,440 --> 00:44:41,450
And what happens?

813
00:44:41,450 --> 00:44:46,120
The data get compressed
right to zero, OK?

814
00:44:46,120 --> 00:44:47,360
And then we can rotate back.

815
00:44:47,360 --> 00:44:49,460
So we've taken these data.

816
00:44:49,460 --> 00:44:53,150
We can write down a matrix
that takes those data

817
00:44:53,150 --> 00:45:00,310
and squishes them to zero
along some arbitrary direction.

818
00:45:00,310 --> 00:45:08,510
Now, can we take those data and
go back to the original data?

819
00:45:08,510 --> 00:45:10,220
Can we write down
a transformation

820
00:45:10,220 --> 00:45:13,310
that takes those and goes
back to the original data?

821
00:45:13,310 --> 00:45:15,119
Why not?

822
00:45:15,119 --> 00:45:16,877
AUDIENCE: Lambda
doesn't [INAUDIBLE]..

823
00:45:16,877 --> 00:45:17,960
MICHALE FEE: Say it again.

824
00:45:17,960 --> 00:45:19,340
AUDIENCE: Lambda
doesn't [INAUDIBLE]..

825
00:45:19,340 --> 00:45:20,090
MICHALE FEE: Good.

826
00:45:20,090 --> 00:45:22,412
What's another way
to think about that?

827
00:45:22,412 --> 00:45:24,180
AUDIENCE: We've
lost [INAUDIBLE]..

828
00:45:24,180 --> 00:45:26,310
MICHALE FEE: You've
lost that information.

829
00:45:26,310 --> 00:45:30,990
So in order to go back from
here to the original data,

830
00:45:30,990 --> 00:45:35,280
you have to have information
somewhere here that tells you

831
00:45:35,280 --> 00:45:40,240
how far out to stretch it
again when you try to go back.

832
00:45:40,240 --> 00:45:42,160
But in this case, we've
compressed everything

833
00:45:42,160 --> 00:45:46,260
to a line, and so
there's no information

834
00:45:46,260 --> 00:45:48,140
how to go back to
the original data.

835
00:45:51,610 --> 00:45:54,700
And how do you know
if you've done this?

836
00:45:54,700 --> 00:45:58,195
Well, you can take a look at
this matrix that you created.

837
00:46:00,720 --> 00:46:03,210
So let's say somebody
gave you this matrix.

838
00:46:03,210 --> 00:46:05,810
How would you tell
whether you could

839
00:46:05,810 --> 00:46:07,100
get back to the original data?

840
00:46:09,660 --> 00:46:11,890
Any ideas?

841
00:46:11,890 --> 00:46:13,042
Abiba?

842
00:46:13,042 --> 00:46:14,260
AUDIENCE: [INAUDIBLE]

843
00:46:14,260 --> 00:46:15,010
MICHALE FEE: Good.

844
00:46:15,010 --> 00:46:16,177
You look at the determinant.

845
00:46:16,177 --> 00:46:19,480
So if you calculate the
determinant of this matrix,

846
00:46:19,480 --> 00:46:21,100
the determinant is zero.

847
00:46:21,100 --> 00:46:23,620
And as soon as you see
a zero determinant,

848
00:46:23,620 --> 00:46:27,100
you know right away
that you can't go back.

849
00:46:27,100 --> 00:46:28,840
After you've made
this transformation,

850
00:46:28,840 --> 00:46:32,320
you can't go back to
the original data.

851
00:46:32,320 --> 00:46:36,010
And we're going to get into a
little more detail about why

852
00:46:36,010 --> 00:46:39,040
that is and what that means.

853
00:46:39,040 --> 00:46:43,660
And the reason here is that the
determinant of lambda is zero.

854
00:46:43,660 --> 00:46:46,780
The determinant of
a product of matrices

855
00:46:46,780 --> 00:46:49,210
like this is the product
of the determinants.

856
00:46:49,210 --> 00:46:51,940
And in this case, the
determinant of the lambda

857
00:46:51,940 --> 00:46:55,510
matrix is zero, and so the
determinant of the product

858
00:46:55,510 --> 00:46:57,910
is zero, OK?
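A sketch of that check: put a zero on the diagonal of lambda, and the determinant of the whole product is zero, so the matrix is singular.

```python
import numpy as np

theta = np.pi / 4
c, s = np.cos(theta), np.sin(theta)
phi = np.array([[c, -s], [s, c]])
lam = np.diag([0.0, 1.0])    # squish one direction all the way to zero

M = phi @ lam @ phi.T
# det(M) = det(phi) * det(lam) * det(phi.T) = 1 * 0 * 1 = 0,
# so the transformation throws away information and cannot be undone.
```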

859
00:46:57,910 --> 00:47:02,930
All right, so now let's
talk about basis sets.

860
00:47:02,930 --> 00:47:07,230
All right, so we can think of
vectors in abstract directions.

861
00:47:07,230 --> 00:47:11,190
So if I hold my arm
out here and tell you

862
00:47:11,190 --> 00:47:13,220
this is a vector--
there's the origin.

863
00:47:13,220 --> 00:47:15,390
The vector's pointing
in that direction.

864
00:47:15,390 --> 00:47:19,020
You don't need a
coordinate system

865
00:47:19,020 --> 00:47:21,540
to know which way I'm pointing.

866
00:47:21,540 --> 00:47:25,800
I don't need to tell
you my arm is pointing

867
00:47:25,800 --> 00:47:28,470
80 centimeters in
that direction and 40

868
00:47:28,470 --> 00:47:30,938
centimeters in that
direction and 10 centimeters

869
00:47:30,938 --> 00:47:31,980
in that direction, right?

870
00:47:31,980 --> 00:47:34,200
You don't need a
coordinate system

871
00:47:34,200 --> 00:47:38,930
to know which way
I'm pointing, right?

872
00:47:38,930 --> 00:47:44,870
But if I want to
quantify that vector so

873
00:47:44,870 --> 00:47:47,690
that-- if you want to quantify
that vector so that you can

874
00:47:47,690 --> 00:47:50,780
maybe tell somebody else
precisely which direction I'm

875
00:47:50,780 --> 00:47:55,890
pointing, you need to write
down those numbers, OK?

876
00:47:55,890 --> 00:48:00,280
So you can think of vectors
in abstract directions,

877
00:48:00,280 --> 00:48:05,040
but if you want to actually
quantify it or write it down,

878
00:48:05,040 --> 00:48:07,410
you need to choose
a coordinate system.

879
00:48:07,410 --> 00:48:10,170
And so to do this,
you choose a set

880
00:48:10,170 --> 00:48:13,890
of vectors, special
vectors, called a basis set.

881
00:48:13,890 --> 00:48:16,590
And now we just say,
here's a vector.

882
00:48:16,590 --> 00:48:21,510
How much is it pointing in
that direction, that direction,

883
00:48:21,510 --> 00:48:22,890
and that direction?

884
00:48:22,890 --> 00:48:24,870
And that's called a basis set.

885
00:48:24,870 --> 00:48:28,230
So we can write
down our vector now

886
00:48:28,230 --> 00:48:32,490
as a set of three numbers
that simply tell us

887
00:48:32,490 --> 00:48:35,520
how far that vector
is overlapped

888
00:48:35,520 --> 00:48:39,810
with three other vectors
that form the basis set.

889
00:48:39,810 --> 00:48:41,430
So the standard
way of doing this

890
00:48:41,430 --> 00:48:47,080
is to describe a vector as a
component in the x direction,

891
00:48:47,080 --> 00:48:51,400
which is a vector 1, 0, 0, sort
of in the standard notation;

892
00:48:51,400 --> 00:48:53,880
a component in the y
direction, which is 0,

893
00:48:53,880 --> 00:48:58,380
1, 0; and a component in
the z direction, 0, 0, 1.

894
00:48:58,380 --> 00:49:04,920
So we can write those vectors
as standard basis vectors.

895
00:49:04,920 --> 00:49:07,260
The numbers x, y,
and z here are called

896
00:49:07,260 --> 00:49:09,150
the coordinates of the vector.

897
00:49:09,150 --> 00:49:13,950
And the vectors e1, e2, and e3
are called the basis vectors.

898
00:49:13,950 --> 00:49:16,380
And this is how you
would write that down

899
00:49:16,380 --> 00:49:18,660
for a three-dimensional
vector, OK?

900
00:49:18,660 --> 00:49:20,640
Again, the little
hat here denotes

901
00:49:20,640 --> 00:49:25,600
that those are unit vectors
that have a length one.

902
00:49:25,600 --> 00:49:27,680
All right, so in order
to describe an arbitrary

903
00:49:27,680 --> 00:49:30,770
vector in a space
of n real numbers,

904
00:49:30,770 --> 00:49:36,620
Rn, the basis vectors each
need to have n numbers.

905
00:49:36,620 --> 00:49:39,410
And in order to describe an
arbitrary vector in that space,

906
00:49:39,410 --> 00:49:42,710
you need to have
n basis vectors.

907
00:49:42,710 --> 00:49:44,570
You need to have--

908
00:49:44,570 --> 00:49:47,630
in n dimensions, you need
to have n basis vectors,

909
00:49:47,630 --> 00:49:52,830
and each of those basis vectors
has to have n numbers in it.

910
00:49:52,830 --> 00:49:55,130
So these vectors here--

911
00:49:55,130 --> 00:49:59,435
1, 0, 0; 0, 1, 0; and 0, 0, 1--
are called the standard basis.

912
00:50:03,120 --> 00:50:06,120
And each one of these values
has one element that's one

913
00:50:06,120 --> 00:50:07,200
and the rest are zero.

914
00:50:07,200 --> 00:50:08,340
That's the standard basis.

915
00:50:12,720 --> 00:50:16,450
The standard basis
has the property

916
00:50:16,450 --> 00:50:20,470
that any one of those vectors
dotted into itself is one.

917
00:50:20,470 --> 00:50:22,060
That's because
they're unit vectors.

918
00:50:22,060 --> 00:50:23,440
They have length one.

919
00:50:23,440 --> 00:50:28,960
So e sub i dot e sub i is the length
squared of the i-th vector.

920
00:50:28,960 --> 00:50:32,340
And if the length is one, then
the length squared is one.

921
00:50:32,340 --> 00:50:36,210
Each vector is orthogonal
to all the other vectors.

922
00:50:36,210 --> 00:50:41,440
That means that each e1 dot e2
is zero, and e1 dot e3 is zero,

923
00:50:41,440 --> 00:50:43,770
and e2 dot e3 is zero.

924
00:50:43,770 --> 00:50:49,350
You can write down as e sub i
dot e sub j equals zero for i

925
00:50:49,350 --> 00:50:52,580
not equal to j.

926
00:50:52,580 --> 00:50:54,470
You can write all
of those properties

927
00:50:54,470 --> 00:50:56,420
down in one equation--

928
00:50:56,420 --> 00:51:00,920
e sub i dot e sub
j equals delta i j.

929
00:51:00,920 --> 00:51:05,950
Delta i j is what's called
the Kronecker delta function.

930
00:51:05,950 --> 00:51:09,800
The Kronecker delta function is
a one if i equals j and a zero

931
00:51:09,800 --> 00:51:13,430
if i is not equal to j, OK?

932
00:51:13,430 --> 00:51:16,730
So it's a very compact way
of writing down this property

933
00:51:16,730 --> 00:51:19,070
that each vector
is a unit vector

934
00:51:19,070 --> 00:51:23,370
and each vector is orthogonal
to all the other vectors.

935
00:51:23,370 --> 00:51:28,140
And the set with that property
is called an orthonormal

936
00:51:28,140 --> 00:51:28,790
basis set.
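
The Kronecker delta property of an orthonormal basis can be verified directly. Here is a minimal sketch in Python with NumPy, using the standard basis in R3:

```python
import numpy as np

# Standard basis in R^3: each row of the identity matrix is one basis vector.
E = np.eye(3)

# e_i . e_j should equal the Kronecker delta:
# one when i equals j (unit length), zero otherwise (orthogonality).
for i in range(3):
    for j in range(3):
        dot = E[i] @ E[j]
        assert dot == (1.0 if i == j else 0.0)

print(E[0] @ E[0], E[0] @ E[1])   # 1.0 0.0
```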

937
00:51:31,700 --> 00:51:37,600
All right, now, the standard
basis is not the only basis--

938
00:51:37,600 --> 00:51:39,110
sorry.

939
00:51:39,110 --> 00:51:41,850
I'm trying to do
x, y, and z here.

940
00:51:41,850 --> 00:51:45,510
So if you have x,
y, and z, that's

941
00:51:45,510 --> 00:51:48,690
not the only
orthonormal basis set.

942
00:51:48,690 --> 00:51:54,480
Any basis set that is a
rotation of those three vectors

943
00:51:54,480 --> 00:51:57,880
is also an orthonormal basis.

944
00:51:57,880 --> 00:52:02,720
Let's write down two other
orthogonal unit vectors.

945
00:52:02,720 --> 00:52:06,500
We can write down our
vector v in this other basis

946
00:52:06,500 --> 00:52:08,760
set as follows.

947
00:52:08,760 --> 00:52:13,910
We just take our vector v.
We can plot the basis vectors

948
00:52:13,910 --> 00:52:15,650
in this other basis.

949
00:52:15,650 --> 00:52:20,370
And we can simply project v
onto those other basis vectors.

950
00:52:20,370 --> 00:52:26,540
So we can project v onto f1,
and we can project v onto f2.

951
00:52:26,540 --> 00:52:32,030
So we can write v as a sum of
a vector in the direction of f1

952
00:52:32,030 --> 00:52:33,890
and a vector in the
direction of f2.

953
00:52:36,420 --> 00:52:42,720
You can write down this vector
v in this different basis set

954
00:52:42,720 --> 00:52:45,580
as a vector with two components.

955
00:52:45,580 --> 00:52:48,270
This is two dimensional.

956
00:52:48,270 --> 00:52:50,180
This is R2.

957
00:52:50,180 --> 00:52:53,010
You can write it down as
a two-component vector--

958
00:52:53,010 --> 00:52:56,460
v dot f1 and v dot f2.

959
00:52:56,460 --> 00:52:59,460
So that's a simple intuition
for what [AUDIO OUT]

960
00:52:59,460 --> 00:53:01,050
in two dimensions.

961
00:53:01,050 --> 00:53:05,370
We're going to develop the
formalism for doing this

962
00:53:05,370 --> 00:53:07,100
in arbitrary dimensions, OK?

963
00:53:07,100 --> 00:53:09,620
And it's very simple.

964
00:53:09,620 --> 00:53:14,100
All right, these
components here are

965
00:53:14,100 --> 00:53:20,240
called the vector coordinates
of this vector in the basis f.

966
00:53:20,240 --> 00:53:26,360
All right, now, basis
sets, or basis vectors,

967
00:53:26,360 --> 00:53:29,000
don't have to be
orthogonal to each other,

968
00:53:29,000 --> 00:53:31,750
and they don't
have to be normal.

969
00:53:31,750 --> 00:53:33,980
They don't have
to be unit vectors.

970
00:53:33,980 --> 00:53:37,220
You can write down
an arbitrary vector

971
00:53:37,220 --> 00:53:41,570
as a sum of
components that aren't

972
00:53:41,570 --> 00:53:43,350
orthogonal to each other.

973
00:53:43,350 --> 00:53:45,080
So you can write
down this vector v

974
00:53:45,080 --> 00:53:50,510
as a sum of a component
here in the f1 direction

975
00:53:50,510 --> 00:53:53,100
and a component in
the f2 direction,

976
00:53:53,100 --> 00:53:56,330
even if f1 and f2 are not
orthogonal to each other

977
00:53:56,330 --> 00:53:59,100
and even if they're
not unit vectors.

978
00:53:59,100 --> 00:54:02,930
So, again, v is expressed
as a linear combination

979
00:54:02,930 --> 00:54:05,360
of a vector in the f1
direction and a vector

980
00:54:05,360 --> 00:54:07,760
in the f2 direction.

981
00:54:07,760 --> 00:54:12,400
OK, so let's take a
vector and decompose it

982
00:54:12,400 --> 00:54:15,420
into an arbitrary
basis set f1 and f2.

983
00:54:18,120 --> 00:54:22,510
So v equals c1 f1 plus c2 f2.

984
00:54:22,510 --> 00:54:24,560
The coefficients here are
called the coordinates

985
00:54:24,560 --> 00:54:27,020
of the vector in this basis.

986
00:54:27,020 --> 00:54:30,710
And the vector v sub f--

987
00:54:30,710 --> 00:54:39,620
these numbers, c1 and c2, when
combined into this vector,

988
00:54:39,620 --> 00:54:44,840
is called the coordinate
vector of v in the basis f1

989
00:54:44,840 --> 00:54:46,880
and f2, OK?

990
00:54:46,880 --> 00:54:47,870
Does that make sense?

991
00:54:47,870 --> 00:54:49,440
Just some terminology.

992
00:54:52,630 --> 00:54:56,440
OK, so let's define
this basis, f1 and f2.

993
00:54:56,440 --> 00:55:00,550
We just pick two vectors,
an arbitrary two vectors.

994
00:55:00,550 --> 00:55:05,740
And I'll explain later that not
all choices of vectors work,

995
00:55:05,740 --> 00:55:08,030
but most of them do.

996
00:55:08,030 --> 00:55:11,080
So here are two vectors that
we can choose as a basis--

997
00:55:11,080 --> 00:55:17,105
so 1, 3, which is sort of
like this, and minus 2, 1

998
00:55:17,105 --> 00:55:17,980
is kind of like that.

999
00:55:22,442 --> 00:55:24,150
And we're going to
write down this vector

1000
00:55:24,150 --> 00:55:26,070
v in this new basis.

1001
00:55:26,070 --> 00:55:30,250
So we have a vector v that's
3, 5 in the standard basis,

1002
00:55:30,250 --> 00:55:35,250
and we're going to rewrite it
in this new basis, all right?

1003
00:55:35,250 --> 00:55:37,680
So we're going to find the
vector coordinates of v

1004
00:55:37,680 --> 00:55:39,150
in the new basis.

1005
00:55:39,150 --> 00:55:40,840
So we're going to
do this as follows.

1006
00:55:40,840 --> 00:55:43,650
We're going to write v as a
linear combination of these two

1007
00:55:43,650 --> 00:55:45,240
basis vectors.

1008
00:55:45,240 --> 00:55:49,580
So c1 times f1--

1009
00:55:49,580 --> 00:55:53,060
1, 3-- plus c2 times f2--

1010
00:55:53,060 --> 00:55:56,240
minus 2, 1-- is equal to 3, 5.

1011
00:55:56,240 --> 00:55:57,680
That make sense?

1012
00:55:57,680 --> 00:55:58,370
So what is that?

1013
00:55:58,370 --> 00:56:04,010
That is just a system
of equations, right?

1014
00:56:04,010 --> 00:56:08,570
And what we're trying to
do is solve for c1 and c2.

1015
00:56:08,570 --> 00:56:09,470
That's it.

1016
00:56:09,470 --> 00:56:13,010
So we already did this
problem in the last lecture.

1017
00:56:16,420 --> 00:56:18,440
So we have this
system of equations.

1018
00:56:18,440 --> 00:56:23,060
We can write this down in the
following matrix notation.

1019
00:56:23,060 --> 00:56:29,280
F times vf-- vf is
just c1 and c2--

1020
00:56:29,280 --> 00:56:31,305
equals v. So there's F--

1021
00:56:31,305 --> 00:56:32,970
1, 3; minus 2, 1.

1022
00:56:32,970 --> 00:56:36,030
Those are our two basis vectors.

1023
00:56:36,030 --> 00:56:41,130
Times c1 c2-- the vector
c1, c2-- is equal to 3, 5.

1024
00:56:41,130 --> 00:56:43,620
And we solve for vf.

1025
00:56:43,620 --> 00:56:46,080
In other words, we
solve for c1 and c2

1026
00:56:46,080 --> 00:56:56,540
simply by multiplying v by
the inverse of this matrix F.

1027
00:56:56,540 --> 00:57:02,820
So the coordinate vector
in this new basis

1028
00:57:02,820 --> 00:57:06,810
is just the old vector
times f inverse.

1029
00:57:06,810 --> 00:57:08,430
And what is f inverse?

1030
00:57:08,430 --> 00:57:16,750
F inverse is just the inverse of
the matrix F that has the basis vectors

1031
00:57:16,750 --> 00:57:18,175
as the columns of the matrix.

1032
00:57:24,330 --> 00:57:29,730
So the coordinates of this
vector in this new basis set

1033
00:57:29,730 --> 00:57:35,260
are given by f inverse times v.
We can find the inverse of f.

1034
00:57:35,260 --> 00:57:40,690
So if that's our f, we can
calculate the inverse of that.

1035
00:57:40,690 --> 00:57:44,050
Remember, you flip
the diagonal elements.

1036
00:57:44,050 --> 00:57:47,020
You multiply the
off-diagonals by minus 1,

1037
00:57:47,020 --> 00:57:49,930
and you divide by
the determinant.

1038
00:57:49,930 --> 00:58:01,000
So f inverse is this times v is
that, and v sub f is just 13/7

1039
00:58:01,000 --> 00:58:04,380
over minus 4/7.
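
The worked example above — basis vectors (1, 3) and (-2, 1), vector v = (3, 5) — can be reproduced in a few lines. The lecture does this in Matlab with the matrix inverse function; this is the same recipe sketched in Python with NumPy:

```python
import numpy as np

# Basis vectors f1 = (1, 3) and f2 = (-2, 1) go into the COLUMNS of F.
F = np.array([[1.0, -2.0],
              [3.0,  1.0]])

v = np.array([3.0, 5.0])        # v in the standard basis

# Coordinates of v in the new basis: v_f = F^{-1} v
v_f = np.linalg.inv(F) @ v
print(v_f)                      # [13/7, -4/7] ≈ [1.857, -0.571]

# Multiplying by F goes back to the standard basis.
print(F @ v_f)                  # recovers [3., 5.]
```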

1040
00:58:04,380 --> 00:58:08,550
So that's just a different
way of writing v.

1041
00:58:08,550 --> 00:58:10,710
So there's v in
the standard basis.

1042
00:58:10,710 --> 00:58:15,210
There's v in this
new basis, all right?

1043
00:58:15,210 --> 00:58:21,890
And all you do to go
from the standard basis

1044
00:58:21,890 --> 00:58:25,520
to any arbitrary new basis
is multiply the vector

1045
00:58:25,520 --> 00:58:26,270
by f inverse.

1046
00:58:33,800 --> 00:58:38,550
And when you're actually
doing this in Matlab,

1047
00:58:38,550 --> 00:58:39,940
this is really simple.

1048
00:58:39,940 --> 00:58:43,800
You just write down
a matrix F that has

1049
00:58:43,800 --> 00:58:46,530
the basis sets in the columns.

1050
00:58:46,530 --> 00:58:49,620
You just use the matrix
inverse function,

1051
00:58:49,620 --> 00:58:52,710
and then you multiply
that by the data matrix,

1052
00:58:52,710 --> 00:58:54,300
by the data vector.

1053
00:58:54,300 --> 00:58:58,490
All right, so I'm just
going to summarize again.

1054
00:58:58,490 --> 00:59:02,060
In order to find the coordinate
vector for v in this new basis,

1055
00:59:02,060 --> 00:59:05,780
you construct a matrix
F, whose columns

1056
00:59:05,780 --> 00:59:09,000
are just the elements
of the basis vectors.

1057
00:59:09,000 --> 00:59:11,720
So if you have
two basis vectors,

1058
00:59:11,720 --> 00:59:14,600
it's a two-- remember, each
of those basis vectors.

1059
00:59:14,600 --> 00:59:16,850
In two dimensions, there
are two basis vectors.

1060
00:59:16,850 --> 00:59:20,180
Each has two numbers, so
this is a 2 by 2 matrix.

1061
00:59:20,180 --> 00:59:24,200
In n dimensions, you
have n basis vectors.

1062
00:59:24,200 --> 00:59:26,640
Each of the basis
vectors has n numbers.

1063
00:59:26,640 --> 00:59:31,440
And so this matrix F is an
n by n matrix, all right?

1064
00:59:31,440 --> 00:59:38,730
You know that you can write down
v as this basis times v sub f.

1065
00:59:38,730 --> 00:59:41,310
You solve for v sub f by
multiplying both sides

1066
00:59:41,310 --> 00:59:42,840
by f inverse, all right?

1067
00:59:42,840 --> 00:59:45,720
That performs what's
called a change of basis.

1068
00:59:50,100 --> 00:59:54,670
Now, that only works
if f has an inverse.

1069
00:59:54,670 --> 00:59:59,550
So if you're going to choose
a new basis to write down

1070
00:59:59,550 --> 01:00:02,250
your vector, you have to
be careful to pick one

1071
01:00:02,250 --> 01:00:04,320
that has an inverse, all right?

1072
01:00:04,320 --> 01:00:05,820
And I want to show
you what it looks

1073
01:00:05,820 --> 01:00:08,640
like when you pick a basis
that doesn't have an inverse

1074
01:00:08,640 --> 01:00:10,110
and what that means.

1075
01:00:10,110 --> 01:00:14,620
All right, and that gets to the
idea of linear independence.

1076
01:00:14,620 --> 01:00:20,140
All right, so, remember I said
that if in n dimensions, in Rn,

1077
01:00:20,140 --> 01:00:25,390
in order to have a basis in Rn,
you have certain requirements?

1078
01:00:25,390 --> 01:00:26,990
Not any vectors will work.

1079
01:00:26,990 --> 01:00:29,920
So let's take a look
at these vectors.

1080
01:00:29,920 --> 01:00:32,800
Will those work to describe an--

1081
01:00:32,800 --> 01:00:35,890
will that basis set work
to describe an arbitrary

1082
01:00:35,890 --> 01:00:37,435
vector in three dimensions?

1083
01:00:37,435 --> 01:00:38,050
No?

1084
01:00:38,050 --> 01:00:39,913
Why not?

1085
01:00:39,913 --> 01:00:45,068
AUDIENCE: [INAUDIBLE] vectors,
so if you're [INAUDIBLE]..

1086
01:00:45,068 --> 01:00:45,860
MICHALE FEE: Right.

1087
01:00:45,860 --> 01:00:48,950
So the problem is in which
coordinate, which axis?

1088
01:00:48,950 --> 01:00:49,700
AUDIENCE: Z-axis.

1089
01:00:49,700 --> 01:00:50,700
MICHALE FEE: The z-axis.

1090
01:00:50,700 --> 01:00:54,020
You can see that you have zeros
in all three of those vectors,

1091
01:00:54,020 --> 01:00:56,690
OK?

1092
01:00:56,690 --> 01:00:59,720
You can't describe any
vector with this basis

1093
01:00:59,720 --> 01:01:03,169
that has a non-zero
component in the z direction.

1094
01:01:08,710 --> 01:01:11,700
And the reason is that
any linear combination

1095
01:01:11,700 --> 01:01:16,700
of these three vectors will
always lie in the xy plane.

1096
01:01:16,700 --> 01:01:19,310
So you can't describe
any vector here

1097
01:01:19,310 --> 01:01:25,720
that has a non-zero z
component, all right?

1098
01:01:25,720 --> 01:01:28,330
So what we say is that
this set of vectors

1099
01:01:28,330 --> 01:01:31,910
doesn't span all of R3.

1100
01:01:31,910 --> 01:01:36,830
It only spans the
xy plane, which

1101
01:01:36,830 --> 01:01:40,225
is what we call a
subspace of R3, OK?

1102
01:01:44,990 --> 01:01:47,210
OK, so let's take a look
at these three vectors.

1103
01:01:47,210 --> 01:01:48,770
The other thing to
notice is that you

1104
01:01:48,770 --> 01:01:52,250
can write any one
of these vectors

1105
01:01:52,250 --> 01:01:56,240
as a linear combination
of the other two.

1106
01:01:56,240 --> 01:02:01,750
So you can write f3
as a sum of f1 and f2.

1107
01:02:01,750 --> 01:02:03,850
The sum of those two vectors
is equal to that one.

1108
01:02:03,850 --> 01:02:06,850
You can write f2 as f3 minus f1.

1109
01:02:06,850 --> 01:02:09,940
So any of these vectors can be
written as a linear combination

1110
01:02:09,940 --> 01:02:11,330
of the others.

1111
01:02:11,330 --> 01:02:15,310
And so that set of vectors
is called linearly dependent.

1112
01:02:19,180 --> 01:02:23,630
And any set of linearly
dependent vectors cannot form

1113
01:02:23,630 --> 01:02:24,130
a basis.

1114
01:02:26,880 --> 01:02:28,980
And how do you know
if a set of vectors

1115
01:02:28,980 --> 01:02:33,480
that you choose for your
basis is linearly dependent?

1116
01:02:33,480 --> 01:02:38,560
Well, again, you just find the
determinant of that matrix.

1117
01:02:38,560 --> 01:02:44,030
And if it's zero, those
vectors are linearly dependent.
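
The determinant test for linear dependence is one line of NumPy. The lecture's exact vectors aren't shown, so these are illustrative stand-ins with the same structure: all three lie in the xy plane, and f3 = f1 + f2:

```python
import numpy as np

# Illustrative vectors: zero z-component, and f3 is a linear
# combination of f1 and f2, so the set is linearly dependent.
f1 = np.array([1.0, 0.0, 0.0])
f2 = np.array([0.0, 1.0, 0.0])
f3 = f1 + f2

F = np.column_stack([f1, f2, f3])
print(np.linalg.det(F))          # 0.0 -> cannot form a basis for R^3
```

A zero determinant here tells you these vectors only span a subspace (the xy plane), so the change of basis can't be inverted.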

1118
01:02:44,030 --> 01:02:48,670
So what that corresponds to
is you're taking your data

1119
01:02:48,670 --> 01:02:54,890
and when you transform
it into a new basis,

1120
01:02:54,890 --> 01:02:58,220
if the determinant
of that matrix F

1121
01:02:58,220 --> 01:03:01,580
is zero, then what you're doing
is you're taking those data

1122
01:03:01,580 --> 01:03:05,763
and transforming them to a space
where they're being collapsed.

1123
01:03:05,763 --> 01:03:07,430
Let's say if you're
in three dimensions,

1124
01:03:07,430 --> 01:03:12,350
those data are being collapsed
onto a plane or onto a line,

1125
01:03:12,350 --> 01:03:14,390
OK?

1126
01:03:14,390 --> 01:03:18,510
And that means you can't undo
that transformation, all right?

1127
01:03:18,510 --> 01:03:20,730
And the way to tell whether
you've got that problem

1128
01:03:20,730 --> 01:03:23,862
is looking at the determinant.

1129
01:03:23,862 --> 01:03:25,820
All right, let me show
you one other cool thing

1130
01:03:25,820 --> 01:03:27,920
about the determinant.

1131
01:03:27,920 --> 01:03:30,500
There's a very simple
geometrical interpretation

1132
01:03:30,500 --> 01:03:33,320
of what the determinant is, OK?

1133
01:03:33,320 --> 01:03:34,700
All right, sorry.

1134
01:03:34,700 --> 01:03:37,580
So if f maps your
data onto a subspace,

1135
01:03:37,580 --> 01:03:39,290
then the mapping
is not reversible.

1136
01:03:39,290 --> 01:03:43,910
OK, so what does the
determinant correspond to?

1137
01:03:43,910 --> 01:03:48,770
Let's say in two dimensions,
if I have two orthogonal unit

1138
01:03:48,770 --> 01:03:52,670
vectors, you can
think of those vectors

1139
01:03:52,670 --> 01:03:58,460
as kind of forming a
square in this space.

1140
01:03:58,460 --> 01:04:01,470
Or in three dimensions, if I
have three orthogonal vectors,

1141
01:04:01,470 --> 01:04:05,810
you can think of those vectors
as defining a cube, OK?

1142
01:04:05,810 --> 01:04:07,700
And if they're unit
vectors, then they

1143
01:04:07,700 --> 01:04:10,990
define a cube of volume one.

1144
01:04:10,990 --> 01:04:16,010
Here, you have the
square of area one.

1145
01:04:16,010 --> 01:04:21,454
So let's think about
this unit volume.

1146
01:04:21,454 --> 01:04:26,120
If I transform those two
vectors or those three vectors

1147
01:04:26,120 --> 01:04:30,710
in 3D space by a
matrix A, those vectors

1148
01:04:30,710 --> 01:04:34,730
get rotated and transformed.

1149
01:04:34,730 --> 01:04:38,450
They point in different
directions, and they define--

1150
01:04:38,450 --> 01:04:42,150
it's no longer a cube, but they
define some sort of rhombus,

1151
01:04:42,150 --> 01:04:43,720
OK?

1152
01:04:43,720 --> 01:04:48,580
You can ask, what is the
volume of that rhombus?

1153
01:04:48,580 --> 01:04:53,560
The volume of that rhombus
is just the determinant

1154
01:04:53,560 --> 01:04:58,550
of that matrix A.
So now what happens

1155
01:04:58,550 --> 01:05:03,180
if I have a cube in
three-dimensional space

1156
01:05:03,180 --> 01:05:06,210
and I multiply it by a
matrix that transforms it

1157
01:05:06,210 --> 01:05:10,230
into a rhombus that
has zero volume?

1158
01:05:10,230 --> 01:05:12,240
So let's say I have
those three vectors.

1159
01:05:12,240 --> 01:05:16,440
It transforms it into,
let's say, a square.

1160
01:05:16,440 --> 01:05:20,200
The volume of that square
in three dimensional space

1161
01:05:20,200 --> 01:05:22,830
is zero.

1162
01:05:22,830 --> 01:05:25,800
So what that means is I'm
transforming my vectors

1163
01:05:25,800 --> 01:05:28,770
into a space that
has zero volume

1164
01:05:28,770 --> 01:05:30,640
in the original dimensions, OK?

1165
01:05:30,640 --> 01:05:35,880
So I'm transforming things
from 3D into a 2D plane.

1166
01:05:35,880 --> 01:05:39,210
And what that means is
I've lost information,

1167
01:05:39,210 --> 01:05:40,260
and I can't go back.

1168
01:05:44,430 --> 01:05:49,840
OK, notice that a rotation
matrix, if I take this cube

1169
01:05:49,840 --> 01:05:53,620
and I rotate it, has
exactly the same volume

1170
01:05:53,620 --> 01:05:55,940
as it did before I rotated it.

1171
01:05:55,940 --> 01:06:00,400
And so you can always tell when
you have a rotation matrix,

1172
01:06:00,400 --> 01:06:04,640
because the determinant of
a rotation matrix is one.

1173
01:06:04,640 --> 01:06:11,050
So if you take a matrix A
and you find the determinant

1174
01:06:11,050 --> 01:06:12,910
and you find that the
determinant is one,

1175
01:06:12,910 --> 01:06:18,190
you know that you have
a pure rotation matrix.

1176
01:06:18,190 --> 01:06:20,736
What does it mean if the
determinant is minus one?

1177
01:06:24,310 --> 01:06:26,800
What it means is
you have a rotation,

1178
01:06:26,800 --> 01:06:32,620
but that one of the axes
is inverted, is flipped.

1179
01:06:32,620 --> 01:06:33,730
There's a mirror in there.

1180
01:06:36,660 --> 01:06:39,470
So you can tell if you
have a pure rotation

1181
01:06:39,470 --> 01:06:43,610
or if you have a rotation and
one of the axes is flipped.

1182
01:06:43,610 --> 01:06:46,490
Because in the pure rotation,
the determinant is one.

1183
01:06:46,490 --> 01:06:53,360
And in an impure rotation, you
have a rotation and a mirror

1184
01:06:53,360 --> 01:06:53,860
flip.
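
Both cases are easy to see with a 2D rotation matrix (a standard construction, not taken from the lecture slides):

```python
import numpy as np

theta = 0.7                      # arbitrary rotation angle
R = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])
print(np.linalg.det(R))          # ≈ 1.0 -> pure rotation, volume preserved

# Flipping one axis (a mirror) changes the sign of the determinant.
M = R @ np.diag([1.0, -1.0])
print(np.linalg.det(M))          # ≈ -1.0 -> rotation plus a mirror flip
```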

1185
01:06:56,890 --> 01:07:02,750
All right, and I just want to
make a couple more comments

1186
01:07:02,750 --> 01:07:05,990
about change of basis, OK?

1187
01:07:05,990 --> 01:07:10,580
All right, so let's choose
a set of basis vectors

1188
01:07:10,580 --> 01:07:13,370
for our new basis.

1189
01:07:13,370 --> 01:07:17,470
Let's write those
into a matrix F.

1190
01:07:17,470 --> 01:07:22,140
It's going to be our
matrix of basis vectors.

1191
01:07:22,140 --> 01:07:24,990
If the determinant
is not equal to zero,

1192
01:07:24,990 --> 01:07:27,300
then these vectors,
that set of vectors,

1193
01:07:27,300 --> 01:07:29,790
are linearly independent.

1194
01:07:29,790 --> 01:07:34,050
That means you cannot write one
of those vectors as a linear

1195
01:07:34,050 --> 01:07:35,280
combination of--

1196
01:07:35,280 --> 01:07:37,800
any one of those vectors
as a linear combination

1197
01:07:37,800 --> 01:07:39,800
of the others.

1198
01:07:39,800 --> 01:07:45,230
Those vectors form a complete
basis in that n dimensional

1199
01:07:45,230 --> 01:07:47,820
space.

1200
01:07:47,820 --> 01:07:50,960
The matrix F implements
a change of basis,

1201
01:07:50,960 --> 01:07:54,110
and you can go from
the standard basis to F

1202
01:07:54,110 --> 01:07:56,600
by multiplying your
vector by F inverse

1203
01:07:56,600 --> 01:07:59,570
to get the coordinate
vector and your new basis.

1204
01:07:59,570 --> 01:08:05,100
And you can go back from that
rotated or transformed basis

1205
01:08:05,100 --> 01:08:10,440
back to the coordinate basis
by multiplying by F, OK?

1206
01:08:10,440 --> 01:08:14,250
Multiply by F inverse
transforms to the new basis.

1207
01:08:14,250 --> 01:08:16,200
Multiplying by F
transforms back.

1208
01:08:19,319 --> 01:08:26,260
If that set of vectors is
an orthonormal basis, then--

1209
01:08:26,260 --> 01:08:31,200
OK, so let's take this
matrix F that has columns

1210
01:08:31,200 --> 01:08:32,729
that are the new basis vectors.

1211
01:08:32,729 --> 01:08:38,630
And let's say that those
form an orthonormal basis.

1212
01:08:38,630 --> 01:08:42,020
In that case, we can write
down-- so, in any case,

1213
01:08:42,020 --> 01:08:46,100
we can write down the transpose
of this matrix, F transpose.

1214
01:08:46,100 --> 01:08:51,210
And now the rows of that
matrix are the basis vectors.

1215
01:08:51,210 --> 01:08:55,569
Notice that if we multiply
F transpose times F,

1216
01:08:55,569 --> 01:08:59,990
we have basis vectors in
rows here and columns here.

1217
01:08:59,990 --> 01:09:03,060
So what is F transpose
F for the case

1218
01:09:03,060 --> 01:09:05,399
where these are
unit vectors that

1219
01:09:05,399 --> 01:09:07,180
are orthogonal to each other?

1220
01:09:07,180 --> 01:09:08,385
What is that product?

1221
01:09:08,385 --> 01:09:09,260
AUDIENCE: [INAUDIBLE]

1222
01:09:09,260 --> 01:09:09,479
MICHALE FEE: It's what?

1223
01:09:09,479 --> 01:09:10,060
AUDIENCE: [INAUDIBLE]

1224
01:09:10,060 --> 01:09:10,810
MICHALE FEE: Good.

1225
01:09:10,810 --> 01:09:14,738
Because F1 dot F1 is one.

1226
01:09:14,738 --> 01:09:17,840
F1 dot F2 is zero.

1227
01:09:17,840 --> 01:09:21,330
F2 dot F1 is zero,
and F2 dot F2 is one.

1228
01:09:21,330 --> 01:09:24,140
So that's equal to the
identity matrix, right?

1229
01:09:26,880 --> 01:09:30,300
So F transpose equals F inverse.

1230
01:09:30,300 --> 01:09:33,899
If the inverse of a matrix
is just its transpose,

1231
01:09:33,899 --> 01:09:35,924
then that matrix is
a rotation matrix.
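
This F-transpose-equals-F-inverse property can be demonstrated with any orthonormal basis; here is a minimal NumPy sketch using a rotated 2D basis (the angle 0.3 is arbitrary):

```python
import numpy as np

# Two orthonormal basis vectors as the COLUMNS of F (a rotated basis).
f1 = np.array([np.cos(0.3), np.sin(0.3)])
f2 = np.array([-np.sin(0.3), np.cos(0.3)])
F = np.column_stack([f1, f2])

# F^T F is the identity because f_i . f_j = delta_ij.
print(np.round(F.T @ F, 10))     # identity matrix (up to rounding)

# So the change of basis F^{-1} v is just F^T v:
# the dot products of v with f1 and f2.
v = np.array([3.0, 5.0])
print(np.allclose(F.T @ v, np.linalg.inv(F) @ v))   # True
```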

1232
01:09:38,810 --> 01:09:41,100
So F is just the
rotation matrix.

1233
01:09:41,100 --> 01:09:43,109
All right, now let's
see what happens.

1234
01:09:43,109 --> 01:09:48,810
So that means the inverse of
F is just this F transpose.

1235
01:09:48,810 --> 01:09:51,359
Let's do this coordinate--
let's [AUDIO OUT]

1236
01:09:51,359 --> 01:09:54,310
change of basis for this case.

1237
01:09:54,310 --> 01:09:58,680
So you can see that v sub f,
the coordinate vector in the new

1238
01:09:58,680 --> 01:10:04,770
basis, is F transpose
v. Here's F transpose--

1239
01:10:04,770 --> 01:10:07,020
the basis vectors
are in the rows--

1240
01:10:07,020 --> 01:10:14,525
times v. This is just v
dot F1, v dot F2, right?

1241
01:10:14,525 --> 01:10:20,830
So this shows how for
an orthonormal basis,

1242
01:10:20,830 --> 01:10:25,270
the transpose, which
is the inverse of F--

1243
01:10:25,270 --> 01:10:27,190
taking the transpose
of F times v

1244
01:10:27,190 --> 01:10:29,290
is just taking the
dot product of v

1245
01:10:29,290 --> 01:10:32,320
with each of the
basis vectors, OK?

1246
01:10:32,320 --> 01:10:36,880
So that ties it back to what we
were showing before about how

1247
01:10:36,880 --> 01:10:39,220
to do this change of basis, OK?

1248
01:10:39,220 --> 01:10:42,400
Just tying up those two
ways of thinking about it.

1249
01:10:45,190 --> 01:10:53,490
So, again, what
we've been developing

1250
01:10:53,490 --> 01:10:56,720
when we talk about
change of basis

1251
01:10:56,720 --> 01:11:02,500
are ways of rotating
vectors, rotating sets

1252
01:11:02,500 --> 01:11:04,480
of data, into
different dimensions,

1253
01:11:04,480 --> 01:11:07,780
into different basis
sets so that we

1254
01:11:07,780 --> 01:11:11,510
can look at data from
different directions.

1255
01:11:11,510 --> 01:11:14,210
That's all we're doing.

1256
01:11:14,210 --> 01:11:16,370
And you can see
that when you look

1257
01:11:16,370 --> 01:11:20,300
at data from different
directions, you can get--

1258
01:11:20,300 --> 01:11:23,720
some views of data, you have
a lot of things overlapping,

1259
01:11:23,720 --> 01:11:24,680
and you can't see them.

1260
01:11:24,680 --> 01:11:28,010
But when you rotate those
data, now, all of a sudden,

1261
01:11:28,010 --> 01:11:31,820
you can see things
clearly that used to be--

1262
01:11:31,820 --> 01:11:36,590
things get separated in some
views, whereas in other views,

1263
01:11:36,590 --> 01:11:39,980
things are kind of mixed up
and covering each other, OK?

1264
01:11:39,980 --> 01:11:44,270
And that's exactly what neural
networks are doing when they're

1265
01:11:44,270 --> 01:11:48,260
analyzing sensory stimuli.

1266
01:11:48,260 --> 01:11:50,150
They're doing that
kind of rotations

1267
01:11:50,150 --> 01:11:54,440
and untangling the
data to see what's

1268
01:11:54,440 --> 01:11:58,400
there in that
high-dimensional data, OK?

1269
01:11:58,400 --> 01:12:00,670
All right, that's it.