1
00:00:09,500 --> 00:00:12,130
In this lecture, we
introduce linear regression

2
00:00:12,130 --> 00:00:16,470
a simple but very powerful
method to analyze data

3
00:00:16,470 --> 00:00:18,840
and make predictions
and apply it

4
00:00:18,840 --> 00:00:22,360
in a very unexpected
context-- predicting

5
00:00:22,360 --> 00:00:23,670
the quality of wines.

6
00:00:29,250 --> 00:00:34,170
Bordeaux is a region in France
known for producing wine.

7
00:00:34,170 --> 00:00:35,970
While this wine
has been produced

8
00:00:35,970 --> 00:00:38,670
in much the same way
for hundreds of years,

9
00:00:38,670 --> 00:00:40,650
there are differences
in price and quality

10
00:00:40,650 --> 00:00:44,400
from year to year that are
sometimes very significant.

11
00:00:44,400 --> 00:00:46,250
Bordeaux wines are
widely believed

12
00:00:46,250 --> 00:00:48,640
to taste better
when they are older,

13
00:00:48,640 --> 00:00:51,250
so there's an incentive
to store young wines

14
00:00:51,250 --> 00:00:53,300
until they are mature.

15
00:00:53,300 --> 00:00:56,280
The main problem is that
it is hard to determine

16
00:00:56,280 --> 00:01:00,200
the quality of the wine when it
is so young just by tasting it,

17
00:01:00,200 --> 00:01:03,990
since the taste will change so
significantly by the time it

18
00:01:03,990 --> 00:01:06,360
is actually consumed.

19
00:01:06,360 --> 00:01:10,010
This is why wine tasters
and experts are helpful.

20
00:01:10,010 --> 00:01:13,230
They taste the wines
and then predict

21
00:01:13,230 --> 00:01:16,690
which ones will be
the best wines later.

22
00:01:16,690 --> 00:01:19,340
The question we'll
address in this lecture--

23
00:01:19,340 --> 00:01:24,260
can analytics model
this process better

24
00:01:24,260 --> 00:01:25,510
and make stronger predictions?

25
00:01:28,840 --> 00:01:31,990
On March 4, 1990,
the New York Times

26
00:01:31,990 --> 00:01:35,479
announced that Princeton
economics professor Orley

27
00:01:35,479 --> 00:01:38,759
Ashenfelter could predict the
quality of Bordeaux wine

28
00:01:38,759 --> 00:01:41,370
without tasting a single drop.

29
00:01:41,370 --> 00:01:43,300
Ashenfelter's
predictions have nothing

30
00:01:43,300 --> 00:01:46,450
to do with assessing
the aroma of the wine,

31
00:01:46,450 --> 00:01:50,870
looking at the legs, or
declaring that the wine tastes

32
00:01:50,870 --> 00:01:53,740
citrusy, oaky, or nutty.

33
00:01:53,740 --> 00:01:55,890
They are the results of
a mathematical model.

34
00:01:58,430 --> 00:02:01,850
Ashenfelter used a method
called linear regression.

35
00:02:01,850 --> 00:02:06,970
The method predicts an outcome
variable or dependent variable.

36
00:02:06,970 --> 00:02:10,060
And in doing so, it
uses a set of what

37
00:02:10,060 --> 00:02:11,420
are called independent variables.

38
00:02:15,260 --> 00:02:16,890
For the dependent
variable, Ashenfelter

39
00:02:16,890 --> 00:02:25,240
chose a typical price in
1990-1991 for Bordeaux wine

40
00:02:25,240 --> 00:02:26,340
in an auction.

41
00:02:26,340 --> 00:02:28,640
This approximates quality.

42
00:02:28,640 --> 00:02:31,480
As independent
variables, he used

43
00:02:31,480 --> 00:02:35,620
age of the wine-- so the older
wines are more expensive--

44
00:02:35,620 --> 00:02:39,120
and weather-related
information, specifically

45
00:02:39,120 --> 00:02:42,200
the average growing season
temperature, the harvest

46
00:02:42,200 --> 00:02:43,700
rain, and winter rain.

47
00:02:46,570 --> 00:02:49,820
In these figures,
we depict the data

48
00:02:49,820 --> 00:02:53,340
during the period
from 1952 to 1978.

49
00:02:53,340 --> 00:02:56,490
There are four
independent variables--

50
00:02:56,490 --> 00:03:02,080
the age of the wine, the average
growing season temperature,

51
00:03:02,080 --> 00:03:04,450
the harvest rain,
and winter rain.

52
00:03:07,650 --> 00:03:12,810
And on the vertical axis,
you observe the logarithm

53
00:03:12,810 --> 00:03:18,130
of the price
realized at auction.

54
00:03:18,130 --> 00:03:21,050
So these are the raw
data that Ashenfelter used.

55
00:03:24,360 --> 00:03:28,640
So Ashenfelter believed
that his predictions

56
00:03:28,640 --> 00:03:31,140
were more accurate than
those of the world's

57
00:03:31,140 --> 00:03:33,850
most influential wine critic.

58
00:03:33,850 --> 00:03:37,560
His name is Robert Parker.

59
00:03:37,560 --> 00:03:41,480
In response, Parker
called Ashenfelter

60
00:03:41,480 --> 00:03:45,910
"an absolute total sham," adding that he is
sham," and he adds that,

61
00:03:45,910 --> 00:03:51,030
"rather like a movie critic
who never goes to see the movie

62
00:03:51,030 --> 00:03:53,150
but tells you how
good it is based

63
00:03:53,150 --> 00:03:55,490
on the actors and the director."