1
00:00:00,500 --> 00:00:03,370
Given a set of symbols
and their probabilities,

2
00:00:03,370 --> 00:00:06,960
Huffman’s Algorithm tells us
how to construct an optimal

3
00:00:06,960 --> 00:00:08,750
variable-length encoding.

4
00:00:08,750 --> 00:00:11,570
By “optimal” we mean that,
assuming we’re encoding each

5
00:00:11,570 --> 00:00:15,000
symbol one-at-a-time, no other
variable-length code will have

6
00:00:15,000 --> 00:00:17,370
a shorter expected length.
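
One way to make "expected length" concrete is the short Python sketch below; the function name and arguments are illustrative, not from the video.

def expected_length(probs, code_len):
    # Average bits per symbol: each symbol's codeword length,
    # weighted by that symbol's probability.
    return sum(probs[s] * code_len[s] for s in probs)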

7
00:00:17,370 --> 00:00:20,000
The algorithm builds the
binary tree for the encoding

8
00:00:20,000 --> 00:00:21,340
from the bottom up.

9
00:00:21,340 --> 00:00:23,830
Start by choosing the two
symbols with the smallest

10
00:00:23,830 --> 00:00:25,550
probabilities (which
means they have

11
00:00:25,550 --> 00:00:28,010
highest information content
and should have the longest

12
00:00:28,010 --> 00:00:29,750
encoding).

13
00:00:29,750 --> 00:00:31,440
If, anywhere along
the way, two symbols

14
00:00:31,440 --> 00:00:35,010
have the same probability,
simply choose one arbitrarily.
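
A minimal Python sketch of this bottom-up procedure, using a min-heap; the function name and the tuple-based tree representation are assumptions, not something given in the video.

import heapq
from itertools import count

def huffman_tree(probs):
    # Build the encoding tree bottom-up: repeatedly pop the two
    # lowest-probability entries and merge them into a new subtree
    # whose probability is the sum of its two branches.
    tick = count()  # counter so entries with equal probability never tie
    heap = [(p, next(tick), sym) for sym, p in probs.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        p0, _, left = heapq.heappop(heap)    # smallest probability
        p1, _, right = heapq.heappop(heap)   # second smallest
        heapq.heappush(heap, (p0 + p1, next(tick), (left, right)))
    return heap[0][2]  # root of the finished tree

The counter in each heap entry simply breaks ties, matching the "choose one arbitrarily" rule above.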

15
00:00:35,010 --> 00:00:37,340
In our running example, the
two symbols with the lowest

16
00:00:37,340 --> 00:00:40,310
probability are C and D.

17
00:00:40,310 --> 00:00:42,580
Combine the symbols
as a binary subtree,

18
00:00:42,580 --> 00:00:44,770
with one branch labeled
“0” and the other “1”.

19
00:00:44,770 --> 00:00:48,530
It doesn’t matter which
label goes with which branch.

20
00:00:48,530 --> 00:00:51,050
Remove C and D from
our list of symbols,

21
00:00:51,050 --> 00:00:54,840
and replace them with the newly
constructed subtree, whose root

22
00:00:54,840 --> 00:00:57,660
has the associated
probability 1/6,

23
00:00:57,660 --> 00:01:01,640
the sum of the probabilities
of its two branches.
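
In code, this first merge is a single pass through the loop in the sketch above. The individual probabilities of C and D are assumed here; the video only gives their sum, 1/6.

from fractions import Fraction

p_C = p_D = Fraction(1, 12)   # assumed; only p_C + p_D = 1/6 is stated
cd_subtree = ("C", "D")       # which branch gets "0" or "1" is arbitrary
p_cd = p_C + p_D
print(p_cd)                   # 1/6, the probability at the subtree's root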

24
00:01:01,640 --> 00:01:04,620
Now continue, at each step
choosing the two symbols

25
00:01:04,620 --> 00:01:07,050
and/or subtrees with the
lowest probabilities,

26
00:01:07,050 --> 00:01:10,620
combining the choices
into a new subtree.

27
00:01:10,620 --> 00:01:12,650
At this point in our
example, the symbol A

28
00:01:12,650 --> 00:01:16,880
has the probability 1/3, the
symbol B the probability 1/2

29
00:01:16,880 --> 00:01:19,610
and the C/D subtree
the probability 1/6.

30
00:01:19,610 --> 00:01:24,290
So we’ll combine A
with the C/D subtree.
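
The arithmetic behind that choice, using exact fractions (the variable names are illustrative):

from fractions import Fraction

p_A, p_B, p_cd = Fraction(1, 3), Fraction(1, 2), Fraction(1, 6)
# The two smallest entries are the C/D subtree (1/6) and A (1/3),
# so they merge into a new subtree whose root gets their sum.
p_acd = p_cd + p_A
print(p_acd)                  # 1/2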

31
00:01:24,290 --> 00:01:26,880
On the final step we only
have two choices left:

32
00:01:26,880 --> 00:01:29,810
B and the A/C/D
subtree, which we

33
00:01:29,810 --> 00:01:32,350
combine in a new subtree,
whose root then becomes

34
00:01:32,350 --> 00:01:33,910
the root of the
tree representing

35
00:01:33,910 --> 00:01:36,440
the optimal
variable-length code.

36
00:01:36,440 --> 00:01:40,060
Happily, this is the code
we’ve been using all along!
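
To read the codewords off the finished tree, walk from the root to each leaf, collecting one bit per branch. The tree literal below is what the earlier sketch produces for this example (assuming C and D have equal probability); the particular 0/1 assignment is one arbitrary choice among several.

def codewords(tree, prefix=""):
    # Leaves are symbols (strings); internal nodes are (left, right) pairs.
    if isinstance(tree, str):
        return {tree: prefix}
    left, right = tree
    return {**codewords(left, prefix + "0"),
            **codewords(right, prefix + "1")}

tree = ("B", (("C", "D"), "A"))
print(codewords(tree))
# {'B': '0', 'C': '100', 'D': '101', 'A': '11'} -- lengths 1, 3, 3, 2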

37
00:01:40,060 --> 00:01:42,730
As mentioned above, we can
produce a number of different

38
00:01:42,730 --> 00:01:45,890
variable-length codes by
swapping the “0” and “1” labels

39
00:01:45,890 --> 00:01:48,290
on any of the subtree branches.

40
00:01:48,290 --> 00:01:51,120
But all those encodings would
have the same expected length,

41
00:01:51,120 --> 00:01:53,770
which is determined by the
distance of each symbol

42
00:01:53,770 --> 00:01:56,640
from the root of the tree,
not the labels along the path

43
00:01:56,640 --> 00:01:58,560
from root to leaf.

44
00:01:58,560 --> 00:02:00,250
So all these different
encodings are

45
00:02:00,250 --> 00:02:03,110
equivalent in terms
of their efficiency.
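
As a quick check of that claim, the expected length depends only on each leaf's depth: B sits at depth 1, A at depth 2, and C and D at depth 3 (treated together here, since only their combined probability of 1/6 is given).

from fractions import Fraction

expected = Fraction(1, 2) * 1 + Fraction(1, 3) * 2 + Fraction(1, 6) * 3
print(expected)   # 5/3, roughly 1.67 bits per symbol on average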

46
00:02:03,110 --> 00:02:05,450
Now try your hand at
using Huffman’s Algorithm

47
00:02:05,450 --> 00:02:08,960
to construct optimal
variable-length encodings.