1
00:00:00,500 --> 00:00:02,890
For this problem,
assume that you

2
00:00:02,890 --> 00:00:05,310
have discovered a
room full of discarded

3
00:00:05,310 --> 00:00:07,460
5-stage pipelined betas.

4
00:00:07,460 --> 00:00:10,320
These betas fall
into four categories.

5
00:00:10,320 --> 00:00:13,460
The first is completely
functional 5-stage Betas

6
00:00:13,460 --> 00:00:16,490
with full bypass
and annulment logic.

7
00:00:16,490 --> 00:00:19,960
The second are betas
with a bad register file.

8
00:00:19,960 --> 00:00:23,560
In these betas, all data read
directly from the register file

9
00:00:23,560 --> 00:00:24,960
is zero.

10
00:00:24,960 --> 00:00:27,840
Note that if the data is
read from a bypass path,

11
00:00:27,840 --> 00:00:30,730
then the correct
value will be read.

12
00:00:30,730 --> 00:00:34,400
The third set are betas
without bypass paths.

13
00:00:34,400 --> 00:00:37,260
And finally, the fourth
are betas without annulment

14
00:00:37,260 --> 00:00:39,330
of branch delay slots.

15
00:00:39,330 --> 00:00:41,710
The problem is that the
betas are not labeled,

16
00:00:41,710 --> 00:00:46,450
so we do not know which
falls into which category.

17
00:00:46,450 --> 00:00:49,140
You come up with the
test program shown here.

18
00:00:49,140 --> 00:00:51,760
Your plan is to single
step through the program

19
00:00:51,760 --> 00:00:54,570
using each Beta chip,
carefully noting

20
00:00:54,570 --> 00:00:58,340
the address that the final
JMP loads into the PC.

21
00:00:58,340 --> 00:01:01,090
Your goal is to determine
which of the four classes

22
00:01:01,090 --> 00:01:05,080
each chip falls into
via this JMP address.

23
00:01:05,080 --> 00:01:07,610
Notice that on a
fully functional beta,

24
00:01:07,610 --> 00:01:11,060
this code would execute the
instructions sequentially

25
00:01:11,060 --> 00:01:12,670
skipping the MULC instruction.

26
00:01:15,560 --> 00:01:17,880
Here we see a pipeline
diagram showing

27
00:01:17,880 --> 00:01:21,040
execution of this program
on a fully functional beta

28
00:01:21,040 --> 00:01:23,610
from category C1.

29
00:01:23,610 --> 00:01:26,620
It shows that the although the
MULC instruction is fetched

30
00:01:26,620 --> 00:01:31,730
in cycle 2, it gets annulled
when the BEQ is in the RF stage

31
00:01:31,730 --> 00:01:35,280
and it determines that the
branch to label X, or the SUBC

32
00:01:35,280 --> 00:01:37,870
instruction, will be taken.

33
00:01:37,870 --> 00:01:43,280
The MULC is annulled by
inserting a NOP in its place.

34
00:01:43,280 --> 00:01:45,500
The ADDC and BEQ
instructions read R31

35
00:01:45,500 --> 00:01:48,440
from the register file.

36
00:01:48,440 --> 00:01:50,840
The SUBC, however,
gets the value

37
00:01:50,840 --> 00:01:54,550
of R2 via the bypass path
from the BEQ instruction

38
00:01:54,550 --> 00:01:57,120
which is in the MEM stage.

39
00:01:57,120 --> 00:02:00,840
The ADD then reads R0 and R2.

40
00:02:00,840 --> 00:02:03,390
R0 has already made it
back to the register file

41
00:02:03,390 --> 00:02:05,690
because the ADDC
instruction completed

42
00:02:05,690 --> 00:02:07,980
by the end of cycle 4.

43
00:02:07,980 --> 00:02:10,910
R2, however, is read
via the bypass path

44
00:02:10,910 --> 00:02:15,290
from the SUBC instruction
which is in the ALU stage.

45
00:02:15,290 --> 00:02:17,460
Finally, the JMP,
reads the value

46
00:02:17,460 --> 00:02:21,400
of R3 via the bypass path
from the ADD instruction

47
00:02:21,400 --> 00:02:25,540
which is in the ALU
stage in cycle 6.

48
00:02:25,540 --> 00:02:29,040
When run on a fully functional
beta with bypass paths

49
00:02:29,040 --> 00:02:31,200
and annulment of
branch delay slots,

50
00:02:31,200 --> 00:02:34,380
the code behaves as follows:

51
00:02:34,380 --> 00:02:37,530
The ADDC sets R0 = 4.

52
00:02:37,530 --> 00:02:40,610
The BEQ stores PC + 4 into R2.

53
00:02:40,610 --> 00:02:46,220
Since the ADDC is at address
0, the BEQ is at address 4,

54
00:02:46,220 --> 00:02:51,520
so PC + 4 = 8 is stored into
R2, and the program branches

55
00:02:51,520 --> 00:02:53,430
to label X.

56
00:02:53,430 --> 00:02:57,390
Next, the SUBC subtracts 4
from the latest value of R2

57
00:02:57,390 --> 00:03:00,940
and stores the result
which is 4 back into R2.

58
00:03:00,940 --> 00:03:08,240
The ADD adds R0 and R2, or 4
and 4, and stores the result

59
00:03:08,240 --> 00:03:09,560
which is 8 into R3.

60
00:03:09,560 --> 00:03:17,230
The JMP jumps to the
address in R3 which is 8.

61
00:03:17,230 --> 00:03:20,390
When run on C2 which has
a bad register file that

62
00:03:20,390 --> 00:03:23,740
always outputs a zero, the
behavior of the program

63
00:03:23,740 --> 00:03:26,430
changes a bit.

64
00:03:26,430 --> 00:03:27,940
The ADDC and BEQ
instructions which

65
00:03:27,940 --> 00:03:32,040
use R31 which is 0 anyways
behave in the same way

66
00:03:32,040 --> 00:03:33,810
as before.

67
00:03:33,810 --> 00:03:37,990
The SUBC, which gets the value
of R2 from the bypass path,

68
00:03:37,990 --> 00:03:41,790
reads the correct value
for R2 which is 8.

69
00:03:41,790 --> 00:03:45,130
Recall that only reads directly
from the register file return

70
00:03:45,130 --> 00:03:49,160
a zero, whereas if the data is
coming from a bypass path, then

71
00:03:49,160 --> 00:03:50,960
you get the correct value.

72
00:03:50,960 --> 00:03:52,700
So the SUBC produces a 4.

73
00:03:52,700 --> 00:03:56,940
The ADD reads R0 from
the register file

74
00:03:56,940 --> 00:03:59,600
and R2 from the bypass path.

75
00:03:59,600 --> 00:04:01,810
The result is that
the value of R0

76
00:04:01,810 --> 00:04:07,690
is read as if it was 0 while
that of R2 is correct and is 4.

77
00:04:07,690 --> 00:04:10,930
So R3 gets the value
4 assigned to it,

78
00:04:10,930 --> 00:04:13,830
and that is the address that
the JMP instruction jumps

79
00:04:13,830 --> 00:04:20,190
to because it too reads its
register from a bypass path.

80
00:04:20,190 --> 00:04:24,300
When run on C3 which does
not have any bypass paths,

81
00:04:24,300 --> 00:04:27,100
some of the instructions
will read stale values

82
00:04:27,100 --> 00:04:29,030
of their source operands.

83
00:04:29,030 --> 00:04:31,670
Let’s go through the
example in detail.

84
00:04:31,670 --> 00:04:35,400
The ADDC and BEQ
instructions read

85
00:04:35,400 --> 00:04:39,810
R31 which is 0 from the register
file so what they ultimately

86
00:04:39,810 --> 00:04:41,820
produce does not change.

87
00:04:41,820 --> 00:04:44,760
However, you must keep
in mind that the updated

88
00:04:44,760 --> 00:04:47,000
value of the
destination register

89
00:04:47,000 --> 00:04:51,100
will not get updated until
after that instruction completes

90
00:04:51,100 --> 00:04:54,430
the WB stage of the pipeline.

91
00:04:54,430 --> 00:04:58,470
When the SUBC reads R2, it
gets a stale value of R2

92
00:04:58,470 --> 00:05:02,030
because the BEQ instruction
has not yet completed,

93
00:05:02,030 --> 00:05:05,080
so it assumes that R2 = 0.

94
00:05:05,080 --> 00:05:10,830
It then subtracts 4 from that
and tries to write -4 into R2.

95
00:05:10,830 --> 00:05:12,980
Next the ADD runs.

96
00:05:12,980 --> 00:05:14,920
Recall that the
ADD would normally

97
00:05:14,920 --> 00:05:18,200
read R0 from the register
file because by the time

98
00:05:18,200 --> 00:05:21,990
the ADD is in the RF stage,
the ADDC which writes to R0

99
00:05:21,990 --> 00:05:25,160
has completed all
the pipeline stages.

100
00:05:25,160 --> 00:05:29,510
The ADD also normally read
R2 from the bypass path.

101
00:05:29,510 --> 00:05:33,200
However, since C3 does
not have bypass paths,

102
00:05:33,200 --> 00:05:36,920
it reads a stale value of
R2 from the register file.

103
00:05:36,920 --> 00:05:39,440
To determine which
stale value it reads,

104
00:05:39,440 --> 00:05:42,210
we need to examine
the pipeline diagram

105
00:05:42,210 --> 00:05:46,880
to see if either the BEQ or the
SUBC operations, both of which

106
00:05:46,880 --> 00:05:49,590
eventually update
R2, have completed

107
00:05:49,590 --> 00:05:52,090
by the time the ADD reads
its source operands.

108
00:05:55,130 --> 00:05:57,500
Looking at our
pipeline diagram, we

109
00:05:57,500 --> 00:06:00,210
see that when the ADD
is in the RF stage,

110
00:06:00,210 --> 00:06:03,770
neither the BEQ nor the
SUBC have completed,

111
00:06:03,770 --> 00:06:08,210
thus the value read for R2 is
the initial value of R2 which

112
00:06:08,210 --> 00:06:08,710
is 0.

113
00:06:08,710 --> 00:06:13,670
We can now determine the
behavior of the ADD instruction

114
00:06:13,670 --> 00:06:17,510
which is that it assumes
R0 = 4 and R2 = 0

115
00:06:17,510 --> 00:06:22,220
and will eventually
write a 4 into R3.

116
00:06:22,220 --> 00:06:24,310
Finally, the JMP
instruction wants

117
00:06:24,310 --> 00:06:26,780
to read the result
of the ADD, however,

118
00:06:26,780 --> 00:06:28,820
since there are no
bypass paths, it

119
00:06:28,820 --> 00:06:31,940
reads the original
value of R3 which was 0

120
00:06:31,940 --> 00:06:34,440
and jumps to address 0.

121
00:06:34,440 --> 00:06:37,530
Of course, had we looked at
our code a little more closely,

122
00:06:37,530 --> 00:06:39,450
we could have determined
this without all

123
00:06:39,450 --> 00:06:43,180
the intermediate steps because
the ADD is the only instruction

124
00:06:43,180 --> 00:06:45,850
that tries to update
R3, and you know

125
00:06:45,850 --> 00:06:49,400
that the JMP would normally get
R3 from the bypass path which

126
00:06:49,400 --> 00:06:53,200
is not available, therefore it
must read the original value

127
00:06:53,200 --> 00:06:57,520
of R3 which is 0.

128
00:06:57,520 --> 00:07:01,550
In category C4, the betas do
not annul instructions that were

129
00:07:01,550 --> 00:07:04,430
fetched after a branch
instruction but aren’t supposed

130
00:07:04,430 --> 00:07:05,990
to get executed.

131
00:07:05,990 --> 00:07:07,910
This means that
the MULC which is

132
00:07:07,910 --> 00:07:10,640
fetched after the
BEQ is actually

133
00:07:10,640 --> 00:07:14,850
executed in the pipeline
and affects the value of R2.

134
00:07:14,850 --> 00:07:19,230
Let’s take a close look at what
happens in each instruction.

135
00:07:19,230 --> 00:07:22,570
Once again we begin with
the ADDC setting R0 to 4

136
00:07:22,570 --> 00:07:25,870
and the BEQ setting R2 to 8.

137
00:07:25,870 --> 00:07:28,130
Since our bypass
paths are now working,

138
00:07:28,130 --> 00:07:29,990
we can assume that
we can immediately

139
00:07:29,990 --> 00:07:32,260
get the updated value.

140
00:07:32,260 --> 00:07:36,170
Next, we execute the MULC which
takes the latest value of R2

141
00:07:36,170 --> 00:07:40,070
from the bypass path
and multiplies it by 2.

142
00:07:40,070 --> 00:07:42,300
So it sets R2 = 16.

143
00:07:42,300 --> 00:07:48,440
The SUBC now uses this value
for R2 from which it subtracts 4

144
00:07:48,440 --> 00:07:52,060
to produce R2 = 12.

145
00:07:52,060 --> 00:07:56,740
The ADD then reads R0 =
4 and adds to it R2 = 12

146
00:07:56,740 --> 00:08:02,300
to produce R3 = 16.

147
00:08:02,300 --> 00:08:04,750
Finally, the JMP
jumps to address 16.

148
00:08:04,750 --> 00:08:08,480
Since each of the
four categories

149
00:08:08,480 --> 00:08:11,790
produces a unique jump
address, this program

150
00:08:11,790 --> 00:08:16,630
can be used to sort out all the
betas into the four categories.