rocket-inclusive-cache

An introduction to the inclusive cache in rocket-chip.

Overall structure

In the Inclusive Cache, each bank has its own scheduler, and each scheduler handles its requests independently.

The structure of a scheduler is shown in the figure below and contains the following parts. Red lines are the internal request (req) channels, blue lines are the data channels.

  • sinkA: accepts requests on the upstream A channel and converts them into a req for the MSHRs
  • sourceB: takes commands from the MSHRs and issues Probe requests on the upstream B channel
  • sinkC: accepts upstream C-channel requests, i.e. Release and ProbeAck(Data). A Release is converted into a req for the MSHRs; a ProbeAck(Data) is reported to the MSHRs as a command and its data is written into the BankedStore
  • sourceD: takes commands from the MSHRs, reads/writes the BankedStore and sends responses on the upstream D channel
  • sinkE: accepts upstream E-channel requests and forwards them to the MSHRs
  • sinkX: accepts CMO requests from the upstream control port and converts them into a req for the MSHRs
  • sourceA: takes commands from the MSHRs and issues requests on the downstream A channel
  • sinkB: absent; the inclusive cache does not support being a middle-level cache, so there is no sinkB
  • sourceC: takes commands from the MSHRs, reads data from the BankedStore and issues requests on the downstream C channel
  • sinkD: accepts commands and data from the downstream D channel; it writes the data into the BankedStore and reports a command to the MSHRs
  • sourceE: takes commands from the MSHRs and forwards them to the downstream E channel
  • sourceX: forwards responses from the MSHRs to the control port
  • BankedStore: the data part of the cache
  • Directory: holds the directory (tag and state) structure
  • Requests: buffers pending requests; internally a ListBuffer with 3 × (number of MSHRs) queues, one A/B/C queue per MSHR
  • MSHR: handles one req at a time; essentially one big state machine
  • scheduler: the surrounding logic that schedules reqs between all of the above

ListBuffer

Let's first look at a special data structure used in the inclusive cache, the ListBuffer. In short, it is a buffer used to hold data. Why is it called a ListBuffer? Look at the figure below.

This ListBuffer has 8 entries in total that can hold data, and 2 queues, each of which is an independent linked list. In the initial state all entries are unused, i.e. empty. A user can claim an entry and link it onto queue 0. Suppose 3 entries are claimed for queue 0 and 2 entries for queue 1; the structure then looks like the figure below.

In other words, a ListBuffer has entries storage slots for data and queues independent queues for ordering; all the queues share the same entries slots, while each queue keeps its own order.

case class ListBufferParameters[T <: Data](gen: T, queues: Int, entries: Int, bypass: Boolean)
{
val queueBits = log2Up(queues)
val entryBits = log2Up(entries)
}

class ListBufferPush[T <: Data](params: ListBufferParameters[T]) extends GenericParameterizedBundle(params)
{
val index = UInt(width = params.queueBits)
val data = params.gen.asOutput
}

With that picture, the parameters in the code above are easy to read:

  • T: the type of data to be stored, passed in as a parameter
  • queues: how many independently ordered queues there are
  • entries: how many entries there are for storing data
  • index: which queue a pushed element should be appended to

class ListBuffer[T <: Data](params: ListBufferParameters[T]) extends Module
{
val io = new Bundle {
// input: the data being pushed
// ready: driven low when there are no free entries left
// push is visible on the same cycle; flow queues
val push = Decoupled(new ListBufferPush(params)).flip
// indicates which queues currently hold data
val valid = UInt(width = params.queues)
// request to pop from a given queue
val pop = Valid(UInt(width = params.queueBits)).flip
// the popped data, valid in the same cycle as the pop request; the data sits in registers and head already points at it, so it is only a mux selection
val data = params.gen.asOutput
}
val valid = RegInit(UInt(0, width=params.queues))
val head = Mem(params.queues, UInt(width = params.entryBits))
val tail = Mem(params.queues, UInt(width = params.entryBits))
val used = RegInit(UInt(0, width=params.entries))
val next = Mem(params.entries, UInt(width = params.entryBits))
val data = Mem(params.entries, params.gen)

The code above builds the ListBuffer's main data structures:

  • valid: whether each queue currently holds data
  • head: per-queue head pointer, holding the entry index of that queue's first element
  • tail: per-queue tail pointer, holding the entry index of that queue's last element
  • used: bitmap marking which entries are in use
  • data: the payload stored in every entry
  • next: the hardest one to grasp: for each entry, the index of the next entry in the same queue, i.e. the link array

Using the figure above as an example, assume 2 queues and 8 entries; the contents of each structure are shown below.

As the figure shows, head gives the index of queue 0's first entry. Looking that index (0) up in next yields 1, so entry 1 follows entry 0; looking up 1 in next yields 2; 2 equals the value held in tail, so it is the last element. Walking the links this way reconstructs the linked-list structure shown in the figure above.

val freeOH = ~(leftOR(~used) << 1) & ~used
val freeIdx = OHToUInt(freeOH)

leftOR: scan from the LSB upward until the first 1 is found; that bit and every bit above it become 1. For example:

  • b0101 -> b1111

  • b1010 -> b1110

  • b1000 -> b1000

OHToUInt: returns the bit position of the sole high bit of the input bit vector. It assumes exactly one bit is high; the result is undefined otherwise.

  • b0100 -> 2.U

Working those two lines through an example:

used = 0101: ~used = 1010, leftOR(~used) = 1110, leftOR(~used)<<1 = 1100, ~(leftOR(~used)<<1) = 0011, freeOH = ~(leftOR(~used)<<1) & ~used = 0010, OHToUInt(freeOH) = 1
used = 1010: ~used = 0101, leftOR(~used) = 1111, leftOR(~used)<<1 = 1110, ~(leftOR(~used)<<1) = 0001, freeOH = ~(leftOR(~used)<<1) & ~used = 0001, OHToUInt(freeOH) = 0

So freeIdx is simply the index of the lowest 0 bit in used, i.e. it picks a free slot.
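
The two utility functions are easy to model in plain Scala. Below is a small software sketch (not the Chisel hardware; the bit width is fixed at 4 just for this example) that reproduces the table above:

// Plain-Scala model of the free-entry selection above (4-bit bitmaps for illustration only).
object FreeIdxModel {
  val width = 4
  val mask = (1 << width) - 1

  // leftOR: scan from the LSB upward; once a 1 is seen, every higher bit becomes 1.
  def leftOR(x: Int): Int = (0 until width).foldLeft((0, 0)) {
    case ((acc, seen), i) =>
      val s = seen | ((x >> i) & 1)
      (acc | (s << i), s)
  }._1

  // OHToUInt: position of the single set bit (undefined if the input is not one-hot).
  def ohToUInt(x: Int): Int = Integer.numberOfTrailingZeros(x)

  def freeOH(used: Int): Int = ~(leftOR(~used & mask) << 1) & ~used & mask
  def freeIdx(used: Int): Int = ohToUInt(freeOH(used))

  def main(args: Array[String]): Unit = {
    println(freeIdx(Integer.parseInt("0101", 2)))  // 1: lowest cleared bit of used
    println(freeIdx(Integer.parseInt("1010", 2)))  // 0
  }
}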

  val valid_set = Wire(init = UInt(0, width=params.queues))
val valid_clr = Wire(init = UInt(0, width=params.queues))
val used_set = Wire(init = UInt(0, width=params.entries))
val used_clr = Wire(init = UInt(0, width=params.entries))

val push_tail = tail.read(io.push.bits.index)
val push_valid = valid(io.push.bits.index)

// push.ready is high as long as some bit of used is 0; when used is all ones there is no free entry and ready goes low
io.push.ready := !used.andR()
// fire() is simply ready && valid
when (io.push.fire()) {
// valid_set is UIntToOH(io.push.bits.index), a one-hot mask of the queue being pushed; if index is 1 then valid_set is ...b10
valid_set := UIntToOH(io.push.bits.index, params.queues)
// used_set is freeOH, i.e. which entry gets consumed by this push
used_set := freeOH
// write io.push.bits.data into data at position freeIdx
data.write(freeIdx, io.push.bits.data)
when (push_valid) {
// if the queue was already valid, write freeIdx into next at position push_tail (link behind the old tail)
next.write(push_tail, freeIdx)
} .otherwise {
// if the queue was not valid yet, write freeIdx into head at position io.push.bits.index (the new entry becomes the head)
head.write(io.push.bits.index, freeIdx)
}
// either way, write freeIdx into tail at position io.push.bits.index; tail maps the push index to the newest entry index
tail.write(io.push.bits.index, freeIdx)
}

val pop_head = head.read(io.pop.bits)
val pop_valid = valid(io.pop.bits)

// With params.bypass = 0, io.data is simply data read at pop_head and io.valid is simply valid.
// head holds the mapping from queue index to entry index: pop.bits looks up head to get the entry index, which then looks up data to produce the output.
// Bypass push data to the peek port
io.data := (if (!params.bypass) data.read(pop_head) else Mux(!pop_valid, io.push.bits.data, data.read(pop_head)))
io.valid := (if (!params.bypass) valid else (valid | valid_set))

// check: popping a queue that is not valid is an error
// It is an error to pop something that is not valid
assert (!io.pop.fire() || (io.valid)(io.pop.bits))

when (io.pop.fire()) {
used_clr := UIntToOH(pop_head, params.entries)
when (pop_head === tail.read(io.pop.bits)) {
valid_clr := UIntToOH(io.pop.bits, params.queues)
}
head.write(io.pop.bits, Mux(io.push.fire() && push_valid && push_tail === pop_head, freeIdx, next.read(pop_head)))
}

// with params.bypass = 0 this always runs: used becomes (used & ~used_clr) | used_set, i.e. the updated occupancy (same for valid)
// Empty bypass changes no state
when (Bool(!params.bypass) || !io.pop.valid || pop_valid) {
used := (used & ~used_clr) | used_set
valid := (valid & ~valid_clr) | valid_set
}
}

Finally, let's see how head, tail and next tie together, with a walkthrough.

  1. Initially used is 0, so the lowest free entry is freeIdx = 0, and valid is all zeros.
  2. A push request arrives with index 1 and data (source) 40.
  3. Next cycle: data(0) is written with 40 (data.write(freeIdx, ...)). Since valid(1) is 0, push_valid is 0, so the otherwise branch runs and head(1) is written with 0. tail(1) is also written with 0, and valid(1) is set to 1, meaning queue 1 now holds one element.
  4. Another push arrives, again with index 1, data (source) 80.
  5. Next cycle: data(1) is written with 80, because freeIdx is now 1. Since valid(1) is already 1, push_valid is 1, so the next.write(push_tail, freeIdx) branch runs: next(0) is written with 1, where 0 was read out of tail(1) (arrow a in the figure) and 1 is the current freeIdx. So next records the link from the previous entry to the newly allocated one. tail(1) is then updated to 1.
  6. Another push is handled in the same way.
  7. A pop request arrives with index 1. head(1) directly gives the entry index, 0, and data(0) gives 40, so the output data is produced.
  8. head also has to be updated: starting from entry index 0 (arrow b), next(0) gives the following entry index (arrow c), which is 1, and that 1 is written into head(1), the pop index. This completes the bookkeeping.
  9. On the next pop with index 1, head(1) now points at entry 1, so data(1) = 80 is returned.

To summarize the three structures:

head is indexed by the push/pop queue index; it holds an entry index, and that entry index in turn addresses the stored data.

tail is indexed by the push/pop queue index; it holds the entry index of that queue's last element, which is how a pop can tell that the queue for that index has been drained.

next is indexed by an entry index; it holds the index of the next entry in the same queue. On every pop, the next entry's index is written back into head, ready for the following pop.

In other words, the module keeps one linked list per index to store data. All data shares the single data memory; head, tail and next encode the mapping from queue index to data index.
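
To tie it together, here is a small software model of the ListBuffer bookkeeping in plain Scala (a sketch of the push/pop behaviour only; hardware details such as the same-cycle bypass option are ignored). Pushing 40 and then 80 into queue 1 and popping twice returns 40 then 80, matching the walkthrough above:

// Software sketch of the ListBuffer: entries shared slots, queues independent FIFOs.
class ListBufferModel[T](queues: Int, entries: Int) {
  private val data = new Array[Any](entries)
  private val next = new Array[Int](entries)
  private val head = new Array[Int](queues)
  private val tail = new Array[Int](queues)
  private var used  = 0L                      // bitmap of occupied entries
  private var valid = 0L                      // bitmap of non-empty queues

  private def freeIdx: Int = (0 until entries).find(i => (used & (1L << i)) == 0).get

  // returns false when there is no free entry (push.ready low)
  def push(q: Int, d: T): Boolean = {
    if (java.lang.Long.bitCount(used) == entries) return false
    val idx = freeIdx
    data(idx) = d
    if ((valid & (1L << q)) != 0) next(tail(q)) = idx   // link behind the old tail
    else head(q) = idx                                  // first element becomes the head
    tail(q) = idx
    used |= 1L << idx
    valid |= 1L << q
    true
  }

  def pop(q: Int): T = {
    require((valid & (1L << q)) != 0, "popping an empty queue")
    val idx = head(q)
    used &= ~(1L << idx)
    if (idx == tail(q)) valid &= ~(1L << q)             // that was the last element
    else head(q) = next(idx)                            // follow the link for the next pop
    data(idx).asInstanceOf[T]
  }
}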

sinkA

sinkA does the following few things:

  • accepts requests coming in on the upstream A channel and converts them into a req for the MSHRs (the address split used for this is sketched below)
  • if the request carries data, stores that data into the putBuffer
  • serves read requests from sourceD, handing the buffered data back to sourceD
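
Before reading the code, note that params.parseAddress (used below) just slices the physical address into offset / set / tag. A minimal sketch of that split, with hypothetical geometry (64-byte blocks, 1024 sets; the real widths come from InclusiveCacheParameters):

object ParseAddressSketch {
  val blockBytes = 64
  val sets       = 1024
  val offsetBits = Integer.numberOfTrailingZeros(blockBytes)   // 6
  val setBits    = Integer.numberOfTrailingZeros(sets)         // 10

  // returns (tag, set, offset)
  def parse(address: Long): (Long, Long, Long) = {
    val offset = address & (blockBytes - 1)
    val set    = (address >> offsetBits) & (sets - 1)
    val tag    = address >> (offsetBits + setBits)
    (tag, set, offset)
  }

  def main(args: Array[String]): Unit =
    println(parse(0x80012345L))   // (32769,141,5) = (0x8001, 0x8d, 0x5)
}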

// contents of one putBuffer entry inside sinkA
class PutBufferAEntry(params: InclusiveCacheParameters) extends InclusiveCacheBundle(params)
{
val data = UInt(width = params.inner.bundle.dataBits)
val mask = UInt(width = params.inner.bundle.dataBits/8)
val corrupt = Bool()
}

class PutBufferPop(params: InclusiveCacheParameters) extends InclusiveCacheBundle(params)
{
// queue index within the putBuffer
val index = UInt(width = params.putBits)
// whether this is the last beat; computed by sourceD, so it is an input from sinkA's point of view
val last = Bool()
}

class SinkA(params: InclusiveCacheParameters) extends Module
{
val io = new Bundle {
// request forwarded to the MSHRs (output)
val req = Decoupled(new FullRequest(params))
// upstream A-channel request (input)
val a = Decoupled(new TLBundleA(params.inner.bundle)).flip
// for use by SourceD:
// read request coming from sourceD, carrying the queue index to read (input)
val pb_pop = Decoupled(new PutBufferPop(params)).flip
// output: data handed to sourceD, valid in the same cycle; sourceD will write it into the BankedStore
val pb_beat = new PutBufferAEntry(params)
}

// No restrictions on the type of buffer
val a = params.micro.innerBuf.a(io.a)

// the putBuffer is itself a ListBuffer; curiously a separate lists bitmap is also maintained outside, which largely mirrors the buffer's own valid
val putbuffer = Module(new ListBuffer(ListBufferParameters(new PutBufferAEntry(params), params.putLists, params.putBeats, false)))
val lists = RegInit(UInt(0, width = params.putLists))

val lists_set = Wire(init = UInt(0, width = params.putLists))
val lists_clr = Wire(init = UInt(0, width = params.putLists))
lists := (lists | lists_set) & ~lists_clr

val free = !lists.andR()
// find-first-zero, i.e. findFirstOne on the inverted bitmap
val freeOH = ~(leftOR(~lists) << 1) & ~lists
val freeIdx = OHToUInt(freeOH)

val first = params.inner.first(a)
val hasData = params.inner.hasData(a.bits)

// We need to split the A input to three places:
// If it is the first beat, it must go to req
// If it has Data, it must go to the putbuffer
// If it has Data AND is the first beat, it must claim a list

val req_block = first && !io.req.ready
val buf_block = hasData && !putbuffer.io.push.ready
val set_block = hasData && first && !free

params.ccover(a.valid && req_block, "SINKA_REQ_STALL", "No MSHR available to sink request")
params.ccover(a.valid && buf_block, "SINKA_BUF_STALL", "No space in putbuffer for beat")
params.ccover(a.valid && set_block, "SINKA_SET_STALL", "No space in putbuffer for request")

a.ready := !req_block && !buf_block && !set_block
io.req.valid := a.valid && first && !buf_block && !set_block
// only beats that carry data (puts) enter the putbuffer
putbuffer.io.push.valid := a.valid && hasData && !req_block && !set_block
when (a.valid && first && hasData && !req_block && !buf_block) { lists_set := freeOH }

val (tag, set, offset) = params.parseAddress(a.bits.address)
// all beats of a burst share one put list (index), so freeIdx is latched on the first beat; bursts do not interleave on A, so this is safe
val put = Mux(first, freeIdx, RegEnable(freeIdx, first))

io.req.bits.prio := Vec(UInt(1, width=3).asBools)
io.req.bits.control:= Bool(false)
io.req.bits.opcode := a.bits.opcode
io.req.bits.param := a.bits.param
io.req.bits.size := a.bits.size
io.req.bits.source := a.bits.source
io.req.bits.offset := offset
io.req.bits.set := set
io.req.bits.tag := tag
io.req.bits.put := put

putbuffer.io.push.bits.index := put
putbuffer.io.push.bits.data.data := a.bits.data
putbuffer.io.push.bits.data.mask := a.bits.mask
putbuffer.io.push.bits.data.corrupt := a.bits.corrupt

// Grant access to pop the data
putbuffer.io.pop.bits := io.pb_pop.bits.index
putbuffer.io.pop.valid := io.pb_pop.fire()
io.pb_pop.ready := putbuffer.io.valid(io.pb_pop.bits.index)
io.pb_beat := putbuffer.io.data

when (io.pb_pop.fire() && io.pb_pop.bits.last) {
lists_clr := UIntToOH(io.pb_pop.bits.index, params.putLists)
}
}

sourceB

sourceB's job is fairly simple: it only has to turn requests coming from the MSHRs into Probes on the upstream B channel.

Its only piece of state is the remain register, which records which clients the current request still has to probe: remain is set when the req arrives, and each time a Probe goes out the corresponding bit is cleared.
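
The way remain serializes a multi-client probe can be modelled in a few lines of plain Scala (an illustrative sketch only):

// Software sketch of SourceB's remain bookkeeping: one probe per accepted B beat, lowest client first.
object ProbeSerializer {
  private var remain = 0                             // clients still to be probed

  def reqReady: Boolean = remain == 0                // busy while any bit is still set

  def acceptRequest(clients: Int): Unit = {
    require(reqReady && clients != 0)
    remain = clients
  }

  // Called when the B channel accepts a beat; returns the client just probed.
  def fireProbe(): Int = {
    val next = remain & -remain                      // lowest set bit, same as ~(leftOR(todo) << 1) & todo
    remain &= ~next                                  // cleared only once the probe is accepted
    Integer.numberOfTrailingZeros(next)
  }
}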

class SourceBRequest(params: InclusiveCacheParameters) extends InclusiveCacheBundle(params)
{
val param = UInt(width = 3)
val tag = UInt(width = params.tagBits)
val set = UInt(width = params.setBits)
val clients = UInt(width = params.clientBits)
}

class SourceB(params: InclusiveCacheParameters) extends Module
{
val io = new Bundle {
val req = Decoupled(new SourceBRequest(params)).flip
val b = Decoupled(new TLBundleB(params.inner.bundle))
}

if (params.firstLevel) {
// Tie off unused ports
io.req.ready := Bool(true)
io.b.valid := Bool(false)
} else {
// records which clients still need to be probed
val remain = RegInit(UInt(0, width=params.clientBits))
val remain_set = Wire(init = UInt(0, width=params.clientBits))
val remain_clr = Wire(init = UInt(0, width=params.clientBits))
remain := (remain | remain_set) & ~remain_clr

// as long as the current probe set is not finished, no new req can be accepted
val busy = remain.orR()
val todo = Mux(busy, remain, io.req.bits.clients)
// pick the next client to probe (the lowest set bit of todo)
val next = ~(leftOR(todo) << 1) & todo

if (params.clientBits > 1) {
params.ccover(PopCount(remain) > UInt(1), "SOURCEB_MULTI_PROBE", "Had to probe more than one client")
}

assert (!io.req.valid || io.req.bits.clients =/= UInt(0))

io.req.ready := !busy
when (io.req.fire()) { remain_set := io.req.bits.clients }

// No restrictions on the type of buffer used here
val b = Wire(io.b)
io.b <> params.micro.innerBuf.b(b)

// probes are streamed out back to back
b.valid := busy || io.req.valid
// only when a probe is accepted is the bit that next points at cleared from remain, moving on to the next client
when (b.fire()) { remain_clr := next }
params.ccover(b.valid && !b.ready, "SOURCEB_STALL", "Backpressured when issuing a probe")

val tag = Mux(!busy, io.req.bits.tag, RegEnable(io.req.bits.tag, io.req.fire()))
val set = Mux(!busy, io.req.bits.set, RegEnable(io.req.bits.set, io.req.fire()))
val param = Mux(!busy, io.req.bits.param, RegEnable(io.req.bits.param, io.req.fire()))

b.bits.opcode := TLMessages.Probe
b.bits.param := param
b.bits.size := UInt(params.offsetBits)
b.bits.source := params.clientSource(next)
b.bits.address := params.expandAddress(tag, set, UInt(0))
b.bits.mask := ~UInt(0, width = params.inner.manager.beatBytes)
b.bits.data := UInt(0)
}
}

sinkC

sinkC takes different paths depending on the type of the incoming C-channel request: a ProbeAck(Data) goes through the left-hand flow in the figure, a Release(Data) through the right-hand flow.

First the ProbeAck(Data) flow on the left:

  • send a resp to the MSHRs
  • look up in the MSHRs which way this set maps to
  • if there is data, write it into the BankedStore. It does not go through the putBuffer: this request is already the oldest and already owns an MSHR, so there is nothing to queue for.

Then the Release(Data) flow on the right:

  • send a req to the scheduler
  • Release data does go into the putBuffer, because the request still has to queue
  • serve sourceD's read requests, handing the data to sourceD so it can write it into the BankedStore

class SinkC(params: InclusiveCacheParameters) extends Module
{
val io = new Bundle {
val req = Decoupled(new FullRequest(params)) // Release
val resp = Valid(new SinkCResponse(params)) // ProbeAck
val c = Decoupled(new TLBundleC(params.inner.bundle)).flip
// Find 'way' via MSHR CAM lookup
val set = UInt(width = params.setBits)
val way = UInt(width = params.wayBits).flip
// ProbeAck write-back
val bs_adr = Decoupled(new BankedStoreInnerAddress(params))
val bs_dat = new BankedStoreInnerPoison(params)
// SourceD sideband
val rel_pop = Decoupled(new PutBufferPop(params)).flip
val rel_beat = new PutBufferCEntry(params)
}

if (params.firstLevel) {
// Tie off unused ports
io.req.valid := Bool(false)
io.resp.valid := Bool(false)
io.c.ready := Bool(true)
io.set := UInt(0)
io.bs_adr.valid := Bool(false)
io.rel_pop.ready := Bool(true)
} else {
// No restrictions on the type of buffer
val c = params.micro.innerBuf.c(io.c)

val (tag, set, offset) = params.parseAddress(c.bits.address)
val (first, last, _, beat) = params.inner.count(c)
val hasData = params.inner.hasData(c.bits)
val raw_resp = c.bits.opcode === TLMessages.ProbeAck || c.bits.opcode === TLMessages.ProbeAckData
val resp = Mux(c.valid, raw_resp, RegEnable(raw_resp, c.valid))

// Handling of C is broken into two cases:
// ProbeAck
// if hasData, must be written to BankedStore
// if last beat, trigger resp
// Release
// if first beat, trigger req
// if hasData, go to putBuffer
// if hasData && first beat, must claim a list

assert (!(c.valid && c.bits.corrupt), "Data poisoning unavailable")

io.set := Mux(c.valid, set, RegEnable(set, c.valid)) // finds us the way

// the data banks may be laid out far away, so the address goes through a one-entry Queue (one extra cycle) on its way to the BankedStore
// Cut path from inner C to the BankedStore SRAM setup
// ... this makes it easier to layout the L2 data banks far away
val bs_adr = Wire(io.bs_adr)
io.bs_adr <> Queue(bs_adr, 1, pipe=true)
io.bs_dat.data := RegEnable(c.bits.data, bs_adr.fire())
bs_adr.valid := resp && (!first || (c.valid && hasData))
// noop means "hold the slot": once a burst has started, bs_adr.valid stays high even on cycles where no beat arrived, and the RAM must not actually be written then, so noop = !c.valid marks those cycles
bs_adr.bits.noop := !c.valid
bs_adr.bits.way := io.way
bs_adr.bits.set := io.set
bs_adr.bits.beat := Mux(c.valid, beat, RegEnable(beat + bs_adr.ready.asUInt, c.valid))
bs_adr.bits.mask := ~UInt(0, width = params.innerMaskBits)
params.ccover(bs_adr.valid && !bs_adr.ready, "SINKC_SRAM_STALL", "Data SRAM busy")

io.resp.valid := resp && c.valid && (first || last) && (!hasData || bs_adr.ready)
io.resp.bits.last := last
io.resp.bits.set := set
io.resp.bits.tag := tag
io.resp.bits.source := c.bits.source
io.resp.bits.param := c.bits.param
io.resp.bits.data := hasData

val putbuffer = Module(new ListBuffer(ListBufferParameters(new PutBufferCEntry(params), params.relLists, params.relBeats, false)))
val lists = RegInit(UInt(0, width = params.relLists))

val lists_set = Wire(init = UInt(0, width = params.relLists))
val lists_clr = Wire(init = UInt(0, width = params.relLists))
lists := (lists | lists_set) & ~lists_clr

val free = !lists.andR()
val freeOH = ~(leftOR(~lists) << 1) & ~lists
val freeIdx = OHToUInt(freeOH)

val req_block = first && !io.req.ready
val buf_block = hasData && !putbuffer.io.push.ready
val set_block = hasData && first && !free

params.ccover(c.valid && !raw_resp && req_block, "SINKC_REQ_STALL", "No MSHR available to sink request")
params.ccover(c.valid && !raw_resp && buf_block, "SINKC_BUF_STALL", "No space in putbuffer for beat")
params.ccover(c.valid && !raw_resp && set_block, "SINKC_SET_STALL", "No space in putbuffer for request")

c.ready := Mux(raw_resp, !hasData || bs_adr.ready, !req_block && !buf_block && !set_block)

io.req.valid := !resp && c.valid && first && !buf_block && !set_block
putbuffer.io.push.valid := !resp && c.valid && hasData && !req_block && !set_block
when (!resp && c.valid && first && hasData && !req_block && !buf_block) { lists_set := freeOH }

val put = Mux(first, freeIdx, RegEnable(freeIdx, first))

io.req.bits.prio := Vec(UInt(4, width=3).asBools)
io.req.bits.control:= Bool(false)
io.req.bits.opcode := c.bits.opcode
io.req.bits.param := c.bits.param
io.req.bits.size := c.bits.size
io.req.bits.source := c.bits.source
io.req.bits.offset := offset
io.req.bits.set := set
io.req.bits.tag := tag
io.req.bits.put := put

putbuffer.io.push.bits.index := put
putbuffer.io.push.bits.data.data := c.bits.data
putbuffer.io.push.bits.data.corrupt := c.bits.corrupt

// Grant access to pop the data
putbuffer.io.pop.bits := io.rel_pop.bits.index
putbuffer.io.pop.valid := io.rel_pop.fire()
io.rel_pop.ready := putbuffer.io.valid(io.rel_pop.bits.index)
io.rel_beat := putbuffer.io.data

when (io.rel_pop.fire() && io.rel_pop.bits.last) {
lists_clr := UIntToOH(io.rel_pop.bits.index, params.relLists)
}
}
}

sourceD

sourceD does the following:

  • accepts requests from the MSHRs
  • maintains a pipeline, 7 stages in total
  • issues read requests to the BankedStore
  • issues read requests to sinkA / sinkC (their put buffers)
  • responds on the upstream D channel
  • issues write requests to the BankedStore
  • keeps three extra stages of retired data for the bypass network (a simplified model of that bypass is sketched right after this list)
  • answers the hazard queries coming from sourceC / sinkD
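
The byte-lane bypass that the later stages implement (the chunk / chop / bypass helpers in the code) boils down to merging newer write data into SRAM read data one write-granule at a time. A plain-Scala sketch, assuming a hypothetical 32-byte beat and 8-byte write granule:

// Merge bypassed data into the SRAM readout, one writeBytes-wide chunk per select bit.
object BypassMergeSketch {
  val beatBytes  = 32                    // hypothetical inner beat width
  val writeBytes = 8                     // hypothetical write/ECC granule
  val chunks     = beatBytes / writeBytes

  // sel bit i == 1 -> take chunk i from x (newer, bypassed data), else from y (SRAM readout)
  def bypass(sel: Int, x: BigInt, y: BigInt): BigInt = {
    val chunkMask = (BigInt(1) << (writeBytes * 8)) - 1
    (0 until chunks).map { i =>
      val src = if (((sel >> i) & 1) == 1) x else y
      ((src >> (i * writeBytes * 8)) & chunkMask) << (i * writeBytes * 8)
    }.reduce(_ | _)
  }
}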

class SourceD(params: InclusiveCacheParameters) extends Module
{
val io = new Bundle {
val req = Decoupled(new SourceDRequest(params)).flip
val d = Decoupled(new TLBundleD(params.inner.bundle))
// Put data from SinkA
val pb_pop = Decoupled(new PutBufferPop(params))
val pb_beat = new PutBufferAEntry(params).flip
// Release data from SinkC
val rel_pop = Decoupled(new PutBufferPop(params))
val rel_beat = new PutBufferCEntry(params).flip
// Access to the BankedStore
val bs_radr = Decoupled(new BankedStoreInnerAddress(params))
val bs_rdat = new BankedStoreInnerDecoded(params).flip
val bs_wadr = Decoupled(new BankedStoreInnerAddress(params))
val bs_wdat = new BankedStoreInnerPoison(params)
// Is it safe to evict/replace this way?
val evict_req = new SourceDHazard(params).flip
val evict_safe = Bool()
val grant_req = new SourceDHazard(params).flip
val grant_safe = Bool()
}

val beatBytes = params.inner.manager.beatBytes
val writeBytes = params.micro.writeBytes

val s1_valid = Wire(Bool())
val s2_valid = Wire(Bool())
val s3_valid = Wire(Bool())
val s2_ready = Wire(Bool())
val s3_ready = Wire(Bool())
val s4_ready = Wire(Bool())

////////////////////////////////////// STAGE 1 //////////////////////////////////////
// Reform the request beats

val busy = RegInit(Bool(false))
// s1_block_r: the SRAM read fired last cycle but s2_ready was low, so the request is stuck in s1; since the BankedStore has already been read, this flag keeps the read valid low to avoid reading it again
val s1_block_r = RegInit(Bool(false))
// s1_counter: beat counter for the burst
val s1_counter = RegInit(UInt(0, width = params.innerBeatBits))
val s1_req_reg = RegEnable(io.req.bits, !busy && io.req.valid)
val s1_req = Mux(!busy, io.req.bits, s1_req_reg)
// s1_x_bypass: which byte lanes of s1 can be bypassed from s2/s3/s4
val s1_x_bypass = Wire(UInt(width = beatBytes/writeBytes)) // might go from high=>low during stall
// s1_latch_bypass: last cycle s1 was either empty or drained into s2, i.e. s1 holds a fresh beat now and the bypass mask has to be recomputed
val s1_latch_bypass = RegNext(!(busy || io.req.valid) || s2_ready)
// s1_bypass: if the pipeline moved, use the freshly computed mask, otherwise keep the registered one
val s1_bypass = Mux(s1_latch_bypass, s1_x_bypass, RegEnable(s1_x_bypass, s1_latch_bypass))
// s1_mask: read mask for the BankedStore; bypassed lanes do not need to be read
val s1_mask = MaskGen(s1_req.offset, s1_req.size, beatBytes, writeBytes) & ~s1_bypass
val s1_grant = (s1_req.opcode === AcquireBlock && s1_req.param === BtoT) || s1_req.opcode === AcquirePerm
// s1_need_r: whether the BankedStore needs to be read at all
val s1_need_r = s1_mask.orR && s1_req.prio(0) && s1_req.opcode =/= Hint && !s1_grant &&
(s1_req.opcode =/= PutFullData || s1_req.size < UInt(log2Ceil(writeBytes)))
// s1_valid_r: valid for the BankedStore read
val s1_valid_r = (busy || io.req.valid) && s1_need_r && !s1_block_r
// s1_need_pb: whether put-buffer data is needed, either from sinkA or from sinkC
val s1_need_pb = Mux(s1_req.prio(0), !s1_req.opcode(2), s1_req.opcode(0)) // hasData
// s1_single: a request that finishes in one beat and carries no data
val s1_single = Mux(s1_req.prio(0), s1_req.opcode === Hint || s1_grant, s1_req.opcode === Release)
// s1_retires: keep the data around for a few cycles after s3 so it can be bypassed
val s1_retires = !s1_single // retire all operations with data in s3 for bypass (saves energy)
// Alternatively: val s1_retires = s1_need_pb // retire only updates for bypass (less backpressure from WB)
// s1_beats1: index of the last beat, e.g. 3 for a 4-beat burst
val s1_beats1 = Mux(s1_single, UInt(0), UIntToOH1(s1_req.size, log2Up(params.cache.blockBytes)) >> log2Ceil(beatBytes))
// s1_beat: current beat number; the offset term allows the access to start from the middle of a block
val s1_beat = (s1_req.offset >> log2Ceil(beatBytes)) | s1_counter
val s1_last = s1_counter === s1_beats1
val s1_first = s1_counter === UInt(0)

params.ccover(s1_block_r, "SOURCED_1_SRAM_HOLD", "SRAM read-out successful, but stalled by stage 2")
params.ccover(!s1_latch_bypass, "SOURCED_1_BYPASS_HOLD", "Bypass match successful, but stalled by stage 2")
params.ccover((busy || io.req.valid) && !s1_need_r, "SOURCED_1_NO_MODIFY", "Transaction servicable without SRAM")

io.bs_radr.valid := s1_valid_r
io.bs_radr.bits.noop := Bool(false)
io.bs_radr.bits.way := s1_req.way
io.bs_radr.bits.set := s1_req.set
io.bs_radr.bits.beat := s1_beat
io.bs_radr.bits.mask := s1_mask

params.ccover(io.bs_radr.valid && !io.bs_radr.ready, "SOURCED_1_READ_STALL", "Data readout stalled")

// Make a queue to catch BS readout during stalls
val queue = Module(new Queue(io.bs_rdat, 3, flow=true))
queue.io.enq.valid := RegNext(RegNext(io.bs_radr.fire()))
queue.io.enq.bits := io.bs_rdat
assert (!queue.io.enq.valid || queue.io.enq.ready)

params.ccover(!queue.io.enq.ready, "SOURCED_1_QUEUE_FULL", "Filled SRAM skidpad queue completely")

when (io.bs_radr.fire()) { s1_block_r := Bool(true) }
// no ready check here: the scheduler only presents a valid request when SourceD is ready
when (io.req.valid) { busy := Bool(true) }
when (s1_valid && s2_ready) {
s1_counter := s1_counter + UInt(1)
s1_block_r := Bool(false)
when (s1_last) {
s1_counter := UInt(0)
busy := Bool(false)
}
}

params.ccover(s1_valid && !s2_ready, "SOURCED_1_STALL", "Stage 1 pipeline blocked")

io.req.ready := !busy
s1_valid := (busy || io.req.valid) && (!s1_valid_r || io.bs_radr.ready)

////////////////////////////////////// STAGE 2 //////////////////////////////////////
// Fetch the request data

val s2_latch = s1_valid && s2_ready
val s2_full = RegInit(Bool(false))
val s2_valid_pb = RegInit(Bool(false))
val s2_beat = RegEnable(s1_beat, s2_latch)
val s2_bypass = RegEnable(s1_bypass, s2_latch)
val s2_req = RegEnable(s1_req, s2_latch)
val s2_last = RegEnable(s1_last, s2_latch)
val s2_need_r = RegEnable(s1_need_r, s2_latch)
val s2_need_pb = RegEnable(s1_need_pb, s2_latch)
val s2_retires = RegEnable(s1_retires, s2_latch)
// s2_need_d: this beat must produce a D response; reads (!s1_need_pb) respond on every beat, while writes and single-beat requests respond only on the first beat (s1_first)
val s2_need_d = RegEnable(!s1_need_pb || s1_first, s2_latch)
val s2_pdata_raw = Wire(new PutBufferACEntry(params))
// the putBuffer is built from registers, so the popped data is available in the same cycle and s2_pdata_raw can capture it immediately
val s2_pdata = s2_pdata_raw holdUnless s2_valid_pb

s2_pdata_raw.data := Mux(s2_req.prio(0), io.pb_beat.data, io.rel_beat.data)
s2_pdata_raw.mask := Mux(s2_req.prio(0), io.pb_beat.mask, ~UInt(0, width = params.inner.manager.beatBytes))
s2_pdata_raw.corrupt := Mux(s2_req.prio(0), io.pb_beat.corrupt, io.rel_beat.corrupt)

io.pb_pop.valid := s2_valid_pb && s2_req.prio(0)
io.pb_pop.bits.index := s2_req.put
io.pb_pop.bits.last := s2_last
io.rel_pop.valid := s2_valid_pb && !s2_req.prio(0)
io.rel_pop.bits.index := s2_req.put
io.rel_pop.bits.last := s2_last

params.ccover(io.pb_pop.valid && !io.pb_pop.ready, "SOURCED_2_PUTA_STALL", "Channel A put buffer was not ready in time")
if (!params.firstLevel)
params.ccover(io.rel_pop.valid && !io.rel_pop.ready, "SOURCED_2_PUTC_STALL", "Channel C put buffer was not ready in time")

val pb_ready = Mux(s2_req.prio(0), io.pb_pop.ready, io.rel_pop.ready)
when (pb_ready) { s2_valid_pb := Bool(false) }
when (s2_valid && s3_ready) { s2_full := Bool(false) }
when (s2_latch) { s2_valid_pb := s1_need_pb }
when (s2_latch) { s2_full := Bool(true) }

params.ccover(s2_valid && !s3_ready, "SOURCED_2_STALL", "Stage 2 pipeline blocked")

s2_valid := s2_full && (!s2_valid_pb || pb_ready)
s2_ready := !s2_full || (s3_ready && (!s2_valid_pb || pb_ready))

////////////////////////////////////// STAGE 3 //////////////////////////////////////
// Send D response

val s3_latch = s2_valid && s3_ready
val s3_full = RegInit(Bool(false))
val s3_valid_d = RegInit(Bool(false))
val s3_beat = RegEnable(s2_beat, s3_latch)
val s3_bypass = RegEnable(s2_bypass, s3_latch)
val s3_req = RegEnable(s2_req, s3_latch)
val s3_adjusted_opcode = Mux(s3_req.bad, Get, s3_req.opcode) // kill update when denied
val s3_last = RegEnable(s2_last, s3_latch)
val s3_pdata = RegEnable(s2_pdata, s3_latch)
val s3_need_pb = RegEnable(s2_need_pb, s3_latch)
val s3_retires = RegEnable(s2_retires, s3_latch)
val s3_need_r = RegEnable(s2_need_r, s3_latch)
val s3_need_bs = s3_need_pb
val s3_acq = s3_req.opcode === AcquireBlock || s3_req.opcode === AcquirePerm

// Collect s3's data from either the BankedStore or bypass
// NOTE: we use the s3_bypass passed down from s1_bypass, because s2-s4 were guarded by the hazard checks and not stale
val s3_bypass_data = Wire(UInt())
// slice the data into writeBytes-wide chunks
def chunk(x: UInt): Seq[UInt] = Seq.tabulate(beatBytes/writeBytes) { i => x((i+1)*writeBytes*8-1, i*writeBytes*8) }
// slice the select mask one bit at a time
def chop (x: UInt): Seq[Bool] = Seq.tabulate(beatBytes/writeBytes) { i => x(i) }
def bypass(sel: UInt, x: UInt, y: UInt) =
(chop(sel) zip (chunk(x) zip chunk(y))) .map { case (s, (x, y)) => Mux(s, x, y) } .asUInt
// s3_rdata: stitches together the bypassed data and the partial BankedStore readout (the bypassed lanes were masked off when the read was issued).
// s1 only decides which lanes to bypass; the data itself is muxed in at s3. The s1 check looks at s2/s3/s4, and by the time the request reaches s3 those same writes sit in s4/s5/s6, so the pipeline keeps flowing.
val s3_rdata = bypass(s3_bypass, s3_bypass_data, queue.io.deq.bits.data)

// Lookup table for response codes
val grant = Mux(s3_req.param === BtoT, Grant, GrantData)
val resp_opcode = Vec(Seq(AccessAck, AccessAck, AccessAckData, AccessAckData, AccessAckData, HintAck, grant, Grant))

// No restrictions on the type of buffer used here
val d = Wire(io.d)
io.d <> params.micro.innerBuf.d(d)

// a multi-beat response asserts d.valid once per beat (one MSHR request expands into several D beats)
d.valid := s3_valid_d
d.bits.opcode := Mux(s3_req.prio(0), resp_opcode(s3_req.opcode), ReleaseAck)
d.bits.param := Mux(s3_req.prio(0) && s3_acq, Mux(s3_req.param =/= NtoB, toT, toB), UInt(0))
d.bits.size := s3_req.size
d.bits.source := s3_req.source
d.bits.sink := s3_req.sink
d.bits.denied := s3_req.bad
d.bits.data := s3_rdata
d.bits.corrupt := s3_req.bad && d.bits.opcode(0)

queue.io.deq.ready := s3_valid && s4_ready && s3_need_r
assert (!s3_full || !s3_need_r || queue.io.deq.valid)

when (d.ready) { s3_valid_d := Bool(false) }
when (s3_valid && s4_ready) { s3_full := Bool(false) }
when (s3_latch) { s3_valid_d := s2_need_d }
when (s3_latch) { s3_full := Bool(true) }

params.ccover(s3_valid && !s4_ready, "SOURCED_3_STALL", "Stage 3 pipeline blocked")

s3_valid := s3_full && (!s3_valid_d || d.ready)
s3_ready := !s3_full || (s4_ready && (!s3_valid_d || d.ready))
////////////////////////////////////// STAGE 4 //////////////////////////////////////
// Writeback updated data

val s4_latch = s3_valid && s3_retires && s4_ready
val s4_full = RegInit(Bool(false))
val s4_beat = RegEnable(s3_beat, s4_latch)
val s4_need_r = RegEnable(s3_need_r, s4_latch)
val s4_need_bs = RegEnable(s3_need_bs, s4_latch)
val s4_need_pb = RegEnable(s3_need_pb, s4_latch)
val s4_req = RegEnable(s3_req, s4_latch)
val s4_adjusted_opcode = RegEnable(s3_adjusted_opcode, s4_latch)
val s4_pdata = RegEnable(s3_pdata, s4_latch)
val s4_rdata = RegEnable(s3_rdata, s4_latch)

val atomics = Module(new Atomics(params.inner.bundle))
atomics.io.write := s4_req.prio(2)
atomics.io.a.opcode := s4_adjusted_opcode
atomics.io.a.param := s4_req.param
atomics.io.a.size := UInt(0)
atomics.io.a.source := UInt(0)
atomics.io.a.address := UInt(0)
atomics.io.a.mask := s4_pdata.mask
atomics.io.a.data := s4_pdata.data
atomics.io.data_in := s4_rdata

io.bs_wadr.valid := s4_full && s4_need_bs
io.bs_wadr.bits.noop := Bool(false)
io.bs_wadr.bits.way := s4_req.way
io.bs_wadr.bits.set := s4_req.set
io.bs_wadr.bits.beat := s4_beat
io.bs_wadr.bits.mask := Cat(s4_pdata.mask.asBools.grouped(writeBytes).map(_.reduce(_||_)).toList.reverse)
io.bs_wdat.data := atomics.io.data_out
assert (!(s4_full && s4_need_pb && s4_pdata.corrupt), "Data poisoning unsupported")

params.ccover(io.bs_wadr.valid && !io.bs_wadr.ready, "SOURCED_4_WRITEBACK_STALL", "Data writeback stalled")
params.ccover(s4_req.prio(0) && s4_req.opcode === ArithmeticData && s4_req.param === MIN, "SOURCED_4_ATOMIC_MIN", "Evaluated a signed minimum atomic")
params.ccover(s4_req.prio(0) && s4_req.opcode === ArithmeticData && s4_req.param === MAX, "SOURCED_4_ATOMIC_MAX", "Evaluated a signed maximum atomic")
params.ccover(s4_req.prio(0) && s4_req.opcode === ArithmeticData && s4_req.param === MINU, "SOURCED_4_ATOMIC_MINU", "Evaluated an unsigned minimum atomic")
params.ccover(s4_req.prio(0) && s4_req.opcode === ArithmeticData && s4_req.param === MAXU, "SOURCED_4_ATOMIC_MAXU", "Evaluated an unsigned minimum atomic")
params.ccover(s4_req.prio(0) && s4_req.opcode === ArithmeticData && s4_req.param === ADD, "SOURCED_4_ATOMIC_ADD", "Evaluated an addition atomic")
params.ccover(s4_req.prio(0) && s4_req.opcode === LogicalData && s4_req.param === XOR, "SOURCED_4_ATOMIC_XOR", "Evaluated a bitwise XOR atomic")
params.ccover(s4_req.prio(0) && s4_req.opcode === LogicalData && s4_req.param === OR, "SOURCED_4_ATOMIC_OR", "Evaluated a bitwise OR atomic")
params.ccover(s4_req.prio(0) && s4_req.opcode === LogicalData && s4_req.param === AND, "SOURCED_4_ATOMIC_AND", "Evaluated a bitwise AND atomic")
params.ccover(s4_req.prio(0) && s4_req.opcode === LogicalData && s4_req.param === SWAP, "SOURCED_4_ATOMIC_SWAP", "Evaluated a bitwise SWAP atomic")

when (io.bs_wadr.ready || !s4_need_bs) { s4_full := Bool(false) }
when (s4_latch) { s4_full := Bool(true) }

s4_ready := !s3_retires || !s4_full || io.bs_wadr.ready || !s4_need_bs

////////////////////////////////////// RETIRED //////////////////////////////////////

// Record for bypass the last three retired writebacks
// We need 3 slots to collect what was in s2, s3, s4 when the request was in s1
// ... you can't rely on s4 being full if bubbles got introduced between s1 and s2
val retire = s4_full && (io.bs_wadr.ready || !s4_need_bs)

val s5_req = RegEnable(s4_req, retire)
val s5_beat = RegEnable(s4_beat, retire)
val s5_dat = RegEnable(atomics.io.data_out, retire)

val s6_req = RegEnable(s5_req, retire)
val s6_beat = RegEnable(s5_beat, retire)
val s6_dat = RegEnable(s5_dat, retire)

val s7_dat = RegEnable(s6_dat, retire)

////////////////////////////////////// BYPASSS //////////////////////////////////////

// Manually retime this circuit to pull a register stage forward
val pre_s3_req = Mux(s3_latch, s2_req, s3_req)
val pre_s4_req = Mux(s4_latch, s3_req, s4_req)
val pre_s5_req = Mux(retire, s4_req, s5_req)
val pre_s6_req = Mux(retire, s5_req, s6_req)
val pre_s3_beat = Mux(s3_latch, s2_beat, s3_beat)
val pre_s4_beat = Mux(s4_latch, s3_beat, s4_beat)
val pre_s5_beat = Mux(retire, s4_beat, s5_beat)
val pre_s6_beat = Mux(retire, s5_beat, s6_beat)
val pre_s5_dat = Mux(retire, atomics.io.data_out, s5_dat)
val pre_s6_dat = Mux(retire, s5_dat, s6_dat)
val pre_s7_dat = Mux(retire, s6_dat, s7_dat)
val pre_s4_full = s4_latch || (!(io.bs_wadr.ready || !s4_need_bs) && s4_full)

val pre_s3_4_match = pre_s4_req.set === pre_s3_req.set && pre_s4_req.way === pre_s3_req.way && pre_s4_beat === pre_s3_beat && pre_s4_full
val pre_s3_5_match = pre_s5_req.set === pre_s3_req.set && pre_s5_req.way === pre_s3_req.way && pre_s5_beat === pre_s3_beat
val pre_s3_6_match = pre_s6_req.set === pre_s3_req.set && pre_s6_req.way === pre_s3_req.way && pre_s6_beat === pre_s3_beat

val pre_s3_4_bypass = Mux(pre_s3_4_match, MaskGen(pre_s4_req.offset, pre_s4_req.size, beatBytes, writeBytes), UInt(0))
val pre_s3_5_bypass = Mux(pre_s3_5_match, MaskGen(pre_s5_req.offset, pre_s5_req.size, beatBytes, writeBytes), UInt(0))
val pre_s3_6_bypass = Mux(pre_s3_6_match, MaskGen(pre_s6_req.offset, pre_s6_req.size, beatBytes, writeBytes), UInt(0))

s3_bypass_data :=
bypass(RegNext(pre_s3_4_bypass), atomics.io.data_out, RegNext(
bypass(pre_s3_5_bypass, pre_s5_dat,
bypass(pre_s3_6_bypass, pre_s6_dat,
pre_s7_dat))))

// Detect which parts of s1 will be bypassed from later pipeline stages (s1-s4)
// Note: we also bypass from reads ahead in the pipeline to save power
val s1_2_match = s2_req.set === s1_req.set && s2_req.way === s1_req.way && s2_beat === s1_beat && s2_full && s2_retires
val s1_3_match = s3_req.set === s1_req.set && s3_req.way === s1_req.way && s3_beat === s1_beat && s3_full && s3_retires
val s1_4_match = s4_req.set === s1_req.set && s4_req.way === s1_req.way && s4_beat === s1_beat && s4_full

for (i <- 0 until 8) {
val cover = UInt(i)
val s2 = s1_2_match === cover(0)
val s3 = s1_3_match === cover(1)
val s4 = s1_4_match === cover(2)
params.ccover(io.req.valid && s2 && s3 && s4, "SOURCED_BYPASS_CASE_" + i, "Bypass data from all subsets of pipeline stages")
}

val s1_2_bypass = Mux(s1_2_match, MaskGen(s2_req.offset, s2_req.size, beatBytes, writeBytes), UInt(0))
val s1_3_bypass = Mux(s1_3_match, MaskGen(s3_req.offset, s3_req.size, beatBytes, writeBytes), UInt(0))
val s1_4_bypass = Mux(s1_4_match, MaskGen(s4_req.offset, s4_req.size, beatBytes, writeBytes), UInt(0))

s1_x_bypass := s1_2_bypass | s1_3_bypass | s1_4_bypass

////////////////////////////////////// HAZARDS //////////////////////////////////////

// SinkC, SourceC, and SinkD can never interfer with each other because their operation
// is fully contained with an execution plan of an MSHR. That MSHR owns the entire set, so
// there is no way for a data race.

// However, SourceD is special. We allow it to run ahead after the MSHR and scheduler have
// released control of a set+way. This is necessary to allow single cycle occupancy for
// hits. Thus, we need to be careful about data hazards between SourceD and the other ports
// of the BankedStore. We can at least compare to registers 's1_req_reg', because the first
// cycle of SourceD falls within the occupancy of the MSHR's plan.

// Must ReleaseData=> be interlocked? RaW hazard
io.evict_safe :=
(!busy || io.evict_req.way =/= s1_req_reg.way || io.evict_req.set =/= s1_req_reg.set) &&
(!s2_full || io.evict_req.way =/= s2_req.way || io.evict_req.set =/= s2_req.set) &&
(!s3_full || io.evict_req.way =/= s3_req.way || io.evict_req.set =/= s3_req.set) &&
(!s4_full || io.evict_req.way =/= s4_req.way || io.evict_req.set =/= s4_req.set)

// Must =>GrantData be interlocked? WaR hazard
io.grant_safe :=
(!busy || io.grant_req.way =/= s1_req_reg.way || io.grant_req.set =/= s1_req_reg.set) &&
(!s2_full || io.grant_req.way =/= s2_req.way || io.grant_req.set =/= s2_req.set) &&
(!s3_full || io.grant_req.way =/= s3_req.way || io.grant_req.set =/= s3_req.set) &&
(!s4_full || io.grant_req.way =/= s4_req.way || io.grant_req.set =/= s4_req.set)

// SourceD cannot overlap with SinkC b/c the only way inner caches could become
// dirty such that they want to put data in via SinkC is if we Granted them permissions,
// which must flow through the SourecD pipeline.
}

sinkE

It is just message forwarding; nothing else happens here.

class SinkEResponse(params: InclusiveCacheParameters) extends InclusiveCacheBundle(params)
{
val sink = UInt(width = params.inner.bundle.sinkBits)
}

class SinkE(params: InclusiveCacheParameters) extends Module
{
val io = new Bundle {
val resp = Valid(new SinkEResponse(params))
val e = Decoupled(new TLBundleE(params.inner.bundle)).flip
}

if (params.firstLevel) {
// Tie off unused ports
io.resp.valid := Bool(false)
io.e.ready := Bool(true)
} else {
// No restrictions on buffer
val e = params.micro.innerBuf.e(io.e)

e.ready := Bool(true)
io.resp.valid := e.valid
io.resp.bits.sink := e.bits.sink
}
}

sinkX

Also just message forwarding: the received request is adapted onto the A-channel request format, and control is set to mark it as coming from the X (control) channel.

Why not do the conversion one level earlier? The request may come from far away, so converting it here can save wires.

class SinkXRequest(params: InclusiveCacheParameters) extends InclusiveCacheBundle(params)
{
val address = UInt(width = params.inner.bundle.addressBits)
}

class SinkX(params: InclusiveCacheParameters) extends Module
{
val io = new Bundle {
val req = Decoupled(new FullRequest(params))
val x = Decoupled(new SinkXRequest(params)).flip
}

val x = Queue(io.x, 1)
val (tag, set, offset) = params.parseAddress(x.bits.address)

x.ready := io.req.ready
io.req.valid := x.valid
params.ccover(x.valid && !x.ready, "SINKX_STALL", "Backpressure when accepting a control message")

io.req.bits.prio := Vec(UInt(1, width=3).asBools) // same prio as A
io.req.bits.control:= Bool(true)
io.req.bits.opcode := UInt(0)
io.req.bits.param := UInt(0)
io.req.bits.size := UInt(params.offsetBits)
// The source does not matter, because a flush command never allocates a way.
// However, it must be a legal source, otherwise assertions might spuriously fire.
io.req.bits.source := UInt(params.inner.client.clients.map(_.sourceId.start).min)
io.req.bits.offset := UInt(0)
io.req.bits.set := set
io.req.bits.tag := tag
}

sourceA

It just converts the message type.

class SourceARequest(params: InclusiveCacheParameters) extends InclusiveCacheBundle(params)
{
val tag = UInt(width = params.tagBits)
val set = UInt(width = params.setBits)
val param = UInt(width = 3)
val source = UInt(width = params.outer.bundle.sourceBits)
// selects AcquireBlock vs AcquirePerm
val block = Bool()
}

class SourceA(params: InclusiveCacheParameters) extends Module
{
val io = new Bundle {
val req = Decoupled(new SourceARequest(params)).flip
val a = Decoupled(new TLBundleA(params.outer.bundle))
}

// ready must be a register, because we derive valid from ready
require (!params.micro.outerBuf.a.pipe && params.micro.outerBuf.a.isDefined)

val a = Wire(io.a)
io.a <> params.micro.outerBuf.a(a)

io.req.ready := a.ready
a.valid := io.req.valid
params.ccover(a.valid && !a.ready, "SOURCEA_STALL", "Backpressured when issuing an Acquire")

a.bits.opcode := Mux(io.req.bits.block, TLMessages.AcquireBlock, TLMessages.AcquirePerm)
a.bits.param := io.req.bits.param
a.bits.size := UInt(params.offsetBits)
a.bits.source := io.req.bits.source
a.bits.address := params.expandAddress(io.req.bits.tag, io.req.bits.set, UInt(0))
a.bits.mask := ~UInt(0, width = params.outer.manager.beatBytes)
a.bits.data := UInt(0)
}

sourceC

sourceC does the following:

  • accepts requests coming from the MSHRs
  • queries sourceD for hazards; if there is one it waits, otherwise it proceeds
  • reads the data from the BankedStore in s1
  • gets the data back in s3 and pushes it into a queue
  • the queue then drives the downstream C channel (its sizing is worked out just below)
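
The sizing of that queue follows directly from the read latency; a worked example with hypothetical widths (64-byte blocks, 32-byte outer beats, flow = false; the real values come from the cache and bus parameters):

object SourceCQueueDepth extends App {
  val blockBytes     = 64
  val outerBeatBytes = 32
  val beats    = blockBytes / outerBeatBytes            // 2 beats per eviction
  val inFlight = 3                                       // beats already committed to the SRAM read pipeline when ready drops
  val flow     = false
  val depth    = beats + inFlight + (if (flow) 0 else 1)
  println(s"eviction queue depth = $depth")              // 6: once accepted, a full burst always fits
}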

class SourceC(params: InclusiveCacheParameters) extends Module
{
val io = new Bundle {
val req = Decoupled(new SourceCRequest(params)).flip
val c = Decoupled(new TLBundleC(params.outer.bundle))
// BankedStore port
val bs_adr = Decoupled(new BankedStoreOuterAddress(params))
val bs_dat = new BankedStoreOuterDecoded(params).flip
// RaW hazard
val evict_req = new SourceDHazard(params)
// before a Release that has finished in the MSHR is sent downstream by sourceC, sourceD is checked for in-flight requests to the same set/way: an older request there may still write the BankedStore, in which case the Release data would be stale, so the Release must wait until that sourceD request is done
val evict_safe = Bool().flip
}

// We ignore the depth and pipe is useless here (we have to provision for worst-case=stall)
require (!params.micro.outerBuf.c.pipe)

val beatBytes = params.outer.manager.beatBytes
val beats = params.cache.blockBytes / beatBytes
// flow: when the queue is empty, an enqueued element is visible at the output in the same cycle; pipe: when the queue is full, an enqueue is still accepted if a dequeue happens in the same cycle
val flow = params.micro.outerBuf.c.flow
// why +3: the handshake uses the downstream ready, but when that ready drops there are still 3 more cycles of data on their way out of the cache, so 3 extra slots are needed to catch them
val queue = Module(new Queue(io.c.bits, beats + 3 + (if (flow) 0 else 1), flow = flow))

// queue.io.count is far too slow
val fillBits = log2Up(beats + 4)
val fill = RegInit(UInt(0, width = fillBits))
val room = RegInit(Bool(true))
// whenever exactly one of enqueue / dequeue fires
when (queue.io.enq.fire() =/= queue.io.deq.fire()) {
// fill: a locally maintained occupancy counter for the queue
fill := fill + Mux(queue.io.enq.fire(), UInt(1), ~UInt(0, width = fillBits))
// room: the queue occupancy is <= 1. Why <= 1? Only then can a whole burst still fit, so only then can a new request be accepted; think of it as "there is enough space left" (remember the data arrives as a burst)
room := fill === UInt(0) || ((fill === UInt(1) || fill === UInt(2)) && !queue.io.enq.fire())
}
assert (room === queue.io.count <= UInt(1))

val busy = RegInit(Bool(false))
val beat = RegInit(UInt(0, width = params.outerBeatBits))
val last = beat.andR
val req = Mux(!busy, io.req.bits, RegEnable(io.req.bits, !busy && io.req.valid))
// want_data only when there is a request, there is room (a whole burst of data fits), and the line is dirty (it must be evicted with data)
val want_data = busy || (io.req.valid && room && io.req.bits.dirty)

io.req.ready := !busy && room

io.evict_req.set := req.set
io.evict_req.way := req.way

// evict_safe only has to be checked on the first beat
io.bs_adr.valid := (beat.orR || io.evict_safe) && want_data
io.bs_adr.bits.noop := Bool(false)
io.bs_adr.bits.way := req.way
io.bs_adr.bits.set := req.set
io.bs_adr.bits.beat := beat
io.bs_adr.bits.mask := ~UInt(0, width = params.outerMaskBits)

params.ccover(io.req.valid && io.req.bits.dirty && room && !io.evict_safe, "SOURCEC_HAZARD", "Prevented Eviction data hazard with backpressure")
params.ccover(io.bs_adr.valid && !io.bs_adr.ready, "SOURCEC_SRAM_STALL", "Data SRAM busy")

when (io.req.valid && room && io.req.bits.dirty) { busy := Bool(true) }
when (io.bs_adr.fire()) {
when (last) { busy := Bool(false) }
beat := beat + UInt(1)
}

// why s2/s3: the BankedStore read data comes back with extra latency, and these stages simply track those cycles
val s2_latch = Mux(want_data, io.bs_adr.fire(), io.req.fire())
val s2_valid = RegNext(s2_latch)
val s2_req = RegEnable(req, s2_latch)
val s2_beat = RegEnable(beat, s2_latch)
val s2_last = RegEnable(last, s2_latch)

val s3_latch = s2_valid
val s3_valid = RegNext(s3_latch)
val s3_req = RegEnable(s2_req, s3_latch)
val s3_beat = RegEnable(s2_beat, s3_latch)
val s3_last = RegEnable(s2_last, s3_latch)

val c = Wire(io.c)
c.valid := s3_valid
c.bits.opcode := s3_req.opcode
c.bits.param := s3_req.param
c.bits.size := UInt(params.offsetBits)
c.bits.source := s3_req.source
c.bits.address := params.expandAddress(s3_req.tag, s3_req.set, UInt(0))
c.bits.data := io.bs_dat.data
c.bits.corrupt := Bool(false)

// We never accept at the front-end unless we're sure things will fit
assert(!c.valid || c.ready)
params.ccover(!c.ready, "SOURCEC_QUEUE_FULL", "Eviction queue fully utilized")

queue.io.enq <> c
io.c <> queue.io.deq
}

sinkD

Similar to the ProbeAck path in sinkC: the data is written straight into the BankedStore.

The hazard: if a request with the same set/way is still flowing through sourceD's pipeline, the write into the BankedStore is held off for a while; there is no buffer here, so the Grant simply stalls at the port. grant_safe controls this.

class SinkD(params: InclusiveCacheParameters) extends Module
{
val io = new Bundle {
val resp = Valid(new SinkDResponse(params)) // Grant or ReleaseAck
val d = Decoupled(new TLBundleD(params.outer.bundle)).flip
// Lookup the set+way from MSHRs
val source = UInt(width = params.outer.bundle.sourceBits)
val way = UInt(width = params.wayBits).flip
val set = UInt(width = params.setBits).flip
// Banked Store port
val bs_adr = Decoupled(new BankedStoreOuterAddress(params))
val bs_dat = new BankedStoreOuterPoison(params)
// WaR hazard
val grant_req = new SourceDHazard(params)
val grant_safe = Bool().flip
}

// No restrictions on buffer
val d = params.micro.outerBuf.d(io.d)

val (first, last, _, beat) = params.outer.count(d)
val hasData = params.outer.hasData(d.bits)

io.source := Mux(d.valid, d.bits.source, RegEnable(d.bits.source, d.valid))
io.grant_req.way := io.way
io.grant_req.set := io.set

// even when there is no data, a request is still sent to the BankedStore, with noop preventing an actual RAM write, so that data ordering is preserved
// Also send Grant(NoData) to BS to ensure correct data ordering
io.resp.valid := (first || last) && d.fire()
d.ready := io.bs_adr.ready && (!first || io.grant_safe)
io.bs_adr.valid := !first || (d.valid && io.grant_safe)
params.ccover(d.valid && first && !io.grant_safe, "SINKD_HAZARD", "Prevented Grant data hazard with backpressure")
params.ccover(io.bs_adr.valid && !io.bs_adr.ready, "SINKD_SRAM_STALL", "Data SRAM busy")

io.resp.bits.last := last
io.resp.bits.opcode := d.bits.opcode
io.resp.bits.param := d.bits.param
io.resp.bits.source := d.bits.source
io.resp.bits.sink := d.bits.sink
io.resp.bits.denied := d.bits.denied

io.bs_adr.bits.noop := !d.valid || !hasData
io.bs_adr.bits.way := io.way
io.bs_adr.bits.set := io.set
io.bs_adr.bits.beat := Mux(d.valid, beat, RegEnable(beat + io.bs_adr.ready.asUInt, d.valid))
io.bs_adr.bits.mask := ~UInt(0, width = params.outerMaskBits)
io.bs_dat.data := d.bits.data

assert (!(d.valid && d.bits.corrupt && !d.bits.denied), "Data poisoning unsupported")
}

sourceE

Nothing here; it just forwards the message.

class SourceERequest(params: InclusiveCacheParameters) extends InclusiveCacheBundle(params)
{
val sink = UInt(width = params.outer.bundle.sinkBits)
}

class SourceE(params: InclusiveCacheParameters) extends Module
{
val io = new Bundle {
val req = Decoupled(new SourceERequest(params)).flip
val e = Decoupled(new TLBundleE(params.outer.bundle))
}

// ready must be a register, because we derive valid from ready
require (!params.micro.outerBuf.e.pipe && params.micro.outerBuf.e.isDefined)

val e = Wire(io.e)
io.e <> params.micro.outerBuf.e(e)

io.req.ready := e.ready
e.valid := io.req.valid

e.bits.sink := io.req.bits.sink

// we can't cover valid+!ready, because no backpressure on E is common
}

sourceX

Nothing here; it just sends the response on.

// The control port response source
class SourceXRequest(params: InclusiveCacheParameters) extends InclusiveCacheBundle(params)
{
val fail = Bool()
}

class SourceX(params: InclusiveCacheParameters) extends Module
{
val io = new Bundle {
val req = Decoupled(new SourceXRequest(params)).flip
val x = Decoupled(new SourceXRequest(params))
}

val x = Wire(io.x) // ready must not depend on valid
io.x <> Queue(x, 1)

io.req.ready := x.ready
x.valid := io.req.valid
params.ccover(x.valid && !x.ready, "SOURCEX_STALL", "Backpressure when sending a control message")

x.bits := io.req.bits
}

bankedStore

The BankedStore holds the data and is internally divided into several sub-banks. Its main jobs are:

  • storing the cache data
  • accepting read/write requests from sinkC / sourceD / sourceC / sinkD
  • arbitrating those requests with the priority sinkC > sourceC > sinkD > sourceDw > sourceDr
  • handling data-width conversion: the inner and outer data widths can both vary, and the number of banks follows from the width parameters

Two key points about this module:

  • Why is the priority set up this way? (left open here; the constraints are listed in the comments inside the code below)
  • How the banks are organized: a cache line is laid out horizontally across different banks, so a burst first accesses bank group 0, then bank group 1, which lets the next request be pipelined behind it (a small model of the addressing follows below).
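
The bank addressing can be made concrete with a small model (a plain-Scala sketch with hypothetical parameters: portFactor = 4, equal 32-byte inner and outer beats, 8-byte write granule, 512 KiB, 8 ways; the real values come from InclusiveCacheParameters and the micro parameters). It follows the code below, where Cat(way, set, beat) is split into a bank-group select and an SRAM row index:

object BankMapSketch extends App {
  def log2(x: Int) = Integer.numberOfTrailingZeros(x)

  val beatBytes  = 32                                   // hypothetical: inner beat == outer beat
  val writeBytes = 8                                    // per-bank (ECC) granule
  val portFactor = 4
  val cacheBytes = 512 * 1024
  val blockBytes = 64
  val ways       = 8

  val rowBytes   = portFactor * beatBytes               // 128 B across all banks in one row
  val numBanks   = rowBytes / writeBytes                // 16 banks
  val rowEntries = cacheBytes / rowBytes                // 4096 rows per bank
  val ports      = beatBytes / writeBytes               // 4 banks touched by one beat
  val groups     = numBanks / ports                     // 4 bank groups
  val beatBits   = log2(blockBytes / beatBytes)         // 1
  val setBits    = log2(cacheBytes / blockBytes / ways) // 10
  val groupBits  = log2(groups)                         // 2

  // a = Cat(way, set, beat): the low bits pick the bank group, the rest is the SRAM row.
  def map(way: Int, set: Int, beat: Int): (Int, Int) = {
    val a = (way << (setBits + beatBits)) | (set << beatBits) | beat
    (a & (groups - 1), a >> groupBits)                  // (bank group, row index)
  }

  // Consecutive beats of one burst land in different bank groups but the same row,
  // so a following request can be pipelined behind them.
  println(map(way = 1, set = 5, beat = 0))              // (2, 514)
  println(map(way = 1, set = 5, beat = 1))              // (3, 514)
}
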
abstract class BankedStoreAddress(val inner: Boolean, params: InclusiveCacheParameters) extends InclusiveCacheBundle(params)
{
// noop exists for bursts: the first beat reserves the banks of the following beats, blocking younger accesses; since accesses walk the banks in a pipeline, blocking the next beat's banks also blocks everything behind it
val noop = Bool() // do not actually use the SRAMs, just block their use
val way = UInt(width = params.wayBits)
val set = UInt(width = params.setBits)
// beat: which beat within the burst this is
// innerBytes: beat width in bytes on the L1 -> L2 side
// outerBytes: beat width in bytes on the L2 -> memory side; both names are from the L2 cache's point of view
val beat = UInt(width = if (inner) params.innerBeatBits else params.outerBeatBits)
val mask = UInt(width = if (inner) params.innerMaskBits else params.outerMaskBits)
}

trait BankedStoreRW
{
val write = Bool()
}

class BankedStoreOuterAddress(params: InclusiveCacheParameters) extends BankedStoreAddress(false, params)
class BankedStoreInnerAddress(params: InclusiveCacheParameters) extends BankedStoreAddress(true, params)
class BankedStoreInnerAddressRW(params: InclusiveCacheParameters) extends BankedStoreInnerAddress(params) with BankedStoreRW

abstract class BankedStoreData(val inner: Boolean, params: InclusiveCacheParameters) extends InclusiveCacheBundle(params)
{
val data = UInt(width = (if (inner) params.inner.manager.beatBytes else params.outer.manager.beatBytes)*8)
}

class BankedStoreOuterData(params: InclusiveCacheParameters) extends BankedStoreData(false, params)
class BankedStoreInnerData(params: InclusiveCacheParameters) extends BankedStoreData(true, params)
class BankedStoreInnerPoison(params: InclusiveCacheParameters) extends BankedStoreInnerData(params)
class BankedStoreOuterPoison(params: InclusiveCacheParameters) extends BankedStoreOuterData(params)
class BankedStoreInnerDecoded(params: InclusiveCacheParameters) extends BankedStoreInnerData(params)
class BankedStoreOuterDecoded(params: InclusiveCacheParameters) extends BankedStoreOuterData(params)

class BankedStore(params: InclusiveCacheParameters) extends Module
{
val io = new Bundle {
val sinkC_adr = Decoupled(new BankedStoreInnerAddress(params)).flip
val sinkC_dat = new BankedStoreInnerPoison(params).flip
val sinkD_adr = Decoupled(new BankedStoreOuterAddress(params)).flip
val sinkD_dat = new BankedStoreOuterPoison(params).flip
val sourceC_adr = Decoupled(new BankedStoreOuterAddress(params)).flip
val sourceC_dat = new BankedStoreOuterDecoded(params)
val sourceD_radr = Decoupled(new BankedStoreInnerAddress(params)).flip
val sourceD_rdat = new BankedStoreInnerDecoded(params)
val sourceD_wadr = Decoupled(new BankedStoreInnerAddress(params)).flip
val sourceD_wdat = new BankedStoreInnerPoison(params).flip
}
// innerBytes: beat width in bytes on the L1 -> L2 side
val innerBytes = params.inner.manager.beatBytes
// outerBytes: beat width in bytes on the L2 -> memory side; both names are from the L2 cache's point of view
val outerBytes = params.outer.manager.beatBytes
// rowBytes: the width of the data array, i.e. all banks added together
// portFactor: how many accesses can proceed in parallel; it determines the number of banks (what exactly it should be set from is left unclear here)
val rowBytes = params.micro.portFactor * max(innerBytes, outerBytes)
require (rowBytes < params.cache.sizeBytes)
// rowEntries: the depth of the data array
val rowEntries = params.cache.sizeBytes / rowBytes
// rowBits: index width for that depth
val rowBits = log2Ceil(rowEntries)
// writeBytes: granule of each small bank, which is also the ECC granule
// numBanks: number of banks
val numBanks = rowBytes / params.micro.writeBytes
val codeBits = 8*params.micro.writeBytes

val cc_banks = Seq.tabulate(numBanks) {
i =>
DescribedSRAM(
name = s"cc_banks_$i",
desc = "Banked Store",
size = rowEntries,
data = UInt(width = codeBits)
)
}
// These constraints apply on the port priorities:
// sourceC > sinkD outgoing Release > incoming Grant (we start eviction+refill concurrently)
// sinkC > sourceC incoming ProbeAck > outgoing ProbeAck (we delay probeack writeback by 1 cycle for QoR)
// sinkC > sourceDr incoming ProbeAck > SourceD read (we delay probeack writeback by 1 cycle for QoR)
// sourceDw > sourceDr modified data visible on next cycle (needed to ensure SourceD forward progress)
// sinkC > sourceC inner ProbeAck > outer ProbeAck (make wormhole routing possible [not yet implemented])
// sinkC&D > sourceD* beat arrival > beat read|update (make wormhole routing possible [not yet implemented])

// Combining these restrictions yields a priority scheme of:
// sinkC > sourceC > sinkD > sourceDw > sourceDr
// ^^^^^^^^^^^^^^^ outer interface

// Suppose three requests A > B > C: A wants - - A - (bank 2), B wants - - B B (banks 2/3), C wants - - - C (bank 3). Because the inner and outer widths may differ, the access granularities differ too. A and C alone would not conflict, but C is not allowed to cut in front of B, so the order has to be - - A -, then - - B B, then - - - C.
// If the pattern were - - A -, B B - -, - - - C instead, then B B A C could all go at once.

// Requests have different port widths, but we don't want to allow cutting in line.
// Suppose we have requests A > B > C requesting ports --A-, --BB, ---C.
// The correct arbitration is to allow --A- only, not --AC.
// Obviously --A-, BB--, ---C should still be resolved to BBAC.

class Request extends Bundle {
val wen = Bool()
val index = UInt(width = rowBits)
// bankSel: the banks this request wants to occupy
val bankSel = UInt(width = numBanks)
// bankSum: the banks already claimed by higher-priority requests
val bankSum = UInt(width = numBanks) // OR of all higher priority bankSels
// bankEn: the read/write enables actually seen by the SRAMs
val bankEn = UInt(width = numBanks) // ports actually activated by request
val data = Vec(numBanks, UInt(width = codeBits))
}

def req[T <: BankedStoreAddress](b: DecoupledIO[T], write: Bool, d: UInt): Request = {
val beatBytes = if (b.bits.inner) innerBytes else outerBytes
val ports = beatBytes / params.micro.writeBytes
val bankBits = log2Ceil(numBanks / ports)
val words = Seq.tabulate(ports) { i =>
val data = d((i + 1) * 8 * params.micro.writeBytes - 1, i * 8 * params.micro.writeBytes)
data
}
val a = Cat(b.bits.way, b.bits.set, b.bits.beat)
val m = b.bits.mask
val out = Wire(new Request)

val select = UIntToOH(a(bankBits-1, 0), numBanks/ports)
// ready: per bank-group feedback telling the requester whether it is accepted
val ready = Cat(Seq.tabulate(numBanks/ports) { i => !(out.bankSum((i+1)*ports-1, i*ports) & m).orR } .reverse)
b.ready := ready(a(bankBits-1, 0))

out.wen := write
out.index := a >> bankBits
// Fill: replicate the whole vector, e.g. Fill(2, b1010) = b10101010
// FillInterleaved: replicate each bit, e.g. FillInterleaved(2, b1010) = b11001100
out.bankSel := Mux(b.valid, FillInterleaved(ports, select) & Fill(numBanks/ports, m), UInt(0))
out.bankEn := Mux(b.bits.noop, UInt(0), out.bankSel & FillInterleaved(ports, ready))
out.data := Vec(Seq.fill(numBanks/ports) { words }.flatten)

out
}

val innerData = UInt(0, width = innerBytes*8)
val outerData = UInt(0, width = outerBytes*8)
val W = Bool(true)
val R = Bool(false)

val sinkC_req = req(io.sinkC_adr, W, io.sinkC_dat.data)
val sinkD_req = req(io.sinkD_adr, W, io.sinkD_dat.data)
val sourceC_req = req(io.sourceC_adr, R, outerData)
val sourceD_rreq = req(io.sourceD_radr, R, innerData)
val sourceD_wreq = req(io.sourceD_wadr, W, io.sourceD_wdat.data)

// See the comments above for why this prioritization is used
val reqs = Seq(sinkC_req, sourceC_req, sinkD_req, sourceD_wreq, sourceD_rreq)

// foldLeft那一行:sum初始化为0,先把sum赋给第一个req.bankSum(所以最高优先级的请求不会被任何人挡住),再把该req的bankSel或进sum传给下一个req.bankSum;这样前面请求的bankSel就会抬高后面请求的bankSum,从而挡住后面的请求。
// Connect priorities; note that even if a request does not go through due to failing
// to obtain a needed subbank, it still blocks overlapping lower priority requests.
reqs.foldLeft(UInt(0)) { case (sum, req) =>
req.bankSum := sum
req.bankSel | sum
}
// regout是读请求之后第二拍才有效的读数据:SRAM本身读出要一拍,后面又用RegEnable寄存了一拍,推测是为了时序。
// Access the banks
val regout = Vec(cc_banks.zipWithIndex.map { case ((b, omSRAM), i) =>
val en = reqs.map(_.bankEn(i)).reduce(_||_)
val sel = reqs.map(_.bankSel(i))
val wen = PriorityMux(sel, reqs.map(_.wen))
val idx = PriorityMux(sel, reqs.map(_.index))
val data= PriorityMux(sel, reqs.map(_.data(i)))

when (wen && en) { b.write(idx, data) }
RegEnable(b.read(idx, !wen && en), RegNext(!wen && en))
})

val regsel_sourceC = RegNext(RegNext(sourceC_req.bankEn))
val regsel_sourceD = RegNext(RegNext(sourceD_rreq.bankEn))

// grouped:按多长分组,假设x有16个,x.grouped(4) = ((1,2,3,4),(5,6,7,8),(9,a,b,c),(d,e,f,g))
// transpose: 转置
// map:把里面的每个成员都这么干。
val decodeC = regout.zipWithIndex.map {
case (r, i) => Mux(regsel_sourceC(i), r, UInt(0))
}.grouped(outerBytes/params.micro.writeBytes).toList.transpose.map(s => s.reduce(_|_))

io.sourceC_dat.data := Cat(decodeC.reverse)

val decodeD = regout.zipWithIndex.map {
// Intentionally not Mux1H and/or an indexed-mux b/c we want it 0 when !sel to save decode power
case (r, i) => Mux(regsel_sourceD(i), r, UInt(0))
}.grouped(innerBytes/params.micro.writeBytes).toList.transpose.map(s => s.reduce(_|_))

io.sourceD_rdat.data := Cat(decodeD.reverse)

private def banks = cc_banks.map("\"" + _._1.pathName + "\"").mkString(",")
def json: String = s"""{"widthBytes":${params.micro.writeBytes},"mem":[${banks}]}"""
}
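上面的优先级仲裁(foldLeft 传递 bankSum)可以用下面这段纯 Scala(非 Chisel)的小程序来模拟,对应前面 - - A - / - - B B / - - - C 的例子。其中 arbitrate、bankSels 等名字是为了对照注释自己起的,只是个简化示意(忽略了 mask、noop 等细节):

object BankArbDemo extends App {
  // bankSels:按优先级从高到低排列,每个请求想占用哪些bank(bit为1表示要用)
  // 返回每个请求的 (bankSum, 能否这一拍走):bankSum 是所有更高优先级请求 bankSel 的或
  def arbitrate(bankSels: Seq[Int]): Seq[(Int, Boolean)] =
    bankSels.foldLeft((0, Seq.empty[(Int, Boolean)])) { case ((sum, acc), sel) =>
      val en = (sum & sel) == 0        // 想用的bank都没被占,整笔才能走(对应bankEn)
      (sum | sel, acc :+ ((sum, en)))  // 不论走不走,bankSel都会挡住更低优先级(对应bankSum的传递)
    }._2

  // 场景1:A要 --A-,B要 --BB,C要 ---C(优先级 A > B > C)
  // 结果:只有A能走,B被A挡住,C被B挡住(不会出现 --AC)
  println(arbitrate(Seq(Integer.parseInt("0010", 2),
                        Integer.parseInt("0011", 2),
                        Integer.parseInt("0001", 2))))

  // 场景2:A要 --A-,B要 BB--,C要 ---C:三个互不冲突,可以一拍走掉 BBAC
  println(arbitrate(Seq(Integer.parseInt("0010", 2),
                        Integer.parseInt("1100", 2),
                        Integer.parseInt("0001", 2))))
}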

directory

Directory用来保存Cache的Tag。

  • 保存Cache的Tag。
  • 提供读写端口,供外部读写。
  • 在reset之后,初始化SRAM

class DirectoryEntry(params: InclusiveCacheParameters) extends InclusiveCacheBundle(params)
{
val dirty = Bool() // true => TRUNK or TIP
// 有四种状态: INVALID BRANCH TRUNK TIP
val state = UInt(width = params.stateBits)
val clients = UInt(width = params.clientBits)
val tag = UInt(width = params.tagBits)
}

class DirectoryWrite(params: InclusiveCacheParameters) extends InclusiveCacheBundle(params)
{
val set = UInt(width = params.setBits)
val way = UInt(width = params.wayBits)
val data = new DirectoryEntry(params)
}

class DirectoryRead(params: InclusiveCacheParameters) extends InclusiveCacheBundle(params)
{
val set = UInt(width = params.setBits)
val tag = UInt(width = params.tagBits)
}

class DirectoryResult(params: InclusiveCacheParameters) extends DirectoryEntry(params)
{
val hit = Bool()
val way = UInt(width = params.wayBits)
}

class Directory(params: InclusiveCacheParameters) extends Module
{
val io = new Bundle {
val write = Decoupled(new DirectoryWrite(params)).flip
val read = Valid(new DirectoryRead(params)).flip // sees same-cycle write
val result = Valid(new DirectoryResult(params))
val ready = Bool() // reset complete; can enable access
}

val codeBits = new DirectoryEntry(params).getWidth

val (cc_dir, omSRAM) = DescribedSRAM(
name = "cc_dir",
desc = "Directory RAM",
size = params.cache.sets,
data = Vec(params.cache.ways, UInt(width = codeBits))
)

val write = Queue(io.write, 1) // must inspect contents => max size 1
// a flow Q creates a WaR hazard... this MIGHT not cause a problem
// a pipe Q causes combinational loop through the scheduler

// 复位之后的初始化逻辑
// Wiping the Directory with 0s on reset has ultimate priority
val wipeCount = RegInit(UInt(0, width = params.setBits + 1))
val wipeOff = RegNext(Bool(false), Bool(true)) // don't wipe tags during reset
val wipeDone = wipeCount(params.setBits)
val wipeSet = wipeCount(params.setBits - 1,0)

io.ready := wipeDone
when (!wipeDone && !wipeOff) { wipeCount := wipeCount + UInt(1) }
assert (wipeDone || !io.read.valid)

// Be explicit for dumb 1-port inference
val ren = io.read.valid
val wen = (!wipeDone && !wipeOff) || write.valid
assert (!io.read.valid || wipeDone)

require (codeBits <= 256)

write.ready := !io.read.valid
when (!ren && wen) {
cc_dir.write(
Mux(wipeDone, write.bits.set, wipeSet),
Vec.fill(params.cache.ways) { Mux(wipeDone, write.bits.data.asUInt, UInt(0)) },
UIntToOH(write.bits.way, params.cache.ways).asBools.map(_ || !wipeDone))
}

val ren1 = RegInit(Bool(false))
val ren2 = if (params.micro.dirReg) RegInit(Bool(false)) else ren1
ren2 := ren1
ren1 := ren

// params.dirReg,是个配置,directory要不要打一拍。
val bypass_valid = params.dirReg(write.valid)
val bypass = params.dirReg(write.bits, ren1 && write.valid)
val regout = params.dirReg(cc_dir.read(io.read.bits.set, ren), ren1)
val tag = params.dirReg(RegEnable(io.read.bits.tag, ren), ren1)
val set = params.dirReg(RegEnable(io.read.bits.set, ren), ren1)

// victim:要evict的way,是在读directory的时候就给出的。如果hit就返回hit的way;如果miss,就用LFSR随机选一个way作为候选。要不要真的把它踢出去由MSHR根据state判断,这里只负责随机选一个出来。
// Compute the victim way in case of an evicition
val victimLFSR = LFSR16(params.dirReg(ren))(InclusiveCacheParameters.lfsrBits-1, 0)
val victimSums = Seq.tabulate(params.cache.ways) { i => UInt((1 << InclusiveCacheParameters.lfsrBits)*i / params.cache.ways) }
val victimLTE = Cat(victimSums.map { _ <= victimLFSR }.reverse)
val victimSimp = Cat(UInt(0, width=1), victimLTE(params.cache.ways-1, 1), UInt(1, width=1))
val victimWayOH = victimSimp(params.cache.ways-1,0) & ~(victimSimp >> 1)
val victimWay = OHToUInt(victimWayOH)
assert (!ren2 || victimLTE(0) === UInt(1))
assert (!ren2 || ((victimSimp >> 1) & ~victimSimp) === UInt(0)) // monotone
assert (!ren2 || PopCount(victimWayOH) === UInt(1))

val setQuash = bypass_valid && bypass.set === set
val tagMatch = bypass.data.tag === tag
val wayMatch = bypass.way === victimWay

val ways = Vec(regout.map(d => new DirectoryEntry(params).fromBits(d)))
val hits = Cat(ways.zipWithIndex.map { case (w, i) =>
w.tag === tag && w.state =/= INVALID && (!setQuash || UInt(i) =/= bypass.way)
}.reverse)
val hit = hits.orR()

// bypass有两种情况:一种是读的set/tag正好命中同一拍在写的那条;另一种是读miss了要踢一条,而要踢的way正好是同一拍在写的那条。这两种都把写的内容bypass给读,因为写随后一定会写进SRAM。
io.result.valid := ren2
io.result.bits := Mux(hit, Mux1H(hits, ways), Mux(setQuash && (tagMatch || wayMatch), bypass.data, Mux1H(victimWayOH, ways)))
io.result.bits.hit := hit || (setQuash && tagMatch && bypass.data.state =/= INVALID)
io.result.bits.way := Mux(hit, OHToUInt(hits), Mux(setQuash && tagMatch, bypass.way, victimWay))

params.ccover(ren2 && setQuash && tagMatch, "DIRECTORY_HIT_BYPASS", "Bypassing write to a directory hit")
params.ccover(ren2 && setQuash && !tagMatch && wayMatch, "DIRECTORY_EVICT_BYPASS", "Bypassing a write to a directory eviction")

def json: String = s"""{"clients":${params.clientBits},"mem":"${cc_dir.pathName}","clean":"${wipeDone.pathName}"}"""
}
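上面 victimLFSR/victimSums/victimLTE/victimWayOH 这段逻辑,本质上是把 LFSR 的取值范围按 way 数均分成若干个区间,LFSR 落在哪个区间就选哪个 way。下面用一段纯 Scala 模拟一下(lfsrBits、ways 取的是示意值,不一定是实际配置):

object VictimWayDemo extends App {
  val lfsrBits = 10                 // 示意值
  val ways     = 8                  // 示意值

  // victimSums:把 [0, 2^lfsrBits) 均分成 ways 个区间的下界
  val sums = Seq.tabulate(ways) { i => (1 << lfsrBits) * i / ways }

  // victimWay = 满足 sums(i) <= lfsr 的最大 i,即 lfsr 落在哪个区间就选哪个 way
  // (对应 victimLTE 取最高的1的那套 one-hot 运算)
  def victimWay(lfsr: Int): Int = sums.lastIndexWhere(_ <= lfsr)

  Seq(0, 127, 128, 511, 1023).foreach { v =>
    println(s"lfsr=$v -> victim way ${victimWay(v)}")   // 0, 0, 1, 3, 7
  }
}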

requests

Requests是个ListBuffer,共有3×MSHR个数的队列:每个MSHR对应3个队列,分别存放A/B/C通道的请求,优先级C > B > A。

队列里没有保存完整的地址,只记录了tag,因为排在同一个MSHR后面的请求一定是同set的。

一个MSHR被某个set占住之后,其他set的请求就不会再分配到这个MSHR,它对应的队列里排的都是同set的请求。

不同set的请求会尝试分配新的MSHR;如果没有空闲的MSHR,就被挡在外面。
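对照后面Scheduler里requests.io.push.bits.index的计算,可以把队列编号理解成 通道×mshrs+MSHR编号。下面是个纯Scala的小示意(mshrs取10只是举例):

object RequestQueueIndexDemo extends App {
  val mshrs = 10                    // 示意值

  // prio: 0 = A, 1 = B, 2 = C;mshr: 请求要排在哪个MSHR后面
  // 低 mshrs 个队列给A,中间 mshrs 个给B,高 mshrs 个给C
  def queueIndex(prio: Int, mshr: Int): Int = prio * mshrs + mshr

  println(queueIndex(0, 3))  // A通道、MSHR3 -> 队列3
  println(queueIndex(1, 3))  // B通道、MSHR3 -> 队列13
  println(queueIndex(2, 3))  // C通道、MSHR3 -> 队列23
}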

MSHR

MSHR里面主要是一个大的状态机,用来处理一个独立的请求。

在理解MSHR之前,首先要理解几个概念。什么是nestB / blockB / nestC / blockC?

blockB: block住下面来的同set的B请求。
nestB: 允许下面来的同set的B请求插队。也可能出现既不允许插队也不block的情况(此时进队列)。
如果blockB,就不ready,请求被挡在sinkB的入口。
如果nestB,就可以插队。
如果两个都为0,就进request队列,放弃这次插队;进了队列就表示不会再插队了,要等前面同set的请求处理完。
两个都为1是不可能的,有断言保证。

blockC: block住上面来的同set的C请求。
nestC: 允许上面来的同set的C请求插队。同样可能出现两者都为0的情况,但不会同时为1。
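把上面这段话写成代码大概是下面这个样子(纯Scala的示意,Block/Nest/Queue这些名字是自己起的,和源码里的信号只是概念上对应):

object BlockNestQueueDemo extends App {
  sealed trait Action
  case object Block extends Action   // 挡在sink接口上,不ready
  case object Nest  extends Action   // 进bc_mshr/c_mshr插队
  case object Queue extends Action   // 进request队列排队,放弃插队

  // block和nest不会同时为1(源码里有断言)
  def decide(block: Boolean, nest: Boolean): Action = {
    require(!(block && nest), "block和nest不可能同时为1")
    if (block) Block else if (nest) Nest else Queue
  }

  println(decide(block = true,  nest = false))  // Block:meta还没读出来,状态不明
  println(decide(block = false, nest = true))   // Nest :允许插队
  println(decide(block = false, nest = false))  // Queue:进队列,等前面同set的请求做完
}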
class QueuedRequest(params: InclusiveCacheParameters) extends InclusiveCacheBundle(params)
{
// prio:指该请求的来源,A=001,B=010,C=100
val prio = Vec(3, Bool()) // A=001, B=010, C=100
// control:control==1 && prio==A 表示是x通道的。优先级是:C > B > X(A & control) > A(control=0)
val control= Bool() // control command
val opcode = UInt(width = 3)
val param = UInt(width = 3)
val size = UInt(width = params.inner.bundle.sizeBits)
val source = UInt(width = params.inner.bundle.sourceBits)
val tag = UInt(width = params.tagBits)
val offset = UInt(width = params.offsetBits)
// put:sinkA/sinkC里有put buffer,这里表示数据存在put buffer的哪个位置。
val put = UInt(width = params.putBits)
}

class FullRequest(params: InclusiveCacheParameters) extends QueuedRequest(params)
{
val set = UInt(width = params.setBits)
}

class AllocateRequest(params: InclusiveCacheParameters) extends FullRequest(params)
{
// repeat:如果下一笔请求和上一笔请求是同地址(同set同tag)的,就没必要再去读directory了,MSHR里还留着上一笔的meta。
val repeat = Bool() // set is the same
}
class ScheduleRequest(params: InclusiveCacheParameters) extends InclusiveCacheBundle(params)
{
val a = Valid(new SourceARequest(params))
val b = Valid(new SourceBRequest(params))
val c = Valid(new SourceCRequest(params))
val d = Valid(new SourceDRequest(params))
val e = Valid(new SourceERequest(params))
val x = Valid(new SourceXRequest(params))
val dir = Valid(new DirectoryWrite(params))
// reload:MSHR做完事情的前一拍,告诉scheduler可以再分配一个请求,这样MSHR就能连续地处理请求。因为MSHR收到请求后也是先打一拍存进寄存器,下一拍才开始处理。
val reload = Bool() // get next request via allocate (if any)
}

class MSHRStatus(params: InclusiveCacheParameters) extends InclusiveCacheBundle(params)
{
val set = UInt(width = params.setBits)
val tag = UInt(width = params.tagBits)
val way = UInt(width = params.wayBits)
val blockB = Bool()
val nestB = Bool()
val blockC = Bool()
val nestC = Bool()
}

// nestedwb:收到插队请求的通知,说明自己被别人插队了,要把自己保存的directory状态改掉。它由scheduler从bc_mshr和c_mshr写directory的值中计算出来。插队只会是:bc插队abc、c插队bc、或者c插队abc这几种。
class NestedWriteback(params: InclusiveCacheParameters) extends InclusiveCacheBundle(params)
{
val set = UInt(width = params.setBits)
val tag = UInt(width = params.tagBits)
// b_toN: 下面来的nested probe,可能把我变成N
val b_toN = Bool() // nested Probes may unhit us
// b_toB: 下面来的nested probe,可能把我变成B
val b_toB = Bool() // nested Probes may demote us
// b_clr_dirty: 下面来的nested probe,会把我的dirty清掉
val b_clr_dirty = Bool() // nested Probes clear dirty
// c_set_dirty: 上面来的nested release,会置位我的dirty。
val c_set_dirty = Bool() // nested Releases MAY set dirty
}

sealed trait CacheState
{
val code = UInt(CacheState.index)
CacheState.index = CacheState.index + 1
}

object CacheState
{
var index = 0
}

case object S_INVALID extends CacheState
case object S_BRANCH extends CacheState
case object S_BRANCH_C extends CacheState
case object S_TIP extends CacheState
case object S_TIP_C extends CacheState
case object S_TIP_CD extends CacheState
case object S_TIP_D extends CacheState
case object S_TRUNK_C extends CacheState
case object S_TRUNK_CD extends CacheState

class MSHR(params: InclusiveCacheParameters) extends Module
{
val io = new Bundle {
val allocate = Valid(new AllocateRequest(params)).flip // refills MSHR for next cycle
val directory = Valid(new DirectoryResult(params)).flip // triggers schedule setup
val status = Valid(new MSHRStatus(params))
val schedule = Decoupled(new ScheduleRequest(params))
val sinkc = Valid(new SinkCResponse(params)).flip
// sinkD会发两次过来,first一个,last一个。所以sinkDResponse里面有last信号。
val sinkd = Valid(new SinkDResponse(params)).flip
val sinke = Valid(new SinkEResponse(params)).flip
val nestedwb = new NestedWriteback(params).flip
}

val request_valid = RegInit(Bool(false))
val request = Reg(new FullRequest(params))
val meta_valid = RegInit(Bool(false))
val meta = Reg(new DirectoryResult(params))

// Define which states are valid
when (meta_valid) {
when (meta.state === INVALID) {
assert (!meta.clients.orR)
assert (!meta.dirty)
}
when (meta.state === BRANCH) {
assert (!meta.dirty)
}
when (meta.state === TRUNK) {
assert (meta.clients.orR)
assert ((meta.clients & (meta.clients - UInt(1))) === UInt(0)) // at most one
}
when (meta.state === TIP) {
// noop
}
}

// 状态要全变为true,才表示该entry做完。
// Completed transitions (s_ = scheduled), (w_ = waiting)
// s_rprobe: 发出由release引起的对上面的probe。
val s_rprobe = RegInit(Bool(true)) // B
// w_rprobeackfirst: 正在等待,release 引起的probe的probeack。也就是release的时候发现有数据在L1,就去probe L1,当前正在等待L1的probeack。是最后一个client的第一笔。
val w_rprobeackfirst = RegInit(Bool(true))
// w_rprobeacklast: 最后一个client的最后一笔。
val w_rprobeacklast = RegInit(Bool(true))
// s_release: 发出给下面的release。最后一个client的第一笔就能release了(w_rprobeackfirst)。
val s_release = RegInit(Bool(true)) // CW w_rprobeackfirst
// w_releaseack: 等待由下面发来的releaseack
val w_releaseack = RegInit(Bool(true))
// s_pprobe: 是由acquire引起的probe,对上面的probe。
val s_pprobe = RegInit(Bool(true)) // B
// s_acquire: 处理由上面来的acquire
val s_acquire = RegInit(Bool(true)) // A s_release, s_pprobe [1]
// s_flush: 处理flush请求,安排一个x通道的回应。
val s_flush = RegInit(Bool(true)) // X w_releaseack
// w_grantfirst: 等待由下面来的grant
val w_grantfirst = RegInit(Bool(true))
val w_grantlast = RegInit(Bool(true))
// w_grant: 等待由下面来的grant
val w_grant = RegInit(Bool(true)) // first | last depending on wormhole
// w_pprobeackfirst: 正在等待,L3 probe引起的 L1 probe的probeack。也就是假设下面来了一个probe,发现有数据在上面,就去probe上面,当前正在等待上面回probeack
val w_pprobeackfirst = RegInit(Bool(true))
val w_pprobeacklast = RegInit(Bool(true))
// w_pprobeack: 正在等待,L3 probe引起的 L1 probe的probeack。和上面两个的区别是它支持wormhole:如果请求的offset==0,收到最后一个client的第一笔就置位,否则要等最后一笔(见下面sinkc处理处的注释)。
val w_pprobeack = RegInit(Bool(true)) // first | last depending on wormhole
// s_probeack: 发出给下面的probeack,回应下面来的probe。
val s_probeack = RegInit(Bool(true)) // C w_pprobeackfirst (mutually exclusive with next two s_*)
// s_grantack: 发出给上面的grantack
val s_grantack = RegInit(Bool(true)) // E w_grantfirst ... CAN require both outE&inD to service outD
// s_execute: 可以去sourceD走流水了,MSHR该干的干完了。
val s_execute = RegInit(Bool(true)) // D w_pprobeack, w_grant
// w_grantack: 等待由上面来的grantack
val w_grantack = RegInit(Bool(true))
val s_writeback = RegInit(Bool(true)) // W w_*

// [1]: We cannot issue outer Acquire while holding blockB (=> outA can stall)
// However, inB and outC are higher priority than outB, so s_release and s_pprobe
// may be safely issued while blockB. Thus we must NOT try to schedule the
// potentially stuck s_acquire with either of them (scheduler is all or none).

// Meta-data that we discover underway
val sink = Reg(UInt(width = params.outer.bundle.sinkBits))
val gotT = Reg(Bool())
val bad_grant = Reg(Bool())
val probes_done = Reg(UInt(width = params.clientBits))
val probes_toN = Reg(UInt(width = params.clientBits))
val probes_noT = Reg(Bool())

// When a nested transaction completes, update our meta data
when (meta_valid && meta.state =/= INVALID &&
io.nestedwb.set === request.set && io.nestedwb.tag === meta.tag) {
when (io.nestedwb.b_clr_dirty) { meta.dirty := Bool(false) }
when (io.nestedwb.c_set_dirty) { meta.dirty := Bool(true) }
when (io.nestedwb.b_toB) { meta.state := BRANCH }
when (io.nestedwb.b_toN) { meta.hit := Bool(false) }
}

// Scheduler status
io.status.valid := request_valid
io.status.bits.set := request.set
io.status.bits.tag := request.tag
io.status.bits.way := meta.way
io.status.bits.blockB := !meta_valid || ((!w_releaseack || !w_rprobeacklast || !w_pprobeacklast) && !w_grantfirst)
io.status.bits.nestB := meta_valid && w_releaseack && w_rprobeacklast && w_pprobeacklast && !w_grantfirst
// The above rules ensure we will block and not nest an outer probe while still doing our
// own inner probes. Thus every probe wakes exactly one MSHR.
io.status.bits.blockC := !meta_valid
io.status.bits.nestC := meta_valid && (!w_rprobeackfirst || !w_pprobeackfirst || !w_grantfirst)
// The w_grantfirst in nestC is necessary to deal with:
// acquire waiting for grant, inner release gets queued, outer probe -> inner probe -> deadlock
// ... this is possible because the release+probe can be for same set, but different tag

// We can only demand: block, nest, or queue
assert (!io.status.bits.nestB || !io.status.bits.blockB)
assert (!io.status.bits.nestC || !io.status.bits.blockC)

// Scheduler requests
// no_wait: 表示没有等待的了,只有execute和writeback了,这两个可以同时做,应该一拍就结束了。
val no_wait = w_rprobeacklast && w_releaseack && w_grantlast && w_pprobeacklast && w_grantack
// a.valid:要等release和pprobe都发出去之后再发acquire。为什么要带上s_pprobe?见上面的注释[1]:acquire在blockB期间可能会卡住,而scheduler是all or none的,不能把可能卡住的acquire和release/pprobe捆在一起发。
io.schedule.bits.a.valid := !s_acquire && s_release && s_pprobe
io.schedule.bits.b.valid := !s_rprobe || !s_pprobe
io.schedule.bits.c.valid := (!s_release && w_rprobeackfirst) || (!s_probeack && w_pprobeackfirst)
io.schedule.bits.d.valid := !s_execute && w_pprobeack && w_grant
io.schedule.bits.e.valid := !s_grantack && w_grantfirst
io.schedule.bits.x.valid := !s_flush && w_releaseack
io.schedule.bits.dir.valid := (!s_release && w_rprobeackfirst) || (!s_writeback && no_wait)
io.schedule.bits.reload := no_wait
io.schedule.valid := io.schedule.bits.a.valid || io.schedule.bits.b.valid || io.schedule.bits.c.valid ||
io.schedule.bits.d.valid || io.schedule.bits.e.valid || io.schedule.bits.x.valid ||
io.schedule.bits.dir.valid

// Schedule completions
when (io.schedule.ready) {
// 因为s_rprobe/s_pprobe的优先级是最高的,而且它俩互斥,所以一旦ready就可以把它俩都置为true。
s_rprobe := Bool(true)
when (w_rprobeackfirst) { s_release := Bool(true) }
s_pprobe := Bool(true)
when (s_release && s_pprobe) { s_acquire := Bool(true) }
when (w_releaseack) { s_flush := Bool(true) }
when (w_pprobeackfirst) { s_probeack := Bool(true) }
when (w_grantfirst) { s_grantack := Bool(true) }
when (w_pprobeack && w_grant) { s_execute := Bool(true) }
when (no_wait) { s_writeback := Bool(true) }
// Await the next operation
when (no_wait) {
request_valid := Bool(false)
meta_valid := Bool(false)
}
}

// Resulting meta-data
val final_meta_writeback = Wire(init = meta)

val req_clientBit = params.clientBit(request.source)
val req_needT = needT(request.opcode, request.param)
val req_acquire = request.opcode === AcquireBlock || request.opcode === AcquirePerm
val meta_no_clients = !meta.clients.orR
val req_promoteT = req_acquire && Mux(meta.hit, meta_no_clients && meta.state === TIP, gotT)

when (request.prio(2) && Bool(!params.firstLevel)) { // always a hit
// 如果是releaseData BtoN,会把dirty拉高了,这里可能有问题。
final_meta_writeback.dirty := meta.dirty || request.opcode(0)
final_meta_writeback.state := Mux(request.param =/= TtoT && meta.state === TRUNK, TIP, meta.state)
final_meta_writeback.clients := meta.clients & ~Mux(isToN(request.param), req_clientBit, UInt(0))
final_meta_writeback.hit := Bool(true) // chained requests are hits
} .elsewhen (request.control && Bool(params.control)) { // request.prio(0)
when (meta.hit) {
final_meta_writeback.dirty := Bool(false)
final_meta_writeback.state := INVALID
final_meta_writeback.clients := meta.clients & ~probes_toN
}
final_meta_writeback.hit := Bool(false)
} .otherwise {
final_meta_writeback.dirty := (meta.hit && meta.dirty) || !request.opcode(2)
final_meta_writeback.state := Mux(req_needT,
Mux(req_acquire, TRUNK, TIP),
Mux(!meta.hit, Mux(gotT, Mux(req_acquire, TRUNK, TIP), BRANCH),
MuxLookup(meta.state, UInt(0, width=2), Seq(
INVALID -> BRANCH,
BRANCH -> BRANCH,
TRUNK -> TIP,
TIP -> Mux(meta_no_clients && req_acquire, TRUNK, TIP)))))
final_meta_writeback.clients := Mux(meta.hit, meta.clients & ~probes_toN, UInt(0)) |
Mux(req_acquire, req_clientBit, UInt(0))
final_meta_writeback.tag := request.tag
final_meta_writeback.hit := Bool(true)
}

when (bad_grant) {
// 如果下面给了grant denied。
when (meta.hit) {
// 如果hit,说明自己是B,要去升级成T权限,但被下面denied了:状态保持不变,仍然hit、dirty=0、BRANCH;clients还是要把toN的清掉(可能是上面有两个B,本来就要把另一个B probe掉,下面的denied并不会取消这个动作)。
// upgrade failed (B -> T)
assert (!meta_valid || meta.state === BRANCH)
final_meta_writeback.hit := Bool(true)
final_meta_writeback.dirty := Bool(false)
final_meta_writeback.state := BRANCH
final_meta_writeback.clients := meta.clients & ~probes_toN
} .otherwise {
// 如果miss了,自己就是N,有可能有一个cacheline被踢出去了,所以把该位置填为INVALID。
// failed N -> (T or B)
final_meta_writeback.hit := Bool(false)
final_meta_writeback.dirty := Bool(false)
final_meta_writeback.state := INVALID
final_meta_writeback.clients := UInt(0)
}
}

val invalid = Wire(new DirectoryEntry(params))
invalid.dirty := Bool(false)
invalid.state := INVALID
invalid.clients := UInt(0)
invalid.tag := UInt(0)

// 上面请求BtoT,但它可能已经被我的probe打断变成N了,所以要查meta.clients(最新状态)确认它是否还持有这条line;如果已经没有了,就要按NtoT带数据回给它,不然就出错了。
// Just because a client says BtoT, by the time we process the request he may be N.
// Therefore, we must consult our own meta-data state to confirm he owns the line still.
val honour_BtoT = meta.hit && (meta.clients & req_clientBit).orR

// 发probe的时候,有些请求要把发请求的client自己排除掉(比如acquire/get),有些则不排除(比如put)。
// The client asking us to act is proof they don't have permissions.
val excluded_client = Mux(meta.hit && request.prio(0) && skipProbeN(request.opcode), req_clientBit, UInt(0))
io.schedule.bits.a.bits.tag := request.tag
io.schedule.bits.a.bits.set := request.set
io.schedule.bits.a.bits.param := Mux(req_needT, Mux(meta.hit, BtoT, NtoT), NtoB)
// block,在sourceA里面决定是发acquire block还是发acquire perm。有两种情况发perm,一种是上面发了perm,另一种是putfullData且size是一条cacheline大小(已经要写完整的cacheline了,下面的数据是什么已经不重要了)。
io.schedule.bits.a.bits.block := request.size =/= UInt(log2Ceil(params.cache.blockBytes)) ||
!(request.opcode === PutFullData || request.opcode === AcquirePerm)
io.schedule.bits.a.bits.source := UInt(0)
io.schedule.bits.b.bits.param := Mux(!s_rprobe, toN, Mux(request.prio(1), request.param, Mux(req_needT, toN, toB)))
io.schedule.bits.b.bits.tag := Mux(!s_rprobe, meta.tag, request.tag)
io.schedule.bits.b.bits.set := request.set
io.schedule.bits.b.bits.clients := meta.clients & ~excluded_client
io.schedule.bits.c.bits.opcode := Mux(meta.dirty, ReleaseData, Release)
io.schedule.bits.c.bits.param := Mux(meta.state === BRANCH, BtoN, TtoN)
io.schedule.bits.c.bits.source := UInt(0)
io.schedule.bits.c.bits.tag := meta.tag
io.schedule.bits.c.bits.set := request.set
io.schedule.bits.c.bits.way := meta.way
io.schedule.bits.c.bits.dirty := meta.dirty
io.schedule.bits.d.bits := request
io.schedule.bits.d.bits.param := Mux(!req_acquire, request.param,
MuxLookup(request.param, Wire(request.param), Seq(
NtoB -> Mux(req_promoteT, NtoT, NtoB),
BtoT -> Mux(honour_BtoT, BtoT, NtoT),
NtoT -> NtoT)))
io.schedule.bits.d.bits.sink := UInt(0)
io.schedule.bits.d.bits.way := meta.way
io.schedule.bits.d.bits.bad := bad_grant
io.schedule.bits.e.bits.sink := sink
io.schedule.bits.x.bits.fail := Bool(false)
io.schedule.bits.dir.bits.set := request.set
io.schedule.bits.dir.bits.way := meta.way
// 会有两次写directory,在release的时候会写一次INVALID(防止插队的请求读到错误的dir),在结束请求的时候写一次。
io.schedule.bits.dir.bits.data := Mux(!s_release, invalid, Wire(new DirectoryEntry(params), init = final_meta_writeback))

// Coverage of state transitions
def cacheState(entry: DirectoryEntry, hit: Bool) = {
val out = Wire(UInt())
val c = entry.clients.orR
val d = entry.dirty
switch (entry.state) {
is (BRANCH) { out := Mux(c, S_BRANCH_C.code, S_BRANCH.code) }
is (TRUNK) { out := Mux(d, S_TRUNK_CD.code, S_TRUNK_C.code) }
is (TIP) { out := Mux(c, Mux(d, S_TIP_CD.code, S_TIP_C.code), Mux(d, S_TIP_D.code, S_TIP.code)) }
is (INVALID) { out := S_INVALID.code }
}
when (!hit) { out := S_INVALID.code }
out
}

val p = !params.lastLevel // can be probed
val c = !params.firstLevel // can be acquired
val m = params.inner.client.clients.exists(!_.supports.probe) // can be written (or read)
val r = params.outer.manager.managers.exists(!_.alwaysGrantsT) // read-only devices exist
val f = params.control // flush control register exists
val cfg = (p, c, m, r, f)
val b = r || p // can reach branch state (via probe downgrade or read-only device)

// The cache must be used for something or we would not be here
require(c || m)

val evict = cacheState(meta, !meta.hit)
val before = cacheState(meta, meta.hit)
val after = cacheState(final_meta_writeback, Bool(true))

def eviction(from: CacheState, cover: Boolean)(implicit sourceInfo: SourceInfo) {
if (cover) {
params.ccover(evict === from.code, s"MSHR_${from}_EVICT", s"State transition from ${from} to evicted ${cfg}")
} else {
assert(!(evict === from.code), s"State transition from ${from} to evicted should be impossible ${cfg}")
}
if (cover && f) {
params.ccover(before === from.code, s"MSHR_${from}_FLUSH", s"State transition from ${from} to flushed ${cfg}")
} else {
assert(!(before === from.code), s"State transition from ${from} to flushed should be impossible ${cfg}")
}
}

def transition(from: CacheState, to: CacheState, cover: Boolean)(implicit sourceInfo: SourceInfo) {
if (cover) {
params.ccover(before === from.code && after === to.code, s"MSHR_${from}_${to}", s"State transition from ${from} to ${to} ${cfg}")
} else {
assert(!(before === from.code && after === to.code), s"State transition from ${from} to ${to} should be impossible ${cfg}")
}
}

when ((!s_release && w_rprobeackfirst) && io.schedule.ready) {
eviction(S_BRANCH, b) // MMIO read to read-only device
eviction(S_BRANCH_C, b && c) // you need children to become C
eviction(S_TIP, true) // MMIO read || clean release can lead to this state
eviction(S_TIP_C, c) // needs two clients || client + mmio || downgrading client
eviction(S_TIP_CD, c) // needs two clients || client + mmio || downgrading client
eviction(S_TIP_D, true) // MMIO write || dirty release lead here
eviction(S_TRUNK_C, c) // acquire for write
eviction(S_TRUNK_CD, c) // dirty release then reacquire
}

when ((!s_writeback && no_wait) && io.schedule.ready) {
transition(S_INVALID, S_BRANCH, b && m) // only MMIO can bring us to BRANCH state
transition(S_INVALID, S_BRANCH_C, b && c) // C state is only possible if there are inner caches
transition(S_INVALID, S_TIP, m) // MMIO read
transition(S_INVALID, S_TIP_C, false) // we would go S_TRUNK_C instead
transition(S_INVALID, S_TIP_CD, false) // acquire does not cause dirty immediately
transition(S_INVALID, S_TIP_D, m) // MMIO write
transition(S_INVALID, S_TRUNK_C, c) // acquire
transition(S_INVALID, S_TRUNK_CD, false) // acquire does not cause dirty immediately

transition(S_BRANCH, S_INVALID, b && p) // probe can do this (flushes run as evictions)
transition(S_BRANCH, S_BRANCH_C, b && c) // acquire
transition(S_BRANCH, S_TIP, b && m) // prefetch write
transition(S_BRANCH, S_TIP_C, false) // we would go S_TRUNK_C instead
transition(S_BRANCH, S_TIP_CD, false) // acquire does not cause dirty immediately
transition(S_BRANCH, S_TIP_D, b && m) // MMIO write
transition(S_BRANCH, S_TRUNK_C, b && c) // acquire
transition(S_BRANCH, S_TRUNK_CD, false) // acquire does not cause dirty immediately

transition(S_BRANCH_C, S_INVALID, b && c && p)
transition(S_BRANCH_C, S_BRANCH, b && c) // clean release (optional)
transition(S_BRANCH_C, S_TIP, b && c && m) // prefetch write
transition(S_BRANCH_C, S_TIP_C, false) // we would go S_TRUNK_C instead
transition(S_BRANCH_C, S_TIP_D, b && c && m) // MMIO write
transition(S_BRANCH_C, S_TIP_CD, false) // going dirty means we must shoot down clients
transition(S_BRANCH_C, S_TRUNK_C, b && c) // acquire
transition(S_BRANCH_C, S_TRUNK_CD, false) // acquire does not cause dirty immediately

transition(S_TIP, S_INVALID, p)
transition(S_TIP, S_BRANCH, p) // losing TIP only possible via probe
transition(S_TIP, S_BRANCH_C, false) // we would go S_TRUNK_C instead
transition(S_TIP, S_TIP_C, false) // we would go S_TRUNK_C instead
transition(S_TIP, S_TIP_D, m) // direct dirty only via MMIO write
transition(S_TIP, S_TIP_CD, false) // acquire does not make us dirty immediately
transition(S_TIP, S_TRUNK_C, c) // acquire
transition(S_TIP, S_TRUNK_CD, false) // acquire does not make us dirty immediately

transition(S_TIP_C, S_INVALID, c && p)
transition(S_TIP_C, S_BRANCH, c && p) // losing TIP only possible via probe
transition(S_TIP_C, S_BRANCH_C, c && p) // losing TIP only possible via probe
transition(S_TIP_C, S_TIP, c) // probed while MMIO read || clean release (optional)
transition(S_TIP_C, S_TIP_D, c && m) // direct dirty only via MMIO write
transition(S_TIP_C, S_TIP_CD, false) // going dirty means we must shoot down clients
transition(S_TIP_C, S_TRUNK_C, c) // acquire
transition(S_TIP_C, S_TRUNK_CD, false) // acquire does not make us immediately dirty

transition(S_TIP_D, S_INVALID, p)
transition(S_TIP_D, S_BRANCH, p) // losing D is only possible via probe
transition(S_TIP_D, S_BRANCH_C, p && c) // probed while acquire shared
transition(S_TIP_D, S_TIP, p) // probed while MMIO read || outer probe.toT (optional)
transition(S_TIP_D, S_TIP_C, false) // we would go S_TRUNK_C instead
transition(S_TIP_D, S_TIP_CD, false) // we would go S_TRUNK_CD instead
transition(S_TIP_D, S_TRUNK_C, p && c) // probed while acquired
transition(S_TIP_D, S_TRUNK_CD, c) // acquire

transition(S_TIP_CD, S_INVALID, c && p)
transition(S_TIP_CD, S_BRANCH, c && p) // losing D is only possible via probe
transition(S_TIP_CD, S_BRANCH_C, c && p) // losing D is only possible via probe
transition(S_TIP_CD, S_TIP, c && p) // probed while MMIO read || outer probe.toT (optional)
transition(S_TIP_CD, S_TIP_C, false) // we would go S_TRUNK_C instead
transition(S_TIP_CD, S_TIP_D, c) // MMIO write || clean release (optional)
transition(S_TIP_CD, S_TRUNK_C, c && p) // probed while acquire
transition(S_TIP_CD, S_TRUNK_CD, c) // acquire

transition(S_TRUNK_C, S_INVALID, c && p)
transition(S_TRUNK_C, S_BRANCH, c && p) // losing TIP only possible via probe
transition(S_TRUNK_C, S_BRANCH_C, c && p) // losing TIP only possible via probe
transition(S_TRUNK_C, S_TIP, c) // MMIO read || clean release (optional)
transition(S_TRUNK_C, S_TIP_C, c) // bounce shared
transition(S_TRUNK_C, S_TIP_D, c) // dirty release
transition(S_TRUNK_C, S_TIP_CD, c) // dirty bounce shared
transition(S_TRUNK_C, S_TRUNK_CD, c) // dirty bounce

transition(S_TRUNK_CD, S_INVALID, c && p)
transition(S_TRUNK_CD, S_BRANCH, c && p) // losing D only possible via probe
transition(S_TRUNK_CD, S_BRANCH_C, c && p) // losing D only possible via probe
transition(S_TRUNK_CD, S_TIP, c && p) // probed while MMIO read || outer probe.toT (optional)
transition(S_TRUNK_CD, S_TIP_C, false) // we would go S_TRUNK_C instead
transition(S_TRUNK_CD, S_TIP_D, c) // dirty release
transition(S_TRUNK_CD, S_TIP_CD, c) // bounce shared
transition(S_TRUNK_CD, S_TRUNK_C, c && p) // probed while acquire
}

// Handle response messages
// clientBit,一个client可以包括多个source,所以这里是看source是否在这个client的range内。
val probe_bit = params.clientBit(io.sinkc.bits.source)
val last_probe = (probes_done | probe_bit) === (meta.clients & ~excluded_client)
val probe_toN = isToN(io.sinkc.bits.param)
// 这里只判了sinkC.valid,因为sinkC模块里只有probeAck(Data)会拉高它;release不走这条路,release会走allocate进scheduler,由scheduler判断能不能插队:能就插队,不能就进request队列。
if (!params.firstLevel) when (io.sinkc.valid) {
params.ccover( probe_toN && io.schedule.bits.b.bits.param === toB, "MSHR_PROBE_FULL", "Client downgraded to N when asked only to do B")
params.ccover(!probe_toN && io.schedule.bits.b.bits.param === toB, "MSHR_PROBE_HALF", "Client downgraded to B when asked only to do B")
// Caution: the probe matches us only in set.
// We would never allow an outer probe to nest until both w_[rp]probeack complete, so
// it is safe to just unguardedly update the probe FSM.
probes_done := probes_done | probe_bit
probes_toN := probes_toN | Mux(probe_toN, probe_bit, UInt(0))
probes_noT := probes_noT || io.sinkc.bits.param =/= TtoT
w_rprobeackfirst := w_rprobeackfirst || last_probe
w_rprobeacklast := w_rprobeacklast || (last_probe && io.sinkc.bits.last)
w_pprobeackfirst := w_pprobeackfirst || last_probe
w_pprobeacklast := w_pprobeacklast || (last_probe && io.sinkc.bits.last)
// Allow wormhole routing from sinkC if the first request beat has offset 0
// 为什么要加上request.offset==0?正常的cacheline操作都是offset==0,而那些put/get/amo是有可能从中间开始操作的。如果是offset==0,那在接收到第一笔的时候就能进入下一个状态去走流水了,因为后面的请求不可能越过它进行操作了;而如果offset!=0的时候,就必须要等到last才能往后走了。(见前面bankedStore的noop注释)
val set_pprobeack = last_probe && (io.sinkc.bits.last || request.offset === UInt(0))
w_pprobeack := w_pprobeack || set_pprobeack
params.ccover(!set_pprobeack && w_rprobeackfirst, "MSHR_PROBE_SERIAL", "Sequential routing of probe response data")
params.ccover( set_pprobeack && w_rprobeackfirst, "MSHR_PROBE_WORMHOLE", "Wormhole routing of probe response data")
// However, meta-data updates need to be done more cautiously
when (meta.state =/= INVALID && io.sinkc.bits.tag === meta.tag && io.sinkc.bits.data) { meta.dirty := Bool(true) } // !!!
}
when (io.sinkd.valid) {
when (io.sinkd.bits.opcode === Grant || io.sinkd.bits.opcode === GrantData) {
sink := io.sinkd.bits.sink
w_grantfirst := Bool(true)
w_grantlast := io.sinkd.bits.last
// Record if we need to prevent taking ownership
bad_grant := io.sinkd.bits.denied
// Allow wormhole routing for requests whose first beat has offset 0
w_grant := request.offset === UInt(0) || io.sinkd.bits.last
params.ccover(io.sinkd.bits.opcode === GrantData && request.offset === UInt(0), "MSHR_GRANT_WORMHOLE", "Wormhole routing of grant response data")
params.ccover(io.sinkd.bits.opcode === GrantData && request.offset =/= UInt(0), "MSHR_GRANT_SERIAL", "Sequential routing of grant response data")
gotT := io.sinkd.bits.param === toT
}
.elsewhen (io.sinkd.bits.opcode === ReleaseAck) {
w_releaseack := Bool(true)
}
}
when (io.sinke.valid) {
w_grantack := Bool(true)
}

// Bootstrap new requests
val allocate_as_full = Wire(new FullRequest(params), init = io.allocate.bits)
val new_meta = Mux(io.allocate.valid && io.allocate.bits.repeat, final_meta_writeback, io.directory.bits)
val new_request = Mux(io.allocate.valid, allocate_as_full, request)
val new_needT = needT(new_request.opcode, new_request.param)
val new_clientBit = params.clientBit(new_request.source)
val new_skipProbe = Mux(skipProbeN(new_request.opcode), new_clientBit, UInt(0))

val prior = cacheState(final_meta_writeback, Bool(true))
def bypass(from: CacheState, cover: Boolean)(implicit sourceInfo: SourceInfo) {
if (cover) {
params.ccover(prior === from.code, s"MSHR_${from}_BYPASS", s"State bypass transition from ${from} ${cfg}")
} else {
assert(!(prior === from.code), s"State bypass from ${from} should be impossible ${cfg}")
}
}

when (io.allocate.valid && io.allocate.bits.repeat) {
bypass(S_INVALID, f || p) // Can lose permissions (probe/flush)
bypass(S_BRANCH, b) // MMIO read to read-only device
bypass(S_BRANCH_C, b && c) // you need children to become C
bypass(S_TIP, true) // MMIO read || clean release can lead to this state
bypass(S_TIP_C, c) // needs two clients || client + mmio || downgrading client
bypass(S_TIP_CD, c) // needs two clients || client + mmio || downgrading client
bypass(S_TIP_D, true) // MMIO write || dirty release lead here
bypass(S_TRUNK_C, c) // acquire for write
bypass(S_TRUNK_CD, c) // dirty release then reacquire
}

when (io.allocate.valid) {
assert (!request_valid || (no_wait && io.schedule.fire()))
request_valid := Bool(true)
request := io.allocate.bits
}

// Create execution plan
when (io.directory.valid || (io.allocate.valid && io.allocate.bits.repeat)) {
meta_valid := Bool(true)
meta := new_meta
probes_done := UInt(0)
probes_toN := UInt(0)
probes_noT := Bool(false)
gotT := Bool(false)
bad_grant := Bool(false)

// These should already be either true or turning true
// We clear them here explicitly to simplify the mux tree
s_rprobe := Bool(true)
w_rprobeackfirst := Bool(true)
w_rprobeacklast := Bool(true)
s_release := Bool(true)
w_releaseack := Bool(true)
s_pprobe := Bool(true)
s_acquire := Bool(true)
s_flush := Bool(true)
w_grantfirst := Bool(true)
w_grantlast := Bool(true)
w_grant := Bool(true)
w_pprobeackfirst := Bool(true)
w_pprobeacklast := Bool(true)
w_pprobeack := Bool(true)
s_probeack := Bool(true)
s_grantack := Bool(true)
s_execute := Bool(true)
w_grantack := Bool(true)
s_writeback := Bool(true)

// For C channel requests (ie: Release[Data])
when (new_request.prio(2) && Bool(!params.firstLevel)) {
s_execute := Bool(false)
// Do we need to go dirty?
when (new_request.opcode(0) && !new_meta.dirty) {
s_writeback := Bool(false)
}
// Does our state change?
when (isToB(new_request.param) && new_meta.state === TRUNK) {
s_writeback := Bool(false)
}
// Do our clients change?
when (isToN(new_request.param) && (new_meta.clients & new_clientBit) =/= UInt(0)) {
s_writeback := Bool(false)
}
assert (new_meta.hit)
}
// For X channel requests (ie: flush)
.elsewhen (new_request.control && Bool(params.control)) { // new_request.prio(0)
s_flush := Bool(false)
// Do we need to actually do something?
when (new_meta.hit) {
s_release := Bool(false)
w_releaseack := Bool(false)
// Do we need to shoot-down inner caches?
when (Bool(!params.firstLevel) && (new_meta.clients =/= UInt(0))) {
s_rprobe := Bool(false)
w_rprobeackfirst := Bool(false)
w_rprobeacklast := Bool(false)
}
}
}
// For A channel requests
.otherwise { // new_request.prio(0) && !new_request.control
s_execute := Bool(false)
// Do we need an eviction?
when (!new_meta.hit && new_meta.state =/= INVALID) {
s_release := Bool(false)
w_releaseack := Bool(false)
// Do we need to shoot-down inner caches?
when (Bool(!params.firstLevel) & (new_meta.clients =/= UInt(0))) {
s_rprobe := Bool(false)
w_rprobeackfirst := Bool(false)
w_rprobeacklast := Bool(false)
}
}
// Do we need an acquire?
when (!new_meta.hit || (new_meta.state === BRANCH && new_needT)) {
s_acquire := Bool(false)
w_grantfirst := Bool(false)
w_grantlast := Bool(false)
w_grant := Bool(false)
s_grantack := Bool(false)
s_writeback := Bool(false)
}
// Do we need a probe?
when (Bool(!params.firstLevel) && (new_meta.hit &&
(new_needT || new_meta.state === TRUNK) &&
(new_meta.clients & ~new_skipProbe) =/= UInt(0))) {
s_pprobe := Bool(false)
w_pprobeackfirst := Bool(false)
w_pprobeacklast := Bool(false)
w_pprobeack := Bool(false)
s_writeback := Bool(false)
}
// Do we need a grantack?
when (new_request.opcode === AcquireBlock || new_request.opcode === AcquirePerm) {
w_grantack := Bool(false)
s_writeback := Bool(false)
}
// Becomes dirty?
when (!new_request.opcode(2) && new_meta.hit && !new_meta.dirty) {
s_writeback := Bool(false)
}
}
}
}
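MSHR里A通道请求(.otherwise分支)的final_meta_writeback.state计算嵌套了好几层Mux,用纯Scala展开来写更直观一些。下面是一个简化的示意(State、nextState这些名字是自己起的):

object NextStateDemo extends App {
  sealed trait State
  case object INVALID extends State
  case object BRANCH  extends State
  case object TRUNK   extends State
  case object TIP     extends State

  def nextState(needT: Boolean, acquire: Boolean, hit: Boolean,
                gotT: Boolean, noClients: Boolean, cur: State): State =
    if (needT) { if (acquire) TRUNK else TIP }            // 要独占:acquire给上面,自己留TRUNK;put/amo自己留TIP
    else if (!hit) {                                      // miss:看下面有没有给T
      if (gotT) { if (acquire) TRUNK else TIP } else BRANCH
    } else cur match {                                    // hit且只要共享权限
      case INVALID => BRANCH
      case BRANCH  => BRANCH
      case TRUNK   => TIP
      case TIP     => if (noClients && acquire) TRUNK else TIP
    }

  // 例:miss的AcquireBlock NtoB,下面只给了B -> 本级变成BRANCH
  println(nextState(needT = false, acquire = true, hit = false,
                    gotT = false, noClients = true, cur = INVALID))
}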

scheduler

scheduler主要负责请求的调度:把sinkA/sinkC/sinkX来的请求分配给MSHR(或者插队、进队列),对各个MSHR的schedule做仲裁,把命令分发给各个source,并连接directory和bankedStore。

class Scheduler(params: InclusiveCacheParameters) extends Module
{
val io = new Bundle {
val in = TLBundle(params.inner.bundle).flip
val out = TLBundle(params.outer.bundle)
// Way permissions
val ways = Vec(params.allClients, UInt(width = params.cache.ways)).flip
val divs = Vec(params.allClients, UInt(width = InclusiveCacheParameters.lfsrBits + 1)).flip
// Control port
val req = Decoupled(new SinkXRequest(params)).flip
val resp = Decoupled(new SourceXRequest(params))
}

val sourceA = Module(new SourceA(params))
val sourceB = Module(new SourceB(params))
val sourceC = Module(new SourceC(params))
val sourceD = Module(new SourceD(params))
val sourceE = Module(new SourceE(params))
val sourceX = Module(new SourceX(params))

io.out.a <> sourceA.io.a
io.out.c <> sourceC.io.c
io.out.e <> sourceE.io.e
io.in.b <> sourceB.io.b
io.in.d <> sourceD.io.d
io.resp <> sourceX.io.x

val sinkA = Module(new SinkA(params))
val sinkC = Module(new SinkC(params))
val sinkD = Module(new SinkD(params))
val sinkE = Module(new SinkE(params))
val sinkX = Module(new SinkX(params))

sinkA.io.a <> io.in.a
sinkC.io.c <> io.in.c
sinkE.io.e <> io.in.e
sinkD.io.d <> io.out.d
sinkX.io.x <> io.req

io.out.b.ready := Bool(true) // disconnected

val directory = Module(new Directory(params))
val bankedStore = Module(new BankedStore(params))
// 3*mshrs个队列。a排a的队,c排c的队,b排b的队。
val requests = Module(new ListBuffer(ListBufferParameters(new QueuedRequest(params), 3*params.mshrs, params.secondary, false)))
// 为什么只有一个bc_mshr和一个c_mshr?因为插队发生的概率比较小,各留一个就够了;这两个MSHR专门用来处理插队的请求。
val mshrs = Seq.fill(params.mshrs) { Module(new MSHR(params)) }
val abc_mshrs = mshrs.init.init
val bc_mshr = mshrs.init.last
val c_mshr = mshrs.last
val nestedwb = Wire(new NestedWriteback(params))

// Deliver messages from Sinks to MSHRs
mshrs.zipWithIndex.foreach { case (m, i) =>
m.io.sinkc.valid := sinkC.io.resp.valid && sinkC.io.resp.bits.set === m.io.status.bits.set
m.io.sinkd.valid := sinkD.io.resp.valid && sinkD.io.resp.bits.source === UInt(i)
m.io.sinke.valid := sinkE.io.resp.valid && sinkE.io.resp.bits.sink === UInt(i)
m.io.sinkc.bits := sinkC.io.resp.bits
m.io.sinkd.bits := sinkD.io.resp.bits
m.io.sinke.bits := sinkE.io.resp.bits
m.io.nestedwb := nestedwb
}

// If the pre-emption BC or C MSHR have a matching set, the normal MSHR must be blocked
val mshr_stall_abc = abc_mshrs.map { m =>
(bc_mshr.io.status.valid && m.io.status.bits.set === bc_mshr.io.status.bits.set) ||
( c_mshr.io.status.valid && m.io.status.bits.set === c_mshr.io.status.bits.set)
}
val mshr_stall_bc =
c_mshr.io.status.valid && bc_mshr.io.status.bits.set === c_mshr.io.status.bits.set
val mshr_stall_c = Bool(false)
val mshr_stall = mshr_stall_abc :+ mshr_stall_bc :+ mshr_stall_c


val stall_abc = (mshr_stall_abc zip abc_mshrs) map { case (s, m) => s && m.io.status.valid }
if (!params.lastLevel || !params.firstLevel)
params.ccover(stall_abc.reduce(_||_), "SCHEDULER_ABC_INTERLOCK", "ABC MSHR interlocked due to pre-emption")
if (!params.lastLevel)
params.ccover(mshr_stall_bc && bc_mshr.io.status.valid, "SCHEDULER_BC_INTERLOCK", "BC MSHR interlocked due to pre-emption")

// Consider scheduling an MSHR only if all the resources it requires are available
val mshr_request = Cat((mshrs zip mshr_stall).map { case (m, s) =>
m.io.schedule.valid && !s &&
(sourceA.io.req.ready || !m.io.schedule.bits.a.valid) &&
(sourceB.io.req.ready || !m.io.schedule.bits.b.valid) &&
(sourceC.io.req.ready || !m.io.schedule.bits.c.valid) &&
(sourceD.io.req.ready || !m.io.schedule.bits.d.valid) &&
(sourceE.io.req.ready || !m.io.schedule.bits.e.valid) &&
(sourceX.io.req.ready || !m.io.schedule.bits.x.valid) &&
(directory.io.write.ready || !m.io.schedule.bits.dir.valid)
}.reverse)

// Round-robin arbitration of MSHRs
val robin_filter = RegInit(UInt(0, width = params.mshrs))
val robin_request = Cat(mshr_request, mshr_request & robin_filter)
val mshr_selectOH2 = ~(leftOR(robin_request) << 1) & robin_request
val mshr_selectOH = mshr_selectOH2(2*params.mshrs-1, params.mshrs) | mshr_selectOH2(params.mshrs-1, 0)
val mshr_select = OHToUInt(mshr_selectOH)
val schedule = Mux1H(mshr_selectOH, mshrs.map(_.io.schedule.bits))
val scheduleTag = Mux1H(mshr_selectOH, mshrs.map(_.io.status.bits.tag))
val scheduleSet = Mux1H(mshr_selectOH, mshrs.map(_.io.status.bits.set))

// When an MSHR wins the schedule, it has lowest priority next time
when (mshr_request.orR()) { robin_filter := ~rightOR(mshr_selectOH) }

// Fill in which MSHR sends the request
schedule.a.bits.source := mshr_select
// c.source为什么可以填0:C通道的probeAck是按地址(set)找到对应MSHR的,不是按source;见上面"Deliver messages from Sinks to MSHRs"那段,sinkC的resp是按set匹配到MSHR的。只有Release[Data]才填mshr_select。
schedule.c.bits.source := Mux(schedule.c.bits.opcode(1), mshr_select, UInt(0)) // only set for Release[Data] not ProbeAck[Data]
schedule.d.bits.sink := mshr_select

sourceA.io.req := schedule.a
sourceB.io.req := schedule.b
sourceC.io.req := schedule.c
sourceD.io.req := schedule.d
sourceE.io.req := schedule.e
sourceX.io.req := schedule.x
directory.io.write := schedule.dir

// Forward meta-data changes from nested transaction completion
val select_c = mshr_selectOH(params.mshrs-1)
val select_bc = mshr_selectOH(params.mshrs-2)
nestedwb.set := Mux(select_c, c_mshr.io.status.bits.set, bc_mshr.io.status.bits.set)
nestedwb.tag := Mux(select_c, c_mshr.io.status.bits.tag, bc_mshr.io.status.bits.tag)
nestedwb.b_toN := select_bc && bc_mshr.io.schedule.bits.dir.valid && bc_mshr.io.schedule.bits.dir.bits.data.state === MetaData.INVALID
nestedwb.b_toB := select_bc && bc_mshr.io.schedule.bits.dir.valid && bc_mshr.io.schedule.bits.dir.bits.data.state === MetaData.BRANCH
// select_bc一定是clr_dirty吗?不可能set_dirty?因为nestB只能进bc_mshr,nestC只能进c_mshr;而alloc的B请求可以进abc/bc,alloc的C请求可以进bc/c,alloc的A请求只能进abc。
nestedwb.b_clr_dirty := select_bc && bc_mshr.io.schedule.bits.dir.valid
nestedwb.c_set_dirty := select_c && c_mshr.io.schedule.bits.dir.valid && c_mshr.io.schedule.bits.dir.bits.data.dirty

// Pick highest priority request
val request = Wire(Decoupled(new FullRequest(params)))
request.valid := directory.io.ready && (sinkA.io.req.valid || sinkX.io.req.valid || sinkC.io.req.valid)
request.bits := Mux(sinkC.io.req.valid, sinkC.io.req.bits,
Mux(sinkX.io.req.valid, sinkX.io.req.bits, sinkA.io.req.bits))
sinkC.io.req.ready := directory.io.ready && request.ready
sinkX.io.req.ready := directory.io.ready && request.ready && !sinkC.io.req.valid
sinkA.io.req.ready := directory.io.ready && request.ready && !sinkC.io.req.valid && !sinkX.io.req.valid

// If no MSHR has been assigned to this set, we need to allocate one
val setMatches = Cat(mshrs.map { m => m.io.status.valid && m.io.status.bits.set === request.bits.set }.reverse)
val alloc = !setMatches.orR() // NOTE: no matches also means no BC or C pre-emption on this set
// If a same-set MSHR says that requests of this type must be blocked (for bounded time), do it
val blockB = Mux1H(setMatches, mshrs.map(_.io.status.bits.blockB)) && request.bits.prio(1)
val blockC = Mux1H(setMatches, mshrs.map(_.io.status.bits.blockC)) && request.bits.prio(2)
// If a same-set MSHR says that requests of this type must be handled out-of-band, use special BC|C MSHR
// ... these special MSHRs interlock the MSHR that said it should be pre-empted.
val nestB = Mux1H(setMatches, mshrs.map(_.io.status.bits.nestB)) && request.bits.prio(1)
val nestC = Mux1H(setMatches, mshrs.map(_.io.status.bits.nestC)) && request.bits.prio(2)
// Prevent priority inversion; we may not queue to MSHRs beyond our level
// prioFilter三段的含义:最高位(c_mshr)只有C请求能进;次高位(bc_mshr)非A的请求(B/C)能进;低mshrs-2位(普通的abc_mshr)所有请求都能进。
val prioFilter = Cat(request.bits.prio(2), !request.bits.prio(0), ~UInt(0, width = params.mshrs-2))
// lowerMatches:把等级比本请求高的MSHR排除掉,请求不能去比自己等级高的MSHR那里排队。
val lowerMatches = setMatches & prioFilter
// If we match an MSHR <= our priority that neither blocks nor nests us, queue to it.
// 为什么要区分block和queue这两种?都是挡住吧?因为在某些时刻需要把请求挡在接口上、不能进queue,等到MSHR的状况明确了再处理它;如果让它直接进queue,MSHR就没法再判断queue里的东西能不能nest了。具体见scenario 2。
val queue = lowerMatches.orR() && !nestB && !nestC && !blockB && !blockC

if (!params.lastLevel) {
params.ccover(request.valid && blockB, "SCHEDULER_BLOCKB", "Interlock B request while resolving set conflict")
params.ccover(request.valid && nestB, "SCHEDULER_NESTB", "Priority escalation from channel B")
}
if (!params.firstLevel) {
params.ccover(request.valid && blockC, "SCHEDULER_BLOCKC", "Interlock C request while resolving set conflict")
params.ccover(request.valid && nestC, "SCHEDULER_NESTC", "Priority escalation from channel C")
}
params.ccover(request.valid && queue, "SCHEDULER_SECONDARY", "Enqueue secondary miss")

// It might happen that lowerMatches has >1 bit if the two special MSHRs are in-use
// We want to Q to the highest matching priority MSHR.
// lowerMatches1:后缀1表示one-hot。当两个特殊MSHR都占着同一个set时,lowerMatches可能有两个bit,这里选等级最高的那个,把它变成单bit。
val lowerMatches1 =
Mux(lowerMatches(params.mshrs-1), UInt(1 << (params.mshrs-1)),
Mux(lowerMatches(params.mshrs-2), UInt(1 << (params.mshrs-2)),
lowerMatches))

// If this goes to the scheduled MSHR, it may need to be bypassed
// Alternatively, the MSHR may be refilled from a request queued in the ListBuffer
val selected_requests = Cat(mshr_selectOH, mshr_selectOH, mshr_selectOH) & requests.io.valid
// a_pop只是代表a队列有没有东西,不是代表是否pop。
val a_pop = selected_requests((0 + 1) * params.mshrs - 1, 0 * params.mshrs).orR()
val b_pop = selected_requests((1 + 1) * params.mshrs - 1, 1 * params.mshrs).orR()
val c_pop = selected_requests((2 + 1) * params.mshrs - 1, 2 * params.mshrs).orR()
// bypassMatches:只有当队列里没有和本请求同级或更高优先级通道的请求在排队时才能bypass,比如C通道的请求要求C队列是空的。
val bypassMatches = (mshr_selectOH & lowerMatches1).orR() &&
Mux(c_pop || request.bits.prio(2), !c_pop, Mux(b_pop || request.bits.prio(1), !b_pop, !a_pop))
val may_pop = a_pop || b_pop || c_pop
// bypass的意思:新来的request不进队列,直接灌给正在reload的MSHR。
val bypass = request.valid && queue && bypassMatches
val will_reload = schedule.reload && (may_pop || bypass)
val will_pop = schedule.reload && may_pop && !bypass

params.ccover(mshr_selectOH.orR && bypass, "SCHEDULER_BYPASS", "Bypass new request directly to conflicting MSHR")
params.ccover(mshr_selectOH.orR && will_reload, "SCHEDULER_RELOAD", "Back-to-back service of two requests")
params.ccover(mshr_selectOH.orR && will_pop, "SCHEDULER_POP", "Service of a secondary miss")

// Repeat the above logic, but without the fan-in
mshrs.zipWithIndex.foreach { case (m, i) =>
val sel = mshr_selectOH(i)
m.io.schedule.ready := sel
val a_pop = requests.io.valid(params.mshrs * 0 + i)
val b_pop = requests.io.valid(params.mshrs * 1 + i)
val c_pop = requests.io.valid(params.mshrs * 2 + i)
val bypassMatches = lowerMatches1(i) &&
Mux(c_pop || request.bits.prio(2), !c_pop, Mux(b_pop || request.bits.prio(1), !b_pop, !a_pop))
val may_pop = a_pop || b_pop || c_pop
val bypass = request.valid && queue && bypassMatches
val will_reload = m.io.schedule.bits.reload && (may_pop || bypass)
m.io.allocate.bits := Mux(bypass, Wire(new QueuedRequest(params), init = request.bits), requests.io.data)
m.io.allocate.bits.set := m.io.status.bits.set
m.io.allocate.bits.repeat := m.io.allocate.bits.tag === m.io.status.bits.tag
// 只有reload的时候才会从队列里拿一个出来(或者bypass一个进去);上一个同set的请求还没处理完时,队列里的请求也出不来。
m.io.allocate.valid := sel && will_reload
}

// Determine which of the queued requests to pop (supposing will_pop)
// 选出a_pop/b_pop/c_pop中优先级高的。
val prio_requests = ~(~requests.io.valid | (requests.io.valid >> params.mshrs) | (requests.io.valid >> 2*params.mshrs))
val pop_index = OHToUInt(Cat(mshr_selectOH, mshr_selectOH, mshr_selectOH) & prio_requests)
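// 举个例子(假设 mshrs = 4):requests.io.valid = c:0001_b:0011_a:0110
//   MSHR0:c队列有东西 -> 只留c;MSHR1:b、a都有 -> 只留b;MSHR2:只有a -> 留a
//   prio_requests = c:0001_b:0010_a:0100
//   再和 Cat(mshr_selectOH, mshr_selectOH, mshr_selectOH) 相与,OHToUInt 得到要pop的队列编号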
requests.io.pop.valid := will_pop
requests.io.pop.bits := pop_index

// 判断队列出来的是否可以repeat,同set的只能进到这个指定的MSHR,如果同tag就可以repeat,就不用去读directory,如果不同tag,就不repeat,就要去读directory。
// Reload from the Directory if the next MSHR operation changes tags
val lb_tag_mismatch = scheduleTag =/= requests.io.data.tag
val mshr_uses_directory_assuming_no_bypass = schedule.reload && may_pop && lb_tag_mismatch
val mshr_uses_directory_for_lb = will_pop && lb_tag_mismatch
val mshr_uses_directory = will_reload && scheduleTag =/= Mux(bypass, request.bits.tag, requests.io.data.tag)

// Is there an MSHR free for this request?
val mshr_validOH = Cat(mshrs.map(_.io.status.valid).reverse)
val mshr_free = (~mshr_validOH & prioFilter).orR()

// Fanout the request to the appropriate handler (if any)
val bypassQueue = schedule.reload && bypassMatches
val request_alloc_cases =
(alloc && !mshr_uses_directory_assuming_no_bypass && mshr_free) ||
(nestB && !mshr_uses_directory_assuming_no_bypass && !bc_mshr.io.status.valid && !c_mshr.io.status.valid) ||
(nestC && !mshr_uses_directory_assuming_no_bypass && !c_mshr.io.status.valid)
request.ready := request_alloc_cases || (queue && (bypassQueue || requests.io.push.ready))
val alloc_uses_directory = request.valid && request_alloc_cases

// directory的读是scheduler发起的,在进mshr的同时发出,mshr只是在等结果。
// When a request goes through, it will need to hit the Directory
directory.io.read.valid := mshr_uses_directory || alloc_uses_directory
directory.io.read.bits.set := Mux(mshr_uses_directory_for_lb, scheduleSet, request.bits.set)
directory.io.read.bits.tag := Mux(mshr_uses_directory_for_lb, requests.io.data.tag, request.bits.tag)

// Enqueue the request if not bypassed directly into an MSHR
requests.io.push.valid := request.valid && queue && !bypassQueue
requests.io.push.bits.data := request.bits
requests.io.push.bits.index := Mux1H(
request.bits.prio, Seq(
OHToUInt(lowerMatches1 << params.mshrs*0),
OHToUInt(lowerMatches1 << params.mshrs*1),
OHToUInt(lowerMatches1 << params.mshrs*2)))

// 只有status不valid的MSHR才会走这里的正常allocate(下面的when块),而正在reload的MSHR的valid还没有拉低,所以这两种allocate不会同时发生。
// 选择一个mshr,prioFilter把权限不够的给去掉
val mshr_insertOH = ~(leftOR(~mshr_validOH) << 1) & ~mshr_validOH & prioFilter
(mshr_insertOH.asBools zip mshrs) map { case (s, m) =>
// 最后一个条件的意思是:有MSHR在抢directory的读端口,所以这一拍不能alloc。如果不能alloc,request会被挡在接口上,因为上面request.ready的赋值里也带了这个条件。
when (request.valid && alloc && s && !mshr_uses_directory_assuming_no_bypass) {
m.io.allocate.valid := Bool(true)
m.io.allocate.bits := request.bits
m.io.allocate.bits.repeat := Bool(false)
}
}

when (request.valid && nestB && !bc_mshr.io.status.valid && !c_mshr.io.status.valid && !mshr_uses_directory_assuming_no_bypass) {
bc_mshr.io.allocate.valid := Bool(true)
bc_mshr.io.allocate.bits := request.bits
bc_mshr.io.allocate.bits.repeat := Bool(false)
assert (!request.bits.prio(0))
}
bc_mshr.io.allocate.bits.prio(0) := Bool(false)

when (request.valid && nestC && !c_mshr.io.status.valid && !mshr_uses_directory_assuming_no_bypass) {
c_mshr.io.allocate.valid := Bool(true)
c_mshr.io.allocate.bits := request.bits
c_mshr.io.allocate.bits.repeat := Bool(false)
assert (!request.bits.prio(0))
assert (!request.bits.prio(1))
}
c_mshr.io.allocate.bits.prio(0) := Bool(false)
c_mshr.io.allocate.bits.prio(1) := Bool(false)

// Fanout the result of the Directory lookup
// 记录directory的读结果应该发给哪个MSHR,取决于前一拍是谁抢到了directory的读端口。
val dirTarget = Mux(alloc, mshr_insertOH, Mux(nestB, UInt(1 << (params.mshrs-2)), UInt(1 << (params.mshrs-1))))
val directoryFanout = params.dirReg(RegNext(Mux(mshr_uses_directory, mshr_selectOH, Mux(alloc_uses_directory, dirTarget, UInt(0)))))
mshrs.zipWithIndex.foreach { case (m, i) =>
m.io.directory.valid := directoryFanout(i)
m.io.directory.bits := directory.io.result.bits
}

// MSHR response meta-data fetch
// 用sinkC的set来查MSHR的set,找到相应的way。
sinkC.io.way :=
Mux(bc_mshr.io.status.valid && bc_mshr.io.status.bits.set === sinkC.io.set,
bc_mshr.io.status.bits.way,
Mux1H(abc_mshrs.map(m => m.io.status.valid && m.io.status.bits.set === sinkC.io.set),
abc_mshrs.map(_.io.status.bits.way)))
sinkD.io.way := Vec(mshrs.map(_.io.status.bits.way))(sinkD.io.source)
sinkD.io.set := Vec(mshrs.map(_.io.status.bits.set))(sinkD.io.source)

// Beat buffer connections between components
sinkA.io.pb_pop <> sourceD.io.pb_pop
sourceD.io.pb_beat := sinkA.io.pb_beat
sinkC.io.rel_pop <> sourceD.io.rel_pop
sourceD.io.rel_beat := sinkC.io.rel_beat

// BankedStore ports
bankedStore.io.sinkC_adr <> sinkC.io.bs_adr
bankedStore.io.sinkC_dat := sinkC.io.bs_dat
bankedStore.io.sinkD_adr <> sinkD.io.bs_adr
bankedStore.io.sinkD_dat := sinkD.io.bs_dat
bankedStore.io.sourceC_adr <> sourceC.io.bs_adr
bankedStore.io.sourceD_radr <> sourceD.io.bs_radr
bankedStore.io.sourceD_wadr <> sourceD.io.bs_wadr
bankedStore.io.sourceD_wdat := sourceD.io.bs_wdat
sourceC.io.bs_dat := bankedStore.io.sourceC_dat
sourceD.io.bs_rdat := bankedStore.io.sourceD_rdat

// SourceD data hazard interlock
sourceD.io.evict_req := sourceC.io.evict_req
sourceD.io.grant_req := sinkD .io.grant_req
sourceC.io.evict_safe := sourceD.io.evict_safe
sinkD .io.grant_safe := sourceD.io.grant_safe

private def afmt(x: AddressSet) = s"""{"base":${x.base},"mask":${x.mask}}"""
private def addresses = params.inner.manager.managers.flatMap(_.address).map(afmt _).mkString(",")
private def setBits = params.addressMapping.drop(params.offsetBits).take(params.setBits).mkString(",")
private def tagBits = params.addressMapping.drop(params.offsetBits + params.setBits).take(params.tagBits).mkString(",")
private def simple = s""""reset":"${reset.pathName}","tagBits":[${tagBits}],"setBits":[${setBits}],"blockBytes":${params.cache.blockBytes},"ways":${params.cache.ways}"""
def json: String = s"""{"addresses":[${addresses}],${simple},"directory":${directory.json},"subbanks":${bankedStore.json}}"""
}
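Scheduler里那段round-robin仲裁(robin_filter / robin_request / leftOR)可以用下面这段纯Scala来模拟:robinFilter记录上一次赢家之上的MSHR,优先从它们里面挑,挑不到再从头挑。名字和结构是为了对照源码起的,只是个示意:

object RoundRobinDemo extends App {
  val mshrs = 8                         // 示意值
  var robinFilter = 0                   // 为1的位表示"上次赢家之上"的MSHR

  // request:哪些MSHR在申请(假设不为0);返回本次选中的MSHR编号
  def arbitrate(request: Int): Int = {
    val masked = request & robinFilter                     // 先从上次赢家后面的申请者里挑
    val pick   = if (masked != 0) masked else request      // 没有就从头挑(对应Cat两份再取最低位)
    val sel    = Integer.numberOfTrailingZeros(pick)
    robinFilter = ~((1 << (sel + 1)) - 1) & ((1 << mshrs) - 1)  // 对应 ~rightOR(mshr_selectOH)
    sel
  }

  // MSHR1、3、6一直在申请:轮流选中 1 -> 3 -> 6 -> 1 -> 3
  val req = (1 << 1) | (1 << 3) | (1 << 6)
  println(Seq.fill(5)(arbitrate(req)))
}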

configs

memCycles: Int // L2到memory的latency(单位是cycle),在Parameters.scala里计算。它用来估算在这么多cycle内DDR能回多少笔transaction,从而推出至少需要多少个MSHR去接;MSHR个数不低于这个值,MSHR才不会成为瓶颈。

Parameters.scala里117行附近:50ns是外面DDR的延时,800MHz是L2C的频率,所以是40个cycle的latency(如果L2C是1.8GHz,那大概是90个cycle)。
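按这段话粗略换算的话,大致是 memCycles ≈ DDR延时 × L2频率。下面是个简单的算式示意(具体公式以 Parameters.scala 为准):

object MemCyclesDemo extends App {
  def memCycles(latencyNs: Double, freqGHz: Double): Int =
    math.ceil(latencyNs * freqGHz).toInt

  println(memCycles(50, 0.8))  // 40:800MHz的L2,50ns的DDR延时
  println(memCycles(50, 1.8))  // 90:1.8GHz的L2
}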

scenario

scenario 1

MSHR中如果w_grantfirst就不能nestedB。

scenario 2

假设有一个acquire进到MSHR里,在meta data还没读出来的时候(也就是还不知道directory状态的时候),会把blockC拉高,这样scheduler会把C通道的请求都挡在接口上;等blockC拉低之后,再判断C通道的请求能不能nestC:可以就nest,不行就进queue。

如果没有这个blockC,而是让C通道的请求直接进queue,就可能出问题。比如在meta data没准备好的时候来了个releaseData,发现不能nest,就进了queue;等MSHR查到directory之后,releaseData已经在queue里排队,不会再来nest了。于是MSHR继续处理acquire,向上发probe,上面回NtoN,相当于忽略了releaseData,最新的数据留在queue里没被拿到,从而出错。

也就是说一个请求的过程中,会出现三种情况:

  • block:把外面的请求挡在接口上,不能进queue,也不能进MSHR,这种最为严格,因为此时状态还不确定,没法做判断,挡住最保险
  • nest:可以被插队,当前状态已经比较明确了,并且可以被别人插队
  • 非block非nest:不能被插队,当前状态也比较明确了,但是当前请求比外面的请求优先级高,就先做当前请求,让外面的请求进queue排队处理。