rocket-inclusive-cache

An introduction to the inclusive cache in rocket-chip.

Overall structure

In the Inclusive Cache, each bank has its own scheduler, and each scheduler handles its requests independently.

The structure of a scheduler is shown in the figure below and contains the following parts. Red lines are the internal request (req) channels, blue lines are the data channels.

  • sinkA: accepts requests on the upstream A channel and converts them into a req for the MSHRs
  • sourceB: takes commands from the MSHRs and issues Probe requests on the upstream B channel
  • sinkC: accepts upstream C-channel requests, i.e. Release and ProbeAck(Data). A Release is converted into a req for the MSHRs; a ProbeAck(Data) is reported to the MSHRs as a command and its data is written into the BankedStore
  • sourceD: takes commands from the MSHRs, reads/writes the BankedStore and sends responses on the upstream D channel
  • sinkE: accepts upstream E-channel requests and forwards them to the MSHRs
  • sinkX: accepts CMO requests from the upstream control port and converts them into a req for the MSHRs
  • sourceA: takes commands from the MSHRs and issues requests on the downstream A channel
  • sinkB: absent; the inclusive cache does not support being a middle-level cache, so there is no sinkB
  • sourceC: takes commands from the MSHRs, reads data from the BankedStore and issues requests on the downstream C channel
  • sinkD: accepts commands and data from the downstream D channel; it writes the data into the BankedStore and reports a command to the MSHRs
  • sourceE: takes commands from the MSHRs and forwards them to the downstream E channel
  • sourceX: forwards responses from the MSHRs to the control port
  • BankedStore: the data part of the cache
  • Directory: holds the directory (tag and state) structure
  • Requests: buffers pending requests; internally a ListBuffer with 3 × (number of MSHRs) queues, one A/B/C queue per MSHR
  • MSHR: handles one req at a time; essentially one big state machine
  • scheduler: the surrounding logic that schedules reqs between all of the above

ListBuffer

Let's first look at a special data structure used in the inclusive cache, the ListBuffer. In short, it is a buffer used to hold data. Why is it called a ListBuffer? Look at the figure below.

This ListBuffer has 8 entries in total that can hold data, and 2 queues, each of which is an independent linked list. In the initial state all entries are unused, i.e. empty. A user can claim an entry and link it onto queue 0. Suppose 3 entries are claimed for queue 0 and 2 entries for queue 1; the structure then looks like the figure below.

In other words, a ListBuffer has entries storage slots for data and queues independent queues for ordering; all the queues share the same entries slots, while each queue keeps its own order.

case class ListBufferParameters[T <: Data](gen: T, queues: Int, entries: Int, bypass: Boolean)
{
val queueBits = log2Up(queues)
val entryBits = log2Up(entries)
}

class ListBufferPush[T <: Data](params: ListBufferParameters[T]) extends GenericParameterizedBundle(params)
{
val index = UInt(width = params.queueBits)
val data = params.gen.asOutput
}

With that picture, the parameters in the code above are easy to read:

  • T: the type of data to be stored, passed in as a parameter
  • queues: how many independently ordered queues there are
  • entries: how many entries there are for storing data
  • index: which queue a pushed element should be appended to

class ListBuffer[T <: Data](params: ListBufferParameters[T]) extends Module
{
val io = new Bundle {
// input: the data being pushed
// ready: driven low when there are no free entries left
// push is visible on the same cycle; flow queues
val push = Decoupled(new ListBufferPush(params)).flip
// indicates which queues currently hold data
val valid = UInt(width = params.queues)
// request to pop from a given queue
val pop = Valid(UInt(width = params.queueBits)).flip
// the popped data, valid in the same cycle as the pop request; the data sits in registers and head already points at it, so it is only a mux selection
val data = params.gen.asOutput
}
val valid = RegInit(UInt(0, width=params.queues))
val head = Mem(params.queues, UInt(width = params.entryBits))
val tail = Mem(params.queues, UInt(width = params.entryBits))
val used = RegInit(UInt(0, width=params.entries))
val next = Mem(params.entries, UInt(width = params.entryBits))
val data = Mem(params.entries, params.gen)

The code above builds the ListBuffer's main data structures:

  • valid: whether each queue currently holds data
  • head: per-queue head pointer, holding the entry index of that queue's first element
  • tail: per-queue tail pointer, holding the entry index of that queue's last element
  • used: bitmap marking which entries are in use
  • data: the payload stored in every entry
  • next: the hardest one to grasp: for each entry, the index of the next entry in the same queue, i.e. the link array

Using the figure above as an example, assume 2 queues and 8 entries; the contents of each structure are shown below.

As the figure shows, head gives the index of queue 0's first entry. Looking that index (0) up in next yields 1, so entry 1 follows entry 0; looking up 1 in next yields 2; 2 equals the value held in tail, so it is the last element. Walking the links this way reconstructs the linked-list structure shown in the figure above.

val freeOH = ~(leftOR(~used) << 1) & ~used
val freeIdx = OHToUInt(freeOH)

leftOR: scan from the LSB upward until the first 1 is found; that bit and every bit above it become 1. For example:

  • b0101 -> b1111

  • b1010 -> b1110

  • b1000 -> b1000

OHToUInt: returns the bit position of the sole high bit of the input bit vector. It assumes exactly one bit is high; the result is undefined otherwise.

  • b0100 -> 2.U

Working those two lines through an example:

used = 0101: ~used = 1010, leftOR(~used) = 1110, leftOR(~used)<<1 = 1100, ~(leftOR(~used)<<1) = 0011, freeOH = ~(leftOR(~used)<<1) & ~used = 0010, OHToUInt(freeOH) = 1
used = 1010: ~used = 0101, leftOR(~used) = 1111, leftOR(~used)<<1 = 1110, ~(leftOR(~used)<<1) = 0001, freeOH = ~(leftOR(~used)<<1) & ~used = 0001, OHToUInt(freeOH) = 0

So freeIdx is simply the index of the lowest 0 bit in used, i.e. it picks a free slot.
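
The two utility functions are easy to model in plain Scala. Below is a small software sketch (not the Chisel hardware; the bit width is fixed at 4 just for this example) that reproduces the table above:

// Plain-Scala model of the free-entry selection above (4-bit bitmaps for illustration only).
object FreeIdxModel {
  val width = 4
  val mask = (1 << width) - 1

  // leftOR: scan from the LSB upward; once a 1 is seen, every higher bit becomes 1.
  def leftOR(x: Int): Int = (0 until width).foldLeft((0, 0)) {
    case ((acc, seen), i) =>
      val s = seen | ((x >> i) & 1)
      (acc | (s << i), s)
  }._1

  // OHToUInt: position of the single set bit (undefined if the input is not one-hot).
  def ohToUInt(x: Int): Int = Integer.numberOfTrailingZeros(x)

  def freeOH(used: Int): Int = ~(leftOR(~used & mask) << 1) & ~used & mask
  def freeIdx(used: Int): Int = ohToUInt(freeOH(used))

  def main(args: Array[String]): Unit = {
    println(freeIdx(Integer.parseInt("0101", 2)))  // 1: lowest cleared bit of used
    println(freeIdx(Integer.parseInt("1010", 2)))  // 0
  }
}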

  val valid_set = Wire(init = UInt(0, width=params.queues))
val valid_clr = Wire(init = UInt(0, width=params.queues))
val used_set = Wire(init = UInt(0, width=params.entries))
val used_clr = Wire(init = UInt(0, width=params.entries))

val push_tail = tail.read(io.push.bits.index)
val push_valid = valid(io.push.bits.index)

// push.ready is high as long as some bit of used is 0; when used is all ones there is no free entry and ready goes low
io.push.ready := !used.andR()
// fire() is simply ready && valid
when (io.push.fire()) {
// valid_set is UIntToOH(io.push.bits.index), a one-hot mask of the queue being pushed; if index is 1 then valid_set is ...b10
valid_set := UIntToOH(io.push.bits.index, params.queues)
// used_set is freeOH, i.e. which entry gets consumed by this push
used_set := freeOH
// write io.push.bits.data into data at position freeIdx
data.write(freeIdx, io.push.bits.data)
when (push_valid) {
// if the queue was already valid, write freeIdx into next at position push_tail (link behind the old tail)
next.write(push_tail, freeIdx)
} .otherwise {
// if the queue was not valid yet, write freeIdx into head at position io.push.bits.index (the new entry becomes the head)
head.write(io.push.bits.index, freeIdx)
}
// either way, write freeIdx into tail at position io.push.bits.index; tail maps the push index to the newest entry index
tail.write(io.push.bits.index, freeIdx)
}

val pop_head = head.read(io.pop.bits)
val pop_valid = valid(io.pop.bits)

// With params.bypass = 0, io.data is simply data read at pop_head and io.valid is simply valid.
// head holds the mapping from queue index to entry index: pop.bits looks up head to get the entry index, which then looks up data to produce the output.
// Bypass push data to the peek port
io.data := (if (!params.bypass) data.read(pop_head) else Mux(!pop_valid, io.push.bits.data, data.read(pop_head)))
io.valid := (if (!params.bypass) valid else (valid | valid_set))

// check: popping a queue that is not valid is an error
// It is an error to pop something that is not valid
assert (!io.pop.fire() || (io.valid)(io.pop.bits))

when (io.pop.fire()) {
used_clr := UIntToOH(pop_head, params.entries)
when (pop_head === tail.read(io.pop.bits)) {
valid_clr := UIntToOH(io.pop.bits, params.queues)
}
head.write(io.pop.bits, Mux(io.push.fire() && push_valid && push_tail === pop_head, freeIdx, next.read(pop_head)))
}

// with params.bypass = 0 this always runs: used becomes (used & ~used_clr) | used_set, i.e. the updated occupancy (same for valid)
// Empty bypass changes no state
when (Bool(!params.bypass) || !io.pop.valid || pop_valid) {
used := (used & ~used_clr) | used_set
valid := (valid & ~valid_clr) | valid_set
}
}

Finally, let's see how head, tail and next tie together, with a walkthrough.

  1. Initially used is 0, so the lowest free entry is freeIdx = 0, and valid is all zeros.
  2. A push request arrives with index 1 and data (source) 40.
  3. Next cycle: data(0) is written with 40 (data.write(freeIdx, ...)). Since valid(1) is 0, push_valid is 0, so the otherwise branch runs and head(1) is written with 0. tail(1) is also written with 0, and valid(1) is set to 1, meaning queue 1 now holds one element.
  4. Another push arrives, again with index 1, data (source) 80.
  5. Next cycle: data(1) is written with 80, because freeIdx is now 1. Since valid(1) is already 1, push_valid is 1, so the next.write(push_tail, freeIdx) branch runs: next(0) is written with 1, where 0 was read out of tail(1) (arrow a in the figure) and 1 is the current freeIdx. So next records the link from the previous entry to the newly allocated one. tail(1) is then updated to 1.
  6. Another push is handled in the same way.
  7. A pop request arrives with index 1. head(1) directly gives the entry index, 0, and data(0) gives 40, so the output data is produced.
  8. head also has to be updated: starting from entry index 0 (arrow b), next(0) gives the following entry index (arrow c), which is 1, and that 1 is written into head(1), the pop index. This completes the bookkeeping.
  9. On the next pop with index 1, head(1) now points at entry 1, so data(1) = 80 is returned.

To summarize the three structures:

head is indexed by the push/pop queue index; it holds an entry index, and that entry index in turn addresses the stored data.

tail is indexed by the push/pop queue index; it holds the entry index of that queue's last element, which is how a pop can tell that the queue for that index has been drained.

next is indexed by an entry index; it holds the index of the next entry in the same queue. On every pop, the next entry's index is written back into head, ready for the following pop.

In other words, the module keeps one linked list per index to store data. All data shares the single data memory; head, tail and next encode the mapping from queue index to data index.
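
To tie it together, here is a small software model of the ListBuffer bookkeeping in plain Scala (a sketch of the push/pop behaviour only; hardware details such as the same-cycle bypass option are ignored). Pushing 40 and then 80 into queue 1 and popping twice returns 40 then 80, matching the walkthrough above:

// Software sketch of the ListBuffer: entries shared slots, queues independent FIFOs.
class ListBufferModel[T](queues: Int, entries: Int) {
  private val data = new Array[Any](entries)
  private val next = new Array[Int](entries)
  private val head = new Array[Int](queues)
  private val tail = new Array[Int](queues)
  private var used  = 0L                      // bitmap of occupied entries
  private var valid = 0L                      // bitmap of non-empty queues

  private def freeIdx: Int = (0 until entries).find(i => (used & (1L << i)) == 0).get

  // returns false when there is no free entry (push.ready low)
  def push(q: Int, d: T): Boolean = {
    if (java.lang.Long.bitCount(used) == entries) return false
    val idx = freeIdx
    data(idx) = d
    if ((valid & (1L << q)) != 0) next(tail(q)) = idx   // link behind the old tail
    else head(q) = idx                                  // first element becomes the head
    tail(q) = idx
    used |= 1L << idx
    valid |= 1L << q
    true
  }

  def pop(q: Int): T = {
    require((valid & (1L << q)) != 0, "popping an empty queue")
    val idx = head(q)
    used &= ~(1L << idx)
    if (idx == tail(q)) valid &= ~(1L << q)             // that was the last element
    else head(q) = next(idx)                            // follow the link for the next pop
    data(idx).asInstanceOf[T]
  }
}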

sinkA

sinkA does the following few things:

  • accepts requests coming in on the upstream A channel and converts them into a req for the MSHRs (the address split used for this is sketched below)
  • if the request carries data, stores that data into the putBuffer
  • serves read requests from sourceD, handing the buffered data back to sourceD
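
Before reading the code, note that params.parseAddress (used below) just slices the physical address into offset / set / tag. A minimal sketch of that split, with hypothetical geometry (64-byte blocks, 1024 sets; the real widths come from InclusiveCacheParameters):

object ParseAddressSketch {
  val blockBytes = 64
  val sets       = 1024
  val offsetBits = Integer.numberOfTrailingZeros(blockBytes)   // 6
  val setBits    = Integer.numberOfTrailingZeros(sets)         // 10

  // returns (tag, set, offset)
  def parse(address: Long): (Long, Long, Long) = {
    val offset = address & (blockBytes - 1)
    val set    = (address >> offsetBits) & (sets - 1)
    val tag    = address >> (offsetBits + setBits)
    (tag, set, offset)
  }

  def main(args: Array[String]): Unit =
    println(parse(0x80012345L))   // (32769,141,5) = (0x8001, 0x8d, 0x5)
}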

// contents of one putBuffer entry inside sinkA
class PutBufferAEntry(params: InclusiveCacheParameters) extends InclusiveCacheBundle(params)
{
val data = UInt(width = params.inner.bundle.dataBits)
val mask = UInt(width = params.inner.bundle.dataBits/8)
val corrupt = Bool()
}

class PutBufferPop(params: InclusiveCacheParameters) extends InclusiveCacheBundle(params)
{
// queue index within the putBuffer
val index = UInt(width = params.putBits)
// whether this is the last beat; computed by sourceD, so it is an input from sinkA's point of view
val last = Bool()
}

class SinkA(params: InclusiveCacheParameters) extends Module
{
val io = new Bundle {
// request forwarded to the MSHRs (output)
val req = Decoupled(new FullRequest(params))
// upstream A-channel request (input)
val a = Decoupled(new TLBundleA(params.inner.bundle)).flip
// for use by SourceD:
// read request coming from sourceD, carrying the queue index to read (input)
val pb_pop = Decoupled(new PutBufferPop(params)).flip
// output: data handed to sourceD, valid in the same cycle; sourceD will write it into the BankedStore
val pb_beat = new PutBufferAEntry(params)
}

// No restrictions on the type of buffer
val a = params.micro.innerBuf.a(io.a)

// the putBuffer is itself a ListBuffer; curiously a separate lists bitmap is also maintained outside, which largely mirrors the buffer's own valid
val putbuffer = Module(new ListBuffer(ListBufferParameters(new PutBufferAEntry(params), params.putLists, params.putBeats, false)))
val lists = RegInit(UInt(0, width = params.putLists))

val lists_set = Wire(init = UInt(0, width = params.putLists))
val lists_clr = Wire(init = UInt(0, width = params.putLists))
lists := (lists | lists_set) & ~lists_clr

val free = !lists.andR()
// find-first-zero, i.e. findFirstOne on the inverted bitmap
val freeOH = ~(leftOR(~lists) << 1) & ~lists
val freeIdx = OHToUInt(freeOH)

val first = params.inner.first(a)
val hasData = params.inner.hasData(a.bits)

// We need to split the A input to three places:
// If it is the first beat, it must go to req
// If it has Data, it must go to the putbuffer
// If it has Data AND is the first beat, it must claim a list

val req_block = first && !io.req.ready
val buf_block = hasData && !putbuffer.io.push.ready
val set_block = hasData && first && !free

params.ccover(a.valid && req_block, "SINKA_REQ_STALL", "No MSHR available to sink request")
params.ccover(a.valid && buf_block, "SINKA_BUF_STALL", "No space in putbuffer for beat")
params.ccover(a.valid && set_block, "SINKA_SET_STALL", "No space in putbuffer for request")

a.ready := !req_block && !buf_block && !set_block
io.req.valid := a.valid && first && !buf_block && !set_block
// only beats that carry data (puts) enter the putbuffer
putbuffer.io.push.valid := a.valid && hasData && !req_block && !set_block
when (a.valid && first && hasData && !req_block && !buf_block) { lists_set := freeOH }

val (tag, set, offset) = params.parseAddress(a.bits.address)
// all beats of a burst share one put list (index), so freeIdx is latched on the first beat; bursts do not interleave on A, so this is safe
val put = Mux(first, freeIdx, RegEnable(freeIdx, first))

io.req.bits.prio := Vec(UInt(1, width=3).asBools)
io.req.bits.control:= Bool(false)
io.req.bits.opcode := a.bits.opcode
io.req.bits.param := a.bits.param
io.req.bits.size := a.bits.size
io.req.bits.source := a.bits.source
io.req.bits.offset := offset
io.req.bits.set := set
io.req.bits.tag := tag
io.req.bits.put := put

putbuffer.io.push.bits.index := put
putbuffer.io.push.bits.data.data := a.bits.data
putbuffer.io.push.bits.data.mask := a.bits.mask
putbuffer.io.push.bits.data.corrupt := a.bits.corrupt

// Grant access to pop the data
putbuffer.io.pop.bits := io.pb_pop.bits.index
putbuffer.io.pop.valid := io.pb_pop.fire()
io.pb_pop.ready := putbuffer.io.valid(io.pb_pop.bits.index)
io.pb_beat := putbuffer.io.data

when (io.pb_pop.fire() && io.pb_pop.bits.last) {
lists_clr := UIntToOH(io.pb_pop.bits.index, params.putLists)
}
}

sourceB

sourceB's job is fairly simple: it only has to turn requests coming from the MSHRs into Probes on the upstream B channel.

Its only piece of state is the remain register, which records which clients the current request still has to probe: remain is set when the req arrives, and each time a Probe goes out the corresponding bit is cleared.
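
The way remain serializes a multi-client probe can be modelled in a few lines of plain Scala (an illustrative sketch only):

// Software sketch of SourceB's remain bookkeeping: one probe per accepted B beat, lowest client first.
object ProbeSerializer {
  private var remain = 0                             // clients still to be probed

  def reqReady: Boolean = remain == 0                // busy while any bit is still set

  def acceptRequest(clients: Int): Unit = {
    require(reqReady && clients != 0)
    remain = clients
  }

  // Called when the B channel accepts a beat; returns the client just probed.
  def fireProbe(): Int = {
    val next = remain & -remain                      // lowest set bit, same as ~(leftOR(todo) << 1) & todo
    remain &= ~next                                  // cleared only once the probe is accepted
    Integer.numberOfTrailingZeros(next)
  }
}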

class SourceBRequest(params: InclusiveCacheParameters) extends InclusiveCacheBundle(params)
{
val param = UInt(width = 3)
val tag = UInt(width = params.tagBits)
val set = UInt(width = params.setBits)
val clients = UInt(width = params.clientBits)
}

class SourceB(params: InclusiveCacheParameters) extends Module
{
val io = new Bundle {
val req = Decoupled(new SourceBRequest(params)).flip
val b = Decoupled(new TLBundleB(params.inner.bundle))
}

if (params.firstLevel) {
// Tie off unused ports
io.req.ready := Bool(true)
io.b.valid := Bool(false)
} else {
// records which clients still need to be probed
val remain = RegInit(UInt(0, width=params.clientBits))
val remain_set = Wire(init = UInt(0, width=params.clientBits))
val remain_clr = Wire(init = UInt(0, width=params.clientBits))
remain := (remain | remain_set) & ~remain_clr

// as long as the current probe set is not finished, no new req can be accepted
val busy = remain.orR()
val todo = Mux(busy, remain, io.req.bits.clients)
// pick the next client to probe (the lowest set bit of todo)
val next = ~(leftOR(todo) << 1) & todo

if (params.clientBits > 1) {
params.ccover(PopCount(remain) > UInt(1), "SOURCEB_MULTI_PROBE", "Had to probe more than one client")
}

assert (!io.req.valid || io.req.bits.clients =/= UInt(0))

io.req.ready := !busy
when (io.req.fire()) { remain_set := io.req.bits.clients }

// No restrictions on the type of buffer used here
val b = Wire(io.b)
io.b <> params.micro.innerBuf.b(b)

// probes are streamed out back to back
b.valid := busy || io.req.valid
// only when a probe is accepted is the bit that next points at cleared from remain, moving on to the next client
when (b.fire()) { remain_clr := next }
params.ccover(b.valid && !b.ready, "SOURCEB_STALL", "Backpressured when issuing a probe")

val tag = Mux(!busy, io.req.bits.tag, RegEnable(io.req.bits.tag, io.req.fire()))
val set = Mux(!busy, io.req.bits.set, RegEnable(io.req.bits.set, io.req.fire()))
val param = Mux(!busy, io.req.bits.param, RegEnable(io.req.bits.param, io.req.fire()))

b.bits.opcode := TLMessages.Probe
b.bits.param := param
b.bits.size := UInt(params.offsetBits)
b.bits.source := params.clientSource(next)
b.bits.address := params.expandAddress(tag, set, UInt(0))
b.bits.mask := ~UInt(0, width = params.inner.manager.beatBytes)
b.bits.data := UInt(0)
}
}

sinkC

sinkC takes different paths depending on the type of the incoming C-channel request: a ProbeAck(Data) goes through the left-hand flow in the figure, a Release(Data) through the right-hand flow.

First the ProbeAck(Data) flow on the left:

  • send a resp to the MSHRs
  • look up in the MSHRs which way this set maps to
  • if there is data, write it into the BankedStore. It does not go through the putBuffer: this request is already the oldest and already owns an MSHR, so there is nothing to queue for.

Then the Release(Data) flow on the right:

  • send a req to the scheduler
  • Release data does go into the putBuffer, because the request still has to queue
  • serve sourceD's read requests, handing the data to sourceD so it can write it into the BankedStore

class SinkC(params: InclusiveCacheParameters) extends Module
{
val io = new Bundle {
val req = Decoupled(new FullRequest(params)) // Release
val resp = Valid(new SinkCResponse(params)) // ProbeAck
val c = Decoupled(new TLBundleC(params.inner.bundle)).flip
// Find 'way' via MSHR CAM lookup
val set = UInt(width = params.setBits)
val way = UInt(width = params.wayBits).flip
// ProbeAck write-back
val bs_adr = Decoupled(new BankedStoreInnerAddress(params))
val bs_dat = new BankedStoreInnerPoison(params)
// SourceD sideband
val rel_pop = Decoupled(new PutBufferPop(params)).flip
val rel_beat = new PutBufferCEntry(params)
}

if (params.firstLevel) {
// Tie off unused ports
io.req.valid := Bool(false)
io.resp.valid := Bool(false)
io.c.ready := Bool(true)
io.set := UInt(0)
io.bs_adr.valid := Bool(false)
io.rel_pop.ready := Bool(true)
} else {
// No restrictions on the type of buffer
val c = params.micro.innerBuf.c(io.c)

val (tag, set, offset) = params.parseAddress(c.bits.address)
val (first, last, _, beat) = params.inner.count(c)
val hasData = params.inner.hasData(c.bits)
val raw_resp = c.bits.opcode === TLMessages.ProbeAck || c.bits.opcode === TLMessages.ProbeAckData
val resp = Mux(c.valid, raw_resp, RegEnable(raw_resp, c.valid))

// Handling of C is broken into two cases:
// ProbeAck
// if hasData, must be written to BankedStore
// if last beat, trigger resp
// Release
// if first beat, trigger req
// if hasData, go to putBuffer
// if hasData && first beat, must claim a list

assert (!(c.valid && c.bits.corrupt), "Data poisoning unavailable")

io.set := Mux(c.valid, set, RegEnable(set, c.valid)) // finds us the way

// the data banks may be laid out far away, so the address goes through a one-entry Queue (one extra cycle) on its way to the BankedStore
// Cut path from inner C to the BankedStore SRAM setup
// ... this makes it easier to layout the L2 data banks far away
val bs_adr = Wire(io.bs_adr)
io.bs_adr <> Queue(bs_adr, 1, pipe=true)
io.bs_dat.data := RegEnable(c.bits.data, bs_adr.fire())
bs_adr.valid := resp && (!first || (c.valid && hasData))
// noop means "hold the slot": once a burst has started, bs_adr.valid stays high even on cycles where no beat arrived, and the RAM must not actually be written then, so noop = !c.valid marks those cycles
bs_adr.bits.noop := !c.valid
bs_adr.bits.way := io.way
bs_adr.bits.set := io.set
bs_adr.bits.beat := Mux(c.valid, beat, RegEnable(beat + bs_adr.ready.asUInt, c.valid))
bs_adr.bits.mask := ~UInt(0, width = params.innerMaskBits)
params.ccover(bs_adr.valid && !bs_adr.ready, "SINKC_SRAM_STALL", "Data SRAM busy")

io.resp.valid := resp && c.valid && (first || last) && (!hasData || bs_adr.ready)
io.resp.bits.last := last
io.resp.bits.set := set
io.resp.bits.tag := tag
io.resp.bits.source := c.bits.source
io.resp.bits.param := c.bits.param
io.resp.bits.data := hasData

val putbuffer = Module(new ListBuffer(ListBufferParameters(new PutBufferCEntry(params), params.relLists, params.relBeats, false)))
val lists = RegInit(UInt(0, width = params.relLists))

val lists_set = Wire(init = UInt(0, width = params.relLists))
val lists_clr = Wire(init = UInt(0, width = params.relLists))
lists := (lists | lists_set) & ~lists_clr

val free = !lists.andR()
val freeOH = ~(leftOR(~lists) << 1) & ~lists
val freeIdx = OHToUInt(freeOH)

val req_block = first && !io.req.ready
val buf_block = hasData && !putbuffer.io.push.ready
val set_block = hasData && first && !free

params.ccover(c.valid && !raw_resp && req_block, "SINKC_REQ_STALL", "No MSHR available to sink request")
params.ccover(c.valid && !raw_resp && buf_block, "SINKC_BUF_STALL", "No space in putbuffer for beat")
params.ccover(c.valid && !raw_resp && set_block, "SINKC_SET_STALL", "No space in putbuffer for request")

c.ready := Mux(raw_resp, !hasData || bs_adr.ready, !req_block && !buf_block && !set_block)

io.req.valid := !resp && c.valid && first && !buf_block && !set_block
putbuffer.io.push.valid := !resp && c.valid && hasData && !req_block && !set_block
when (!resp && c.valid && first && hasData && !req_block && !buf_block) { lists_set := freeOH }

val put = Mux(first, freeIdx, RegEnable(freeIdx, first))

io.req.bits.prio := Vec(UInt(4, width=3).asBools)
io.req.bits.control:= Bool(false)
io.req.bits.opcode := c.bits.opcode
io.req.bits.param := c.bits.param
io.req.bits.size := c.bits.size
io.req.bits.source := c.bits.source
io.req.bits.offset := offset
io.req.bits.set := set
io.req.bits.tag := tag
io.req.bits.put := put

putbuffer.io.push.bits.index := put
putbuffer.io.push.bits.data.data := c.bits.data
putbuffer.io.push.bits.data.corrupt := c.bits.corrupt

// Grant access to pop the data
putbuffer.io.pop.bits := io.rel_pop.bits.index
putbuffer.io.pop.valid := io.rel_pop.fire()
io.rel_pop.ready := putbuffer.io.valid(io.rel_pop.bits.index)
io.rel_beat := putbuffer.io.data

when (io.rel_pop.fire() && io.rel_pop.bits.last) {
lists_clr := UIntToOH(io.rel_pop.bits.index, params.relLists)
}
}
}

sourceD

sourceD does the following:

  • accepts requests from the MSHRs
  • maintains a pipeline, 7 stages in total
  • issues read requests to the BankedStore
  • issues read requests to sinkA / sinkC (their put buffers)
  • responds on the upstream D channel
  • issues write requests to the BankedStore
  • keeps three extra stages of retired data for the bypass network (a simplified model of that bypass is sketched right after this list)
  • answers the hazard queries coming from sourceC / sinkD
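
The byte-lane bypass that the later stages implement (the chunk / chop / bypass helpers in the code) boils down to merging newer write data into SRAM read data one write-granule at a time. A plain-Scala sketch, assuming a hypothetical 32-byte beat and 8-byte write granule:

// Merge bypassed data into the SRAM readout, one writeBytes-wide chunk per select bit.
object BypassMergeSketch {
  val beatBytes  = 32                    // hypothetical inner beat width
  val writeBytes = 8                     // hypothetical write/ECC granule
  val chunks     = beatBytes / writeBytes

  // sel bit i == 1 -> take chunk i from x (newer, bypassed data), else from y (SRAM readout)
  def bypass(sel: Int, x: BigInt, y: BigInt): BigInt = {
    val chunkMask = (BigInt(1) << (writeBytes * 8)) - 1
    (0 until chunks).map { i =>
      val src = if (((sel >> i) & 1) == 1) x else y
      ((src >> (i * writeBytes * 8)) & chunkMask) << (i * writeBytes * 8)
    }.reduce(_ | _)
  }
}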

class SourceD(params: InclusiveCacheParameters) extends Module
{
val io = new Bundle {
val req = Decoupled(new SourceDRequest(params)).flip
val d = Decoupled(new TLBundleD(params.inner.bundle))
// Put data from SinkA
val pb_pop = Decoupled(new PutBufferPop(params))
val pb_beat = new PutBufferAEntry(params).flip
// Release data from SinkC
val rel_pop = Decoupled(new PutBufferPop(params))
val rel_beat = new PutBufferCEntry(params).flip
// Access to the BankedStore
val bs_radr = Decoupled(new BankedStoreInnerAddress(params))
val bs_rdat = new BankedStoreInnerDecoded(params).flip
val bs_wadr = Decoupled(new BankedStoreInnerAddress(params))
val bs_wdat = new BankedStoreInnerPoison(params)
// Is it safe to evict/replace this way?
val evict_req = new SourceDHazard(params).flip
val evict_safe = Bool()
val grant_req = new SourceDHazard(params).flip
val grant_safe = Bool()
}

val beatBytes = params.inner.manager.beatBytes
val writeBytes = params.micro.writeBytes

val s1_valid = Wire(Bool())
val s2_valid = Wire(Bool())
val s3_valid = Wire(Bool())
val s2_ready = Wire(Bool())
val s3_ready = Wire(Bool())
val s4_ready = Wire(Bool())

////////////////////////////////////// STAGE 1 //////////////////////////////////////
// Reform the request beats

val busy = RegInit(Bool(false))
// s1_block_r: the SRAM read fired last cycle but s2_ready was low, so the request is stuck in s1; since the BankedStore has already been read, this flag keeps the read valid low to avoid reading it again
val s1_block_r = RegInit(Bool(false))
// s1_counter: beat counter for the burst
val s1_counter = RegInit(UInt(0, width = params.innerBeatBits))
val s1_req_reg = RegEnable(io.req.bits, !busy && io.req.valid)
val s1_req = Mux(!busy, io.req.bits, s1_req_reg)
// s1_x_bypass: which byte lanes of s1 can be bypassed from s2/s3/s4
val s1_x_bypass = Wire(UInt(width = beatBytes/writeBytes)) // might go from high=>low during stall
// s1_latch_bypass: last cycle s1 was either empty or drained into s2, i.e. s1 holds a fresh beat now and the bypass mask has to be recomputed
val s1_latch_bypass = RegNext(!(busy || io.req.valid) || s2_ready)
// s1_bypass: if the pipeline moved, use the freshly computed mask, otherwise keep the registered one
val s1_bypass = Mux(s1_latch_bypass, s1_x_bypass, RegEnable(s1_x_bypass, s1_latch_bypass))
// s1_mask: read mask for the BankedStore; bypassed lanes do not need to be read
val s1_mask = MaskGen(s1_req.offset, s1_req.size, beatBytes, writeBytes) & ~s1_bypass
val s1_grant = (s1_req.opcode === AcquireBlock && s1_req.param === BtoT) || s1_req.opcode === AcquirePerm
// s1_need_r: whether the BankedStore needs to be read at all
val s1_need_r = s1_mask.orR && s1_req.prio(0) && s1_req.opcode =/= Hint && !s1_grant &&
(s1_req.opcode =/= PutFullData || s1_req.size < UInt(log2Ceil(writeBytes)))
// s1_valid_r: valid for the BankedStore read
val s1_valid_r = (busy || io.req.valid) && s1_need_r && !s1_block_r
// s1_need_pb: whether put-buffer data is needed, either from sinkA or from sinkC
val s1_need_pb = Mux(s1_req.prio(0), !s1_req.opcode(2), s1_req.opcode(0)) // hasData
// s1_single: a request that finishes in one beat and carries no data
val s1_single = Mux(s1_req.prio(0), s1_req.opcode === Hint || s1_grant, s1_req.opcode === Release)
// s1_retires: keep the data around for a few cycles after s3 so it can be bypassed
val s1_retires = !s1_single // retire all operations with data in s3 for bypass (saves energy)
// Alternatively: val s1_retires = s1_need_pb // retire only updates for bypass (less backpressure from WB)
// s1_beats1: index of the last beat, e.g. 3 for a 4-beat burst
val s1_beats1 = Mux(s1_single, UInt(0), UIntToOH1(s1_req.size, log2Up(params.cache.blockBytes)) >> log2Ceil(beatBytes))
// s1_beat: current beat number; the offset term allows the access to start from the middle of a block
val s1_beat = (s1_req.offset >> log2Ceil(beatBytes)) | s1_counter
val s1_last = s1_counter === s1_beats1
val s1_first = s1_counter === UInt(0)

params.ccover(s1_block_r, "SOURCED_1_SRAM_HOLD", "SRAM read-out successful, but stalled by stage 2")
params.ccover(!s1_latch_bypass, "SOURCED_1_BYPASS_HOLD", "Bypass match successful, but stalled by stage 2")
params.ccover((busy || io.req.valid) && !s1_need_r, "SOURCED_1_NO_MODIFY", "Transaction servicable without SRAM")

io.bs_radr.valid := s1_valid_r
io.bs_radr.bits.noop := Bool(false)
io.bs_radr.bits.way := s1_req.way
io.bs_radr.bits.set := s1_req.set
io.bs_radr.bits.beat := s1_beat
io.bs_radr.bits.mask := s1_mask

params.ccover(io.bs_radr.valid && !io.bs_radr.ready, "SOURCED_1_READ_STALL", "Data readout stalled")

// Make a queue to catch BS readout during stalls
val queue = Module(new Queue(io.bs_rdat, 3, flow=true))
queue.io.enq.valid := RegNext(RegNext(io.bs_radr.fire()))
queue.io.enq.bits := io.bs_rdat
assert (!queue.io.enq.valid || queue.io.enq.ready)

params.ccover(!queue.io.enq.ready, "SOURCED_1_QUEUE_FULL", "Filled SRAM skidpad queue completely")

when (io.bs_radr.fire()) { s1_block_r := Bool(true) }
// no ready check here: the scheduler only presents a valid request when SourceD is ready
when (io.req.valid) { busy := Bool(true) }
when (s1_valid && s2_ready) {
s1_counter := s1_counter + UInt(1)
s1_block_r := Bool(false)
when (s1_last) {
s1_counter := UInt(0)
busy := Bool(false)
}
}

params.ccover(s1_valid && !s2_ready, "SOURCED_1_STALL", "Stage 1 pipeline blocked")

io.req.ready := !busy
s1_valid := (busy || io.req.valid) && (!s1_valid_r || io.bs_radr.ready)

////////////////////////////////////// STAGE 2 //////////////////////////////////////
// Fetch the request data

val s2_latch = s1_valid && s2_ready
val s2_full = RegInit(Bool(false))
val s2_valid_pb = RegInit(Bool(false))
val s2_beat = RegEnable(s1_beat, s2_latch)
val s2_bypass = RegEnable(s1_bypass, s2_latch)
val s2_req = RegEnable(s1_req, s2_latch)
val s2_last = RegEnable(s1_last, s2_latch)
val s2_need_r = RegEnable(s1_need_r, s2_latch)
val s2_need_pb = RegEnable(s1_need_pb, s2_latch)
val s2_retires = RegEnable(s1_retires, s2_latch)
// s2_need_d: this beat must produce a D response; reads (!s1_need_pb) respond on every beat, while writes and single-beat requests respond only on the first beat (s1_first)
val s2_need_d = RegEnable(!s1_need_pb || s1_first, s2_latch)
val s2_pdata_raw = Wire(new PutBufferACEntry(params))
// the putBuffer is built from registers, so the popped data is available in the same cycle and s2_pdata_raw can capture it immediately
val s2_pdata = s2_pdata_raw holdUnless s2_valid_pb

s2_pdata_raw.data := Mux(s2_req.prio(0), io.pb_beat.data, io.rel_beat.data)
s2_pdata_raw.mask := Mux(s2_req.prio(0), io.pb_beat.mask, ~UInt(0, width = params.inner.manager.beatBytes))
s2_pdata_raw.corrupt := Mux(s2_req.prio(0), io.pb_beat.corrupt, io.rel_beat.corrupt)

io.pb_pop.valid := s2_valid_pb && s2_req.prio(0)
io.pb_pop.bits.index := s2_req.put
io.pb_pop.bits.last := s2_last
io.rel_pop.valid := s2_valid_pb && !s2_req.prio(0)
io.rel_pop.bits.index := s2_req.put
io.rel_pop.bits.last := s2_last

params.ccover(io.pb_pop.valid && !io.pb_pop.ready, "SOURCED_2_PUTA_STALL", "Channel A put buffer was not ready in time")
if (!params.firstLevel)
params.ccover(io.rel_pop.valid && !io.rel_pop.ready, "SOURCED_2_PUTC_STALL", "Channel C put buffer was not ready in time")

val pb_ready = Mux(s2_req.prio(0), io.pb_pop.ready, io.rel_pop.ready)
when (pb_ready) { s2_valid_pb := Bool(false) }
when (s2_valid && s3_ready) { s2_full := Bool(false) }
when (s2_latch) { s2_valid_pb := s1_need_pb }
when (s2_latch) { s2_full := Bool(true) }

params.ccover(s2_valid && !s3_ready, "SOURCED_2_STALL", "Stage 2 pipeline blocked")

s2_valid := s2_full && (!s2_valid_pb || pb_ready)
s2_ready := !s2_full || (s3_ready && (!s2_valid_pb || pb_ready))

////////////////////////////////////// STAGE 3 //////////////////////////////////////
// Send D response

val s3_latch = s2_valid && s3_ready
val s3_full = RegInit(Bool(false))
val s3_valid_d = RegInit(Bool(false))
val s3_beat = RegEnable(s2_beat, s3_latch)
val s3_bypass = RegEnable(s2_bypass, s3_latch)
val s3_req = RegEnable(s2_req, s3_latch)
val s3_adjusted_opcode = Mux(s3_req.bad, Get, s3_req.opcode) // kill update when denied
val s3_last = RegEnable(s2_last, s3_latch)
val s3_pdata = RegEnable(s2_pdata, s3_latch)
val s3_need_pb = RegEnable(s2_need_pb, s3_latch)
val s3_retires = RegEnable(s2_retires, s3_latch)
val s3_need_r = RegEnable(s2_need_r, s3_latch)
val s3_need_bs = s3_need_pb
val s3_acq = s3_req.opcode === AcquireBlock || s3_req.opcode === AcquirePerm

// Collect s3's data from either the BankedStore or bypass
// NOTE: we use the s3_bypass passed down from s1_bypass, because s2-s4 were guarded by the hazard checks and not stale
val s3_bypass_data = Wire(UInt())
// slice the data into writeBytes-wide chunks
def chunk(x: UInt): Seq[UInt] = Seq.tabulate(beatBytes/writeBytes) { i => x((i+1)*writeBytes*8-1, i*writeBytes*8) }
// slice the select mask one bit at a time
def chop (x: UInt): Seq[Bool] = Seq.tabulate(beatBytes/writeBytes) { i => x(i) }
def bypass(sel: UInt, x: UInt, y: UInt) =
(chop(sel) zip (chunk(x) zip chunk(y))) .map { case (s, (x, y)) => Mux(s, x, y) } .asUInt
// s3_rdata: stitches together the bypassed data and the partial BankedStore readout (the bypassed lanes were masked off when the read was issued).
// s1 only decides which lanes to bypass; the data itself is muxed in at s3. The s1 check looks at s2/s3/s4, and by the time the request reaches s3 those same writes sit in s4/s5/s6, so the pipeline keeps flowing.
val s3_rdata = bypass(s3_bypass, s3_bypass_data, queue.io.deq.bits.data)

// Lookup table for response codes
val grant = Mux(s3_req.param === BtoT, Grant, GrantData)
val resp_opcode = Vec(Seq(AccessAck, AccessAck, AccessAckData, AccessAckData, AccessAckData, HintAck, grant, Grant))

// No restrictions on the type of buffer used here
val d = Wire(io.d)
io.d <> params.micro.innerBuf.d(d)

// a multi-beat response asserts d.valid once per beat (one MSHR request expands into several D beats)
d.valid := s3_valid_d
d.bits.opcode := Mux(s3_req.prio(0), resp_opcode(s3_req.opcode), ReleaseAck)
d.bits.param := Mux(s3_req.prio(0) && s3_acq, Mux(s3_req.param =/= NtoB, toT, toB), UInt(0))
d.bits.size := s3_req.size
d.bits.source := s3_req.source
d.bits.sink := s3_req.sink
d.bits.denied := s3_req.bad
d.bits.data := s3_rdata
d.bits.corrupt := s3_req.bad && d.bits.opcode(0)

queue.io.deq.ready := s3_valid && s4_ready && s3_need_r
assert (!s3_full || !s3_need_r || queue.io.deq.valid)

when (d.ready) { s3_valid_d := Bool(false) }
when (s3_valid && s4_ready) { s3_full := Bool(false) }
when (s3_latch) { s3_valid_d := s2_need_d }
when (s3_latch) { s3_full := Bool(true) }

params.ccover(s3_valid && !s4_ready, "SOURCED_3_STALL", "Stage 3 pipeline blocked")

s3_valid := s3_full && (!s3_valid_d || d.ready)
s3_ready := !s3_full || (s4_ready && (!s3_valid_d || d.ready))
////////////////////////////////////// STAGE 4 //////////////////////////////////////
// Writeback updated data

val s4_latch = s3_valid && s3_retires && s4_ready
val s4_full = RegInit(Bool(false))
val s4_beat = RegEnable(s3_beat, s4_latch)
val s4_need_r = RegEnable(s3_need_r, s4_latch)
val s4_need_bs = RegEnable(s3_need_bs, s4_latch)
val s4_need_pb = RegEnable(s3_need_pb, s4_latch)
val s4_req = RegEnable(s3_req, s4_latch)
val s4_adjusted_opcode = RegEnable(s3_adjusted_opcode, s4_latch)
val s4_pdata = RegEnable(s3_pdata, s4_latch)
val s4_rdata = RegEnable(s3_rdata, s4_latch)

val atomics = Module(new Atomics(params.inner.bundle))
atomics.io.write := s4_req.prio(2)
atomics.io.a.opcode := s4_adjusted_opcode
atomics.io.a.param := s4_req.param
atomics.io.a.size := UInt(0)
atomics.io.a.source := UInt(0)
atomics.io.a.address := UInt(0)
atomics.io.a.mask := s4_pdata.mask
atomics.io.a.data := s4_pdata.data
atomics.io.data_in := s4_rdata

io.bs_wadr.valid := s4_full && s4_need_bs
io.bs_wadr.bits.noop := Bool(false)
io.bs_wadr.bits.way := s4_req.way
io.bs_wadr.bits.set := s4_req.set
io.bs_wadr.bits.beat := s4_beat
io.bs_wadr.bits.mask := Cat(s4_pdata.mask.asBools.grouped(writeBytes).map(_.reduce(_||_)).toList.reverse)
io.bs_wdat.data := atomics.io.data_out
assert (!(s4_full && s4_need_pb && s4_pdata.corrupt), "Data poisoning unsupported")

params.ccover(io.bs_wadr.valid && !io.bs_wadr.ready, "SOURCED_4_WRITEBACK_STALL", "Data writeback stalled")
params.ccover(s4_req.prio(0) && s4_req.opcode === ArithmeticData && s4_req.param === MIN, "SOURCED_4_ATOMIC_MIN", "Evaluated a signed minimum atomic")
params.ccover(s4_req.prio(0) && s4_req.opcode === ArithmeticData && s4_req.param === MAX, "SOURCED_4_ATOMIC_MAX", "Evaluated a signed maximum atomic")
params.ccover(s4_req.prio(0) && s4_req.opcode === ArithmeticData && s4_req.param === MINU, "SOURCED_4_ATOMIC_MINU", "Evaluated an unsigned minimum atomic")
params.ccover(s4_req.prio(0) && s4_req.opcode === ArithmeticData && s4_req.param === MAXU, "SOURCED_4_ATOMIC_MAXU", "Evaluated an unsigned minimum atomic")
params.ccover(s4_req.prio(0) && s4_req.opcode === ArithmeticData && s4_req.param === ADD, "SOURCED_4_ATOMIC_ADD", "Evaluated an addition atomic")
params.ccover(s4_req.prio(0) && s4_req.opcode === LogicalData && s4_req.param === XOR, "SOURCED_4_ATOMIC_XOR", "Evaluated a bitwise XOR atomic")
params.ccover(s4_req.prio(0) && s4_req.opcode === LogicalData && s4_req.param === OR, "SOURCED_4_ATOMIC_OR", "Evaluated a bitwise OR atomic")
params.ccover(s4_req.prio(0) && s4_req.opcode === LogicalData && s4_req.param === AND, "SOURCED_4_ATOMIC_AND", "Evaluated a bitwise AND atomic")
params.ccover(s4_req.prio(0) && s4_req.opcode === LogicalData && s4_req.param === SWAP, "SOURCED_4_ATOMIC_SWAP", "Evaluated a bitwise SWAP atomic")

when (io.bs_wadr.ready || !s4_need_bs) { s4_full := Bool(false) }
when (s4_latch) { s4_full := Bool(true) }

s4_ready := !s3_retires || !s4_full || io.bs_wadr.ready || !s4_need_bs

////////////////////////////////////// RETIRED //////////////////////////////////////

// Record for bypass the last three retired writebacks
// We need 3 slots to collect what was in s2, s3, s4 when the request was in s1
// ... you can't rely on s4 being full if bubbles got introduced between s1 and s2
val retire = s4_full && (io.bs_wadr.ready || !s4_need_bs)

val s5_req = RegEnable(s4_req, retire)
val s5_beat = RegEnable(s4_beat, retire)
val s5_dat = RegEnable(atomics.io.data_out, retire)

val s6_req = RegEnable(s5_req, retire)
val s6_beat = RegEnable(s5_beat, retire)
val s6_dat = RegEnable(s5_dat, retire)

val s7_dat = RegEnable(s6_dat, retire)

////////////////////////////////////// BYPASSS //////////////////////////////////////

// Manually retime this circuit to pull a register stage forward
val pre_s3_req = Mux(s3_latch, s2_req, s3_req)
val pre_s4_req = Mux(s4_latch, s3_req, s4_req)
val pre_s5_req = Mux(retire, s4_req, s5_req)
val pre_s6_req = Mux(retire, s5_req, s6_req)
val pre_s3_beat = Mux(s3_latch, s2_beat, s3_beat)
val pre_s4_beat = Mux(s4_latch, s3_beat, s4_beat)
val pre_s5_beat = Mux(retire, s4_beat, s5_beat)
val pre_s6_beat = Mux(retire, s5_beat, s6_beat)
val pre_s5_dat = Mux(retire, atomics.io.data_out, s5_dat)
val pre_s6_dat = Mux(retire, s5_dat, s6_dat)
val pre_s7_dat = Mux(retire, s6_dat, s7_dat)
val pre_s4_full = s4_latch || (!(io.bs_wadr.ready || !s4_need_bs) && s4_full)

val pre_s3_4_match = pre_s4_req.set === pre_s3_req.set && pre_s4_req.way === pre_s3_req.way && pre_s4_beat === pre_s3_beat && pre_s4_full
val pre_s3_5_match = pre_s5_req.set === pre_s3_req.set && pre_s5_req.way === pre_s3_req.way && pre_s5_beat === pre_s3_beat
val pre_s3_6_match = pre_s6_req.set === pre_s3_req.set && pre_s6_req.way === pre_s3_req.way && pre_s6_beat === pre_s3_beat

val pre_s3_4_bypass = Mux(pre_s3_4_match, MaskGen(pre_s4_req.offset, pre_s4_req.size, beatBytes, writeBytes), UInt(0))
val pre_s3_5_bypass = Mux(pre_s3_5_match, MaskGen(pre_s5_req.offset, pre_s5_req.size, beatBytes, writeBytes), UInt(0))
val pre_s3_6_bypass = Mux(pre_s3_6_match, MaskGen(pre_s6_req.offset, pre_s6_req.size, beatBytes, writeBytes), UInt(0))

s3_bypass_data :=
bypass(RegNext(pre_s3_4_bypass), atomics.io.data_out, RegNext(
bypass(pre_s3_5_bypass, pre_s5_dat,
bypass(pre_s3_6_bypass, pre_s6_dat,
pre_s7_dat))))

// Detect which parts of s1 will be bypassed from later pipeline stages (s1-s4)
// Note: we also bypass from reads ahead in the pipeline to save power
val s1_2_match = s2_req.set === s1_req.set && s2_req.way === s1_req.way && s2_beat === s1_beat && s2_full && s2_retires
val s1_3_match = s3_req.set === s1_req.set && s3_req.way === s1_req.way && s3_beat === s1_beat && s3_full && s3_retires
val s1_4_match = s4_req.set === s1_req.set && s4_req.way === s1_req.way && s4_beat === s1_beat && s4_full

for (i <- 0 until 8) {
val cover = UInt(i)
val s2 = s1_2_match === cover(0)
val s3 = s1_3_match === cover(1)
val s4 = s1_4_match === cover(2)
params.ccover(io.req.valid && s2 && s3 && s4, "SOURCED_BYPASS_CASE_" + i, "Bypass data from all subsets of pipeline stages")
}

val s1_2_bypass = Mux(s1_2_match, MaskGen(s2_req.offset, s2_req.size, beatBytes, writeBytes), UInt(0))
val s1_3_bypass = Mux(s1_3_match, MaskGen(s3_req.offset, s3_req.size, beatBytes, writeBytes), UInt(0))
val s1_4_bypass = Mux(s1_4_match, MaskGen(s4_req.offset, s4_req.size, beatBytes, writeBytes), UInt(0))

s1_x_bypass := s1_2_bypass | s1_3_bypass | s1_4_bypass

////////////////////////////////////// HAZARDS //////////////////////////////////////

// SinkC, SourceC, and SinkD can never interfer with each other because their operation
// is fully contained with an execution plan of an MSHR. That MSHR owns the entire set, so
// there is no way for a data race.

// However, SourceD is special. We allow it to run ahead after the MSHR and scheduler have
// released control of a set+way. This is necessary to allow single cycle occupancy for
// hits. Thus, we need to be careful about data hazards between SourceD and the other ports
// of the BankedStore. We can at least compare to registers 's1_req_reg', because the first
// cycle of SourceD falls within the occupancy of the MSHR's plan.

// Must ReleaseData=> be interlocked? RaW hazard
io.evict_safe :=
(!busy || io.evict_req.way =/= s1_req_reg.way || io.evict_req.set =/= s1_req_reg.set) &&
(!s2_full || io.evict_req.way =/= s2_req.way || io.evict_req.set =/= s2_req.set) &&
(!s3_full || io.evict_req.way =/= s3_req.way || io.evict_req.set =/= s3_req.set) &&
(!s4_full || io.evict_req.way =/= s4_req.way || io.evict_req.set =/= s4_req.set)

// Must =>GrantData be interlocked? WaR hazard
io.grant_safe :=
(!busy || io.grant_req.way =/= s1_req_reg.way || io.grant_req.set =/= s1_req_reg.set) &&
(!s2_full || io.grant_req.way =/= s2_req.way || io.grant_req.set =/= s2_req.set) &&
(!s3_full || io.grant_req.way =/= s3_req.way || io.grant_req.set =/= s3_req.set) &&
(!s4_full || io.grant_req.way =/= s4_req.way || io.grant_req.set =/= s4_req.set)

// SourceD cannot overlap with SinkC b/c the only way inner caches could become
// dirty such that they want to put data in via SinkC is if we Granted them permissions,
// which must flow through the SourecD pipeline.
}

sinkE

It is just message forwarding; nothing else happens here.

class SinkEResponse(params: InclusiveCacheParameters) extends InclusiveCacheBundle(params)
{
val sink = UInt(width = params.inner.bundle.sinkBits)
}

class SinkE(params: InclusiveCacheParameters) extends Module
{
val io = new Bundle {
val resp = Valid(new SinkEResponse(params))
val e = Decoupled(new TLBundleE(params.inner.bundle)).flip
}

if (params.firstLevel) {
// Tie off unused ports
io.resp.valid := Bool(false)
io.e.ready := Bool(true)
} else {
// No restrictions on buffer
val e = params.micro.innerBuf.e(io.e)

e.ready := Bool(true)
io.resp.valid := e.valid
io.resp.bits.sink := e.bits.sink
}
}

sinkX

Also just message forwarding: the received request is adapted onto the A-channel request format, and control is set to mark it as coming from the X (control) channel.

Why not do the conversion one level earlier? The request may come from far away, so converting it here can save wires.

class SinkXRequest(params: InclusiveCacheParameters) extends InclusiveCacheBundle(params)
{
val address = UInt(width = params.inner.bundle.addressBits)
}

class SinkX(params: InclusiveCacheParameters) extends Module
{
val io = new Bundle {
val req = Decoupled(new FullRequest(params))
val x = Decoupled(new SinkXRequest(params)).flip
}

val x = Queue(io.x, 1)
val (tag, set, offset) = params.parseAddress(x.bits.address)

x.ready := io.req.ready
io.req.valid := x.valid
params.ccover(x.valid && !x.ready, "SINKX_STALL", "Backpressure when accepting a control message")

io.req.bits.prio := Vec(UInt(1, width=3).asBools) // same prio as A
io.req.bits.control:= Bool(true)
io.req.bits.opcode := UInt(0)
io.req.bits.param := UInt(0)
io.req.bits.size := UInt(params.offsetBits)
// The source does not matter, because a flush command never allocates a way.
// However, it must be a legal source, otherwise assertions might spuriously fire.
io.req.bits.source := UInt(params.inner.client.clients.map(_.sourceId.start).min)
io.req.bits.offset := UInt(0)
io.req.bits.set := set
io.req.bits.tag := tag
}

sourceA

It just converts the message type.

class SourceARequest(params: InclusiveCacheParameters) extends InclusiveCacheBundle(params)
{
val tag = UInt(width = params.tagBits)
val set = UInt(width = params.setBits)
val param = UInt(width = 3)
val source = UInt(width = params.outer.bundle.sourceBits)
// selects AcquireBlock vs AcquirePerm
val block = Bool()
}

class SourceA(params: InclusiveCacheParameters) extends Module
{
val io = new Bundle {
val req = Decoupled(new SourceARequest(params)).flip
val a = Decoupled(new TLBundleA(params.outer.bundle))
}

// ready must be a register, because we derive valid from ready
require (!params.micro.outerBuf.a.pipe && params.micro.outerBuf.a.isDefined)

val a = Wire(io.a)
io.a <> params.micro.outerBuf.a(a)

io.req.ready := a.ready
a.valid := io.req.valid
params.ccover(a.valid && !a.ready, "SOURCEA_STALL", "Backpressured when issuing an Acquire")

a.bits.opcode := Mux(io.req.bits.block, TLMessages.AcquireBlock, TLMessages.AcquirePerm)
a.bits.param := io.req.bits.param
a.bits.size := UInt(params.offsetBits)
a.bits.source := io.req.bits.source
a.bits.address := params.expandAddress(io.req.bits.tag, io.req.bits.set, UInt(0))
a.bits.mask := ~UInt(0, width = params.outer.manager.beatBytes)
a.bits.data := UInt(0)
}

sourceC

sourceC does the following:

  • accepts requests coming from the MSHRs
  • queries sourceD for hazards; if there is one it waits, otherwise it proceeds
  • reads the data from the BankedStore in s1
  • gets the data back in s3 and pushes it into a queue
  • the queue then drives the downstream C channel (its sizing is worked out just below)
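
The sizing of that queue follows directly from the read latency; a worked example with hypothetical widths (64-byte blocks, 32-byte outer beats, flow = false; the real values come from the cache and bus parameters):

object SourceCQueueDepth extends App {
  val blockBytes     = 64
  val outerBeatBytes = 32
  val beats    = blockBytes / outerBeatBytes            // 2 beats per eviction
  val inFlight = 3                                       // beats already committed to the SRAM read pipeline when ready drops
  val flow     = false
  val depth    = beats + inFlight + (if (flow) 0 else 1)
  println(s"eviction queue depth = $depth")              // 6: once accepted, a full burst always fits
}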

class SourceC(params: InclusiveCacheParameters) extends Module
{
val io = new Bundle {
val req = Decoupled(new SourceCRequest(params)).flip
val c = Decoupled(new TLBundleC(params.outer.bundle))
// BankedStore port
val bs_adr = Decoupled(new BankedStoreOuterAddress(params))
val bs_dat = new BankedStoreOuterDecoded(params).flip
// RaW hazard
val evict_req = new SourceDHazard(params)
// before a Release that has finished in the MSHR is sent downstream by sourceC, sourceD is checked for in-flight requests to the same set/way: an older request there may still write the BankedStore, in which case the Release data would be stale, so the Release must wait until that sourceD request is done
val evict_safe = Bool().flip
}

// We ignore the depth and pipe is useless here (we have to provision for worst-case=stall)
require (!params.micro.outerBuf.c.pipe)

val beatBytes = params.outer.manager.beatBytes
val beats = params.cache.blockBytes / beatBytes
// flow: when the queue is empty, an enqueued element is visible at the output in the same cycle; pipe: when the queue is full, an enqueue is still accepted if a dequeue happens in the same cycle
val flow = params.micro.outerBuf.c.flow
// why +3: the handshake uses the downstream ready, but when that ready drops there are still 3 more cycles of data on their way out of the cache, so 3 extra slots are needed to catch them
val queue = Module(new Queue(io.c.bits, beats + 3 + (if (flow) 0 else 1), flow = flow))

// queue.io.count is far too slow
val fillBits = log2Up(beats + 4)
val fill = RegInit(UInt(0, width = fillBits))
val room = RegInit(Bool(true))
// whenever exactly one of enqueue / dequeue fires
when (queue.io.enq.fire() =/= queue.io.deq.fire()) {
// fill: a locally maintained occupancy counter for the queue
fill := fill + Mux(queue.io.enq.fire(), UInt(1), ~UInt(0, width = fillBits))
// room: the queue occupancy is <= 1. Why <= 1? Only then can a whole burst still fit, so only then can a new request be accepted; think of it as "there is enough space left" (remember the data arrives as a burst)
room := fill === UInt(0) || ((fill === UInt(1) || fill === UInt(2)) && !queue.io.enq.fire())
}
assert (room === queue.io.count <= UInt(1))

val busy = RegInit(Bool(false))
val beat = RegInit(UInt(0, width = params.outerBeatBits))
val last = beat.andR
val req = Mux(!busy, io.req.bits, RegEnable(io.req.bits, !busy && io.req.valid))
// want_data only when there is a request, there is room (a whole burst of data fits), and the line is dirty (it must be evicted with data)
val want_data = busy || (io.req.valid && room && io.req.bits.dirty)

io.req.ready := !busy && room

io.evict_req.set := req.set
io.evict_req.way := req.way

// evict_safe only has to be checked on the first beat
io.bs_adr.valid := (beat.orR || io.evict_safe) && want_data
io.bs_adr.bits.noop := Bool(false)
io.bs_adr.bits.way := req.way
io.bs_adr.bits.set := req.set
io.bs_adr.bits.beat := beat
io.bs_adr.bits.mask := ~UInt(0, width = params.outerMaskBits)

params.ccover(io.req.valid && io.req.bits.dirty && room && !io.evict_safe, "SOURCEC_HAZARD", "Prevented Eviction data hazard with backpressure")
params.ccover(io.bs_adr.valid && !io.bs_adr.ready, "SOURCEC_SRAM_STALL", "Data SRAM busy")

when (io.req.valid && room && io.req.bits.dirty) { busy := Bool(true) }
when (io.bs_adr.fire()) {
when (last) { busy := Bool(false) }
beat := beat + UInt(1)
}

// why s2/s3: the BankedStore read data comes back with extra latency, and these stages simply track those cycles
val s2_latch = Mux(want_data, io.bs_adr.fire(), io.req.fire())
val s2_valid = RegNext(s2_latch)
val s2_req = RegEnable(req, s2_latch)
val s2_beat = RegEnable(beat, s2_latch)
val s2_last = RegEnable(last, s2_latch)

val s3_latch = s2_valid
val s3_valid = RegNext(s3_latch)
val s3_req = RegEnable(s2_req, s3_latch)
val s3_beat = RegEnable(s2_beat, s3_latch)
val s3_last = RegEnable(s2_last, s3_latch)

val c = Wire(io.c)
c.valid := s3_valid
c.bits.opcode := s3_req.opcode
c.bits.param := s3_req.param
c.bits.size := UInt(params.offsetBits)
c.bits.source := s3_req.source
c.bits.address := params.expandAddress(s3_req.tag, s3_req.set, UInt(0))
c.bits.data := io.bs_dat.data
c.bits.corrupt := Bool(false)

// We never accept at the front-end unless we're sure things will fit
assert(!c.valid || c.ready)
params.ccover(!c.ready, "SOURCEC_QUEUE_FULL", "Eviction queue fully utilized")

queue.io.enq <> c
io.c <> queue.io.deq
}

sinkD

Similar to the ProbeAck path in sinkC: the data is written straight into the BankedStore.

The hazard: if a request with the same set/way is still flowing through sourceD's pipeline, the write into the BankedStore is held off for a while; there is no buffer here, so the Grant simply stalls at the port. grant_safe controls this.

class SinkD(params: InclusiveCacheParameters) extends Module
{
val io = new Bundle {
val resp = Valid(new SinkDResponse(params)) // Grant or ReleaseAck
val d = Decoupled(new TLBundleD(params.outer.bundle)).flip
// Lookup the set+way from MSHRs
val source = UInt(width = params.outer.bundle.sourceBits)
val way = UInt(width = params.wayBits).flip
val set = UInt(width = params.setBits).flip
// Banked Store port
val bs_adr = Decoupled(new BankedStoreOuterAddress(params))
val bs_dat = new BankedStoreOuterPoison(params)
// WaR hazard
val grant_req = new SourceDHazard(params)
val grant_safe = Bool().flip
}

// No restrictions on buffer
val d = params.micro.outerBuf.d(io.d)

val (first, last, _, beat) = params.outer.count(d)
val hasData = params.outer.hasData(d.bits)

io.source := Mux(d.valid, d.bits.source, RegEnable(d.bits.source, d.valid))
io.grant_req.way := io.way
io.grant_req.set := io.set

// even when there is no data, a request is still sent to the BankedStore, with noop preventing an actual RAM write, so that data ordering is preserved
// Also send Grant(NoData) to BS to ensure correct data ordering
io.resp.valid := (first || last) && d.fire()
d.ready := io.bs_adr.ready && (!first || io.grant_safe)
io.bs_adr.valid := !first || (d.valid && io.grant_safe)
params.ccover(d.valid && first && !io.grant_safe, "SINKD_HAZARD", "Prevented Grant data hazard with backpressure")
params.ccover(io.bs_adr.valid && !io.bs_adr.ready, "SINKD_SRAM_STALL", "Data SRAM busy")

io.resp.bits.last := last
io.resp.bits.opcode := d.bits.opcode
io.resp.bits.param := d.bits.param
io.resp.bits.source := d.bits.source
io.resp.bits.sink := d.bits.sink
io.resp.bits.denied := d.bits.denied

io.bs_adr.bits.noop := !d.valid || !hasData
io.bs_adr.bits.way := io.way
io.bs_adr.bits.set := io.set
io.bs_adr.bits.beat := Mux(d.valid, beat, RegEnable(beat + io.bs_adr.ready.asUInt, d.valid))
io.bs_adr.bits.mask := ~UInt(0, width = params.outerMaskBits)
io.bs_dat.data := d.bits.data

assert (!(d.valid && d.bits.corrupt && !d.bits.denied), "Data poisoning unsupported")
}

sourceE

Nothing here; it just forwards the message.

class SourceERequest(params: InclusiveCacheParameters) extends InclusiveCacheBundle(params)
{
val sink = UInt(width = params.outer.bundle.sinkBits)
}

class SourceE(params: InclusiveCacheParameters) extends Module
{
val io = new Bundle {
val req = Decoupled(new SourceERequest(params)).flip
val e = Decoupled(new TLBundleE(params.outer.bundle))
}

// ready must be a register, because we derive valid from ready
require (!params.micro.outerBuf.e.pipe && params.micro.outerBuf.e.isDefined)

val e = Wire(io.e)
io.e <> params.micro.outerBuf.e(e)

io.req.ready := e.ready
e.valid := io.req.valid

e.bits.sink := io.req.bits.sink

// we can't cover valid+!ready, because no backpressure on E is common
}

sourceX

Nothing here; it just sends the response on.

// The control port response source
class SourceXRequest(params: InclusiveCacheParameters) extends InclusiveCacheBundle(params)
{
val fail = Bool()
}

class SourceX(params: InclusiveCacheParameters) extends Module
{
val io = new Bundle {
val req = Decoupled(new SourceXRequest(params)).flip
val x = Decoupled(new SourceXRequest(params))
}

val x = Wire(io.x) // ready must not depend on valid
io.x <> Queue(x, 1)

io.req.ready := x.ready
x.valid := io.req.valid
params.ccover(x.valid && !x.ready, "SOURCEX_STALL", "Backpressure when sending a control message")

x.bits := io.req.bits
}

bankedStore

The BankedStore holds the data and is internally divided into several sub-banks. Its main jobs are:

  • storing the cache data
  • accepting read/write requests from sinkC / sourceD / sourceC / sinkD
  • arbitrating those requests with the priority sinkC > sourceC > sinkD > sourceDw > sourceDr
  • handling data-width conversion: the inner and outer data widths can both vary, and the number of banks follows from the width parameters

Two key points about this module:

  • Why is the priority set up this way? (left open here; the constraints are listed in the comments inside the code below)
  • How the banks are organized: a cache line is laid out horizontally across different banks, so a burst first accesses bank group 0, then bank group 1, which lets the next request be pipelined behind it (a small model of the addressing follows below).
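
The bank addressing can be made concrete with a small model (a plain-Scala sketch with hypothetical parameters: portFactor = 4, equal 32-byte inner and outer beats, 8-byte write granule, 512 KiB, 8 ways; the real values come from InclusiveCacheParameters and the micro parameters). It follows the code below, where Cat(way, set, beat) is split into a bank-group select and an SRAM row index:

object BankMapSketch extends App {
  def log2(x: Int) = Integer.numberOfTrailingZeros(x)

  val beatBytes  = 32                                   // hypothetical: inner beat == outer beat
  val writeBytes = 8                                    // per-bank (ECC) granule
  val portFactor = 4
  val cacheBytes = 512 * 1024
  val blockBytes = 64
  val ways       = 8

  val rowBytes   = portFactor * beatBytes               // 128 B across all banks in one row
  val numBanks   = rowBytes / writeBytes                // 16 banks
  val rowEntries = cacheBytes / rowBytes                // 4096 rows per bank
  val ports      = beatBytes / writeBytes               // 4 banks touched by one beat
  val groups     = numBanks / ports                     // 4 bank groups
  val beatBits   = log2(blockBytes / beatBytes)         // 1
  val setBits    = log2(cacheBytes / blockBytes / ways) // 10
  val groupBits  = log2(groups)                         // 2

  // a = Cat(way, set, beat): the low bits pick the bank group, the rest is the SRAM row.
  def map(way: Int, set: Int, beat: Int): (Int, Int) = {
    val a = (way << (setBits + beatBits)) | (set << beatBits) | beat
    (a & (groups - 1), a >> groupBits)                  // (bank group, row index)
  }

  // Consecutive beats of one burst land in different bank groups but the same row,
  // so a following request can be pipelined behind them.
  println(map(way = 1, set = 5, beat = 0))              // (2, 514)
  println(map(way = 1, set = 5, beat = 1))              // (3, 514)
}
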
abstract class BankedStoreAddress(val inner: Boolean, params: InclusiveCacheParameters) extends InclusiveCacheBundle(params)
{
// noop exists for bursts: the first beat reserves the banks of the following beats, blocking younger accesses; since accesses walk the banks in a pipeline, blocking the next beat's banks also blocks everything behind it
val noop = Bool() // do not actually use the SRAMs, just block their use
val way = UInt(width = params.wayBits)
val set = UInt(width = params.setBits)
// beat: which beat within the burst this is
// innerBytes: beat width in bytes on the L1 -> L2 side
// outerBytes: beat width in bytes on the L2 -> memory side; both names are from the L2 cache's point of view
val beat = UInt(width = if (inner) params.innerBeatBits else params.outerBeatBits)
val mask = UInt(width = if (inner) params.innerMaskBits else params.outerMaskBits)
}

trait BankedStoreRW
{
val write = Bool()
}

class BankedStoreOuterAddress(params: InclusiveCacheParameters) extends BankedStoreAddress(false, params)
class BankedStoreInnerAddress(params: InclusiveCacheParameters) extends BankedStoreAddress(true, params)
class BankedStoreInnerAddressRW(params: InclusiveCacheParameters) extends BankedStoreInnerAddress(params) with BankedStoreRW

abstract class BankedStoreData(val inner: Boolean, params: InclusiveCacheParameters) extends InclusiveCacheBundle(params)
{
val data = UInt(width = (if (inner) params.inner.manager.beatBytes else params.outer.manager.beatBytes)*8)
}

class BankedStoreOuterData(params: InclusiveCacheParameters) extends BankedStoreData(false, params)
class BankedStoreInnerData(params: InclusiveCacheParameters) extends BankedStoreData(true, params)
class BankedStoreInnerPoison(params: InclusiveCacheParameters) extends BankedStoreInnerData(params)
class BankedStoreOuterPoison(params: InclusiveCacheParameters) extends BankedStoreOuterData(params)
class BankedStoreInnerDecoded(params: InclusiveCacheParameters) extends BankedStoreInnerData(params)
class BankedStoreOuterDecoded(params: InclusiveCacheParameters) extends BankedStoreOuterData(params)

class BankedStore(params: InclusiveCacheParameters) extends Module
{
val io = new Bundle {
val sinkC_adr = Decoupled(new BankedStoreInnerAddress(params)).flip
val sinkC_dat = new BankedStoreInnerPoison(params).flip
val sinkD_adr = Decoupled(new BankedStoreOuterAddress(params)).flip
val sinkD_dat = new BankedStoreOuterPoison(params).flip
val sourceC_adr = Decoupled(new BankedStoreOuterAddress(params)).flip
val sourceC_dat = new BankedStoreOuterDecoded(params)
val sourceD_radr = Decoupled(new BankedStoreInnerAddress(params)).flip
val sourceD_rdat = new BankedStoreInnerDecoded(params)
val sourceD_wadr = Decoupled(new BankedStoreInnerAddress(params)).flip
val sourceD_wdat = new BankedStoreInnerPoison(params).flip
}
// innerBytes: beat width in bytes on the L1 -> L2 side
val innerBytes = params.inner.manager.beatBytes
// outerBytes: beat width in bytes on the L2 -> memory side; both names are from the L2 cache's point of view
val outerBytes = params.outer.manager.beatBytes
// rowBytes: the width of the data array, i.e. all banks added together
// portFactor: how many accesses can proceed in parallel; it determines the number of banks (what exactly it should be set from is left unclear here)
val rowBytes = params.micro.portFactor * max(innerBytes, outerBytes)
require (rowBytes < params.cache.sizeBytes)
// rowEntries: the depth of the data array
val rowEntries = params.cache.sizeBytes / rowBytes
// rowBits: index width for that depth
val rowBits = log2Ceil(rowEntries)
// writeBytes: granule of each small bank, which is also the ECC granule
// numBanks: number of banks
val numBanks = rowBytes / params.micro.writeBytes
val codeBits = 8*params.micro.writeBytes

val cc_banks = Seq.tabulate(numBanks) {
i =>
DescribedSRAM(
name = s"cc_banks_$i",
desc = "Banked Store",
size = rowEntries,
data = UInt(width = codeBits)
)
}
// These constraints apply on the port priorities:
// sourceC > sinkD outgoing Release > incoming Grant (we start eviction+refill concurrently)
// sinkC > sourceC incoming ProbeAck > outgoing ProbeAck (we delay probeack writeback by 1 cycle for QoR)
// sinkC > sourceDr incoming ProbeAck > SourceD read (we delay probeack writeback by 1 cycle for QoR)
// sourceDw > sourceDr modified data visible on next cycle (needed to ensure SourceD forward progress)
// sinkC > sourceC inner ProbeAck > outer ProbeAck (make wormhole routing possible [not yet implemented])
// sinkC&D > sourceD* beat arrival > beat read|update (make wormhole routing possible [not yet implemented])

// Combining these restrictions yields a priority scheme of:
// sinkC > sourceC > sinkD > sourceDw > sourceDr
// ^^^^^^^^^^^^^^^ outer interface

// Suppose three requests A > B > C: A wants - - A - (bank 2), B wants - - B B (banks 2/3), C wants - - - C (bank 3). Because the inner and outer widths may differ, the access granularities differ too. A and C alone would not conflict, but C is not allowed to cut in front of B, so the order has to be - - A -, then - - B B, then - - - C.
// If the pattern were - - A -, B B - -, - - - C instead, then B B A C could all go at once.

// Requests have different port widths, but we don't want to allow cutting in line.
// Suppose we have requests A > B > C requesting ports --A-, --BB, ---C.
// The correct arbitration is to allow --A- only, not --AC.
// Obviously --A-, BB--, ---C should still be resolved to BBAC.

class Request extends Bundle {
val wen = Bool()
val index = UInt(width = rowBits)
// bankSel: the banks this request wants to occupy
val bankSel = UInt(width = numBanks)
// bankSum: the banks already claimed by higher-priority requests
val bankSum = UInt(width = numBanks) // OR of all higher priority bankSels
// bankEn: the read/write enables actually seen by the SRAMs
val bankEn = UInt(width = numBanks) // ports actually activated by request
val data = Vec(numBanks, UInt(width = codeBits))
}

def req[T <: BankedStoreAddress](b: DecoupledIO[T], write: Bool, d: UInt): Request = {
val beatBytes = if (b.bits.inner) innerBytes else outerBytes
val ports = beatBytes / params.micro.writeBytes
val bankBits = log2Ceil(numBanks / ports)
val words = Seq.tabulate(ports) { i =>
val data = d((i + 1) * 8 * params.micro.writeBytes - 1, i * 8 * params.micro.writeBytes)
data
}
val a = Cat(b.bits.way, b.bits.set, b.bits.beat)
val m = b.bits.mask
val out = Wire(new Request)

val select = UIntToOH(a(bankBits-1, 0), numBanks/ports)
// ready: per bank-group feedback telling the requester whether it is accepted
val ready = Cat(Seq.tabulate(numBanks/ports) { i => !(out.bankSum((i+1)*ports-1, i*ports) & m).orR } .reverse)
b.ready := ready(a(bankBits-1, 0))

out.wen := write
out.index := a >> bankBits
// Fill: replicate the whole vector, e.g. Fill(2, b1010) = b10101010
// FillInterleaved: replicate each bit, e.g. FillInterleaved(2, b1010) = b11001100
out.bankSel := Mux(b.valid, FillInterleaved(ports, select) & Fill(numBanks/ports, m), UInt(0))
out.bankEn := Mux(b.bits.noop, UInt(0), out.bankSel & FillInterleaved(ports, ready))
out.data := Vec(Seq.fill(numBanks/ports) { words }.flatten)

out
}

val innerData = UInt(0, width = innerBytes*8)
val outerData = UInt(0, width = outerBytes*8)
val W = Bool(true)
val R = Bool(false)

val sinkC_req = req(io.sinkC_adr, W, io.sinkC_dat.data)
val sinkD_req = req(io.sinkD_adr, W, io.sinkD_dat.data)
val sourceC_req = req(io.sourceC_adr, R, outerData)
val sourceD_rreq = req(io.sourceD_radr, R, innerData)
val sourceD_wreq = req(io.sourceD_wadr, W, io.sourceD_wdat.data)

// See the comments above for why this prioritization is used
val reqs = Seq(sinkC_req, sourceC_req, sinkD_req, sourceD_wreq, sourceD_rreq)

// foldLeft那一行:sum初始化为0,先把sum赋给第一个req.bankSum(所以最高优先级的请求不会被任何人挡住),再把该req的bankSel或进sum传给下一个req.bankSum;这样前面请求的bankSel就会抬高后面请求的bankSum,从而挡住后面的请求。
// Connect priorities; note that even if a request does not go through due to failing
// to obtain a needed subbank, it still blocks overlapping lower priority requests.
reqs.foldLeft(UInt(0)) { case (sum, req) =>
req.bankSum := sum
req.bankSel | sum
}
// regout是读请求之后第二拍才有效的读数据:SRAM本身读出要一拍,后面又用RegEnable寄存了一拍,推测是为了时序。
// Access the banks
val regout = Vec(cc_banks.zipWithIndex.map { case ((b, omSRAM), i) =>
val en = reqs.map(_.bankEn(i)).reduce(_||_)
val sel = reqs.map(_.bankSel(i))
val wen = PriorityMux(sel, reqs.map(_.wen))
val idx = PriorityMux(sel, reqs.map(_.index))
val data= PriorityMux(sel, reqs.map(_.data(i)))

when (wen && en) { b.write(idx, data) }
RegEnable(b.read(idx, !wen && en), RegNext(!wen && en))
})

val regsel_sourceC = RegNext(RegNext(sourceC_req.bankEn))
val regsel_sourceD = RegNext(RegNext(sourceD_rreq.bankEn))

// grouped:按多长分组,假设x有16个,x.grouped(4) = ((1,2,3,4),(5,6,7,8),(9,a,b,c),(d,e,f,g))
// transpose: 转置
// map:把里面的每个成员都这么干。
val decodeC = regout.zipWithIndex.map {
case (r, i) => Mux(regsel_sourceC(i), r, UInt(0))
}.grouped(outerBytes/params.micro.writeBytes).toList.transpose.map(s => s.reduce(_|_))

io.sourceC_dat.data := Cat(decodeC.reverse)

val decodeD = regout.zipWithIndex.map {
// Intentionally not Mux1H and/or an indexed-mux b/c we want it 0 when !sel to save decode power
case (r, i) => Mux(regsel_sourceD(i), r, UInt(0))
}.grouped(innerBytes/params.micro.writeBytes).toList.transpose.map(s => s.reduce(_|_))

io.sourceD_rdat.data := Cat(decodeD.reverse)

private def banks = cc_banks.map("\"" + _._1.pathName + "\"").mkString(",")
def json: String = s"""{"widthBytes":${params.micro.writeBytes},"mem":[${banks}]}"""
}
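上面的优先级仲裁(foldLeft 传递 bankSum)可以用下面这段纯 Scala(非 Chisel)的小程序来模拟,对应前面 - - A - / - - B B / - - - C 的例子。其中 arbitrate、bankSels 等名字是为了对照注释自己起的,只是个简化示意(忽略了 mask、noop 等细节):

object BankArbDemo extends App {
  // bankSels:按优先级从高到低排列,每个请求想占用哪些bank(bit为1表示要用)
  // 返回每个请求的 (bankSum, 能否这一拍走):bankSum 是所有更高优先级请求 bankSel 的或
  def arbitrate(bankSels: Seq[Int]): Seq[(Int, Boolean)] =
    bankSels.foldLeft((0, Seq.empty[(Int, Boolean)])) { case ((sum, acc), sel) =>
      val en = (sum & sel) == 0        // 想用的bank都没被占,整笔才能走(对应bankEn)
      (sum | sel, acc :+ ((sum, en)))  // 不论走不走,bankSel都会挡住更低优先级(对应bankSum的传递)
    }._2

  // 场景1:A要 --A-,B要 --BB,C要 ---C(优先级 A > B > C)
  // 结果:只有A能走,B被A挡住,C被B挡住(不会出现 --AC)
  println(arbitrate(Seq(Integer.parseInt("0010", 2),
                        Integer.parseInt("0011", 2),
                        Integer.parseInt("0001", 2))))

  // 场景2:A要 --A-,B要 BB--,C要 ---C:三个互不冲突,可以一拍走掉 BBAC
  println(arbitrate(Seq(Integer.parseInt("0010", 2),
                        Integer.parseInt("1100", 2),
                        Integer.parseInt("0001", 2))))
}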

directory

Directory用来保存Cache的Tag。

  • 保存Cache的Tag。
  • 提供读写端口,供外部读写。
  • 在reset之后,初始化SRAM

class DirectoryEntry(params: InclusiveCacheParameters) extends InclusiveCacheBundle(params)
{
val dirty = Bool() // true => TRUNK or TIP
// 有四种状态: INVALID BRANCH TRUNK TIP
val state = UInt(width = params.stateBits)
val clients = UInt(width = params.clientBits)
val tag = UInt(width = params.tagBits)
}

class DirectoryWrite(params: InclusiveCacheParameters) extends InclusiveCacheBundle(params)
{
val set = UInt(width = params.setBits)
val way = UInt(width = params.wayBits)
val data = new DirectoryEntry(params)
}

class DirectoryRead(params: InclusiveCacheParameters) extends InclusiveCacheBundle(params)
{
val set = UInt(width = params.setBits)
val tag = UInt(width = params.tagBits)
}

class DirectoryResult(params: InclusiveCacheParameters) extends DirectoryEntry(params)
{
val hit = Bool()
val way = UInt(width = params.wayBits)
}

class Directory(params: InclusiveCacheParameters) extends Module
{
val io = new Bundle {
val write = Decoupled(new DirectoryWrite(params)).flip
val read = Valid(new DirectoryRead(params)).flip // sees same-cycle write
val result = Valid(new DirectoryResult(params))
val ready = Bool() // reset complete; can enable access
}

val codeBits = new DirectoryEntry(params).getWidth

val (cc_dir, omSRAM) = DescribedSRAM(
name = "cc_dir",
desc = "Directory RAM",
size = params.cache.sets,
data = Vec(params.cache.ways, UInt(width = codeBits))
)

val write = Queue(io.write, 1) // must inspect contents => max size 1
// a flow Q creates a WaR hazard... this MIGHT not cause a problem
// a pipe Q causes combinational loop through the scheduler

// 复位之后的初始化逻辑
// Wiping the Directory with 0s on reset has ultimate priority
val wipeCount = RegInit(UInt(0, width = params.setBits + 1))
val wipeOff = RegNext(Bool(false), Bool(true)) // don't wipe tags during reset
val wipeDone = wipeCount(params.setBits)
val wipeSet = wipeCount(params.setBits - 1,0)

io.ready := wipeDone
when (!wipeDone && !wipeOff) { wipeCount := wipeCount + UInt(1) }
assert (wipeDone || !io.read.valid)

// Be explicit for dumb 1-port inference
val ren = io.read.valid
val wen = (!wipeDone && !wipeOff) || write.valid
assert (!io.read.valid || wipeDone)

require (codeBits <= 256)

write.ready := !io.read.valid
when (!ren && wen) {
cc_dir.write(
Mux(wipeDone, write.bits.set, wipeSet),
Vec.fill(params.cache.ways) { Mux(wipeDone, write.bits.data.asUInt, UInt(0)) },
UIntToOH(write.bits.way, params.cache.ways).asBools.map(_ || !wipeDone))
}

val ren1 = RegInit(Bool(false))
val ren2 = if (params.micro.dirReg) RegInit(Bool(false)) else ren1
ren2 := ren1
ren1 := ren

// params.dirReg,是个配置,directory要不要打一拍。
val bypass_valid = params.dirReg(write.valid)
val bypass = params.dirReg(write.bits, ren1 && write.valid)
val regout = params.dirReg(cc_dir.read(io.read.bits.set, ren), ren1)
val tag = params.dirReg(RegEnable(io.read.bits.tag, ren), ren1)
val set = params.dirReg(RegEnable(io.read.bits.set, ren), ren1)

// victim:要evict的way,是在读directory的时候就给出的。如果hit就返回hit的way;如果miss,就用LFSR随机选一个way作为候选。要不要真的把它踢出去由MSHR根据state判断,这里只负责随机选一个出来。
// Compute the victim way in case of an evicition
val victimLFSR = LFSR16(params.dirReg(ren))(InclusiveCacheParameters.lfsrBits-1, 0)
val victimSums = Seq.tabulate(params.cache.ways) { i => UInt((1 << InclusiveCacheParameters.lfsrBits)*i / params.cache.ways) }
val victimLTE = Cat(victimSums.map { _ <= victimLFSR }.reverse)
val victimSimp = Cat(UInt(0, width=1), victimLTE(params.cache.ways-1, 1), UInt(1, width=1))
val victimWayOH = victimSimp(params.cache.ways-1,0) & ~(victimSimp >> 1)
val victimWay = OHToUInt(victimWayOH)
assert (!ren2 || victimLTE(0) === UInt(1))
assert (!ren2 || ((victimSimp >> 1) & ~victimSimp) === UInt(0)) // monotone
assert (!ren2 || PopCount(victimWayOH) === UInt(1))

val setQuash = bypass_valid && bypass.set === set
val tagMatch = bypass.data.tag === tag
val wayMatch = bypass.way === victimWay

val ways = Vec(regout.map(d => new DirectoryEntry(params).fromBits(d)))
val hits = Cat(ways.zipWithIndex.map { case (w, i) =>
w.tag === tag && w.state =/= INVALID && (!setQuash || UInt(i) =/= bypass.way)
}.reverse)
val hit = hits.orR()

// bypass有两种情况:一种是读的set/tag正好命中同一拍在写的那条;另一种是读miss了要踢一条,而要踢的way正好是同一拍在写的那条。这两种都把写的内容bypass给读,因为写随后一定会写进SRAM。
io.result.valid := ren2
io.result.bits := Mux(hit, Mux1H(hits, ways), Mux(setQuash && (tagMatch || wayMatch), bypass.data, Mux1H(victimWayOH, ways)))
io.result.bits.hit := hit || (setQuash && tagMatch && bypass.data.state =/= INVALID)
io.result.bits.way := Mux(hit, OHToUInt(hits), Mux(setQuash && tagMatch, bypass.way, victimWay))

params.ccover(ren2 && setQuash && tagMatch, "DIRECTORY_HIT_BYPASS", "Bypassing write to a directory hit")
params.ccover(ren2 && setQuash && !tagMatch && wayMatch, "DIRECTORY_EVICT_BYPASS", "Bypassing a write to a directory eviction")

def json: String = s"""{"clients":${params.clientBits},"mem":"${cc_dir.pathName}","clean":"${wipeDone.pathName}"}"""
}
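上面 victimLFSR/victimSums/victimLTE/victimWayOH 这段逻辑,本质上是把 LFSR 的取值范围按 way 数均分成若干个区间,LFSR 落在哪个区间就选哪个 way。下面用一段纯 Scala 模拟一下(lfsrBits、ways 取的是示意值,不一定是实际配置):

object VictimWayDemo extends App {
  val lfsrBits = 10                 // 示意值
  val ways     = 8                  // 示意值

  // victimSums:把 [0, 2^lfsrBits) 均分成 ways 个区间的下界
  val sums = Seq.tabulate(ways) { i => (1 << lfsrBits) * i / ways }

  // victimWay = 满足 sums(i) <= lfsr 的最大 i,即 lfsr 落在哪个区间就选哪个 way
  // (对应 victimLTE 取最高的1的那套 one-hot 运算)
  def victimWay(lfsr: Int): Int = sums.lastIndexWhere(_ <= lfsr)

  Seq(0, 127, 128, 511, 1023).foreach { v =>
    println(s"lfsr=$v -> victim way ${victimWay(v)}")   // 0, 0, 1, 3, 7
  }
}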

requests

Requests是个ListBuffer,共有3×MSHR个数的队列:每个MSHR对应3个队列,分别存放A/B/C通道的请求,优先级C > B > A。

队列里没有保存完整的地址,只记录了tag,因为排在同一个MSHR后面的请求一定是同set的。

一个MSHR被某个set占住之后,其他set的请求就不会再分配到这个MSHR,它对应的队列里排的都是同set的请求。

不同set的请求会尝试分配新的MSHR;如果没有空闲的MSHR,就被挡在外面。
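对照后面Scheduler里requests.io.push.bits.index的计算,可以把队列编号理解成 通道×mshrs+MSHR编号。下面是个纯Scala的小示意(mshrs取10只是举例):

object RequestQueueIndexDemo extends App {
  val mshrs = 10                    // 示意值

  // prio: 0 = A, 1 = B, 2 = C;mshr: 请求要排在哪个MSHR后面
  // 低 mshrs 个队列给A,中间 mshrs 个给B,高 mshrs 个给C
  def queueIndex(prio: Int, mshr: Int): Int = prio * mshrs + mshr

  println(queueIndex(0, 3))  // A通道、MSHR3 -> 队列3
  println(queueIndex(1, 3))  // B通道、MSHR3 -> 队列13
  println(queueIndex(2, 3))  // C通道、MSHR3 -> 队列23
}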

MSHR

MSHR里面主要是一个大的状态机,用来处理一个独立的请求。

在理解MSHR之前,首先要理解几个概念。什么是nestB / blockB / nestC / blockC?

blockB: block住下面来的同set的B请求。
nestB: 允许下面来的同set的B请求插队。也可能出现既不允许插队也不block的情况(此时进队列)。
如果blockB,就不ready,请求被挡在sinkB的入口。
如果nestB,就可以插队。
如果两个都为0,就进request队列,放弃这次插队;进了队列就表示不会再插队了,要等前面同set的请求处理完。
两个都为1是不可能的,有断言保证。

blockC: block住上面来的同set的C请求。
nestC: 允许上面来的同set的C请求插队。同样可能出现两者都为0的情况,但不会同时为1。
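把上面这段话写成代码大概是下面这个样子(纯Scala的示意,Block/Nest/Queue这些名字是自己起的,和源码里的信号只是概念上对应):

object BlockNestQueueDemo extends App {
  sealed trait Action
  case object Block extends Action   // 挡在sink接口上,不ready
  case object Nest  extends Action   // 进bc_mshr/c_mshr插队
  case object Queue extends Action   // 进request队列排队,放弃插队

  // block和nest不会同时为1(源码里有断言)
  def decide(block: Boolean, nest: Boolean): Action = {
    require(!(block && nest), "block和nest不可能同时为1")
    if (block) Block else if (nest) Nest else Queue
  }

  println(decide(block = true,  nest = false))  // Block:meta还没读出来,状态不明
  println(decide(block = false, nest = true))   // Nest :允许插队
  println(decide(block = false, nest = false))  // Queue:进队列,等前面同set的请求做完
}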
class QueuedRequest(params: InclusiveCacheParameters) extends InclusiveCacheBundle(params)
{
// prio:指该请求的来源,A=001,B=010,C=100
val prio = Vec(3, Bool()) // A=001, B=010, C=100
// control:control==1 && prio==A 表示是x通道的。优先级是:C > B > X(A & control) > A(control=0)
val control= Bool() // control command
val opcode = UInt(width = 3)
val param = UInt(width = 3)
val size = UInt(width = params.inner.bundle.sizeBits)
val source = UInt(width = params.inner.bundle.sourceBits)
val tag = UInt(width = params.tagBits)
val offset = UInt(width = params.offsetBits)
// put:sinkA/sinkC里有put buffer,这里表示数据存在put buffer的哪个位置。
val put = UInt(width = params.putBits)
}

class FullRequest(params: InclusiveCacheParameters) extends QueuedRequest(params)
{
val set = UInt(width = params.setBits)
}

class AllocateRequest(params: InclusiveCacheParameters) extends FullRequest(params)
{
// repeat:如果下一笔请求和上一笔请求是同地址(同set同tag)的,就没必要再去读directory了,MSHR里还留着上一笔的meta。
val repeat = Bool() // set is the same
}
class ScheduleRequest(params: InclusiveCacheParameters) extends InclusiveCacheBundle(params)
{
val a = Valid(new SourceARequest(params))
val b = Valid(new SourceBRequest(params))
val c = Valid(new SourceCRequest(params))
val d = Valid(new SourceDRequest(params))
val e = Valid(new SourceERequest(params))
val x = Valid(new SourceXRequest(params))
val dir = Valid(new DirectoryWrite(params))
// reload:MSHR做完事情的前一拍,告诉scheduler可以再分配一个请求,这样MSHR就能连续地处理请求。因为MSHR收到请求后也是先打一拍存进寄存器,下一拍才开始处理。
val reload = Bool() // get next request via allocate (if any)
}

class MSHRStatus(params: InclusiveCacheParameters) extends InclusiveCacheBundle(params)
{
val set = UInt(width = params.setBits)
val tag = UInt(width = params.tagBits)
val way = UInt(width = params.wayBits)
val blockB = Bool()
val nestB = Bool()
val blockC = Bool()
val nestC = Bool()
}

// nestedwb:收到插队请求的通知,说明自己被别人插队了,要把自己保存的directory状态改掉。它由scheduler从bc_mshr和c_mshr写directory的值中计算出来。插队只会是:bc插队abc、c插队bc、或者c插队abc这几种。
class NestedWriteback(params: InclusiveCacheParameters) extends InclusiveCacheBundle(params)
{
val set = UInt(width = params.setBits)
val tag = UInt(width = params.tagBits)
// b_toN: 下面来的nested probe,可能把我变成N
val b_toN = Bool() // nested Probes may unhit us
// b_toB: 下面来的nested probe,可能把我变成B
val b_toB = Bool() // nested Probes may demote us
// b_clr_dirty: 下面来的nested probe,会把我的dirty清掉
val b_clr_dirty = Bool() // nested Probes clear dirty
// c_set_dirty: 上面来的nested release,会置位我的dirty。
val c_set_dirty = Bool() // nested Releases MAY set dirty
}

sealed trait CacheState
{
val code = UInt(CacheState.index)
CacheState.index = CacheState.index + 1
}

object CacheState
{
var index = 0
}

case object S_INVALID extends CacheState
case object S_BRANCH extends CacheState
case object S_BRANCH_C extends CacheState
case object S_TIP extends CacheState
case object S_TIP_C extends CacheState
case object S_TIP_CD extends CacheState
case object S_TIP_D extends CacheState
case object S_TRUNK_C extends CacheState
case object S_TRUNK_CD extends CacheState

class MSHR(params: InclusiveCacheParameters) extends Module
{
val io = new Bundle {
val allocate = Valid(new AllocateRequest(params)).flip // refills MSHR for next cycle
val directory = Valid(new DirectoryResult(params)).flip // triggers schedule setup
val status = Valid(new MSHRStatus(params))
val schedule = Decoupled(new ScheduleRequest(params))
val sinkc = Valid(new SinkCResponse(params)).flip
// sinkD会发两次过来,first一个,last一个。所以sinkDResponse里面有last信号。
val sinkd = Valid(new SinkDResponse(params)).flip
val sinke = Valid(new SinkEResponse(params)).flip
val nestedwb = new NestedWriteback(params).flip
}

val request_valid = RegInit(Bool(false))
val request = Reg(new FullRequest(params))
val meta_valid = RegInit(Bool(false))
val meta = Reg(new DirectoryResult(params))

// Define which states are valid
when (meta_valid) {
when (meta.state === INVALID) {
assert (!meta.clients.orR)
assert (!meta.dirty)
}
when (meta.state === BRANCH) {
assert (!meta.dirty)
}
when (meta.state === TRUNK) {
assert (meta.clients.orR)
assert ((meta.clients & (meta.clients - UInt(1))) === UInt(0)) // at most one
}
when (meta.state === TIP) {
// noop
}
}

// 状态要全变为true,才表示该entry做完。
// Completed transitions (s_ = scheduled), (w_ = waiting)
// s_rprobe: 发出由release引起的对上面的probe。
val s_rprobe = RegInit(Bool(true)) // B
// w_rprobeackfirst: 正在等待,release 引起的probe的probeack。也就是release的时候发现有数据在L1,就去probe L1,当前正在等待L1的probeack。是最后一个client的第一笔。
val w_rprobeackfirst = RegInit(Bool(true))
// w_rprobeacklast: 最后一个client的最后一笔。
val w_rprobeacklast = RegInit(Bool(true))
// s_release: 发出给下面的release。最后一个client的第一笔就能release了(w_rprobeackfirst)。
val s_release = RegInit(Bool(true)) // CW w_rprobeackfirst
// w_releaseack: 等待由下面发来的releaseack
val w_releaseack = RegInit(Bool(true))
// s_pprobe: 是由acquire引起的probe,对上面的probe。
val s_pprobe = RegInit(Bool(true)) // B
// s_acquire: 处理由上面来的acquire
val s_acquire = RegInit(Bool(true)) // A s_release, s_pprobe [1]
// s_flush: 处理flush请求,安排一个x通道的回应。
val s_flush = RegInit(Bool(true)) // X w_releaseack
// w_grantfirst: 等待由下面来的grant
val w_grantfirst = RegInit(Bool(true))
val w_grantlast = RegInit(Bool(true))
// w_grant: 等待由下面来的grant
val w_grant = RegInit(Bool(true)) // first | last depending on wormhole
// w_pprobeackfirst: 正在等待,L3 probe引起的 L1 probe的probeack。也就是假设下面来了一个probe,发现有数据在上面,就去probe上面,当前正在等待上面回probeack
val w_pprobeackfirst = RegInit(Bool(true))
val w_pprobeacklast = RegInit(Bool(true))
// w_pprobeack: 正在等待,L3 probe引起的 L1 probe的probeack。和上面两个的区别是它支持wormhole:如果请求的offset==0,收到最后一个client的第一笔就置位,否则要等最后一笔(见下面sinkc处理处的注释)。
val w_pprobeack = RegInit(Bool(true)) // first | last depending on wormhole
// s_probeack: 发出给下面的probeack,回应下面来的probe。
val s_probeack = RegInit(Bool(true)) // C w_pprobeackfirst (mutually exclusive with next two s_*)
// s_grantack: 发出给上面的grantack
val s_grantack = RegInit(Bool(true)) // E w_grantfirst ... CAN require both outE&inD to service outD
// s_execute: 可以去sourceD走流水了,MSHR该干的干完了。
val s_execute = RegInit(Bool(true)) // D w_pprobeack, w_grant
// w_grantack: 等待由上面来的grantack
val w_grantack = RegInit(Bool(true))
val s_writeback = RegInit(Bool(true)) // W w_*

// [1]: We cannot issue outer Acquire while holding blockB (=> outA can stall)
// However, inB and outC are higher priority than outB, so s_release and s_pprobe
// may be safely issued while blockB. Thus we must NOT try to schedule the
// potentially stuck s_acquire with either of them (scheduler is all or none).

// Meta-data that we discover underway
val sink = Reg(UInt(width = params.outer.bundle.sinkBits))
val gotT = Reg(Bool())
val bad_grant = Reg(Bool())
val probes_done = Reg(UInt(width = params.clientBits))
val probes_toN = Reg(UInt(width = params.clientBits))
val probes_noT = Reg(Bool())

// When a nested transaction completes, update our meta data
when (meta_valid && meta.state =/= INVALID &&
io.nestedwb.set === request.set && io.nestedwb.tag === meta.tag) {
when (io.nestedwb.b_clr_dirty) { meta.dirty := Bool(false) }
when (io.nestedwb.c_set_dirty) { meta.dirty := Bool(true) }
when (io.nestedwb.b_toB) { meta.state := BRANCH }
when (io.nestedwb.b_toN) { meta.hit := Bool(false) }
}

// Scheduler status
io.status.valid := request_valid
io.status.bits.set := request.set
io.status.bits.tag := request.tag
io.status.bits.way := meta.way
io.status.bits.blockB := !meta_valid || ((!w_releaseack || !w_rprobeacklast || !w_pprobeacklast) && !w_grantfirst)
io.status.bits.nestB := meta_valid && w_releaseack && w_rprobeacklast && w_pprobeacklast && !w_grantfirst
// The above rules ensure we will block and not nest an outer probe while still doing our
// own inner probes. Thus every probe wakes exactly one MSHR.
io.status.bits.blockC := !meta_valid
io.status.bits.nestC := meta_valid && (!w_rprobeackfirst || !w_pprobeackfirst || !w_grantfirst)
// The w_grantfirst in nestC is necessary to deal with:
// acquire waiting for grant, inner release gets queued, outer probe -> inner probe -> deadlock
// ... this is possible because the release+probe can be for same set, but different tag

// We can only demand: block, nest, or queue
assert (!io.status.bits.nestB || !io.status.bits.blockB)
assert (!io.status.bits.nestC || !io.status.bits.blockC)

// Scheduler requests
// no_wait: 表示没有等待的了,只有execute和writeback了,这两个可以同时做,应该一拍就结束了。
val no_wait = w_rprobeacklast && w_releaseack && w_grantlast && w_pprobeacklast && w_grantack
// a.valid:要等release和pprobe都发出去之后再发acquire。为什么要带上s_pprobe?见上面的注释[1]:acquire在blockB期间可能会卡住,而scheduler是all or none的,不能把可能卡住的acquire和release/pprobe捆在一起发。
io.schedule.bits.a.valid := !s_acquire && s_release && s_pprobe
io.schedule.bits.b.valid := !s_rprobe || !s_pprobe
io.schedule.bits.c.valid := (!s_release && w_rprobeackfirst) || (!s_probeack && w_pprobeackfirst)
io.schedule.bits.d.valid := !s_execute && w_pprobeack && w_grant
io.schedule.bits.e.valid := !s_grantack && w_grantfirst
io.schedule.bits.x.valid := !s_flush && w_releaseack
io.schedule.bits.dir.valid := (!s_release && w_rprobeackfirst) || (!s_writeback && no_wait)
io.schedule.bits.reload := no_wait
io.schedule.valid := io.schedule.bits.a.valid || io.schedule.bits.b.valid || io.schedule.bits.c.valid ||
io.schedule.bits.d.valid || io.schedule.bits.e.valid || io.schedule.bits.x.valid ||
io.schedule.bits.dir.valid

// Schedule completions
when (io.schedule.ready) {
// 因为s_rprobe/s_pprobe的优先级是最高的,而且它俩互斥,所以一旦ready就可以把它俩都置为true。
s_rprobe := Bool(true)
when (w_rprobeackfirst) { s_release := Bool(true) }
s_pprobe := Bool(true)
when (s_release && s_pprobe) { s_acquire := Bool(true) }
when (w_releaseack) { s_flush := Bool(true) }
when (w_pprobeackfirst) { s_probeack := Bool(true) }
when (w_grantfirst) { s_grantack := Bool(true) }
when (w_pprobeack && w_grant) { s_execute := Bool(true) }
when (no_wait) { s_writeback := Bool(true) }
// Await the next operation
when (no_wait) {
request_valid := Bool(false)
meta_valid := Bool(false)
}
}

// Resulting meta-data
val final_meta_writeback = Wire(init = meta)

val req_clientBit = params.clientBit(request.source)
val req_needT = needT(request.opcode, request.param)
val req_acquire = request.opcode === AcquireBlock || request.opcode === AcquirePerm
val meta_no_clients = !meta.clients.orR
val req_promoteT = req_acquire && Mux(meta.hit, meta_no_clients && meta.state === TIP, gotT)

when (request.prio(2) && Bool(!params.firstLevel)) { // always a hit
// 如果是releaseData BtoN,会把dirty拉高了,这里可能有问题。
final_meta_writeback.dirty := meta.dirty || request.opcode(0)
final_meta_writeback.state := Mux(request.param =/= TtoT && meta.state === TRUNK, TIP, meta.state)
final_meta_writeback.clients := meta.clients & ~Mux(isToN(request.param), req_clientBit, UInt(0))
final_meta_writeback.hit := Bool(true) // chained requests are hits
} .elsewhen (request.control && Bool(params.control)) { // request.prio(0)
when (meta.hit) {
final_meta_writeback.dirty := Bool(false)
final_meta_writeback.state := INVALID
final_meta_writeback.clients := meta.clients & ~probes_toN
}
final_meta_writeback.hit := Bool(false)
} .otherwise {
final_meta_writeback.dirty := (meta.hit && meta.dirty) || !request.opcode(2)
final_meta_writeback.state := Mux(req_needT,
Mux(req_acquire, TRUNK, TIP),
Mux(!meta.hit, Mux(gotT, Mux(req_acquire, TRUNK, TIP), BRANCH),
MuxLookup(meta.state, UInt(0, width=2), Seq(
INVALID -> BRANCH,
BRANCH -> BRANCH,
TRUNK -> TIP,
TIP -> Mux(meta_no_clients && req_acquire, TRUNK, TIP)))))
final_meta_writeback.clients := Mux(meta.hit, meta.clients & ~probes_toN, UInt(0)) |
Mux(req_acquire, req_clientBit, UInt(0))
final_meta_writeback.tag := request.tag
final_meta_writeback.hit := Bool(true)
}

when (bad_grant) {
// 如果下面给了grant denied。
when (meta.hit) {
// 如果hit,说明自己是B,要去升级成T权限,但被下面denied了:状态保持不变,仍然hit、dirty=0、BRANCH;clients还是要把toN的清掉(可能是上面有两个B,本来就要把另一个B probe掉,下面的denied并不会取消这个动作)。
// upgrade failed (B -> T)
assert (!meta_valid || meta.state === BRANCH)
final_meta_writeback.hit := Bool(true)
final_meta_writeback.dirty := Bool(false)
final_meta_writeback.state := BRANCH
final_meta_writeback.clients := meta.clients & ~probes_toN
} .otherwise {
// 如果miss了,自己就是N,有可能有一个cacheline被踢出去了,所以把该位置填为INVALID。
// failed N -> (T or B)
final_meta_writeback.hit := Bool(false)
final_meta_writeback.dirty := Bool(false)
final_meta_writeback.state := INVALID
final_meta_writeback.clients := UInt(0)
}
}

val invalid = Wire(new DirectoryEntry(params))
invalid.dirty := Bool(false)
invalid.state := INVALID
invalid.clients := UInt(0)
invalid.tag := UInt(0)

// 上面请求BtoT,但它可能已经被我的probe打断变成N了,所以要查meta.clients(最新状态)确认它是否还持有这条line;如果已经没有了,就要按NtoT带数据回给它,不然就出错了。
// Just because a client says BtoT, by the time we process the request he may be N.
// Therefore, we must consult our own meta-data state to confirm he owns the line still.
val honour_BtoT = meta.hit && (meta.clients & req_clientBit).orR

// 发probe的时候,有些请求要把发请求的client自己排除掉(比如acquire/get),有些则不排除(比如put)。
// The client asking us to act is proof they don't have permissions.
val excluded_client = Mux(meta.hit && request.prio(0) && skipProbeN(request.opcode), req_clientBit, UInt(0))
io.schedule.bits.a.bits.tag := request.tag
io.schedule.bits.a.bits.set := request.set
io.schedule.bits.a.bits.param := Mux(req_needT, Mux(meta.hit, BtoT, NtoT), NtoB)
// block,在sourceA里面决定是发acquire block还是发acquire perm。有两种情况发perm,一种是上面发了perm,另一种是putfullData且size是一条cacheline大小(已经要写完整的cacheline了,下面的数据是什么已经不重要了)。
io.schedule.bits.a.bits.block := request.size =/= UInt(log2Ceil(params.cache.blockBytes)) ||
!(request.opcode === PutFullData || request.opcode === AcquirePerm)
io.schedule.bits.a.bits.source := UInt(0)
io.schedule.bits.b.bits.param := Mux(!s_rprobe, toN, Mux(request.prio(1), request.param, Mux(req_needT, toN, toB)))
io.schedule.bits.b.bits.tag := Mux(!s_rprobe, meta.tag, request.tag)
io.schedule.bits.b.bits.set := request.set
io.schedule.bits.b.bits.clients := meta.clients & ~excluded_client
io.schedule.bits.c.bits.opcode := Mux(meta.dirty, ReleaseData, Release)
io.schedule.bits.c.bits.param := Mux(meta.state === BRANCH, BtoN, TtoN)
io.schedule.bits.c.bits.source := UInt(0)
io.schedule.bits.c.bits.tag := meta.tag
io.schedule.bits.c.bits.set := request.set
io.schedule.bits.c.bits.way := meta.way
io.schedule.bits.c.bits.dirty := meta.dirty
io.schedule.bits.d.bits := request
io.schedule.bits.d.bits.param := Mux(!req_acquire, request.param,
MuxLookup(request.param, Wire(request.param), Seq(
NtoB -> Mux(req_promoteT, NtoT, NtoB),
BtoT -> Mux(honour_BtoT, BtoT, NtoT),
NtoT -> NtoT)))
io.schedule.bits.d.bits.sink := UInt(0)
io.schedule.bits.d.bits.way := meta.way
io.schedule.bits.d.bits.bad := bad_grant
io.schedule.bits.e.bits.sink := sink
io.schedule.bits.x.bits.fail := Bool(false)
io.schedule.bits.dir.bits.set := request.set
io.schedule.bits.dir.bits.way := meta.way
// 会有两次写directory,在release的时候会写一次INVALID(防止插队的请求读到错误的dir),在结束请求的时候写一次。
io.schedule.bits.dir.bits.data := Mux(!s_release, invalid, Wire(new DirectoryEntry(params), init = final_meta_writeback))

// Coverage of state transitions
def cacheState(entry: DirectoryEntry, hit: Bool) = {
val out = Wire(UInt())
val c = entry.clients.orR
val d = entry.dirty
switch (entry.state) {
is (BRANCH) { out := Mux(c, S_BRANCH_C.code, S_BRANCH.code) }
is (TRUNK) { out := Mux(d, S_TRUNK_CD.code, S_TRUNK_C.code) }
is (TIP) { out := Mux(c, Mux(d, S_TIP_CD.code, S_TIP_C.code), Mux(d, S_TIP_D.code, S_TIP.code)) }
is (INVALID) { out := S_INVALID.code }
}
when (!hit) { out := S_INVALID.code }
out
}

val p = !params.lastLevel // can be probed
val c = !params.firstLevel // can be acquired
val m = params.inner.client.clients.exists(!_.supports.probe) // can be written (or read)
val r = params.outer.manager.managers.exists(!_.alwaysGrantsT) // read-only devices exist
val f = params.control // flush control register exists
val cfg = (p, c, m, r, f)
val b = r || p // can reach branch state (via probe downgrade or read-only device)

// The cache must be used for something or we would not be here
require(c || m)

val evict = cacheState(meta, !meta.hit)
val before = cacheState(meta, meta.hit)
val after = cacheState(final_meta_writeback, Bool(true))

def eviction(from: CacheState, cover: Boolean)(implicit sourceInfo: SourceInfo) {
if (cover) {
params.ccover(evict === from.code, s"MSHR_${from}_EVICT", s"State transition from ${from} to evicted ${cfg}")
} else {
assert(!(evict === from.code), s"State transition from ${from} to evicted should be impossible ${cfg}")
}
if (cover && f) {
params.ccover(before === from.code, s"MSHR_${from}_FLUSH", s"State transition from ${from} to flushed ${cfg}")
} else {
assert(!(before === from.code), s"State transition from ${from} to flushed should be impossible ${cfg}")
}
}

def transition(from: CacheState, to: CacheState, cover: Boolean)(implicit sourceInfo: SourceInfo) {
if (cover) {
params.ccover(before === from.code && after === to.code, s"MSHR_${from}_${to}", s"State transition from ${from} to ${to} ${cfg}")
} else {
assert(!(before === from.code && after === to.code), s"State transition from ${from} to ${to} should be impossible ${cfg}")
}
}

when ((!s_release && w_rprobeackfirst) && io.schedule.ready) {
eviction(S_BRANCH, b) // MMIO read to read-only device
eviction(S_BRANCH_C, b && c) // you need children to become C
eviction(S_TIP, true) // MMIO read || clean release can lead to this state
eviction(S_TIP_C, c) // needs two clients || client + mmio || downgrading client
eviction(S_TIP_CD, c) // needs two clients || client + mmio || downgrading client
eviction(S_TIP_D, true) // MMIO write || dirty release lead here
eviction(S_TRUNK_C, c) // acquire for write
eviction(S_TRUNK_CD, c) // dirty release then reacquire
}

when ((!s_writeback && no_wait) && io.schedule.ready) {
transition(S_INVALID, S_BRANCH, b && m) // only MMIO can bring us to BRANCH state
transition(S_INVALID, S_BRANCH_C, b && c) // C state is only possible if there are inner caches
transition(S_INVALID, S_TIP, m) // MMIO read
transition(S_INVALID, S_TIP_C, false) // we would go S_TRUNK_C instead
transition(S_INVALID, S_TIP_CD, false) // acquire does not cause dirty immediately
transition(S_INVALID, S_TIP_D, m) // MMIO write
transition(S_INVALID, S_TRUNK_C, c) // acquire
transition(S_INVALID, S_TRUNK_CD, false) // acquire does not cause dirty immediately

transition(S_BRANCH, S_INVALID, b && p) // probe can do this (flushes run as evictions)
transition(S_BRANCH, S_BRANCH_C, b && c) // acquire
transition(S_BRANCH, S_TIP, b && m) // prefetch write
transition(S_BRANCH, S_TIP_C, false) // we would go S_TRUNK_C instead
transition(S_BRANCH, S_TIP_CD, false) // acquire does not cause dirty immediately
transition(S_BRANCH, S_TIP_D, b && m) // MMIO write
transition(S_BRANCH, S_TRUNK_C, b && c) // acquire
transition(S_BRANCH, S_TRUNK_CD, false) // acquire does not cause dirty immediately

transition(S_BRANCH_C, S_INVALID, b && c && p)
transition(S_BRANCH_C, S_BRANCH, b && c) // clean release (optional)
transition(S_BRANCH_C, S_TIP, b && c && m) // prefetch write
transition(S_BRANCH_C, S_TIP_C, false) // we would go S_TRUNK_C instead
transition(S_BRANCH_C, S_TIP_D, b && c && m) // MMIO write
transition(S_BRANCH_C, S_TIP_CD, false) // going dirty means we must shoot down clients
transition(S_BRANCH_C, S_TRUNK_C, b && c) // acquire
transition(S_BRANCH_C, S_TRUNK_CD, false) // acquire does not cause dirty immediately

transition(S_TIP, S_INVALID, p)
transition(S_TIP, S_BRANCH, p) // losing TIP only possible via probe
transition(S_TIP, S_BRANCH_C, false) // we would go S_TRUNK_C instead
transition(S_TIP, S_TIP_C, false) // we would go S_TRUNK_C instead
transition(S_TIP, S_TIP_D, m) // direct dirty only via MMIO write
transition(S_TIP, S_TIP_CD, false) // acquire does not make us dirty immediately
transition(S_TIP, S_TRUNK_C, c) // acquire
transition(S_TIP, S_TRUNK_CD, false) // acquire does not make us dirty immediately

transition(S_TIP_C, S_INVALID, c && p)
transition(S_TIP_C, S_BRANCH, c && p) // losing TIP only possible via probe
transition(S_TIP_C, S_BRANCH_C, c && p) // losing TIP only possible via probe
transition(S_TIP_C, S_TIP, c) // probed while MMIO read || clean release (optional)
transition(S_TIP_C, S_TIP_D, c && m) // direct dirty only via MMIO write
transition(S_TIP_C, S_TIP_CD, false) // going dirty means we must shoot down clients
transition(S_TIP_C, S_TRUNK_C, c) // acquire
transition(S_TIP_C, S_TRUNK_CD, false) // acquire does not make us immediately dirty

transition(S_TIP_D, S_INVALID, p)
transition(S_TIP_D, S_BRANCH, p) // losing D is only possible via probe
transition(S_TIP_D, S_BRANCH_C, p && c) // probed while acquire shared
transition(S_TIP_D, S_TIP, p) // probed while MMIO read || outer probe.toT (optional)
transition(S_TIP_D, S_TIP_C, false) // we would go S_TRUNK_C instead
transition(S_TIP_D, S_TIP_CD, false) // we would go S_TRUNK_CD instead
transition(S_TIP_D, S_TRUNK_C, p && c) // probed while acquired
transition(S_TIP_D, S_TRUNK_CD, c) // acquire

transition(S_TIP_CD, S_INVALID, c && p)
transition(S_TIP_CD, S_BRANCH, c && p) // losing D is only possible via probe
transition(S_TIP_CD, S_BRANCH_C, c && p) // losing D is only possible via probe
transition(S_TIP_CD, S_TIP, c && p) // probed while MMIO read || outer probe.toT (optional)
transition(S_TIP_CD, S_TIP_C, false) // we would go S_TRUNK_C instead
transition(S_TIP_CD, S_TIP_D, c) // MMIO write || clean release (optional)
transition(S_TIP_CD, S_TRUNK_C, c && p) // probed while acquire
transition(S_TIP_CD, S_TRUNK_CD, c) // acquire

transition(S_TRUNK_C, S_INVALID, c && p)
transition(S_TRUNK_C, S_BRANCH, c && p) // losing TIP only possible via probe
transition(S_TRUNK_C, S_BRANCH_C, c && p) // losing TIP only possible via probe
transition(S_TRUNK_C, S_TIP, c) // MMIO read || clean release (optional)
transition(S_TRUNK_C, S_TIP_C, c) // bounce shared
transition(S_TRUNK_C, S_TIP_D, c) // dirty release
transition(S_TRUNK_C, S_TIP_CD, c) // dirty bounce shared
transition(S_TRUNK_C, S_TRUNK_CD, c) // dirty bounce

transition(S_TRUNK_CD, S_INVALID, c && p)
transition(S_TRUNK_CD, S_BRANCH, c && p) // losing D only possible via probe
transition(S_TRUNK_CD, S_BRANCH_C, c && p) // losing D only possible via probe
transition(S_TRUNK_CD, S_TIP, c && p) // probed while MMIO read || outer probe.toT (optional)
transition(S_TRUNK_CD, S_TIP_C, false) // we would go S_TRUNK_C instead
transition(S_TRUNK_CD, S_TIP_D, c) // dirty release
transition(S_TRUNK_CD, S_TIP_CD, c) // bounce shared
transition(S_TRUNK_CD, S_TRUNK_C, c && p) // probed while acquire
}

// Handle response messages
// clientBit,一个client可以包括多个source,所以这里是看source是否在这个client的range内。
val probe_bit = params.clientBit(io.sinkc.bits.source)
val last_probe = (probes_done | probe_bit) === (meta.clients & ~excluded_client)
val probe_toN = isToN(io.sinkc.bits.param)
// 这里只判了sinkC.valid,因为sinkC模块里只有probeAck(Data)会拉高它;release不走这条路,release会走allocate进scheduler,由scheduler判断能不能插队:能就插队,不能就进request队列。
if (!params.firstLevel) when (io.sinkc.valid) {
params.ccover( probe_toN && io.schedule.bits.b.bits.param === toB, "MSHR_PROBE_FULL", "Client downgraded to N when asked only to do B")
params.ccover(!probe_toN && io.schedule.bits.b.bits.param === toB, "MSHR_PROBE_HALF", "Client downgraded to B when asked only to do B")
// Caution: the probe matches us only in set.
// We would never allow an outer probe to nest until both w_[rp]probeack complete, so
// it is safe to just unguardedly update the probe FSM.
probes_done := probes_done | probe_bit
probes_toN := probes_toN | Mux(probe_toN, probe_bit, UInt(0))
probes_noT := probes_noT || io.sinkc.bits.param =/= TtoT
w_rprobeackfirst := w_rprobeackfirst || last_probe
w_rprobeacklast := w_rprobeacklast || (last_probe && io.sinkc.bits.last)
w_pprobeackfirst := w_pprobeackfirst || last_probe
w_pprobeacklast := w_pprobeacklast || (last_probe && io.sinkc.bits.last)
// Allow wormhole routing from sinkC if the first request beat has offset 0
// 为什么要加上request.offset==0?正常的cacheline操作都是offset==0,而那些put/get/amo是有可能从中间开始操作的。如果是offset==0,那在接收到第一笔的时候就能进入下一个状态去走流水了,因为后面的请求不可能越过它进行操作了;而如果offset!=0的时候,就必须要等到last才能往后走了。(见前面bankedStore的noop注释)
val set_pprobeack = last_probe && (io.sinkc.bits.last || request.offset === UInt(0))
w_pprobeack := w_pprobeack || set_pprobeack
params.ccover(!set_pprobeack && w_rprobeackfirst, "MSHR_PROBE_SERIAL", "Sequential routing of probe response data")
params.ccover( set_pprobeack && w_rprobeackfirst, "MSHR_PROBE_WORMHOLE", "Wormhole routing of probe response data")
// However, meta-data updates need to be done more cautiously
when (meta.state =/= INVALID && io.sinkc.bits.tag === meta.tag && io.sinkc.bits.data) { meta.dirty := Bool(true) } // !!!
}
when (io.sinkd.valid) {
when (io.sinkd.bits.opcode === Grant || io.sinkd.bits.opcode === GrantData) {
sink := io.sinkd.bits.sink
w_grantfirst := Bool(true)
w_grantlast := io.sinkd.bits.last
// Record if we need to prevent taking ownership
bad_grant := io.sinkd.bits.denied
// Allow wormhole routing for requests whose first beat has offset 0
w_grant := request.offset === UInt(0) || io.sinkd.bits.last
params.ccover(io.sinkd.bits.opcode === GrantData && request.offset === UInt(0), "MSHR_GRANT_WORMHOLE", "Wormhole routing of grant response data")
params.ccover(io.sinkd.bits.opcode === GrantData && request.offset =/= UInt(0), "MSHR_GRANT_SERIAL", "Sequential routing of grant response data")
gotT := io.sinkd.bits.param === toT
}
.elsewhen (io.sinkd.bits.opcode === ReleaseAck) {
w_releaseack := Bool(true)
}
}
when (io.sinke.valid) {
w_grantack := Bool(true)
}

// Bootstrap new requests
val allocate_as_full = Wire(new FullRequest(params), init = io.allocate.bits)
val new_meta = Mux(io.allocate.valid && io.allocate.bits.repeat, final_meta_writeback, io.directory.bits)
val new_request = Mux(io.allocate.valid, allocate_as_full, request)
val new_needT = needT(new_request.opcode, new_request.param)
val new_clientBit = params.clientBit(new_request.source)
val new_skipProbe = Mux(skipProbeN(new_request.opcode), new_clientBit, UInt(0))

val prior = cacheState(final_meta_writeback, Bool(true))
def bypass(from: CacheState, cover: Boolean)(implicit sourceInfo: SourceInfo) {
if (cover) {
params.ccover(prior === from.code, s"MSHR_${from}_BYPASS", s"State bypass transition from ${from} ${cfg}")
} else {
assert(!(prior === from.code), s"State bypass from ${from} should be impossible ${cfg}")
}
}

when (io.allocate.valid && io.allocate.bits.repeat) {
bypass(S_INVALID, f || p) // Can lose permissions (probe/flush)
bypass(S_BRANCH, b) // MMIO read to read-only device
bypass(S_BRANCH_C, b && c) // you need children to become C
bypass(S_TIP, true) // MMIO read || clean release can lead to this state
bypass(S_TIP_C, c) // needs two clients || client + mmio || downgrading client
bypass(S_TIP_CD, c) // needs two clients || client + mmio || downgrading client
bypass(S_TIP_D, true) // MMIO write || dirty release lead here
bypass(S_TRUNK_C, c) // acquire for write
bypass(S_TRUNK_CD, c) // dirty release then reacquire
}

when (io.allocate.valid) {
assert (!request_valid || (no_wait && io.schedule.fire()))
request_valid := Bool(true)
request := io.allocate.bits
}

// Create execution plan
when (io.directory.valid || (io.allocate.valid && io.allocate.bits.repeat)) {
meta_valid := Bool(true)
meta := new_meta
probes_done := UInt(0)
probes_toN := UInt(0)
probes_noT := Bool(false)
gotT := Bool(false)
bad_grant := Bool(false)

// These should already be either true or turning true
// We clear them here explicitly to simplify the mux tree
s_rprobe := Bool(true)
w_rprobeackfirst := Bool(true)
w_rprobeacklast := Bool(true)
s_release := Bool(true)
w_releaseack := Bool(true)
s_pprobe := Bool(true)
s_acquire := Bool(true)
s_flush := Bool(true)
w_grantfirst := Bool(true)
w_grantlast := Bool(true)
w_grant := Bool(true)
w_pprobeackfirst := Bool(true)
w_pprobeacklast := Bool(true)
w_pprobeack := Bool(true)
s_probeack := Bool(true)
s_grantack := Bool(true)
s_execute := Bool(true)
w_grantack := Bool(true)
s_writeback := Bool(true)

// For C channel requests (ie: Release[Data])
when (new_request.prio(2) && Bool(!params.firstLevel)) {
s_execute := Bool(false)
// Do we need to go dirty?
when (new_request.opcode(0) && !new_meta.dirty) {
s_writeback := Bool(false)
}
// Does our state change?
when (isToB(new_request.param) && new_meta.state === TRUNK) {
s_writeback := Bool(false)
}
// Do our clients change?
when (isToN(new_request.param) && (new_meta.clients & new_clientBit) =/= UInt(0)) {
s_writeback := Bool(false)
}
assert (new_meta.hit)
}
// For X channel requests (ie: flush)
.elsewhen (new_request.control && Bool(params.control)) { // new_request.prio(0)
s_flush := Bool(false)
// Do we need to actually do something?
when (new_meta.hit) {
s_release := Bool(false)
w_releaseack := Bool(false)
// Do we need to shoot-down inner caches?
when (Bool(!params.firstLevel) && (new_meta.clients =/= UInt(0))) {
s_rprobe := Bool(false)
w_rprobeackfirst := Bool(false)
w_rprobeacklast := Bool(false)
}
}
}
// For A channel requests
.otherwise { // new_request.prio(0) && !new_request.control
s_execute := Bool(false)
// Do we need an eviction?
when (!new_meta.hit && new_meta.state =/= INVALID) {
s_release := Bool(false)
w_releaseack := Bool(false)
// Do we need to shoot-down inner caches?
when (Bool(!params.firstLevel) & (new_meta.clients =/= UInt(0))) {
s_rprobe := Bool(false)
w_rprobeackfirst := Bool(false)
w_rprobeacklast := Bool(false)
}
}
// Do we need an acquire?
when (!new_meta.hit || (new_meta.state === BRANCH && new_needT)) {
s_acquire := Bool(false)
w_grantfirst := Bool(false)
w_grantlast := Bool(false)
w_grant := Bool(false)
s_grantack := Bool(false)
s_writeback := Bool(false)
}
// Do we need a probe?
when (Bool(!params.firstLevel) && (new_meta.hit &&
(new_needT || new_meta.state === TRUNK) &&
(new_meta.clients & ~new_skipProbe) =/= UInt(0))) {
s_pprobe := Bool(false)
w_pprobeackfirst := Bool(false)
w_pprobeacklast := Bool(false)
w_pprobeack := Bool(false)
s_writeback := Bool(false)
}
// Do we need a grantack?
when (new_request.opcode === AcquireBlock || new_request.opcode === AcquirePerm) {
w_grantack := Bool(false)
s_writeback := Bool(false)
}
// Becomes dirty?
when (!new_request.opcode(2) && new_meta.hit && !new_meta.dirty) {
s_writeback := Bool(false)
}
}
}
}
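MSHR里A通道请求(.otherwise分支)的final_meta_writeback.state计算嵌套了好几层Mux,用纯Scala展开来写更直观一些。下面是一个简化的示意(State、nextState这些名字是自己起的):

object NextStateDemo extends App {
  sealed trait State
  case object INVALID extends State
  case object BRANCH  extends State
  case object TRUNK   extends State
  case object TIP     extends State

  def nextState(needT: Boolean, acquire: Boolean, hit: Boolean,
                gotT: Boolean, noClients: Boolean, cur: State): State =
    if (needT) { if (acquire) TRUNK else TIP }            // 要独占:acquire给上面,自己留TRUNK;put/amo自己留TIP
    else if (!hit) {                                      // miss:看下面有没有给T
      if (gotT) { if (acquire) TRUNK else TIP } else BRANCH
    } else cur match {                                    // hit且只要共享权限
      case INVALID => BRANCH
      case BRANCH  => BRANCH
      case TRUNK   => TIP
      case TIP     => if (noClients && acquire) TRUNK else TIP
    }

  // 例:miss的AcquireBlock NtoB,下面只给了B -> 本级变成BRANCH
  println(nextState(needT = false, acquire = true, hit = false,
                    gotT = false, noClients = true, cur = INVALID))
}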

scheduler

scheduler主要负责请求的调度:把sinkA/sinkC/sinkX来的请求分配给MSHR(或者插队、进队列),对各个MSHR的schedule做仲裁,把命令分发给各个source,并连接directory和bankedStore。

class Scheduler(params: InclusiveCacheParameters) extends Module
{
val io = new Bundle {
val in = TLBundle(params.inner.bundle).flip
val out = TLBundle(params.outer.bundle)
// Way permissions
val ways = Vec(params.allClients, UInt(width = params.cache.ways)).flip
val divs = Vec(params.allClients, UInt(width = InclusiveCacheParameters.lfsrBits + 1)).flip
// Control port
val req = Decoupled(new SinkXRequest(params)).flip
val resp = Decoupled(new SourceXRequest(params))
}

val sourceA = Module(new SourceA(params))
val sourceB = Module(new SourceB(params))
val sourceC = Module(new SourceC(params))
val sourceD = Module(new SourceD(params))
val sourceE = Module(new SourceE(params))
val sourceX = Module(new SourceX(params))

io.out.a <> sourceA.io.a
io.out.c <> sourceC.io.c
io.out.e <> sourceE.io.e
io.in.b <> sourceB.io.b
io.in.d <> sourceD.io.d
io.resp <> sourceX.io.x

val sinkA = Module(new SinkA(params))
val sinkC = Module(new SinkC(params))
val sinkD = Module(new SinkD(params))
val sinkE = Module(new SinkE(params))
val sinkX = Module(new SinkX(params))

sinkA.io.a <> io.in.a
sinkC.io.c <> io.in.c
sinkE.io.e <> io.in.e
sinkD.io.d <> io.out.d
sinkX.io.x <> io.req

io.out.b.ready := Bool(true) // disconnected

val directory = Module(new Directory(params))
val bankedStore = Module(new BankedStore(params))
// 3*mshrs个队列。a排a的队,c排c的队,b排b的队。
val requests = Module(new ListBuffer(ListBufferParameters(new QueuedRequest(params), 3*params.mshrs, params.secondary, false)))
// 为什么只有一个bc_mshr和一个c_mshr?因为插队发生的概率比较小,各留一个就够了;这两个MSHR专门用来处理插队的请求。
val mshrs = Seq.fill(params.mshrs) { Module(new MSHR(params)) }
val abc_mshrs = mshrs.init.init
val bc_mshr = mshrs.init.last
val c_mshr = mshrs.last
val nestedwb = Wire(new NestedWriteback(params))

// Deliver messages from Sinks to MSHRs
mshrs.zipWithIndex.foreach { case (m, i) =>
m.io.sinkc.valid := sinkC.io.resp.valid && sinkC.io.resp.bits.set === m.io.status.bits.set
m.io.sinkd.valid := sinkD.io.resp.valid && sinkD.io.resp.bits.source === UInt(i)
m.io.sinke.valid := sinkE.io.resp.valid && sinkE.io.resp.bits.sink === UInt(i)
m.io.sinkc.bits := sinkC.io.resp.bits
m.io.sinkd.bits := sinkD.io.resp.bits
m.io.sinke.bits := sinkE.io.resp.bits
m.io.nestedwb := nestedwb
}

// If the pre-emption BC or C MSHR have a matching set, the normal MSHR must be blocked
val mshr_stall_abc = abc_mshrs.map { m =>
(bc_mshr.io.status.valid && m.io.status.bits.set === bc_mshr.io.status.bits.set) ||
( c_mshr.io.status.valid && m.io.status.bits.set === c_mshr.io.status.bits.set)
}
val mshr_stall_bc =
c_mshr.io.status.valid && bc_mshr.io.status.bits.set === c_mshr.io.status.bits.set
val mshr_stall_c = Bool(false)
val mshr_stall = mshr_stall_abc :+ mshr_stall_bc :+ mshr_stall_c


val stall_abc = (mshr_stall_abc zip abc_mshrs) map { case (s, m) => s && m.io.status.valid }
if (!params.lastLevel || !params.firstLevel)
params.ccover(stall_abc.reduce(_||_), "SCHEDULER_ABC_INTERLOCK", "ABC MSHR interlocked due to pre-emption")
if (!params.lastLevel)
params.ccover(mshr_stall_bc && bc_mshr.io.status.valid, "SCHEDULER_BC_INTERLOCK", "BC MSHR interlocked due to pre-emption")

// Consider scheduling an MSHR only if all the resources it requires are available
val mshr_request = Cat((mshrs zip mshr_stall).map { case (m, s) =>
m.io.schedule.valid && !s &&
(sourceA.io.req.ready || !m.io.schedule.bits.a.valid) &&
(sourceB.io.req.ready || !m.io.schedule.bits.b.valid) &&
(sourceC.io.req.ready || !m.io.schedule.bits.c.valid) &&
(sourceD.io.req.ready || !m.io.schedule.bits.d.valid) &&
(sourceE.io.req.ready || !m.io.schedule.bits.e.valid) &&
(sourceX.io.req.ready || !m.io.schedule.bits.x.valid) &&
(directory.io.write.ready || !m.io.schedule.bits.dir.valid)
}.reverse)

// Round-robin arbitration of MSHRs
val robin_filter = RegInit(UInt(0, width = params.mshrs))
val robin_request = Cat(mshr_request, mshr_request & robin_filter)
val mshr_selectOH2 = ~(leftOR(robin_request) << 1) & robin_request
val mshr_selectOH = mshr_selectOH2(2*params.mshrs-1, params.mshrs) | mshr_selectOH2(params.mshrs-1, 0)
val mshr_select = OHToUInt(mshr_selectOH)
val schedule = Mux1H(mshr_selectOH, mshrs.map(_.io.schedule.bits))
val scheduleTag = Mux1H(mshr_selectOH, mshrs.map(_.io.status.bits.tag))
val scheduleSet = Mux1H(mshr_selectOH, mshrs.map(_.io.status.bits.set))

// When an MSHR wins the schedule, it has lowest priority next time
when (mshr_request.orR()) { robin_filter := ~rightOR(mshr_selectOH) }

// Fill in which MSHR sends the request
schedule.a.bits.source := mshr_select
// c.source为什么可以填0:C通道的probeAck是按地址(set)找到对应MSHR的,不是按source;见上面"Deliver messages from Sinks to MSHRs"那段,sinkC的resp是按set匹配到MSHR的。只有Release[Data]才填mshr_select。
schedule.c.bits.source := Mux(schedule.c.bits.opcode(1), mshr_select, UInt(0)) // only set for Release[Data] not ProbeAck[Data]
schedule.d.bits.sink := mshr_select

sourceA.io.req := schedule.a
sourceB.io.req := schedule.b
sourceC.io.req := schedule.c
sourceD.io.req := schedule.d
sourceE.io.req := schedule.e
sourceX.io.req := schedule.x
directory.io.write := schedule.dir

// Forward meta-data changes from nested transaction completion
val select_c = mshr_selectOH(params.mshrs-1)
val select_bc = mshr_selectOH(params.mshrs-2)
nestedwb.set := Mux(select_c, c_mshr.io.status.bits.set, bc_mshr.io.status.bits.set)
nestedwb.tag := Mux(select_c, c_mshr.io.status.bits.tag, bc_mshr.io.status.bits.tag)
nestedwb.b_toN := select_bc && bc_mshr.io.schedule.bits.dir.valid && bc_mshr.io.schedule.bits.dir.bits.data.state === MetaData.INVALID
nestedwb.b_toB := select_bc && bc_mshr.io.schedule.bits.dir.valid && bc_mshr.io.schedule.bits.dir.bits.data.state === MetaData.BRANCH
// select_bc一定是clr_dirty吗?不可能set_dirty?因为nestB只能进bc_mshr,nestC只能进c_mshr;而alloc的B请求可以进abc/bc,alloc的C请求可以进bc/c,alloc的A请求只能进abc。
nestedwb.b_clr_dirty := select_bc && bc_mshr.io.schedule.bits.dir.valid
nestedwb.c_set_dirty := select_c && c_mshr.io.schedule.bits.dir.valid && c_mshr.io.schedule.bits.dir.bits.data.dirty

// Pick highest priority request
val request = Wire(Decoupled(new FullRequest(params)))
request.valid := directory.io.ready && (sinkA.io.req.valid || sinkX.io.req.valid || sinkC.io.req.valid)
request.bits := Mux(sinkC.io.req.valid, sinkC.io.req.bits,
Mux(sinkX.io.req.valid, sinkX.io.req.bits, sinkA.io.req.bits))
sinkC.io.req.ready := directory.io.ready && request.ready
sinkX.io.req.ready := directory.io.ready && request.ready && !sinkC.io.req.valid
sinkA.io.req.ready := directory.io.ready && request.ready && !sinkC.io.req.valid && !sinkX.io.req.valid

// If no MSHR has been assigned to this set, we need to allocate one
val setMatches = Cat(mshrs.map { m => m.io.status.valid && m.io.status.bits.set === request.bits.set }.reverse)
val alloc = !setMatches.orR() // NOTE: no matches also means no BC or C pre-emption on this set
// If a same-set MSHR says that requests of this type must be blocked (for bounded time), do it
val blockB = Mux1H(setMatches, mshrs.map(_.io.status.bits.blockB)) && request.bits.prio(1)
val blockC = Mux1H(setMatches, mshrs.map(_.io.status.bits.blockC)) && request.bits.prio(2)
// If a same-set MSHR says that requests of this type must be handled out-of-band, use special BC|C MSHR
// ... these special MSHRs interlock the MSHR that said it should be pre-empted.
val nestB = Mux1H(setMatches, mshrs.map(_.io.status.bits.nestB)) && request.bits.prio(1)
val nestC = Mux1H(setMatches, mshrs.map(_.io.status.bits.nestC)) && request.bits.prio(2)
// Prevent priority inversion; we may not queue to MSHRs beyond our level
// prioFilter三段的含义:最高位(c_mshr)只有C请求能进;次高位(bc_mshr)非A的请求(B/C)能进;低mshrs-2位(普通的abc_mshr)所有请求都能进。
val prioFilter = Cat(request.bits.prio(2), !request.bits.prio(0), ~UInt(0, width = params.mshrs-2))
// lowerMatches:把等级比本请求高的MSHR排除掉,请求不能去比自己等级高的MSHR那里排队。
val lowerMatches = setMatches & prioFilter
// If we match an MSHR <= our priority that neither blocks nor nests us, queue to it.
// 为什么要区分block和queue这两种?都是挡住吧?因为在某些时刻需要把请求挡在接口上、不能进queue,等到MSHR的状况明确了再处理它;如果让它直接进queue,MSHR就没法再判断queue里的东西能不能nest了。具体见scenario 2。
val queue = lowerMatches.orR() && !nestB && !nestC && !blockB && !blockC

if (!params.lastLevel) {
params.ccover(request.valid && blockB, "SCHEDULER_BLOCKB", "Interlock B request while resolving set conflict")
params.ccover(request.valid && nestB, "SCHEDULER_NESTB", "Priority escalation from channel B")
}
if (!params.firstLevel) {
params.ccover(request.valid && blockC, "SCHEDULER_BLOCKC", "Interlock C request while resolving set conflict")
params.ccover(request.valid && nestC, "SCHEDULER_NESTC", "Priority escalation from channel C")
}
params.ccover(request.valid && queue, "SCHEDULER_SECONDARY", "Enqueue secondary miss")

// It might happen that lowerMatches has >1 bit if the two special MSHRs are in-use
// We want to Q to the highest matching priority MSHR.
// lowerMatches1:后缀1表示one-hot。当两个特殊MSHR都占着同一个set时,lowerMatches可能有两个bit,这里选等级最高的那个,把它变成单bit。
val lowerMatches1 =
Mux(lowerMatches(params.mshrs-1), UInt(1 << (params.mshrs-1)),
Mux(lowerMatches(params.mshrs-2), UInt(1 << (params.mshrs-2)),
lowerMatches))

// If this goes to the scheduled MSHR, it may need to be bypassed
// Alternatively, the MSHR may be refilled from a request queued in the ListBuffer
val selected_requests = Cat(mshr_selectOH, mshr_selectOH, mshr_selectOH) & requests.io.valid
// a_pop只是代表a队列有没有东西,不是代表是否pop。
val a_pop = selected_requests((0 + 1) * params.mshrs - 1, 0 * params.mshrs).orR()
val b_pop = selected_requests((1 + 1) * params.mshrs - 1, 1 * params.mshrs).orR()
val c_pop = selected_requests((2 + 1) * params.mshrs - 1, 2 * params.mshrs).orR()
// bypassMatches:只有当队列里没有和本请求同级或更高优先级通道的请求在排队时才能bypass,比如C通道的请求要求C队列是空的。
val bypassMatches = (mshr_selectOH & lowerMatches1).orR() &&
Mux(c_pop || request.bits.prio(2), !c_pop, Mux(b_pop || request.bits.prio(1), !b_pop, !a_pop))
val may_pop = a_pop || b_pop || c_pop
// bypass的意思:新来的request不进队列,直接灌给正在reload的MSHR。
val bypass = request.valid && queue && bypassMatches
val will_reload = schedule.reload && (may_pop || bypass)
val will_pop = schedule.reload && may_pop && !bypass

params.ccover(mshr_selectOH.orR && bypass, "SCHEDULER_BYPASS", "Bypass new request directly to conflicting MSHR")
params.ccover(mshr_selectOH.orR && will_reload, "SCHEDULER_RELOAD", "Back-to-back service of two requests")
params.ccover(mshr_selectOH.orR && will_pop, "SCHEDULER_POP", "Service of a secondary miss")

// Repeat the above logic, but without the fan-in
mshrs.zipWithIndex.foreach { case (m, i) =>
val sel = mshr_selectOH(i)
m.io.schedule.ready := sel
val a_pop = requests.io.valid(params.mshrs * 0 + i)
val b_pop = requests.io.valid(params.mshrs * 1 + i)
val c_pop = requests.io.valid(params.mshrs * 2 + i)
val bypassMatches = lowerMatches1(i) &&
Mux(c_pop || request.bits.prio(2), !c_pop, Mux(b_pop || request.bits.prio(1), !b_pop, !a_pop))
val may_pop = a_pop || b_pop || c_pop
val bypass = request.valid && queue && bypassMatches
val will_reload = m.io.schedule.bits.reload && (may_pop || bypass)
m.io.allocate.bits := Mux(bypass, Wire(new QueuedRequest(params), init = request.bits), requests.io.data)
m.io.allocate.bits.set := m.io.status.bits.set
m.io.allocate.bits.repeat := m.io.allocate.bits.tag === m.io.status.bits.tag
// 只有reload的时候才会从队列里拿一个出来(或者bypass一个进去);上一个同set的请求还没处理完时,队列里的请求也出不来。
m.io.allocate.valid := sel && will_reload
}

// Determine which of the queued requests to pop (supposing will_pop)
// 选出a_pop/b_pop/c_pop中优先级高的。
val prio_requests = ~(~requests.io.valid | (requests.io.valid >> params.mshrs) | (requests.io.valid >> 2*params.mshrs))
val pop_index = OHToUInt(Cat(mshr_selectOH, mshr_selectOH, mshr_selectOH) & prio_requests)
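// 举个例子(假设 mshrs = 4):requests.io.valid = c:0001_b:0011_a:0110
//   MSHR0:c队列有东西 -> 只留c;MSHR1:b、a都有 -> 只留b;MSHR2:只有a -> 留a
//   prio_requests = c:0001_b:0010_a:0100
//   再和 Cat(mshr_selectOH, mshr_selectOH, mshr_selectOH) 相与,OHToUInt 得到要pop的队列编号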
requests.io.pop.valid := will_pop
requests.io.pop.bits := pop_index

// 判断队列出来的是否可以repeat,同set的只能进到这个指定的MSHR,如果同tag就可以repeat,就不用去读directory,如果不同tag,就不repeat,就要去读directory。
// Reload from the Directory if the next MSHR operation changes tags
val lb_tag_mismatch = scheduleTag =/= requests.io.data.tag
val mshr_uses_directory_assuming_no_bypass = schedule.reload && may_pop && lb_tag_mismatch
val mshr_uses_directory_for_lb = will_pop && lb_tag_mismatch
val mshr_uses_directory = will_reload && scheduleTag =/= Mux(bypass, request.bits.tag, requests.io.data.tag)

// Is there an MSHR free for this request?
val mshr_validOH = Cat(mshrs.map(_.io.status.valid).reverse)
val mshr_free = (~mshr_validOH & prioFilter).orR()

// Fanout the request to the appropriate handler (if any)
val bypassQueue = schedule.reload && bypassMatches
val request_alloc_cases =
(alloc && !mshr_uses_directory_assuming_no_bypass && mshr_free) ||
(nestB && !mshr_uses_directory_assuming_no_bypass && !bc_mshr.io.status.valid && !c_mshr.io.status.valid) ||
(nestC && !mshr_uses_directory_assuming_no_bypass && !c_mshr.io.status.valid)
request.ready := request_alloc_cases || (queue && (bypassQueue || requests.io.push.ready))
val alloc_uses_directory = request.valid && request_alloc_cases

// directory的读是scheduler发起的,在进mshr的同时发出,mshr只是在等结果。
// When a request goes through, it will need to hit the Directory
directory.io.read.valid := mshr_uses_directory || alloc_uses_directory
directory.io.read.bits.set := Mux(mshr_uses_directory_for_lb, scheduleSet, request.bits.set)
directory.io.read.bits.tag := Mux(mshr_uses_directory_for_lb, requests.io.data.tag, request.bits.tag)

// Enqueue the request if not bypassed directly into an MSHR
requests.io.push.valid := request.valid && queue && !bypassQueue
requests.io.push.bits.data := request.bits
requests.io.push.bits.index := Mux1H(
request.bits.prio, Seq(
OHToUInt(lowerMatches1 << params.mshrs*0),
OHToUInt(lowerMatches1 << params.mshrs*1),
OHToUInt(lowerMatches1 << params.mshrs*2)))

// 只有status不valid的MSHR才会走这里的正常allocate(下面的when块),而正在reload的MSHR的valid还没有拉低,所以这两种allocate不会同时发生。
// 选择一个mshr,prioFilter把权限不够的给去掉
val mshr_insertOH = ~(leftOR(~mshr_validOH) << 1) & ~mshr_validOH & prioFilter
(mshr_insertOH.asBools zip mshrs) map { case (s, m) =>
// 最后一个条件的意思是:有MSHR在抢directory的读端口,所以这一拍不能alloc。如果不能alloc,request会被挡在接口上,因为上面request.ready的赋值里也带了这个条件。
when (request.valid && alloc && s && !mshr_uses_directory_assuming_no_bypass) {
m.io.allocate.valid := Bool(true)
m.io.allocate.bits := request.bits
m.io.allocate.bits.repeat := Bool(false)
}
}

when (request.valid && nestB && !bc_mshr.io.status.valid && !c_mshr.io.status.valid && !mshr_uses_directory_assuming_no_bypass) {
bc_mshr.io.allocate.valid := Bool(true)
bc_mshr.io.allocate.bits := request.bits
bc_mshr.io.allocate.bits.repeat := Bool(false)
assert (!request.bits.prio(0))
}
bc_mshr.io.allocate.bits.prio(0) := Bool(false)

when (request.valid && nestC && !c_mshr.io.status.valid && !mshr_uses_directory_assuming_no_bypass) {
c_mshr.io.allocate.valid := Bool(true)
c_mshr.io.allocate.bits := request.bits
c_mshr.io.allocate.bits.repeat := Bool(false)
assert (!request.bits.prio(0))
assert (!request.bits.prio(1))
}
c_mshr.io.allocate.bits.prio(0) := Bool(false)
c_mshr.io.allocate.bits.prio(1) := Bool(false)

// Fanout the result of the Directory lookup
// 记录directory的读结果应该发给哪个MSHR,取决于前一拍是谁抢到了directory的读端口。
val dirTarget = Mux(alloc, mshr_insertOH, Mux(nestB, UInt(1 << (params.mshrs-2)), UInt(1 << (params.mshrs-1))))
val directoryFanout = params.dirReg(RegNext(Mux(mshr_uses_directory, mshr_selectOH, Mux(alloc_uses_directory, dirTarget, UInt(0)))))
mshrs.zipWithIndex.foreach { case (m, i) =>
m.io.directory.valid := directoryFanout(i)
m.io.directory.bits := directory.io.result.bits
}

// MSHR response meta-data fetch
// 用sinkC的set来查MSHR的set,找到相应的way。
sinkC.io.way :=
Mux(bc_mshr.io.status.valid && bc_mshr.io.status.bits.set === sinkC.io.set,
bc_mshr.io.status.bits.way,
Mux1H(abc_mshrs.map(m => m.io.status.valid && m.io.status.bits.set === sinkC.io.set),
abc_mshrs.map(_.io.status.bits.way)))
sinkD.io.way := Vec(mshrs.map(_.io.status.bits.way))(sinkD.io.source)
sinkD.io.set := Vec(mshrs.map(_.io.status.bits.set))(sinkD.io.source)

// Beat buffer connections between components
sinkA.io.pb_pop <> sourceD.io.pb_pop
sourceD.io.pb_beat := sinkA.io.pb_beat
sinkC.io.rel_pop <> sourceD.io.rel_pop
sourceD.io.rel_beat := sinkC.io.rel_beat

// BankedStore ports
bankedStore.io.sinkC_adr <> sinkC.io.bs_adr
bankedStore.io.sinkC_dat := sinkC.io.bs_dat
bankedStore.io.sinkD_adr <> sinkD.io.bs_adr
bankedStore.io.sinkD_dat := sinkD.io.bs_dat
bankedStore.io.sourceC_adr <> sourceC.io.bs_adr
bankedStore.io.sourceD_radr <> sourceD.io.bs_radr
bankedStore.io.sourceD_wadr <> sourceD.io.bs_wadr
bankedStore.io.sourceD_wdat := sourceD.io.bs_wdat
sourceC.io.bs_dat := bankedStore.io.sourceC_dat
sourceD.io.bs_rdat := bankedStore.io.sourceD_rdat

// SourceD data hazard interlock
sourceD.io.evict_req := sourceC.io.evict_req
sourceD.io.grant_req := sinkD .io.grant_req
sourceC.io.evict_safe := sourceD.io.evict_safe
sinkD .io.grant_safe := sourceD.io.grant_safe

private def afmt(x: AddressSet) = s"""{"base":${x.base},"mask":${x.mask}}"""
private def addresses = params.inner.manager.managers.flatMap(_.address).map(afmt _).mkString(",")
private def setBits = params.addressMapping.drop(params.offsetBits).take(params.setBits).mkString(",")
private def tagBits = params.addressMapping.drop(params.offsetBits + params.setBits).take(params.tagBits).mkString(",")
private def simple = s""""reset":"${reset.pathName}","tagBits":[${tagBits}],"setBits":[${setBits}],"blockBytes":${params.cache.blockBytes},"ways":${params.cache.ways}"""
def json: String = s"""{"addresses":[${addresses}],${simple},"directory":${directory.json},"subbanks":${bankedStore.json}}"""
}
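Scheduler里那段round-robin仲裁(robin_filter / robin_request / leftOR)可以用下面这段纯Scala来模拟:robinFilter记录上一次赢家之上的MSHR,优先从它们里面挑,挑不到再从头挑。名字和结构是为了对照源码起的,只是个示意:

object RoundRobinDemo extends App {
  val mshrs = 8                         // 示意值
  var robinFilter = 0                   // 为1的位表示"上次赢家之上"的MSHR

  // request:哪些MSHR在申请(假设不为0);返回本次选中的MSHR编号
  def arbitrate(request: Int): Int = {
    val masked = request & robinFilter                     // 先从上次赢家后面的申请者里挑
    val pick   = if (masked != 0) masked else request      // 没有就从头挑(对应Cat两份再取最低位)
    val sel    = Integer.numberOfTrailingZeros(pick)
    robinFilter = ~((1 << (sel + 1)) - 1) & ((1 << mshrs) - 1)  // 对应 ~rightOR(mshr_selectOH)
    sel
  }

  // MSHR1、3、6一直在申请:轮流选中 1 -> 3 -> 6 -> 1 -> 3
  val req = (1 << 1) | (1 << 3) | (1 << 6)
  println(Seq.fill(5)(arbitrate(req)))
}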

configs

memCycles: Int // L2到memory的latency(单位是cycle),在Parameters.scala里计算。它用来估算在这么多cycle内DDR能回多少笔transaction,从而推出至少需要多少个MSHR去接;MSHR个数不低于这个值,MSHR才不会成为瓶颈。

Parameters.scala里117行附近:50ns是外面DDR的延时,800MHz是L2C的频率,所以是40个cycle的latency(如果L2C是1.8GHz,那大概是90个cycle)。
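按这段话粗略换算的话,大致是 memCycles ≈ DDR延时 × L2频率。下面是个简单的算式示意(具体公式以 Parameters.scala 为准):

object MemCyclesDemo extends App {
  def memCycles(latencyNs: Double, freqGHz: Double): Int =
    math.ceil(latencyNs * freqGHz).toInt

  println(memCycles(50, 0.8))  // 40:800MHz的L2,50ns的DDR延时
  println(memCycles(50, 1.8))  // 90:1.8GHz的L2
}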

scenario

scenario 1

MSHR中如果w_grantfirst就不能nestedB。

scenario 2

假设有一个acquire进到MSHR里,在meta data还没读出来的时候(也就是还不知道directory状态的时候),会把blockC拉高,这样scheduler会把C通道的请求都挡在接口上;等blockC拉低之后,再判断C通道的请求能不能nestC:可以就nest,不行就进queue。

如果没有这个blockC,而是让C通道的请求直接进queue,就可能出问题。比如在meta data没准备好的时候来了个releaseData,发现不能nest,就进了queue;等MSHR查到directory之后,releaseData已经在queue里排队,不会再来nest了。于是MSHR继续处理acquire,向上发probe,上面回NtoN,相当于忽略了releaseData,最新的数据留在queue里没被拿到,从而出错。

也就是说一个请求的过程中,会出现三种情况:

  • block:把外面的请求挡在接口上,不能进queue,也不能进MSHR,这种最为严格,因为此时状态还不确定,没法做判断,挡住最保险
  • nest:可以被插队,当前状态已经比较明确了,并且可以被别人插队
  • 非block非nest:不能被插队,当前状态也比较明确了,但是当前请求比外面的请求优先级高,就先做当前请求,让外面的请求进queue排队处理。