1RMA: Re-envisioning Remote Memory Access for Multi-tenant Datacenters
    
1 Background & Issues
  - Traditional RDMA is a poor fit for multi-tenant datacenters
    
      - Too many connections
        
          - InfiniBand-style RDMA makes the NIC cache the state of every host-to-remote connection
 
          - Connections are per application pair, so heavy traffic consumes too much NIC cache
 
        
       
      - Induced Ordering
        
          - Arises when multiple requests share the same connection
 
          - RDMA processes a connection in FIFO order, so head-of-line blocking can occur and starve high-priority requests
 
        
       
      - Security Issue
        
          - Encryption is handled entirely by the NIC, so applications cannot rotate keys quickly
 
        
       
      - Hardware Congestion Control
        
          - Switches must use Priority Flow Control (PFC) to give RDMA a lossless environment
            
              - But PFC brings deadlock, poor failure isolation, and head-of-line blocking
 
            
           
          - Hard-wired into the NIC and difficult to tune manually after deployment
 
        
       
      - Error Handling
        
          - When a connection breaks, the client cannot tell whether the server-side RDMA op actually modified remote memory
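
The induced-ordering issue above can be illustrated with a toy calculation. This is only a sketch: the op durations and both function names are invented for illustration, and the "independent" case is idealized (it ignores any shared bottleneck).

```python
def fifo_completion_times(durations):
    """Ops on one connection finish strictly in order: each waits for all earlier ops."""
    finished, clock = [], 0
    for d in durations:
        clock += d
        finished.append(clock)
    return finished


def independent_completion_times(durations):
    """Idealized independent ops: each finishes after its own duration only."""
    return list(durations)


# A large low-priority op (100 time units) is queued ahead of a small urgent op (1 unit).
durations = [100, 1]
print(fifo_completion_times(durations))         # [100, 101]
print(independent_completion_times(durations))  # [100, 1]
```

Under FIFO the urgent op finishes at time 101 instead of 1, which is the head-of-line blocking described in the notes.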
 
        
       
    
   
2 Introduction
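  - 1RMA redraws the division of responsibility between the NIC and software: part of the work the NIC used to do entirely on its own is moved into software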
    
      
 
      - Hardware focuses on RMA read/write and encryption
 
      - Software handles CC, op pacing, and timeout policy
 
    
   
  - Design goals:
    
      - No Connections
        
          - No connection state to cache
 
          - Every op is treated as independent => per-op retry/fail-recovery
 
        
       
      - Small-sized ops, solicitation based
        
          - Hardware solicitation window to prevent TCP incast
 
        
       
      - Software Congestion Control
        
      
 
      - Software-defined resource allocation
        
          - Unlike traditional RDMA, which has to cache a large amount of state on the NIC in order to support a lossless network
 
          - Priority determines how many resources a request is allocated
 
        
       
      - First-Class Security
        
          - Applications are allowed to perform key rotation themselves
 
          - Each memory region is protected by its own key
 
        
       
    
   
3 1RMA Overview

  - Step1~2 Get $K_d$, $RegionID$
 
  - Step3   A request can be issued to the NIC only when the solicitation window has room
    
      - SW: chunking, SW->NIC issue rate, congestion control (slow)
 
      - HW: admission control via the solicitation window (fast)
 
    
   
  - Step4   $K_d$ sign request
 
  - Step6   $K_d$ encrypt response
 
4 1RMA Design In Depth
4-1 Security
  - Derived Key
    
      
 
      - $K_d$: session key, per-process, used to sign (and encrypt)
        
          - A client may issue RDMA requests only after it has obtained $K_d$
 
        
       
      - $K_r$: region key, per-memory-block, stored on the remote NIC
 
      - Defends against the following attacks
        
          - Replay attack: the server adds a salt when generating $K_d$
 
          - Ciphertext injection: nothing can be decrypted without $K_d$
 
          - Accessing another tenant's remote memory: an attacker may guess the RegionID but cannot guess $K_d$
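
The key relationship above can be sketched as follows. This is a minimal illustration, not the actual 1RMA scheme: it assumes HMAC-SHA256 as a stand-in for both the key derivation and the request signature, and it assumes the remote side can re-derive $K_d$ from the stored $K_r$ and the salt. All function names and the request string are hypothetical.

```python
import hashlib
import hmac
import os


def derive_session_key(region_key: bytes, salt: bytes) -> bytes:
    """Derive K_d from K_r and a salt (HMAC-SHA256 is an assumed stand-in KDF)."""
    return hmac.new(region_key, salt, hashlib.sha256).digest()


def sign_request(session_key: bytes, request: bytes) -> bytes:
    """Client side: sign the request with K_d."""
    return hmac.new(session_key, request, hashlib.sha256).digest()


def verify_request(region_key: bytes, salt: bytes, request: bytes, tag: bytes) -> bool:
    """Remote side: re-derive K_d from the stored K_r, then check the signature."""
    expected = sign_request(derive_session_key(region_key, salt), request)
    return hmac.compare_digest(expected, tag)


k_r = os.urandom(32)                 # region key, kept on the remote side
salt = os.urandom(16)                # fresh salt for this session
k_d = derive_session_key(k_r, salt)  # handed to the legitimate client

request = b"read region=7 offset=0 length=4096"  # RegionID is guessable
tag = sign_request(k_d, request)
forged = sign_request(os.urandom(32), request)   # attacker guesses a key

print(verify_request(k_r, salt, request, tag))
print(verify_request(k_r, salt, request, forged))
```

The first check passes and the second fails: knowing the RegionID is not enough to produce a valid signature without $K_d$.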
        
       
    
   
4-2 Hardware

  - RRT: static table storing each RegionID, its $K_r$, and the memory range they map to
  - CST: single in-flight operation
 
  - Solicitation Window: admission control; limits how many packets in the FIFO may enter the network
 
  - Number of memory regions for RMA based on tasks, not task-pairs
    
      - manageable in finite resources
 
    
   
  - Timeout: an op that waits too long without entering the window simply times out
    
      - Avoids head-of-line blocking and provides a congestion signal
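
The interplay between the solicitation window and the timeout can be sketched in software. This is a toy model rather than the hardware design: the class name, the byte-based capacity, and the abstract time units are all assumptions made for illustration.

```python
from collections import deque


class SolicitationWindow:
    """Toy admission control: bound the bytes of solicited data in flight."""

    def __init__(self, capacity_bytes, timeout):
        self.capacity_bytes = capacity_bytes
        self.timeout = timeout
        self.in_flight = 0
        self.pending = deque()  # (op_id, size, enqueue_time) in FIFO order

    def submit(self, op_id, size, now):
        self.pending.append((op_id, size, now))

    def admit(self, now):
        """Return (admitted, timed_out) op ids at time `now`."""
        admitted, timed_out = [], []
        while self.pending:
            op_id, size, enqueued = self.pending[0]
            if now - enqueued >= self.timeout:
                self.pending.popleft()   # waited too long: fail the op
                timed_out.append(op_id)
            elif self.in_flight + size <= self.capacity_bytes:
                self.pending.popleft()   # room in the window: admit it
                self.in_flight += size
                admitted.append(op_id)
            else:
                break                    # window full: head keeps waiting
        return admitted, timed_out

    def complete(self, size):
        """A response arrived; release its share of the window."""
        self.in_flight -= size


w = SolicitationWindow(capacity_bytes=8192, timeout=10)
for op_id in ("a", "b", "c"):
    w.submit(op_id, 4096, now=0)
print(w.admit(now=1))   # (['a', 'b'], [])
print(w.admit(now=10))  # ([], ['c'])
```

Op `c` never fits into the window, so it fails after the timeout instead of blocking later ops indefinitely; software can treat that failure as a congestion signal.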
 
    
   
4-3 Software
  
 
  - CommandExecutor: chunking, CC, pacing
 
  - CommandPortal: App. memory <-> NIC register mapping
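
The chunking step attributed to the CommandExecutor can be sketched as below. The function name and the 4096-byte maximum op size are assumptions for illustration; the notes only say that ops are small.

```python
def chunk_transfer(offset, length, max_op_size=4096):
    """Split the byte range [offset, offset + length) into ops of at most max_op_size bytes."""
    ops = []
    end = offset + length
    while offset < end:
        size = min(max_op_size, end - offset)
        ops.append((offset, size))
        offset += size
    return ops


print(chunk_transfer(0, 10000))  # [(0, 4096), (4096, 4096), (8192, 1808)]
```

Each `(offset, size)` pair would then be issued as its own independent op, subject to pacing and congestion control.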
 
5 Other 1RMA Ops
5-1 RMA Write
  - Asks the remote side to perform an RMA read against local memory
 
  - Con: costs one extra RTT
 
  - Pro:
    
      - Reuses the RMA read mechanism, so nothing new has to be designed
 
      - The client times out later than the remote side
        
          - Avoids the case where, after a disconnect, the client cannot tell whether the write to remote memory succeeded
 
        
       
    
   
5-2 Rekey
  - Key rotation is done with an RMA write
 
  - Low cost: install a new region key $K_r$ in 1 RRT
 
  - Problems with the traditional approach
    
      - High transient connection usage: connections using the new key must be set up first
 
      - Bursts of connection failure: authentication fails in bulk while the key is being swapped
 
    
   
6 Congestion Control
  - SW: performs most of the CC computation
    
  
 
  - Delay is used as the congestion signal
 
  - 1RMA is connection-free, so CC only needs to run on the requesting side
    
  
 
  - Congestion Window
    
      - Remote: one per remote NIC per direction
 
      - Local: a single shared window
 
    
   
  - Local CC Algo
    
  
 
  - Op issue rate = $\frac{OpSize \times CWND}{RTT}$
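
The issue-rate formula can be checked with a quick worked example. The numbers (4096-byte ops, a window of 8 ops, a 20 µs RTT) are made up for illustration, $CWND$ is assumed to be counted in ops, and the pacing helper assumes ops are spread evenly across one RTT, which the notes do not state.

```python
def issue_rate(op_size_bytes, cwnd_ops, rtt_seconds):
    """Bytes per second permitted by the window: OpSize * CWND / RTT."""
    return op_size_bytes * cwnd_ops / rtt_seconds


def pacing_interval(cwnd_ops, rtt_seconds):
    """Gap between consecutive ops if CWND ops are spread evenly over one RTT."""
    return rtt_seconds / cwnd_ops


rate = issue_rate(op_size_bytes=4096, cwnd_ops=8, rtt_seconds=20e-6)
gap = pacing_interval(cwnd_ops=8, rtt_seconds=20e-6)
print(f"{rate / 1e9:.2f} GB/s, one op every {gap * 1e6:.1f} us")
```

With these inputs the script prints `1.64 GB/s, one op every 2.5 us`. Halving $CWND$ or doubling the measured $RTT$ halves the rate, which is how a delay-based controller slows the requester down.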