Hopper: Predictive Load Balancing for RDMA Traffic — Detailed Summary

Erfan Nosrati & Majid Ghaderi, University of Calgary | arXiv:2506.08132 | June 2025

Per-paragraph summary (2 points each), organized by section and subsection headings.


Abstract

Paragraph 1:

Paragraph 2:


1. Introduction

Paragraph 1:

Paragraph 2:

Paragraph 3:

Paragraph 4:

Paragraph 5:

Paragraph 6:

Paragraph 7 (Contributions):


2. Background and Motivation

Paragraph 1:

Paragraph 2 (RDMA):

Paragraph 3 (RoCEv2):

Paragraph 4 (Problem with OOO Packets):

Paragraph 5 (Problem with Random Packet Spraying):

Paragraph 6 (Summary):


3. Design

Paragraph 1 (Design Overview):


3.1 Congestion Detection Module

Paragraph 1 (Congestion Signal):

Paragraph 2 (Detection Mechanism):


3.2 Path Probing Module

Paragraph 1 (Probe Mechanism):

Paragraph 2 (Probe Initiation):


3.3 Path Switching Module

Paragraph 1 (Path Selection):

Paragraph 2 (Path Switching Mechanics):

Paragraph 3 (Reducing Out-of-Order Packets):


4. Evaluation

Paragraph 1:


4.1 Simulation Experiments

4.1.1 Setup and Methodology

Paragraph 1 (Network Topology):

Paragraph 2 (Transport):

Paragraph 3 (Workloads):

Paragraph 4 (Baselines & Metric):


4.1.2 Results and Discussion

Paragraph 1 (Datacenter Workloads — Hadoop):

Paragraph 2 (ML Workload):


4.2 Testbed Experiments

Paragraph 1:


4.2.1 Setup and Methodology

Paragraph 1 (Implementation):

Paragraph 2 (Network Topology):

Paragraph 3 (Transport & Workload):

Paragraph 4 (Baselines & Metrics):


4.2.2 Results and Discussion

Paragraph 1 (Avoiding Congested Paths):

Paragraph 2 (FCT Slowdown):

Paragraph 3 (Training Time):


5. Related Work

Paragraph 1 (Flowlets in RDMA):

Paragraph 2 (Multipath RDMA):

Paragraph 3 (Path Probing):

Paragraph 4 (RTT Measurement):


6. Conclusion

Paragraph 1:

Paragraph 2:


Appendix A: Hopper's Workflow

Paragraph 1:


Appendix B: AliCloud Storage Workload

Paragraph 1:


Appendix C: Meta Hadoop Workload (Full Distribution)

Paragraph 1: