Brief Summaries
One-page overview: problem, method, results, relevance
1. Megatron-LM
2. Hopper
3. ZeRO
4. Switch Transformers
5. PipeDream
6. nnScaler
7. P3
8. GPipe
9. BitNet LLM (Microsoft)
10. AutoCCL
11. Efficient Schedule Construction for Distributed Execution of Large DNN Models
12. A3C: Asynchronous Methods for Deep Reinforcement Learning
13. Demystifying NCCL
14. EMLIO
15. GPU Perf Modeling LLM
16. Immediate Comm. Dist. Tasks GPU
17. MSCCL++
18. GPU-Initiated Net NCCL
19. Pensieve (SIGCOMM '17)