Block Diagrams

ASCII architecture diagrams, data-flow, design trade-offs, borrowings


1. Megatron-LM
2. ZeRO
3. Switch Transformers
4. PipeDream
5. nnScaler
6. P3
7. GPipe
8. BitNet LLM (Microsoft)
9. AutoCCL
10. Efficient Schedule Construction for Distributed Execution of Large DNN Models
11. A3C: Asynchronous Methods for Deep Reinforcement Learning
12. Demystifying NCCL
13. EMLIO
14. GPU Perf modeling LLM
15. Immediate Comm Dist tasks GPU
16. MSCCL++
17. GPU Initiated net NCCL
18. Pensieve (SIGCOMM 17)
19. CollCommConfigSurvey
20. NCCLX
21. CollCommPerfEval4DDLTraining
22. HiCCL
23. gZCCL
24. R2CCL
25. BigSendOff resilientAndPerformant
26. Survey CommEfficientDDL
27. Survey CommEfficientLargeScaleDDL
28. Survey: Communication Optimization Algorithms for Distributed Deep Learning Systems: A Survey
29. Survey: Communication Optimization for Distributed Training: Architecture, Advances, and Opportunities
30. Survey DistributedMachineLearning
31. 0030 Survey QuantitativeSurveyCommunicationOptimizationsInDDL
32. SCCL
33. TACCL
34. MSCCLang
35. GC3
36. TE-CCL
37. SyCCL
38. Chakra
39. Crux
40. BytePS
41. Horovod
42. Poseidon
43. SparCML
44. 1-bit Adam
45. near optimal sparse allReduce
46. efficient largescale modelTraining
47. MegaScale
48. C4 enhancing LScale AI training
49. HPCA collectives
50. Unified Collective Communication (UCC): An Unified Library for CPU, GPU, and DPU Collectives
51. MVAPICH2-GDR
52. GPU2GPU communication