Parallel Computing Cluster for Solving Computational Problems in Data

Paperback, 2025
English
This book tackles a critical bottleneck in large-scale AI: the slow, communication-heavy training of massive Deep Neural Networks (DNNs) on multi-GPU systems. It addresses the trade-off between the two main parallelization methods: data parallelism suffers severe communication overhead for large models, while pipelined model parallelism (as in PipeDream) offers up to 8.91x speedup for large Fully Connected/Recurrent Neural Networks but causes "weight staleness," which degrades model accuracy. To resolve this, the book introduces SpecTrain, a novel technique that uses the momentum accumulated by the optimizer to predict future weight updates, allowing pipelined computation to proceed with accurate, non-stale weights. This preserves the high GPU utilization and speed of pipelining while maintaining the training robustness and final accuracy of synchronous methods.
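The weight-prediction idea described above can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: it assumes SGD with momentum and a known pipeline staleness of `s` steps, and extrapolates the future weights along the momentum (velocity) direction as `w_hat = w - lr * s * v`. All function and variable names here are illustrative.

```python
def predict_weights(weights, velocity, lr, staleness):
    """Predict weights `staleness` steps ahead by extrapolating
    along the optimizer's momentum direction: w_hat = w - lr*s*v."""
    return [w - lr * staleness * v for w, v in zip(weights, velocity)]

def momentum_step(weights, velocity, grads, lr, mu):
    """Standard SGD-with-momentum update: v <- mu*v + g, w <- w - lr*v."""
    velocity = [mu * v + g for v, g in zip(velocity, grads)]
    weights = [w - lr * v for w, v in zip(weights, velocity)]
    return weights, velocity

# A pipeline stage whose gradients would otherwise be applied 3 steps late
# computes against the predicted (non-stale) weights instead:
w, v = [1.0, -2.0], [0.1, 0.05]
w_pred = predict_weights(w, v, lr=0.01, staleness=3)
```

The key design point is that prediction costs only one extra vector operation per stage, so it avoids both the idle bubbles of fully synchronous pipelining and the accuracy loss of training on stale weights.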
ISBN
9786209340734
Language
English
Weight
86 grams
Publication date
12.12.2025
Number of pages
56