PyTorch's PowerSGD

I increased the performance of PowerSGD by ~5x.