
PyTorch 2.12 brings up to 100x faster CUDA linalg.eigh and a unified graph API
PyTorch 2.12 delivers up to 100x faster batched eigendecomposition on CUDA by overhauling the linalg.eigh backend.
The release introduces the torch.accelerator.Graph API, enabling unified graph capture and replay across CUDA, XPU, and custom backends.
torch.export.save now supports Microscaling quantization formats, allowing export of more aggressively compressed models.
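
For readers unfamiliar with the batched eigendecomposition mentioned above, here is a minimal sketch of how torch.linalg.eigh is typically called on a stack of symmetric matrices. The batch size, matrix size, and variable names are illustrative assumptions, not figures from the release, and the snippet makes no claim about the speedup itself. It falls back to the CPU when no CUDA device is available.

```python
import torch

# Use CUDA when present; the sketch also runs on CPU.
device = "cuda" if torch.cuda.is_available() else "cpu"

torch.manual_seed(0)
batch, n = 64, 16  # illustrative sizes, not taken from the release notes

# Build a batch of symmetric matrices: A = (M + M^T) / 2.
m = torch.randn(batch, n, n, dtype=torch.float64, device=device)
a = (m + m.transpose(-2, -1)) / 2

# One call decomposes the whole batch; eigenvalues are returned in ascending order.
eigenvalues, eigenvectors = torch.linalg.eigh(a)

# Check the reconstruction A = V diag(w) V^T.
reconstructed = (
    eigenvectors @ torch.diag_embed(eigenvalues) @ eigenvectors.transpose(-2, -1)
)
max_error = (reconstructed - a).abs().max().item()
print(eigenvalues.shape, eigenvectors.shape, max_error < 1e-8)
```

Timing the torch.linalg.eigh call on a CUDA device before and after upgrading (with torch.cuda.synchronize() around the timed region) would be one way to check the reported speedup on your own workload.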
