PyTorch 2.9 adds easier install on AMD ROCm & Intel XPUs, Arm improvements, and more
PyTorch 2.9 is now available as the newest release of the open-source machine learning library. This update brings a stable libtorch application binary interface (ABI), improving compatibility for third-party C++ and CUDA extensions. At the same time, developers gain access to Symmetric Memory, which makes it easier to program multi-GPU kernels that operate over NVLink and remote direct memory access (RDMA) networks.
Alongside these headline upgrades, PyTorch 2.9 broadens its wheel support to include AMD ROCm, Intel XPU, and NVIDIA CUDA 13, paving the way for wider adoption on current hardware. Developers can also now toggle between erroring and resuming on graph breaks in torch.compile, gaining finer control over error handling and execution flow.
Following the expanded platform support, FlexAttention forward and backward passes are now fully supported on Intel GPUs, keeping performance and behavior consistent across GPU vendors. In parallel, flash decoding optimization with FlexAttention extends to the x86 CPU Inductor backend, improving parallelism for long key-value sequence processing.
Additionally, Arm processor users benefit from performance improvements and expanded test coverage in this release.
