PyTorch 2.7 brings NVIDIA Blackwell support, Mega Cache, FlexAttention updates, and more
PyTorch has released version 2.7, bringing a host of new features and improvements to the machine learning library. This update introduces support for NVIDIA's Blackwell GPU architecture and provides pre-built wheels for CUDA 12.8 on both Linux x86 and arm64. A notable addition is torch.compile support for Torch Function Modes, which lets users override any torch.* operation to implement custom behavior.
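Torch Function Modes themselves predate this release; what 2.7 adds is the ability for torch.compile to trace through them. A minimal sketch of how such a mode intercepts a torch.* call (the class name and the doubling behavior are illustrative, not from the release notes):

```python
import torch
from torch.overrides import TorchFunctionMode

class DoubleAdd(TorchFunctionMode):
    """Illustrative mode: doubles the result of every torch.add call."""
    def __torch_function__(self, func, types, args=(), kwargs=None):
        kwargs = kwargs or {}
        out = func(*args, **kwargs)  # the mode is inactive inside the handler
        if func is torch.add:
            out = out * 2
        return out

def f(x, y):
    return torch.add(x, y)

with DoubleAdd():
    # each element: (1 + 1) * 2 = 4; as of 2.7 this also works when
    # f is wrapped with torch.compile
    result = f(torch.ones(2), torch.ones(2))
```

The same context manager works unchanged around compiled code, which is what makes the feature useful for globally customizing operator behavior.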
The release also debuts the Mega Cache feature, which offers end-to-end portable caching for compiled models. FlexAttention gains several enhancements, including LLM first-token processing, an LLM throughput mode optimization, and inference support. Intel GPU architectures receive performance optimizations that improve efficiency and speed.
Additional updates include the PyTorch Context Parallel API, support for foreach_map, and prologue fusion in Inductor, alongside various other enhancements and bug fixes. Together, these changes aim to improve PyTorch's functionality and performance for developers working in fields such as computer vision and natural language processing.
