*OpenXLA Community Meeting - March 2023: Abstract*

The March 2023 OpenXLA community meeting showcased a range of developments and ongoing discussions within the project. Key highlights included:

* *PJRT plugin advancements:* Enhanced functionality and broader hardware support, including Apple Silicon integration for JAX.
* *StableHLO Quantizer:* A new project addressing quantization scalability challenges and promoting "write once, run everywhere" quantization.
* *Shardy partitioner:* A new partitioner merging the strengths of GSPMD and PartIR, offering improved user control, debuggability, and advanced features.
* *Batch dimensions for Gather and Scatter:* A proposal to improve expressiveness and enable efficient sharding of batched operations.
* *Composite op:* A new op enabling experimentation with novel ML abstractions while preserving backend compatibility.
* *Active RFCs:* Numerous RFCs covering precision configuration for dot ops, new MHLO features, hybrid quantization, and StableHLO v1.0 compatibility.

The meeting demonstrated a vibrant and growing OpenXLA community, with active contributions and collaborative efforts driving innovation in the ML compiler ecosystem.

*OpenXLA Community Meeting - March 2023*

*Introductions*
* *0:05:* Elliot, the new Technical Lead (TL) for OpenXLA at Google, introduces himself and outlines his focus on technical direction, roadmap organization, and community process improvement.

*Agenda*
* *1:27:* A brief reminder that OpenXLA is an open-source, state-of-the-art ML compiler ecosystem built in collaboration with various partners.

*PJRT Blog Post*
* *1:47:* Aman discusses the recently published blog post about the PJRT plugin, covering its functionality, creation process, and discovery by frameworks.
* *2:21:* Updates to the PJRT API are highlighted, including versioning, API compatibility, and multi-node and DLPack support.
* *2:42:* Apple's adoption of PJRT for Apple Silicon support in JAX is showcased, detailing the generation of StableHLO, lowering to MPS graphs, and integration into a Metal plugin.
* *3:07:* The blog post emphasizes the broad range of hardware targets using PJRT, including Intel Max GPUs, Google Cloud TPUs, NVIDIA GPUs, and Apple Silicon.

*Technical Updates*

*StableHLO Quantizer*
* *4:48:* J from Google's Model Optimization project introduces the StableHLO Quantizer project and seeks community feedback, interest, and collaboration opportunities.
* *5:41:* The project aims to address the scalability problems of quantization solutions that are tied to specific hardware or ML frameworks.
* *7:53:* By operating on StableHLO graphs with a uniform quantization representation, the project promotes "write once, run everywhere" quantization.
* *8:21:* The current implementation is integrated into the Cloud TPU inference converter and TFLite, enabling quantization across mobile and server environments.
* *10:28:* Open-sourcing the project is planned to strengthen the overall quantization ecosystem.

*Shardy Partitioner*
* *15:50:* Tom and Dom from DeepMind present Shardy, a new partitioner combining the best features of GSPMD and PartIR (an MLIR-based partitioner).
* *16:23:* Key features include priorities for controlling sharding propagation, intermediate sharding annotations for fine-grained control, and improved user control and debuggability.
* *17:21:* Developed in MLIR and planned for open-sourcing within OpenXLA, Shardy aims to be dialect-agnostic to ease integration with other compiler infrastructure.
* *21:00:* The presentation details the sharding API, including mesh representation, sharding annotations, axis splitting, and priorities.
* *25:22:* Additional features such as shard-as/shard-like and manual computation are explained.
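To make the mesh and sharding-annotation concepts above concrete: partitioning an array over a logical device mesh just means slicing each mapped array dimension along one mesh axis and replicating the unmapped dimensions. Here is a toy numpy sketch of that idea; the `shard` helper, its signature, and the axis names are invented for illustration and are not the partitioner's real API.

```python
import numpy as np

def shard(x, mesh_shape, dim_to_axis):
    """Split array `x` across a logical device mesh (toy illustration).

    mesh_shape:  size of each mesh axis, e.g. {"data": 2, "model": 3}.
    dim_to_axis: maps array dimensions to mesh axis names; unmapped
                 dimensions are replicated on every device.
    Returns a dict from mesh coordinates to the shard held there.
    """
    shards = {(): x}  # start with one (unsharded) piece, keyed by mesh coords
    for dim, axis in dim_to_axis.items():
        n = mesh_shape[axis]
        shards = {
            coords + (i,): piece
            for coords, whole in shards.items()
            for i, piece in enumerate(np.split(whole, n, axis=dim))
        }
    return shards

# A [4, 6] array on a 2x3 mesh: dim 0 sharded over "data", dim 1 over "model".
x = np.arange(24).reshape(4, 6)
shards = shard(x, {"data": 2, "model": 3}, {0: "data", 1: "model"})
assert len(shards) == 6                       # one shard per mesh coordinate
assert shards[(0, 0)].shape == (2, 2)         # each device holds a [2, 2] tile
```

A real partitioner propagates such annotations through the whole program and inserts the necessary collectives; this sketch only shows the data layout that a mesh annotation implies.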
*Batch Dimensions for Gather and Scatter*
* *32:17:* Tom proposes adding batch dimensions to the Gather and Scatter operations in StableHLO, aiming for simpler batching, easier partitioning, and improved expressiveness.
* *33:26:* A JAX example demonstrates the current limitations of Gather in handling batch dimensions, highlighting the need to represent them explicitly.
* *35:50:* The proposal draws inspiration from DotGeneral in XLA, which preserves batch-dimension information during vectorization.
* *36:29:* The solution introduces batching-dimension attributes on Gather and Scatter, similar to DotGeneral, enabling efficient sharding propagation.

*Composite Op*
* *45:20:* Michael from the StableHLO team introduces the new Composite op, designed to support experimentation with novel ML abstractions.
* *45:48:* A Composite op pairs a high-level operation with a decomposition into simpler ops, ensuring backend compatibility and enabling future inclusion in StableHLO if widely adopted.
* *46:34:* An example demonstrates the structure and usage of the Composite op, including its name, operands, and a reference to its decomposition function.
* *47:39:* The StableHLO LegalizeCompositeToCall pass lets backends choose between handling a composite directly and expanding it into its decomposition.

*Active RFCs*
* *52:09:* Elliot provides an overview of active RFCs, indicating significant community engagement and progress.
* *52:36:* RFCs under discussion include improved precision configuration for dot ops, new MHLO features (a Tan op, CustomCall with dictionary attributes, and variadic collectives), hybrid quantization, and ODML compatibility for StableHLO v1.0.

*DevLab and Closing*
* *54:38:* Reminder about the upcoming OpenXLA DevLab on April 25th, with the agenda to be finalized and shared soon.
* *55:11:* Slides, recording, and notes from the meeting will be shared by the end of the week.
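Returning to the Gather/Scatter batching proposal (32:17): a batch dimension is one shared between the operand and the indices, where each example gathers only from its own slice. The numpy sketch below shows those semantics (it is an illustration of the concept, not StableHLO attribute syntax).

```python
import numpy as np

# Batched gather: for each example b in the batch, pick rows of
# params[b] given by indices[b]. The leading dimension is a batch
# dimension shared by operand and indices -- the information the
# proposal would record explicitly in Gather's dimension numbers,
# the same way DotGeneral records its batching dimensions.
params = np.arange(2 * 4 * 3).reshape(2, 4, 3)   # [batch=2, rows=4, features=3]
indices = np.array([[0, 2], [1, 3]])             # [batch=2, picks=2]

# Explicit loop over the batch dimension:
looped = np.stack([params[b][indices[b]] for b in range(2)])

# The same result without the loop, treating axis 0 as a batch dimension:
vectorized = np.take_along_axis(params, indices[:, :, None], axis=1)

assert looped.shape == (2, 2, 3)
assert np.array_equal(looped, vectorized)
```

Because each example only touches its own slice of `params`, a partitioner that knows about the batch dimension can shard operand and indices along it with no cross-device communication, which is the sharding-propagation benefit the proposal targets.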
I summarized the transcript using Gemini 1.5 Pro (token count: 14,088 / 1,048,576).
@jony7779 (4 months ago)
She says vmap and jvp are already supported function transformations for Pallas kernels. Do I understand correctly, then, that reverse-mode autodiff (i.e. vjp) is not supported right now? I.e., if you write some DL primitive in Pallas, you have to write its grad kernel too?
*Comments*
What does "PJRT" stand for?
Any slides I can find?
Nice work!
Technical Updates starts at kzread.info/dash/bejne/hGaExbSYic-7nrg.html
what's the relation between XLA and Triton?