Using Chakra execution traces for benchmarking and network performance optimization

  • Meta presents Chakra execution traces, an open graph-based representation of AI/ML workload execution, laying the foundation for benchmarking and network performance optimization.
  • Chakra execution traces represent key operations, such as compute, memory, and communication, along with data and control dependencies, timing, and resource constraints.
  • In collaboration with MLCommons, we are seeking industry-wide adoption for benchmarking.
  • Meta has open sourced a set of tools to enable the collection, analysis, generation, and adoption of Chakra execution traces by a broad range of simulators, emulators, and replay tools.

At Meta, our efforts are geared not only toward pushing the boundaries of AI/ML but also toward optimizing the vast networks that enable these computations. Our agile, reproducible, and standardized benchmarking system plays an essential role in this. Through our collaboration with MLCommons, and informed by our deep insights into the constraints of traditional benchmarking, we have initiated the Chakra execution traces: a graph-based representation of AI/ML workloads. This approach aims to unify diverse execution trace schemas, seeking industry-wide adoption for enhanced AI efficiency analysis tools and holistic performance benchmarking.

The limitations of traditional AI benchmarking methodology

Traditionally, benchmarking AI systems has largely relied on running full ML workloads. Established benchmarking approaches, such as MLPerf, have provided invaluable insights into the behavior and performance of AI workloads and systems. However, traditional full-workload benchmarking presents several challenges:

  1. Difficulty in forecasting future system performance: When designing an AI system, engineers frequently face the challenge of predicting the performance of future systems. Such predictions become even more complex when the compute engines are not yet ready or when changes in network topology and bandwidth become necessary. Relying on full workloads to evaluate the performance of these not-yet-realized systems is not feasible.
  2. High compute cost: Executing full-workload benchmarks comes at a considerable compute cost. Given that training modern ML models often requires thousands of graphics processing units (GPUs), these benchmarks should ideally be executed on a similarly large number of GPUs. Moreover, gauging the performance of a system using this method can be time-consuming.
  3. Inability to adapt to evolving workloads: The landscape of ML workloads and their requirements is rapidly evolving. Traditional full-workload benchmarks fall short when it comes to addressing these changing needs, primarily because they require significant effort to standardize workloads as benchmarks.

An overview of Chakra

Building upon our insights into the constraints of traditional benchmarking, we present the Chakra execution traces. This new approach offers an open, interoperable graph-based depiction of AI/ML workload execution. The Chakra execution trace captures core operations, including compute, memory, and communication, together with their dependencies, timing, and metadata.
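To make the graph-based idea concrete, here is a minimal sketch of what a trace node might look like. This is not the actual Chakra schema; the class and field names are invented for illustration only.

```python
from dataclasses import dataclass, field

@dataclass
class TraceNode:
    """One operation in a hypothetical execution-trace graph (illustrative only)."""
    node_id: int
    name: str
    kind: str                                  # e.g., "compute", "memory", or "communication"
    duration_us: float                         # measured runtime of the operation
    deps: list = field(default_factory=list)   # node_ids this node depends on

# A toy three-node trace: a matmul and a host-to-device copy feeding an all-reduce.
trace = [
    TraceNode(0, "matmul", "compute", 120.0),
    TraceNode(1, "memcpy_h2d", "memory", 15.0),
    TraceNode(2, "all_reduce", "communication", 300.0, deps=[0, 1]),
]

# With nodes in this form, simple analyses fall out directly, e.g. total
# time spent in communication operations.
comm_us = sum(n.duration_us for n in trace if n.kind == "communication")
print(comm_us)
```

Because dependencies are explicit edges rather than implicit ordering, tools can reason about which operations could overlap and which must serialize.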

Although execution traces are a useful illustration of an ML job, the construction and metadata of the ensuing traces can differ primarily based on the ML framework utilized. Recognizing this, Chakra introduces a standardized schema for efficiency modeling, termed the Chakra execution hint. The under determine outlines the Chakra ecosystem, with execution traces as its central element. As depicted within the determine, Chakra additionally provides a spread of instruments to transform, visualize, generate, and simulate these execution traces.

How Meta leverages Chakra execution traces

At Meta, we collect execution traces from our production servers every day. These execution traces serve multiple purposes: benchmarking, visualization, and performance optimization.

Benchmarking

Benchmarking is essential for enhancing current AI systems and planning future networks. We specifically utilize Chakra execution traces for this task. We have developed several benchmarking tools, including Mystique and PARAM. Mystique allows us to replicate the performance of an ML workload by replaying both compute and communication operators found in execution traces. It leverages the Chakra execution trace to record runtime details of a model at the operator level and then replays them to reproduce the original performance. In line with our vision, the MLCommons Chakra working group is curating the 'Chakra trace benchmark suite' by gathering execution traces from various industry players.
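At its core, replaying a trace means issuing operators in an order that respects the recorded dependencies. The sketch below shows that scheduling step with Kahn's topological sort; it is a simplified stand-in, not how Mystique is actually implemented.

```python
from collections import deque

def replay_order(nodes):
    """Return node ids in a dependency-respecting order (Kahn's algorithm).

    `nodes` maps node_id -> list of prerequisite node_ids. Illustrative of
    how a trace replayer might schedule operators, ignoring real concerns
    like streams, overlap, and timing fidelity.
    """
    indeg = {n: len(deps) for n, deps in nodes.items()}
    children = {n: [] for n in nodes}
    for n, deps in nodes.items():
        for d in deps:
            children[d].append(n)
    ready = deque(sorted(n for n, d in indeg.items() if d == 0))
    order = []
    while ready:
        n = ready.popleft()
        order.append(n)            # in a real replayer: launch this operator
        for c in children[n]:
            indeg[c] -= 1
            if indeg[c] == 0:
                ready.append(c)
    return order

# matmul (0) and memcpy (1) have no prerequisites; all_reduce (2) waits on both.
print(replay_order({0: [], 1: [], 2: [0, 1]}))
```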

Visualization and performance optimization

One example of visualization and performance optimization is the analysis of collective message sizes. We analyze production execution traces using an automated system. The visual data generated helps us identify any balance or imbalance in collective message sizes across different ranks. Our visualization tool can precisely highlight these imbalances, as shown in the figure below.

With this information at hand, Meta engineers are equipped to craft appropriate solutions, ensuring a balanced message size, as demonstrated in the figure below.
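A minimal version of such an imbalance check might reduce the per-rank message bytes of a collective to a single ratio. This metric and threshold are illustrative assumptions, not the actual analysis Meta runs.

```python
def imbalance_ratio(bytes_per_rank):
    """Max/mean ratio of per-rank collective message bytes.

    1.0 means perfectly balanced; larger values mean some rank sends
    disproportionately more data. A toy metric for illustration.
    """
    mean = sum(bytes_per_rank) / len(bytes_per_rank)
    return max(bytes_per_rank) / mean

balanced = [1024, 1024, 1024, 1024]
skewed = [4096, 512, 512, 512]      # rank 0 carries most of the payload

print(imbalance_ratio(balanced))            # 1.0
print(round(imbalance_ratio(skewed), 2))    # 2.91
```

Flagging collectives whose ratio exceeds some threshold is one simple way an automated system could surface candidates for rebalancing.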

Future plans

Enhancing the benchmarking capability of Chakra execution traces

While the execution trace replayer enables replay of execution traces, it brings certain challenges. A primary issue is the intrinsic linkage of collected execution traces to specific systems. Because traces are gathered from actual machine runs, the kernels executed are optimized for the specific system at hand. Consequently, traces sourced from one system may not accurately simulate on another with a different GPU, network topology, and bandwidth.

We are addressing this constraint in collaboration with the MLCommons Chakra working group. We aim to gather execution traces prior to the operator optimization phase for any target system, as shown in the figure. These are termed pre-execution traces. In parallel, to enable benchmarking of next-generation AI systems, we are streamlining the process from trace collection to simulation on a simulator.
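In spirit, a simulator consumes a trace plus a description of a hypothetical target system and projects runtime. The naive serialized model below exists only to illustrate that flow; real simulators model queuing, overlap, and topology far more faithfully, and the speedup parameters are invented.

```python
def project_runtime(trace, compute_speedup, network_speedup):
    """Naively project total serialized runtime (in microseconds) on a
    hypothetical target system by scaling per-kind durations.

    `trace` is a list of (kind, duration_us) pairs. A toy stand-in for
    what a trace-driven simulator does with much greater fidelity.
    """
    total = 0.0
    for kind, duration_us in trace:
        scale = {"compute": 1 / compute_speedup,
                 "communication": 1 / network_speedup}.get(kind, 1.0)
        total += duration_us * scale
    return total

trace = [("compute", 120.0), ("memory", 15.0), ("communication", 300.0)]

# What-if: a target with 2x faster compute and 3x more network bandwidth.
print(project_runtime(trace, compute_speedup=2.0, network_speedup=3.0))
```

Even this crude model shows the appeal: once traces are decoupled from the machine they were collected on, "what would this workload do on next-gen hardware" becomes a question you can ask without building the hardware.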

Using AI to generate representative execution traces

Chakra execution traces are capable of identifying network bottlenecks in ML workload execution. However, optimizing SW/HW stacks with production execution traces presents a practical challenge. The main difficulty arises when attempting to globally optimize our production systems. Given the sheer volume of production traces, exhaustively running them for system optimization is neither feasible nor efficient. Doing so would be both time-consuming and computationally expensive. Thus, selecting a representative subset of production execution traces becomes essential.

However, there is a risk: The selected traces may not holistically represent the global characteristics, potentially skewing optimization efforts toward only specific ML workloads. We envision a generative AI model that can identify and generate execution traces that are representative of the primary characteristics observed. We also plan to incorporate an obfuscation mechanism within the AI model. This will facilitate trace sharing without jeopardizing intellectual property, fostering SW/HW co-design between different companies.
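To see why naive subset selection can skew coverage, consider summarizing each trace by a small feature vector (say, its compute vs. communication time fractions) and picking a diverse subset. The greedy farthest-point heuristic below is purely illustrative; the approach envisioned above is a generative model, not this.

```python
def pick_representatives(features, k):
    """Greedily pick k mutually distant traces (farthest-point selection).

    `features` is a list of numeric feature vectors, one per trace, e.g.
    [compute_fraction, communication_fraction]. Illustrative only.
    """
    def dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5

    chosen = [0]                      # seed with the first trace
    while len(chosen) < k:
        # Pick the trace farthest from everything already chosen, so the
        # subset spans the feature space instead of clumping together.
        best = max(
            (i for i in range(len(features)) if i not in chosen),
            key=lambda i: min(dist(features[i], features[c]) for c in chosen),
        )
        chosen.append(best)
    return sorted(chosen)

# Two compute-heavy traces, one communication-heavy, one mixed.
traces = [[0.9, 0.1], [0.88, 0.12], [0.2, 0.8], [0.5, 0.5]]
print(pick_representatives(traces, 2))
```

Note that any such heuristic only covers characteristics present in the collected traces, which is exactly the gap a generative model that also projects future workloads is meant to close.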

Taking the leap with industry collaboration

For such an ecosystem to flourish, industry consensus is paramount. Our collaboration with the MLCommons consortium, an open engineering assembly of over 50 leading companies, is a testament to our commitment. This collaboration aims to establish Chakra within its fold, providing a framework for broad adoption.

Chakra's working group under MLCommons will spearhead efforts to create and develop:

  • A standardized schema that can capture and convert execution traces from diverse frameworks.
  • ML models for creating representative Chakra execution traces, protecting proprietary information while also projecting future AI workloads.
  • An open ecosystem of tools for benchmarks, simulations, and emulations.
  • Comprehensive benchmarks with Chakra execution traces based on MLCommons/MLPerf guidelines.

Join us on this journey

Our vision is to forge an agile, reproducible benchmarking and co-design system for AI. Collaboration with peers, academic institutions, and consortiums will be pivotal. We invite individuals and companies to become part of the Chakra working group, to help contribute to the paradigm shift in benchmarking and network performance optimization.

Read the research paper

Chakra: Advancing Performance Benchmarking and Co-design using Standardized Execution Traces

Acknowledgements

We would like to thank all contributors to the Chakra project within Meta: Taekyung Heo, Srinivas Sridharan, Brian Coutinho, Hiwot Kassa, Matt Bergeron, Parth Malani, Shashi Gandham, and Omar Baldonado; our external partners at Georgia Tech and MLCommons; as well as external collaborators at AMD, CMU, Cornell, Enfabrica, Google, Harvard, HP Labs, Intel, Keysight Technologies, Microsoft, NVIDIA, OCP, and Stanford.