HyperDex Toolchain
The HyperDex Toolchain is a comprehensive software stack that simplifies the deployment and execution of AI workloads on LPU (Language Processing Unit) hardware while maintaining optimal performance. It includes device drivers, a runtime environment, and compiler tools that automate the steps required to initialize, compile, and execute models on the LPU.
This toolchain is delivered as a single, integrated package comprising two core components: hyperdex.tools and hyperdex.transformers, which serve as the compiler and runtime packages, respectively. Together, they enable developers to efficiently deploy LLMs on LPU hardware with minimal modifications to existing workflows.
hyperdex.tools — Compiler
hyperdex.tools is a Python-based compiler interface designed specifically for HuggingFace Transformers models. It provides a high-level entry point for adapting pre-trained models to the LPU hardware architecture.
The core of this component consists of two primary classes: AutoCompiler and AutoModelConverter.
- AutoCompiler orchestrates the entire compilation pipeline by verifying model compatibility with LPU hardware, downloading the target model, and preparing it for execution using LPU-optimized instructions.
- AutoModelConverter performs comprehensive validation of downloaded checkpoints, analyzes both the model architecture and tokenizer configuration, and converts them into an LPU-compatible format.
By automating these processes, hyperdex.tools enables developers to migrate existing HuggingFace models to LPU hardware with minimal manual intervention.
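The sketch below illustrates what a compilation step might look like in practice. The AutoCompiler and AutoModelConverter class names come from this section, but the constructors, method names (compile, convert), and arguments shown here are illustrative assumptions rather than the documented API; consult the HyperDex Toolchain reference for the exact signatures.

```python
# Hypothetical compilation sketch -- class names are from this section,
# but method names and arguments are assumptions, not the documented API.
from hyperdex.tools import AutoCompiler, AutoModelConverter

model_id = "meta-llama/Llama-2-7b-hf"   # any HuggingFace Transformers checkpoint (illustrative)
output_dir = "./llama2-7b-lpu"          # where LPU-ready artifacts would be written (illustrative)

# AutoCompiler: verifies LPU compatibility, downloads the checkpoint,
# and drives the end-to-end compilation pipeline.
compiler = AutoCompiler(model_id)              # assumed constructor
compiler.compile(output_dir=output_dir)        # assumed method name

# AutoModelConverter: validates the downloaded checkpoint and converts the
# model architecture and tokenizer configuration into an LPU-compatible format.
converter = AutoModelConverter(model_id)       # assumed constructor
converter.convert(output_dir=output_dir)       # assumed method name
```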
hyperdex.transformers — Runtime
hyperdex.transformers provides the runtime environment for executing LLMs on the LPU. It is fully aligned with the HuggingFace Transformers interface, enabling developers to leverage familiar APIs while benefiting from LPU acceleration.
Key functionality includes:
- AutoConfig - Loads runtime parameters such as the number of devices and maximum token length
- AutoModelForCausalLM - Loads and executes compiled LPU models
- AutoTokenizer - Retrieves the appropriate tokenizer for each model
- TextStreamer - Streams decoded tokens incrementally in real-time, either to stdout or via Server-Sent Events (SSE), enabling words to appear as they are generated
Additionally, the runtime supports prefix caching, which allows repeated or long-running workloads to be processed more efficiently.
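As an illustration, the snippet below sketches a minimal generation loop using the runtime classes listed above. The class names (AutoConfig, AutoModelForCausalLM, AutoTokenizer, TextStreamer) come from this section and mirror the HuggingFace interface, but the HF-style from_pretrained loaders, the generate arguments, and the model path are assumptions; refer to the HyperDex Toolchain reference for the actual API.

```python
# Minimal generation sketch -- class names are from this section; the HF-style
# loader and generation calls are assumptions based on the stated API alignment.
from hyperdex.transformers import AutoConfig, AutoModelForCausalLM, AutoTokenizer, TextStreamer

model_path = "./llama2-7b-lpu"   # directory produced by the compiler step (illustrative path)

# AutoConfig: runtime parameters such as the number of devices and maximum token length.
config = AutoConfig.from_pretrained(model_path)              # assumed HF-style loader

# AutoTokenizer / AutoModelForCausalLM: load the tokenizer and the compiled LPU model.
tokenizer = AutoTokenizer.from_pretrained(model_path)
model = AutoModelForCausalLM.from_pretrained(model_path, config=config)

# TextStreamer: print decoded tokens to stdout as they are generated.
streamer = TextStreamer(tokenizer)

# Tokenize a prompt and stream the response; input format follows the
# HuggingFace convention and is assumed to carry over to the LPU runtime.
inputs = tokenizer("Explain what an LPU is.", return_tensors="pt")
model.generate(**inputs, streamer=streamer, max_new_tokens=128)
```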
Compatibility
All classes and tools within the HyperDex Toolchain are compatible with HuggingFace's transformers library. This ensures that developers can leverage the same ecosystem they are familiar with while transitioning their workloads to LPU-based execution.
Before using the HyperDex Toolchain, please ensure you have met all the requirements listed in the Prerequisites section.
Prerequisites
- An LPU device is installed and recognized (lspci | grep -i xilinx; see the quick check after this list)
- XRT is installed successfully
- Linux kernel version is supported by both XRT and HyperDex Toolchain
- You have the necessary credentials to download required packages
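For convenience, an environment check along the lines of the sketch below can confirm the first two prerequisites before installation. The lspci command comes from the list above; using the presence of the xbutil binary as a proxy for a working XRT installation is an assumption about a typical XRT setup, not a HyperDex requirement.

```python
# Pre-flight check sketch -- the lspci command is from the prerequisites above;
# checking for xbutil on PATH as a proxy for an XRT installation is an assumption.
import shutil
import subprocess

def lpu_device_visible() -> bool:
    """Return True if lspci lists a Xilinx device (i.e. the LPU card is recognized)."""
    result = subprocess.run("lspci | grep -i xilinx", shell=True,
                            capture_output=True, text=True)
    return bool(result.stdout.strip())

def xrt_installed() -> bool:
    """Return True if the XRT command-line tool xbutil is on PATH (assumed proxy)."""
    return shutil.which("xbutil") is not None

if __name__ == "__main__":
    print("LPU device visible:", lpu_device_visible())
    print("XRT installed:", xrt_installed())
```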
For installation instructions, please refer to HyperDex Toolchain Installation. If you prefer to use vLLM, please see vLLM Installation.