arXiv:2604.16498

Forge-UGC: FX optimization and register-graph engine for universal graph compiler

Published on Apr 14 · Submitted by Satyam Kumar on Apr 21

Abstract

Forge-UGC is a four-phase compiler for efficient transformer deployment on heterogeneous hardware, offering faster compilation, reduced inference latency, and lower energy consumption compared to existing frameworks.

AI-generated summary

We present Forge-UGC (FX Optimization and Register-Graph Engine for Universal Graph Compilation), a four-phase compiler for transformer deployment on heterogeneous accelerator hardware, validated on the Intel AI Boost NPU. Existing frameworks such as OpenVINO and ONNX Runtime often rely on opaque compilation pipelines with limited pass-level visibility and weak buffer management, which can lead to higher compilation cost and runtime overhead. Forge-UGC addresses this with a hardware-agnostic design that separates graph capture, optimization, intermediate-representation lowering, and backend scheduling. Phase 1 captures graphs with torch.export at the ATen operator level, supporting modern transformer components such as rotary position embeddings, grouped-query attention, and SwiGLU without manual decomposition. Phase 2 applies six optimization passes (dead code elimination, common subexpression elimination, constant folding, attention fusion, operator fusion, and layout optimization), reducing graph node count by 14.2 to 21.9%. Phase 3 lowers the optimized graph into a typed intermediate representation with explicit virtual register assignments. Phase 4 performs liveness analysis, linear-scan buffer allocation (reducing peak buffer count by 30 to 48%), and device-affinity scheduling (reducing NPU-CPU transitions by 42 to 65%). Across six model families ranging from 125M to 8B parameters, evaluated on WikiText-103 and GLUE, Forge-UGC delivers 6.9 to 9.2x faster compilation than OpenVINO and ONNX Runtime, 18.2 to 35.7% lower inference latency, and 30.2 to 40.9% lower energy per inference. Fidelity is preserved, with maximum absolute logit differences below 2.1e-5 and KL divergence below 8.4e-9. We also introduce the Fusion Gain Ratio, the Compilation Efficiency Index, and per-pass execution profiling for systematic evaluation of NPU compilation pipelines.
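The Phase 4 combination of liveness analysis and linear-scan buffer allocation can be illustrated with a minimal sketch. The graph encoding and helper names below are hypothetical, chosen for illustration; this is not the paper's implementation.

```python
def liveness_intervals(ops):
    """Compute [first_def, last_use] intervals for each value.

    `ops` is a list of (output_name, input_names) tuples in execution order.
    """
    intervals = {}
    for i, (out, ins) in enumerate(ops):
        intervals[out] = [i, i]  # value is defined at step i
        for name in ins:
            if name in intervals:
                intervals[name][1] = i  # extend lifetime to latest use
    return intervals

def linear_scan_buffers(ops):
    """Assign each value a buffer id, reusing buffers whose interval ended."""
    intervals = liveness_intervals(ops)
    events = sorted(intervals.items(), key=lambda kv: kv[1][0])
    free, assignment, next_id = [], {}, 0
    active = []  # list of (interval_end, buffer_id) currently in use
    for name, (start, end) in events:
        # Retire buffers whose intervals ended strictly before this start.
        still_active = []
        for e, buf in active:
            if e < start:
                free.append(buf)
            else:
                still_active.append((e, buf))
        active = still_active
        buf = free.pop() if free else next_id
        if buf == next_id:
            next_id += 1
        assignment[name] = buf
        active.append((end, buf))
    return assignment, next_id  # next_id is the peak buffer count
```

On a straight-line chain of four ops this sketch needs only two buffers instead of four, the same kind of reuse behind the 30 to 48% peak-buffer reductions reported at much larger scale.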

Community

Paper submitter

🚀 Forge-UGC just dropped: a brand-new, transparent four-phase FX + register-graph compiler that makes transformer deployment on heterogeneous accelerators (validated on Intel AI Boost NPU) dramatically faster and more efficient!
Key wins over OpenVINO & ONNX Runtime:

6.9–9.2× faster compilation
18.2–35.7% lower inference latency
30.2–40.9% lower energy per inference

All while preserving numerical fidelity (max logit diff < 2.1e-5, KL divergence < 8.4e-9).
It natively handles modern transformer blocks (RoPE, GQA, SwiGLU) without manual decomposition, cuts graph node count by 14.2–21.9% with its fusion passes, and uses linear-scan buffer allocation + device-affinity scheduling to slash peak buffer usage and CPU↔NPU transitions.
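The device-affinity idea is roughly: among the ops whose inputs are ready, prefer one targeting the device of the op just scheduled, so consecutive ops stay on the same device. A toy greedy sketch (hypothetical encoding, not the paper's actual algorithm):

```python
from collections import defaultdict

def schedule(ops, deps):
    """ops: {op_name: device}; deps: {op_name: set of prerequisite op names}."""
    indeg = {n: len(deps.get(n, ())) for n in ops}
    users = defaultdict(list)
    for n, prereqs in deps.items():
        for p in prereqs:
            users[p].append(n)
    ready = [n for n, d in indeg.items() if d == 0]
    order, current = [], None
    while ready:
        # Prefer a ready op on the current device; otherwise take any.
        pick = next((n for n in ready if ops[n] == current), ready[0])
        ready.remove(pick)
        order.append(pick)
        current = ops[pick]
        for u in users[pick]:
            indeg[u] -= 1
            if indeg[u] == 0:
                ready.append(u)
    return order

def transitions(order, ops):
    """Count device switches between consecutive scheduled ops."""
    return sum(1 for a, b in zip(order, order[1:]) if ops[a] != ops[b])
```

With four independent ops alternating npu/cpu, naive in-order execution pays 3 device switches; the affinity tiebreak groups them into npu, npu, cpu, cpu and pays 1.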
Tested on 6 model families (125M → 8B params) across WikiText-103 and GLUE. They even introduced new metrics (Fusion Gain Ratio & Compilation Efficiency Index) so the community can finally compare compilers fairly.
If you care about fast, efficient inference of Hugging Face models on edge hardware or NPUs, this is a must-read.
Paper → https://arxiv.org/abs/2604.16498
Would love to hear your thoughts, especially if you're working on deployment, custom backends, or NPU optimization! 🔥


Get this paper in your agent:

hf papers read 2604.16498
Don't have the latest CLI?
curl -LsSf https://hf.co/cli/install.sh | bash
