Guangxuan Xu
Guangxuan (GX) Xu is a Research Engineer at Red Hat AI Innovation, where he focuses on large language model alignment, emergent reasoning, and production-scale AI systems. He holds a Master's degree in Computer Science from UCLA, and his research has been published at venues such as ACL and on arXiv. He has contributed to InstructLab and RL-driven model optimization at IBM Research. GX has led open-source releases in event-based NLP and dialogue safety, and his work bridges cutting-edge machine learning with enterprise deployment at scale.
Research Engineer
Company or affiliation – Red Hat Inc.
Session
Traditional approaches to improving AI model performance—scaling model size or training data—are increasingly constrained by cost, latency, and diminishing returns.
Inference-Time Scaling (ITS) offers an orthogonal solution by optimizing how computational resources are allocated during inference. By restructuring search and evaluation strategies at test time, ITS can enhance model output quality without retraining or expanding model parameters.
In this talk, we will introduce the leading ITS methods and show how you can try them on your existing models using off-the-shelf toolkits such as reward_hub and the inference_time_scaling library.
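To make the idea concrete, here is a minimal sketch of one widely used test-time strategy, best-of-N sampling: draw several candidate responses and keep the one a reward function scores highest. It is illustrative only. The functions generate_candidates and reward are hypothetical stand-ins for a language model and a reward model, and the sketch does not use the reward_hub or inference_time_scaling APIs.

```python
import random


def generate_candidates(prompt, n, seed=0):
    """Hypothetical stand-in for sampling n responses from a language model.

    Each "response" is a noisy guess at the answer to the toy question
    "What is 17 + 25?", returned as a string.
    """
    rng = random.Random(seed)
    return [str(42 + rng.choice([-2, -1, 0, 0, 1, 2])) for _ in range(n)]


def reward(prompt, response):
    """Hypothetical stand-in for a reward model: higher scores are better."""
    return -abs(int(response) - 42)


def best_of_n(prompt, n, seed=0):
    """Spend more inference compute (n samples) and keep the top-scoring one."""
    candidates = generate_candidates(prompt, n, seed)
    return max(candidates, key=lambda r: reward(prompt, r))


if __name__ == "__main__":
    question = "What is 17 + 25?"
    for n in (1, 4, 16):
        answer = best_of_n(question, n)
        print(f"n={n:2d} best={answer} reward={reward(question, answer)}")
```

Raising n spends more compute per query and leaves the model weights untouched, which is the trade-off ITS exploits. In a real setup the two stand-ins would be replaced by model sampling and a trained reward model.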