Mustafa Eyceoz
A research engineer with a focus on language models. Currently working on model reasoning, efficiency, and customization. Previously worked on text-to-SQL translation, speech recognition, and distributed systems for AI/ML workloads.
Senior Research Engineer
Red Hat
Session
Join us for an overview of the latest language model post-training methods openly available today! We will begin with offline methods like standard Supervised Fine-Tuning (SFT), Parameter-Efficient Fine-Tuning (PEFT), Direct Preference Optimization (DPO), and continual learning techniques for further tuning existing instruct models. We will then move into online reinforcement learning options like Reinforcement Learning from Human Feedback (RLHF) and Group Relative Policy Optimization (GRPO). The talk will walk through the use cases for each method, as well as how to get started today via our very own Training Hub (see the sketch below)!
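To give a flavor of the offline side, here is a minimal supervised fine-tuning sketch using the open-source Hugging Face TRL library. This is an illustrative stand-in, not the Training Hub interface covered in the talk; the model and dataset identifiers are placeholder choices.

```python
# Minimal SFT sketch with Hugging Face TRL (illustrative only; not the Training Hub API).
# The model and dataset names below are placeholders chosen for the example.
from datasets import load_dataset
from trl import SFTConfig, SFTTrainer

# Any chat-style dataset with a "messages" column works for conversational SFT.
train_dataset = load_dataset("trl-lib/Capybara", split="train")

trainer = SFTTrainer(
    model="Qwen/Qwen2.5-0.5B",          # base model to fine-tune
    train_dataset=train_dataset,
    args=SFTConfig(
        output_dir="sft-demo",          # where checkpoints are written
        max_steps=100,                  # keep the demo short
        per_device_train_batch_size=2,
    ),
)
trainer.train()
```

The preference-based and online methods discussed in the session follow a similar pattern in TRL (for example, DPOTrainer over preference pairs and GRPOTrainer over reward functions), while PEFT adapters such as LoRA can be layered onto any of them to reduce memory cost.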