Misha Ramendik
Principal Technical Writer at Red Hat in Ireland, with lots of side interests
Session
06-19
12:30
35min
Fine-tuning a small model for style/vibe (a Kimi distillation and beyond)
Misha Ramendik
Edge-runnable smaller language models hold significant appeal for privacy-conscious technical users.
Kimi K2 Instruct, an extremely large language model (1T-parameter MoE), has developed a "vibe" uniquely attractive to technical users, with low sycophancy and high creativity.
Attempting to distill this heavy-model "vibe" into a 1.5B model (IBM Granite 4-h Nano) has run into a number of issues but produced interesting results; several training methods and optimizers were tried.
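The abstract does not specify the distillation setup, and with a 1T teacher and a 1.5B student the tokenizers will not match, so "vibe" distillation in practice usually means fine-tuning the student on teacher-generated text. Purely as an illustration of the distillation objective itself, here is a minimal stdlib-only sketch of the classic logit-level loss: a temperature-scaled KL divergence between teacher and student token distributions (toy logits, not real model outputs):

```python
import math

def softmax(logits, T=1.0):
    # Temperature-scaled softmax; higher T flattens the distribution,
    # exposing more of the teacher's "dark knowledge" about near-miss tokens.
    z = [x / T for x in logits]
    m = max(z)  # subtract max for numerical stability
    e = [math.exp(x - m) for x in z]
    s = sum(e)
    return [x / s for x in e]

def distill_kl(teacher_logits, student_logits, T=2.0):
    # KL(teacher || student) over the vocabulary, scaled by T^2 as in the
    # Hinton-style distillation loss. Assumes both models share a vocabulary,
    # which is generally NOT true across unrelated model families.
    p = softmax(teacher_logits, T)
    q = softmax(student_logits, T)
    kl = sum(pi * (math.log(pi) - math.log(qi)) for pi, qi in zip(p, q))
    return kl * T * T

# Toy logits over a 4-token vocabulary (made up for illustration).
teacher = [2.0, 1.0, 0.1, -1.0]
student = [1.8, 1.1, 0.0, -0.9]
loss = distill_kl(teacher, student)
```

When the loss is zero the student's distribution matches the teacher's exactly; in a real training loop this term would be minimized alongside (or instead of) an ordinary cross-entropy loss on teacher-generated completions.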
Artificial Intelligence and Data Science
D105 (capacity 300)