Accelerating LLM Inference with Speculative Decoding
LinkedIn's Hiring Assistant team shares how they used speculative decoding to achieve 4× higher throughput and a 66% latency reduction for their AI agent. This deep dive explores n-gram speculation, implementation strategies with vLLM, and practical lessons for serving LLMs at scale in production.
Nov 6, 2025
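
As a concrete starting point for the vLLM implementation strategy the post covers, below is a minimal sketch of enabling n-gram (prompt-lookup) speculative decoding through vLLM's offline `LLM` API. The model name and the specific `speculative_config` values are illustrative assumptions, not the team's production configuration.

```python
# Minimal sketch: n-gram (prompt-lookup) speculative decoding in vLLM.
# Assumptions: a recent vLLM release; the model name and all
# speculative_config values are illustrative, not LinkedIn's settings.
from vllm import LLM, SamplingParams

llm = LLM(
    model="meta-llama/Llama-3.1-8B-Instruct",  # placeholder model
    speculative_config={
        "method": "ngram",            # draft tokens by matching n-grams in the prompt
        "num_speculative_tokens": 5,  # tokens proposed per speculation step
        "prompt_lookup_max": 4,       # max n-gram length to match against the prompt
    },
)

params = SamplingParams(temperature=0.0, max_tokens=128)
outputs = llm.generate(["Summarize the candidate's experience:"], params)
print(outputs[0].outputs[0].text)
```

N-gram speculation needs no separate draft model: it proposes continuations by matching recent output against the prompt, which is why it suits agent workloads whose outputs heavily echo their inputs.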