Fine-tuning has a marketing problem: it's the first thing people reach for when a model doesn't do what they want, and the last thing that actually fixes it. Most production teams would get better ROI from a well-structured prompt and a reranking layer.
The decision framework
Fine-tuning is worth it when: (1) you need consistent output format that prompting can't enforce reliably, (2) you have >10,000 high-quality examples of the exact task, or (3) you need to compress a complex multi-turn prompt into a faster, cheaper inference call. Everything else is a prompt engineering problem.
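The three criteria above can be sketched as a simple decision function. This is an illustrative sketch only: the `TaskProfile` fields, the `should_fine_tune` name, and the 10,000-example threshold mirror the text, not any real library API.

```python
from dataclasses import dataclass


@dataclass
class TaskProfile:
    # Hypothetical fields mirroring the three criteria in the text.
    needs_strict_format: bool      # prompting can't reliably enforce the output format
    example_count: int             # high-quality examples of the exact task
    compressing_long_prompt: bool  # want to fold a complex multi-turn prompt into one call


def should_fine_tune(task: TaskProfile, min_examples: int = 10_000) -> bool:
    """Return True if any of the three criteria holds;
    otherwise treat it as a prompt engineering problem."""
    return (
        task.needs_strict_format
        or task.example_count > min_examples
        or task.compressing_long_prompt
    )
```

For example, a team with 500 examples and no hard format requirement would get `False` here and should iterate on the prompt instead.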