So many cool ideas are being developed by the community. Here are some of my favorites.

  • Batch inference optimization: FlexGen, llama.cpp
  • Faster decoding with techniques such as Medusa and LookaheadDecoding
  • Model merging: mergekit
  • Constrained sampling: outlines, guidance, SGLang
  • Seemingly niche tools that solve one problem really well, such as einops and safetensors.
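
To give a feel for what constrained sampling means, here's a toy sketch of the core idea. At each step, any token the constraint forbids gets its logit masked to -inf before sampling, so it can never be picked. This is a simplified illustration, not how outlines, guidance or SGLang are actually implemented. The vocabulary, the `allowed_next` constraint and the `fake_logits` "model" are all hypothetical stand-ins. As I understand it, real tools derive the per-step allowed-token sets from something like a regex or grammar.

```python
import math
import random

def constrained_sample(logits, allowed, rng=random):
    """Sample a token id from `logits`, restricted to the ids in `allowed`.

    Disallowed tokens are masked to -inf before the softmax,
    so they end up with probability zero.
    """
    masked = [x if i in allowed else -math.inf for i, x in enumerate(logits)]
    m = max(masked)
    if m == -math.inf:
        raise ValueError("constraint allows no tokens")
    weights = [math.exp(x - m) for x in masked]  # exp(-inf) == 0.0
    return rng.choices(range(len(logits)), weights=weights, k=1)[0]

# Hypothetical toy vocabulary: ten digits, two letters, and an end token.
VOCAB = list("0123456789") + ["a", "b", "<eos>"]
EOS = VOCAB.index("<eos>")

def allowed_next(generated):
    # Hypothetical constraint: exactly three digits, then end-of-sequence.
    if len(generated) < 3:
        return set(range(10))
    return {EOS}

def fake_logits(generated):
    # Hypothetical stand-in for a language model that strongly prefers "a".
    return [0.0] * 10 + [5.0, 1.0, 0.0]

def generate(seed=0):
    rng = random.Random(seed)
    out = []
    while True:
        tok = constrained_sample(fake_logits(out), allowed_next(out), rng)
        if tok == EOS:
            break
        out.append(tok)
    return "".join(VOCAB[t] for t in out)

print(generate())  # always three digits, even though the "model" prefers "a"
```

The model's preferences still decide *which* allowed token wins; the constraint only decides what's allowed at all.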
