blog

notes on efficient inference, multi-token generation, and multimodal machine learning