Generative AI 2.0: Foundations, AGI, Applications, and Beyond
Our goal is to develop the fundamental algorithmic and systems frameworks for building and deploying GenAI systems that are orders of magnitude more complex than today's, without creating a compute and energy crisis.
KV Cache is 1 Bit Per Channel: Efficient Large Language Model Inference with Coupled Quantization. [To appear]
Tianyi Zhang, Jonah Wonkyu Yi, Zhaozhuo Xu, Anshumali Shrivastava
Neural Information Processing Systems (NeurIPS) 2024.
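As a rough illustration of the idea named in the title above, the toy below jointly quantizes groups of KV-cache channels with a shared codebook, so a group of g channels is stored in g bits (1 bit per channel). This is a generic vector-quantization sketch under assumed shapes, group size, and codebook training, not the paper's Coupled Quantization algorithm.

# Illustrative sketch only: coupled (joint) quantization of channel groups.
import numpy as np

def kmeans(x, k, iters=20, seed=0):
    """Plain k-means over rows of x; returns (codebook, assignments)."""
    rng = np.random.default_rng(seed)
    cb = x[rng.choice(len(x), k, replace=False)]
    for _ in range(iters):
        dist = ((x[:, None, :] - cb[None, :, :]) ** 2).sum(-1)
        a = dist.argmin(1)
        for j in range(k):
            if (a == j).any():
                cb[j] = x[a == j].mean(0)
    return cb, a

def quantize_kv(kv, group=8):
    """Jointly quantize each block of `group` channels with a 2**group
    codebook, i.e. `group` bits per block = 1 bit per channel."""
    n, d = kv.shape
    assert d % group == 0
    books, codes = [], []
    for s in range(0, d, group):
        cb, a = kmeans(kv[:, s:s + group], 2 ** group)  # 256 codewords for group=8
        books.append(cb)
        codes.append(a)
    return books, np.stack(codes, 1)                    # codes: (n, d // group)

def dequantize_kv(books, codes):
    return np.concatenate([books[g][codes[:, g]] for g in range(codes.shape[1])], 1)

kv = np.random.randn(512, 64).astype(np.float32)        # toy cache of 512 key vectors
books, codes = quantize_kv(kv)
err = np.mean((kv - dequantize_kv(books, codes)) ** 2)
print(f"1-bit-per-channel reconstruction MSE: {err:.4f}")

Coupling channels is what makes the 1-bit budget workable: a 256-entry codebook over 8 channels can exploit cross-channel correlation that 8 independent 1-bit quantizers cannot.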
NoMAD-Attention: Efficient LLM Inference on CPUs Through Multiply-add-free Attention. [To appear]
Tianyi Zhang, Jonah Wonkyu Yi, Bowen Yao, Zhaozhuo Xu, Anshumali Shrivastava
Neural Information Processing Systems (NeurIPS) 2024.
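The sketch below shows the lookup-table trick that multiply-add-free scoring builds on: keys are product-quantized offline, and for each query a small table of partial dot products is computed once, so scoring every cached key needs only lookups and adds. This is the generic asymmetric-computation idea with assumed codebook sizes and random toy codebooks, not the paper's NoMAD-Attention implementation (which does these lookups in SIMD registers).

# Illustrative sketch only: dot products replaced by table lookups per key.
import numpy as np

rng = np.random.default_rng(0)
d, m, k, n = 64, 8, 16, 1024          # dim, sub-spaces, codewords, cached keys
sub = d // m

keys = rng.standard_normal((n, d)).astype(np.float32)
# Toy codebooks: random codewords per sub-space (real systems learn these).
books = rng.standard_normal((m, k, sub)).astype(np.float32)
# Encode each key sub-vector as the index of its nearest codeword.
codes = np.empty((n, m), dtype=np.int64)
for g in range(m):
    d2 = ((keys[:, None, g*sub:(g+1)*sub] - books[g][None]) ** 2).sum(-1)
    codes[:, g] = d2.argmin(1)

def lut_scores(q):
    """Approximate attention logits for one query via lookups + adds."""
    # Multiplies happen once per query, against the m*k codewords only ...
    tables = np.einsum('gkc,gc->gk', books, q.reshape(m, sub))    # (m, k)
    # ... then each of the n keys is scored by summing m table entries.
    return sum(tables[g][codes[:, g]] for g in range(m))          # (n,)

q = rng.standard_normal(d).astype(np.float32)
approx, exact = lut_scores(q), keys @ q
print("corr(approx, exact) =", np.corrcoef(approx, exact)[0, 1])

The per-key cost drops from d multiply-adds to m table lookups, which is why this style of scoring suits CPUs, where gathers and shuffles are cheap relative to wide floating-point multiply-add throughput.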
Accelerating Inference with Fast and Expressive Sketch Structured Transform. [To appear]
Aditya Desai, Kimia Saedi, Apoorv Walia, Jihyeong Lee, Keren Zhou, Anshumali Shrivastava
Neural Information Processing Systems (NeurIPS) 2024.
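As a rough sketch of the parameter-sharing idea behind sketch-structured transforms, the layer below backs a large virtual weight matrix with a much smaller shared parameter array through a fixed random hash map. The hash, random signs, and 16x compression ratio are assumptions for illustration; the paper's transform additionally structures the sharing so the matrix never has to be materialized entry by entry, which this toy does for clarity at the cost of speed.

# Illustrative sketch only: a hashed, parameter-shared linear layer.
import numpy as np

class HashedLinear:
    def __init__(self, d_in, d_out, n_params, seed=0):
        rng = np.random.default_rng(seed)
        self.store = rng.standard_normal(n_params).astype(np.float32) * 0.02
        # Fixed random map from each virtual weight to a shared-store slot.
        self.idx = rng.integers(0, n_params, size=(d_out, d_in))
        # Random signs decorrelate entries that collide in the store.
        self.sign = rng.choice([-1.0, 1.0], size=(d_out, d_in)).astype(np.float32)

    def weight(self):
        return self.store[self.idx] * self.sign   # materialized virtual matrix

    def __call__(self, x):
        return x @ self.weight().T

layer = HashedLinear(d_in=256, d_out=256, n_params=4096)   # 16x fewer parameters
y = layer(np.random.randn(8, 256).astype(np.float32))
print(y.shape)                                              # (8, 256)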
In defense of parameter sharing for model-compression.
Aditya Desai, Anshumali Shrivastava
International Conference on Learning Representations (ICLR) 2024.
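A minimal sketch of the same sharing principle across layers: two layers draw their weights from one shared store, so the model's memory footprint is set by the store size rather than by the layer count. The shapes, per-layer seeds, and 128-wide layers below are assumptions for illustration, not the paper's configuration.

# Illustrative sketch only: two layers tied to one shared parameter store.
import numpy as np

rng = np.random.default_rng(1)
store = rng.standard_normal(2048).astype(np.float32) * 0.02   # shared parameters

def tied_layer(x, seed):
    """A 128x128 linear layer whose weights are hashed views into the store."""
    r = np.random.default_rng(seed)                  # fixed per-layer hash map
    idx = r.integers(0, len(store), size=(128, 128))
    return x @ store[idx].T

x = rng.standard_normal((4, 128)).astype(np.float32)
h = np.tanh(tied_layer(x, seed=10))                  # layer 1
y = tied_layer(h, seed=11)                           # layer 2
print(f"shared store: {store.size} params vs dense: {2 * 128 * 128} params")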