Generative AI 2.0: Foundations, AGI, Applications, and Beyond
Our goal is to develop the fundamental algorithmic and systems frameworks for building and deploying GenAI systems that are orders of magnitude more complex than today's, without creating a compute and energy crisis.
KV Cache is 1 Bit Per Channel: Efficient Large Language Model Inference with Coupled Quantization. [To appear]
Tianyi Zhang, Jonah Wonkyu Yi, Zhaozhuo Xu, Anshumali Shrivastava
Neural Information Processing Systems (NeurIPS) 2024.
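As a rough illustration of the idea named in the title above, the toy below jointly quantizes groups of KV-cache channels with a shared codebook, so a group of g channels is stored in g bits (1 bit per channel). This is a generic vector-quantization sketch under assumed shapes, group size, and codebook training, not the paper's Coupled Quantization algorithm.

# Illustrative sketch only: coupled (joint) quantization of channel groups.
import numpy as np

def kmeans(x, k, iters=20, seed=0):
    """Plain k-means over rows of x; returns (codebook, assignments)."""
    rng = np.random.default_rng(seed)
    cb = x[rng.choice(len(x), k, replace=False)]
    for _ in range(iters):
        dist = ((x[:, None, :] - cb[None, :, :]) ** 2).sum(-1)
        a = dist.argmin(1)
        for j in range(k):
            if (a == j).any():
                cb[j] = x[a == j].mean(0)
    return cb, a

def quantize_kv(kv, group=8):
    """Jointly quantize each block of `group` channels with a 2**group
    codebook, i.e. `group` bits per block = 1 bit per channel."""
    n, d = kv.shape
    assert d % group == 0
    books, codes = [], []
    for s in range(0, d, group):
        cb, a = kmeans(kv[:, s:s + group], 2 ** group)  # 256 codewords for group=8
        books.append(cb)
        codes.append(a)
    return books, np.stack(codes, 1)                    # codes: (n, d // group)

def dequantize_kv(books, codes):
    return np.concatenate([books[g][codes[:, g]] for g in range(codes.shape[1])], 1)

kv = np.random.randn(512, 64).astype(np.float32)        # toy cache of 512 key vectors
books, codes = quantize_kv(kv)
err = np.mean((kv - dequantize_kv(books, codes)) ** 2)
print(f"1-bit-per-channel reconstruction MSE: {err:.4f}")

Coupling channels is what makes the 1-bit budget workable: a 256-entry codebook over 8 channels can exploit cross-channel correlation that 8 independent 1-bit quantizers cannot.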
NoMAD-Attention: Efficient LLM Inference on CPUs Through Multiply-add-free Attention. [To appear]
Tianyi Zhang, Jonah Wonkyu Yi, Bowen Yao, Zhaozhuo Xu, Anshumali Shrivastava
Neural Information Processing Systems (NeurIPS) 2024.
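The sketch below shows the lookup-table trick that multiply-add-free scoring builds on: keys are product-quantized offline, and for each query a small table of partial dot products is computed once, so scoring every cached key needs only lookups and adds. This is the generic asymmetric-computation idea with assumed codebook sizes and random toy codebooks, not the paper's NoMAD-Attention implementation (which does these lookups in SIMD registers).

# Illustrative sketch only: dot products replaced by table lookups per key.
import numpy as np

rng = np.random.default_rng(0)
d, m, k, n = 64, 8, 16, 1024          # dim, sub-spaces, codewords, cached keys
sub = d // m

keys = rng.standard_normal((n, d)).astype(np.float32)
# Toy codebooks: random codewords per sub-space (real systems learn these).
books = rng.standard_normal((m, k, sub)).astype(np.float32)
# Encode each key sub-vector as the index of its nearest codeword.
codes = np.empty((n, m), dtype=np.int64)
for g in range(m):
    d2 = ((keys[:, None, g*sub:(g+1)*sub] - books[g][None]) ** 2).sum(-1)
    codes[:, g] = d2.argmin(1)

def lut_scores(q):
    """Approximate attention logits for one query via lookups + adds."""
    # Multiplies happen once per query, against the m*k codewords only ...
    tables = np.einsum('gkc,gc->gk', books, q.reshape(m, sub))    # (m, k)
    # ... then each of the n keys is scored by summing m table entries.
    return sum(tables[g][codes[:, g]] for g in range(m))          # (n,)

q = rng.standard_normal(d).astype(np.float32)
approx, exact = lut_scores(q), keys @ q
print("corr(approx, exact) =", np.corrcoef(approx, exact)[0, 1])

The per-key cost drops from d multiply-adds to m table lookups, which is why this style of scoring suits CPUs, where gathers and shuffles are cheap relative to wide floating-point multiply-add throughput.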
Accelerating Inference with Fast and Expressive Sketch Structured Transform. [To appear]
Aditya Desai, Kimia Saedi, Apoorv Walia, Jihyeong Lee, Keren Zhou, Anshumali Shrivastava
Neural Information Processing Systems (NeurIPS) 2024.
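As a rough sketch of the parameter-sharing idea behind sketch-structured transforms, the layer below backs a large virtual weight matrix with a much smaller shared parameter array through a fixed random hash map. The hash, random signs, and 16x compression ratio are assumptions for illustration; the paper's transform additionally structures the sharing so the matrix never has to be materialized entry by entry, which this toy does for clarity at the cost of speed.

# Illustrative sketch only: a hashed, parameter-shared linear layer.
import numpy as np

class HashedLinear:
    def __init__(self, d_in, d_out, n_params, seed=0):
        rng = np.random.default_rng(seed)
        self.store = rng.standard_normal(n_params).astype(np.float32) * 0.02
        # Fixed random map from each virtual weight to a shared-store slot.
        self.idx = rng.integers(0, n_params, size=(d_out, d_in))
        # Random signs decorrelate entries that collide in the store.
        self.sign = rng.choice([-1.0, 1.0], size=(d_out, d_in)).astype(np.float32)

    def weight(self):
        return self.store[self.idx] * self.sign   # materialized virtual matrix

    def __call__(self, x):
        return x @ self.weight().T

layer = HashedLinear(d_in=256, d_out=256, n_params=4096)   # 16x fewer parameters
y = layer(np.random.randn(8, 256).astype(np.float32))
print(y.shape)                                              # (8, 256)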
In defense of parameter sharing for model-compression.
Aditya Desai, Anshumali Shrivastava
International Conference on Learning Representations (ICLR) 2024.
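A minimal sketch of the same sharing principle across layers: two layers draw their weights from one shared store, so the model's memory footprint is set by the store size rather than by the layer count. The shapes, per-layer seeds, and 128-wide layers below are assumptions for illustration, not the paper's configuration.

# Illustrative sketch only: two layers tied to one shared parameter store.
import numpy as np

rng = np.random.default_rng(1)
store = rng.standard_normal(2048).astype(np.float32) * 0.02   # shared parameters

def tied_layer(x, seed):
    """A 128x128 linear layer whose weights are hashed views into the store."""
    r = np.random.default_rng(seed)                  # fixed per-layer hash map
    idx = r.integers(0, len(store), size=(128, 128))
    return x @ store[idx].T

x = rng.standard_normal((4, 128)).astype(np.float32)
h = np.tanh(tied_layer(x, seed=10))                  # layer 1
y = tied_layer(h, seed=11)                           # layer 2
print(f"shared store: {store.size} params vs dense: {2 * 128 * 128} params")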