What we do

We focus on various facets of Efficient AI to make AI equally accessible to everyone. In particular, we are interested in achieving favorable tradeoffs in terms of:

  • Model accuracy
  • Inference cost
  • Training cost

How we approach it

We pursue this goal by identifying, understanding, and harnessing the algorithmic biases of machine learning. Modern machine learning algorithms tend to be biased toward finding a very specific solution, rather than just any solution that fits the training data well. Luckily, it turns out that we can steer ML algorithms so that the learned solution generalizes well. However, this often comes at the cost of more expensive training or inference.

Through theory, we seek to characterize these tradeoffs; through algorithms, we seek to achieve Pareto-optimal points on the tradeoff curve.
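A classic toy example of such algorithmic bias (illustrative only, not one of our projects): on an underdetermined least-squares problem, gradient descent initialized at zero does not return an arbitrary interpolating solution; it converges to the minimum-norm one. A minimal NumPy sketch:

```python
import numpy as np

rng = np.random.default_rng(0)
n, d = 5, 20                       # fewer examples than parameters: infinitely many fits
X = rng.standard_normal((n, d))
y = rng.standard_normal(n)

w = np.zeros(d)                    # zero initialization keeps iterates in the row space of X
lr = 0.01
for _ in range(20000):
    w -= lr * X.T @ (X @ w - y)    # gradient of 0.5 * ||Xw - y||^2

# Gradient descent picks out the minimum-norm interpolator,
# i.e. the pseudoinverse solution, among all solutions with zero loss.
w_min_norm = np.linalg.pinv(X) @ y
print(np.allclose(w, w_min_norm, atol=1e-4))
```

Among the infinitely many weight vectors that fit the data exactly, the optimizer's trajectory singles out one; understanding and steering this selection is what we mean by harnessing algorithmic bias.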

Research highlights

Here are some examples of our recent projects:

  • Speed, accuracy, and initialization. We characterize how the initialization scale affects the generalizability and training speed of neural networks (A,B).
  • Adaptation for compression. We develop techniques to improve the performance of compressed models by tuning the models pre- or post-compression (A, B).
  • On-device LMs that can hear. By crafting new benchmarks, we reveal that on-device LMs often lack auditory commonsense; we then mitigate this gap through retrieval and generation (A, B, C).
  • Variable-rate compression of deep nets. We devise algorithms to compress neural networks in a way that reflects the distinct roles of each layer (A, B).
  • Neural codec for model weights. We develop neural data compression algorithms for neural network weights, with applications to LLMs and neural fields (A, B, C, D).

See our papers for more.