Research

  1. Efficient machine learning systems
  2. Secure and Trustworthy AI/ML systems

Efficient machine learning systems

My current research focuses on efficient algorithm design and mapping onto existing computing platforms such as FPGAs and GPUs. Medusa is an easy-to-use framework that accelerates LLM generation with multiple lightweight decoding heads; its overhead is much smaller than that of draft-model-based speculative decoding designs. My DAC22 paper maps a Transformer-based model onto an FPGA, featuring a sequence-length-adaptive hardware design and a redesigned linear-complexity self-attention. My ICCAD23 paper maps GNNs onto GPUs, combining degree sorting, block-level partitioning, and warp merging in the CUDA kernel to maximize GPU memory bandwidth and computational parallelism for the SpMM operator.
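The Medusa idea above can be illustrated with a toy sketch: several lightweight heads guess the next few tokens in one shot, and the base model verifies the guesses, keeping the longest correct prefix. The models below are deliberately trivial stand-ins (simple modular-arithmetic "predictors"), not the real Medusa heads or any actual LLM.

```python
# Toy sketch of Medusa-style decoding with lightweight heads.
# Both "models" here are hypothetical stand-ins for illustration only.

def base_model_next(seq):
    # Stand-in for the base LLM's greedy next token: sum of tokens mod 7.
    return sum(seq) % 7

def medusa_heads(seq, k=3):
    # Stand-in for k decoding heads: each proposes the token k steps ahead
    # in a single pass. A slightly imperfect heuristic, so verification
    # sometimes rejects a proposal (as real heads would be imperfect).
    guesses, s = [], list(seq)
    for _ in range(k):
        g = (sum(s) + (1 if len(s) > 5 else 0)) % 7
        guesses.append(g)
        s.append(g)
    return guesses

def medusa_step(seq, k=3):
    # Verify the k proposals sequentially against the base model: accept
    # the longest agreeing prefix; on the first mismatch, emit the base
    # model's correction instead, so each step yields at least one token.
    accepted, s = [], list(seq)
    for g in medusa_heads(seq, k):
        t = base_model_next(s)
        accepted.append(t if t != g else g)
        s.append(accepted[-1])
        if t != g:
            break
    return accepted
```

When the heads guess correctly, one step emits several tokens, which is where the speedup over one-token-at-a-time decoding comes from.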
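For the GNN-on-GPU work, the core operator is SpMM (sparse adjacency matrix times dense feature matrix). The sketch below is a plain NumPy reference for CSR-format SpMM plus the degree-sorting idea; it is an illustration of the operator, not the CUDA kernel from the paper.

```python
import numpy as np

def spmm_csr(indptr, indices, vals, X):
    # Reference CSR SpMM: Y[i] = sum over neighbors j of A[i, j] * X[j].
    n = len(indptr) - 1
    Y = np.zeros((n, X.shape[1]))
    for i in range(n):
        for k in range(indptr[i], indptr[i + 1]):
            Y[i] += vals[k] * X[indices[k]]
    return Y

def degree_sorted_order(indptr):
    # Sorting rows by degree groups similarly sized rows together, so GPU
    # thread blocks receive balanced work (the degree-sorting idea).
    degrees = np.diff(indptr)
    return np.argsort(-degrees)
```

On a GPU, each row (or group of rows after degree sorting) would map to a warp or thread block, with partitioning chosen so that memory accesses to X coalesce.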


Secure and Trustworthy AI/ML systems

My current research focuses on novel algorithm-hardware co-design for accelerating privacy-preserving machine learning (PPML), aiming to facilitate its practical deployment in industries that handle sensitive data, such as healthcare, biomedicine, banking, and finance.
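One common building block in PPML protocols is additive secret sharing, which lets parties compute on data no single party can see. The minimal sketch below shows the primitive itself; the modulus and two-party setting are illustrative choices, not taken from any specific system.

```python
import secrets

MOD = 2**32  # illustrative ring size

def share(x, n_parties=2):
    # Split integer x into n random shares that sum to x mod MOD;
    # any subset of fewer than n shares reveals nothing about x.
    shares = [secrets.randbelow(MOD) for _ in range(n_parties - 1)]
    shares.append((x - sum(shares)) % MOD)
    return shares

def reconstruct(shares):
    return sum(shares) % MOD

def add_shared(a_shares, b_shares):
    # Each party adds its own shares locally: secure addition needs no
    # communication, which is why linear layers are cheap in PPML.
    return [(a + b) % MOD for a, b in zip(a_shares, b_shares)]
```

Multiplication on shares, by contrast, requires interaction (e.g., Beaver triples), and that communication cost is a key target for algorithm-hardware co-design.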