Research
Efficient machine learning systems
My current research focuses on efficient algorithm design and mapping onto existing computing platforms such as FPGAs and GPUs. Medusa is an easy-to-use framework that accelerates LLM generation through multiple lightweight decoding heads; its overhead is much smaller than that of speculative decoding designs based on a draft model. My DAC22 paper maps a Transformer-based model onto an FPGA and features a sequence-length adaptive hardware design and a redesigned linear-complexity self-attention. My ICCAD23 paper maps a GNN onto a GPU and features degree sorting, block-level partitioning, and combined warps in the CUDA kernel, which together maximize GPU memory bandwidth utilization and computational parallelism for the SpMM operator.
Featured open source project
- Medusa: Simple Framework for Accelerating LLM Generation with Multiple Decoding Heads, 2023. [Code].
Featured publications
- MaxK-GNN: Towards Theoretical Speed Limits for Accelerating Graph Neural Networks Training (ASPLOS), 2024. [Code].
- Accel-GCN: High-Performance GPU Accelerator Design for Graph Convolution Networks (ICCAD), 2023. [Code].
- A Length Adaptive Algorithm-Hardware Co-design of Transformer on FPGA through Sparse Attention and Dynamic Pipelining (DAC), 2022. Publicity paper.
- Towards Sparsification of Graph Neural Networks (ICCD), 2022. [Code].
Secure and trustworthy AI/ML systems
My current research focuses on novel algorithm and hardware co-design for accelerating privacy-preserving machine learning (PPML), aiming to facilitate the practical deployment of PPML across industries that handle sensitive data, such as healthcare, biomedicine, banking, and finance.
Featured publications
- LinGCN: Structural Linearized Graph Convolutional Network for Homomorphically Encrypted Inference (NeurIPS), 2023. [Code].
- AQ2PNN: Enabling Two-party Privacy-Preserving Deep Neural Network Inference with Adaptive Quantization (MICRO), 2023.
- AutoReP: Automatic ReLU Replacement for Fast Private Network Inference (ICCV), 2023. [Code].
- PASNet: Polynomial Architecture Search Framework for Two-party Computation-based Secure Neural Network Deployment (DAC), 2023. [Code].