Research

Efficient machine learning systems
Secure and Trustworthy AI/ML systems

Efficient machine learning systems

My current research focuses on efficient algorithm design and its mapping onto existing computing platforms such as FPGAs and GPUs. Medusa is an easy-to-use framework that accelerates LLM generation through multiple lightweight decoding heads; its overhead is much smaller than that of speculative decoding designs based on a separate draft model. My DAC 2022 paper maps a Transformer-based model onto an FPGA and features a sequence-length-adaptive hardware design and a redesigned linear-complexity self-attention. My ICCAD 2023 paper maps a GNN onto a GPU and features degree sorting, block-level partitioning, and combined warps in the CUDA kernel, which together maximize GPU memory bandwidth utilization and computational parallelism for the SpMM operator.

Featured open source project

Medusa: Simple Framework for Accelerating LLM Generation with Multiple Decoding Heads, 2023. [Code].

Featured publications

MaxK-GNN: Towards Theoretical Speed Limits for Accelerating Graph Neural Networks Training (ASPLOS), 2024. [Code].
Accel-GCN: High-Performance GPU Accelerator Design for Graph Convolution Networks (ICCAD), 2023. [Code].
A Length Adaptive Algorithm-Hardware Co-Design of Transformer on FPGA Through Sparse Attention and Dynamic Pipelining (DAC), 2022. Publicity paper.
Towards Sparsification of Graph Neural Networks (ICCD), 2022. [Code].

Secure and Trustworthy AI/ML systems

My current research focuses on novel algorithm and hardware co-design for accelerating privacy-preserving machine learning (PPML), aiming to facilitate the practical deployment of PPML across industries that handle sensitive data, such as healthcare, biomedicine, banking, and finance.

Featured publications

LinGCN: Structural Linearized Graph Convolutional Network for Homomorphically Encrypted Inference (NeurIPS), 2023. [Code].
AQ2PNN: Enabling Two-party Privacy-Preserving Deep Neural Network Inference with Adaptive Quantization (MICRO), 2023.
AutoReP: Automatic ReLU Replacement for Fast Private Network Inference (ICCV), 2023. [Code].
PASNet: Polynomial Architecture Search Framework for Two-party Computation-based Secure Neural Network Deployment (DAC), 2023. [Code].

Hongwu Peng
