We stand with Ukraine

Jianfei Chen

ORCID: 0000-0002-9279-6098

Affiliations:

Tsinghua University, Beijing, China

According to our database, Jianfei Chen authored at least 88 papers between 2013 and 2026.

Collaborative distances:

Dijkstra number of four.
Erdős number of four.

Links

Online presence:

on orcid.org
on openreview.net

Bibliography

2026

KernelBenchX: A Comprehensive Benchmark for Evaluating LLM-Generated GPU Kernels.

CoRR, May, 2026

Deterministic Differentiable Structured Pruning for Large Language Models.

CoRR, March, 2026

SageBwd: A Trainable Low-bit Attention.

Co-authors include Joseph E. Gonzalez.

CoRR, March, 2026

SpargeAttention2: Trainable Sparse Attention via Hybrid Top-k+Top-p Masking and Distillation Fine-Tuning.

CoRR, February, 2026

SLA2: Sparse-Linear Attention with Learnable Routing and QAT.

Co-authors include Joseph E. Gonzalez.

CoRR, February, 2026

2025

TurboDiffusion: Accelerating Video Diffusion Models by 100-200 Times.

Co-authors include Joseph E. Gonzalez.

CoRR, December, 2025

TetraJet-v2: Accurate NVFP4 Training for Large Language Models with Oscillation Suppression and Outlier Control.

CoRR, October, 2025

Large Scale Diffusion Distillation via Score-Regularized Continuous-Time Consistency.

CoRR, October, 2025

CAST: Continuous and Differentiable Semi-Structured Sparsity-Aware Training for Large Language Models.

CoRR, September, 2025

Efficient Hyperparameter Tuning via Trajectory Invariance Principle.

CoRR, September, 2025

SLA: Beyond Sparsity in Diffusion Transformers via Fine-Tunable Sparse-Linear Attention.

Co-authors include Joseph E. Gonzalez.

CoRR, September, 2025

DPM-Solver++: Fast Solver for Guided Sampling of Diffusion Probabilistic Models.

Mach. Intell. Res., August, 2025

SageAttention2++: A More Efficient Implementation of SageAttention2.

CoRR, May, 2025

LLaDA 1.5: Variance-Reduced Preference Optimization for Large Language Diffusion Models.

CoRR, May, 2025

SageAttention3: Microscaling FP4 Attention for Inference and An Exploration of 8-Bit Training.

CoRR, May, 2025

Accurate INT8 Training Through Dynamic Block-Level Fallback.

CoRR, March, 2025

Identifying Sensitive Weights via Post-quantization Integral.

CoRR, March, 2025

SpargeAttn: Accurate Sparse Attention Accelerating Any Model Inference.

CoRR, February, 2025

Sparse VideoGen: Accelerating Video Diffusion Transformers with Spatial-Temporal Sparsity.

CoRR, February, 2025

Sparse VideoGen2: Accelerate Video Generation with Sparse Attention via Semantic-Aware Permutation.

Proceedings of the Advances in Neural Information Processing Systems 38: Annual Conference on Neural Information Processing Systems 2025, 2025

Maximum Redundancy Pruning: A Principle-Driven Layerwise Sparsity Allocation for LLMs.

Proceedings of the 33rd ACM International Conference on Multimedia, 2025

SpargeAttention: Accurate and Training-free Sparse Attention Accelerating Any Model Inference.

Proceedings of the Forty-second International Conference on Machine Learning, 2025

SageAttention2: Efficient Attention with Thorough Outlier Smoothing and Per-thread INT4 Quantization.

Proceedings of the Forty-second International Conference on Machine Learning, 2025

Sparse Video-Gen: Accelerating Video Diffusion Transformers with Spatial-Temporal Sparsity.

Proceedings of the Forty-second International Conference on Machine Learning, 2025

FrameBridge: Improving Image-to-Video Generation with Bridge Models.

Proceedings of the Forty-second International Conference on Machine Learning, 2025

Oscillation-Reduced MXFP4 Training for Vision Transformers.

Proceedings of the Forty-second International Conference on Machine Learning, 2025

Visual Generation Without Guidance.

Proceedings of the Forty-second International Conference on Machine Learning, 2025

SparseDM: Toward Sparse Efficient Diffusion Models.

Proceedings of the IEEE International Conference on Multimedia and Expo, 2025

Diffusion Bridge Implicit Models.

Proceedings of the Thirteenth International Conference on Learning Representations, 2025

Elucidating the Preconditioning in Consistency Distillation.

Proceedings of the Thirteenth International Conference on Learning Representations, 2025

SageAttention: Accurate 8-Bit Attention for Plug-and-play Inference Acceleration.

Proceedings of the Thirteenth International Conference on Learning Representations, 2025

COAT: Compressing Optimizer states and Activations for Memory-Efficient FP8 Training.

Proceedings of the Thirteenth International Conference on Learning Representations, 2025

ReMoE: Fully Differentiable Mixture-of-Experts with ReLU Routing.

Proceedings of the Thirteenth International Conference on Learning Representations, 2025

On the Optimization and Generalization of Two-layer Transformers with Sign Gradient Descent.

Proceedings of the Thirteenth International Conference on Learning Representations, 2025

Pruning Large Language Models with Semi-Structural Adaptive Sparse Training.

Proceedings of the Thirty-Ninth AAAI Conference on Artificial Intelligence, 2025

2024

Calibrating Deep Ensemble through Functional Variational Inference.

Trans. Mach. Learn. Res., 2024

SageAttention2 Technical Report: Accurate 4 Bit Attention for Plug-and-play Inference Acceleration.

CoRR, 2024

COAT: Compressing Optimizer states and Activation for Memory-Efficient FP8 Training.

CoRR, 2024

FrameBridge: Improving Image-to-Video Generation with Bridge Models.

CoRR, 2024

Jetfire: Efficient and Accurate Transformer Pretraining with INT8 Data Flow and Per-Block Quantization.

CoRR, 2024

C-GAIL: Stabilizing Generative Adversarial Imitation Learning with Control Theory.

Proceedings of the Advances in Neural Information Processing Systems 37: Annual Conference on Neural Information Processing Systems 2024, 2024

S-STE: Continuous Pruning Function for Efficient 2:4 Sparse Pre-training.

Proceedings of the Advances in Neural Information Processing Systems 37: Annual Conference on Neural Information Processing Systems 2024, 2024

Consistency Diffusion Bridge Models.

Proceedings of the Advances in Neural Information Processing Systems 37: Annual Conference on Neural Information Processing Systems 2024, 2024

Jetfire: Efficient and Accurate Transformer Pretraining with INT8 Data Flow and Per-Block Quantization.

Proceedings of the Forty-first International Conference on Machine Learning, 2024

Accelerating Transformer Pre-training with 2:4 Sparsity.

Proceedings of the Forty-first International Conference on Machine Learning, 2024

Efficient Backpropagation with Variance Controlled Adaptive Sampling.

Proceedings of the Twelfth International Conference on Learning Representations, 2024

2023

Parameter-efficient fine-tuning of large-scale pre-trained language models.

Nat. Mac. Intell., March, 2023

Investigating Uncertainty Calibration of Aligned Language Models under the Multiple-Choice Setting.

CoRR, 2023

DPM-Solver-v3: Improved Diffusion ODE Solver with Empirical Model Statistics.

Proceedings of the Advances in Neural Information Processing Systems 36: Annual Conference on Neural Information Processing Systems 2023, 2023

Training Transformers with 4-bit Integers.

Proceedings of the Advances in Neural Information Processing Systems 36: Annual Conference on Neural Information Processing Systems 2023, 2023

Memory Efficient Optimizers with 4-bit States.

Proceedings of the Advances in Neural Information Processing Systems 36: Annual Conference on Neural Information Processing Systems 2023, 2023

Improved Techniques for Maximum Likelihood Estimation for Diffusion ODEs.

Proceedings of the International Conference on Machine Learning, 2023

Stabilizing GANs' Training with Brownian Motion Controller.

Proceedings of the International Conference on Machine Learning, 2023

Contrastive Energy Prediction for Exact Energy-Guided Diffusion Sampling in Offline Reinforcement Learning.

Proceedings of the International Conference on Machine Learning, 2023

Preserving Pre-trained Features Helps Calibrate Fine-tuned Language Models.

Proceedings of the Eleventh International Conference on Learning Representations, 2023

2022

GACT: Activation Compressed Training for General Architectures.

Co-authors include Michael W. Mahoney.

CoRR, 2022

Deep Ensemble as a Gaussian Process Approximate Posterior.

CoRR, 2022

Delta Tuning: A Comprehensive Study of Parameter Efficient Methods for Pre-trained Language Models.

CoRR, 2022

DPM-Solver: A Fast ODE Solver for Diffusion Probabilistic Model Sampling in Around 10 Steps.

Proceedings of the Advances in Neural Information Processing Systems 35: Annual Conference on Neural Information Processing Systems 2022, 2022

Fast Lossless Neural Compression with Integer-Only Discrete Flows.

Proceedings of the International Conference on Machine Learning, 2022

Maximum Likelihood Training for Score-based Diffusion ODEs by High Order Denoising Score Matching.

Proceedings of the International Conference on Machine Learning, 2022

GACT: Activation Compressed Training for Generic Network Architectures.

Co-authors include Michael W. Mahoney.

Proceedings of the International Conference on Machine Learning, 2022

2021

ActNN: Reducing Training Memory Footprint via 2-Bit Activation Compressed Training.

Co-authors include Michael W. Mahoney and Joseph Gonzalez.

Proceedings of the 38th International Conference on Machine Learning, 2021

Implicit Normalizing Flows.

Proceedings of the 9th International Conference on Learning Representations, 2021

2020

BEV-Seg: Bird's Eye View Semantic Segmentation Using Geometry and Semantic Point Cloud.

Co-authors include Joseph E. Gonzalez.

CoRR, 2020

A Statistical Framework for Low-bitwidth Training of Deep Neural Networks.

Co-authors include Michael W. Mahoney and Joseph E. Gonzalez.

Proceedings of the Advances in Neural Information Processing Systems 33: Annual Conference on Neural Information Processing Systems 2020, 2020

VFlow: More Expressive Generative Flows with Variational Data Augmentation.

Proceedings of the 37th International Conference on Machine Learning, 2020

2018

Scalable Training of Hierarchical Topic Models.

Proc. VLDB Endow., 2018

Dropout training for SVMs with data augmentation.

Frontiers Comput. Sci., 2018

Stochastic Expectation Maximization with Variance Reduction.

Proceedings of the Advances in Neural Information Processing Systems 31: Annual Conference on Neural Information Processing Systems 2018, 2018

Stochastic Training of Graph Convolutional Networks with Variance Reduction.

Proceedings of the 35th International Conference on Machine Learning, 2018

Towards Training Probabilistic Topic Models on Neuromorphic Multi-Chip Systems.

Proceedings of the Thirty-Second AAAI Conference on Artificial Intelligence, 2018

2017

Stochastic Training of Graph Convolutional Networks.

CoRR, 2017

ZhuSuan: A Library for Bayesian Deep Learning.

CoRR, 2017

Scalable Inference for Nested Chinese Restaurant Process Topic Models.

CoRR, 2017

Population Matching Discrepancy and Applications in Deep Learning.

Proceedings of the Advances in Neural Information Processing Systems 30: Annual Conference on Neural Information Processing Systems 2017, 2017

SaberLDA: Sparsity-Aware Learning of Topic Models on GPUs.

Proceedings of the Twenty-Second International Conference on Architectural Support for Programming Languages and Operating Systems, 2017

2016

TopicPanorama: A Full Picture of Relevant Topics.

IEEE Trans. Vis. Comput. Graph., 2016

WarpLDA: a Cache Efficient O(1) Algorithm for Latent Dirichlet Allocation.

Proc. VLDB Endow., 2016

Streaming Gibbs Sampling for LDA Model.

CoRR, 2016

Scaling up Dynamic Topic Models.

Proceedings of the 25th International Conference on World Wide Web, 2016

Distributing the Stochastic Gradient Sampler for Large-Scale LDA.

Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 2016

2015

WarpLDA: a Simple and Efficient O(1) Algorithm for Latent Dirichlet Allocation.

CoRR, 2015

2014

Big Learning with Bayesian Methods.

CoRR, 2014

TopicPanorama: A full picture of relevant topics.

Proceedings of the 9th IEEE Conference on Visual Analytics Science and Technology, 2014

Bayesian Max-margin Multi-Task Learning with Data Augmentation.

Proceedings of the 31st International Conference on Machine Learning, 2014

Dropout Training for Support Vector Machines.

Proceedings of the Twenty-Eighth AAAI Conference on Artificial Intelligence, 2014

2013

Scalable Inference for Logistic-Normal Topic Models.

Proceedings of the Advances in Neural Information Processing Systems 26: 27th Annual Conference on Neural Information Processing Systems 2013. Proceedings of a meeting held December 5-8, 2013
