Jiaheng Liu

Orcid: 0000-0002-5183-8538

According to our database, Jiaheng Liu authored at least 162 papers between 2018 and 2025.

Bibliography

2025
Part I: Tricks or Traps? A Deep Dive into RL for LLM Reasoning.
CoRR, August, 2025

Efficient Agents: Building Effective Agents While Reducing Cost.
CoRR, August, 2025

Reconsidering Overthinking: Penalizing Internal and External Redundancy in CoT Reasoning.
CoRR, August, 2025

IFEvalCode: Controlled Code Generation.
CoRR, July, 2025

Multilingual Multimodal Software Developer for Code Generation.
CoRR, July, 2025

KAT-V1: Kwai-AutoThink Technical Report.
CoRR, July, 2025

Agent KB: Leveraging Cross-Domain Experience for Agentic Problem Solving.
CoRR, July, 2025

A Survey on Latent Reasoning.
CoRR, July, 2025

CriticLean: Critic-Guided Reinforcement Learning for Mathematical Formalization.
CoRR, July, 2025

ArtifactsBench: Bridging the Visual-Interactive Gap in LLM Code Generation Evaluation.
CoRR, July, 2025

SPEAR: Structured Pruning for Spiking Neural Networks via Synaptic Operation Estimation and Reinforcement Learning.
CoRR, July, 2025

OAgents: An Empirical Study of Building Effective Agents.
CoRR, June, 2025

Scaling Test-time Compute for LLM Agents.
CoRR, June, 2025

TaskCraft: Automated Generation of Agentic Tasks.
CoRR, June, 2025

Reinforcement Learning Optimization for Large-Scale Learning: An Efficient and User-Friendly Scaling Library.
CoRR, June, 2025

ScaleLong: A Multi-Timescale Benchmark for Long Video Understanding.
CoRR, May, 2025

USB: A Comprehensive and Unified Safety Evaluation Benchmark for Multimodal Large Language Models.
CoRR, May, 2025

Beyond Safe Answers: A Benchmark for Evaluating True Risk Awareness in Large Reasoning Models.
CoRR, May, 2025

KORGym: A Dynamic Game Platform for LLM Reasoning Evaluation.
CoRR, May, 2025

Think-J: Learning to Think for Generative LLM-as-a-Judge.
CoRR, May, 2025

Table-R1: Region-based Reinforcement Learning for Table Understanding.
CoRR, May, 2025

Flow-GRPO: Training Flow Matching Models via Online RL.
CoRR, May, 2025

DREAM: Disentangling Risks to Enhance Safety Alignment in Multimodal Large Language Models.
CoRR, April, 2025

IV-Bench: A Benchmark for Image-Grounded Video Perception and Reasoning in Multimodal LLMs.
CoRR, April, 2025

Mavors: Multi-granularity Video Representation for Multimodal Large Language Model.
CoRR, April, 2025

COIG-P: A High-Quality and Large-Scale Chinese Preference Dataset for Alignment with Human Values.
CoRR, April, 2025

A Comprehensive Survey on Long Context Language Modeling.
CoRR, March, 2025

Deconstructing Long Chain-of-Thought: A Structured Reasoning Optimization Framework for Long CoT Distillation.
CoRR, March, 2025

Can Large Language Models Detect Errors in Long Chain-of-Thought Reasoning?
CoRR, February, 2025

AIR: Complex Instruction Generation via Automatic Iterative Refinement.
CoRR, February, 2025

CodeCriticBench: A Holistic Code Critique Benchmark for Large Language Models.
CoRR, February, 2025

SuperGPQA: Scaling LLM Evaluation across 285 Graduate Disciplines.
CoRR, February, 2025

SimpleVQA: Multimodal Factuality Evaluation for Multimodal Large Language Models.
CoRR, February, 2025

Sailor2: Sailing in South-East Asia with Inclusive Multilingual LLMs.
CoRR, February, 2025

ChineseSimpleVQA - "See the World, Discover Knowledge": A Chinese Factuality Evaluation for Large Vision Language Models.
CoRR, February, 2025

Equilibrate RLHF: Towards Balancing Helpfulness-Safety Trade-off in Large Language Models.
CoRR, February, 2025

SafeDialBench: A Fine-Grained Safety Benchmark for Large Language Models in Multi-Turn Dialogues with Diverse Jailbreak Attacks.
CoRR, February, 2025

Distillation Quantification for Large Language Models.
CoRR, January, 2025

Aligning Instruction Tuning with Pre-training.
CoRR, January, 2025

ProgCo: Program Helps Self-Correction of Large Language Models.
CoRR, January, 2025

Molecular graph contrastive learning with line graph.
Pattern Recognit., 2025

DREAM: Disentangling Risks to Enhance Safety Alignment in Multimodal Large Language Models.
Proceedings of the 2025 Conference of the Nations of the Americas Chapter of the Association for Computational Linguistics: Human Language Technologies, 2025

2D-DPO: Scaling Direct Preference Optimization with 2-Dimensional Supervision.
Proceedings of the Findings of the Association for Computational Linguistics: NAACL 2025, Albuquerque, New Mexico, USA, April 29, 2025

MTU-Bench: A Multi-granularity Tool-Use Benchmark for Large Language Models.
Proceedings of the Thirteenth International Conference on Learning Representations, 2025

MuPT: A Generative Symbolic Music Pretrained Transformer.
Proceedings of the Thirteenth International Conference on Learning Representations, 2025

KOR-Bench: Benchmarking Language Models on Knowledge-Orthogonal Reasoning Tasks.
Proceedings of the Thirteenth International Conference on Learning Representations, 2025

McEval: Massively Multilingual Code Evaluation.
Proceedings of the Thirteenth International Conference on Learning Representations, 2025

LIME: Less Is More for MLLM Evaluation.
Proceedings of the Findings of the Association for Computational Linguistics, 2025

PopAlign: Diversifying Contrasting Patterns for a More Comprehensive Alignment.
Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2025

Chinese SafetyQA: A Safety Short-form Factuality Benchmark for Large Language Models.
Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2025

M2RC-EVAL: Massively Multilingual Repository-level Code Completion Evaluation.
Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2025

Quantification of Large Language Model Distillation.
Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2025

MuSC: Improving Complex Instruction Following with Multi-granularity Self-Contrastive Training.
Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2025

OpenCoder: The Open Cookbook for Top-Tier Code Large Language Models.
Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2025

Can Large Language Models Detect Errors in Long Chain-of-Thought Reasoning?
Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2025

Chinese SimpleQA: A Chinese Factuality Evaluation for Large Language Models.
Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2025

See the World, Discover Knowledge: A Chinese Factuality Evaluation for Large Vision Language Models.
Proceedings of the Findings of the Association for Computational Linguistics, 2025

VidCapBench: A Comprehensive Benchmark of Video Captioning for Controllable Text-to-Video Generation.
Proceedings of the Findings of the Association for Computational Linguistics, 2025

Can MLLMs Understand the Deep Implication Behind Chinese Images?
Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2025

TableBench: A Comprehensive and Complex Benchmark for Table Question Answering.
Proceedings of the AAAI-25, Sponsored by the Association for the Advancement of Artificial Intelligence, February 25, 2025

XCOT: Cross-lingual Instruction Tuning for Cross-lingual Chain-of-Thought Reasoning.
Proceedings of the AAAI-25, Sponsored by the Association for the Advancement of Artificial Intelligence, February 25, 2025

2024
Deep learning-based multimodal fusion of the surface ECG and clinical features in prediction of atrial fibrillation recurrence following catheter ablation.
BMC Medical Informatics Decis. Mak., December, 2024

A unified efficient deep image compression framework and its application on human-centric Task.
Multim. Tools Appl., September, 2024

Reuse distance-based shared LLC management mechanism for heterogeneous CPU-GPU systems.
IEICE Electron. Express, 2024

mt4CrossOIE: Multi-stage tuning for cross-lingual open information extraction.
Expert Syst. Appl., 2024

Chinese SafetyQA: A Safety Short-form Factuality Benchmark for Large Language Models.
CoRR, 2024

PTSBench: A Comprehensive Post-Training Sparsity Benchmark Towards Algorithms and Models.
CoRR, 2024

PhysGame: Uncovering Physical Commonsense Violations in Gameplay Videos.
CoRR, 2024

Chinese SimpleQA: A Chinese Factuality Evaluation for Large Language Models.
CoRR, 2024

MdEval: Massively Multilingual Code Debugging.
CoRR, 2024

M2rc-Eval: Massively Multilingual Repository-level Code Completion Evaluation.
CoRR, 2024

AutoKaggle: A Multi-Agent Framework for Autonomous Data Science Competitions.
CoRR, 2024

Aligning CodeLLMs with Direct Preference Optimization.
CoRR, 2024

Can MLLMs Understand the Deep Implication Behind Chinese Images?
CoRR, 2024

PopAlign: Diversifying Contrasting Patterns for a More Comprehensive Alignment.
CoRR, 2024

A Comparative Study on Reasoning Patterns of OpenAI's o1 Model.
CoRR, 2024

MTU-Bench: A Multi-granularity Tool-Use Benchmark for Large Language Models.
CoRR, 2024

ING-VP: MLLMs cannot Play Easy Vision-based Games Yet.
CoRR, 2024

KOR-Bench: Benchmarking Language Models on Knowledge-Orthogonal Reasoning Tasks.
CoRR, 2024

MIO: A Foundation Model on Multimodal Tokens.
CoRR, 2024

HelloBench: Evaluating Long Text Generation Capabilities of Large Language Models.
CoRR, 2024

OmniBench: Towards The Future of Universal Omni-Language Models.
CoRR, 2024

FuzzCoder: Byte-level Fuzzing Test via Large Language Model.
CoRR, 2024

TableBench: A Comprehensive and Complex Benchmark for Table Question Answering.
CoRR, 2024

I-SHEEP: Self-Alignment of LLM from Scratch through an Iterative Self-Enhancement Paradigm.
CoRR, 2024

NC-NCD: Novel Class Discovery for Node Classification.
CoRR, 2024

MMRA: A Benchmark for Multi-granularity Multi-image Relational Association.
CoRR, 2024

LongIns: A Challenging Long-context Instruction-based Exam for LLMs.
CoRR, 2024

GIEBench: Towards Holistic Evaluation of Group Identity-based Empathy for Large Language Models.
CoRR, 2024

Iterative Length-Regularized Direct Preference Optimization: A Case Study on Improving 7B Language Models to GPT-4 Level.
CoRR, 2024

McEval: Massively Multilingual Code Evaluation.
CoRR, 2024

II-Bench: An Image Implication Understanding Benchmark for Multimodal Large Language Models.
CoRR, 2024

Towards Real-world Scenario: Imbalanced New Intent Discovery.
CoRR, 2024

R2C2-Coder: Enhancing and Benchmarking Real-world Repository-level Code Completion Abilities of Code Large Language Models.
CoRR, 2024

MAP-Neo: Highly Capable and Transparent Bilingual Large Language Model Series.
CoRR, 2024

MuPT: A Generative Symbolic Music Pretrained Transformer.
CoRR, 2024

Chinese Tiny LLM: Pretraining a Chinese-Centric Large Language Model.
CoRR, 2024

The Fine Line: Navigating Large Language Model Pretraining with Down-streaming Capability Analysis.
CoRR, 2024

Lemur: Log Parsing with Entropy Sampling and Chain-of-Thought Merging.
CoRR, 2024

MLAD: A Unified Model for Multi-system Log Anomaly Detection.
CoRR, 2024

E^2-LLM: Efficient and Extreme Length Extension of Large Language Models.
CoRR, 2024

Fractional-order rate-dependent thermoelastic diffusion theory based on new definitions of fractional derivatives with non-singular kernels and the associated structural transient dynamic responses analysis of sandwich-like composite laminates.
Commun. Nonlinear Sci. Numer. Simul., 2024

D-CPT Law: Domain-specific Continual Pre-Training Scaling Law for Large Language Models.
Proceedings of the Advances in Neural Information Processing Systems 38: Annual Conference on Neural Information Processing Systems 2024, 2024

DDK: Distilling Domain Knowledge for Efficient Large Language Models.
Proceedings of the Advances in Neural Information Processing Systems 38: Annual Conference on Neural Information Processing Systems 2024, 2024

RoleAgent: Building, Interacting, and Benchmarking High-quality Role-Playing Agents from Scripts.
Proceedings of the Advances in Neural Information Processing Systems 38: Annual Conference on Neural Information Processing Systems 2024, 2024

II-Bench: An Image Implication Understanding Benchmark for Multimodal Large Language Models.
Proceedings of the Advances in Neural Information Processing Systems 38: Annual Conference on Neural Information Processing Systems 2024, 2024

VRDistill: Vote Refinement Distillation for Efficient Indoor 3D Object Detection.
Proceedings of the 32nd ACM International Conference on Multimedia, MM 2024, Melbourne, VIC, Australia, 28 October 2024, 2024

PTSBench: A Comprehensive Post-Training Sparsity Benchmark Towards Algorithms and Models.
Proceedings of the 32nd ACM International Conference on Multimedia, MM 2024, Melbourne, VIC, Australia, 28 October 2024, 2024

MaterialSeg3D: Segmenting Dense Materials from 2D Priors for 3D Assets.
Proceedings of the 32nd ACM International Conference on Multimedia, MM 2024, Melbourne, VIC, Australia, 28 October 2024, 2024

Compressing Large Language Models by Joint Sparsification and Quantization.
Proceedings of the Forty-first International Conference on Machine Learning, 2024

OWL: A Large Language Model for IT Operations.
Proceedings of the Twelfth International Conference on Learning Representations, 2024

GraphReader: Building Graph-based Agent to Enhance Long-Context Abilities of Large Language Models.
Proceedings of the Findings of the Association for Computational Linguistics: EMNLP 2024, 2024

Segment, Lift and Fit: Automatic 3D Shape Labeling from 2D Prompts.
Proceedings of the Computer Vision - ECCV 2024, 2024

LTA-PCS: Learnable Task-Agnostic Point Cloud Sampling.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024

m3P: Towards Multimodal Multilingual Translation with Multimodal Prompt.
Proceedings of the 2024 Joint International Conference on Computational Linguistics, 2024

NC^2D: Novel Class Discovery for Node Classification.
Proceedings of the 33rd ACM International Conference on Information and Knowledge Management, 2024

Emulated Disalignment: Safety Alignment for Large Language Models May Backfire!
Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2024

Towards Real-world Scenario: Imbalanced New Intent Discovery.
Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2024

ConceptMath: A Bilingual Concept-wise Benchmark for Measuring Mathematical Reasoning of Large Language Models.
Proceedings of the Findings of the Association for Computational Linguistics, 2024

RoleLLM: Benchmarking, Eliciting, and Enhancing Role-Playing Abilities of Large Language Models.
Proceedings of the Findings of the Association for Computational Linguistics, 2024

UniCoder: Scaling Code Large Language Model via Universal Code.
Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2024

E2-LLM: Efficient and Extreme Length Extension of Large Language Models.
Proceedings of the Findings of the Association for Computational Linguistics, 2024

MT-Bench-101: A Fine-Grained Benchmark for Evaluating Large Language Models in Multi-Turn Dialogues.
Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2024

LogFormer: A Pre-train and Tuning Pipeline for Log Anomaly Detection.
Proceedings of the Thirty-Eighth AAAI Conference on Artificial Intelligence, 2024

2023
GeometryMotion-Transformer: An End-to-End Framework for 3D Action Recognition.
IEEE Trans. Multim., 2023

RoleLLM: Benchmarking, Eliciting, and Enhancing Role-Playing Abilities of Large Language Models.
CoRR, 2023

Multilingual Entity and Relation Extraction from Unified to Language-specific Training.
CoRR, 2023

ICD-Face: Intra-class Compactness Distillation for Face Recognition.
Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023

Achieving Fine-grained Word Sense Disambiguation with Context Hypergraph and Sememe Hypergraph.
Proceedings of the 4th International Conference on Artificial Intelligence and Computer Engineering, 2023

M2C: Towards Automatic Multimodal Manga Complement.
Proceedings of the Findings of the Association for Computational Linguistics: EMNLP 2023, 2023

LogLG: Weakly Supervised Log Anomaly Detection via Log-Event Graph Construction.
Proceedings of the Database Systems for Advanced Applications, 2023

GD-MAE: Generative Decoder for MAE Pre-Training on LiDAR Point Clouds.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023

GripRank: Bridging the Gap between Retrieval and Generation via the Generative Knowledge Improved Passage Ranking.
Proceedings of the 32nd ACM International Conference on Information and Knowledge Management, 2023

Multilingual Entity and Relation Extraction from Unified to Language-specific Training.
Proceedings of the International Conference on Electronics, 2023

Adaptive Contrastive Knowledge Distillation for BERT Compression.
Proceedings of the Findings of the Association for Computational Linguistics: ACL 2023, 2023

2022
Improved Harris Hawks Optimization for Configuration of PV Intelligent Edge Terminals.
IEEE Trans. Sustain. Comput., 2022

APSNet: Toward Adaptive Point Sampling for Efficient 3D Action Recognition.
IEEE Trans. Image Process., 2022

3D-Pruning: A Model Compression Framework for Efficient 3D Action Recognition.
IEEE Trans. Circuits Syst. Video Technol., 2022

JointPruning: Pruning Networks Along Multiple Dimensions for Efficient Point Cloud Processing.
IEEE Trans. Circuits Syst. Video Technol., 2022

A Conflict-Aware Capacity Control Mechanism for Deep Cache Hierarchy.
IEICE Trans. Inf. Syst., 2022

3D-QueryIS: A Query-based Framework for 3D Instance Segmentation.
CoRR, 2022

LVP-M3: Language-aware Visual Prompt for Multilingual Multimodal Machine Translation.
CoRR, 2022

Cross-Lingual Cross-Modal Consolidation for Effective Multilingual Video Corpus Moment Retrieval.
Proceedings of the Findings of the Association for Computational Linguistics: NAACL 2022, 2022

Computer-Aided Tuberculosis Diagnosis with Attribute Reasoning Assistance.
Proceedings of the Medical Image Computing and Computer Assisted Intervention - MICCAI 2022, 2022

LVP-M3: Language-aware Visual Prompt for Multilingual Multimodal Machine Translation.
Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing, 2022

OneFace: One Threshold for All.
Proceedings of the Computer Vision - ECCV 2022, 2022

CoupleFace: Relation Matters for Face Recognition Distillation.
Proceedings of the Computer Vision - ECCV 2022, 2022

Deep 3D Vessel Segmentation based on Cross Transformer Network.
Proceedings of the IEEE International Conference on Bioinformatics and Biomedicine, 2022

AnchorFace: Boosting TAR@FAR for Practical Face Recognition.
Proceedings of the Thirty-Sixth AAAI Conference on Artificial Intelligence, 2022

2021
Block Proposal Neural Architecture Search.
IEEE Trans. Image Process., 2021

GeometryMotion-Net: A Strong Two-Stream Baseline for 3D Action Recognition.
IEEE Trans. Circuits Syst. Video Technol., 2021

Inter-class Discrepancy Alignment for Face Recognition.
CoRR, 2021

Improved adaptive gray wolf genetic algorithm for photovoltaic intelligent edge terminal optimal configuration.
Comput. Electr. Eng., 2021

DAM: Discrepancy Alignment Metric for Face Recognition.
Proceedings of the 2021 IEEE/CVF International Conference on Computer Vision, 2021

2020
A Unified End-to-End Framework for Efficient Deep Image Compression.
CoRR, 2020

A Conflict-Aware Capacity Control Mechanism for Last-Level Cache.
Proceedings of the Eighth International Symposium on Computing and Networking Workshops, 2020

Learning to Auto Weight: Entirely Data-Driven and Highly Efficient Weighting Framework.
Proceedings of the Thirty-Fourth AAAI Conference on Artificial Intelligence, 2020

2019
LAW: Learning to Auto Weight.
CoRR, 2019

Correlation Congruence for Knowledge Distillation.
Proceedings of the 2019 IEEE/CVF International Conference on Computer Vision, 2019

Knowledge Distillation via Route Constrained Optimization.
Proceedings of the 2019 IEEE/CVF International Conference on Computer Vision, 2019

Aggregate Signature Consensus Scheme Based on FPGA.
Proceedings of the Blockchain and Trustworthy Systems - First International Conference, 2019

2018
New 3D 16-Ary Signal Constellations and Their Symbol Error Probabilities in AWGN and Rayleigh Fading Channels.
Wirel. Commun. Mob. Comput., 2018

