Jinfa Huang

ORCID: 0000-0002-0081-4106

According to our database¹, Jinfa Huang authored at least 34 papers between 1987 and 2025.

Collaborative distances:

Dijkstra number² of four.
Erdős number³ of four.

Bibliography

2025

MagicTime: Time-Lapse Video Generation Models as Metamorphic Simulators.

IEEE Trans. Pattern Anal. Mach. Intell., September, 2025

A Survey on Latent Reasoning.

CoRR, July, 2025

GPT-4V(ision) as A Social Media Analysis Engine.

ACM Trans. Intell. Syst. Technol., June, 2025

LeanPO: Lean Preference Optimization for Likelihood Alignment in Video-LLMs.

CoRR, June, 2025

Aligning, Autoencoding and Prompting Large Language Models for Novel Disease Reporting.

IEEE Trans. Pattern Anal. Mach. Intell., May, 2025

OpenS2V-Nexus: A Detailed Benchmark and Million-Scale Dataset for Subject-to-Video Generation.

CoRR, May, 2025

TACO: Enhancing Multimodal In-context Learning via Task Mapping-Guided Sequence Configuration.

CoRR, May, 2025

QuoTA: Query-oriented Token Assignment via CoT Query Decouple for Long Video Comprehension.

CoRR, March, 2025

Autoregressive Models in Vision: A Survey.

Trans. Mach. Learn. Res., 2025

A multimodal multidomain multilingual medical foundation model for zero shot clinical diagnosis.

npj Digit. Medicine, 2025

CR2PQ: Continuous Relative Rotary Positional Query for Dense Visual Representation Learning.

Proceedings of the Thirteenth International Conference on Learning Representations, 2025

Reti-Diff: Illumination Degradation Image Restoration with Retinex-based Latent Diffusion Model.

Proceedings of the Thirteenth International Conference on Learning Representations, 2025

Identity-Preserving Text-to-Video Generation by Frequency Decomposition.

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2025

Evolver: Chain-of-Evolution Prompting to Boost Large Multimodal Models for Hateful Meme Detection.

Proceedings of the 31st International Conference on Computational Linguistics, 2025

MUSE: Mamba Is Efficient Multi-scale Learner for Text-video Retrieval.

Proceedings of the AAAI-25, Sponsored by the Association for the Advancement of Artificial Intelligence, February 25, 2025

2024

Identity-Preserving Text-to-Video Generation by Frequency Decomposition.

CoRR, 2024

Video-RAG: Visually-aligned Retrieval-Augmented Long Video Comprehension.

CoRR, 2024

A Survey of Camouflaged Object Detection and Beyond.

CoRR, 2024

MUSE: Mamba is Efficient Multi-scale Learner for Text-video Retrieval.

CoRR, 2024

LLMBind: A Unified Modality-Task Integration Framework.

CoRR, 2024

ChronoMagic-Bench: A Benchmark for Metamorphic Evaluation of Text-to-Time-lapse Video Generation.

Proceedings of the Advances in Neural Information Processing Systems 38: Annual Conference on Neural Information Processing Systems 2024, 2024

Continuous-Multiple Image Outpainting in One-Step via Positional Query and A Diffusion-based Approach.

Proceedings of the Twelfth International Conference on Learning Representations, 2024

LOOK-M: Look-Once Optimization in KV Cache for Efficient Multimodal Long-Context Inference.

Proceedings of the Findings of the Association for Computational Linguistics: EMNLP 2024, 2024

RAP: Efficient Text-Video Retrieval with Sparse-and-Correlated Adapter.

Proceedings of the Findings of the Association for Computational Linguistics, 2024

2023

Weakly-Supervised 3D Spatial Reasoning for Text-Based Visual Question Answering.

IEEE Trans. Image Process., 2023

Improving Scene Graph Generation with Superpixel-Based Interaction Learning.

Proceedings of the 31st ACM International Conference on Multimedia, 2023

Text-Video Retrieval with Disentangled Conceptualization and Set-to-Set Alignment.

Proceedings of the Thirty-Second International Joint Conference on Artificial Intelligence, 2023

Cross-Modality Time-Variant Relation Learning for Generating Dynamic Scene Graphs.

Proceedings of the IEEE International Conference on Robotics and Automation, 2023

Video-Text as Game Players: Hierarchical Banzhaf Interaction for Cross-Modal Representation Learning.

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023

2022

Toward 3D Spatial Reasoning for Human-like Text-based Visual Question Answering.

CoRR, 2022

Expectation-Maximization Contrastive Learning for Compact Video-and-Language Representations.

Proceedings of the Advances in Neural Information Processing Systems 35: Annual Conference on Neural Information Processing Systems 2022, 2022

2020

Guoym at SemEval-2020 Task 8: Ensemble-based Classification of Visuo-Lingual Metaphor in Memes.

Proceedings of the Fourteenth Workshop on Semantic Evaluation, 2020

LDNN: Linguistic Knowledge Injectable Deep Neural Network for Group Cohesiveness Understanding.

Proceedings of the ICMI '20: International Conference on Multimodal Interaction, 2020

1987

A Chinese Mandarin speech output system.

Yuhang Mao, Jinfa Huang, Guozhen Zhang

Proceedings of the European Conference on Speech Technology, 1987
