From Principles to Applications: A Comprehensive Survey of Discrete Tokenizers in Generation, Comprehension, Recommendation, and Information Retrieval.

Jian Jia

Jingtong Gao

CoRR, February, 2025

Orthus: Autoregressive Interleaved Image-Text Generation with Modality-Specific Heads.

Proceedings of the Forty-second International Conference on Machine Learning, 2025

MOTION: Multi-object Video Editing with Training-Free Attention Guidance.

Proceedings of the Advanced Intelligent Computing Technology and Applications, 2025

Music Grounding by Short Video.

Proceedings of the IEEE/CVF International Conference on Computer Vision, 2025

SweetTok: Semantic-Aware Spatial-Temporal Tokenizer for Compact Video Discretization.

Proceedings of the IEEE/CVF International Conference on Computer Vision, 2025

Improving Preference Alignment of LLM with Inference-Free Self-Refinement.

Findings of the Association for Computational Linguistics: EMNLP 2025, 2025

D&M: Enriching E-commerce Videos with Sound Effects by Key Moment Detection and SFX Matching.

Proceedings of the Thirty-Ninth AAAI Conference on Artificial Intelligence, 2025

LEARN: Knowledge Adaptation from Large Language Model to Recommendation for Practical Industrial Application.

Proceedings of the Thirty-Ninth AAAI Conference on Artificial Intelligence, 2025

2024

SweetTokenizer: Semantic-Aware Spatial-Temporal Tokenizer for Compact Visual Discretization.

CoRR, 2024

Text-Video Multi-Grained Integration for Video Moment Montage.

CoRR, 2024

Orthus: Autoregressive Interleaved Image-Text Generation with Modality-Specific Heads.

CoRR, 2024

Enhancing Instruction-Following Capability of Visual-Language Models by Reducing Image Redundancy.

CoRR, 2024

Video to Music Moment Retrieval.

CoRR, 2024

Training-free Subject-Enhanced Attention Guidance for Compositional Text-to-image Generation.

CoRR, 2024

Knowledge Adaptation from Large Language Model to Recommendation for Practical Industrial Application.

CoRR, 2024

Knowledge Condensation and Reasoning for Knowledge-based VQA.

CoRR, 2024

Spatiotemporal Fine-grained Video Description for Short Videos.

Proceedings of the 32nd ACM International Conference on Multimedia, MM 2024, Melbourne, VIC, Australia, 28 October 2024, 2024

Spatiotemporal Graph Guided Multi-modal Network for Livestreaming Product Retrieval.

Proceedings of the 32nd ACM International Conference on Multimedia, MM 2024, Melbourne, VIC, Australia, 28 October 2024, 2024

Towards Efficient and Effective Text-to-Video Retrieval with Coarse-to-Fine Visual Representation Learning.

Proceedings of the Thirty-Eighth AAAI Conference on Artificial Intelligence, 2024

2023

Cross-view Semantic Alignment for Livestreaming Product Recognition.

Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023

Cross-Domain Product Representation Learning for Rich-Content E-Commerce.

Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023

2021

Boosting Image Outpainting with Semantic Layout Prediction.

CoRR, 2021

2020

Progressive Feature Polishing Network for Salient Object Detection.

Proceedings of the Thirty-Fourth AAAI Conference on Artificial Intelligence, 2020

2019

Progressive Feature Polishing Network for Salient Object Detection.

CoRR, 2019

2018

Semantic Human Matting.

Proceedings of the 2018 ACM Multimedia Conference on Multimedia Conference, 2018

Quan Chen
