Han Xiao

Orcid: 0000-0003-0259-5688

Affiliations:
  • TU München, Germany (Ph.D.)


According to our database, Han Xiao authored at least 34 papers between 2009 and 2026.

Bibliography

2026
jina-embeddings-v5-omni: Geometry-preserving Embeddings via Locked Aligned Towers.
CoRR, May, 2026

mlx-vis: GPU-Accelerated Dimensionality Reduction and Visualization on Apple Silicon.
CoRR, March, 2026

jina-embeddings-v5-text: Task-Targeted Embedding Distillation.
CoRR, February, 2026

Embedding Inversion via Conditional Masked Diffusion Language Models.
CoRR, February, 2026

Embedding Compression via Spherical Coordinates.
CoRR, February, 2026

2025
Jina-VLM: Small Multilingual Vision Language Model.
CoRR, December, 2025

jina-reranker-v3: Last but Not Late Interaction for Document Reranking.
CoRR, September, 2025

Efficient Code Embeddings from Code Generation Models.
CoRR, August, 2025

jina-embeddings-v4: Universal Embeddings for Multimodal Multilingual Retrieval.
CoRR, June, 2025

ReaderLM-v2: Small Language Model for HTML to Markdown and JSON.
CoRR, March, 2025

Jina Embeddings V3: Multilingual Text Encoder with Low-Rank Adaptations.
Proceedings of the Advances in Information Retrieval, 2025

AIR-Bench: Automated Heterogeneous Information Retrieval Benchmark.
Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2025

2024
jina-clip-v2: Multilingual Multimodal Embeddings for Text and Images.
CoRR, 2024

jina-embeddings-v3: Multilingual Embeddings With Task LoRA.
CoRR, 2024

Late Chunking: Contextual Chunk Embeddings Using Long-Context Embedding Models.
CoRR, 2024

Jina-ColBERT-v2: A General-Purpose Multilingual Late Interaction Retriever.
CoRR, 2024

Jina CLIP: Your CLIP Model Is Also Your Text Retriever.
CoRR, 2024

Multi-Task Contrastive Learning for 8192-Token Bilingual Text Embeddings.
CoRR, 2024

2023
Jina Embeddings 2: 8192-Token General-Purpose Text Embeddings for Long Documents.
CoRR, 2023

Jina Embeddings: A Novel Set of High-Performance Sentence Embedding Models.
CoRR, 2023

2017
A Classification Model for Diverse and Noisy Labelers.
Proceedings of the Advances in Knowledge Discovery and Data Mining, 2017

2015
From Adversarial Learning to Reliable and Scalable Learning.
PhD thesis, 2015

Feature Selection and Extraction for Malware Classification.
J. Inf. Sci. Eng., 2015

Support vector machines under adversarial label contamination.
Neurocomputing, 2015

Learning better while sending less: Communication-efficient online semi-supervised learning in client-server settings.
Proceedings of the 2015 IEEE International Conference on Data Science and Advanced Analytics, 2015

2013
Learning from Multiple Observers with Unknown Expertise.
Proceedings of the Advances in Knowledge Discovery and Data Mining, 2013

Efficient Online Sequence Prediction with Side Information.
Proceedings of the 2013 IEEE 13th International Conference on Data Mining, 2013

OPARS: Objective Photo Aesthetics Ranking System.
Proceedings of the Advances in Information Retrieval, 2013

Lazy Gaussian Process Committee for Real-Time Online Regression.
Proceedings of the Twenty-Seventh AAAI Conference on Artificial Intelligence, 2013

2012
Evasion Attack of Multi-class Linear Classifiers.
Proceedings of the Advances in Knowledge Discovery and Data Mining, 2012

Adversarial Label Flips Attack on Support Vector Machines.
Proceedings of the ECAI 2012, 2012

2010
Efficient Collapsed Gibbs Sampling for Latent Dirichlet Allocation.
Proceedings of the 2nd Asian Conference on Machine Learning, 2010

2009
Constructing Parallel Corpus from Movie Subtitles.
Proceedings of the Computer Processing of Oriental Languages. Language Technology for the Knowledge-based Economy, 2009

Injecting Structured Data to Generative Topic Model in Enterprise Settings.
Proceedings of the Advances in Machine Learning, 2009
