Lindong Lu

According to our database, Lindong Lu authored at least 4 papers between 2024 and 2025.

Collaborative distances:
  • Dijkstra number of four.
  • Erdős number of four.

Bibliography

2025
Dripper: Token-Efficient Main HTML Extraction with a Lightweight LM.
CoRR, November, 2025

AICC: Parse HTML Finer, Make Models Better - A 7.3T AI-Ready Corpus Built by a Model-Based HTML Parser.
CoRR, November, 2025

MinerU2.5: A Decoupled Vision-Language Model for Efficient High-Resolution Document Parsing.
CoRR, September, 2025

2024
WanJuan-CC: A Safe and High-Quality Open-sourced English Webtext Dataset.
CoRR, 2024
