Taishi Nakamura

According to our database¹, Taishi Nakamura authored at least 15 papers between 2024 and 2025.

Collaborative distances:

Dijkstra number² of four.
Erdős number³ of four.

Bibliography

2025

MixtureVitae: Open Web-Scale Pretraining Dataset With High Quality Instruction and Reasoning Data Built from Permissive-First Text Sources.

Aleksandra Krasnodebska

CoRR, September 2025

Open-sci-ref-0.01: open and reproducible reference baselines for language model and dataset comparison.

CoRR, September 2025

Optimal Sparsity of Mixture-of-Experts Language Models for Reasoning Tasks.

CoRR, August 2025

Rewriting Pre-Training Data Boosts LLM Performance in Math and Code.

CoRR, May 2025

Building Instruction-Tuning Datasets from Human-Written Instructions with Open-Weight Large Language Models.

CoRR, March 2025

Wider or Deeper? Scaling LLM Inference-Time Compute with Adaptive Branching Tree Search.

CoRR, March 2025

Drop-Upcycling: Training Sparse Mixture of Experts with Partial Re-initialization.

Proceedings of the Thirteenth International Conference on Learning Representations, 2025

Agent Skill Acquisition for Large Language Models via CycleQD.

Proceedings of the Thirteenth International Conference on Learning Representations, 2025

Aurora-M: Open Source Continual Pre-training for Multilingual Language and Code.

Proceedings of the 31st International Conference on Computational Linguistics, 2025

2024

Why We Build Local Large Language Models: An Observational Analysis from 35 Japanese and Multilingual LLMs.

CoRR, 2024

Balancing Speed and Stability: The Trade-offs of FP8 vs. BF16 Training in LLMs.

Kazuki Fujii, Taishi Nakamura, Rio Yokota

CoRR, 2024

LLM-jp: A Cross-organizational Project for the Research and Development of Fully Open Japanese LLMs.

CoRR, 2024

Continual Pre-Training for Cross-Lingual LLM Adaptation: Enhancing Japanese Language Capabilities.

CoRR, 2024

Building a Large Japanese Web Corpus for Large Language Models.

CoRR, 2024

Aurora-M: The First Open Source Multilingual Language Model Red-teamed according to the U.S. Executive Order.

CoRR, 2024