Yu Bao (鲍宇)


Research Scientist @ ByteDance Seed: Multimodal, LLM, AI for Science, and NLP research

About Me

I am currently a Research Scientist at ByteDance Seed, specializing in Large Language Models (LLMs), AI for Science (AI4S), and Natural Language Processing (NLP).

Before that, I earned my Ph.D. in March 2022 from the Natural Language Processing Group of Nanjing University, co-supervised by Prof. Shujian Huang and Prof. Jiajun Chen. During my doctoral studies, I interned at ByteDance AI Lab, mentored by Prof. Hao Zhou and Prof. Lei Li, where I researched deep generative modeling (e.g., non-autoregressive text generation and latent variable modeling).

🔴 We are hiring on a long-term basis! Join us to build cutting-edge AI systems. Open positions include: (1) the Top Seed Program (rolling recruitment of Ph.D. candidates and recent graduates; internship and full-time roles available), reach out via baoyu.001@bytedance.com; or (2) direct applications via our job listings. Feel free to ask about role details anytime.

News

  • [2025.08] 📢 One paper was accepted to EMNLP 2025; our latest work on LLM preference optimization (DuPO) is now available as a preprint.
  • [2025.07] 🚀 Two key projects released: (1) Seed-X, a 7B-parameter multilingual translation LLM with open-sourced models and demos, available on Hugging Face (see also the Demo); (2) Seed LiveInterpret 2.0, an end-to-end simultaneous speech-to-speech translation system with 3-second latency (a 70% reduction over prior solutions) and voice cloning; see the Technical Report and Demo.

Awards

  • 2022, Excellent Doctoral Dissertation Award, Jiangsu Association of Artificial Intelligence (JSAI)
  • 2020, Outstanding Ph.D. Candidate, Nanjing University
  • 2019, Outstanding Graduate Student, Nanjing University

Technical Reports

  1. ByteDance Seed Team, Seed LiveInterpret 2.0: End-to-end Simultaneous Speech-to-speech Translation with Your Voice, [Homepage, Demo], 2025
  2. ByteDance Seed Team, Seed-X: Building Strong Multilingual Translation LLM with 7B Parameters, [HF, Demo], 2025

Publications/Preprints

[Full list] [*: equal contribution] [interns/students I mentored]

  1. Shuaijie She, Yu Bao, Yu Lu, Lu Xu, Tao Li, Wenhao Zhu, Shujian Huang, Shanbo Cheng, Lu Lu, Yuxuan Wang, DuPO: Enabling Reliable LLM Self-Verification via Dual Preference Optimization, Preprint 2025
  2. Shimao Zhang, Yu Bao, Shujian Huang, EDT: Improving Large Language Models by Entropy-based Dynamic Temperature Sampling, Preprint 2024
  3. Xiangxin Zhou*, Xiwei Cheng*, Yuwei Yang, Yu Bao, Liang Wang, Quanquan Gu, DecompOpt: Controllable and Decomposed Diffusion Models for Structure-based Molecular Optimization, ICLR 2024
  4. Jiaqi Guan*, Xiangxin Zhou*, Yuwei Yang, Yu Bao, Jian Peng, Jianzhu Ma, Qiang Liu, Liang Wang, Quanquan Gu, DecompDiff: Diffusion Models with Decomposed Priors for Structure-Based Drug Design, ICML 2023
  5. Min Liu, Yu Bao, Chengqi Zhao, Shujian Huang, Selective Knowledge Distillation for Non-Autoregressive Neural Machine Translation, AAAI 2023
  6. Yu Bao, Hao Zhou, Shujian Huang, Dongqi Wang, Lihua Qian, Xinyu Dai, Jiajun Chen, Lei Li, latent-GLAT: Glancing at Latent Variables for Parallel Text Generation, ACL 2022
  7. Yu Bao, Shujian Huang, Tong Xiao, Dongqi Wang, Xinyu Dai, Jiajun Chen, Non-Autoregressive Translation by Learning Target Categorical Codes, NAACL-HLT 2021
  8. Jiahuan Li*, Yu Bao*, Shujian Huang, Xinyu Dai, Jiajun Chen, Explicit Semantic Decomposition for Definition Generation, ACL 2020
  9. Yu Bao, Hao Zhou, Jiangtao Feng, Mingxuan Wang, Shujian Huang, Jiajun Chen, Lei Li, PNAT: Non-Autoregressive Transformer by Position Learning, Preprint 2019
  10. Yu Bao*, Hao Zhou*, Shujian Huang, Lei Li, Lili Mou, Olga Vechtomova, Xinyu Dai, Jiajun Chen, Generating Sentences from Disentangled Syntactic and Semantic Spaces, ACL 2019