Introduction
Brian (Bo) Li is a final-year Ph.D. student in Computer Science at Nanyang Technological University, advised by Prof. Ziwei Liu. His research focuses on multimodal models and building artificial intelligence systems.
Together with Prof. Ziwei Liu, he co-founded LMMs-Lab, a non-profit open-source community advancing multimodal AI through fully open models, data, and tools. Since 2024, they have made significant contributions to the field, including LLaVA-OneVision (fully open models with performance matching commercial systems), OneVision-Encoder (a codec-style vision encoder), LMMs-Eval (unified multimodal evaluation infrastructure), LMMs-Engine (unified multimodal training infrastructure), and Multimodal-SAE (safety and interpretability research).
Beyond research, he occasionally writes science fiction exploring AI consciousness and the nature of understanding:
- 压缩 ("Compression", 2025)
Selected publications
- LMMs-Engine for Unified Multimodal Training. Open-source project. A simple, unified training engine for multimodal models: lean, flexible, and built for hacking at scale. Led codebase design; core maintainer.
- LLaVA-OneVision 1.5: Democratized Multimodal Training. Open-source project. Fully open-source code, data, checkpoints, and training logs; provided an improved open-source ViT; demonstrated that simply scaling dense captions improves performance across multimodal tasks.
- Aero-1-Audio. Technical blog. Open models for a wide range of audio tasks, trained on only 50K hours of data yet achieving excellent performance, suggesting that smart data beats massive training. Led development.
- LLaVA-OneVision: Easy Visual Task Transfer. TMLR 2025. SOTA-level fully open models (models, data, and code) achieving GPT-4o-level performance across 30+ image and video tasks. Led the codebase, data curation, and evaluation.
- LMMs-Eval: Reality Check on the Evaluation of Large Multimodal Models. NAACL 2025. Open-source evaluation framework spanning text, image, video, and audio tasks, with 2.4K GitHub stars. Contributed the core framework and major code.
- LLaVA-NeXT: Improved Reasoning, OCR, and World Knowledge. Technical blog; code available. First open models achieving GPT-4V-level performance, trained in 24 hours on 32 A100 GPUs. Proposed the idea that massive evaluation leads to better models.
- MIMIC-IT: Multi-Modal In-Context Instruction Tuning. TPAMI 2025; code available. Early (Oct. 2023) experiment on a vision-language-action (VLA) model with RLHF; proposed the idea and drafted the training code.
- Benchmarking and Analyzing Generative Data for Visual Recognition. TPAMI 2025; code available. Early (Dec. 2022) experiment using synthetic data for visual recognition.
- Sparse Mixture-of-Experts are Domain Generalizable Learners. ICLR 2023 (Oral); code available. Among the first (May 2022) theoretical analyses of the mixture-of-experts architecture from a generalization perspective.
- Rethinking Distributional Matching Based Domain Adaptation. arXiv preprint arXiv:2006.13352.
Professional experience
- Aug. 2025 – Present: ByteDance Seed, Singapore. With Haoqi Fan, working on unified multimodal models.
- Oct. 2024 – Aug. 2025: TikTok AI Innovation Center, Singapore. With Dr. Wei Li and Dr. Zejun Ma.
- Dec. 2023 – Aug. 2024: ByteDance Seed, Singapore. With Dr. Chunyuan Li, building open-source multimodal models.
- Dec. 2022 – Aug. 2023: Microsoft Research, Redmond. With Dr. Chunyuan Li; collaborated with Haotian Liu on the LLaVA project.
- Sep. 2020 – Dec. 2021: Microsoft Research, Shanghai. With Dr. Dongsheng Li.
- Oct. 2019 – Aug. 2020: Berkeley AI Research, CA, USA. With Prof. Kurt Keutzer, Prof. Sicheng Zhao, Prof. Xiangyu Yue, Prof. Shanghang Zhang, and Dr. Colorado Reed.
- May 2018 – Oct. 2019: DiDi Visual Perception Team, Beijing.
Professional services
Talks and lectures
- Multimodal Models @ Jump Trading (2025), hosted by Weifeng Liu
- Guest Lecture: Multimodal Models @ UMich EECS 542, hosted by Stella X. Yu
- Multimodal Models @ TwelveLabs (2024), hosted by James Le
- Otter & MIMIC-IT @ Alibaba DAMO Academy (2023), hosted by Dr. Lidong Bing
Administrative roles
- Cluster Administrator, S-Lab @ NTU (serving 70+ users and 400+ GPUs)
- Organizer, The AI Talk
Peer review
Conferences: ICCV (2021, 2023), NeurIPS (2022), BMVC (2023), AAAI (2023), CVPR (2022, 2023), AISTATS (2023), ICML (2023)
Journals: Pattern Recognition (PR), IEEE Transactions on Multimedia (TMM), IEEE TPAMI, IJCV
This page is styled after Wikipedia.