Introduction


Brian (Bo) Li is a final-year Ph.D. student in Computer Science at Nanyang Technological University, advised by Prof. Ziwei Liu. His research focuses on multimodal models and on building artificial intelligence systems.

With Prof. Ziwei Liu, he co-founded LMMs-Lab, a non-profit open-source community advancing multimodal AI through fully open models, data, and tools. Since 2024, they have made significant contributions to the field, including LLaVA-OneVision (performance matching commercial models, fully open), OneVision-Encoder (a codec-style vision encoder), LMMs-Eval (unified multimodal evaluation infrastructure), LMMs-Engine (unified multimodal model training infrastructure), and Multimodal-SAE (safety and interpretability research).

Beyond research, he occasionally writes science fiction exploring AI consciousness and the nature of understanding.

- [Selected publications](#selected-publications)
- [Professional experience](#professional-experience)
- [Professional services](#professional-services)
- [Talks and lectures](#talks-and-lectures)
- [Administrative roles](#administrative-roles)
- [Peer review](#peer-review)

Selected publications

  1. LMMs Engine for Unified Multimodal Training Open-source Project A simple, unified multimodal model training engine: lean, flexible, and built for hacking at scale. Led codebase design; core maintainer
  2. LLaVA-OneVision 1.5: Democratized Multimodal Training Open-source Project Fully open-source code, data, checkpoints, and training logs; provided a stronger open-source ViT; showed that simply scaling dense captions improves overall multimodal task performance Xiang An, Yin Xie, Kaicheng Yang, Changrui Chen, Huajie Tan, Chunyuan Li, Zizhen Yan, Ziyong Feng, Ziwei Liu, Bo Li*, Jiankang Deng
  3. Aero-1-Audio Technical Blog Open models for a wide range of audio tasks, trained on only 50K hours of data yet achieving excellent performance, suggesting smart data > massive training; led development Bo Li*, Chen Change Loy, Fanyi Pu, Jingkang Yang, Kaichen Zhang*, Kairui Hu, Luu Minh Thang*, Nguyen Quang Trung*, Pham Ba Cong*, Shuai Liu, Yezhen Wang*, Ziwei Liu
  4. LLaVA-OneVision: Easy Visual Task Transfer TMLR 2025 SOTA-level fully open models (models/data/code) achieving GPT-4o-level performance across 30+ image and video tasks. Led codebase, data curation, and evaluation Bo Li*, Yuanhan Zhang, Dong Guo, Renrui Zhang, Feng Li, Hao Zhang, Kaichen Zhang, Yanwei Li, Ziwei Liu, Chunyuan Li
  5. LMMs-Eval: Reality Check on the Evaluation of Large Multimodal Models NAACL 2025 Open-source evaluation framework spanning text, image, video, and audio tasks, with 2.4K GitHub stars; contributed the core framework and major code Kaichen Zhang*, Bo Li*, Peiyuan Zhang*, Fanyi Pu*, Joshua Adrian Cahyono*, Kairui Hu*, Shuai Liu*, Yuanhan Zhang*, Jingkang Yang*, Chunyuan Li*, Ziwei Liu*
  6. LLaVA-NeXT: Improved reasoning, OCR, and world knowledge Technical Blog Code First open models achieving GPT-4V-level performance, trained for 24 hours on 32 A100 GPUs. Proposed the idea that massive evaluation leads to better models Haotian Liu, Chunyuan Li, Yuheng Li, Bo Li, Yuanhan Zhang, Sheng Shen, Yong Jae Lee
  7. MIMIC-IT: Multi-modal In-Context Instruction Tuning TPAMI 2025 Code Early (2023-10) experiment on a vision-language-agent (VLA) model with RLHF; proposed the idea and drafted the training code Bo Li*, Yuanhan Zhang*, Liangyu Chen, Jinghao Wang, Fanyi Pu, Jingkang Yang, Chunyuan Li, Ziwei Liu
  8. Benchmarking and Analyzing Generative Data for Visual Recognition TPAMI 2025 Code Early (2022-12) experiment using synthetic data for visual recognition Bo Li, Haotian Liu, Liangyu Chen, Yong Jae Lee, Chunyuan Li, Ziwei Liu
  9. Coordinating Multiple Vision-Language Models for Visual Reasoning NeurIPS 2023 Liangyu Chen*, Bo Li*, Sheng Shen, Jingkang Yang, Chunyuan Li, Kurt Keutzer, Trevor Darrell, Ziwei Liu
  10. Sparse Mixture-of-Experts are Domain Generalizable Learners ICLR 2023 (Oral) Code Early (2022-05) theoretical analysis of the mixture-of-experts architecture from a generalization perspective Bo Li*, Yifei Shen*, Jingkang Yang, Yezhen Wang, Jiawei Ren, Tong Che, Jun Zhang, Ziwei Liu
  11. Invariant information bottleneck for domain generalization AAAI 2022 Code Bo Li, Yifei Shen, Yezhen Wang, Wenzhen Zhu, Dongsheng Li, Kurt Keutzer, Han Zhao
  12. Energy-Based Open-World Uncertainty Modeling for Confidence Calibration ICCV 2021 Code Yezhen Wang, Bo Li, Tong Che, Kaiyang Zhou, Ziwei Liu, Dongsheng Li
  13. Learning invariant representations and risks for semi-supervised domain adaptation CVPR 2021 Code Bo Li, Yezhen Wang, Shanghang Zhang, Dongsheng Li, Kurt Keutzer, Trevor Darrell, Han Zhao
  14. MADAN: multi-source adversarial domain aggregation network for domain adaptation IJCV 2021 Sicheng Zhao, Bo Li, Pengfei Xu, Xiangyu Yue, Guiguang Ding, Kurt Keutzer
  15. Rethinking distributional matching based domain adaptation arXiv preprint arXiv:2006.13352 Bo Li, Yezhen Wang, Tong Che, Shanghang Zhang, Yoshua Bengio, Kurt Keutzer
  16. Multi-source domain adaptation for semantic segmentation NeurIPS 2019 Code Sicheng Zhao*, Bo Li*, Xiangyu Yue*, Yang Gu, Pengfei Xu, Runbo Hu, Hua Chai, Kurt Keutzer

Professional experience

Professional services

Talks and lectures

Administrative roles

Peer review

Conferences: ICCV (2021, 2023), NeurIPS (2022), BMVC (2023), AAAI (2023), CVPR (2022, 2023), AISTATS (2023), ICML (2023)

Journals: Pattern Recognition (PR), IEEE Transactions on Multimedia (TMM), IEEE TPAMI, IJCV


This page is styled after Wikipedia.