Yixiao Ge
I am currently a principal researcher at Tencent ARC Lab, leading an effort on multimodal foundation models, open-world visual comprehension, and efficient AI.
I received my Ph.D. from the Multimedia Lab (MMLab) at the Chinese University of Hong Kong,
and my B.Eng. from Huazhong University of Science and Technology.
I am actively looking for self-motivated interns to work on related research topics. Feel free to reach out if you are interested.
News:
- [Sep 2024] One paper is accepted to NeurIPS 2024 as a spotlight presentation.
- [July 2024] Two papers are accepted to ECCV 2024, and one paper is accepted to TMLR.
- [July 2024] Excited to release two open-source projects, MLLM-NPU and Open-MAGVIT2.
- [May 2024] One paper is accepted to the main conference of ACL 2024.
- [Apr 2024] Excited to release SEED-X, the latest version of the SEED series.
- [Feb 2024] Nine papers are accepted to CVPR 2024.
- [Feb 2024] Excited to release YOLO-World, a real-time open-vocabulary object detector.
- [Jan 2024] One paper is accepted to ICLR 2024.
- [Jan 2024] Excited to release LLaMA Pro, the SOTA model among the LLaMA family.
- [Dec 2023] One paper is accepted to AAAI 2024.
- [Nov 2023] Glad to launch SEED-Bench-2 and ViT-Lens-2!
- [Oct 2023] Excited to unveil SEED-LLaMA (SEED-2), featuring in-context emergent capabilities.
- [Sep 2023] Three papers are accepted to NeurIPS 2023.
- [Aug 2023] Glad to release ViT-Lens, advancing omni-modal representation learning.
- [Aug 2023] Glad to release SEED-Bench, the most comprehensive MLLM benchmark to date.
- [July 2023] Glad to release SEED, an image tokenizer tailored for LLMs.
- [Jan-July 2023] 11 papers were accepted by ICLR/CVPR/ICML/KDD/ICCV 2023.
- [Jan-Nov 2022] 11 papers were accepted by ICLR/CVPR/IJCAI/ECCV 2022 and AAAI 2023, 2 of which were oral.
- [Mar-Jul 2021] 5 papers were accepted by CVPR/ICCV 2021.
- [Jan-Sep 2020] 3 papers were accepted by ICLR/ECCV/NeurIPS 2020, 1 of which was spotlight.
Open-source Projects
- Chen Li, Tianheng Cheng, Yuying Ge, Teng Wang, Yixiao Ge
- Zhuoyan Luo, Fengyuan Shi, Yixiao Ge
Publications
Selected Preprints:
- SEED-X: Multimodal Models with Unified Multi-granularity Comprehension and Generation
  The latest version of the SEED series, towards multimodal models in the real world.
  Yuying Ge*, Sijie Zhao*, Jinguo Zhu*, Yixiao Ge#, Kun Yi, Lin Song, Chen Li, Xiaohan Ding, Ying Shan
  [Tech Report] [Code]
2024:
- GrootVL: Tree Topology is All You Need in State Space Model
  Yicheng Xiao, Lin Song, Shaoli Huang, Jiangshan Wang, Siyu Song, Yixiao Ge, Xiu Li, Ying Shan
- Vision-language instruction tuning: A review and analysis
  Chen Li, Yixiao Ge, Dian Li, Ying Shan
- ST-LLM: Large Language Models Are Effective Temporal Learners
  Ruyang Liu, Chen Li, Haoran Tang, Yixiao Ge, Ying Shan, Ge Li
- DreamDiffusion: Generating High-Quality Images from Brain EEG Signals
  Yunpeng Bai, Xintao Wang, Yan-pei Cao, Yixiao Ge, Chun Yuan, Ying Shan
- LLaMA Pro: Progressive LLaMA with Block Expansion
  SOTA foundation models among the LLaMA family, excelling in general tasks, code, and math.
  Chengyue Wu, Yukang Gan, Yixiao Ge#, Zeyu Lu, Jiahao Wang, Ye Feng, Ping Luo, Ying Shan
- YOLO-World: Real-Time Open-Vocabulary Object Detection
  A real-time open-vocabulary object detector with SOTA performance.
  Tianheng Cheng*, Lin Song*#, Yixiao Ge#, Wenyu Liu, Xinggang Wang#, Ying Shan
- ViT-Lens: Towards Omni-modal Representations
  Advancing omni-modal representation learning with modality lens. Support 3D point cloud, depth, audio, tactile, EEG. Enable any-modality to text and image generation.
  Weixian Lei, Yixiao Ge#, Kun Yi, Jianfeng Zhang, Difei Gao, Dylan Sun, Yuying Ge, Ying Shan, Mike Zheng Shou#
- SEED-Bench: Benchmarking Multimodal Large Language Models
  Comprises 24K multiple-choice questions with accurate human annotations, spanning 27 dimensions, including the evaluation of both text and image generation.
  Bohao Li*, Yuying Ge*, Yixiao Ge#, Guangzhi Wang, Rui Wang, Ruimao Zhang#, Ying Shan
- UniRepLKNet: A Universal Perception Large-Kernel ConvNet for Audio, Video, Point Cloud, Time-Series and Image Recognition
  Xiaohan Ding, Yiyuan Zhang, Yixiao Ge, Sijie Zhao, Lin Song, Xiangyu Yue, Ying Shan
- BT-Adapter: Video Conversation is Feasible Without Video Instruction Tuning
  Ruyang Liu, Chen Li, Yixiao Ge, Ying Shan, Thomas H. Li, Ge Li
- SmartEdit: Exploring Complex Instruction-based Image Editing with Large Language Models
  Yuzhou Huang, Liangbin Xie, Xintao Wang, Ziyang Yuan, Xiaodong Cun, Yixiao Ge, Jiantao Zhou, Chao Dong, Rui Huang, Ruimao Zhang, Ying Shan
- Rethinking the Objectives of Vector-Quantized Tokenizers for Image Synthesis
  Yuchao Gu, Xintao Wang, Yixiao Ge, Ying Shan, Mike Zheng Shou
  CVPR, 2024 [Paper]
- Multimodal Pathway: Improve Transformers with Irrelevant Data from Other Modalities
  Yiyuan Zhang, Xiaohan Ding, Kaixiong Gong, Yixiao Ge, Ying Shan, Xiangyu Yue
- LoRA-Sparse: Low-Rank Approximation for Sparse Large Language Models
  Lin Song, Yukang Chen, Shuai Yang, Xiaohan Ding, Yixiao Ge, Ying-Cong Chen, Ying Shan
  CVPR, 2024 [Paper (Coming soon)]
- Making LLaMA SEE and Draw with SEED Tokenizer
  Offers unified multimodal comprehension and generation, featuring multi-turn in-context emergent capabilities, akin to an AI aide.
  Yuying Ge*, Sijie Zhao*, Ziyun Zeng, Yixiao Ge#, Chen Li, Xintao Wang, Ying Shan
- Cached Transformers: Improving Transformers with Differentiable Memory Cache
  Zhaoyang Zhang, Wenqi Shao, Yixiao Ge, Xiaogang Wang, Jinwei Gu, Ping Luo
  AAAI, 2024 [Paper]
2023:
- GPT4Tools: Teaching Large Language Model to Use Tools via Self-instruction
  Rui Yang, Lin Song, Yanwei Li, Sijie Zhao, Yixiao Ge, Xiu Li, Ying Shan
- Mix-of-Show: Decentralized Low-Rank Adaptation for Multi-Concept Customization of Diffusion Models
  Yuchao Gu, Xintao Wang, Jay Zhangjie Wu, Yujun Shi, Yunpeng Chen, Zihan Fan, Wuyou Xiao, Rui Zhao, Shuning Chang, Weijia Wu, Yixiao Ge, Ying Shan, Mike Zheng Shou
- Meta-Adapter: An Online Few-shot Learner for Vision-Language Model
  Cheng Cheng, Lin Song, Ruoyi Xue, Hang Wang, Hongbin Sun, Yixiao Ge, Ying Shan
- Tune-A-Video: One-Shot Tuning of Image Diffusion Models for Text-to-Video Generation
  Jay Zhangjie Wu, Yixiao Ge, Xintao Wang, Weixian Lei, Yuchao Gu, Yufei Shi, Wynne Hsu, Ying Shan, Xiaohu Qie, Mike Zheng Shou
- Exploring Model Transferability through the Lens of Potential Energy
  Xiaotong Li, Zixuan Hu, Yixiao Ge, Ying Shan, Lingyu Duan
- BoxSnake: Polygonal Instance Segmentation with Box Supervision
  Rui Yang, Lin Song, Yixiao Ge, Xiu Li
- Unleashing Vanilla Vision Transformer with Masked Image Modeling for Object Detection
  Yuxin Fang*, Shusheng Yang*, Shijie Wang*, Yixiao Ge, Ying Shan, Xinggang Wang
- Binary Embedding-based Retrieval at Tencent
  Yukang Gan*, Yixiao Ge*, Chang Zhou*, Shupeng Su, Zhouchuan Xu, Xuyuan Xu, Quanchao Hui, Xiang Chen, Yexin Wang, Ying Shan
- π-Tuning: Transferring Multimodal Foundation Models with Optimal Multi-task Interpolation
  Chengyue Wu, Teng Wang, Yixiao Ge#, Zeyu Lu, Ruisong Zhou, Ying Shan, Ping Luo
- Accelerating Vision-Language Pretraining with Free Language Modeling
  Teng Wang, Yixiao Ge, Feng Zheng, Ran Cheng, Ying Shan, Xiaohu Qie, Ping Luo
- Masked Visual Reconstruction in Language Semantic Space
  Shusheng Yang, Yixiao Ge#, Kun Yi, Dian Li, Ying Shan, Xiaohu Qie, Xinggang Wang#
- Learning Transferable Spatiotemporal Representations from Natural Script Knowledge
  Ziyun Zeng*, Yuying Ge*, Xihui Liu, Bin Chen#, Ping Luo, Shu-Tao Xia, Yixiao Ge#
- All in One: Exploring Unified Video-Language Pre-training
  Alex Jinpeng Wang, Yixiao Ge, Rui Yan, Yuying Ge, Xudong Lin, Guanyu Cai, Jianping Wu, Ying Shan, Xiaohu Qie, Mike Zheng Shou
- Masked Image Modeling with Denoising Contrast
  Kun Yi*, Yixiao Ge*#, Xiaotong Li, Shusheng Yang, Dian Li, Jianping Wu, Ying Shan, Xiaohu Qie
- Darwinian Model Upgrades: Model Evolving with Selective Compatibility
  Binjie Zhang*, Shupeng Su*, Yixiao Ge#, Xuyuan Xu, Yexin Wang, Chun Yuan, Mike Zheng Shou, Ying Shan
  AAAI, 2023 [Paper]
- Video-Text Pre-training with Learned Regions
  Rui Yan, Mike Zheng Shou, Yixiao Ge, Alex Jinpeng Wang, Xudong Lin, Guanyu Cai, Jinhui Tang
2022:
- MILES: Visual BERT Pre-training with Injected Language Semantics for Video-text Retrieval
  Yuying Ge, Yixiao Ge, Xihui Liu, Jinpeng Wang, Jianping Wu, Ying Shan, Xiaohu Qie, Ping Luo
- Not All Models Are Equal: Predicting Model Transferability in a Self-challenging Fisher Space
  Wenqi Shao#, Xun Zhao, Yixiao Ge#, Zhaoyang Zhang, Lei Yang, Xiaogang Wang, Ying Shan, Ping Luo
- mc-BEiT: Multi-choice Discretization for Image BERT Pre-training
  Xiaotong Li, Yixiao Ge, Kun Yi, Zixuan Hu, Ying Shan, Lingyu Duan
- Towards Universal Backward-Compatible Representation Learning
  Binjie Zhang, Yixiao Ge#, Yantao Shen, Shupeng Su, Fanzi Wu, Chun Yuan#, Xuyuan Xu, Yexin Wang, Ying Shan
- Bridging Video-text Retrieval with Multiple Choice Questions
  Yuying Ge, Yixiao Ge, Xihui Liu, Dian Li, Ying Shan, Xiaohu Qie, Ping Luo
- Object-aware Video-language Pre-training for Retrieval
  Alex Jinpeng Wang, Yixiao Ge, Guanyu Cai, Rui Yan, Xudong Lin, Ying Shan, Xiaohu Qie, Mike Zheng Shou
- Hot-Refresh Model Upgrades with Regression-Alleviating Compatible Training in Image Retrieval
  Binjie Zhang, Yixiao Ge#, Yantao Shen, Yu Li, Chun Yuan#, Xuyuan Xu, Yexin Wang, Ying Shan
- Dynamic Token Normalization Improves Vision Transformer
  Wenqi Shao, Yixiao Ge, Zhaoyang Zhang, Xuyuan Xu, Xiaogang Wang, Ying Shan, Ping Luo
- Uncertainty Modeling for Out-of-Distribution Generalization
  Xiaotong Li, Yongxing Dai, Yixiao Ge, Jun Liu, Ying Shan, Lingyu Duan
- Structured Domain Adaptation with Online Relation Regularization for Unsupervised Person Re-ID
  Yixiao Ge, Feng Zhu, Dapeng Chen, Rui Zhao, Xiaogang Wang, Hongsheng Li
2021:
- Progressive Correspondence Pruning by Consensus Learning
  Chen Zhao*, Yixiao Ge*, Feng Zhu, Rui Zhao, Hongsheng Li, Mathieu Salzmann
- Online Pseudo Label Generation by Hierarchical Cluster Dynamics for Adaptive Person Re-identification
  Yi Zheng, Shixiang Tang, Guolong Teng, Yixiao Ge, Kaijian Liu, Donglian Qi, Jing Qin, Dapeng Chen
  ICCV, 2021 [Paper]
- Refining Pseudo Labels with Clustering Consensus over Generations for Unsupervised Object Re-identification
  Xiao Zhang*, Yixiao Ge*, Yu Qiao, Hongsheng Li
  CVPR, 2021 [Paper]
- DivCo: Diverse Conditional Image Synthesis via Contrastive Generative Adversarial Network
  Rui Liu, Yixiao Ge, Ching Lam Choi, Xiaogang Wang, Hongsheng Li
- Mutual CRF-GNN Network for Few-shot Learning
  Shixiang Tang, Dapeng Chen, Lei Bai, Kaijian Liu, Yixiao Ge, Wanli Ouyang
  CVPR, 2021 [Paper]
2020:
- Self-paced Contrastive Learning with Hybrid Memory for Domain Adaptive Object Re-ID
  Yixiao Ge, Feng Zhu, Dapeng Chen, Rui Zhao, Hongsheng Li
- Self-supervising Fine-grained Region Similarities for Large-scale Image Localization
  Yixiao Ge, Haibo Wang, Feng Zhu, Rui Zhao, Hongsheng Li
- Mutual Mean-Teaching: Pseudo Label Refinery for Unsupervised Domain Adaptation on Person Re-identification
  Yixiao Ge, Dapeng Chen, Hongsheng Li
Before 2020:
- FD-GAN: Pose-guided Feature Distilling GAN for Robust Person Re-identification
  Yixiao Ge*, Zhuowan Li*, Haiyu Zhao, Guojun Yin, Shuai Yi, Xiaogang Wang, Hongsheng Li