Yixiao Ge



I am currently a senior researcher at Tencent ARC Lab and Tencent AI Lab, where I lead efforts on vision and multimodal foundation models. I received my Ph.D. degree from the Multimedia Lab (MMLab), The Chinese University of Hong Kong, advised by Prof. Hongsheng Li and Prof. Xiaogang Wang. We are actively looking for self-motivated interns to work on related research topics. Please feel free to reach out if you are interested.


News

  • [Aug 2023] Glad to release ViT-Lens. Stay tuned for more updates!
  • [Aug 2023] Glad to release SEED-Bench, the most comprehensive MLLM benchmark to date.
  • [Jul 2023] Glad to release our SEED. Stay tuned for more updates!
  • [Jul 2023] Four papers are accepted to ICCV 2023.
  • [May 2023] One paper is accepted to KDD 2023.
  • [Apr 2023] One paper is accepted to ICML 2023.
  • [Feb 2023] Four papers are accepted to CVPR 2023.
  • [Jan 2023] One paper is accepted to ICLR 2023.
  • [Nov 2022] Two papers are accepted to AAAI 2023.
  • [Jul 2022] Three papers are accepted to ECCV 2022.
  • [Apr 2022] One paper is accepted to IJCAI 2022 as a long oral presentation.
  • [Mar 2022] Two papers are accepted to CVPR 2022, including one oral presentation.
  • [Jan 2022] Three papers are accepted to ICLR 2022.

Publications [Full List]

(*equal contribution, #corresponding author)

Selected Preprints:
  • ViT-Lens: Towards Omni-modal Representations
    Advances omni-modal representation learning with modality lenses.
    Weixian Lei, Yixiao Ge#, Jianfeng Zhang, Dylan Sun, Kun Yi, Ying Shan, Mike Zheng Shou#
  • SEED-Bench: Benchmarking Multimodal LLMs with Generative Comprehension
    Consists of 19K multiple-choice questions with accurate human annotations, spanning 12 evaluation dimensions that cover both spatial and temporal comprehension.
    Bohao Li*, Rui Wang*, Guangzhi Wang*, Yuying Ge#, Yixiao Ge#, Ying Shan
  • Planting a SEED of Vision in Large Language Model
    Empowers Large Language Models (LLMs) with the emergent ability to see and draw.
    Yuying Ge*, Yixiao Ge*#, Ziyun Zeng, Xintao Wang, Ying Shan
  • TaCA: Upgrading Your Visual Foundation Model with Task-agnostic Compatible Adapter
    Enables new ViTs to be plugged into existing frameworks (e.g., BLIP-2) with the other modules left untouched, while boosting performance.
    Binjie Zhang, Yixiao Ge#, Xuyuan Xu, Ying Shan, Mike Zheng Shou#
  • What Makes for Good Visual Tokenizers for Large Language Models?
    Rather than simply applying CLIP models, we systematically investigate proper pre-training methods for building good visual tokenizers, turning LLMs into powerful multimodal LLMs.
    Guangzhi Wang, Yixiao Ge#, Xiaohan Ding, Mohan Kankanhalli, Ying Shan
  • TVTSv2: Learning Out-of-the-box Spatiotemporal Visual Representations at Scale
    Produces general-purpose video features that work out of the box, surpassing InternVideo and ImageBind on zero-shot and linear tasks.
    Ziyun Zeng, Yixiao Ge#, Zhan Tong, Xihui Liu, Shu-Tao Xia, Ying Shan
  • GPT4Tools: Teaching Large Language Model to Use Tools via Self-instruction
    For the first time, we enable Vicuna-13B to use visual models via self-instruct tuning. The system can be deployed on local machines without external APIs.
    Lin Song, Yanwei Li, Rui Yang, Sijie Zhao, Yixiao Ge, Ying Shan
2023:
  • Tune-A-Video: One-Shot Tuning of Image Diffusion Models for Text-to-Video Generation
    Jay Zhangjie Wu, Yixiao Ge, Xintao Wang, Weixian Lei, Yuchao Gu, Yufei Shi, Wynne Hsu, Ying Shan, Xiaohu Qie, Mike Zheng Shou
    ICCV, 2023 [Project] [Paper] [Demo] [Code]
  • Exploring Model Transferability through the Lens of Potential Energy
    Xiaotong Li, Zixuan Hu, Yixiao Ge, Ying Shan, Lingyu Duan
    ICCV, 2023 [Paper] [Code]
  • BoxSnake: Polygonal Instance Segmentation with Box Supervision
    Rui Yang, Lin Song, Yixiao Ge, Xiu Li
    ICCV, 2023 [Paper] [Code]
  • Unleashing Vanilla Vision Transformer with Masked Image Modeling for Object Detection
    Yuxin Fang*, Shusheng Yang*, Shijie Wang*, Yixiao Ge, Ying Shan, Xinggang Wang
    ICCV, 2023 [Paper] [Code]
  • Binary Embedding-based Retrieval at Tencent
    Yukang Gan*, Yixiao Ge*, Chang Zhou*, Shupeng Su, Zhouchuan Xu, Xuyuan Xu, Quanchao Hui, Xiang Chen, Yexin Wang, Ying Shan
    KDD, 2023 [Paper] [Code]
  • π-Tuning: Transferring Multimodal Foundation Models with Optimal Multi-task Interpolation
    Chengyue Wu, Teng Wang, Yixiao Ge#, Zeyu Lu, Ruisong Zhou, Ying Shan, Ping Luo
    ICML, 2023 [Paper] [Code]
  • Accelerating Vision-Language Pretraining with Free Language Modeling
    Teng Wang, Yixiao Ge, Feng Zheng, Ran Cheng, Ying Shan, Xiaohu Qie, Ping Luo
    CVPR, 2023 [Paper] [Code]
  • Masked Visual Reconstruction in Language Semantic Space
    Shusheng Yang, Yixiao Ge#, Kun Yi, Dian Li, Ying Shan, Xiaohu Qie, Xinggang Wang#
    CVPR, 2023 [Paper] [Code]
  • Learning Transferable Spatiotemporal Representations from Natural Script Knowledge
    Ziyun Zeng*, Yuying Ge*, Xihui Liu, Bin Chen#, Ping Luo, Shu-Tao Xia, Yixiao Ge#
    CVPR, 2023 [Paper] [Code]
  • All in One: Exploring Unified Video-Language Pre-training
    Alex Jinpeng Wang, Yixiao Ge, Rui Yan, Yuying Ge, Xudong Lin, Guanyu Cai, Jianping Wu, Ying Shan, Xiaohu Qie, Mike Zheng Shou
    CVPR, 2023 [Paper] [Code]
  • Masked Image Modeling with Denoising Contrast
    Kun Yi*, Yixiao Ge*#, Xiaotong Li, Shusheng Yang, Dian Li, Jianping Wu, Ying Shan, Xiaohu Qie
    ICLR, 2023 [Paper] [Code]
  • Darwinian Model Upgrades: Model Evolving with Selective Compatibility
    Binjie Zhang*, Shupeng Su*, Yixiao Ge#, Xuyuan Xu, Yexin Wang, Chun Yuan, Mike Zheng Shou, Ying Shan
    AAAI, 2023 [Paper]
  • Video-Text Pre-training with Learned Regions
    Rui Yan, Mike Zheng Shou, Yixiao Ge, Alex Jinpeng Wang, Xudong Lin, Guanyu Cai, Jinhui Tang
    AAAI, 2023 [Paper] [Code]
2022:
  • MILES: Visual BERT Pre-training with Injected Language Semantics for Video-text Retrieval
    Yuying Ge, Yixiao Ge, Xihui Liu, Jinpeng Wang, Jianping Wu, Ying Shan, Xiaohu Qie, Ping Luo
    ECCV, 2022 [Paper] [Code]
  • Not All Models Are Equal: Predicting Model Transferability in a Self-challenging Fisher Space
    Wenqi Shao#, Xun Zhao, Yixiao Ge#, Zhaoyang Zhang, Lei Yang, Xiaogang Wang, Ying Shan, Ping Luo
    ECCV, 2022 [Paper] [Code]
  • mc-BEiT: Multi-choice Discretization for Image BERT Pre-training
    Xiaotong Li, Yixiao Ge, Kun Yi, Zixuan Hu, Ying Shan, Lingyu Duan
    ECCV, 2022 [Paper] [Code]
  • Towards Universal Backward-Compatible Representation Learning
    Binjie Zhang, Yixiao Ge#, Yantao Shen, Shupeng Su, Fanzi Wu, Chun Yuan#, Xuyuan Xu, Yexin Wang, Ying Shan
    IJCAI, 2022 (Long oral) [Paper] [Code]
  • Bridging Video-text Retrieval with Multiple Choice Questions
    Yuying Ge, Yixiao Ge, Xihui Liu, Dian Li, Ying Shan, Xiaohu Qie, Ping Luo
    CVPR, 2022 (Oral) [Paper] [Code]
  • Object-aware Video-language Pre-training for Retrieval
    Alex Jinpeng Wang, Yixiao Ge, Guanyu Cai, Rui Yan, Xudong Lin, Ying Shan, Xiaohu Qie, Mike Zheng Shou
    CVPR, 2022 [Paper] [Code]
  • Hot-Refresh Model Upgrades with Regression-Alleviating Compatible Training in Image Retrieval
    Binjie Zhang, Yixiao Ge#, Yantao Shen, Yu Li, Chun Yuan#, Xuyuan Xu, Yexin Wang, Ying Shan
    ICLR, 2022 [Paper] [Code]
  • Dynamic Token Normalization Improves Vision Transformer
    Wenqi Shao, Yixiao Ge, Zhaoyang Zhang, Xuyuan Xu, Xiaogang Wang, Ying Shan, Ping Luo
    ICLR, 2022 [Paper] [Code]
  • Uncertainty Modeling for Out-of-Distribution Generalization
    Xiaotong Li, Yongxing Dai, Yixiao Ge, Jun Liu, Ying Shan, Lingyu Duan
    ICLR, 2022 [Paper] [Code]
  • Structured Domain Adaptation with Online Relation Regularization for Unsupervised Person Re-ID
    Yixiao Ge, Feng Zhu, Dapeng Chen, Rui Zhao, Xiaogang Wang, Hongsheng Li
    IEEE TNNLS, 2022 [Project] [Paper]
2021:
  • Progressive Correspondence Pruning by Consensus Learning
    Chen Zhao*, Yixiao Ge*, Feng Zhu, Rui Zhao, Hongsheng Li, Mathieu Salzmann
    ICCV, 2021 [Project] [Paper] [Code]
  • Online Pseudo Label Generation by Hierarchical Cluster Dynamics for Adaptive Person Re-identification
    Yi Zheng, Shixiang Tang, Guolong Teng, Yixiao Ge, Kaijian Liu, Donglian Qi, Jing Qin, Dapeng Chen
    ICCV, 2021 [Paper]
  • Refining Pseudo Labels with Clustering Consensus over Generations for Unsupervised Object Re-identification
    Xiao Zhang*, Yixiao Ge*, Yu Qiao, Hongsheng Li
    CVPR, 2021 [Paper]
  • DivCo: Diverse Conditional Image Synthesis via Contrastive Generative Adversarial Network
    Rui Liu, Yixiao Ge, Ching Lam Choi, Xiaogang Wang, Hongsheng Li
    CVPR, 2021 [Paper] [Code]
  • Mutual CRF-GNN Network for Few-shot Learning
    Shixiang Tang, Dapeng Chen, Lei Bai, Kaijian Liu, Yixiao Ge, Wanli Ouyang
    CVPR, 2021 [Paper]
2020:
  • Self-paced Contrastive Learning with Hybrid Memory for Domain Adaptive Object Re-ID
    Yixiao Ge, Feng Zhu, Dapeng Chen, Rui Zhao, Hongsheng Li
    NeurIPS, 2020 [Project] [Paper] [Code]
  • Self-supervising Fine-grained Region Similarities for Large-scale Image Localization
    Yixiao Ge, Haibo Wang, Feng Zhu, Rui Zhao, Hongsheng Li
    ECCV, 2020 (Spotlight) [Project] [Paper] [Code]
  • Mutual Mean-Teaching: Pseudo Label Refinery for Unsupervised Domain Adaptation on Person Re-identification
    Yixiao Ge, Dapeng Chen, Hongsheng Li
    ICLR, 2020 [Project] [Paper] [Code]
Before 2020:
  • FD-GAN: Pose-guided Feature Distilling GAN for Robust Person Re-identification
    Yixiao Ge*, Zhuowan Li*, Haiyu Zhao, Guojun Yin, Shuai Yi, Xiaogang Wang, Hongsheng Li
    NeurIPS, 2018 [Project] [Paper] [Code]