Yixiao Ge



I am currently a senior researcher at Tencent ARC Lab and Tencent AI Lab, leading an effort on vision and multimodal foundation models, with a particular interest in generative comprehension. Previously, I received my Ph.D. from the Multimedia Lab (MMLab) at the Chinese University of Hong Kong, advised by Prof. Hongsheng Li and Prof. Xiaogang Wang. We are actively looking for research interns to work on related topics, including but not limited to large-scale pretraining and vision-language learning. Please feel free to reach out if you are interested.


News

  • [May 2023] One paper is accepted to KDD 2023.
  • [Apr 2023] One paper is accepted to ICML 2023.
  • [Apr 2023] We released several projects on generative comprehension: TagGPT, VLog, and GPT4Tools. Feel free to check them out!
  • [Feb 2023] Four papers are accepted to CVPR 2023.
  • [Jan 2023] One paper is accepted to ICLR 2023.
  • [Nov 2022] Two papers are accepted to AAAI 2023.
  • [Jul 2022] Three papers are accepted to ECCV 2022.
  • [Apr 2022] One paper is accepted to IJCAI 2022 as a Long oral presentation.
  • [Mar 2022] Two papers are accepted to CVPR 2022 with one Oral presentation.
  • [Jan 2022] Three papers are accepted to ICLR 2022.

Projects

Feel free to check out our projects.

2023:
  • GPT4Tools: Teaching LLM to Use Tools via Self-instruction
    For the first time, we enable Vicuna-13B to use visual models via self-instruct tuning. The system can be deployed on local machines without APIs.
    Lin Song, Yanwei Li, Rui Yang, Sijie Zhao, Yixiao Ge, Ying Shan
  • VLog: Video as a Long Document
    Given a long video, we turn it into a document containing visual and audio information. By sending this document to ChatGPT, we can chat over the video!
    [Demo] [Code]
  • TagGPT: Large Language Models are Zero-shot Multimodal Taggers
    TagGPT is a fully automated system capable of tag extraction and multimodal tagging in a completely zero-shot fashion.
    Chen Li, Yixiao Ge, Jiayong Mao, Dian Li, Ying Shan
  • Tune-A-Video: One-Shot Tuning of Image Diffusion Models for Text-to-Video Generation
    Given a video-text pair as input, our method, Tune-A-Video, fine-tunes a pre-trained text-to-image diffusion model for text-to-video generation.
    Jay Zhangjie Wu, Yixiao Ge, Xintao Wang, Weixian Lei, Yuchao Gu, Yufei Shi, Wynne Hsu, Ying Shan, Xiaohu Qie, Mike Zheng Shou
Before 2023:
  • OpenIBL: PyTorch-based Codebase for Image Localization
    Yixiao Ge
    [Code]
  • OpenUnReID: PyTorch-based Codebase for Object Re-ID
    Yixiao Ge, Tong Xiao, Zhiwei Zhang
    [Code]

Publications [Full List]

( *equal contribution   #corresponding author )

2023:
  • Binary Embedding-based Retrieval at Tencent
    Yukang Gan*, Yixiao Ge*, Chang Zhou*, Shupeng Su, Zhouchuan Xu, Xuyuan Xu, Quanchao Hui, Xiang Chen, Yexin Wang, Ying Shan
    KDD, 2023 [Paper] [Code]
  • π-Tuning: Transferring Multimodal Foundation Models with Optimal Multi-task Interpolation
    Chengyue Wu, Teng Wang, Yixiao Ge#, Zeyu Lu, Ruisong Zhou, Ying Shan, Ping Luo
    ICML, 2023 [Paper] [Code]
  • Accelerating Vision-Language Pretraining with Free Language Modeling
    Teng Wang, Yixiao Ge, Feng Zheng, Ran Cheng, Ying Shan, Xiaohu Qie, Ping Luo
    CVPR, 2023 [Paper] [Code]
  • Masked Visual Reconstruction in Language Semantic Space
    Shusheng Yang, Yixiao Ge#, Kun Yi, Dian Li, Ying Shan, Xiaohu Qie, Xinggang Wang#
    CVPR, 2023 [Paper] [Code]
  • Learning Transferable Spatiotemporal Representations from Natural Script Knowledge
    Ziyun Zeng*, Yuying Ge*, Xihui Liu, Bin Chen#, Ping Luo, Shu-Tao Xia, Yixiao Ge#
    CVPR, 2023 [Paper] [Code]
  • All in One: Exploring Unified Video-Language Pre-training
    Alex Jinpeng Wang, Yixiao Ge, Rui Yan, Yuying Ge, Xudong Lin, Guanyu Cai, Jianping Wu, Ying Shan, Xiaohu Qie, Mike Zheng Shou
    CVPR, 2023 [Paper] [Code]
  • Masked Image Modeling with Denoising Contrast
    Kun Yi*, Yixiao Ge*#, Xiaotong Li, Shusheng Yang, Dian Li, Jianping Wu, Ying Shan, Xiaohu Qie
    ICLR, 2023 [Paper] [Code]
  • Darwinian Model Upgrades: Model Evolving with Selective Compatibility
    Binjie Zhang*, Shupeng Su*, Yixiao Ge#, Xuyuan Xu, Yexin Wang, Chun Yuan, Mike Zheng Shou, Ying Shan
    AAAI, 2023 [Paper]
  • Video-Text Pre-training with Learned Regions
    Rui Yan, Mike Zheng Shou, Yixiao Ge, Alex Jinpeng Wang, Xudong Lin, Guanyu Cai, Jinhui Tang
    AAAI, 2023 [Paper] [Code]
2022:
  • MILES: Visual BERT Pre-training with Injected Language Semantics for Video-text Retrieval
    Yuying Ge, Yixiao Ge, Xihui Liu, Jinpeng Wang, Jianping Wu, Ying Shan, Xiaohu Qie, Ping Luo
    ECCV, 2022 [Paper] [Code]
  • Not All Models Are Equal: Predicting Model Transferability in a Self-challenging Fisher Space
    Wenqi Shao#, Xun Zhao, Yixiao Ge#, Zhaoyang Zhang, Lei Yang, Xiaogang Wang, Ying Shan, Ping Luo
    ECCV, 2022 [Paper] [Code]
  • mc-BEiT: Multi-choice Discretization for Image BERT Pre-training
    Xiaotong Li, Yixiao Ge, Kun Yi, Zixuan Hu, Ying Shan, Lingyu Duan
    ECCV, 2022 [Paper] [Code]
  • Towards Universal Backward-Compatible Representation Learning
    Binjie Zhang, Yixiao Ge#, Yantao Shen, Shupeng Su, Fanzi Wu, Chun Yuan#, Xuyuan Xu, Yexin Wang, Ying Shan
    IJCAI, 2022 (Long oral) [Paper] [Code]
  • Bridging Video-text Retrieval with Multiple Choice Questions
    Yuying Ge, Yixiao Ge, Xihui Liu, Dian Li, Ying Shan, Xiaohu Qie, Ping Luo
    CVPR, 2022 (Oral) [Paper] [Code]
  • Object-aware Video-language Pre-training for Retrieval
    Alex Jinpeng Wang, Yixiao Ge, Guanyu Cai, Rui Yan, Xudong Lin, Ying Shan, Xiaohu Qie, Mike Zheng Shou
    CVPR, 2022 [Paper] [Code]
  • Hot-Refresh Model Upgrades with Regression-Alleviating Compatible Training in Image Retrieval
    Binjie Zhang, Yixiao Ge#, Yantao Shen, Yu Li, Chun Yuan#, Xuyuan Xu, Yexin Wang, Ying Shan
    ICLR, 2022 [Paper] [Code]
  • Dynamic Token Normalization Improves Vision Transformer
    Wenqi Shao, Yixiao Ge, Zhaoyang Zhang, Xuyuan Xu, Xiaogang Wang, Ying Shan, Ping Luo
    ICLR, 2022 [Paper] [Code]
  • Uncertainty Modeling for Out-of-Distribution Generalization
    Xiaotong Li, Yongxing Dai, Yixiao Ge, Jun Liu, Ying Shan, Lingyu Duan
    ICLR, 2022 [Paper] [Code]
  • Structured Domain Adaptation with Online Relation Regularization for Unsupervised Person Re-ID
    Yixiao Ge, Feng Zhu, Dapeng Chen, Rui Zhao, Xiaogang Wang, Hongsheng Li
    IEEE TNNLS, 2022 [Project] [Paper]
2021:
  • Progressive Correspondence Pruning by Consensus Learning
    Chen Zhao*, Yixiao Ge*, Feng Zhu, Rui Zhao, Hongsheng Li, Mathieu Salzmann
    ICCV, 2021 [Project] [Paper] [Code]
  • Online Pseudo Label Generation by Hierarchical Cluster Dynamics for Adaptive Person Re-identification
    Yi Zheng, Shixiang Tang, Guolong Teng, Yixiao Ge, Kaijian Liu, Donglian Qi, Jing Qin, Dapeng Chen
    ICCV, 2021 [Paper]
  • Refining Pseudo Labels with Clustering Consensus over Generations for Unsupervised Object Re-identification
    Xiao Zhang*, Yixiao Ge*, Yu Qiao, Hongsheng Li
    CVPR, 2021 [Paper]
  • DivCo: Diverse Conditional Image Synthesis via Contrastive Generative Adversarial Network
    Rui Liu, Yixiao Ge, Ching Lam Choi, Xiaogang Wang, Hongsheng Li
    CVPR, 2021 [Paper] [Code]
  • Mutual CRF-GNN Network for Few-shot Learning
    Shixiang Tang, Dapeng Chen, Lei Bai, Kaijian Liu, Yixiao Ge, Wanli Ouyang
    CVPR, 2021 [Paper]
2020:
  • Self-paced Contrastive Learning with Hybrid Memory for Domain Adaptive Object Re-ID
    Yixiao Ge, Feng Zhu, Dapeng Chen, Rui Zhao, Hongsheng Li
    NeurIPS, 2020 [Project] [Paper] [Code]
  • Self-supervising Fine-grained Region Similarities for Large-scale Image Localization
    Yixiao Ge, Haibo Wang, Feng Zhu, Rui Zhao, Hongsheng Li
    ECCV, 2020 (Spotlight) [Project] [Paper] [Code]
  • Mutual Mean-Teaching: Pseudo Label Refinery for Unsupervised Domain Adaptation on Person Re-identification
    Yixiao Ge, Dapeng Chen, Hongsheng Li
    ICLR, 2020 [Project] [Paper] [Code]
Before 2020:
  • FD-GAN: Pose-guided Feature Distilling GAN for Robust Person Re-identification
    Yixiao Ge*, Zhuowan Li*, Haiyu Zhao, Guojun Yin, Shuai Yi, Xiaogang Wang, Hongsheng Li
    NeurIPS, 2018 [Project] [Paper] [Code]