Yixiao Ge
- geyixiao831@gmail.com
- Google Scholar
- GitHub
- Beijing, China

I am currently a senior researcher at Tencent ARC Lab and Tencent AI Lab, leading an effort on vision and multimodal foundation models with a particular interest in generative comprehension. Previously, I received my Ph.D. from the Multimedia Lab (MMLab) at the Chinese University of Hong Kong, advised by Prof. Hongsheng Li and Prof. Xiaogang Wang.

We are actively looking for research interns to work on related topics, including but not limited to large-scale pretraining and vision-language learning. Please feel free to reach out if you are interested.
News
- [May 2023] One paper is accepted to KDD 2023.
- [Apr 2023] One paper is accepted to ICML 2023.
- [Apr 2023] We released several projects on generative comprehension: TagGPT, VLog, and GPT4Tools. Feel free to check them out!
- [Feb 2023] Four papers are accepted to CVPR 2023.
- [Jan 2023] One paper is accepted to ICLR 2023.
- [Nov 2022] Two papers are accepted to AAAI 2023.
- [Jul 2022] Three papers are accepted to ECCV 2022.
- [Apr 2022] One paper is accepted to IJCAI 2022 as a long oral presentation.
- [Mar 2022] Two papers are accepted to CVPR 2022 with one Oral presentation.
- [Jan 2022] Three papers are accepted to ICLR 2022.
Projects
Feel free to check out our projects.
2023:
- GPT4Tools: Teaching LLM to Use Tools via Self-instruction
  For the first time, we enable Vicuna-13B to use visual models via self-instruct tuning. The system can be deployed on local machines without APIs.
  Lin Song, Yanwei Li, Rui Yang, Sijie Zhao, Yixiao Ge, Ying Shan
- VLog: Video as a Long Document
  Given a long video, we turn it into a document containing visual and audio information. By sending this document to ChatGPT, we can chat over the video!
- TagGPT: Large Language Models are Zero-shot Multimodal Taggers
  TagGPT is a fully automated system capable of tag extraction and multimodal tagging in a completely zero-shot fashion.
  Chen Li, Yixiao Ge, Jiayong Mao, Dian Li, Ying Shan
- Tune-A-Video: One-Shot Tuning of Image Diffusion Models for Text-to-Video Generation
  Given a video-text pair as input, our method, Tune-A-Video, fine-tunes a pre-trained text-to-image diffusion model for text-to-video generation.
  Jay Zhangjie Wu, Yixiao Ge, Xintao Wang, Weixian Lei, Yuchao Gu, Yufei Shi, Wynne Hsu, Ying Shan, Xiaohu Qie, Mike Zheng Shou
Publications [Full List]
( *equal contribution #corresponding author )
2023:
- Binary Embedding-based Retrieval at Tencent
  Yukang Gan*, Yixiao Ge*, Chang Zhou*, Shupeng Su, Zhouchuan Xu, Xuyuan Xu, Quanchao Hui, Xiang Chen, Yexin Wang, Ying Shan
- π-Tuning: Transferring Multimodal Foundation Models with Optimal Multi-task Interpolation
  Chengyue Wu, Teng Wang, Yixiao Ge#, Zeyu Lu, Ruisong Zhou, Ying Shan, Ping Luo
- Accelerating Vision-Language Pretraining with Free Language Modeling
  Teng Wang, Yixiao Ge, Feng Zheng, Ran Cheng, Ying Shan, Xiaohu Qie, Ping Luo
- Masked Visual Reconstruction in Language Semantic Space
  Shusheng Yang, Yixiao Ge#, Kun Yi, Dian Li, Ying Shan, Xiaohu Qie, Xinggang Wang#
- Learning Transferable Spatiotemporal Representations from Natural Script Knowledge
  Ziyun Zeng*, Yuying Ge*, Xihui Liu, Bin Chen#, Ping Luo, Shu-Tao Xia, Yixiao Ge#
- All in One: Exploring Unified Video-Language Pre-training
  Alex Jinpeng Wang, Yixiao Ge, Rui Yan, Yuying Ge, Xudong Lin, Guanyu Cai, Jianping Wu, Ying Shan, Xiaohu Qie, Mike Zheng Shou
- Masked Image Modeling with Denoising Contrast
  Kun Yi*, Yixiao Ge*#, Xiaotong Li, Shusheng Yang, Dian Li, Jianping Wu, Ying Shan, Xiaohu Qie
- Darwinian Model Upgrades: Model Evolving with Selective Compatibility
  Binjie Zhang*, Shupeng Su*, Yixiao Ge#, Xuyuan Xu, Yexin Wang, Chun Yuan, Mike Zheng Shou, Ying Shan
  AAAI, 2023 [Paper]
- Video-Text Pre-training with Learned Regions
  Rui Yan, Mike Zheng Shou, Yixiao Ge, Alex Jinpeng Wang, Xudong Lin, Guanyu Cai, Jinhui Tang
- MILES: Visual BERT Pre-training with Injected Language Semantics for Video-text Retrieval
  Yuying Ge, Yixiao Ge, Xihui Liu, Jinpeng Wang, Jianping Wu, Ying Shan, Xiaohu Qie, Ping Luo
- Not All Models Are Equal: Predicting Model Transferability in a Self-challenging Fisher Space
  Wenqi Shao#, Xun Zhao, Yixiao Ge#, Zhaoyang Zhang, Lei Yang, Xiaogang Wang, Ying Shan, Ping Luo
- mc-BEiT: Multi-choice Discretization for Image BERT Pre-training
  Xiaotong Li, Yixiao Ge, Kun Yi, Zixuan Hu, Ying Shan, Lingyu Duan
- Towards Universal Backward-Compatible Representation Learning
  Binjie Zhang, Yixiao Ge#, Yantao Shen, Shupeng Su, Fanzi Wu, Chun Yuan#, Xuyuan Xu, Yexin Wang, Ying Shan
- Bridging Video-text Retrieval with Multiple Choice Questions
  Yuying Ge, Yixiao Ge, Xihui Liu, Dian Li, Ying Shan, Xiaohu Qie, Ping Luo
- Object-aware Video-language Pre-training for Retrieval
  Alex Jinpeng Wang, Yixiao Ge, Guanyu Cai, Rui Yan, Xudong Lin, Ying Shan, Xiaohu Qie, Mike Zheng Shou
- Hot-Refresh Model Upgrades with Regression-Alleviating Compatible Training in Image Retrieval
  Binjie Zhang, Yixiao Ge#, Yantao Shen, Yu Li, Chun Yuan#, Xuyuan Xu, Yexin Wang, Ying Shan
- Dynamic Token Normalization Improves Vision Transformer
  Wenqi Shao, Yixiao Ge, Zhaoyang Zhang, Xuyuan Xu, Xiaogang Wang, Ying Shan, Ping Luo
- Uncertainty Modeling for Out-of-Distribution Generalization
  Xiaotong Li, Yongxing Dai, Yixiao Ge, Jun Liu, Ying Shan, Lingyu Duan
- Structured Domain Adaptation with Online Relation Regularization for Unsupervised Person Re-ID
  Yixiao Ge, Feng Zhu, Dapeng Chen, Rui Zhao, Xiaogang Wang, Hongsheng Li
- Progressive Correspondence Pruning by Consensus Learning
  Chen Zhao*, Yixiao Ge*, Feng Zhu, Rui Zhao, Hongsheng Li, Mathieu Salzmann
- Online Pseudo Label Generation by Hierarchical Cluster Dynamics for Adaptive Person Re-identification
  Yi Zheng, Shixiang Tang, Guolong Teng, Yixiao Ge, Kaijian Liu, Donglian Qi, Jing Qin, Dapeng Chen
  ICCV, 2021 [Paper]
- Refining Pseudo Labels with Clustering Consensus over Generations for Unsupervised Object Re-identification
  Xiao Zhang*, Yixiao Ge*, Yu Qiao, Hongsheng Li
  CVPR, 2021 [Paper]
- DivCo: Diverse Conditional Image Synthesis via Contrastive Generative Adversarial Network
  Rui Liu, Yixiao Ge, Ching Lam Choi, Xiaogang Wang, Hongsheng Li
- Mutual CRF-GNN Network for Few-shot Learning
  Shixiang Tang, Dapeng Chen, Lei Bai, Kaijian Liu, Yixiao Ge, Wanli Ouyang
  CVPR, 2021 [Paper]
- Self-paced Contrastive Learning with Hybrid Memory for Domain Adaptive Object Re-ID
  Yixiao Ge, Feng Zhu, Dapeng Chen, Rui Zhao, Hongsheng Li
- Self-supervising Fine-grained Region Similarities for Large-scale Image Localization
  Yixiao Ge, Haibo Wang, Feng Zhu, Rui Zhao, Hongsheng Li
- Mutual Mean-Teaching: Pseudo Label Refinery for Unsupervised Domain Adaptation on Person Re-identification
  Yixiao Ge, Dapeng Chen, Hongsheng Li
- FD-GAN: Pose-guided Feature Distilling GAN for Robust Person Re-identification
  Yixiao Ge*, Zhuowan Li*, Haiyu Zhao, Guojun Yin, Shuai Yi, Xiaogang Wang, Hongsheng Li