Figure 1. Visual records of Gege's workplaces during her PhD journey in Zürich and Tübingen.


Gege Gao has been a Ph.D. student in Computer Science since November 2022, advised by Prof. Dr. Bernhard Schölkopf at ETH Zürich and Prof. Dr. Andreas Geiger at the University of Tübingen and the Tübingen AI Center, as shown in Figure 1. Before that, she was a research staff member at the Institute of Automation, Chinese Academy of Sciences, starting in 2020. She received her M.Sc. in Applied Statistics from Renmin University of China in 2020, supervised by Prof. Dr. Xiaoling Lu. Her research interests include controllable content creation, inverse graphics, and representation learning and reasoning.


GraphDreamer: Compositional 3D Scene Synthesis from Scene Graphs


As pretrained text-to-image diffusion models become increasingly powerful, recent efforts have distilled knowledge from these pretrained models to optimize text-guided 3D models. Most existing methods generate a holistic 3D model from a plain text input. This becomes problematic when the text describes a complex scene with multiple objects, because vectorized text embeddings are inherently unable to capture a complex description with multiple entities and relationships. Holistic 3D modeling of the entire scene further prevents accurate grounding of text entities and concepts. To address this limitation, we propose GraphDreamer, a novel framework that generates compositional 3D scenes from scene graphs, where objects are represented as nodes and their interactions as edges. By exploiting node and edge information in scene graphs, our method makes better use of the pretrained text-to-image diffusion model and is able to fully disentangle different objects without image-level supervision. To facilitate the modeling of object-wise relationships, we use signed distance fields as the representation and impose a constraint to avoid inter-penetration of objects. To avoid manual scene graph creation, we design a text prompt for ChatGPT that generates scene graphs from text inputs. We conduct both qualitative and quantitative experiments to validate the effectiveness of GraphDreamer in generating high-fidelity compositional 3D scenes with disentangled object entities.
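To illustrate the idea behind the non-penetration constraint on signed distance fields, here is a minimal numpy sketch. The function name and exact formulation are assumptions for illustration, not the paper's implementation; it only captures the principle that a sample point should lie inside at most one object (an object's SDF is negative inside it).

```python
import numpy as np

def penetration_penalty(sdf_values):
    # sdf_values: (num_points, num_objects) signed distances, assuming >= 2 objects.
    # A point is inside an object where that object's SDF is negative.
    inside = np.maximum(-sdf_values, 0.0)  # penetration depth per object
    # Take the two largest penetration depths at each sample point.
    top2 = np.sort(inside, axis=1)[:, -2:]
    # The product is nonzero only where a point lies inside two objects at once,
    # so minimizing it discourages inter-penetration of objects.
    return float(np.mean(top2[:, 0] * top2[:, 1]))
```

During optimization, a penalty like this would be added to the distillation loss so that disentangled objects do not occupy the same region of space.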

Causal Representation Learning for Context-Aware Face Transfer


Gege Gao, Huaibo Huang, Chaoyou Fu, Ran He

Human face synthesis involves transferring knowledge about the identity and identity-dependent shape of a human face to target face images whose context (e.g., facial expressions, head poses, and other background factors) may change dramatically. Human faces are non-rigid: facial expression deforms the face shape, and head pose also affects how the face appears in 2D images. A key challenge in face transfer is matching the face to unobserved new contexts, adapting the identity-dependent face shape (IDFS) to different poses and expressions accordingly. In this work, we find a way to provide generative models with prior knowledge for reasoning about the appropriate appearance of a human face under various expressions and poses. We propose a novel context-aware face transfer model, called CarTrans, that incorporates the causal effects of contextual factors into the face representation, making it aware of the uncertainty of new contexts. We estimate the effect of facial expression and head pose in terms of counterfactuals by designing a controlled intervention trial, thus avoiding the need for dense multi-view observations to cover the pose-expression space well. Moreover, we propose a kernel regression-based encoder that eliminates the identity specificity of the target face when encoding contextual information from the target image. The resulting method shows impressive performance, allowing fine-grained control over face shape and appearance under various contextual conditions.
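For readers unfamiliar with kernel regression, the mechanism underlying such an encoder can be sketched with plain Nadaraya-Watson regression. This is a generic illustration, not CarTrans itself: averaging outputs over many reference samples, weighted by kernel similarity, is one way an encoder can smooth away sample-specific (here, identity-specific) detail.

```python
import numpy as np

def nadaraya_watson(x_query, x_data, y_data, bandwidth=1.0):
    # Gaussian-kernel regression: each prediction is a similarity-weighted
    # average of y_data, so idiosyncrasies of any single sample are averaged out.
    # x_query: (Q, D), x_data: (N, D), y_data: (N,) or (N, K).
    d2 = np.sum((x_query[:, None, :] - x_data[None, :, :]) ** 2, axis=-1)
    w = np.exp(-d2 / (2.0 * bandwidth ** 2))
    w /= w.sum(axis=1, keepdims=True)  # normalize weights per query
    return w @ y_data
```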

Information Bottleneck Disentanglement for Identity Swapping


Gege Gao, Huaibo Huang, Chaoyou Fu, Zhaoyang Li, Ran He
CVPR 2021

Improving the performance of face forgery detectors often requires more identity-swapped images of higher quality. One core objective of identity swapping is to generate identity-discriminative faces that are distinct from the target yet identical to the source. To this end, properly disentangling identity from identity-irrelevant information is critical and remains a challenging endeavor. In this work, we propose a novel information disentangling and swapping network, called InfoSwap, that extracts the most expressive information for identity representation from a pre-trained face recognition model. The key insight of our method is to formulate the learning of disentangled representations as optimizing an information bottleneck trade-off, i.e., finding an optimal compression of the pre-trained latent features. Moreover, a novel identity contrastive loss is proposed for further disentanglement by requiring a proper distance between the generated identity and the target. While most prior works focus on using various loss functions to implicitly guide the learning of representations, we demonstrate that our model provides explicit supervision for learning disentangled representations, achieving impressive performance in generating more identity-discriminative swapped faces.
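The two loss ideas in the abstract can be sketched in a few lines. This is a generic illustration under standard formulations (a Gaussian variational bottleneck's KL compression cost, and a cosine-distance hinge for the identity contrastive term); the function names and the margin value are assumptions, not InfoSwap's actual code.

```python
import numpy as np

def vib_compression(mu, log_var):
    # KL( N(mu, diag(sigma^2)) || N(0, I) ): the "compression" side of the
    # information bottleneck trade-off, limiting how much the latent retains.
    return 0.5 * np.sum(mu ** 2 + np.exp(log_var) - log_var - 1.0)

def identity_contrastive(gen_id, tgt_id, margin=0.4):
    # Hinge loss pushing the swapped face's identity embedding at least
    # `margin` (in cosine distance) away from the target's identity.
    cos = np.dot(gen_id, tgt_id) / (np.linalg.norm(gen_id) * np.linalg.norm(tgt_id))
    return max(0.0, margin - (1.0 - cos))
```

In a bottleneck framework, the compression term is traded off against a reconstruction or identity-preservation term; the contrastive term adds the explicit requirement that the generated identity stay away from the target's.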


ETH Zürich
November 2022
Supervisor: Bernhard Schölkopf

University of Tübingen, Tübingen AI Center
May 2024
Autonomous Vision Group
Supervisor: Andreas Geiger

M.Sc. in Applied Statistics, 2020

  • School of Statistics
  • Renmin University of China

B.Sc. in Applied Statistics, 2017

  • School of Mathematics and Statistics
  • Central University of Finance and Economics


Gege, originally from Beijing, China, carries the Chinese name 格格. Her family's ancestral surname is Hešeri (赫舍里).

The source code for this website is forked from this repo.