Dr. Bo Dai is a Research Assistant Professor with S-Lab, Nanyang Technological University, Singapore. His research interests include computer vision and machine learning. His recent work focuses on generative models, video analysis, and cross-modality analysis.
Prior to joining NTU, he worked as a Postdoctoral Research Fellow at the Multimedia Laboratory (MMLab), CUHK, from 2018 to 2020. He received his Ph.D. (2014-2018) from the Multimedia Laboratory (MMLab) at CUHK, advised by Prof. Dahua Lin. He obtained his B.Eng. (2010-2014) from the ACM Class at SJTU.
I'm looking for motivated PhD students and postdocs to work on generative models, 3D vision, and action analysis. Drop me an email if you are interested.
In Scene-aware Generative Network for Human Motion Synthesis, we present a motion synthesis model that emphasizes the importance of scene context. We formulate motion synthesis as a generative task and factorize the distribution of human motions into a distribution over movement trajectories and one over body pose dynamics. We further introduce discriminators that enforce compatibility between the synthesized motion and the contextual scene, as well as 3D-to-2D projection constraints. The proposed model synthesizes human motion with diverse trajectories and body poses, as shown in the Demo.
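The trajectory/pose factorization can be illustrated with a minimal two-stage sampler. Everything below (function names, feature sizes, the random-walk "generators") is an illustrative stand-in, not the paper's actual networks; it only shows how a scene-conditioned trajectory is sampled first and pose dynamics are then sampled conditioned on it.

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_trajectory(scene_feat, horizon=30, dim=3):
    # Stage 1 (stand-in for the learned trajectory generator):
    # a scene-conditioned random walk over root positions.
    steps = rng.normal(scale=0.1, size=(horizon, dim)) + 0.01 * scene_feat[:dim]
    return np.cumsum(steps, axis=0)

def sample_poses(trajectory, joints=17):
    # Stage 2 (stand-in for the pose generator): body pose dynamics
    # sampled conditioned on the trajectory, anchored to the root path.
    poses = rng.normal(scale=0.2, size=(len(trajectory), joints, 3))
    return poses + trajectory[:, None, :]

scene = rng.normal(size=(64,))       # hypothetical scene-context feature
traj = sample_trajectory(scene)      # (30, 3) root trajectory
motion = sample_poses(traj)          # (30, 17, 3) full-body motion
print(motion.shape)  # (30, 17, 3)
```

In the actual model the two stages are learned networks, and the sampled motion is additionally scored by scene-compatibility and projection discriminators.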
In Unsupervised 3D Shape Reconstruction from 2D Image GANs, we present the first attempt to directly mine 3D geometric cues from an off-the-shelf 2D GAN trained on RGB images only. The core of our framework is an iterative strategy that explores and exploits diverse viewpoint and lighting variations in the GAN image manifold. The framework requires no 2D keypoint or 3D annotations, nor strong assumptions on object shapes (e.g., that shapes are symmetric), yet it successfully recovers 3D shapes with high precision for human faces, cats, cars, and buildings.
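The explore-and-exploit loop can be caricatured as follows. This is only a toy sketch: the "renderer", the "GAN projection", and the depth update are trivial numpy stand-ins for the real differentiable renderer, GAN inversion, and shape optimization, and serve only to show the loop structure (sample viewpoint/lighting variations, project renderings onto the GAN manifold, refine the shape against those pseudo samples).

```python
import numpy as np

rng = np.random.default_rng(0)

def render(depth, view):
    # Toy renderer stand-in: shade a depth map under one sampled view/light.
    return np.clip(depth * view, 0.0, 2.0)

def gan_project(image):
    # Stand-in for GAN inversion: snap an image back onto the image manifold.
    return image + rng.normal(scale=0.01, size=image.shape)

def reconstruct(image, steps=5, n_views=8):
    depth = np.full_like(image, 0.5)            # initial shape guess
    for _ in range(steps):
        # Explore: sample diverse viewpoint/lighting variations.
        views = rng.uniform(0.5, 1.5, n_views)
        # Exploit: obtain pseudo ground-truth samples from the GAN manifold.
        pseudo = [gan_project(render(depth, v)) for v in views]
        # Refine the shape to explain the pseudo samples (least-squares-ish).
        depth = np.mean([p / v for p, v in zip(pseudo, views)], axis=0)
    return depth

img = rng.uniform(0, 1, (8, 8))   # hypothetical input image
shape = reconstruct(img)
print(shape.shape)  # (8, 8)
```

The real pipeline iterates this with a neural depth/albedo model and a pretrained StyleGAN-type generator; the point here is only the alternation between exploring view variations and exploiting the GAN's samples as supervision.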
In Exploiting Deep Generative Prior for Versatile Image Restoration and Manipulation, we present an effective way to exploit the image prior captured by a generative adversarial network (GAN) trained on large-scale natural images. The key idea is to allow the generator to be fine-tuned on-the-fly in a progressive manner, regularized by the discriminator. Our method, Deep Generative Prior (DGP), produces high-quality results on various image restoration and manipulation tasks and generalizes well to out-of-distribution images.
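The "optimize the latent, then progressively fine-tune the generator too" idea can be sketched on a toy linear "generator". All of this (the linear G, the masking degradation, the two-stage schedule and learning rates) is illustrative only, and the real method uses a deep GAN with a discriminator-feature loss rather than plain pixel loss.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy linear "generator" G(z) = W @ z standing in for a pretrained GAN.
W = rng.normal(size=(16, 4))
z_true = rng.normal(size=(4,))
target = W @ z_true                                  # a "natural image"
mask = (rng.uniform(size=16) > 0.3).astype(float)    # degradation: missing pixels

def observed_loss(z, W):
    # Reconstruction loss measured only on the observed (unmasked) pixels.
    return 0.5 * np.sum((mask * (W @ z - target)) ** 2)

z = np.zeros(4)
lr = 0.01
loss_start = observed_loss(z, W)
for step in range(400):
    err = mask * (W @ z - target)
    z -= lr * (W.T @ err)                 # stage 1: optimize the latent code
    if step > 200:                        # stage 2: progressively fine-tune G
        W -= 0.1 * lr * np.outer(err, z)

restored = W @ z   # the adapted generator in-paints the missing pixels
```

Fine-tuning the generator (stage 2) is what lets the output leave the strict range of the frozen GAN and fit the particular degraded image; in DGP the discriminator regularizes this fine-tuning so the result stays natural.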
In Temporal Pyramid Network for Action Recognition, we propose TPN, a general module for action analysis that handles the varying visual tempos of different action instances. It is a feature-level temporal pyramid, rather than an input-level frame pyramid as in SlowFast. TPN brings consistent improvements across backbones, datasets, and tasks, achieving 78.9% top-1 accuracy on Kinetics-400 and 49.0%/62.0% on Something-Something V1/V2 with RGB input only.
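The feature-level pyramid idea can be sketched in a few lines: instead of feeding the backbone frames at multiple rates (input-level, as in SlowFast), features from a single backbone are resampled at several temporal rates and fused. The function below is a simplified stand-in (subsample-then-repeat fusion over channels), not TPN's actual lateral-connection architecture.

```python
import numpy as np

def temporal_pyramid(features, rates=(1, 2, 4)):
    # features: (T, C) frame-level features from one backbone.
    T, _ = features.shape
    levels = []
    for r in rates:
        level = features[::r]                       # slower visual tempo
        level = np.repeat(level, r, axis=0)[:T]     # upsample back to T frames
        levels.append(level)
    # Fuse the pyramid levels frame-by-frame along the channel axis.
    return np.concatenate(levels, axis=1)

feats = np.random.default_rng(0).normal(size=(16, 8))
fused = temporal_pyramid(feats)
print(fused.shape)  # (16, 24)
```

Because the pyramid operates on features, it plugs into existing 2D/3D backbones without changing their input pipeline, which is why it transfers across backbones and datasets.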