Dr. Bo Dai is a postdoctoral researcher in the Multimedia Laboratory (MMLab) at CUHK, working with Prof. Dahua Lin. His research interests include computer vision and machine learning, with a recent focus on generative models, video analysis, and cross-modality analysis.
He received his Ph.D. (2014-2018) from the Multimedia Laboratory (MMLab) at CUHK, advised by Prof. Dahua Lin, and obtained his B.Eng. (2010-2014) from the ACM Class at SJTU. He was fortunate to intern at Microsoft Research Asia, and in 2017 he visited the University of Toronto, working with Prof. Sanja Fidler.
In Real or Not Real, that is the Question, we propose a generalization of the standard GAN framework, where we treat realness as a random variable rather than a single scalar. The resulting framework, RealnessGAN, not only trains the generator with a KL-divergence-based objective, but also enables a non-progressive DCGAN to produce realistic images at 1024x1024 resolution.
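As a rough illustration of the realness-as-distribution idea, below is a minimal PyTorch sketch in which the discriminator outputs a distribution over K discrete realness outcomes, and both players are trained with KL divergences against fixed anchor distributions. The value of K, the anchor shapes, and the exact generator objective are illustrative assumptions, not the paper's released implementation.

```python
# Minimal sketch of the RealnessGAN idea: the discriminator outputs a
# distribution over K discrete "realness" outcomes, and losses are KL
# divergences against fixed anchor distributions. K and the anchor
# shapes below are hypothetical choices for illustration.
import torch
import torch.nn.functional as F

K = 10  # number of discrete realness outcomes (illustrative)

# Fixed anchor distributions over realness: anchor_real skews toward
# "real" outcomes, anchor_fake toward "fake" ones.
anchor_real = F.softmax(torch.linspace(-1.0, 1.0, K), dim=0)
anchor_fake = F.softmax(torch.linspace(1.0, -1.0, K), dim=0)

def kl(anchor, logits):
    """KL(anchor || D(x)), averaged over the batch."""
    log_probs = F.log_softmax(logits, dim=1)  # (B, K)
    return F.kl_div(log_probs, anchor.expand_as(log_probs),
                    reduction="batchmean")

def d_loss(logits_real, logits_fake):
    # Discriminator pulls real samples toward anchor_real and
    # generated samples toward anchor_fake.
    return kl(anchor_real, logits_real) + kl(anchor_fake, logits_fake)

def g_loss(logits_fake):
    # One plausible generator objective: pull generated samples toward
    # the "real" anchor and away from the "fake" anchor.
    return kl(anchor_real, logits_fake) - kl(anchor_fake, logits_fake)
```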
In FineGym: A Hierarchical Video Dataset for Fine-grained Action Understanding, we build a dataset for benchmarking fine-grained action recognition on gymnastics videos. It collects high-resolution (720P/1080P), high-quality action videos from professional gymnastics competitions, along with rich annotations across multiple semantic and temporal granularities. Experiments on FineGym reveal the gap between coarse- and fine-grained action recognition.
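To make the multi-granularity annotation structure concrete, here is a hedged sketch of how such labels might be organized in code; the field names and their mapping onto the paper's event/set/element terminology are illustrative, not the dataset's actual schema.

```python
# Hypothetical representation of FineGym-style hierarchical annotations:
# events are localized within a video, sub-actions within an event, and
# each sub-action carries labels at several semantic levels.
from dataclasses import dataclass, field
from typing import List, Tuple

@dataclass
class SubAction:
    span: Tuple[float, float]   # start/end time within the event (seconds)
    set_label: str              # mid-level semantic label (a "set")
    element_label: str          # fine-grained "element" label

@dataclass
class Event:
    span: Tuple[float, float]   # start/end time within the video (seconds)
    event_label: str            # coarse label, e.g. "balance_beam"
    sub_actions: List[SubAction] = field(default_factory=list)

@dataclass
class VideoAnnotation:
    video_id: str
    events: List[Event] = field(default_factory=list)
```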
In Self-Supervised Scene De-occlusion, we make the first attempt to address scene de-occlusion through a novel, unified framework that recovers hidden scene structures in a self-supervised manner. It decomposes the problem into progressive ordering recovery, amodal completion, and content completion. In this way it achieves results comparable to fully-supervised methods and enables various applications, e.g., image manipulation.
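The self-supervised aspect can be illustrated with a small numpy sketch: lacking ground-truth amodal masks, one can synthesize supervision by artificially occluding an instance's visible mask with another instance's mask and asking a model to recover the original. The helper below is a hypothetical illustration of that data-generation step, not the paper's PCNet code, and the random placement scheme is a simplification.

```python
# Synthesizing (input, target) pairs for self-supervised mask completion:
# occlude a visible mask with a randomly shifted second mask, then use
# the original mask as the reconstruction target. Shapes and the shift
# range are illustrative assumptions.
import numpy as np

def make_partial_completion_pair(mask, occluder, rng):
    """Return (artificially occluded mask, original mask)."""
    h, w = mask.shape
    # Randomly shift the occluder so it partially covers the instance.
    # (np.roll wraps around the border; fine for a sketch.)
    dy, dx = rng.integers(-h // 4, h // 4 + 1, size=2)
    shifted = np.roll(np.roll(occluder, dy, axis=0), dx, axis=1)
    occluded = mask & ~shifted  # erase the covered region
    return occluded, mask

rng = np.random.default_rng(0)
mask = np.zeros((64, 64), bool); mask[16:48, 16:48] = True
occ = np.zeros((64, 64), bool); occ[8:40, 8:40] = True
inp, tgt = make_partial_completion_pair(mask, occ, rng)
```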
In Temporal Pyramid Network for Action Recognition, we propose TPN, a general module for action analysis that handles the varying visual tempos of different action instances. Unlike the input-level frame pyramid in SlowFast, TPN builds a feature-level temporal pyramid. It brings consistent improvements across backbones, datasets, and tasks, reaching 78.9%, 49.0%, and 62.0% top-1 accuracy on Kinetics-400 and Something-Something V1 and V2, respectively, with only RGB input.
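A hedged PyTorch sketch of the feature-level pyramid idea follows: features tapped from several backbone stages are projected to a common channel width, subsampled at different temporal rates, and fused into a single descriptor. The channel sizes, rates, and fusion scheme below are illustrative choices, not TPN's exact architecture.

```python
# Feature-level temporal pyramid sketch: lateral 1x1x1 convs align
# channels across levels; each level subsamples time at its own rate
# before pooling and concatenation. All hyperparameters are illustrative.
import torch
import torch.nn as nn
import torch.nn.functional as F

class TemporalPyramid(nn.Module):
    def __init__(self, in_channels=(512, 1024), out_channels=256, rates=(1, 2)):
        super().__init__()
        self.laterals = nn.ModuleList(
            nn.Conv3d(c, out_channels, kernel_size=1) for c in in_channels)
        self.rates = rates  # temporal stride per level -> varying tempo

    def forward(self, feats):
        # feats: list of (B, C_i, T, H, W) tensors from backbone stages.
        levels = []
        for f, lat, r in zip(feats, self.laterals, self.rates):
            f = lat(f)                     # align channel width
            f = f[:, :, ::r]               # subsample time at this rate
            levels.append(F.adaptive_avg_pool3d(f, 1).flatten(1))
        return torch.cat(levels, dim=1)    # fused pyramid descriptor

feats = [torch.randn(2, 512, 8, 14, 14), torch.randn(2, 1024, 8, 7, 7)]
out = TemporalPyramid()(feats)  # shape (2, 512)
```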