Wenzhao Zheng is a third-year Ph.D. student in the Intelligent Vision Group (IVG) at Tsinghua University, advised by Prof. Jiwen Lu and Prof. Jie Zhou. His research interests include machine learning, computer vision, and metric learning.
Ph.D. in Automation, 2018 - present
B.Sc. in Mathematics and Physics, 2014 - 2018
In this paper, we propose a deep compositional metric learning (DCML) framework for effective and generalizable similarity measurement between images. Conventional deep metric learning methods minimize a discriminative loss to enlarge interclass distances while suppressing intraclass variations, which might lead to inferior generalization performance since samples even from the same class may present diverse characteristics. This motivates the adoption of the ensemble technique to learn a number of sub-embeddings using different and diverse subtasks. However, most subtasks impose weaker or contradictory constraints, which essentially sacrifices the discrimination ability of each sub-embedding to improve the generalization ability of their combination. To achieve better generalization without this compromise, we propose to separate the sub-embeddings from direct supervision by the subtasks and instead apply the losses to different composites of the sub-embeddings. We employ a set of learnable compositors to combine the sub-embeddings and use a self-reinforced loss to train the compositors, which serve as relays to distribute the diverse training signals without destroying the discrimination ability. Experimental results on the CUB200-2011, Cars196, and Stanford Online Products datasets demonstrate the superior performance of our framework.
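The compositing idea can be sketched in a few lines. The snippet below is a minimal illustration, not the paper's implementation: it assumes each compositor is parameterized by a logit vector whose softmax gives mixing weights over the K sub-embeddings; the function name and shapes are hypothetical.

```python
import numpy as np

def composite_embeddings(sub_embeddings, compositor_logits):
    """Combine sub-embeddings into composites via softmax-weighted sums.

    sub_embeddings: (K, D) array of K sub-embeddings for one sample.
    compositor_logits: (C, K) array; each row parameterizes one learnable
        compositor (hypothetical parameterization for illustration).
    Returns: (C, D) array of composite embeddings, on which the losses
        would be applied instead of on the sub-embeddings directly.
    """
    # Numerically stable softmax over the K sub-embeddings per compositor
    shifted = compositor_logits - compositor_logits.max(axis=1, keepdims=True)
    exp = np.exp(shifted)
    weights = exp / exp.sum(axis=1, keepdims=True)  # (C, K)
    return weights @ sub_embeddings                 # (C, D)
```

With uniform (zero) logits each composite reduces to the mean of the sub-embeddings; training the logits lets each compositor emphasize different sub-embeddings.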
In this paper, we propose a structural deep metric learning (SDML) method for room layout estimation, which aims to recover the 3D spatial layout of a cluttered indoor scene from a monocular RGB image. Different from existing room layout estimation methods that solve a regression or per-pixel classification problem, we formulate the room layout estimation problem from a metric learning perspective where we explicitly model the structural relations across different images. We propose to learn a latent embedding space where the Euclidean distance can characterize the actual structural difference between the layouts of two rooms. We then minimize the discrepancy between an image and its ground-truth layout in the learned embedding space. We employ a metric model and a layout encoder to map the RGB images and the ground-truth layouts to the embedding space, respectively, and a layout decoder to map the embeddings to the corresponding layouts, where the whole framework is trained in an end-to-end manner. We perform experiments on the widely used Hedau and LSUN datasets and achieve state-of-the-art performance.
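The objective described above pairs an embedding-discrepancy term with a decoding term. The sketch below illustrates that structure under stated assumptions: the deep networks are replaced by hypothetical linear maps, and the loss form is a plain squared-Euclidean stand-in for the paper's actual training objective.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical linear stand-ins for the three deep networks:
# W_f: metric model (image feature -> embedding)
# W_g: layout encoder (ground-truth layout -> embedding)
# W_h: layout decoder (embedding -> layout)
W_f = rng.standard_normal((8, 4))
W_g = rng.standard_normal((6, 4))
W_h = rng.standard_normal((4, 6))

def sdml_losses(image_feat, layout):
    """Embedding discrepancy plus layout reconstruction (illustrative).

    image_feat: (8,) image feature vector; layout: (6,) layout encoding.
    Returns (metric_loss, recon_loss): the Euclidean discrepancy between
    the two embeddings, and the decoder's reconstruction error.
    """
    z_img = image_feat @ W_f                          # image embedding
    z_lay = layout @ W_g                              # layout embedding
    metric_loss = np.sum((z_img - z_lay) ** 2)        # discrepancy term
    recon_loss = np.sum((z_lay @ W_h - layout) ** 2)  # decoding term
    return metric_loss, recon_loss
```

In the actual framework all three mappings are deep networks trained end-to-end, so minimizing the discrepancy shapes an embedding space where Euclidean distance reflects structural layout difference.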
This paper presents a hardness-aware deep metric learning (HDML) framework for image clustering and retrieval. Most previous deep metric learning methods employ the hard negative mining strategy to alleviate the lack of informative samples for training. However, this mining strategy only utilizes a subset of the training data, which may not be enough to characterize the global geometry of the embedding space comprehensively. To address this problem, we perform linear interpolation on embeddings to adaptively manipulate their hardness levels and generate corresponding label-preserving synthetics for recycled training, so that the information buried in all samples can be fully exploited and the metric is always challenged with proper difficulty. As a single synthetic for each sample may still not be enough to describe the unobserved distribution of the training data, which is crucial for generalization performance, we further extend HDML to generate multiple synthetics for each sample. We propose a randomly hardness-aware deep metric learning (HDML-R) method and an adaptively hardness-aware deep metric learning (HDML-A) method to sample multiple random and adaptive directions, respectively, for hardness-aware synthesis. Extensive experimental results on the widely used CUB-200-2011, Cars196, Stanford Online Products, In-Shop Clothes Retrieval, and VehicleID datasets demonstrate the effectiveness of the proposed framework.
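The hardness manipulation above rests on a simple geometric operation: interpolating an embedding toward a negative. The sketch below shows that operation and a multi-synthetic variant; both function names, the perturbation form, and the parameters are illustrative assumptions, not the paper's exact formulation.

```python
import numpy as np

def synthesize_harder(anchor, negative, hardness):
    """Move an anchor embedding toward a negative by a factor
    `hardness` in [0, 1), raising its difficulty while staying closer
    to the anchor's class than to the negative (label-preserving)."""
    return anchor + hardness * (negative - anchor)

def synthesize_multiple(anchor, negative, hardness, n, scale=0.1, seed=0):
    """Illustrative multi-synthetic variant: sample n random directions
    around the interpolated synthetic (assumed form, in the spirit of
    HDML-R's random-direction sampling)."""
    rng = np.random.default_rng(seed)
    base = synthesize_harder(anchor, negative, hardness)
    return base + scale * rng.standard_normal((n, anchor.shape[0]))
```

Because the synthetic lies on the segment between anchor and negative, its distance to the negative shrinks as `hardness` grows, which is what makes the recycled sample harder for the metric.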