Learning Implicit Semantic Concepts for Joint Image-Text Embedding
We present a deep learning approach for learning joint embeddings of images and captions, aiming to encode their semantic similarity. To this end, we introduce a metric learning scheme that utilizes multiple losses, and propose to learn implicit semantic concepts by applying a semantic centers loss. This loss relates the images and captions corresponding to the same semantic concept to a particular center, and optimizes the separability of the centers' embeddings. The learnt semantic concepts are implicit, as they are acquired without textually analyzing the captions, in contrast to previous works. We also derive a novel metric learning formulation using an adaptive margin hinge loss, whose margin is refined during the training phase. The proposed scheme was applied to the MS-COCO, Flickr30K and Flickr8K datasets, and was shown to compare favourably with contemporary state-of-the-art approaches.
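The abstract describes the two losses only at a high level. As a rough illustration, the following PyTorch sketch shows one plausible form of a semantic centers loss (pulling image and caption embeddings toward a learnable per-concept center while pushing distinct centers apart) and a triplet-style hinge loss whose margin an outer training loop could refine over epochs. All identifiers (`SemanticCentersLoss`, `adaptive_hinge_loss`, `separation_margin`, etc.) are hypothetical, and the normalization, distance functions, and margin schedule are assumptions rather than the authors' actual formulation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SemanticCentersLoss(nn.Module):
    """Hypothetical sketch: attract image/caption embeddings of the same
    implicit semantic concept to a learnable center, and separate centers."""

    def __init__(self, num_concepts: int, dim: int, separation_margin: float = 0.5):
        super().__init__()
        self.centers = nn.Parameter(torch.randn(num_concepts, dim))
        self.separation_margin = separation_margin

    def forward(self, img_emb, cap_emb, concept_ids):
        # L2-normalize embeddings and centers (assumed; common in joint embedding work)
        img_emb = F.normalize(img_emb, dim=1)
        cap_emb = F.normalize(cap_emb, dim=1)
        centers = F.normalize(self.centers, dim=1)
        assigned = centers[concept_ids]  # (B, dim): each sample's concept center
        # Attraction: both modalities should lie near their concept's center
        attract = ((img_emb - assigned) ** 2).sum(1).mean() \
                + ((cap_emb - assigned) ** 2).sum(1).mean()
        # Separation: penalize pairs of distinct centers closer than the margin
        dists = torch.cdist(centers, centers)  # (K, K) pairwise distances
        off_diag = ~torch.eye(len(centers), dtype=torch.bool, device=dists.device)
        separate = F.relu(self.separation_margin - dists[off_diag]).mean()
        return attract + separate

def adaptive_hinge_loss(scores, margin):
    """Hypothetical triplet-style hinge over an image-caption similarity
    matrix `scores` (B, B) whose diagonal holds the matching pairs; `margin`
    is a scalar that an outer loop would refine during training."""
    pos = scores.diag().view(-1, 1)
    # Penalize negatives that score within `margin` of the positive pair,
    # in both retrieval directions (image-to-caption and caption-to-image)
    cost_cap = F.relu(margin + scores - pos).fill_diagonal_(0)
    cost_img = F.relu(margin + scores - pos.t()).fill_diagonal_(0)
    return cost_cap.mean() + cost_img.mean()

# Example usage (hypothetical shapes):
# img, cap: (batch, dim) embeddings; ids: (batch,) concept assignments
# loss = centers_loss(img, cap, ids) + adaptive_hinge_loss(img @ cap.t(), margin)
```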
Last Updated: 11/06/2019