Advanced search options

You searched for subject:(Visual Semantic Embedding). One record found.


1. Ren, Zhou. Joint Image-Text Representation Learning.

Degree: Computer Science, 2016, UCLA

Making computers intelligent has long been a dream. Humans can understand information in multiple modalities such as video, text, and audio; teaching computers to jointly understand multi-modal information is a necessary and essential step toward artificial intelligence, and how to jointly represent multi-modal information is critical to that step. Although much effort has been devoted to exploring the representation of each modality individually, learning a joint multi-modal representation remains an open and challenging problem.

In this dissertation, we explore joint image-text representation models based on Visual-Semantic Embedding (VSE). VSE has recently been proposed and shown to be effective for joint representation. The key idea is that by learning a mapping from images into a semantic space, the algorithm learns a compact and effective joint representation. However, existing approaches simply map each text concept and each whole image to a single point in the semantic space. We propose several novel visual-semantic embedding models based on (1) text-concept modeling, (2) image-level modeling, and (3) object-level modeling. In particular, we first introduce a novel Gaussian Visual-Semantic Embedding (GVSE) model that leverages visual information to model text concepts as density distributions rather than single points in the semantic space. Then, we propose Multiple Instance Visual-Semantic Embedding (MIVSE) via image-level modeling, which discovers and maps semantically meaningful image sub-regions to their corresponding text labels. Next, we present a fine-grained object-level representation of images, Scene-Domain Active Part Models (SDAPM), which reconstructs and characterizes the 3D geometric statistics between an object's parts in the 3D scene domain.

Finally, we explore advanced joint representations for other visual and textual modalities, including joint image-sentence representation and joint video-sentence representation. Extensive experiments demonstrate that the proposed joint representation models are superior to existing methods on various tasks involving image, video, and text modalities, including image annotation, zero-shot learning, object and part detection, pose and viewpoint estimation, image classification, text-based image retrieval, image captioning, video annotation, and text-based video retrieval.
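The core VSE idea the abstract describes — learning a mapping from image features into the text-embedding space so that each image lands near its own label and away from others — is commonly trained with a pairwise ranking loss. The toy sketch below illustrates that scheme only; the dimensions, random data, linear map, and margin are all hypothetical choices for illustration, not the models proposed in the dissertation.

```python
# Toy sketch of the basic Visual-Semantic Embedding (VSE) training signal:
# a linear map W sends image features into the label-embedding space, and a
# hinge ranking loss pushes each image's similarity to its true label above
# its similarity to every wrong label by a margin. All data here is random.
import numpy as np

rng = np.random.default_rng(0)
D_IMG, D_SEM, N_LABELS, N = 16, 8, 4, 32            # toy dimensions

W = rng.normal(scale=0.1, size=(D_SEM, D_IMG))      # learned linear map
T = rng.normal(size=(N_LABELS, D_SEM))              # fixed label embeddings
T /= np.linalg.norm(T, axis=1, keepdims=True)       # unit-normalize labels

X = rng.normal(size=(N, D_IMG))                     # image features
y = rng.integers(0, N_LABELS, size=N)               # ground-truth labels

def ranking_loss(W, margin=0.2):
    """Mean hinge loss: true label must beat each wrong label by `margin`."""
    E = X @ W.T                                     # images in semantic space
    E /= np.linalg.norm(E, axis=1, keepdims=True)
    S = E @ T.T                                     # cosine similarities
    pos = S[np.arange(N), y][:, None]               # similarity to true label
    hinge = np.maximum(0.0, margin + S - pos)       # per-wrong-label violation
    hinge[np.arange(N), y] = 0.0                    # true label is not a negative
    return hinge.mean()

# A few steps of (numerical) gradient descent on W shrink the loss.
before = ranking_loss(W)
eps = 1e-4
for _ in range(30):
    base = ranking_loss(W)
    G = np.zeros_like(W)
    for i in range(W.shape[0]):
        for j in range(W.shape[1]):
            Wp = W.copy()
            Wp[i, j] += eps
            G[i, j] = (ranking_loss(Wp) - base) / eps
    W -= 0.1 * G
after = ranking_loss(W)
```

In practice the map is a deep network trained by backpropagation rather than a linear map with numerical gradients; the sketch only shows the shared-space ranking objective that single-point VSE approaches optimize, which is the formulation the dissertation's GVSE and MIVSE models generalize.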

Subjects/Keywords: Computer science; Joint image-text representation; Visual-Semantic Embedding



APA (6th Edition):

Ren, Z. (2016). Joint Image-Text Representation Learning. (Thesis). UCLA. Retrieved from http://www.escholarship.org/uc/item/66f282s6

Note: this citation may be lacking information needed for this citation format:
Not specified: Masters Thesis or Doctoral Dissertation

Chicago Manual of Style (16th Edition):

Ren, Zhou. “Joint Image-Text Representation Learning.” 2016. Thesis, UCLA. Accessed August 03, 2020. http://www.escholarship.org/uc/item/66f282s6.

Note: this citation may be lacking information needed for this citation format:
Not specified: Masters Thesis or Doctoral Dissertation

MLA Handbook (7th Edition):

Ren, Zhou. “Joint Image-Text Representation Learning.” 2016. Web. 03 Aug 2020.

Vancouver:

Ren Z. Joint Image-Text Representation Learning. [Internet] [Thesis]. UCLA; 2016. [cited 2020 Aug 03]. Available from: http://www.escholarship.org/uc/item/66f282s6.

Note: this citation may be lacking information needed for this citation format:
Not specified: Masters Thesis or Doctoral Dissertation

Council of Science Editors:

Ren Z. Joint Image-Text Representation Learning. [Thesis]. UCLA; 2016. Available from: http://www.escholarship.org/uc/item/66f282s6

Note: this citation may be lacking information needed for this citation format:
Not specified: Masters Thesis or Doctoral Dissertation
