Visual Cues for Text

In this project, the following multi-stage framework was developed to provide visual aid for a sequence of text-based instructions, in the form of a coherent image associated with each instruction. (a) For each instruction, visualisable phrases consisting of a head action verb and its noun phrases are mined using standard NLP techniques such as POS tagging, dependency parsing, and coreference resolution. (b) For each visualisable phrase, an API query is issued to retrieve a set of candidate images from a dataset crawled from sources such as WikiHow, Flickr, and Google; together, the phrases and images characterise the action performed in the instruction. (c) Across instructions that share information through latent or explicit entities, coherence is maintained with a graph-based matching method that uses Dijkstra's algorithm. A user study was conducted to validate both the improvement in understanding of the text instructions and the resemblance of the selected images to the actual ground truth.
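A minimal sketch of step (a), assuming spaCy as the NLP toolkit: the model name, the dependency labels, and the filtering heuristic below are illustrative assumptions rather than the project's actual pipeline, and coreference resolution is omitted here since it would typically be a separate preprocessing pass.

```python
# Sketch of step (a): mining visualisable (head verb, noun phrase) pairs
# from a single instruction using POS tags and the dependency parse.
import spacy

nlp = spacy.load("en_core_web_sm")  # small English model (assumed)

def mine_visualisable_phrases(instruction: str):
    """Return (head verb lemma, noun phrase) pairs, e.g. ('chop', 'the onions')."""
    doc = nlp(instruction)
    phrases = []
    for token in doc:
        if token.pos_ != "VERB":
            continue
        # Keep noun chunks whose syntactic root attaches to this verb as an object.
        for chunk in doc.noun_chunks:
            if chunk.root.head == token and chunk.root.dep_ in ("dobj", "obj"):
                phrases.append((token.lemma_, chunk.text))
    return phrases

print(mine_visualisable_phrases("Chop the onions and add them to the hot pan."))
# Expected (model-dependent): [('chop', 'the onions'), ('add', 'them')]
```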
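Step (c) can be read as a shortest-path problem: each instruction contributes a layer of candidate images, consecutive layers are connected by edges weighted by visual or entity dissimilarity, and Dijkstra's algorithm selects one image per layer along the cheapest path. The graph construction, the dissimilarity function, and the function name below are assumptions used for illustration, not the project's exact formulation.

```python
# Sketch of step (c): choosing one image per instruction via Dijkstra's
# algorithm over a layered graph of candidate images.
import heapq

def pick_coherent_sequence(candidates, dissimilarity):
    """candidates[i] is the list of candidate images for instruction i;
    dissimilarity(a, b) is the cost of placing image b right after image a.
    Returns one candidate index per instruction with minimal total cost."""
    n = len(candidates)
    # Heap entries: (accumulated cost, instruction index, candidate index, path so far)
    heap = [(0.0, 0, j, [j]) for j in range(len(candidates[0]))]
    heapq.heapify(heap)
    settled = set()
    while heap:
        cost, i, j, path = heapq.heappop(heap)
        if (i, j) in settled:
            continue
        settled.add((i, j))
        if i == n - 1:
            return path  # first last-layer node popped is the cheapest overall
        for k, nxt in enumerate(candidates[i + 1]):
            step = dissimilarity(candidates[i][j], nxt)
            heapq.heappush(heap, (cost + step, i + 1, k, path + [k]))
    return []

# Toy usage: "images" are 1-D feature vectors; the cost is their absolute
# difference (an assumed stand-in for a real image-similarity measure).
imgs = [[(0.0,), (5.0,)], [(0.2,), (4.0,)], [(0.3,), (9.0,)]]
print(pick_coherent_sequence(imgs, lambda a, b: abs(a[0] - b[0])))  # -> [0, 0, 0]
```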
