Video Representation Learning for Fine-Grained Scene Recognition and Retrieval.
Model Learning with limited training data.
Extract structured information from unstructured documents.
Enrichment of educational texts with supplementary information.
Provide visual aid for a sequence of text based instructions.