Abhinav Jain

Research Engineer

IBM Research Labs


Hi, my name is Abhinav Jain. I work as a Research Engineer at IBM Research, India. I am broadly interested in multi-modal analytics, where deep-learning-based algorithms are used to analyse content in text, images and videos for reasoning and further decision-making.

I have worked for two years on the IBM Watson Compare & Comply service for structured data extraction from business documents. I have also been working on smart data preparation for downstream processing in AI-based systems.


  • Deep Learning
  • Applied Artificial Intelligence
  • Computer Vision
  • Deep Reinforcement Learning


  • BTech in Electrical Engineering, 2017

    Indian Institute of Technology, Kanpur



Staff Research Software Engineer

IBM Research, India

Jul 2019 – Present, New Delhi

Software Developer

IBM Research, India

Jul 2017 – Jul 2019, New Delhi


Deep Metric Learning

Video Representation Learning for Fine-Grained Scene Recognition and Retrieval.

Evolving AI

Model Learning with limited training data.

Scanned PDF-to-HTML Conversion

Extract structured information from unstructured documents.

Text Enrichment

Enrichment of educational texts with supplementary information.

Visual Cues for Text

Provide visual aids for a sequence of text-based instructions.

Recent Publications

Simultaneous Optimisation of Image Quality Improvement and Text Content Extraction from Scanned Documents

In this paper, we propose to incorporate OCR performance into the loss function during training of single image super resolution (SISR) networks for document images.

Learning Convolutional Neural Networks with Deep Part Embeddings

In this paper, we propose a novel way of training CNNs with a small subset of training samples using Deep Part Embeddings.

Radial Loss for Learning Fine-grained Video Similarity Metric

In this paper, we propose the Radial Loss, which utilizes category and sub-category labels to learn an order-preserving fine-grained video similarity metric.

Pentuplet Loss for Simultaneous Shots and Critical Points Detection in a Video

In this paper, we propose a novel pentuplet loss to learn the frame image similarity metric through a pentuplet-based deep learning framework.

Content Driven Enrichment of Formal Text using Concept Definitions and Applications

We propose a text enrichment framework that enriches concepts from input text with their definitions, applications and a pre-requisite concept graph that showcases the inter-dependency within the extracted concepts.

Coherent Visual Description of Textual Instructions

In this paper, we present a novel multistage framework to convert textual instructions into coherent visual descriptions (text instructions annotated with images).