Image Captioning using Deep Learning based System

  • Omkar Sargar, Shakti Kinger
Keywords: Gated Recurrent Unit (GRU), image caption generation, EfficientNet B3, Artificial Intelligence

Abstract

Annotating images is a time-consuming task, and many recent Artificial Intelligence applications require image captions. Image captioning aims to produce a sound and thorough description that summarizes the contents of a picture. Deep learning-based approaches handle the complexity and difficulty of image captioning better than traditional approaches. This paper presents an encoder-decoder model with an attention mechanism that integrates image features to generate a semantically and syntactically sound description. The model is evaluated on Flickr8k, one of the available benchmark datasets. The system incorporates an EfficientNet B3 encoder that extracts features from an image, and it builds the corpus from the captions mapped to the images. For caption generation, the model combines attention with a Gated Recurrent Unit (GRU) to produce the image description. Cosine similarity and N-gram feature extraction are used to compute the BLEU score over the entire testing dataset, and the results show the effectiveness of the proposed system.
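The abstract does not spell out how the N-gram and cosine-similarity measures are computed, so the following is only a minimal sketch of one common way to score a generated caption against a reference caption. The function names (`ngrams`, `modified_precision`, `cosine_similarity`, `bleu`), the single-reference setup, and the choice of `max_n=2` are assumptions made for illustration, not the authors' implementation.

```python
from collections import Counter
import math


def ngrams(tokens, n):
    """Return the list of n-grams (as tuples) in a token sequence."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def modified_precision(candidate, reference, n):
    """Clipped n-gram precision of a candidate against one reference."""
    cand_counts = Counter(ngrams(candidate, n))
    ref_counts = Counter(ngrams(reference, n))
    total = sum(cand_counts.values())
    if total == 0:
        return 0.0
    # Each candidate n-gram is credited at most as often as it occurs
    # in the reference (the "clipping" step of BLEU).
    clipped = sum(min(count, ref_counts[gram])
                  for gram, count in cand_counts.items())
    return clipped / total


def cosine_similarity(candidate, reference, n=1):
    """Cosine similarity between the n-gram count vectors of two captions."""
    a = Counter(ngrams(candidate, n))
    b = Counter(ngrams(reference, n))
    dot = sum(a[g] * b[g] for g in a)
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0


def bleu(candidate, reference, max_n=2):
    """Single-reference BLEU: brevity penalty times the geometric mean
    of the clipped n-gram precisions for n = 1..max_n (assumed setup)."""
    precisions = [modified_precision(candidate, reference, n)
                  for n in range(1, max_n + 1)]
    if min(precisions) == 0.0:
        return 0.0
    log_mean = sum(math.log(p) for p in precisions) / max_n
    c, r = len(candidate), len(reference)
    brevity_penalty = 1.0 if c > r else math.exp(1 - r / c)
    return brevity_penalty * math.exp(log_mean)


# Hypothetical captions, not taken from Flickr8k.
generated = "a dog runs on the grass".split()
reference = "a dog is running on the grass".split()
print(bleu(generated, reference), cosine_similarity(generated, reference))
```

Averaging such per-caption scores over a test split gives a corpus-level figure. Note that standard corpus BLEU pools the n-gram counts across all captions before taking the ratio, which can differ from a simple average of per-caption scores.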

Published
2021-08-25
How to Cite
Sargar, O., & Kinger, S. (2021). Image Captioning using Deep Learning based System. Design Engineering, 3903-3917. Retrieved from http://thedesignengineering.com/index.php/DE/article/view/3752
Section
Articles