Improving Image Captioning with Local Attention Mechanism

Research

Title	Improving Image Captioning with Local Attention Mechanism
Type	Presentation
Keywords	deep learning, image captioning, attention mechanism, encoder-decoder
Year	2022
Researchers	Hassan Khotanlou

Abstract

Image caption generation is a field of research at the intersection of machine vision and natural language processing. Existing results indicate that it is difficult for a machine to understand an image the way a human does. Most methods proposed for automatic image description follow the encoder-decoder framework. In these methods, the word at step n is generated from the image features and the previously generated words. Recently, the attention mechanism, which usually creates a spatial map highlighting the image regions associated with each word, has been widely used in research. This paper also uses the encoder-decoder framework. The encoder of our model uses ResNet101 to extract image features, and the decoder consists of three parts: Attention-LSTM, Language-LSTM, and Attention-Layer. The paper uses an attention mechanism that draws on local evidence to better represent image features. Our method generated good captions and improved the METEOR and ROUGE evaluation metrics.
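To make the idea of a spatial attention map concrete, the following is a minimal sketch of one generic soft-attention step in plain Python. It is not the local attention mechanism proposed in the paper, whose details the abstract does not give. The dot-product scoring, the function names `softmax` and `attend`, and the toy feature values are all assumptions chosen for illustration.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(region_features, hidden_state):
    """One generic soft-attention step (dot-product scoring is an assumption).

    region_features: list of K feature vectors, one per image region
    hidden_state:    decoder state vector with the same dimension
    Returns (weights, context): the attention map over the K regions and
    the attention-weighted sum of the region features.
    """
    # Score each region by its similarity to the current decoder state.
    scores = [
        sum(f * h for f, h in zip(feat, hidden_state))
        for feat in region_features
    ]
    # Normalise the scores into a distribution over regions.
    weights = softmax(scores)
    # Build the context vector as the weighted sum of region features.
    dim = len(hidden_state)
    context = [
        sum(w * feat[d] for w, feat in zip(weights, region_features))
        for d in range(dim)
    ]
    return weights, context


# Toy input: three regions with 2-D features (hypothetical values).
regions = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
state = [2.0, 0.0]
weights, context = attend(regions, state)
```

In a captioning decoder, a context vector of this kind would be recomputed at every step and passed, together with the previously generated word, to the language model that predicts the next word.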
