Bu-Ali Sina University Research System | Improving Image Captioning with Local Attention Mechanism

Title	Improving Image Captioning with Local Attention Mechanism
Research type	Conference paper
Keywords	deep learning, image captioning, attention mechanism, encoder-decoder
Abstract	Image caption generation is a research field at the intersection of machine vision and natural language processing. Evidence suggests that understanding an image the way a human does is difficult for a machine. Most methods proposed for automatic image description follow the encoder-decoder framework. In these methods, the word at step n is generated from the image features and the previously generated words. Recently, the attention mechanism, which usually produces a spatial map highlighting the image regions associated with each word, has been widely used in research. This paper also uses the encoder-decoder framework. The encoder of our model uses ResNet101 to extract image features, and the decoder consists of three parts: an Attention-LSTM, a Language-LSTM, and an Attention-Layer. The model uses an attention mechanism that exploits local evidence to better represent image features. Our method generates good captions and improves the METEOR and ROUGE evaluation metrics.
Researchers	Zahra Famil Sattari (first author), Hassan Khotanlou (second author), Elham Alighardash (third author)
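The abstract describes a decoder in which an attention layer weights spatial image features before the next word is generated, but it does not give the formulation of the local attention mechanism. The following is therefore only a minimal sketch of standard additive (Bahdanau-style) soft attention over a grid of encoder features, not the authors' method. It assumes the ResNet101 feature map has been flattened into k region vectors (for a 224x224 input, the last convolutional block yields a 7x7 grid of 2048-dimensional vectors; tiny sizes are used here), and that the query is a decoder hidden state. In a bottom-up/top-down style decoder (an assumption about how the Attention-LSTM and Language-LSTM interact), that query would be the Attention-LSTM state and the returned context would feed the Language-LSTM. All names (`additive_attention`, `W_v`, `W_h`, `w_a`) are hypothetical, and the weights are random placeholders rather than trained parameters.

```python
import math
import random


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def matvec(matrix, vec):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, vec)) for row in matrix]


def additive_attention(features, hidden, W_v, W_h, w_a):
    """Hypothetical soft attention over flattened spatial features.

    features: k region vectors of size d_v (flattened encoder grid)
    hidden:   decoder hidden state used as the query, size d_h
    W_v: d_a x d_v, W_h: d_a x d_h, w_a: vector of size d_a
    Returns (weights, context): k weights summing to 1 (the spatial map)
    and the weighted sum of the region vectors (size d_v).
    """
    query = matvec(W_h, hidden)
    scores = []
    for v in features:
        proj = matvec(W_v, v)
        # e_i = w_a^T tanh(W_v v_i + W_h h)
        scores.append(sum(a * math.tanh(p + q) for a, p, q in zip(w_a, proj, query)))
    weights = softmax(scores)
    d_v = len(features[0])
    context = [sum(weights[i] * features[i][j] for i in range(len(features)))
               for j in range(d_v)]
    return weights, context


def random_matrix(rows, cols, rng):
    """Placeholder parameters; a real model would learn these."""
    return [[rng.uniform(-1.0, 1.0) for _ in range(cols)] for _ in range(rows)]


rng = random.Random(0)
features = random_matrix(4, 3, rng)   # a 2x2 grid flattened to 4 regions, d_v = 3
hidden = [rng.uniform(-1.0, 1.0) for _ in range(2)]  # d_h = 2
W_v = random_matrix(2, 3, rng)        # d_a = 2
W_h = random_matrix(2, 2, rng)
w_a = [rng.uniform(-1.0, 1.0) for _ in range(2)]

weights, context = additive_attention(features, hidden, W_v, W_h, w_a)
print("attention weights:", [round(w, 3) for w in weights])
print("context vector:", [round(c, 3) for c in context])
```

Because the weights come from a softmax, the context vector is always a convex combination of the region features; a local variant would change which regions receive non-negligible weight, not this overall structure.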