[Natural Language Processing] A Critical Review of Neural Attention Models in Natural Language Processing

Title:

Attention, please! A Critical Review of Neural Attention Models in Natural Language Processing

Authors:

Andrea Galassi, Marco Lippi, Paolo Torroni

Computation and Language (cs.CL)

(Submitted on 4 Feb 2019)

Link:

https://arxiv.org/abs/1902.02181

Abstract

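Attention mechanisms are increasingly popular across a wide range of neural architectures. Because the field is moving so quickly, a systematic overview of attention is still lacking. In this paper, we define a unified model of attention architectures for natural language processing, focusing on architectures designed to work with vector representations of textual data. We discuss the dimensions along which proposals differ and the possible uses of attention, and we chart the main research activities and open challenges in the field.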

Key points

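Figure 1. Architecture of RNNsearch (Bahdanau et al., 2015) (left) and its attention model (right).

Figure 2. The core attention model.

Figure 3. The general attention model.

Figure 4. Example of attention in a sequence-to-sequence model.

Figure 5. Hierarchical input attention models defined by Yang et al. (2016b) (left), Zhao and Zhang (2018) (center), and Ma et al. (2018) (right). The attention functions at different levels are applied in sequence, from left to right.

Figure 6. Coarse-grained co-attention models by Lu et al. (2016) (left) and Ma et al. (2017) (right).

Figure 7. Fine-grained co-attention models proposed by dos Santos et al. (2016) (left) and Cui et al. (2017) (right). Dashed lines show how the max-pooling/distribution functions are applied (column-wise or row-wise).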
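
As a rough illustration of what a core attention step (Figure 2) typically computes, here is a minimal pure-Python sketch. It is not taken from the paper: the function names (`attend`, `softmax`, `dot`), the dot-product compatibility function, and the toy vectors are all assumptions chosen for illustration. The idea is that a query is scored against each key, the scores are normalized into weights, and the weights are used to average the values into a context vector.

```python
import math

def softmax(scores):
    """Distribution function: turn raw scores into weights that sum to 1."""
    peak = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def dot(u, v):
    """Dot-product compatibility between two vectors (one assumed choice)."""
    return sum(a * b for a, b in zip(u, v))

def attend(query, keys, values, compatibility=dot):
    """Return (weights, context) for one query over key/value vectors."""
    # 1. Score each key against the query.
    scores = [compatibility(query, k) for k in keys]
    # 2. Normalize the scores into attention weights.
    weights = softmax(scores)
    # 3. Context = weighted sum of the value vectors.
    dim = len(values[0])
    context = [sum(w * v[i] for w, v in zip(weights, values)) for i in range(dim)]
    return weights, context

# Toy data (hypothetical): three 2-dimensional keys and values.
keys = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
values = [[10.0, 0.0], [0.0, 10.0], [5.0, 5.0]]
query = [2.0, 0.0]

weights, context = attend(query, keys, values)
print(weights)
print(context)
```

In this toy example the first and third keys match the query equally well, so they receive equal weight, and the second key receives the least. Swapping `dot` for another compatibility function, or `softmax` for another distribution function, changes the behavior without changing the overall structure.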

Original English text

Attention is an increasingly popular mechanism used in a wide range of neural architectures. Because of the fast-paced advances in this domain, a systematic overview of attention is still missing. In this article, we define a unified model for attention architectures for natural language processing, with a focus on architectures designed to work with vector representation of the textual data. We discuss the dimensions along which proposals differ, the possible uses of attention, and chart the major research activities and open challenges in the area.