查看“线性整流函数”的源代码

{{ request translation }}
[[Image:Ramp function.svg|整流線性單位函数|thumb|325px|right]]
{{机器学习导航栏}}

'''整流線性單位函数'''（Rectified Linear Unit, '''ReLU'''）,又称'''修正线性单元''', 是一种[[人工神经网络]]中常用的激勵函数（activation function），通常指代以[[斜坡函数]]及其变种为代表的非线性函数。

比较常用的线性整流函数有[[斜坡函数]] <math>f(x) = \max(0, x)</math>，以及带泄露整流函数 (Leaky ReLU)，其中 <math>x</math> 为神经元(Neuron)的输入。线性整流被认为有一定的生物学原理<ref name="glorot2011">{{cite conference |authors=Xavier Glorot, Antoine Bordes and [[Yoshua Bengio]] |year=2011 |title=Deep sparse rectifier neural networks |conference=AISTATS |url=http://jmlr.org/proceedings/papers/v15/glorot11a/glorot11a.pdf}}</ref>，并且由于在实践中通常有着比其他常用激勵函数（譬如[[逻辑函数]]）更好的效果，而被如今的[[深度学习|深度神经网络]]广泛使用于诸如图像识别等[[计算机视觉]]<ref name="glorot2011"/>人工智能领域。

== 定义 ==
通常意义下，线性整流函数指代数学中的[[斜坡函数]]，即

:<math>f(x) = \max(0, x)</math>

而在神经网络中，线性整流作为神经元的激活函数，定义了该神经元在线性变换 <math> \mathbf{w}^T \mathbf{x} + b </math>之后的非线性输出结果。换言之，对于进入神经元的来自上一层神经网络的输入向量 <math>x</math>，使用线性整流激活函数的神经元会输出

:<math>\max(0, \mathbf{w}^T \mathbf{x} + b)</math>

至下一层神经元或作为整个神经网络的输出（取决现神经元在网络结构中所处位置）。

== 变种 ==
线性整流函数在基于[[斜坡函数]]的基础上有其他同样被广泛应用于深度学习的变种，譬如带泄露线性整流(Leaky ReLU)<ref name="leakyrelu">{{cite conference|authors=Andrew L. Maas, Awni Y. Hannum and [[Andrew Ng | Andrew Y. Ng]]|year=2013|title=Rectified Nonlinearities Improve Neural Network Acoustic Models|conference=[[International Conference on Machine Learning|ICML]]|url=https://ai.stanford.edu/~amaas/papers/relu_hybrid_icml2013_final.pdf}}</ref>， 带泄露随机线性整流(Randomized Leaky ReLU)<ref name="randomizedleakyrelu">{{cite arXiv |last1=Xu |first1=Bing |last2=Wang |first2=Naiyan |last3=Chen |first3=Tianqi |last4=Li |first4=Mu |date=2015 |title=Empirical Evaluation of Rectified Activations in Convolution Network |eprint=1505.00853v2 |url=https://arxiv.org/pdf/1505.00853.pdf}}</ref>，以及噪声线性整流(Noisy ReLU)<ref name="nair2010">{{cite conference |authors=Vinod Nair and [[Geoffrey Hinton]] |year=2010 |title=Rectified linear units improve restricted Boltzmann machines |conference=[[International Conference on Machine Learning|ICML]] |url=http://machinelearning.wustl.edu/mlpapers/paper_files/icml2010_NairH10.pdf |deadurl=yes |archiveurl=https://web.archive.org/web/20140324020659/http://machinelearning.wustl.edu/mlpapers/paper_files/icml2010_NairH10.pdf |archivedate=2014-03-24 }}</ref>.

=== 带泄露线性整流 ===
在输入值 <math>x</math> 为负的时候，'''带泄露线性整流函数'''（Leaky ReLU）的梯度为一个常数 <math>\lambda \in (0, 1)</math>，而不是0。在输入值为正的时候，带泄露线性整流函数和普通斜坡函数保持一致。换言之，

:<math>f(x)  = \begin{cases}
    x & \mbox{if } x > 0 \\
    \lambda x & \mbox{if } x \leq 0
\end{cases}</math>

在深度学习中，如果设定 <math>\lambda</math> 为一个可通过[[反向传播算法]]（Backpropagation）学习的变量，那么带泄露线性整流又被称为'''参数线性整流'''（Parametric ReLU）<ref name="parametricrelu">{{cite arXiv |last1=He |first1=Kaiming |last2=Zhang |first2=Xiangyu |last3=Ren |first3=Shaoqing |last4=Sun |first4=Jian |date=2015 |title=Delving Deep into Rectifiers:Surpassing Human-Level Performance on ImageNet Classification|eprint=1502.01852v1|url=https://arxiv.org/pdf/1502.01852.pdf}}</ref>。

=== 带泄露随机线性整流 ===
'''带泄露随机线性整流'''（Randomized Leaky ReLU, '''RReLU'''）最早是在Kaggle全美数据科学大赛（NDSB）中被首先提出并使用的。相比于普通带泄露线性整流函数，带泄露随机线性整流在负输入值段的函数梯度 <math>\lambda</math> 是一个取自[[连续性均匀分布]] <math>U(l,u)</math> 概率模型的随机变量，即

:<math>f(x)  = \begin{cases}
    x & \mbox{if } x > 0 \\
    \lambda x & \mbox{if } x \leq 0
\end{cases}</math>

其中 <math>\lambda \sim U(l,u), l < u</math> 且 <math>l,u \in [0,1)</math>。

=== 噪声线性整流 ===
'''噪声线性整流'''（Noisy ReLU）是修正线性单元在考虑[[高斯噪声]]的基础上进行改进的变种激活函数。对于神经元的输入值 <math>x</math>，噪声线性整流加上了一定程度的正态分布的不确定性，即

:<math>f(x)=\max(0, x+Y)</math>

其中随机变量 <math>Y \sim \mathcal{N}(0, \sigma(x))</math>。目前，噪声线性整流函数在[[受限玻尔兹曼机]]（Restricted Boltzmann Machine）在计算机图形学的应用中取得了比较好的成果<ref name="nair2010"/>。

== 优势 ==
相比于传统的神经网络激活函数，诸如[[逻辑函数]]（Logistic sigmoid）和tanh等[[双曲函数]]，线性整流函数有着以下几方面的优势：
*仿生物学原理：相关大脑方面的研究表明生物神经元的信息编码通常是比较分散及稀疏的<ref name="brainresearch">{{cite conference |authors=David Attwell and Simon B. Laughlin |year=2001 |title=An energy budget for signaling in the grey matter of the brain |conference=JCBFM |url=http://jcb.sagepub.com/content/21/10/1133.long }}{{Dead link|date=2018年6月 |bot=InternetArchiveBot |fix-attempted=no }}</ref>。通常情况下，大脑中在同一时间大概只有1%-4%的神经元处于活跃状态。使用线性修正以及正则化（regularization）可以对机器神经网络中神经元的活跃度（即输出为正值）进行调试；相比之下，逻辑函数在输入为0时达到 <math>\frac{1}{2}</math>，即已经是半饱和的稳定状态，不够符合实际生物学对模拟神经网络的期望<ref name="glorot2011"/>。不过需要指出的是，一般情况下，在一个使用修正线性单元（即线性整流）的神经网络中大概有50%的神经元处于激活态<ref name="glorot2011"/>。

*更加有效率的[[梯度下降法|梯度下降]]以及反向传播：避免了梯度爆炸和[[梯度消失问题|梯度消失]]问题

*简化计算过程：没有了其他复杂激活函数中诸如指数函数的影响；同时活跃度的分散性使得神经网络整体计算成本下降

== 参考资料 ==
<!--- 参见https://zh.wikipedia.org/wiki/Wikipedia:%E8%84%9A%E6%B3%A8以获知如何使用<ref></ref>标签来插入参考资料，它们会自动显示在下方。 -->
{{Reflist}}

== 外部链接 ==
* [https://www.quora.com/What-is-special-about-rectifier-neural-units-used-in-NN-learning Quora: What is special about rectifier neural units used in NN learning?]  


<!--- 分类 --->

[[Category:人工神经网络]]