Element-wise activation functions play a critical role in deep neural networks by affecting expressive power and learning dynamics. We propose a new perspective on learnable activation functions by formulating them with an element-wise attention mechanism. The resulting Attention-based Rectified Linear Unit (AReLU) significantly boosts the performance of most mainstream network architectures while introducing only two extra learnable parameters per layer. AReLU facilitates fast network training under small learning rates, which makes it especially well suited to transfer learning. Our source code has been released (https://github.com/densechen/AReLU).

