Abstract:
To study the mechanism of activation functions in depth and to identify the properties a good activation function should have in order to improve the generalization ability of convolutional neural network models, this article reviews the development of activation functions and analyzes the properties a good activation function should possess. Activation functions can be roughly divided into "S-shaped" activation functions, "ReLU-type" activation functions, combined activation functions, and other types. In the early stage of deep learning, "S-shaped" activation functions were widely used; as network models deepened, their "vanishing gradient" problem was gradually discovered. The emergence of the ReLU activation function alleviated this problem, but setting ReLU's negative half-axis to 0 introduced the "dying neuron" problem. Most subsequently improved activation functions modify the negative half-axis of ReLU to mitigate dying neurons. Finally, taking the multilayer perceptron as an example, the article derives the role of a good activation function in forward and backward propagation and the properties it should possess.
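The two failure modes named above can be made concrete with a minimal numerical sketch (a plain NumPy illustration written for this summary, not code from the paper itself):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def sigmoid_grad(x):
    s = sigmoid(x)
    return s * (1.0 - s)  # bounded above by 0.25

def relu_grad(x):
    # derivative of max(0, x): 1 for positive inputs, 0 otherwise
    return float(x > 0)

# Backpropagation multiplies one activation derivative per layer.
# Through 20 sigmoid layers the product shrinks toward zero -- the
# "vanishing gradient" problem of S-shaped activations.
chain_sigmoid = np.prod([sigmoid_grad(0.0)] * 20)  # 0.25**20, vanishingly small

# Through 20 active ReLU units the gradient passes intact.
chain_relu = np.prod([relu_grad(1.0)] * 20)        # 1.0

# But a ReLU unit with negative input passes zero gradient and stops
# updating -- the "dying neuron" problem on the negative half-axis.
dead = relu_grad(-1.0)                             # 0.0
```

This contrast is exactly why most of the improved functions the article surveys modify only ReLU's negative half-axis: the positive half-axis already preserves gradient magnitude.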