Brownian-Swish: A Hybrid Stochastic Activation for Deep Neural Networks
Abstract
Activation functions play a central role in deep learning, yet conventional deterministic choices such as ReLU, Swish, and Mish can struggle to deliver both smooth optimization and robustness to uncertainty. Stochastic alternatives have been explored, but they have lacked rigorous theoretical underpinnings and consistent empirical evidence of improved performance. In this paper, we introduce Brownian-Swish, a hybrid activation function that combines the smooth nonlinearity of Swish with adaptive stochastic perturbations derived from Brownian motion. We establish its key properties: differentiability, convergence to Swish, gradient stability, and implicit regularization. Across extensive experiments spanning vision (CIFAR-10, CIFAR-100), natural language processing (IMDB sentiment, AG News), and sequential forecasting, Brownian-Swish consistently achieves higher accuracy, greater stability under noise, and a smaller generalization gap than both deterministic and stochastic baselines; ablation studies further confirm comparable convergence behaviour. These results position Brownian-Swish as a foundational component for stochastic deep learning and a basis for building more robust, generalizable models.
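To make the construction concrete, the following is a minimal illustrative sketch, not the paper's reference implementation. It assumes the Brownian-Swish output is the Swish value x * sigmoid(x) plus a Brownian increment (a zero-mean Gaussian scaled by sqrt(dt)) weighted by a noise scale sigma during training, reducing to plain Swish at inference, consistent with the convergence-to-Swish property stated above. The class name and the sigma and dt parameters are hypothetical.

import torch
import torch.nn as nn

class BrownianSwish(nn.Module):
    """Illustrative sketch of a Brownian-Swish activation (assumed form).

    Assumption: Swish, x * sigmoid(x), is perturbed during training by a
    Brownian increment sqrt(dt) * N(0, I) scaled by `sigma`; at evaluation
    time (or as sigma -> 0) it reduces to deterministic Swish.
    """

    def __init__(self, sigma: float = 0.1, dt: float = 1.0):
        super().__init__()
        self.sigma = sigma  # perturbation scale (hypothetical parameter)
        self.dt = dt        # Brownian time step (hypothetical parameter)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        swish = x * torch.sigmoid(x)
        if self.training and self.sigma > 0:
            # Brownian increment: zero-mean Gaussian with std sqrt(dt)
            noise = torch.randn_like(x) * (self.dt ** 0.5)
            return swish + self.sigma * noise
        return swish  # deterministic Swish at inference

# Usage: drop in wherever a deterministic activation would appear, e.g.
# layer = nn.Sequential(nn.Linear(128, 128), BrownianSwish(sigma=0.1))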