# Softplus

> Mediated Wiki article. Canonical URL: https://mediated.wiki/source/Softplus
> Markdown URL: https://mediated.wiki/source/Softplus.md
> Source: https://en.wikipedia.org/wiki/Softplus
> Source revision: 1339378855
> License: Creative Commons Attribution-ShareAlike 4.0 International (https://creativecommons.org/licenses/by-sa/4.0/)

{{Short description|Smoothed ramp function}}
[[File:Softplus.svg|thumb|320px|Plot of the '''softplus''' function and the [ramp function](/source/ramp_function)]]

In [mathematics](/source/mathematics) and [machine learning](/source/machine_learning), the '''softplus''' function is

: <math>f(x) = \ln(1 + e^x).</math>

It is a smooth approximation (in fact, an [analytic function](/source/analytic_function)) to the [ramp function](/source/ramp_function), which is known as the ''[rectifier](/source/Rectifier_(neural_networks))'' or ''ReLU (rectified linear unit)'' in machine learning. For large negative <math>x</math>, the term <math>\epsilon = e^x</math> is small, so <math>\ln(1 + e^x) = \ln (1 + \epsilon) \gtrapprox \ln 1 = 0</math>, just above 0; for large positive <math>x</math>, <math>\ln(1 + e^x) \gtrapprox \ln(e^x) = x</math>, just above <math>x</math>.
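
The two limits above can be checked numerically. The following is a minimal sketch, not a reference implementation: the function names `softplus` and `ramp` are chosen here for illustration, and the rearrangement <math>\ln(1 + e^x) = \max(x, 0) + \ln(1 + e^{-|x|})</math> is one common way to keep the exponential from overflowing.

```python
import math


def softplus(x: float) -> float:
    """Return ln(1 + e^x), rearranged so that exp() never overflows."""
    # ln(1 + e^x) = max(x, 0) + ln(1 + e^(-|x|)); the exponent is always <= 0.
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def ramp(x: float) -> float:
    """Ramp function (ReLU): max(0, x)."""
    return max(x, 0.0)


if __name__ == "__main__":
    for x in (-30.0, -2.0, 0.0, 2.0, 30.0, 1000.0):
        # The gap softplus(x) - ramp(x) is largest at 0 and shrinks in both tails.
        print(f"x={x:8.1f}  softplus={softplus(x):.6g}  ramp={ramp(x):.6g}")
```

Evaluating `math.log(1 + math.exp(x))` directly would raise an overflow error for large positive `x`, which is why the sketch uses the rearranged form.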

The names ''softplus''<ref>{{Cite journal |last1=Dugas |first1=Charles
 |last2=Bengio |first2=Yoshua
 |last3=Bélisle |first3=François
 |last4=Nadeau |first4=Claude
 |last5=Garcia |first5=René
 |year=2000 |title=Incorporating second-order functional knowledge for better option pricing
 |url=http://papers.nips.cc/paper/1920-incorporating-second-order-functional-knowledge-for-better-option-pricing.pdf
 |journal=Proceedings of the 13th International Conference on Neural Information Processing Systems (NIPS'00)
 |publisher=MIT Press
 |pages=451–457
 |quote=Since the sigmoid ''h'' has a positive first derivative, its primitive, which we call softplus, is convex.
}}</ref><ref>{{Cite journal |last=Glorot |first=Xavier |last2=Bordes |first2=Antoine |last3=Bengio |first3=Yoshua |date=2011-06-14 |title=Deep Sparse Rectifier Neural Networks |url=https://proceedings.mlr.press/v15/glorot11a.html |journal=Proceedings of the Fourteenth International Conference on Artificial Intelligence and Statistics |language=en |publisher=JMLR Workshop and Conference Proceedings |pages=315–323 |quote=Rectifier and softplus activation functions. The second one is a smooth version of the first.}}</ref> and ''SmoothReLU''<ref>{{Cite web |date=2017 |title=Smooth Rectifier Linear Unit (SmoothReLU) Forward Layer |url=https://software.intel.com/sites/products/documentation/doclib/daal/daal-user-and-reference-guides/daal_prog_guide/GUID-FAC73B9B-A597-4F7D-A5C4-46707E4A92A0.htm |url-status=dead |access-date=2018-12-04 |website=Developer Guide for Intel Data Analytics Acceleration Library |language=en-US}}</ref> are used in machine learning. The name "softplus" (2000), by analogy with the earlier [softmax](/source/softmax) (1989), presumably arises because the function is a smooth (''soft'') approximation of the positive part of {{mvar|x}}, which is sometimes denoted with a superscript ''plus'', <math>x^+ := \max(0, x)</math>.

{{See also|Rectifier (neural networks)#Softplus}}

==Alternative forms==
This function can be approximated as:
: <math>\ln\left( 1 + e^x \right) \approx \begin{cases} \ln2, & x=0,\\[6pt] \frac x {1-e^{-x/\ln2}}, & x\neq 0 \end{cases}</math>

Substituting <math>x = y\ln(2)</math> and dividing both sides by <math>\ln(2)</math> gives the equivalent form
: <math>\log_2(1 + 2^y) \approx \begin{cases} 1,& y=0,\\[6pt] \frac{y}{1-e^{-y}}, & y\neq 0. \end{cases}</math>
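
As a rough illustration of how close the approximation is, the sketch below compares it with the exact value at a few points. The names `softplus` and `softplus_approx` are hypothetical, and the comparison is restricted to a moderate range of <math>x</math> because the exponential in the denominator overflows for very negative arguments.

```python
import math


def softplus(x: float) -> float:
    """Reference value ln(1 + e^x), computed stably."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def softplus_approx(x: float) -> float:
    """Approximation x / (1 - e^(-x / ln 2)), with the value ln 2 at x = 0."""
    if x == 0.0:
        return math.log(2.0)
    # -expm1(t) equals 1 - e^t and stays accurate when t is close to 0.
    return x / -math.expm1(-x / math.log(2.0))


if __name__ == "__main__":
    # Sampled on a moderate range only: exp() would overflow for very negative x.
    for x in (-5.0, -1.0, 0.0, 1.0, 5.0):
        exact, approx = softplus(x), softplus_approx(x)
        print(f"x={x:5.1f}  exact={exact:.5f}  approx={approx:.5f}  diff={exact - approx:+.5f}")
```

The special case at <math>x = 0</math> matches the limit of the quotient, so the approximation is continuous there.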

A sharpness parameter <math>k</math> may be included:
: <math>f(x) = \frac{\ln(1 + e^{kx})} k, \qquad\qquad
f'(x) = \frac{e^{kx}}{1 + e^{kx}} = \frac{1}{1 + e^{-kx}}. </math>
Additionally, softplus is the negative logarithm of the sigmoid function evaluated at <math>-x</math>:
: <math>-\ln(\text{sigmoid}(-x)) = -\ln\left(\frac{1}{1+e^x}\right) = \ln\left(1+e^x\right) = \text{softplus}(x)</math>
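
A short sketch of both points, assuming <math>k > 0</math>: it evaluates the sharpened form for several values of <math>k</math> and checks the relation to the logarithm of the sigmoid. The names `softplus_k` and `sigmoid` are illustrative choices, not taken from any particular library.

```python
import math


def softplus_k(x: float, k: float = 1.0) -> float:
    """Return ln(1 + e^(k x)) / k for k > 0, computed stably."""
    z = k * x
    return (max(z, 0.0) + math.log1p(math.exp(-abs(z)))) / k


def sigmoid(x: float) -> float:
    """Logistic sigmoid 1 / (1 + e^(-x)), branching on sign to avoid overflow."""
    if x >= 0.0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


if __name__ == "__main__":
    # Larger k pulls the curve toward max(0, x); at x = 0 the value is ln(2) / k.
    for k in (1.0, 4.0, 16.0):
        print(k, softplus_k(0.0, k), softplus_k(1.0, k))
    # Identity from the text: -ln(sigmoid(-x)) equals softplus(x).
    for x in (-3.0, 0.0, 3.0):
        print(x, -math.log(sigmoid(-x)), softplus_k(x))
```

The gap between the sharpened softplus and <math>\max(0, x)</math> is largest at <math>x = 0</math>, where it equals <math>\ln(2)/k</math>, so increasing <math>k</math> tightens the approximation everywhere.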

==Related functions==
The derivative of softplus is the [standard logistic function](/source/logistic_function):
:<math>f'(x) = \frac{e^{x}}{1 + e^{x}} = \frac{1}{1 + e^{-x}}</math>

The logistic function, also called the [sigmoid function](/source/sigmoid_function), is a smooth approximation of the derivative of the rectifier, the [Heaviside step function](/source/Heaviside_step_function).
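
The derivative relation can be verified with a finite-difference check. This is a sketch under stated assumptions: `central_difference` is a hypothetical helper, and the step size `h` is an arbitrary choice that works for the moderate inputs used here.

```python
import math
from typing import Callable


def softplus(x: float) -> float:
    """ln(1 + e^x), computed stably."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def logistic(x: float) -> float:
    """Standard logistic function 1 / (1 + e^(-x)), safe for either sign of x."""
    if x >= 0.0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def central_difference(f: Callable[[float], float], x: float, h: float = 1e-5) -> float:
    """Symmetric finite-difference estimate of f'(x)."""
    return (f(x + h) - f(x - h)) / (2.0 * h)


if __name__ == "__main__":
    for x in (-20.0, -3.0, 0.0, 2.0, 20.0):
        # The two columns should agree closely; the tails saturate toward 0 and 1.
        print(f"x={x:6.1f}  numeric={central_difference(softplus, x):.8f}  logistic={logistic(x):.8f}")
```

In the tails the logistic values approach 0 and 1, which is the step-like behaviour that links it to the Heaviside step function.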

===LogSumExp===
{{main|LogSumExp}}

The multivariable generalization of single-variable softplus is the [LogSumExp](/source/LogSumExp) with the first argument set to zero:

: <math>\operatorname{LSE_0}^+(x_1, \dots, x_n) := \operatorname{LSE}(0, x_1, \dots, x_n) = \ln(1 + e^{x_1} + \cdots + e^{x_n}).</math>

The LogSumExp function is

: <math>\operatorname{LSE}(x_1, \dots, x_n) = \ln(e^{x_1} + \cdots + e^{x_n}),</math>

and its gradient is the [softmax](/source/softmax_function); the softmax with the first argument set to zero is the multivariable generalization of the logistic function. Both LogSumExp and softmax are used in machine learning.
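
The relations in this section can be sketched in a few lines. The helper names (`logsumexp`, `lse0_plus`, `softmax`, `numerical_gradient`) are illustrative, and subtracting the maximum before exponentiating is a standard stabilisation assumed here rather than something stated in the text.

```python
import math
from typing import Callable, List, Sequence


def logsumexp(xs: Sequence[float]) -> float:
    """ln(e^{x_1} + ... + e^{x_n}), shifted by the maximum for stability."""
    m = max(xs)
    return m + math.log(sum(math.exp(x - m) for x in xs))


def lse0_plus(xs: Sequence[float]) -> float:
    """LogSumExp with an extra first argument fixed at zero."""
    return logsumexp([0.0, *xs])


def softmax(xs: Sequence[float]) -> List[float]:
    """Softmax of xs, shifted by the maximum for stability."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def numerical_gradient(f: Callable[[Sequence[float]], float],
                       xs: Sequence[float], h: float = 1e-6) -> List[float]:
    """Central-difference estimate of the gradient of f at xs."""
    grad = []
    for i in range(len(xs)):
        up, down = list(xs), list(xs)
        up[i] += h
        down[i] -= h
        grad.append((f(up) - f(down)) / (2.0 * h))
    return grad


if __name__ == "__main__":
    x = 1.5
    print(lse0_plus([x]), math.log1p(math.exp(x)))            # one variable: softplus
    print(softmax([0.0, x])[1], 1.0 / (1.0 + math.exp(-x)))   # one variable: logistic
    point = [0.3, -1.2, 2.0]
    print(numerical_gradient(logsumexp, point), softmax(point))
```

With a single argument, `lse0_plus` reduces to softplus and the second component of `softmax([0, x])` reduces to the logistic function, mirroring the single-variable case.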

===Convex conjugate===
The [convex conjugate](/source/convex_conjugate) (specifically, the [Legendre transformation](/source/Legendre_transformation)) of the softplus function is the negative [binary entropy function](/source/binary_entropy_function) (with base {{var|e}}). This follows from the defining property of the Legendre transformation, that the derivatives of a function and of its transform are inverse functions: the derivative of softplus is the logistic function, whose inverse function is the [logit](/source/logit), which is the derivative of negative binary entropy.
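
The conjugate relation can be checked numerically for <math>0 < p < 1</math>. The sketch below assumes the usual definition of the convex conjugate as the supremum of <math>px - f(x)</math> over <math>x</math>, evaluated at the stationary point <math>x = \operatorname{logit}(p)</math>; all function names are illustrative.

```python
import math


def softplus(x: float) -> float:
    """ln(1 + e^x), computed stably."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def logit(p: float) -> float:
    """Inverse of the logistic function, for 0 < p < 1."""
    return math.log(p / (1.0 - p))


def negative_binary_entropy(p: float) -> float:
    """p ln p + (1 - p) ln(1 - p) for 0 < p < 1 (natural logarithm)."""
    return p * math.log(p) + (1.0 - p) * math.log(1.0 - p)


def conjugate_of_softplus(p: float) -> float:
    """sup over x of p*x - softplus(x), attained at x = logit(p) for 0 < p < 1."""
    x_star = logit(p)
    return p * x_star - softplus(x_star)


if __name__ == "__main__":
    for p in (0.1, 0.3, 0.5, 0.9):
        # The two columns should agree up to rounding error.
        print(f"p={p:.1f}  conjugate={conjugate_of_softplus(p):.10f}  -H(p)={negative_binary_entropy(p):.10f}")
```

The stationary point is where the derivative of <math>px - f(x)</math> vanishes, that is, where the logistic function equals <math>p</math>, which is why the logit appears.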

Softplus can be interpreted as [logistic loss](/source/logistic_loss) (as a positive number), so, by [duality](/source/duality_(optimization)), minimizing logistic loss corresponds to maximizing entropy. This justifies the [principle of maximum entropy](/source/principle_of_maximum_entropy) as loss minimization.
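
To make the logistic-loss reading concrete, the sketch below writes the loss of a real-valued score in terms of softplus. The two label conventions (<math>y \in \{-1, +1\}</math> and <math>y \in \{0, 1\}</math>) and the function names are assumptions introduced for illustration, not notation from the text above.

```python
import math


def softplus(x: float) -> float:
    """ln(1 + e^x), computed stably."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def sigmoid(x: float) -> float:
    """Logistic sigmoid, safe for either sign of x."""
    if x >= 0.0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def logistic_loss_pm1(y: int, score: float) -> float:
    """Logistic loss for a label y in {-1, +1}: ln(1 + e^(-y * score))."""
    return softplus(-y * score)


def logistic_loss_01(y: int, score: float) -> float:
    """Same loss for a label y in {0, 1}: softplus(score) - y * score."""
    return softplus(score) - y * score


if __name__ == "__main__":
    score = 2.0
    # Positive label: loss equals -ln(sigmoid(score)).
    print(logistic_loss_pm1(+1, score), -math.log(sigmoid(score)))
    # Negative label: loss equals -ln(1 - sigmoid(score)).
    print(logistic_loss_pm1(-1, score), -math.log(1.0 - sigmoid(score)))
```

Both conventions give the same non-negative number for corresponding labels, which is the sense in which softplus acts as a loss "as a positive number".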

==References==
{{Reflist}}


---
Adapted from the Wikipedia article [Softplus](https://en.wikipedia.org/wiki/Softplus) by Wikipedia contributors ([contributor history](https://en.wikipedia.org/wiki/Softplus?action=history)). Available under [Creative Commons Attribution-ShareAlike 4.0 International](https://creativecommons.org/licenses/by-sa/4.0/). Changes may have been made.