# Extreme learning machine

> Mediated Wiki article. Canonical URL: https://mediated.wiki/source/Extreme_learning_machine
> Markdown URL: https://mediated.wiki/source/Extreme_learning_machine.md
> Source: https://en.wikipedia.org/wiki/Extreme_learning_machine
> Source revision: 1354110283
> License: Creative Commons Attribution-ShareAlike 4.0 International (https://creativecommons.org/licenses/by-sa/4.0/)

Type of artificial neural network

**Extreme learning machines** are [feedforward neural networks](/source/Feedforward_neural_network) for [classification](/source/Statistical_classification), [regression](/source/Regression_analysis), [clustering](/source/Cluster_analysis), [sparse approximation](/source/Sparse_approximation), compression and [feature learning](/source/Feature_learning) with a single layer or multiple layers of hidden nodes, where the parameters of hidden nodes (not just the weights connecting inputs to hidden nodes) need not be tuned. These hidden nodes can be randomly assigned and never updated (i.e. they are a [random projection](/source/Random_projection) followed by a nonlinear transform), or can be inherited from their ancestors without being changed. In most cases, the output weights of hidden nodes are learned in a single step, which essentially amounts to learning a linear model.

The name "extreme learning machine" (ELM) was given to such models by Guang-Bin Huang, who originally proposed them for networks with any type of nonlinear piecewise continuous hidden nodes, including biological neurons and different types of mathematical basis functions.[1][2] The idea for artificial neural networks goes back to [Frank Rosenblatt](/source/Frank_Rosenblatt), who not only published a single-layer [perceptron](/source/Perceptron) in 1958,[3] but also introduced a [multilayer perceptron](/source/Multilayer_perceptron) with 3 layers: an input layer, a hidden layer with randomized weights that did not learn, and a learning output layer.[4]

According to some researchers, these models are able to produce good generalization performance and learn thousands of times faster than networks trained using [backpropagation](/source/Backpropagation).[5] The literature also reports that these models can outperform [support vector machines](/source/Support_vector_machine) in both classification and regression applications.[6][1][7]

## History

From 2001 to 2010, ELM research mainly focused on the unified learning framework for "generalized" single-hidden-layer feedforward neural networks (SLFNs), including but not limited to sigmoid networks, RBF networks, threshold networks,[8] trigonometric networks, fuzzy inference systems, Fourier series,[9][10] Laplacian transform, wavelet networks,[11] etc. One significant achievement of those years was the theoretical proof of the universal approximation and classification capabilities of ELM.[9][12][13]

From 2010 to 2015, ELM research extended to the unified learning framework for kernel learning, SVM and a few typical feature learning methods such as [principal component analysis](/source/Principal_component_analysis) (PCA) and [non-negative matrix factorization](/source/Non-negative_matrix_factorization) (NMF). This work argues that SVM provides suboptimal solutions compared to ELM, and that ELM can provide a white-box kernel mapping, implemented by the ELM random feature mapping, instead of the black-box kernel used in SVM. PCA and NMF can be considered special cases in which linear hidden nodes are used in ELM.[14][15]

From 2015 to 2017, an increased focus was placed on hierarchical implementations[16][17] of ELM. Additionally, since 2011, significant biological studies have been published that support certain ELM theories.[18][19][20]

From 2017 onwards, to overcome the low-convergence problem during training, approaches based on [LU decomposition](/source/LU_decomposition), [Hessenberg decomposition](/source/Bartels%E2%80%93Stewart_algorithm#The_Hessenberg–Schur_algorithm) and [QR decomposition](/source/QR_decomposition) with [regularization](/source/Regularization_(mathematics)) have begun to attract attention.[21][22][23]

In 2017, the Google Scholar Blog published a list of "Classic Papers: Articles That Have Stood The Test of Time".[24] Among these are two papers about ELM, listed as studies 2 and 7 in the "List of 10 classic AI papers from 2006".[25][26][27]

## Algorithms

Given a single hidden layer of ELM, suppose that the output function of the $i$-th hidden node is $h_i(\mathbf{x}) = G(\mathbf{a}_i, b_i, \mathbf{x})$, where $\mathbf{a}_i$ and $b_i$ are the parameters of the $i$-th hidden node. The output function of the ELM for single hidden layer feedforward networks (SLFN) with $L$ hidden nodes is:

$$f_L(\mathbf{x}) = \sum_{i=1}^{L} \boldsymbol{\beta}_i h_i(\mathbf{x})$$

where $\boldsymbol{\beta}_i$ is the output weight of the $i$-th hidden node.

$\mathbf{h}(\mathbf{x}) = [h_1(\mathbf{x}), \ldots, h_L(\mathbf{x})]$ is the hidden layer output mapping of ELM. Given $N$ training samples, the hidden layer output matrix $\mathbf{H}$ of ELM is given as:

$$\mathbf{H} = \begin{bmatrix} \mathbf{h}(\mathbf{x}_1) \\ \vdots \\ \mathbf{h}(\mathbf{x}_N) \end{bmatrix} = \begin{bmatrix} G(\mathbf{a}_1, b_1, \mathbf{x}_1) & \cdots & G(\mathbf{a}_L, b_L, \mathbf{x}_1) \\ \vdots & \vdots & \vdots \\ G(\mathbf{a}_1, b_1, \mathbf{x}_N) & \cdots & G(\mathbf{a}_L, b_L, \mathbf{x}_N) \end{bmatrix}$$

and $\mathbf{T}$ is the training data target matrix:

$$\mathbf{T} = \begin{bmatrix} \mathbf{t}_1 \\ \vdots \\ \mathbf{t}_N \end{bmatrix}$$
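As a rough illustration of these definitions, the following NumPy sketch builds the hidden layer output matrix for sigmoid hidden nodes. The function name `hidden_layer_output`, the array sizes and the Gaussian sampling of the node parameters are assumptions made for the example, not part of the ELM definition.

```python
import numpy as np

def hidden_layer_output(X, A, b):
    """Return H with H[j, i] = G(a_i, b_i, x_j) for sigmoid hidden nodes.

    X: (N, d) training inputs, one sample x_j per row.
    A: (L, d) hidden-node weight vectors a_i, one per row.
    b: (L,)   hidden-node biases b_i.
    """
    return 1.0 / (1.0 + np.exp(-(X @ A.T + b)))

rng = np.random.default_rng(0)
N, d, L = 5, 3, 4                  # illustrative sizes (assumed)
X = rng.normal(size=(N, d))
A = rng.normal(size=(L, d))        # random hidden-node weights, never tuned
b = rng.normal(size=L)             # random hidden-node biases, never tuned
H = hidden_layer_output(X, A, b)   # shape (N, L); row j is h(x_j)
```

Each row of `H` is the hidden layer mapping of one training sample, so `H` has one row per sample and one column per hidden node.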

Generally speaking, ELM is a kind of regularization neural network with non-tuned hidden layer mappings (formed by random hidden nodes, kernels or other implementations). Its objective function is:

$$\text{Minimize: } \|\boldsymbol{\beta}\|_{p}^{\sigma_1} + C\,\|\mathbf{H}\boldsymbol{\beta} - \mathbf{T}\|_{q}^{\sigma_2}$$

where $\sigma_1 > 0$, $\sigma_2 > 0$, and $p, q = 0, \frac{1}{2}, 1, 2, \cdots, +\infty$.

Different combinations of $\sigma_1$, $\sigma_2$, $p$ and $q$ can be used and result in different learning algorithms for regression, classification, sparse coding, compression, feature learning and clustering.
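For instance, choosing p = q = 2 and σ1 = σ2 = 2 turns the objective into a ridge-regression problem with a closed-form minimizer. The derivation below is a sketch of that single special case; the identity matrix I (of size L × L) is a symbol introduced here for the example, and the norms are read as Frobenius norms when β and T have several columns.

```latex
\begin{aligned}
J(\boldsymbol{\beta}) &= \|\boldsymbol{\beta}\|_{2}^{2} + C\,\|\mathbf{H}\boldsymbol{\beta}-\mathbf{T}\|_{2}^{2} \\
\nabla_{\boldsymbol{\beta}} J &= 2\boldsymbol{\beta} + 2C\,\mathbf{H}^{\top}(\mathbf{H}\boldsymbol{\beta}-\mathbf{T}) = \mathbf{0} \\
\left(\mathbf{I} + C\,\mathbf{H}^{\top}\mathbf{H}\right)\boldsymbol{\beta} &= C\,\mathbf{H}^{\top}\mathbf{T} \\
\boldsymbol{\beta} &= \left(\frac{\mathbf{I}}{C} + \mathbf{H}^{\top}\mathbf{H}\right)^{-1}\mathbf{H}^{\top}\mathbf{T}
\end{aligned}
```

Larger values of C put more weight on the training error and less on the size of the output weights.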

As a special case, the simplest ELM training algorithm learns a model of the form (for single hidden layer sigmoid neural networks):

$$\mathbf{\hat{Y}} = \mathbf{W}_2 \sigma(\mathbf{W}_1 x)$$

where $\mathbf{W}_1$ is the matrix of input-to-hidden-layer weights, $\sigma$ is an activation function, and $\mathbf{W}_2$ is the matrix of hidden-to-output-layer weights. The algorithm proceeds as follows:

1. Fill $\mathbf{W}_1$ with random values (e.g., [Gaussian random noise](/source/Gaussian_noise));

2. Estimate $\mathbf{W}_2$ by a [least-squares fit](/source/Least-squares_fit) to a matrix of response variables $\mathbf{Y}$, computed using the [pseudoinverse](/source/Moore%E2%80%93Penrose_pseudoinverse) $\cdot^{+}$, given a [design matrix](/source/Design_matrix) $\mathbf{X}$:

   $$\mathbf{W}_2 = \sigma(\mathbf{W}_1 \mathbf{X})^{+} \mathbf{Y}$$
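The two steps can be sketched in NumPy as follows. This is a minimal illustration under stated assumptions rather than a reference implementation: samples are stored one per row, so the hidden layer is computed as σ(XW1 + b) instead of σ(W1X); the random bias vector `b`, the toy data and the function names `elm_fit` and `elm_predict` are all introduced here for the example.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def elm_fit(X, Y, n_hidden, rng):
    """Fit a basic ELM. X is (N, d) with samples as rows, Y is (N, m)."""
    d = X.shape[1]
    W1 = rng.normal(size=(d, n_hidden))  # step 1: random input weights, never updated
    b = rng.normal(size=n_hidden)        # random hidden biases (assumption of this sketch)
    H = sigmoid(X @ W1 + b)              # hidden-layer outputs, shape (N, n_hidden)
    W2 = np.linalg.pinv(H) @ Y           # step 2: least-squares fit via the pseudoinverse
    return W1, b, W2

def elm_predict(X, W1, b, W2):
    return sigmoid(X @ W1 + b) @ W2

rng = np.random.default_rng(0)
X = np.linspace(-1.0, 1.0, 200).reshape(-1, 1)
Y = X ** 2                               # toy regression target (assumed)
W1, b, W2 = elm_fit(X, Y, n_hidden=30, rng=rng)
train_mse = float(np.mean((elm_predict(X, W1, b, W2) - Y) ** 2))
```

Only `W2` is learned from data; `W1` and `b` keep their random initial values, which is why training reduces to a single linear least-squares solve.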

## Architectures

In most cases, ELM is used as a single hidden layer feedforward network (SLFN), including but not limited to sigmoid networks, RBF networks, threshold networks, fuzzy inference networks, complex neural networks, wavelet networks, Fourier transform, Laplacian transform, etc. Because its learning algorithm has different implementations for regression, classification, sparse coding, compression, feature learning and clustering, multiple ELMs have been used to form multi-hidden-layer networks, [deep learning](/source/Deep_learning) or hierarchical networks.[16][17][28]

A hidden node in ELM is a computational element that need not be a classical neuron. It can be a classical artificial neuron, a basis function, or a subnetwork formed by several hidden nodes.[12]

## Theories

Both universal approximation and classification capabilities[6][1] have been proved for ELM in the literature. In particular, Guang-Bin Huang and his team spent almost seven years (2001–2008) on the rigorous proofs of ELM's universal approximation capability.[9][12][13]

### Universal approximation capability

In theory, any nonconstant piecewise continuous function can be used as the activation function in ELM hidden nodes; such an activation function need not be differentiable. If tuning the parameters of hidden nodes could make SLFNs approximate any target function $f(\mathbf{x})$, then hidden node parameters can be randomly generated according to any continuous probability distribution, and $\lim_{L\rightarrow\infty}\left\|\sum_{i=1}^{L}\boldsymbol{\beta}_i h_i(\mathbf{x}) - f(\mathbf{x})\right\| = 0$ holds with probability one with appropriate output weights $\boldsymbol{\beta}$.
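The statement can be illustrated numerically, although a finite experiment is not a proof. The sketch below draws sigmoid hidden-node parameters once from a uniform distribution, then fits output weights using only the first L nodes for increasing L. The target function, the sampling ranges and the grid are assumptions chosen for the example.

```python
import numpy as np

rng = np.random.default_rng(0)
x = np.linspace(0.0, 1.0, 200).reshape(-1, 1)
f = np.sin(2.0 * np.pi * x)                      # target function sampled on a grid

L_max = 64
a = rng.uniform(-10.0, 10.0, size=(1, L_max))    # random hidden-node weights, drawn once
b = rng.uniform(-10.0, 10.0, size=L_max)         # random hidden-node biases, drawn once
H_full = 1.0 / (1.0 + np.exp(-(x @ a + b)))      # outputs of all L_max sigmoid nodes

errors = {}
for L in (1, 2, 4, 8, 16, 32, 64):
    H = H_full[:, :L]                             # keep only the first L random nodes
    beta, *_ = np.linalg.lstsq(H, f, rcond=None)  # best output weights for these nodes
    errors[L] = float(np.sqrt(np.mean((H @ beta - f) ** 2)))
```

Because each node set contains the previous one, the least-squares residual in `errors` cannot increase with L in exact arithmetic; in this setup it should become small once a few dozen random nodes are available.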

### Classification capability

Given any nonconstant piecewise continuous function as the activation function in SLFNs, if tuning the parameters of hidden nodes can make SLFNs approximate any target function $f(\mathbf{x})$, then SLFNs with random hidden layer mapping $\mathbf{h}(\mathbf{x})$ can separate arbitrary disjoint regions of any shapes.
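A small example of this behaviour is the XOR problem, whose two classes cannot be separated by a line in the input space. In the sketch below, a random sigmoid hidden layer maps the four points into a 20-dimensional space where a linear read-out separates them. The number of nodes, the weight scale and the ±1 label encoding are assumptions of the example.

```python
import numpy as np

rng = np.random.default_rng(0)
X = np.array([[0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [1.0, 1.0]])
t = np.array([-1.0, 1.0, 1.0, -1.0])       # XOR labels encoded as -1 / +1

L = 20
A = rng.normal(scale=2.0, size=(2, L))     # random hidden-node weights, never tuned
b = rng.normal(scale=2.0, size=L)          # random hidden-node biases, never tuned
H = 1.0 / (1.0 + np.exp(-(X @ A + b)))     # random hidden layer mapping h(x), shape (4, L)

beta = np.linalg.pinv(H) @ t               # linear read-out fitted on top of h(x)
pred = np.sign(H @ beta)                   # predicted class for each of the four points
```

Only the linear read-out `beta` is fitted; the nonlinearity that makes XOR separable comes entirely from the untrained random mapping.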

## Neurons

A wide range of nonlinear piecewise continuous functions $G(\mathbf{a}, b, \mathbf{x})$ can be used in hidden neurons of ELM, for example:

### Real domain

Sigmoid function: $G(\mathbf{a}, b, \mathbf{x}) = \frac{1}{1 + \exp(-(\mathbf{a} \cdot \mathbf{x} + b))}$

Fourier function: $G(\mathbf{a}, b, \mathbf{x}) = \sin(\mathbf{a} \cdot \mathbf{x} + b)$

Hardlimit function: $G(\mathbf{a}, b, \mathbf{x}) = \begin{cases} 1, & \text{if } \mathbf{a} \cdot \mathbf{x} - b \geq 0 \\ 0, & \text{otherwise} \end{cases}$

Gaussian function: $G(\mathbf{a}, b, \mathbf{x}) = \exp(-b\|\mathbf{x} - \mathbf{a}\|^{2})$

Multiquadrics function: $G(\mathbf{a}, b, \mathbf{x}) = (\|\mathbf{x} - \mathbf{a}\|^{2} + b^{2})^{1/2}$

Wavelet: $G(\mathbf{a}, b, \mathbf{x}) = \|\mathbf{a}\|^{-1/2}\,\Psi\left(\frac{\mathbf{x} - \mathbf{a}}{b}\right)$, where $\Psi$ is a single mother wavelet function.
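The first five node types translate directly into code. The sketch below evaluates each for a single input vector; the function names and the sample values are assumptions for illustration, and the wavelet node is omitted because it depends on a choice of mother wavelet that the text leaves open.

```python
import numpy as np

def sigmoid_node(a, b, x):
    return 1.0 / (1.0 + np.exp(-(np.dot(a, x) + b)))

def fourier_node(a, b, x):
    return np.sin(np.dot(a, x) + b)

def hardlimit_node(a, b, x):
    return 1.0 if np.dot(a, x) - b >= 0 else 0.0

def gaussian_node(a, b, x):
    return np.exp(-b * np.sum((x - a) ** 2))

def multiquadric_node(a, b, x):
    return np.sqrt(np.sum((x - a) ** 2) + b ** 2)

a = np.array([1.0, -2.0])     # illustrative node parameter (assumed)
x = np.array([0.5, 0.25])     # illustrative input, chosen so that a . x = 0
b = 0.5

outputs = {
    "sigmoid": sigmoid_node(a, b, x),
    "fourier": fourier_node(a, b, x),
    "hardlimit": hardlimit_node(a, b, x),
    "gaussian": gaussian_node(a, b, x),
    "multiquadric": multiquadric_node(a, b, x),
}
```

For the sigmoid, Fourier and hardlimit nodes, `a` and `b` act as a weight vector and a bias or threshold; for the Gaussian and multiquadric nodes, `a` is a centre and `b` controls the width or offset.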

### Complex domain

Circular functions:

$$\tan(z) = \frac{e^{iz} - e^{-iz}}{i(e^{iz} + e^{-iz})}$$

$$\sin(z) = \frac{e^{iz} - e^{-iz}}{2i}$$

Inverse circular functions:

$$\arctan(z) = \int_{0}^{z} \frac{dt}{1 + t^{2}}$$

$$\arcsin(z) = \int_{0}^{z} \frac{dt}{(1 - t^{2})^{1/2}}$$

Hyperbolic functions:

$$\tanh(z) = \frac{e^{z} - e^{-z}}{e^{z} + e^{-z}}$$

$$\sinh(z) = \frac{e^{z} - e^{-z}}{2}$$

Inverse hyperbolic functions:

$$\operatorname{arctanh}(z) = \int_{0}^{z} \frac{dt}{1 - t^{2}}$$

$$\operatorname{arcsinh}(z) = \int_{0}^{z} \frac{dt}{(1 + t^{2})^{1/2}}$$
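The exponential forms of the circular and hyperbolic functions can be checked against NumPy's built-in complex-valued implementations. The sketch below does this for a few arbitrary complex inputs, which are assumptions chosen for the example.

```python
import numpy as np

z = np.array([0.3 + 0.4j, -1.2 + 0.5j, 0.7 - 0.9j])   # arbitrary complex inputs

# Exponential definitions of the circular and hyperbolic functions.
tan_z = (np.exp(1j * z) - np.exp(-1j * z)) / (1j * (np.exp(1j * z) + np.exp(-1j * z)))
sin_z = (np.exp(1j * z) - np.exp(-1j * z)) / 2j
tanh_z = (np.exp(z) - np.exp(-z)) / (np.exp(z) + np.exp(-z))
sinh_z = (np.exp(z) - np.exp(-z)) / 2

# NumPy's functions accept complex arguments and should agree with the formulas.
matches = (
    np.allclose(tan_z, np.tan(z))
    and np.allclose(sin_z, np.sin(z))
    and np.allclose(tanh_z, np.tanh(z))
    and np.allclose(sinh_z, np.sinh(z))
)
```

In a complex-valued ELM, such a function would be applied elementwise to the complex pre-activations of the hidden nodes.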

## Reliability

See also: [Explainable AI](/source/Explainable_AI)

The [black-box](/source/Black-box) character of neural networks in general, and of extreme learning machines (ELM) in particular, is one of the major concerns that deters engineers from applying them in safety-critical automation tasks. This issue has been approached with several different techniques. One approach is to reduce the dependence on the random input.[29][30] Another focuses on incorporating continuous constraints, derived from prior knowledge about the specific task, into the learning process of ELMs.[31][32] This is reasonable because machine learning solutions have to guarantee safe operation in many application domains. The studies mentioned found that the special form of ELMs, with its functional separation and linear read-out weights, is particularly well suited to the efficient incorporation of continuous constraints in predefined regions of the input space.

## Controversy

There are two main complaints from the academic community concerning this work: the first is about "reinventing and ignoring previous ideas", and the second is about "improper naming and popularizing", as shown in some debates in 2008 and 2015.[33] In particular, it was pointed out in a letter[34] to the editor of *IEEE Transactions on Neural Networks* that the idea of using a hidden layer connected to the inputs by random untrained weights was already suggested in the original papers on [RBF networks](/source/RBF_network) in the late 1980s; Guang-Bin Huang replied by pointing out subtle differences.[35] In a 2015 paper,[1] Huang responded to complaints about his invention of the name ELM for already-existing methods, complaining of "very negative and unhelpful comments on ELM in neither academic nor professional manner due to various reasons and intentions" and an "irresponsible anonymous attack which intends to destroy harmony research environment", and arguing that his work "provides a unifying learning platform" for various types of neural nets,[1] including hierarchical structured ELM.[28] In 2015, Huang also gave a formal rebuttal to what he considered "malign and attack."[36] Recent research replaces the random weights with constrained random weights.[6][37]

## Open sources

- [Matlab Library](http://www.ntu.edu.sg/home/egbhuang/reference.html)

- Python Library[38]

## See also

- [Reservoir computing](/source/Reservoir_computing)

- [Random projection](/source/Random_projection)

- [Random matrix](/source/Random_matrix)

## References

1. ^ [***a***](#cite_ref-:0_1-0) [***b***](#cite_ref-:0_1-1) [***c***](#cite_ref-:0_1-2) [***d***](#cite_ref-:0_1-3) [***e***](#cite_ref-:0_1-4) Huang, Guang-Bin (2015). ["What are Extreme Learning Machines? Filling the Gap Between Frank Rosenblatt's Dream and John von Neumann's Puzzle"](https://web.archive.org/web/20170610222724/http://www.ntu.edu.sg/home/egbhuang/pdf/ELM-Rosenblatt-Neumann.pdf) (PDF). *Cognitive Computation*. **7** (3): 263–278. [doi](/source/Doi_(identifier)):[10.1007/s12559-015-9333-0](https://doi.org/10.1007%2Fs12559-015-9333-0). [S2CID](/source/S2CID_(identifier)) [13936498](https://api.semanticscholar.org/CorpusID:13936498). Archived from [the original](http://www.ntu.edu.sg/home/egbhuang/pdf/ELM-Rosenblatt-Neumann.pdf) (PDF) on 2017-06-10. Retrieved 2015-07-30.

1. **[^](#cite_ref-2)** Huang, Guang-Bin (2014). ["An Insight into Extreme Learning Machines: Random Neurons, Random Features and Kernels"](http://www.ntu.edu.sg/home/egbhuang/pdf/ELM-Randomness-Kernel.pdf) (PDF). *Cognitive Computation*. **6** (3): 376–390. [doi](/source/Doi_(identifier)):[10.1007/s12559-014-9255-2](https://doi.org/10.1007%2Fs12559-014-9255-2). [S2CID](/source/S2CID_(identifier)) [7419259](https://api.semanticscholar.org/CorpusID:7419259).

1. **[^](#cite_ref-3)** [Rosenblatt, Frank](/source/Frank_Rosenblatt) (1958). "The Perceptron: A Probabilistic Model For Information Storage And Organization in the Brain". *Psychological Review*. **65** (6): 386–408. [CiteSeerX](/source/CiteSeerX_(identifier)) [10.1.1.588.3775](https://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.588.3775). [doi](/source/Doi_(identifier)):[10.1037/h0042519](https://doi.org/10.1037%2Fh0042519). [PMID](/source/PMID_(identifier)) [13602029](https://pubmed.ncbi.nlm.nih.gov/13602029). [S2CID](/source/S2CID_(identifier)) [12781225](https://api.semanticscholar.org/CorpusID:12781225).

1. **[^](#cite_ref-rosenblatt1962_4-0)** [Rosenblatt, Frank](/source/Frank_Rosenblatt) (1962). *Principles of Neurodynamics*. Spartan, New York.

1. **[^](#cite_ref-5)** Huang, Guang-Bin; Zhu, Qin-Yu; Siew, Chee-Kheong (2006). "Extreme learning machine: theory and applications". *Neurocomputing*. **70** (1): 489–501. [CiteSeerX](/source/CiteSeerX_(identifier)) [10.1.1.217.3692](https://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.217.3692). [doi](/source/Doi_(identifier)):[10.1016/j.neucom.2005.12.126](https://doi.org/10.1016%2Fj.neucom.2005.12.126). [S2CID](/source/S2CID_(identifier)) [116858](https://api.semanticscholar.org/CorpusID:116858).

1. ^ [***a***](#cite_ref-:4_6-0) [***b***](#cite_ref-:4_6-1) [***c***](#cite_ref-:4_6-2) Huang, Guang-Bin; Hongming Zhou; Xiaojian Ding; and Rui Zhang (2012). ["Extreme Learning Machine for Regression and Multiclass Classification"](https://web.archive.org/web/20170829025814/http://www.ntu.edu.sg/home/egbhuang/pdf/ELM-Unified-Learning.pdf) (PDF). *IEEE Transactions on Systems, Man, and Cybernetics, Part B (Cybernetics)*. **42** (2): 513–529. [Bibcode](/source/Bibcode_(identifier)):[2012ITSMC..42..513H](https://ui.adsabs.harvard.edu/abs/2012ITSMC..42..513H). [CiteSeerX](/source/CiteSeerX_(identifier)) [10.1.1.298.1213](https://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.298.1213). [doi](/source/Doi_(identifier)):[10.1109/tsmcb.2011.2168604](https://doi.org/10.1109%2Ftsmcb.2011.2168604). [PMID](/source/PMID_(identifier)) [21984515](https://pubmed.ncbi.nlm.nih.gov/21984515). [S2CID](/source/S2CID_(identifier)) [15037168](https://api.semanticscholar.org/CorpusID:15037168). Archived from [the original](http://www.ntu.edu.sg/home/egbhuang/pdf/ELM-Unified-Learning.pdf) (PDF) on 2017-08-29. Retrieved 2017-08-19.

1. **[^](#cite_ref-7)** Huang, Guang-Bin (2014). ["An Insight into Extreme Learning Machines: Random Neurons, Random Features and Kernels"](http://www.ntu.edu.sg/home/egbhuang/pdf/ELM-Randomness-Kernel.pdf) (PDF). *Cognitive Computation*. **6** (3): 376–390. [doi](/source/Doi_(identifier)):[10.1007/s12559-014-9255-2](https://doi.org/10.1007%2Fs12559-014-9255-2). [S2CID](/source/S2CID_(identifier)) [7419259](https://api.semanticscholar.org/CorpusID:7419259).

1. **[^](#cite_ref-8)** Huang, Guang-Bin, Qin-Yu Zhu, K. Z. Mao, Chee-Kheong Siew, P. Saratchandran, and N. Sundararajan (2006). ["Can Threshold Networks Be Trained Directly?"](https://web.archive.org/web/20170829040414/http://www.ntu.edu.sg/home/egbhuang/pdf/TCASII-ELM-Threshold-Network.pdf) (PDF). *IEEE Transactions on Circuits and Systems II: Express Briefs*. **53** (3): 187–191. [Bibcode](/source/Bibcode_(identifier)):[2006ITCSE..53..187H](https://ui.adsabs.harvard.edu/abs/2006ITCSE..53..187H). [doi](/source/Doi_(identifier)):[10.1109/tcsii.2005.857540](https://doi.org/10.1109%2Ftcsii.2005.857540). [S2CID](/source/S2CID_(identifier)) [18076010](https://api.semanticscholar.org/CorpusID:18076010). Archived from [the original](http://www.ntu.edu.sg/home/egbhuang/pdf/TCASII-ELM-Threshold-Network.pdf) (PDF) on 2017-08-29. Retrieved 2017-08-22.{{[cite journal](https://en.wikipedia.org/wiki/Template:Cite_journal)}}: CS1 maint: multiple names: authors list ([link](https://en.wikipedia.org/wiki/Category:CS1_maint:_multiple_names:_authors_list))

1. ^ [***a***](#cite_ref-:1_9-0) [***b***](#cite_ref-:1_9-1) [***c***](#cite_ref-:1_9-2) Huang, Guang-Bin, Lei Chen, and Chee-Kheong Siew (2006). ["Universal Approximation Using Incremental Constructive Feedforward Networks with Random Hidden Nodes"](https://web.archive.org/web/20170829012641/http://www.ntu.edu.sg/home/egbhuang/pdf/I-ELM.pdf) (PDF). *IEEE Transactions on Neural Networks*. **17** (4): 879–892. [Bibcode](/source/Bibcode_(identifier)):[2006ITNN...17..879H](https://ui.adsabs.harvard.edu/abs/2006ITNN...17..879H). [doi](/source/Doi_(identifier)):[10.1109/tnn.2006.875977](https://doi.org/10.1109%2Ftnn.2006.875977). [PMID](/source/PMID_(identifier)) [16856652](https://pubmed.ncbi.nlm.nih.gov/16856652). [S2CID](/source/S2CID_(identifier)) [6477031](https://api.semanticscholar.org/CorpusID:6477031). Archived from [the original](http://www.ntu.edu.sg/home/egbhuang/pdf/I-ELM.pdf) (PDF) on 2017-08-29. Retrieved 2017-08-22.{{[cite journal](https://en.wikipedia.org/wiki/Template:Cite_journal)}}: CS1 maint: multiple names: authors list ([link](https://en.wikipedia.org/wiki/Category:CS1_maint:_multiple_names:_authors_list))

1. **[^](#cite_ref-10)** Rahimi, Ali, and Benjamin Recht (2008). ["Weighted Sums of Random Kitchen Sinks: Replacing Minimization with Randomization in Learning"](https://people.eecs.berkeley.edu/~brecht/papers/08.rah.rec.nips.pdf) (PDF). *Advances in Neural Information Processing Systems*. **21**.{{[cite journal](https://en.wikipedia.org/wiki/Template:Cite_journal)}}: CS1 maint: multiple names: authors list ([link](https://en.wikipedia.org/wiki/Category:CS1_maint:_multiple_names:_authors_list))

1. **[^](#cite_ref-11)** Cao, Jiuwen, Zhiping Lin, Guang-Bin Huang (2010). "Composite Function Wavelet Neural Networks with Extreme Learning Machine". *Neurocomputing*. **73** (7–9): 1405–1416. [doi](/source/Doi_(identifier)):[10.1016/j.neucom.2009.12.007](https://doi.org/10.1016%2Fj.neucom.2009.12.007).{{[cite journal](https://en.wikipedia.org/wiki/Template:Cite_journal)}}: CS1 maint: multiple names: authors list ([link](https://en.wikipedia.org/wiki/Category:CS1_maint:_multiple_names:_authors_list))

1. ^ [***a***](#cite_ref-:2_12-0) [***b***](#cite_ref-:2_12-1) [***c***](#cite_ref-:2_12-2) Huang, Guang-Bin, Lei Chen (2007). ["Convex Incremental Extreme Learning Machine"](https://web.archive.org/web/20170810165755/http://www3.ntu.edu.sg/home/egbhuang/pdf/CI-ELM.pdf) (PDF). *Neurocomputing*. **70** (16–18): 3056–3062. [doi](/source/Doi_(identifier)):[10.1016/j.neucom.2007.02.009](https://doi.org/10.1016%2Fj.neucom.2007.02.009). Archived from [the original](http://www.ntu.edu.sg/home/egbhuang/pdf/CI-ELM.pdf) (PDF) on 2017-08-10. Retrieved 2017-08-22.{{[cite journal](https://en.wikipedia.org/wiki/Template:Cite_journal)}}: CS1 maint: multiple names: authors list ([link](https://en.wikipedia.org/wiki/Category:CS1_maint:_multiple_names:_authors_list))

1. ^ [***a***](#cite_ref-:3_13-0) [***b***](#cite_ref-:3_13-1) Huang, Guang-Bin, and Lei Chen (2008). ["Enhanced Random Search Based Incremental Extreme Learning Machine"](https://web.archive.org/web/20141014020332/http://www.ntu.edu.sg/home/egbhuang/pdf/EI-ELM.pdf) (PDF). *Neurocomputing*. **71** (16–18): 3460–3468. [CiteSeerX](/source/CiteSeerX_(identifier)) [10.1.1.217.3009](https://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.217.3009). [doi](/source/Doi_(identifier)):[10.1016/j.neucom.2007.10.008](https://doi.org/10.1016%2Fj.neucom.2007.10.008). Archived from [the original](http://www.ntu.edu.sg/home/egbhuang/pdf/EI-ELM.pdf) (PDF) on 2014-10-14. Retrieved 2017-08-22.{{[cite journal](https://en.wikipedia.org/wiki/Template:Cite_journal)}}: CS1 maint: multiple names: authors list ([link](https://en.wikipedia.org/wiki/Category:CS1_maint:_multiple_names:_authors_list))

1. **[^](#cite_ref-14)** He, Qing, Xin Jin, Changying Du, Fuzhen Zhuang, Zhongzhi Shi (2014). ["Clustering in Extreme Learning Machine Feature Space"](http://www.intsci.ac.cn/users/jinxin/Mypapers/ELM-Neurocomputing-2013.pdf) (PDF). *Neurocomputing*. **128**: 88–95. [doi](/source/Doi_(identifier)):[10.1016/j.neucom.2012.12.063](https://doi.org/10.1016%2Fj.neucom.2012.12.063). [S2CID](/source/S2CID_(identifier)) [30906342](https://api.semanticscholar.org/CorpusID:30906342).{{[cite journal](https://en.wikipedia.org/wiki/Template:Cite_journal)}}: CS1 maint: multiple names: authors list ([link](https://en.wikipedia.org/wiki/Category:CS1_maint:_multiple_names:_authors_list))

1. **[^](#cite_ref-15)** Kasun, Liyanaarachchi Lekamalage Chamara, Yan Yang, Guang-Bin Huang, and Zhengyou Zhang (2016). ["Dimension Reduction With Extreme Learning Machine"](http://www.ntu.edu.sg/home/egbhuang/pdf/ELM-Dimensionality-Reduction.pdf) (PDF). *IEEE Transactions on Image Processing*. **25** (8): 3906–3918. [Bibcode](/source/Bibcode_(identifier)):[2016ITIP...25.3906K](https://ui.adsabs.harvard.edu/abs/2016ITIP...25.3906K). [doi](/source/Doi_(identifier)):[10.1109/tip.2016.2570569](https://doi.org/10.1109%2Ftip.2016.2570569). [PMID](/source/PMID_(identifier)) [27214902](https://pubmed.ncbi.nlm.nih.gov/27214902). [S2CID](/source/S2CID_(identifier)) [1803922](https://api.semanticscholar.org/CorpusID:1803922).{{[cite journal](https://en.wikipedia.org/wiki/Template:Cite_journal)}}: CS1 maint: multiple names: authors list ([link](https://en.wikipedia.org/wiki/Category:CS1_maint:_multiple_names:_authors_list))

1. ^ [***a***](#cite_ref-:5_16-0) [***b***](#cite_ref-:5_16-1) Huang, Guang-Bin, Zuo Bai, and Liyanaarachchi Lekamalage Chamara Kasun, and Chi Man Vong (2015). ["Local Receptive Fields Based Extreme Learning Machine"](https://web.archive.org/web/20170808031835/http://www.ntu.edu.sg/home/egbhuang/pdf/ELM-LRF.pdf) (PDF). *IEEE Computational Intelligence Magazine*. **10** (2): 18–29. [Bibcode](/source/Bibcode_(identifier)):[2015ICIM...10b..18H](https://ui.adsabs.harvard.edu/abs/2015ICIM...10b..18H). [doi](/source/Doi_(identifier)):[10.1109/mci.2015.2405316](https://doi.org/10.1109%2Fmci.2015.2405316). [S2CID](/source/S2CID_(identifier)) [1417306](https://api.semanticscholar.org/CorpusID:1417306). Archived from [the original](http://www.ntu.edu.sg/home/egbhuang/pdf/ELM-LRF.pdf) (PDF) on 2017-08-08. Retrieved 2017-08-22.{{[cite journal](https://en.wikipedia.org/wiki/Template:Cite_journal)}}: CS1 maint: multiple names: authors list ([link](https://en.wikipedia.org/wiki/Category:CS1_maint:_multiple_names:_authors_list))

1. ^ [***a***](#cite_ref-:6_17-0) [***b***](#cite_ref-:6_17-1) Tang, Jiexiong, Chenwei Deng, and Guang-Bin Huang (2016). ["Extreme Learning Machine for Multilayer Perceptron"](https://web.archive.org/web/20170712092522/http://www.ntu.edu.sg/home/egbhuang/pdf/Multiple-ELM.pdf) (PDF). *IEEE Transactions on Neural Networks and Learning Systems*. **27** (4): 809–821. [Bibcode](/source/Bibcode_(identifier)):[2016ITNNL..27..809T](https://ui.adsabs.harvard.edu/abs/2016ITNNL..27..809T). [doi](/source/Doi_(identifier)):[10.1109/tnnls.2015.2424995](https://doi.org/10.1109%2Ftnnls.2015.2424995). [PMID](/source/PMID_(identifier)) [25966483](https://pubmed.ncbi.nlm.nih.gov/25966483). [S2CID](/source/S2CID_(identifier)) [206757279](https://api.semanticscholar.org/CorpusID:206757279). Archived from [the original](http://www.ntu.edu.sg/home/egbhuang/pdf/Multiple-ELM.pdf) (PDF) on 2017-07-12. Retrieved 2017-08-22.{{[cite journal](https://en.wikipedia.org/wiki/Template:Cite_journal)}}: CS1 maint: multiple names: authors list ([link](https://en.wikipedia.org/wiki/Category:CS1_maint:_multiple_names:_authors_list))

1. **[^](#cite_ref-18)** Barak, Omri; Rigotti, Mattia; and Fusi, Stefano (2013). ["The Sparseness of Mixed Selectivity Neurons Controls the Generalization-Discrimination Trade-off"](https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6119179). *Journal of Neuroscience*. **33** (9): 3844–3856. [doi](/source/Doi_(identifier)):[10.1523/jneurosci.2753-12.2013](https://doi.org/10.1523%2Fjneurosci.2753-12.2013). [PMC](/source/PMC_(identifier)) [6119179](https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6119179). [PMID](/source/PMID_(identifier)) [23447596](https://pubmed.ncbi.nlm.nih.gov/23447596).{{[cite journal](https://en.wikipedia.org/wiki/Template:Cite_journal)}}: CS1 maint: multiple names: authors list ([link](https://en.wikipedia.org/wiki/Category:CS1_maint:_multiple_names:_authors_list))

1. **[^](#cite_ref-19)** Rigotti, Mattia; Barak, Omri; Warden, Melissa R.; Wang, Xiao-Jing; Daw, Nathaniel D.; Miller, Earl K.; and Fusi, Stefano (2013). ["The Importance of Mixed Selectivity in Complex Cognitive Tasks"](https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4412347). *Nature*. **497** (7451): 585–590. [Bibcode](/source/Bibcode_(identifier)):[2013Natur.497..585R](https://ui.adsabs.harvard.edu/abs/2013Natur.497..585R). [doi](/source/Doi_(identifier)):[10.1038/nature12160](https://doi.org/10.1038%2Fnature12160). [PMC](/source/PMC_(identifier)) [4412347](https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4412347). [PMID](/source/PMID_(identifier)) [23685452](https://pubmed.ncbi.nlm.nih.gov/23685452).{{[cite journal](https://en.wikipedia.org/wiki/Template:Cite_journal)}}: CS1 maint: multiple names: authors list ([link](https://en.wikipedia.org/wiki/Category:CS1_maint:_multiple_names:_authors_list))

1. **[^](#cite_ref-20)** Fusi, Stefano; Miller, Earl K.; Rigotti, Mattia (2015). ["Why Neurons Mix: High Dimensionality for Higher Cognition"](http://www.ntu.edu.sg/home/egbhuang/pdf/Why-Neurons-Mix-ELM.pdf) (PDF). *Current Opinion in Neurobiology*. **37**: 66–74. [doi](/source/Doi_(identifier)):[10.1016/j.conb.2016.01.010](https://doi.org/10.1016%2Fj.conb.2016.01.010). [PMID](/source/PMID_(identifier)) [26851755](https://pubmed.ncbi.nlm.nih.gov/26851755). [S2CID](/source/S2CID_(identifier)) [13897721](https://api.semanticscholar.org/CorpusID:13897721).

1. **[^](#cite_ref-:29_21-0)** Kutlu, Yakup; Yayık, Apdullah; Yıldırım, Esen; Yıldırım, Serdar (2017). "LU triangularization extreme learning machine in EEG cognitive task classification". *Neural Computing and Applications*. **31** (4): 1117–1126. [doi](/source/Doi_(identifier)):[10.1007/s00521-017-3142-1](https://doi.org/10.1007%2Fs00521-017-3142-1). [S2CID](/source/S2CID_(identifier)) [6572895](https://api.semanticscholar.org/CorpusID:6572895).

1. **[^](#cite_ref-:30_22-0)** Yayık, Apdullah; Kutlu, Yakup; Altan, Gökhan (12 July 2019). "Regularized HessELM and Inclined Entropy Measurement for Congestive Heart Failure Prediction". [arXiv](/source/ArXiv_(identifier)):[1907.05888](https://arxiv.org/abs/1907.05888) [[cs.LG](https://arxiv.org/archive/cs.LG)].

1. **[^](#cite_ref-:31_23-0)** Altan, Gökhan; Kutlu, Yakup; Pekmezci, Adnan Özhan; Yayık, Apdullah (2018). ["Diagnosis of Chronic Obstructive Pulmonary Disease using Deep Extreme Learning Machines with LU Autoencoder Kernel"](https://www.researchgate.net/publication/325617941). *International Conference on Advanced Technologies*.

1. **[^](#cite_ref-CP_1_24-0)** ["Classic Papers: Articles That Have Stood The Test of Time"](https://www.nottingham.ac.uk/education/news/news-items/news1617/classic-papers.aspx). [University of Nottingham](/source/University_of_Nottingham). 15 June 2017. Retrieved 21 December 2023.

1. **[^](#cite_ref-CPL_1_25-0)** ["List of 10 classic AI papers from 2006"](https://scholar.google.com/citations?view_op=list_classic_articles&hl=en&by=2006&vq=eng_artificialintelligence). 2017. Retrieved 21 December 2023.

1. **[^](#cite_ref-ELM_1_26-0)** Huang, G.B.; Zhu, Q.Y.; Siew, C.K. (December 2006). ["Extreme learning machine: theory and applications"](https://www.sciencedirect.com/science/article/abs/pii/S0925231206000385). *Neurocomputing*. **70** (1–3): 489–501. [doi](/source/Doi_(identifier)):[10.1016/j.neucom.2005.12.126](https://doi.org/10.1016%2Fj.neucom.2005.12.126). [ISSN](/source/ISSN_(identifier)) [0925-2312](https://search.worldcat.org/issn/0925-2312). [S2CID](/source/S2CID_(identifier)) [116858](https://api.semanticscholar.org/CorpusID:116858). Retrieved 21 December 2023.

1. **[^](#cite_ref-FA_1_27-0)** Liang, N.Y.; Huang, G.B.; Saratchandran, P.; Sundararajan, N. (November 2006). "A fast and accurate online sequential learning algorithm for feedforward networks". *IEEE Transactions on Neural Networks*. **17** (6): 1411–1423. [Bibcode](/source/Bibcode_(identifier)):[2006ITNN...17.1411L](https://ui.adsabs.harvard.edu/abs/2006ITNN...17.1411L). [doi](/source/Doi_(identifier)):[10.1109/TNN.2006.880583](https://doi.org/10.1109%2FTNN.2006.880583). [PMID](/source/PMID_(identifier)) [17131657](https://pubmed.ncbi.nlm.nih.gov/17131657). [S2CID](/source/S2CID_(identifier)) [7028394](https://api.semanticscholar.org/CorpusID:7028394).

1. ^ [***a***](#cite_ref-:7_28-0) [***b***](#cite_ref-:7_28-1) Zhu, W.; Miao, J.; Qing, L.; Huang, G. B. (2015-07-01). "Hierarchical Extreme Learning Machine for unsupervised representation learning". *2015 International Joint Conference on Neural Networks (IJCNN)*. pp. 1–8. [doi](/source/Doi_(identifier)):[10.1109/IJCNN.2015.7280669](https://doi.org/10.1109%2FIJCNN.2015.7280669). [ISBN](/source/ISBN_(identifier)) [978-1-4799-1960-4](https://en.wikipedia.org/wiki/Special:BookSources/978-1-4799-1960-4). [S2CID](/source/S2CID_(identifier)) [14222151](https://api.semanticscholar.org/CorpusID:14222151).

1. **[^](#cite_ref-29)** Neumann, Klaus; Steil, Jochen J. (2011). ["Batch intrinsic plasticity for extreme learning machines"](https://pub.uni-bielefeld.de/download/2141968/2904481). *Proc. of International Conference on Artificial Neural Networks*: 339–346.

1. **[^](#cite_ref-30)** Neumann, Klaus; Steil, Jochen J. (2013). ["Optimizing extreme learning machines via ridge regression and batch intrinsic plasticity"](https://pub.uni-bielefeld.de/download/2465823/2903542). *Neurocomputing*. **102**: 23–30. [doi](/source/Doi_(identifier)):[10.1016/j.neucom.2012.01.041](https://doi.org/10.1016%2Fj.neucom.2012.01.041).

1. **[^](#cite_ref-31)** Neumann, Klaus; Rolf, Matthias; Steil, Jochen J. (2013). ["Reliable integration of continuous constraints into extreme learning machines"](https://pub.uni-bielefeld.de/record/2547909). *International Journal of Uncertainty, Fuzziness and Knowledge-Based Systems*. **21** (supp02): 35–50. [doi](/source/Doi_(identifier)):[10.1142/S021848851340014X](https://doi.org/10.1142%2FS021848851340014X). [ISSN](/source/ISSN_(identifier)) [0218-4885](https://search.worldcat.org/issn/0218-4885).

1. **[^](#cite_ref-32)** Neumann, Klaus (2014). [*Reliability*](https://pub.uni-bielefeld.de/download/2656403/2656405). University Library Bielefeld. pp. 49–74.

1. **[^](#cite_ref-33)** ["The Official Homepage on Origins of Extreme Learning Machines (ELM)"](http://elmorigin.wixsite.com/originofelm). Retrieved 15 December 2018.

1. **[^](#cite_ref-34)** Wang, Lipo P.; Wan, Chunru R. (2008). "Comments on 'The Extreme Learning Machine'". *IEEE Transactions on Neural Networks*. **19** (8): 1494–5, author reply 1495–6. [Bibcode](/source/Bibcode_(identifier)):[2008ITNN...19.1494W](https://ui.adsabs.harvard.edu/abs/2008ITNN...19.1494W). [CiteSeerX](/source/CiteSeerX_(identifier)) [10.1.1.217.2330](https://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.217.2330). [doi](/source/Doi_(identifier)):[10.1109/TNN.2008.2002273](https://doi.org/10.1109%2FTNN.2008.2002273). [PMID](/source/PMID_(identifier)) [18701376](https://pubmed.ncbi.nlm.nih.gov/18701376).

1. **[^](#cite_ref-35)** Huang, Guang-Bin (2008). "Reply to 'comments on "the extreme learning machine"'". *IEEE Transactions on Neural Networks*. **19** (8): 1495–1496. [doi](/source/Doi_(identifier)):[10.1109/tnn.2008.2002275](https://doi.org/10.1109%2Ftnn.2008.2002275). [S2CID](/source/S2CID_(identifier)) [14720232](https://api.semanticscholar.org/CorpusID:14720232).

1. **[^](#cite_ref-36)** Huang, Guang-Bin (2015). ["WHO behind the malign and attack on ELM, GOAL of the attack and ESSENCE of ELM"](http://www.ntu.edu.sg/home/egbhuang/pdf/Huang-GB-Statement.pdf) (PDF). *www.extreme-learning-machines.org*.

1. **[^](#cite_ref-37)** Zhu, W.; Miao, J.; Qing, L. (2014-07-01). "Constrained Extreme Learning Machine: A novel highly discriminative random feedforward neural network". *2014 International Joint Conference on Neural Networks (IJCNN)*. pp. 800–807. [doi](/source/Doi_(identifier)):[10.1109/IJCNN.2014.6889761](https://doi.org/10.1109%2FIJCNN.2014.6889761). [ISBN](/source/ISBN_(identifier)) [978-1-4799-1484-5](https://en.wikipedia.org/wiki/Special:BookSources/978-1-4799-1484-5). [S2CID](/source/S2CID_(identifier)) [5769519](https://api.semanticscholar.org/CorpusID:5769519).

1. **[^](#cite_ref-38)** Akusok, Anton; Bjork, Kaj-Mikael; Miche, Yoan; Lendasse, Amaury (2015). ["High-Performance Extreme Learning Machines: A Complete Toolbox for Big Data Applications"](https://doi.org/10.1109%2Faccess.2015.2450498). *IEEE Access*. **3**: 1011–1025. [Bibcode](/source/Bibcode_(identifier)):[2015IEEEA...3.1011A](https://ui.adsabs.harvard.edu/abs/2015IEEEA...3.1011A). [doi](/source/Doi_(identifier)):[10.1109/access.2015.2450498](https://doi.org/10.1109%2Faccess.2015.2450498).

---
Adapted from the Wikipedia article [Extreme learning machine](https://en.wikipedia.org/wiki/Extreme_learning_machine) by Wikipedia contributors ([contributor history](https://en.wikipedia.org/wiki/Extreme_learning_machine?action=history)). Available under [Creative Commons Attribution-ShareAlike 4.0 International](https://creativecommons.org/licenses/by-sa/4.0/). Changes may have been made.