Relevance vector machine

{{Short description|Machine learning technique}}
{{Machine learning|Supervised learning}}
In mathematics, a '''Relevance Vector Machine (RVM)''' is a machine learning technique that uses Bayesian inference to obtain parsimonious solutions for regression and probabilistic classification.<ref>{{cite journal | last=Tipping | first=Michael E. |title=Sparse Bayesian Learning and the Relevance Vector Machine |year=2001 |journal = Journal of Machine Learning Research |volume=1 |pages=211–244 |url=http://jmlr.csail.mit.edu/papers/v1/tipping01a.html }}</ref> A faster version, based on a greedy optimisation procedure, was subsequently developed.<ref>{{cite journal |last1=Tipping |first1=Michael |last2=Faul |first2=Anita |title=Fast Marginal Likelihood Maximisation for Sparse Bayesian Models |journal=Proceedings of the Ninth International Workshop on Artificial Intelligence and Statistics |date=2003 |pages=276–283 |url=https://proceedings.mlr.press/r4/tipping03a.html |access-date=21 November 2024}}</ref><ref>{{cite journal |last1=Faul |first1=Anita |last2=Tipping |first2=Michael |title=Analysis of Sparse Bayesian Learning |journal=Advances in Neural Information Processing Systems |date=2001 |url=https://proceedings.neurips.cc/paper_files/paper/2001/file/02b1be0d48924c327124732726097157-Paper.pdf |access-date=21 November 2024}}</ref> The RVM has an identical functional form to the support vector machine, but provides probabilistic classification.

It is equivalent to a Gaussian process model with covariance function:
:<math>k(\mathbf{x},\mathbf{x}') = \sum_{j=1}^N \frac{1}{\alpha_j} \varphi(\mathbf{x},\mathbf{x}_j)\varphi(\mathbf{x}',\mathbf{x}_j) </math>
where <math>\varphi</math> is the kernel function (usually Gaussian), <math>\alpha_j</math> are the precisions (inverse variances) of the prior on the weights <math>w_j \sim N(0,\alpha_j^{-1})</math>, and <math>\mathbf{x}_1,\ldots,\mathbf{x}_N</math> are the input vectors of the training set.<ref>{{cite thesis |type=Ph.D. |last=Candela |first=Joaquin Quiñonero |date=2004 |title=Learning with Uncertainty - Gaussian Processes and Relevance Vector Machines |publisher=Technical University of Denmark |url=http://www2.imm.dtu.dk/pubdb/views/edoc_download.php/3237/pdf/imm3237.pdf |chapter=Sparse Probabilistic Linear Models and the RVM |access-date=April 22, 2016 }}</ref>
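
The covariance function above can be evaluated directly. The following is a minimal sketch in Python with NumPy, assuming a Gaussian kernel and arbitrary illustrative values for <math>\alpha_j</math>; the function names <code>gaussian_basis</code> and <code>rvm_covariance</code> and the <code>width</code> parameter are hypothetical and do not come from the cited sources.

```python
import numpy as np

def gaussian_basis(X, centers, width=1.0):
    """phi(x, x_j) = exp(-||x - x_j||^2 / (2 * width^2)) for every pair.

    Assumption: a Gaussian kernel with a single shared width.
    """
    sq_dist = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-sq_dist / (2.0 * width ** 2))

def rvm_covariance(X1, X2, X_train, alpha, width=1.0):
    """k(x, x') = sum_j (1 / alpha_j) * phi(x, x_j) * phi(x', x_j)."""
    Phi1 = gaussian_basis(X1, X_train, width)  # shape (n1, N)
    Phi2 = gaussian_basis(X2, X_train, width)  # shape (n2, N)
    # Dividing by alpha scales column j by 1 / alpha_j before the product.
    return (Phi1 / alpha) @ Phi2.T

# Illustrative data: five 1-D training inputs and made-up alpha values.
rng = np.random.default_rng(0)
X_train = rng.uniform(-3.0, 3.0, size=(5, 1))
alpha = np.array([0.5, 1.0, 2.0, 4.0, 8.0])
K = rvm_covariance(X_train, X_train, X_train, alpha)
```

In matrix form the result is <math>\Phi A^{-1} \Phi^\mathsf{T}</math> with <math>A = \operatorname{diag}(\alpha_1,\ldots,\alpha_N)</math>, so <code>K</code> is symmetric and positive semi-definite. A basis function whose <math>\alpha_j</math> grows very large contributes almost nothing to the sum.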

Compared to that of support vector machines (SVM), the Bayesian formulation of the RVM avoids the set of free parameters of the SVM (which usually require cross-validation-based post-optimizations). However, RVMs use an expectation-maximization (EM)-like learning method and are therefore at risk of converging to local optima. This is unlike the standard sequential minimal optimization (SMO)-based algorithms employed by SVMs, which are guaranteed to find a global optimum (of the convex problem).
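
The iterative, EM-like character of RVM training can be illustrated for regression. The sketch below follows the re-estimation equations given in Tipping (2001), alternating between the Gaussian posterior over the weights and updates of the hyperparameters. It is an illustration under stated assumptions, not a reference implementation: the function names, the toy data, the Gaussian kernel width, the initial values and the pruning threshold <code>alpha_cap</code> are all hypothetical choices.

```python
import numpy as np

def posterior(P, alpha, beta, t):
    """Gaussian posterior over the weights for fixed alpha and beta."""
    Sigma = np.linalg.inv(beta * P.T @ P + np.diag(alpha))
    mu = beta * Sigma @ P.T @ t
    return mu, Sigma

def rvm_regression_sketch(Phi, t, n_iter=300, alpha_cap=1e6):
    """Iterative re-estimation of alpha (weight precisions) and beta
    (noise precision). alpha_cap is an assumed pruning threshold."""
    N, M = Phi.shape
    keep = np.arange(M)      # indices of basis functions still in the model
    alpha = np.ones(M)       # assumed initial precisions
    beta = 1.0 / np.var(t)   # assumed initial noise precision
    for _ in range(n_iter):
        P = Phi[:, keep]
        mu, Sigma = posterior(P, alpha, beta, t)
        # gamma_i = 1 - alpha_i * Sigma_ii: how well weight i is determined
        gamma = np.clip(1.0 - alpha * np.diag(Sigma), 1e-12, 1.0)
        alpha = gamma / np.maximum(mu ** 2, 1e-300)
        resid = t - P @ mu
        beta = (N - gamma.sum()) / max(resid @ resid, 1e-12)
        # Drop basis functions whose precision has become very large.
        mask = alpha < alpha_cap
        keep, alpha = keep[mask], alpha[mask]
    mu, _ = posterior(Phi[:, keep], alpha, beta, t)
    return keep, mu, beta

# Toy data (assumption): noisy samples of sin(x)/x on [-10, 10].
rng = np.random.default_rng(0)
x = np.linspace(-10.0, 10.0, 50)
t = np.sinc(x / np.pi) + 0.1 * rng.standard_normal(x.size)
width = 2.0  # assumed Gaussian kernel width
Phi = np.exp(-(x[:, None] - x[None, :]) ** 2 / (2.0 * width ** 2))

keep, mu, beta = rvm_regression_sketch(Phi, t)
prediction = Phi[:, keep] @ mu
rmse = np.sqrt(np.mean((prediction - t) ** 2))
```

The training inputs indexed by <code>keep</code> play the role of relevance vectors. Because the objective maximised by these updates is not convex, the result can depend on the initial values of <code>alpha</code> and <code>beta</code>, which is the risk of local optima described above.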

The relevance vector machine was patented in the United States by Microsoft (patent expired September 4, 2019).<ref>{{cite patent |country = US |number = 6633857 |title = Relevance vector machine |inventor = Michael E. Tipping }}</ref>

== See also ==
* Kernel trick
* Platt scaling: turns an SVM into a probability model

== References ==
{{reflist}}

Category:Classification algorithms
Category:Kernel methods for machine learning
Category:Nonparametric Bayesian statistics