Pitch detection algorithm

{{short description|Algorithm to estimate signal frequency}}
{{redirect|Pitch tracking|the baseball term|Glossary of baseball (P)#pitch tracking}}
A '''pitch detection algorithm''' ('''PDA''') is an algorithm designed to estimate the pitch or fundamental frequency of a quasiperiodic or oscillating signal, usually a digital recording of speech or a musical note or tone. This can be done in the time domain, the frequency domain, or both.

PDAs are used in various contexts (e.g. phonetics, music information retrieval, speech coding, musical performance systems) and so there may be different demands placed upon the algorithm. There is as yet{{When|date=October 2018}} no single ideal PDA, so a variety of algorithms exist, most falling broadly into the classes given below.<ref>D. Gerhard. [https://www2.cs.uregina.ca/~gerhard/publications/TRdbg-Pitch.pdf Pitch Extraction and Fundamental Frequency: History and Current Techniques], technical report, Dept. of Computer Science, University of Regina, 2003.</ref>

A PDA typically estimates the period of a quasiperiodic signal, then inverts<ref>"Inverts" here means taking the reciprocal of the estimated period, i.e. <code>f = 1/estimatedPeriod</code>.</ref> that value to give the frequency.

==General approaches==
One simple approach is to measure the distance between zero-crossing points of the signal (i.e. the zero-crossing rate). However, this does not work well with complicated waveforms composed of multiple sine waves with differing periods, or with noisy data. Nevertheless, there are cases in which zero-crossing can be a useful measure, e.g. in some primitive, robotic-sounding text-to-speech applications where a single-frequency source is assumed.{{Cn|date=October 2018}} The algorithm's simplicity makes it "cheap" to implement.
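The zero-crossing idea can be sketched as follows. This is a minimal illustration, not a reference implementation: the function name <code>zero_crossing_pitch</code>, the use of rising crossings only, and the linear interpolation between samples are assumptions made for the example.

```python
import numpy as np

def zero_crossing_pitch(signal, sample_rate):
    """Estimate frequency from the mean spacing of rising zero crossings."""
    signal = np.asarray(signal, dtype=float)
    # Indices i where the signal goes from negative (at i) to non-negative (at i + 1).
    rising = np.where((signal[:-1] < 0) & (signal[1:] >= 0))[0]
    if len(rising) < 2:
        return None  # fewer than two crossings: no period can be measured
    # Linearly interpolate to place each crossing between the two samples.
    frac = -signal[rising] / (signal[rising + 1] - signal[rising])
    crossings = rising + frac
    # Mean distance between consecutive rising crossings is one period (in samples).
    period_seconds = np.mean(np.diff(crossings)) / sample_rate
    return 1.0 / period_seconds

# Illustrative input (assumed): one second of a pure 220 Hz sine at 8 kHz.
sample_rate = 8000
t = np.arange(sample_rate) / sample_rate
tone = np.sin(2 * np.pi * 220.0 * t + 0.3)
estimate = zero_crossing_pitch(tone, sample_rate)
```

On a single clean sinusoid the estimate lands close to the true frequency. Adding strong overtones or noise introduces extra crossings within each period, which is the failure mode described above.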

More sophisticated approaches compare segments of the signal with other segments offset by a trial period to find a match. AMDF (average magnitude difference function), ASMDF (average squared mean difference function), and other similar autocorrelation algorithms work this way. These algorithms can give quite accurate results for highly periodic signals. However, they have false detection problems (often "''octave errors''"),<ref>'''Octave error:''' rather than the true fundamental (root) pitch, the currently loudest frequency, usually one of the overtones of the sound (which include octaves), is detected instead. This is noticeable in guitar-tuning apps. Plucking a single guitar string can cause the tuner to jump between different pitches, because the loudness spectrum of the string's overtones at the moment of the pluck differs from the spectrum of those same overtones a few seconds later. For example, the instant the string is plucked the tuner may report 440 Hz, but 3 seconds later it may report 5 × 440 Hz, an overtone of the fundamental (440 Hz), because the lower-frequency, less resonant components decay in loudness much faster than the resonant higher ones.</ref> can sometimes cope badly with noisy signals (depending on the implementation), and, in their basic implementations, do not deal well with polyphonic sounds (which involve multiple musical notes of different pitches).<ref>Polyphonic pitches need not be overtones or octaves of one another; they can be any set of simultaneously occurring pitches.</ref>{{Cn|date=October 2018}}
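A basic AMDF estimator of the kind described above might look like the following sketch. The function name <code>amdf_pitch</code>, the search limits <code>f_min</code> and <code>f_max</code>, and the test signal are assumptions chosen for illustration; practical implementations add normalisation and peak-picking refinements that are omitted here.

```python
import numpy as np

def amdf_pitch(frame, sample_rate, f_min=50.0, f_max=500.0):
    """Return the frequency whose lag minimises the average magnitude difference."""
    frame = np.asarray(frame, dtype=float)
    min_lag = int(sample_rate / f_max)  # shortest trial period, in samples
    max_lag = int(sample_rate / f_min)  # longest trial period, in samples
    if max_lag >= len(frame):
        raise ValueError("frame too short for the requested f_min")
    lags = np.arange(min_lag, max_lag + 1)
    # Compare the frame with a copy of itself shifted by each trial lag.
    amdf = np.array([np.mean(np.abs(frame[lag:] - frame[:-lag])) for lag in lags])
    best_lag = lags[np.argmin(amdf)]  # the lag at which the two segments match best
    return sample_rate / best_lag

# Illustrative input (assumed): a 200 Hz tone with two overtones, sampled at 8 kHz.
sample_rate = 8000
t = np.arange(2048) / sample_rate
frame = (np.sin(2 * np.pi * 200 * t)
         + 0.5 * np.sin(2 * np.pi * 400 * t)
         + 0.25 * np.sin(2 * np.pi * 600 * t))
estimate = amdf_pitch(frame, sample_rate, f_min=150.0, f_max=400.0)
```

The search range in this example deliberately spans less than two periods of the tone. If the range also included lags of two or three periods, those lags would match the signal equally well, which is one way octave errors arise in this family of methods.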

Current{{When|date=October 2018}} time-domain pitch detector algorithms tend to build upon the basic methods mentioned above, with additional refinements to bring the performance more in line with a human assessment of pitch. For example, the YIN algorithm<ref>{{cite journal | last1=de Cheveigné | first1=Alain | last2=Kawahara | first2=Hideki | title=YIN, a fundamental frequency estimator for speech and music | journal=The Journal of the Acoustical Society of America | publisher=Acoustical Society of America (ASA) | volume=111 | issue=4 | year=2002 | issn=0001-4966 | doi=10.1121/1.1458024 | pages=1917–1930| pmid=12002874 | bibcode=2002ASAJ..111.1917D | s2cid=1607434 |url=http://audition.ens.fr/adc/pdf/2002_JASA_YIN.pdf}}</ref> and the MPM algorithm<ref>P. McLeod and G. Wyvill. [http://www.cs.otago.ac.nz/tartini/papers/A_Smarter_Way_to_Find_Pitch.pdf A smarter way to find pitch.] In Proceedings of the International Computer Music Conference (ICMC’05), 2005.</ref> are both based upon autocorrelation.

==Frequency-domain approaches==
In the frequency domain, polyphonic detection is possible, usually utilizing the periodogram to convert the signal to an estimate of the frequency spectrum.<ref>{{cite book |title=Statistical Digital Signal Processing and Modeling |last=Hayes |first=Monson |year=1996 |publisher=John Wiley & Sons, Inc. |isbn=0-471-59431-8 |page=393}}</ref> This requires more processing power as the desired accuracy increases, although the well-known efficiency of the FFT, a key part of the periodogram algorithm, makes it suitably efficient for many purposes.

Popular frequency domain algorithms include the harmonic product spectrum;<ref name="cnxpda">[http://cnx.org/content/m11714/latest/ Pitch Detection Algorithms], online resource from Connexions</ref><ref>A. Michael Noll, “Pitch Determination of Human Speech by the Harmonic Product Spectrum, the Harmonic Sum Spectrum and a Maximum Likelihood Estimate,” Proceedings of the Symposium on Computer Processing in Communications, Vol. XIX, Polytechnic Press: Brooklyn, New York, (1970), pp. 779–797.</ref> cepstral analysis;<ref>A. Michael Noll, “[https://asa.scitation.org/doi/abs/10.1121/1.1910339 Cepstrum Pitch Determination],” Journal of the Acoustical Society of America, Vol. 41, No. 2, (February 1967), pp. 293–309.</ref> maximum likelihood, which attempts to match the frequency domain characteristics to pre-defined frequency maps (useful for detecting the pitch of fixed-tuning instruments); and the detection of peaks due to harmonic series.<ref>Mitre, Adriano; Queiroz, Marcelo; Faria, Régis. [http://www.ime.usp.br/~mqz/Mitre_AESBR2006.pdf Accurate and Efficient Fundamental Frequency Determination from Precise Partial Estimates.] Proceedings of the 4th AES Brazil Conference. 113-118, 2006.</ref>
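As an illustration of the first of these, the harmonic product spectrum can be sketched by multiplying the magnitude spectrum with copies of itself downsampled by 2, 3, and so on, so that the harmonics line up on the bin of the fundamental. The function name <code>hps_pitch</code>, the Hann window, and the choice of four harmonics are assumptions made for this example rather than details taken from the cited sources.

```python
import numpy as np

def hps_pitch(frame, sample_rate, num_harmonics=4):
    """Estimate the fundamental from the peak of the harmonic product spectrum."""
    frame = np.asarray(frame, dtype=float)
    windowed = frame * np.hanning(len(frame))  # taper the frame to reduce leakage
    spectrum = np.abs(np.fft.rfft(windowed))
    # Only bins whose num_harmonics-th multiple still exists can be evaluated.
    limit = len(spectrum) // num_harmonics
    product = spectrum[:limit].copy()
    for h in range(2, num_harmonics + 1):
        # Downsampling by h moves the h-th harmonic onto the fundamental's bin.
        product *= spectrum[::h][:limit]
    peak_bin = np.argmax(product[1:]) + 1  # skip the DC bin
    return peak_bin * sample_rate / len(frame)

# Illustrative input (assumed): 250 Hz fundamental that is weaker than its
# second harmonic, so the largest single spectral peak is not the fundamental.
sample_rate = 8000
t = np.arange(4096) / sample_rate
frame = (0.4 * np.sin(2 * np.pi * 250 * t)
         + 1.0 * np.sin(2 * np.pi * 500 * t)
         + 0.6 * np.sin(2 * np.pi * 750 * t)
         + 0.5 * np.sin(2 * np.pi * 1000 * t))
estimate = hps_pitch(frame, sample_rate)
```

The resolution of this sketch is limited to the FFT bin spacing, <code>sample_rate / len(frame)</code>.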

To improve on the pitch estimate derived from the discrete Fourier spectrum, techniques such as spectral reassignment (phase based) or Grandke interpolation (magnitude based) can be used to go beyond the precision provided by the FFT bins. Another phase-based approach is offered by Brown and Puckette.<ref>Brown JC and Puckette MS (1993). A high resolution fundamental frequency determination based on phase changes of the Fourier transform. J. Acoust. Soc. Am. Volume 94, Issue 2, pp. 662–667 [https://archive.today/20130414073448/http://asadl.org/jasa/resource/1/jasman/v94/i2/p662_s1?isAuthorized=no ]</ref>
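A magnitude-based refinement can be sketched as below. It uses the two-bin ratio formula commonly attributed to Grandke for a Hann-windowed signal, <code>delta = (2a - 1) / (a + 1)</code>, where <code>a</code> is the ratio of the magnitudes of the two bins that bracket the peak. The function name <code>grandke_frequency</code> and the test tone are assumptions, and the sketch handles only a single dominant sinusoid.

```python
import numpy as np

def grandke_frequency(frame, sample_rate):
    """Refine the strongest FFT peak using the magnitude ratio of two adjacent bins."""
    frame = np.asarray(frame, dtype=float)
    n = len(frame)
    # Periodic Hann window; the ratio formula below is specific to this window.
    window = 0.5 - 0.5 * np.cos(2 * np.pi * np.arange(n) / n)
    mag = np.abs(np.fft.rfft(frame * window))
    k = int(np.argmax(mag[1:-1])) + 1  # strongest bin, excluding the two edge bins
    # The true peak lies between bin k and whichever neighbour is larger.
    lower = k if mag[k + 1] >= mag[k - 1] else k - 1
    alpha = mag[lower + 1] / mag[lower]
    delta = (2 * alpha - 1) / (alpha + 1)  # fractional offset above the lower bin
    return (lower + delta) * sample_rate / n

# Illustrative input (assumed): a 440.3 Hz sine, which falls between FFT bins.
sample_rate = 8000
t = np.arange(2048) / sample_rate
frame = np.sin(2 * np.pi * 440.3 * t)
estimate = grandke_frequency(frame, sample_rate)
bin_spacing = sample_rate / len(frame)  # about 3.9 Hz for these settings
```

For a clean tone the refined estimate should fall well inside one bin spacing of the true frequency. Accuracy is expected to degrade when other partials or noise leak into the two bins being compared.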

==Spectral/temporal approaches==
Spectral/temporal pitch detection algorithms, e.g. the YAAPT pitch tracking algorithm,<ref>{{cite journal | last1=Zahorian | first1=Stephen A. | last2=Hu | first2=Hongbing | title=A spectral/temporal method for robust fundamental frequency tracking | journal=The Journal of the Acoustical Society of America | publisher=Acoustical Society of America (ASA) | volume=123 | issue=6 | year=2008 | issn=0001-4966 | doi=10.1121/1.2916590 | pages=4559–4571| pmid=18537404 | bibcode=2008ASAJ..123.4559Z |url=http://bingweb.binghamton.edu/~hhu1/paper/Zahorian2008spectral.pdf}}</ref><ref>Stephen A. Zahorian and Hongbing Hu. [http://ws2.binghamton.edu/zahorian/yaapt.htm YAAPT Pitch Tracking MATLAB Function]</ref> are based upon a combination of time domain processing using an autocorrelation function such as normalized cross correlation, and frequency domain processing utilizing spectral information to identify the pitch. Then, among the candidates estimated from the two domains, a final pitch track can be computed using dynamic programming. The advantage of these approaches is that the tracking error in one domain can be reduced by the process in the other domain.

==Speech pitch detection==
The fundamental frequency of speech can vary from 40 Hz for low-pitched voices to 600 Hz for high-pitched voices.<ref name=huang>{{cite book |last=Huang |first=Xuedong |author2=Alex Acero |author3=Hsiao-Wuen Hon |title=Spoken Language Processing |year=2001 |publisher=Prentice Hall PTR |isbn=0-13-022616-5 |page=325 }}</ref>

Autocorrelation methods need at least two pitch periods to detect pitch. This means that in order to detect a fundamental frequency of 40 Hz, at least 50 milliseconds (ms) of the speech signal must be analyzed. However, during 50 ms, speech with higher fundamental frequencies may not necessarily have the same fundamental frequency throughout the window.<ref name=huang/>
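The window-length arithmetic can be checked with a short sketch. The helper names <code>min_window_ms</code> and <code>periods_in_window</code> are hypothetical and exist only for this example.

```python
def min_window_ms(f0_hz, periods=2):
    """Shortest analysis window, in milliseconds, that spans `periods` pitch periods."""
    return 1000.0 * periods / f0_hz

def periods_in_window(f0_hz, window_ms):
    """Number of pitch periods of a given fundamental that fit in a window."""
    return f0_hz * window_ms / 1000.0

lowest = min_window_ms(40.0)              # 50.0 ms for a 40 Hz voice
highest = periods_in_window(600.0, 50.0)  # 30.0 periods of a 600 Hz voice
```

A 50 ms window that only just covers two periods at 40 Hz therefore spans 30 periods at 600 Hz, which gives the pitch of a high voice ample time to change within a single analysis window.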

==References==
{{reflist}}

==External links==
* [http://audition.ens.fr/adc/pdf/2002_JASA_YIN.pdf Alain de Cheveigne and Hideki Kawahara: YIN, a fundamental frequency estimator for speech and music]
* [http://www.audiocontentanalysis.org/code/pitch-tracking/compute-pitch/ AudioContentAnalysis.org: Matlab code for various pitch detection algorithms]

{{DEFAULTSORT:Pitch Detection Algorithm}}
[[Category:Audio engineering]]
[[Category:Digital signal processing]]