A Discriminative Model for Polyphonic Piano Transcription

Poliner, Graham E.; Ellis, Daniel P. W.

doi:10.1155/2007/48317

Research Article
Open access
Published: 01 December 2006

Graham E. Poliner¹ &
Daniel P. W. Ellis¹

EURASIP Journal on Advances in Signal Processing volume 2007, Article number: 048317 (2006)

Abstract

We present a discriminative model for polyphonic piano transcription. Support vector machines trained on spectral features are used to classify frame-level note instances. The classifier outputs are temporally constrained via hidden Markov models, and the proposed system is used to transcribe both synthesized and real piano recordings. A frame-level transcription accuracy of 68% was achieved on a newly generated test set, and direct comparisons to previous approaches are provided.
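The abstract describes two stages. First, a classifier labels each frame for each note. Then an HMM enforces temporal continuity on those outputs. The following is a minimal sketch of the second stage, not the authors' implementation. It assumes each note's classifier output has already been mapped to a per-frame probability P(on), and that this probability is used directly as the emission score of a two-state (off/on) HMM decoded with Viterbi. The function names `viterbi_two_state` and `frame_accuracy` are hypothetical, and so are the parameters `p_stay` (self-transition probability) and `prior_on` (initial "on" probability) and their default values. The sketch also includes a frame-level accuracy helper based on TP / (TP + FP + FN), a definition common in the piano-transcription literature. The abstract itself does not say which definition produced its 68% figure.

```python
import math
from typing import List, Sequence, Set

# Hypothetical helpers illustrating the classify-then-smooth idea;
# all parameter values are assumptions, not taken from the paper.

_EPS = 1e-12  # guards against log(0)


def _log(x: float) -> float:
    return math.log(max(x, _EPS))


def viterbi_two_state(p_on: Sequence[float], p_stay: float = 0.9,
                      prior_on: float = 0.1) -> List[int]:
    """Return the most likely off(0)/on(1) path for a single note.

    p_on[t] is an assumed per-frame probability that the note sounds;
    it scores the 'on' state, and 1 - p_on[t] scores the 'off' state.
    """
    if not p_on:
        return []
    # Symmetric transition matrix: remain in the same state with prob p_stay.
    log_trans = [[_log(p_stay), _log(1.0 - p_stay)],
                 [_log(1.0 - p_stay), _log(p_stay)]]

    def log_emit(p: float, state: int) -> float:
        return _log(p if state == 1 else 1.0 - p)

    # delta[s]: best log score of any path ending in state s at this frame.
    delta = [_log(1.0 - prior_on) + log_emit(p_on[0], 0),
             _log(prior_on) + log_emit(p_on[0], 1)]
    backptrs: List[List[int]] = []
    for p in p_on[1:]:
        new_delta, ptrs = [], []
        for s in (0, 1):
            cands = [delta[r] + log_trans[r][s] for r in (0, 1)]
            best = 0 if cands[0] >= cands[1] else 1
            ptrs.append(best)
            new_delta.append(cands[best] + log_emit(p, s))
        backptrs.append(ptrs)
        delta = new_delta
    # Backtrack from the best final state.
    state = 0 if delta[0] >= delta[1] else 1
    path = [state]
    for ptrs in reversed(backptrs):
        state = ptrs[state]
        path.append(state)
    path.reverse()
    return path


def frame_accuracy(ref: Sequence[Set[int]], est: Sequence[Set[int]]) -> float:
    """Frame-level accuracy TP / (TP + FP + FN) over sets of active notes."""
    tp = fp = fn = 0
    for r, e in zip(ref, est):
        tp += len(r & e)  # notes correctly reported active
        fp += len(e - r)  # spurious notes
        fn += len(r - e)  # missed notes
    total = tp + fp + fn
    # Convention (an assumption): no active notes anywhere counts as perfect.
    return tp / total if total else 1.0


if __name__ == "__main__":
    probs = [0.1, 0.1, 0.9, 0.9, 0.9, 0.9, 0.1, 0.1]
    print(viterbi_two_state(probs))  # [0, 0, 1, 1, 1, 1, 0, 0]
    print(frame_accuracy([{60}, {60, 64}, set()], [{60}, {60}, {67}]))  # 0.5
```

With these assumed settings, an isolated frame scored slightly above 0.5 among low-probability frames is suppressed, while a sustained run of high probabilities survives as one note. Raising `p_stay` makes the smoother more reluctant to switch state, trading missed short notes for fewer spurious ones.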

Author information

Authors and Affiliations

Laboratory for Recognition and Organization of Speech and Audio, Department of Electrical Engineering, Columbia University, New York, NY, 10027, USA
Graham E. Poliner & Daniel P. W. Ellis

Corresponding author

Correspondence to Graham E. Poliner.

Rights and permissions

Open Access This article is distributed under the terms of the Creative Commons Attribution 2.0 International License ( https://creativecommons.org/licenses/by/2.0 ), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

About this article

Cite this article

Poliner, G.E., Ellis, D.P.W. A Discriminative Model for Polyphonic Piano Transcription. EURASIP J. Adv. Signal Process. 2007, 048317 (2006). https://doi.org/10.1155/2007/48317

Received: 06 December 2005
Revised: 17 June 2006
Accepted: 29 June 2006
Published: 01 December 2006
DOI: https://doi.org/10.1155/2007/48317
