  • Research Article
  • Open access

A Discriminative Model for Polyphonic Piano Transcription

Abstract

We present a discriminative model for polyphonic piano transcription. Support vector machines trained on spectral features are used to classify frame-level note instances. The classifier outputs are temporally constrained via hidden Markov models, and the proposed system is used to transcribe both synthesized and real piano recordings. A frame-level transcription accuracy of 68% was achieved on a newly generated test set, and direct comparisons to previous approaches are provided.
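As a rough illustration of the temporal-constraint step, the sketch below smooths one note's frame-level classifier outputs with a two-state (off/on) hidden Markov model decoded by the Viterbi algorithm. It is a minimal sketch under stated assumptions, not the authors' implementation: the function name `viterbi_smooth`, the self-transition probability `p_stay`, the prior `prior_on`, and the choice to use the classifier's "note on" probability directly as the emission score are all hypothetical choices made for this example.

```python
import math


def viterbi_smooth(posteriors, p_stay=0.9, prior_on=0.5):
    """Return the most likely off/on (0/1) sequence for one note.

    posteriors: per-frame "note on" probabilities from a frame-level
                classifier (assumed here to be calibrated to [0, 1]).
    p_stay:     assumed probability of staying in the same state
                between consecutive frames (hypothetical value).
    prior_on:   assumed probability that the note is on in frame 0.
    """
    if not posteriors:
        return []

    eps = 1e-12  # keeps log() finite for probabilities of exactly 0 or 1
    log_stay = math.log(p_stay)
    log_switch = math.log(1.0 - p_stay)
    # trans[i][j] = log P(state j at frame t | state i at frame t-1)
    trans = [[log_stay, log_switch], [log_switch, log_stay]]

    def emission(p):
        # Assumption: the classifier output stands in for the emission
        # likelihood of each state.
        p = min(max(p, eps), 1.0 - eps)
        return [math.log(1.0 - p), math.log(p)]

    e = emission(posteriors[0])
    score = [math.log(1.0 - prior_on) + e[0], math.log(prior_on) + e[1]]
    backpointers = []

    # Forward pass: best log-score of any path ending in each state.
    for p in posteriors[1:]:
        e = emission(p)
        new_score, ptr = [], []
        for j in (0, 1):
            from_off = score[0] + trans[0][j]
            from_on = score[1] + trans[1][j]
            best_prev = 0 if from_off >= from_on else 1
            ptr.append(best_prev)
            new_score.append(max(from_off, from_on) + e[j])
        score = new_score
        backpointers.append(ptr)

    # Backward pass: trace the best path from the final frame.
    state = 0 if score[0] >= score[1] else 1
    path = [state]
    for ptr in reversed(backpointers):
        state = ptr[state]
        path.append(state)
    path.reverse()
    return path


if __name__ == "__main__":
    raw = [0.9, 0.8, 0.3, 0.9, 0.85, 0.1, 0.1, 0.6, 0.1]
    print([1 if p > 0.5 else 0 for p in raw])  # independent thresholding
    print(viterbi_smooth(raw))                 # HMM-smoothed decisions
```

With these assumed parameters, a single low-confidence frame inside a sustained note and a single spurious detection inside silence are both absorbed into their neighbors, whereas thresholding each frame independently would fragment the note. In a full system, one such chain would be run per pitch, with transition probabilities estimated from training data rather than fixed by hand.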

Author information

Corresponding author

Correspondence to Graham E. Poliner.

Rights and permissions

Open Access. This article is distributed under the terms of the Creative Commons Attribution 2.0 International License (https://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

About this article

Cite this article

Poliner, G.E., Ellis, D.P.W. A Discriminative Model for Polyphonic Piano Transcription. EURASIP J. Adv. Signal Process. 2007, 048317 (2006). https://doi.org/10.1155/2007/48317

  • DOI: https://doi.org/10.1155/2007/48317
