Computational Music Processing

SIS code: NPFL144

Semester: winter

E-credits:

Examination: Oral

Instructor: Jan Hajič jr.

Annotation (from SIS)

The subject introduces participants to computational music processing in both industry and academia. It covers music representations, from audio through symbolic representations (MIDI, MusicXML) to the visual domain (sheet music), and methods ranging from signal processing to machine learning. This subject is a good basis for music-related software projects or theses. Knowing music theory and notation is not required (the essentials will be explained), but if you have never had any contact with music, we recommend reading up on terms like harmony or musical form. The subject will be taught in English.

Note on schedule: this is not a centrally scheduled subject, so scheduling will happen once we know who has signed up (more or less democratically, depending on how many people are interested).

The subject's Discord channel is already open: you can join and ask about anything you like. (Joining the Discord of course does not obligate you in any way to take the subject; it is also there to help you decide whether you want to take it.)

Learning outcomes

A successful student of NPFL144 will:

Understand how music is represented for computational processing
Know what the main applications of computational music processing are, which methods apply to them, what resources are available, and what the approximate state of the art is
Be able to search, select, and read literature from the field
Be able to combine this understanding to design systems for music processing, and independently read up on appropriate methods

and find out if they perhaps feel inspired by a topic to do a project of their own.

Literature

Müller, Meinard. Fundamentals of Music Processing Using Python and Jupyter Notebooks. Cham: Springer, 2021.
https://link.springer.com/content/pdf/10.1007/978-3-030-69808-9.pdf

Müller, Meinard, and Frank Zalkow. "libfmp: A Python package for fundamentals of music processing." Journal of Open Source Software 6, no. 63 (2021): 3326.
https://joss.theoj.org/papers/10.21105/joss.03326.pdf

Lerch, Alexander. An Introduction to Audio Content Analysis: Music Information Retrieval Tasks and Applications. 2nd Edition. New York: Wiley-IEEE Press, 2021.
Freely available as slides: https://github.com/alexanderlerch/ACA-Slides
and accompanying code: https://github.com/alexanderlerch/pyACA
and website: https://www.audiocontentanalysis.org/

Knees, Peter, and Markus Schedl. Music Similarity and Retrieval: An Introduction to Audio- and Web-based Strategies. Vol. 36. Heidelberg: Springer, 2016.
https://link.springer.com/content/pdf/10.1007/978-3-662-49722-7.pdf

The recent tutorial "Deep Learning 101 in Audio-based MIR" (ISMIR 2024, San Francisco) has an accompanying online book, with code for Google Colab:
https://geoffroypeeters.github.io/deeplearning-101-audiomir_book/front.html

As you have probably noticed, some of the instructional literature on computational music processing pre-dates the deep learning boom. However, the methods presented there are often still valid, are good enough in many application scenarios, and make for good baselines before wheeling out the deep learning artillery.

Required readings

Note: please always check on the day of the lecture whether you can access the required reading (usually by using your university login). If you can't, contact me immediately, ideally via Discord, because if you cannot access the article, others likely cannot either.

Lecture 1. Small, Christopher. 1999. “Musicking — the Meanings of Performing and Listening. A Lecture.” Music Education Research 1 (1): 9–22. doi:10.1080/1461380990010102. https://www.tandfonline.com/doi/abs/10.1080/1461380990010102

Lecture 2.1 P. Smaragdis and J. C. Brown, "Non-negative matrix factorization for polyphonic music transcription," 2003 IEEE Workshop on Applications of Signal Processing to Audio and Acoustics, New Paltz, NY, USA, 2003, pp. 177-180. https://ieeexplore.ieee.org/document/1285860

Lecture 2.2 N. Bertin, R. Badeau and E. Vincent, "Enforcing Harmonicity and Smoothness in Bayesian Non-Negative Matrix Factorization Applied to Polyphonic Music Transcription," in IEEE Transactions on Audio, Speech, and Language Processing, vol. 18, no. 3, pp. 538-549, March 2010. https://ieeexplore.ieee.org/document/5410052

Lecture 3. Foscarin, Francesco, Jan Schlüter, and Gerhard Widmer. "Beat this! Accurate beat tracking without DBN postprocessing." arXiv preprint arXiv:2407.21658 (2024). https://arxiv.org/pdf/2407.21658

Lecture 4. ...anything from the slides so far.

Lecture 5. Nakamura, Eita, Kazuyoshi Yoshii, and Haruhiro Katayose. "Performance Error Detection and Post-Processing for Fast and Accurate Symbolic Music Alignment." In ISMIR, pp. 347-353. 2017. http://sap.ist.i.kyoto-u.ac.jp/members/yoshii/papers/ismir-2017-nakamura.pdf

Lecture 6. Jeong, D., Kwon, T., Kim, Y. & Nam, J. (2019). Graph Neural Network for Music Score Data and Modeling Expressive Piano Performance. Proceedings of the 36th International Conference on Machine Learning, PMLR 97:3060-3070. http://proceedings.mlr.press/v97/jeong19a.html

Lecture 7.1 Byrd & Simonsen: Towards a Standard Testbed for Optical Music Recognition: Definitions, Metrics, and Page Images. Journal of New Music Research, 2015, 44, 169-195 https://www.tandfonline.com/doi/full/10.1080/09298215.2015.1045424

Lecture 7.2 Torras, Pau, Sanket Biswas, and Alicia Fornés. "A unified representation framework for the evaluation of Optical Music Recognition systems." International Journal on Document Analysis and Recognition (IJDAR) 27, no. 3 (2024): 379-393. https://link.springer.com/article/10.1007/s10032-024-00485-8

Lecture 8. Mauch, Matthias, Robert M. MacCallum, Mark Levy, and Armand M. Leroi. 2015. "The evolution of popular music: USA 1960–2010." R. Soc. Open Sci. 2: 150081. https://royalsocietypublishing.org/doi/10.1098/rsos.150081

Lecture 9. Varnum, Michael EW, Jaimie Arona Krems, Colin Morris, Alexandra Wormley, and Igor Grossmann. "Why are song lyrics becoming simpler? A time series analysis of lyrical complexity in six decades of American popular music." PloS one 16, no. 1 (2021). https://journals.plos.org/plosone/article?id=10.1371/journal.pone.0244576

Lecture 10. ...try out a music generation system of your choice and read the paper that describes it.

Lecture 11. Karlijn Dinnissen and Christine Bauer. 2023. Amplifying Artists’ Voices: Item Provider Perspectives on Influence and Fairness of Music Streaming Platforms. In Proceedings of the 31st ACM Conference on User Modeling, Adaptation and Personalization (UMAP '23). ACM, New York, NY, USA, 238–249. https://dl.acm.org/doi/pdf/10.1145/3565472.3592960

Lecture 12. McBride, John M., Sam Passmore, and Tsvi Tlusty. "Convergent evolution in a large cross-cultural database of musical scales." Plos one 18, no. 12 (2023). https://journals.plos.org/plosone/article?id=10.1371/journal.pone.0284851

Syllabus

1. Music and its formalizations. Sound vs. tone. Elementary musical features (tempo, beat, harmony, melody). User roles (listener, distributor, musician).

2. Basics of musical audio processing: signal, sampling, convolution and the deconvolution problem. Resonance, the harmonic series and timbre.

3. Audio feature extraction. Tones and pitches. Automated transcription of monophonic and polyphonic recordings, melody extraction, harmony and genre. Beat tracking, downbeat and tempo estimation. Source separation.

4. Symbolic music description. MIDI, matrix view. Music notation formats: ABC, Humdrum, LilyPond, MusicXML, MEI. Selected databases of symbolic music.

5. Visual representations of music: notation. Optical music recognition (OMR) worldwide and at matfyz.

6. Musical similarity in symbolic representations and in audio. Search, multimodality: query by humming.

7. Multimodality and performance. Score following with and without symbolic representations and its applications. Modeling music expression. Automatic adaptive accompaniment.

8. Singing and lyrics. Singing voice detection, singing voice synthesis, automatic transcription and alignment of sung text.

9. Digital music history. Databases: RISM, F-Tempo, Cantus. Examples of digital editions (Mozart, Josquin), popular music databases (Billboard, Million Song Dataset).

10. Music generation. Algorithmicity, chance, and generative artificial intelligence. Various human-in-the-loop systems.

11. Music distribution. Recommender systems — collaborative filtering, cold start problem. Copyright: ContentID, fingerprinting. Cover song identification.

12. Music cognition. EEG and music, entrainment, music therapy: Parkinson's disease, depression, Alzheimer's disease.

13. Non-European musical cultures. Ragas, Chinese music, Maqams, Arab-Andalusian music. Folk music. Cultural evolution perspectives.

14. The world of computational music processing: industry, academia, important online resources. Worldwide/Europe/Czechia/matfyz. Important open-source libraries.

15. Recap for exam, reserve time, discussion space.

(The subject does not touch on digital music production and audio engineering: no live coding, no DAW work, no VST plugins, etc.)

Exam information

The exam is oral. It will probably take place in the corridor in front of my office (S424, 4th floor, corridor leading towards S1). The dates and times will appear in SIS. My preliminary plan is to give you options every workday afternoon in the weeks starting Jan 20th and Jan 27th, and Feb 10-12 as last-minute options before the semester ends.

There are two components of the exam: an in-depth discussion of literature of your choice, as advertised, and a few (2-3) more surface-level questions.

The literature-based part

Here is a reading list compiled from all the literature that was mentioned in the lectures (and referenced in the slides).

The exam requires specific preparation, which must be completed and submitted 36 hours before your exam slot. (This will also be the cutoff for signing out of the exam; if SIS doesn't let me set 36 hours, the cutoff will be 48 hours.) Make sure you read the following instructions thoroughly, as well as the supplementary information below. If something is unclear, ask on Discord. These preparation requirements are derived from the course's learning outcomes, so that you come to the exam with a very good chance of passing.

All the materials that you prepare will be available to you during the exam.

1. Imagine being in the position of solving some computational music processing task. (If you don't have anything specific in mind that you would personally be interested in, you can of course choose one of the tasks mentioned in the lectures.) What problem does it solve (or what opportunity does it exploit)? Who are the users, and what value should it bring them? ("Just me" is a valid answer if you have a specific need in your own musicking. For instance, I started working on OMR just because having to typeset my handwritten compositions was really annoying.) What main usage scenarios do you envision? What constraints should your solution operate under? (Are there, e.g., copyright, privacy, or reliability/latency issues that force the solution to run on a device such as a phone, hearing aid, or headphones, or in other computationally constrained settings? Or perhaps not, because it runs on the servers of a huge music distribution company?) Finally, what do you anticipate as the main challenges of the solution? (Primarily technical challenges, not startup-ish things like marketing.) Write this short specification into the task form.
2. Next, pick Article 1: an article from the reading list that is relevant to one of the challenges you identified and that was not required reading. If its relevance is not immediately obvious from the title and abstract, write a sentence or two on how the article might address your challenge before reading more than the abstract.
3. Read Article 1. I recommend copying out the structure of the exam literature review form and making notes as you go, but don't submit it yet, because one of its questions concerns the relationship to the other papers you will be reviewing.
4. Then pick two more articles on the same topic, at least one of which is not on the reading list. Finding relevant literature on your own is part of the learning outcomes, and Google Scholar is a good starting point. You can, for instance, look through papers that cite Article 1, or through works that Article 1 mentions in its related work or discussion. If you want a more interesting approach, try to find something that Article 1 missed. For each of these two articles, again write a sentence or two on how you expect it to help address the challenge you identified for Article 1, beyond what Article 1 offers. Again, do this based on the title and abstract.
5. Read Article 2 and Article 3.
6. Complete the exam literature review forms for all three articles. As a rule of thumb, each long-form answer should be about a paragraph (3-6 sentences), but don't hesitate to go into more detail if you find something relevant. You will have these responses available during the exam itself, and the form has a field for any extra notes that you would find helpful to have there.

Submit all materials (your imaginary task and challenge specification and the literature review forms) at least 36 hours before the start of your selected exam block. (I have to (re-)read the articles too. If 3-4 of you sign up for the same day, that can be up to ~12 hours of work, which is 1 1/2 workdays.)

Exam prep checklist for convenience:

(ctrl+c, ctrl+v into your favorite note-taking app?)

[ ] Have I read the exam preparation instructions carefully?
[ ] Have I read the supplementary information (below) just as carefully?
[ ] Do I have all the slides?
[ ] Do I want to do this in English, or in Czech?
[ ] Have I completed and submitted the task form?
[ ] Have I selected Article 1 from the reading list? Am I sure it's not a required reading article?
[ ] Have I written a sentence or two about why I think this article is relevant to the priority challenge I identified in the task form?
[ ] Have I read Article 1 and made notes according to the literature review form?
[ ] [ ] Have I selected Article 2 and Article 3? Am I sure at least one of them is not on the reading list?
[ ] [ ] Have I written a sentence or two about why I think these two articles are relevant to my priority challenge?
[ ] Have I completed and submitted the literature review form for Article 1 (including the comparative advantages to Arts. 2 & 3)?
[ ] Have I completed and submitted the literature review form for Article 2 (ditto)?
[ ] Have I completed and submitted the literature review form for Article 3 (ditto)?

Supplementary exam information

The selected task itself is not evaluated on anything other than internal consistency. It has no bearing on the exam result whether your task is just for your own fun or for curing Parkinson's disease. You can do weird things or standard things; the originality of the task itself is not evaluated at all. If you feel stuck on this, choose randomly, or write on Discord and I will give you some ideas. :)

This is still just a structure for an exam, so there is no requirement to pick the most state-of-the-art, best performing thing. Imagine that you are doing this literature review in the year when the latest of your selected articles came out. (This is especially because not everyone has done deep learning, which is currently the dominant methodology for most computational music processing tasks. As an exercise, "no deep learning" can easily be one of the requirements you identify in Step 1, if you can justify it.)

If you find an obviously relevant article that Article 1 missed, and it is a long journal paper or a book chapter (beyond ~20 pages), it can count as both Article 2 and Article 3. But this is likely the less interesting option, and it might in fact be more work, because if something takes over 20 pages, there are probably a lot of important details in the math.

All the written materials submitted in preparation for the exam can be written in Czech as well, if it is a problem for you to write these in English. The exam can take place in Czech or English. You will probably want to submit the required forms in your preferred language, but you can choose the language of the exam on the spot.

You are free to use whatever tools you want in your preparation, by which I am of course hinting at the various breeds of LLMs. Note that I will also ask about details in the papers, so don't expect to get away with not actually reading them yourself. :) But of course, feel free to ask the garden of GPTs & co. to point you in the right direction if that is a good way for you to get unstuck during preparation.

If you fail, you can re-use the topic and literature for the next attempt, to the extent we agree on, to allow for adaptations to issues such as an underspecified task, a key challenge that does not in fact address the task, literature that is not as relevant as you thought, etc.

The non-literature part

Aside from discussing your selected topic in depth, I will ask 2-3 random questions about things that are in the slides. If you do well on your in-depth topic, there is almost no way this part fails you. Example questions: "What is a spectrogram and what are some of its parameters?" "What are the main approaches to optical music recognition?" "What is the PKSpell algorithm for?" "How would you synthesize data for a live score following task?" "How is automatic music transcription usually evaluated?" "What is the relationship between pitch and f0?"

Finally, help each other out. Ask about things from your papers on Discord. If you have trouble understanding something, odds are someone else has the same issue, so you are likely helping them too. Conversely, the energy you put into answering someone else's question is rarely wasted, even for yourself. Organize study groups. Do dry runs of your prepared materials with each other. Being in the position of a pretend examiner and trying to come up with tricky questions is an especially good exercise.

Have fun!

Institute of Formal and Applied Linguistics

Charles University, Czech Republic
Faculty of Mathematics and Physics
