
I Built a Music Sheet Transcriber — Here’s How

Source: https://towardsdatascience.com/i-built-a-music-sheet-transcriber-heres-how-74708fe7c04c?gi=7935d5c9ff6d

Translating from notes to ABC notation has never been so easy!


Nov 26 · 7 min read

The fields of machine learning and deep learning have undergone enormous transformations in the past few years and have brought about useful applications in many areas. One such area is Optical Music Recognition (OMR). According to Wikipedia, OMR is a field of research that investigates how to computationally read music notation in documents. The goal of OMR is to teach the computer to read and interpret sheet music and produce a machine-readable version of the written music score. So let’s do just that!

The end product — an annotated music sheet with the notes translated into ABC notation

A Quick Crash Course on Music

Before I get into the code, I’ll start off with a brief introduction on music notation. Many of us started music by going through the slow and painful process of learning how to read notes. In fact, many people convert the notes into the ABC notation by writing down the letter each note corresponds to on the music sheet.


Converting between notes and ABC notation

Having experienced this process myself, I decided it would be awesome if I could build a web app that could automatically translate the notes into the ABC notation, and annotate the ABC letters onto the sheet of music!

The Deep Learning Model

Thus began my search for a deep learning architecture that could perform this task. Prior to this, I did not have much experience with optical recognition models, so I wasn’t sure whether any existing work had been done on the topic. To my surprise, I found a wonderful research paper published by Calvo-Zaragoza et al. in 2018 in the journal Applied Sciences, titled End-to-End Neural Optical Music Recognition of Monophonic Scores. They even curated a dataset, Printed Images of Music Staves (PrIMuS), containing more than 80,000 real scores in common western notation!

The model proposed by Calvo-Zaragoza et al. consists of a Convolutional Neural Network (CNN) for feature extraction from the input image, followed by a bidirectional Long Short-Term Memory (BLSTM) network to handle sequences, with a single staff treated as one sequence. The output of the last layer of the CNN is connected to the input of the first layer of the BLSTM, forming a Convolutional Recurrent Neural Network (CRNN). The researchers used a special loss function, the Connectionist Temporal Classification (CTC) loss, which provides a means to optimize the CRNN parameters so that the model is likely to produce the correct sequence y given an input x. Here, the input x is a single staff image and y is its corresponding sequence of music symbols.


Graphical scheme of the CRNN, taken from the paper.
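
To make the architecture a little more concrete, here is a rough Keras sketch of a CRNN of this shape. This is not the authors’ exact TensorFlow v1 graph; the layer sizes, input dimensions and vocabulary size below are illustrative assumptions only.

import tensorflow as tf
from tensorflow.keras import layers

IMG_HEIGHT = 128   # staff images rescaled to a fixed height (assumed value)
IMG_WIDTH = 1024   # fixed width for simplicity; real staves vary in width
VOCAB_SIZE = 1800  # placeholder size of the symbol vocabulary

inputs = tf.keras.Input(shape=(IMG_HEIGHT, IMG_WIDTH, 1))

# Convolutional block: extracts visual features from the staff image
x = layers.Conv2D(32, 3, padding="same", activation="relu")(inputs)
x = layers.MaxPooling2D((2, 2))(x)
x = layers.Conv2D(64, 3, padding="same", activation="relu")(x)
x = layers.MaxPooling2D((2, 2))(x)  # feature map: (H/4, W/4, 64)

# Treat each column of the feature map as one timestep of the sequence
x = layers.Permute((2, 1, 3))(x)  # -> (W/4, H/4, 64)
x = layers.Reshape((IMG_WIDTH // 4, (IMG_HEIGHT // 4) * 64))(x)

# Recurrent block: a bidirectional LSTM reads the columns left to right
x = layers.Bidirectional(layers.LSTM(256, return_sequences=True))(x)

# Per-timestep softmax over the vocabulary (+1 for the CTC blank label);
# training would minimise a CTC loss, e.g. tf.keras.backend.ctc_batch_cost
outputs = layers.Dense(VOCAB_SIZE + 1, activation="softmax")(x)

crnn = tf.keras.Model(inputs, outputs)
crnn.summary()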

Note, however, that the model does not output the exact location of each note, only the sequence in which the notes appear. This isn’t really a problem: even if music readers can’t tell exactly which letter belongs to which note, they can follow along because the letters appear in the same order as the notes.

For more details about the CRNN architecture and the experiments, check out their paper here.

Deploying on the Web

If you just want to get to the code, click here.

Alright, now that we have briefly gone through the model architecture, it’s time for the implementation! The researchers have uploaded their models, implemented in TensorFlow, along with their code to GitHub. Building on that, I was able to quickly set up the model and get it ready to deploy on the web. First, make sure you have TensorFlow v1, Flask and OpenCV installed. Then, download the semantic model trained by the researchers as well as the semantic vocabulary. Also download the font Aaargh.ttf, as it is needed to annotate the image with the ABC notation. (If you want to train the model yourself, head over to the TensorFlow model GitHub repository for instructions and download the PrIMuS dataset.)

The semantic vocabulary is essentially a dictionary that maps an integer index to the actual symbol; for example, index 348 gives you note-A2_quarter. However, the model’s vocabulary contains far more information than we need to annotate (time signatures, barlines and so on), since the player can simply read those off the score itself without any musical background knowledge. I therefore postprocessed the model’s output to keep only the ABC letters with the following code:

notes = []
for i in array_of_notes:  # array_of_notes contains the full model output
    if i[0:5] == "note-":
        if not i[6].isdigit():   # pitch has a second character (e.g. an accidental)
            notes.append(i[5:7])
        else:                    # plain pitch letter, e.g. "note-A2_quarter" -> "A"
            notes.append(i[5])

Luckily, all the notes are marked with “note-” as their first five characters, so it was easy to grab only the tokens pertaining to the ABC letters.
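
For context, array_of_notes above comes from mapping the model’s integer predictions back to symbol strings using the semantic vocabulary. A minimal sketch of that lookup, assuming the vocabulary file lists one symbol per line (as the researchers’ predict.py reads it), might look like this:

# Assumed decoding step: turn the model's integer outputs into symbol strings
with open("vocabulary_semantic.txt") as f:
    int2word = {idx: word for idx, word in enumerate(f.read().splitlines())}

prediction = [348]  # placeholder: in practice this is the CTC-decoded index sequence
array_of_notes = [int2word[i] for i in prediction]
print(array_of_notes)  # e.g. ['note-A2_quarter']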

What Does the Web App Do?

Having obtained the array containing the relevant notes, I then used PIL (the Python Imaging Library) to annotate the notes onto the picture itself. This involved creating a new, completely white image with the same width as the original and 1.5 times its height, then copying the original image onto the white canvas with the Image.paste() function. With the original image extended by white space below, I could then print the ABC letters underneath the stave.
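
A minimal sketch of that extension step (the file names are just placeholders):

from PIL import Image

original = Image.open("score.png").convert("RGB")  # your uploaded music sheet
width, height = original.size

# New white canvas: same width, 1.5x the height, with the original pasted at the top
canvas = Image.new("RGB", (width, int(height * 1.5)), color="white")
canvas.paste(original, (0, 0))
canvas.save("score_extended.png")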

The original image and the annotated image

As mentioned earlier, the model doesn’t provide the exact location of each note and only outputs a sequence of letters, so I had to do some calculations to line the ABC letters up reasonably well below the stave. The alignment isn’t perfect, which is definitely an area for future improvement, but it isn’t a huge problem since players can tell which letter corresponds to which note from the order in which they appear.
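
As an illustration, one simple way to space the letters (not necessarily the exact calculation in my code) is to distribute them evenly across the width and draw them in the white strip with ImageDraw and the downloaded font:

from PIL import Image, ImageDraw, ImageFont

canvas = Image.open("score_extended.png")  # the extended image from the previous step
width, height = canvas.size
notes = ["A", "F#", "D"]  # placeholder output of the filtering step above

draw = ImageDraw.Draw(canvas)
font = ImageFont.truetype("Aaargh.ttf", 24)  # the font size is an arbitrary choice

# Evenly spaced x positions; y sits inside the white strip added below the score
step = width / (len(notes) + 1)
y = int(height * 0.75)
for idx, letter in enumerate(notes, start=1):
    draw.text((int(idx * step), y), letter, fill="black", font=font)

canvas.save("score_annotated.png")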

Building on the TensorFlow predict.py code provided by the researchers, I implemented my web app using Flask. Flask is a really handy web application framework that lets you turn your Python code into a web app in very little time. All Flask requires is your main Python file, the HTML templates and the CSS files, and you’re good to go!

Taking a Closer Look at Flask

The Flask app only requires some minor additions to your existing machine learning Python file. First, you have to add the line

app = Flask(__name__)

above your machine learning code (after importing Flask) to create an instance of the Flask class for the web app. Then, add the following lines at the end of the file:

if __name__ == "__main__":
    app.run()

When Python runs the file directly, it assigns the name "__main__" to the script. Hence, __name__ == "__main__" is satisfied and app.run() executes. After this, you need to define some functions and map them to routes, for example defining how the user is redirected to /predict, which shows the annotated music sheet, after they upload their score. Check out the Flask documentation and my code for more information.
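
To give a rough idea of what that looks like, here is a stripped-down sketch of such routes. The route names, the form field "file" and the helper run_model_and_annotate are placeholders, not my exact app.py:

import os
from flask import Flask, request, render_template

app = Flask(__name__)

def run_model_and_annotate(image_path):
    # Placeholder for the OMR prediction and PIL annotation steps described above;
    # it would return the path of the annotated image.
    return image_path

@app.route("/")
def index():
    # index.html contains the upload form
    return render_template("index.html")

@app.route("/predict", methods=["POST"])
def predict():
    # Save the uploaded score, run the model, and show the annotated result
    uploaded = request.files["file"]
    image_path = os.path.join(".", uploaded.filename)
    uploaded.save(image_path)
    annotated_path = run_model_and_annotate(image_path)
    return render_template("result.html", result=annotated_path)

if __name__ == "__main__":
    app.run()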

The Finishing Touches

Now that the Python file is set up nicely, all that’s left are the HTML and CSS files. The folder structure is typically as follows:

app.py
├── templates
|   ├── html files here
├── static
|   ├── css
|        └── css files here

In this project, the website is styled with bulma, a great open-source CSS framework that I highly recommend to everyone! It is simple to use, requires little (if any) JavaScript, and looks really good.

This project also requires some additional files: the deep learning model, the semantic vocabulary file, the font, and any images you want to test. Once you have downloaded everything, organize your folders as follows:

app.py
vocabulary_semantic.txt
Aaargh.ttf
├── Semantic-Model
|   ├── semantic_model.meta
|   ├── semantic_model.index
|   └── semantic_model.data-00000-of-00001
├── templates
|   ├── index.html
|   └── result.html
├── static
|   ├── css
|        └── bulma.min.css

And that’s it! Once everything has been set up as above, head over to your terminal or command prompt. Change into the directory containing your app.py file and run python app.py. Wait a few seconds and you should see a message telling you which URL to visit to view the web app. Go to that URL, upload your music sheet and get the result! The annotated sheet will be saved to the same directory as your app.py file.

Do note that the model currently only works with monophonic scores (scores that consist of a single melodic line), so don’t give it overly complicated scores!

Give this a shot! I hope you have fun with it, and tell me how it goes! If you have any other awesome ideas on how to use this model, do also share in the comments section!

