Perception-Inspired Graph Convolution for Music Understanding Tasks | by Emmanouil Karystinaios | Jul, 2024 – Towards Data Science

This article discusses MusGConv, a perception-inspired graph convolution block for symbolic music applications.

In the field of Music Information Research (MIR), the challenge of understanding and processing musical scores has continuously attracted new methods and approaches. Most recently, many graph-based techniques have been proposed as a way to target music understanding tasks such as voice separation, cadence detection, composer classification, and Roman numeral analysis.

This blog post covers one of my recent papers, in which I introduced a new graph convolutional block, called MusGConv, designed specifically for processing music score data. MusGConv takes advantage of music-perception principles to improve the efficiency and performance of graph convolution in Graph Neural Networks (GNNs) applied to music understanding tasks.

Traditional approaches in MIR often rely on audio or symbolic representations of music. While audio captures the intensity of sound waves over time, symbolic representations like MIDI files or musical scores encode discrete musical events. Symbolic representations are particularly valuable as they provide higher-level information essential for tasks such as music analysis and generation.

However, existing techniques based on symbolic music representations often borrow from computer vision (CV) or natural language processing (NLP) methodologies: for instance, representing music as a pianoroll in a matrix format and treating it similarly to an image, or representing music as a series of tokens and treating it with sequential models or transformers. These approaches, though effective, can fall short in fully capturing the complex, multi-dimensional nature of music, which includes hierarchical note relations and intricate pitch-temporal relationships. Some recent approaches have instead proposed to model the musical score as a graph and apply Graph Neural Networks to solve various tasks.

The fundamental idea of GNN-based approaches to musical scores is to model a musical score as a graph where notes are the vertices and edges are built from the temporal relations between the notes. To create a graph from a musical score we can consider four types of edges (see the figure below for a visualization of the graph on the score):

- onset edges, connecting notes that start at the same time;
- consecutive edges, connecting a note to notes that start when it ends;
- during edges, connecting a note to notes that start while it is still sounding;
- silence edges, connecting notes that are separated only by a rest.

A GNN can then process the graph built from the notes and these four types of relations.
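As a rough sketch, the graph construction can be expressed as follows. This is a minimal illustration assuming a simplified note representation (the `Note` class and function names are hypothetical, not the paper's implementation); silence edges are omitted for brevity.

```python
# Sketch: building a score graph from a list of notes.
# The edge types follow the temporal relations described above.
from dataclasses import dataclass

@dataclass
class Note:
    onset: float      # start time in beats
    duration: float   # length in beats
    pitch: int        # MIDI pitch number

def build_score_graph(notes):
    """Return a list of (src, dst, edge_type) triples over note indices."""
    edges = []
    for i, a in enumerate(notes):
        for j, b in enumerate(notes):
            if i == j:
                continue
            if a.onset == b.onset:
                edges.append((i, j, "onset"))        # notes starting together
            elif a.onset + a.duration == b.onset:
                edges.append((i, j, "consecutive"))  # b starts when a ends
            elif a.onset < b.onset < a.onset + a.duration:
                edges.append((i, j, "during"))       # b starts while a sounds
    return edges

# Tiny example: a two-note chord followed by a third note.
notes = [Note(0, 1, 60), Note(0, 1, 64), Note(1, 1, 67)]
print(build_score_graph(notes))
```

For the three notes above, the chord tones are linked by onset edges in both directions, and each is linked to the third note by a consecutive edge.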

MusGConv is designed to leverage music score graphs and enhance them by incorporating principles of music perception into the graph convolution process. It focuses on two fundamental dimensions of music: pitch and rhythm, considering both their relative and absolute representations.

Absolute representations refer to features that can be attributed to each note individually, such as the note's pitch or spelling, its duration, or any other per-note feature. Relative features, on the other hand, are computed between pairs of notes, such as the musical interval between two notes or their onset difference, i.e. the difference between the times at which they occur.

The importance and coexistence of the relative and absolute representations can be understood from a transpositional perspective in music. Imagine the same musical content transposed: the intervallic relations between notes stay the same, but the pitch of each note is altered.
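A tiny numerical example makes the transposition argument concrete: absolute pitches change under transposition, while the intervals between consecutive notes do not.

```python
# Absolute vs. relative features under transposition.
pitches = [60, 64, 67]                 # C major triad (MIDI pitch numbers)
transposed = [p + 2 for p in pitches]  # same content, up a whole tone

# Relative features: intervals between consecutive notes.
intervals = [b - a for a, b in zip(pitches, pitches[1:])]
intervals_t = [b - a for a, b in zip(transposed, transposed[1:])]

print(intervals, intervals_t)  # both [4, 3]: relative features are invariant
```

The absolute features (the pitches themselves) differ between the two versions, but the relative features coincide, which is exactly the property MusGConv tries to exploit.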

To fully understand the inner workings of the MusGConv convolution block it is important to first explain the principles of Message Passing.

In the context of GNNs, message passing is a process in which vertices within a graph exchange information with their neighbors to update their own representations. This exchange allows each node to gather contextual information from the graph, which is then used for predictive tasks.

The message passing process is defined by the following steps:

1. Message computation: each node computes a message for each of its neighbors, typically by applying a learnable function to its own features (and, optionally, the features of the connecting edge).
2. Aggregation: each node aggregates the messages arriving from its neighbors using a permutation-invariant function such as sum, mean, or max.
3. Update: each node updates its representation by combining its previous features with the aggregated messages, usually through a neural network.
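The generic message-passing scheme can be sketched in a few lines. This is a minimal NumPy illustration of one round of message passing, not the paper's implementation; the weight matrices and the tanh nonlinearity are illustrative assumptions.

```python
# One round of generic message passing on a small directed graph.
import numpy as np

def message_passing_step(x, edges, W_msg, W_upd):
    """x: (num_nodes, d) node features; edges: list of (src, dst) pairs."""
    agg = np.zeros_like(x)
    for src, dst in edges:
        agg[dst] += x[src] @ W_msg       # 1. compute messages, 2. sum-aggregate
    # 3. update: combine each node's own features with its aggregated messages.
    return np.tanh(x @ W_upd + agg)

rng = np.random.default_rng(0)
x = rng.normal(size=(3, 4))
edges = [(0, 1), (1, 2), (2, 0)]         # a small directed cycle
W_msg = rng.normal(size=(4, 4))
W_upd = rng.normal(size=(4, 4))
out = message_passing_step(x, edges, W_msg, W_upd)
print(out.shape)  # (3, 4): one updated representation per node
```

Stacking several such rounds lets information propagate beyond immediate neighbors, which is what a multi-layer GNN does.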

MusGConv alters the standard message passing process mainly by incorporating both absolute features as node features and relative musical features as edge features. This design is tailored to fit the nature of musical data.

The MusGConv convolution is defined by the following steps:

1. Messages are computed from the sender's absolute node features together with the relative features stored on the connecting edge.
2. The messages arriving at each note are aggregated with a permutation-invariant function.
3. Each note's representation is updated by combining its own absolute features with the aggregated, relative-feature-informed messages.

By designing the message passing mechanism in this way, MusGConv attempts to preserve the relative perceptual properties of music (such as intervals and rhythms), leading to more meaningful representations of musical data.

Should edge features be absent or deliberately not provided, MusGConv computes the edge features between two nodes as the absolute difference of their node features. The version of MusGConv that uses explicit edge features is named MusGConv(+EF) in the experiments.
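Under the assumptions above, a MusGConv-style block can be sketched as follows. This is an illustrative NumPy approximation of the described scheme (weight shapes and the tanh nonlinearity are my assumptions, not the paper's exact architecture); it includes the fallback where missing edge features default to the absolute difference of node features.

```python
# Sketch of a MusGConv-style convolution: messages combine the sender's
# absolute features with relative edge features; the update combines each
# node's absolute features with the aggregated messages.
import numpy as np

def musgconv_step(x, edges, edge_feats, W_msg, W_upd):
    """
    x:          (N, d) absolute node features
    edges:      list of (src, dst) pairs
    edge_feats: (len(edges), d) relative features, or None
    W_msg:      (2d, d) message weights; W_upd: (2d, d) update weights
    """
    agg = np.zeros_like(x)
    for k, (src, dst) in enumerate(edges):
        # Fallback: relative feature = absolute difference of node features.
        e = edge_feats[k] if edge_feats is not None else np.abs(x[src] - x[dst])
        agg[dst] += np.concatenate([x[src], e]) @ W_msg
    # Update: absolute node features concatenated with aggregated messages.
    return np.tanh(np.concatenate([x, agg], axis=1) @ W_upd)

rng = np.random.default_rng(1)
x = rng.normal(size=(3, 4))
edges = [(0, 1), (1, 2)]
W_msg = rng.normal(size=(8, 4))
W_upd = rng.normal(size=(8, 4))
out = musgconv_step(x, edges, None, W_msg, W_upd)  # no edge features given
print(out.shape)  # (3, 4)
```

Passing an explicit `edge_feats` array instead of `None` corresponds to the MusGConv(+EF) variant, where precomputed relative features such as intervals and onset differences are attached to the edges.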

To demonstrate the potential of MusGConv, I discuss below the tasks and experiments conducted in the paper. All models, regardless of the task, follow the pipeline shown in the figure below. When MusGConv is employed, the GNN blocks are replaced by MusGConv blocks.

I decided to apply MusGConv to four tasks: voice separation, composer classification, Roman numeral analysis, and cadence detection. Each of these tasks belongs to a different category from a graph learning perspective: voice separation is a link prediction task, composer classification is a global (graph-level) classification task, cadence detection is a node classification task, and Roman numeral analysis can be viewed as a subgraph classification task. Therefore, we explore the suitability of MusGConv not only from a musical analysis perspective but across the spectrum of graph deep learning task types.
