MIDIZ*: content based indexing and retrieving MIDI files

* MIDIZ, in Portuguese, is a pun on the carioca way of saying the phrase "ME DIZ" (mee-djiz), which means "TELL ME".

Maria Cláudia Reis Cavalcanti (1, 2), Marcelo Trannin Machado (1), Alessandro de Almeida Castro Cerqueira (1), Nelson Sampaio Araujo Júnior (1) and Geraldo Xexéo (1, 3)

(1) COPPE Sistemas - (2) NCE - (3) IM - DCC

Universidade Federal do Rio de Janeiro

e-mail: {yoko, marcelot, aacc, nelsonjr, xexeo}@cos.ufrj.br

Abstract

This paper presents a search engine for musical files on the Internet, based on the description of a musical passage, regardless of the key it is played in. Furthermore, the system allows for some errors in the description of the musical passage, according to a parameter available in the interface.

Keywords: multimedia, content-based information retrieval.

1 Introduction

Most people, once in a while, wonder what song it is that they are hearing in their mind. To answer this question there are only a few options: a lengthy search through a music library, or asking a friend with a better musical memory or knowledge.

Motivated by the great success achieved by text searching sites on the Internet, and by the increasing development of techniques for searching non-textual information, we decided to create a system to search for musical files. In this article we present the results of our work: a system (MIDIZ) to store, index, search and retrieve musical files based on the description of a short musical passage, regardless of the key and allowing for some errors in the description of the passage.

The digital storage of music is a highly standardized area divided into two sub-areas: one that uses audio files, and another that uses MIDI files. Audio files store a digital sampling of the music, which consumes vast amounts of memory even when using highly compressed formats, such as MP3. MIDI (Musical Instrument Digital Interface) [11] files do not store the sound signal itself, but rather the series of commands necessary to generate it, consuming very small amounts of memory when compared with audio formats. Eventually, a MIDI synthesizer is used to generate the sounds. This format facilitates the analysis of the music, making it simpler to extract patterns and to identify characteristics belonging to a sequence of notes. MIDI is the de facto standard in the musical instrument industry and in music software (editors and sequencers), while there are many different standards for audio files. These characteristics make MIDI files the most suitable format for computer music records and, consequently, the most adequate target for musical indexing systems.

Figueiredo et al. [5] indexed MIDI files by analyzing the attributes and descriptive texts present in the files' headers. This type of search is useful when one knows some attribute of the music, such as the name of the author, but it is not useful when searching based on auditory memory. In that case it is necessary to do a "content based search", i.e., a search based on the musical record itself. Ghias et al. [9] proposed a musical index based on an abstraction of the melody known as UDS. To obtain the UDS code of a sequence of notes, each note is substituted by the word up, down or same, describing its position relative to the previous one. The main shortcoming of their proposal was that it did not support a notion of similarity between two UDS sequences, i.e., only an exact match could be searched.

Although our work did not derive directly from Beeferman [1], it is strongly influenced by his proposal, which is more forgiving of user mistakes than the previous approaches. In his work, a sliding window over the melody defines musical fragments, creating vectors that are stored in a multidimensional data structure. However, that work only considered the pitch of the notes and not their duration.

The next section describes the process used to analyze the MIDI files. The third section presents the indexing structures that were studied. Sections 4 and 5 describe the MIDIZ system, its architecture, and its implementation. A brief discussion of the performance of the system is presented in Section 6. Finally, the last section concludes the article, including some possible improvements to the system.

2 Identifying Points in the MIDI Files

Applications for indexing and retrieving music ought to allow for non-exact queries, forgiving user mistakes in the representation of a passage. Also, the key of a query can be different from the key in which the music was recorded and stored in a repository. Music can be transposed at will, i.e., all notes of a composition can be moved equally up or down a number of tones without changing the music. Another important requirement is the ability to retrieve a musical file based on a sequence similar to any sequence inside it.

Dirst and Weigend [3] proposed the following solution to allow for automatic transposition: use the differences between the pitches of the notes instead of identifying the music as a sequence of absolute pitches. In addition, to achieve greater flexibility it is also necessary to allow the user to make small errors when querying, such as getting the pitch of some notes wrong.

The indexing scheme proposed in [1], adapted for MIDIZ, tries to fulfill all these requirements using a wavelet transform and a sliding window in the melody.

The window defines a note sequence of size 2^k and slides through the entire song, moving note by note. Each fragment of size 2^k is converted into a vector of 2^k - 1 values, following a transform similar to the Haar transform used in [4]. By using the sliding window, we index every music fragment of size 2^k, i.e., every possible sequence of that size in the music. Therefore, if we fix k at 3, we have a window of size 8. Sliding the window through a song with T notes, we identify T - 2^k + 1 fragments.

The transform proposed by Beeferman [1] uses the following algorithm:

1. The first note receives the value 1. Then, the values of the following notes are determined as the number of notes in the natural scale separating the current note's pitch from the first, plus 1. Consequently, if the pitch of a note is equal to the first one, it receives the value 1; if it is one note higher, it receives the value 2; and so on.

2. The values of the notes in the sequence are added, two by two, generating a second and a third level of values. Adding over the sequence (n1,n2,n3,n4,n5,n6,n7,n8) generates two new sequences, (n9,n10,n11,n12) and (n13,n14), where n9 = n1 + n2, n10 = n3 + n4, n11 = n5 + n6, n12 = n7 + n8, n13 = n9 + n10 and n14 = n11 + n12.

3. These values are used to calculate the final vector, with 2^k - 1 coordinates, (c1,c2,c3,c4,c5,c6,c7), where c1 = n13 - n14, c2 = n11 - n12, c3 = n9 - n10, c4 = n7 - n8, c5 = n5 - n6, c6 = n3 - n4 and c7 = n1 - n2.

Let us use as an example the following 8 notes of the song "Happy Birthday": C C D C F E C C. After applying the first step of the algorithm we have the sequence (1,1,2,1,4,3,1,1). After the second step, we have the sequences (2,3,7,2) and (5,9). Finally, after the third step we obtain the vector (-4, 5, -1, 0, 1, 1, 0), corresponding to a coordinate in a 7-dimensional space. Therefore, each MIDI file in a musical repository corresponds to many points in a 7-dimensional space.

In our system, we changed the preceding algorithm to consider the whole chromatic scale instead of being restricted to the natural scale, achieving higher precision in the resulting points, both in indexing and in querying. Therefore, while the previous sequence generated the vector (1,1,2,1,4,3,1,1) in the first step of the original algorithm, our algorithm generates (1,1,3,1,6,5,1,1), with the final vector (-7, 9, -2, 0, 1, 2, 0).
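To make the transform concrete, the C++ fragment below is a minimal sketch of the chromatic variant; the coefficient ordering follows the worked examples above, and all names are ours rather than those of the MIDIZ source.

```cpp
#include <array>
#include <cstdio>

// Sketch of the chromatic pitch transform for a window of size 2^k, k = 3.
std::array<int, 7> pitchTransform(const std::array<int, 8>& midiPitch) {
    // Step 1: value = semitone distance from the first note, plus 1.
    std::array<int, 8> n;
    for (int i = 0; i < 8; ++i)
        n[i] = midiPitch[i] - midiPitch[0] + 1;

    // Step 2: add the values two by two, building two more levels.
    std::array<int, 4> l2; // (n9, n10, n11, n12)
    for (int i = 0; i < 4; ++i)
        l2[i] = n[2 * i] + n[2 * i + 1];
    std::array<int, 2> l3 = { l2[0] + l2[1], l2[2] + l2[3] }; // (n13, n14)

    // Step 3: the 7 coordinates are pairwise differences, coarse to fine.
    return { l3[0] - l3[1],  // c1 = n13 - n14
             l2[2] - l2[3],  // c2 = n11 - n12
             l2[0] - l2[1],  // c3 = n9  - n10
             n[6] - n[7],    // c4 = n7  - n8
             n[4] - n[5],    // c5 = n5  - n6
             n[2] - n[3],    // c6 = n3  - n4
             n[0] - n[1] };  // c7 = n1  - n2
}

int main() {
    // "Happy Birthday" fragment C C D C F E C C, as MIDI pitches.
    std::array<int, 8> notes = { 60, 60, 62, 60, 65, 64, 60, 60 };
    for (int c : pitchTransform(notes))
        std::printf("%d ", c);  // prints: -7 9 -2 0 1 2 0
    std::printf("\n");
}
```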

To submit a query, the user presents a sequence of notes with the exact length of the sliding window. This sequence is transformed into a point, using the transformation above, which is then compared with the points in the repository. At this step, the system should offer some flexibility, returning to the user not only the files containing that point, but also all the files containing similar points. As a consequence, one should be able to calculate a measure of distance between two points and select those points in the repository that are near the submitted one. We selected the Euclidean distance as the measure. In MIDIZ, the user is able to select the maximum range for a point to be considered similar, i.e., the degree of error allowed in the description of the musical fragment.

Table 1 shows an example of the Euclidean distance between two points. From a correct musical fragment, we generate an incorrect, but similar, fragment, missing the fourth and fifth notes by one semitone. Both transformed points are then obtained and used to calculate the Euclidean distance.

Table 1:
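To illustrate the computation, the sketch below takes the transformed point of the fragment above and that of a variant whose fourth and fifth notes are raised by one semitone; the values are computed with the transform sketched earlier and are illustrative, so they need not coincide with Table 1's exact figures.

```cpp
#include <array>
#include <cmath>
#include <cstdio>

// Euclidean distance between two 7-dimensional transformed points.
double distance(const std::array<int, 7>& a, const std::array<int, 7>& b) {
    double sum = 0.0;
    for (int i = 0; i < 7; ++i)
        sum += double(a[i] - b[i]) * (a[i] - b[i]);
    return std::sqrt(sum);
}

int main() {
    // C C D C F E C C, and the same fragment with the fourth and fifth
    // notes raised by one semitone, both already transformed.
    std::array<int, 7> correct = { -7,  9, -2, 0, 1, 2, 0 };
    std::array<int, 7> wrong   = { -7, 10, -3, 0, 2, 1, 0 };
    std::printf("%.1f\n", distance(correct, wrong)); // prints: 2.0
}
```

A query submitted with a similarity degree of 2 or more would therefore still retrieve the song despite the two wrong notes.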

Certain mistakes have more influence on the Euclidean distance. In these cases, to retrieve the sought-after music, the user has to increase the magnitude of the acceptable error, allowing farther points to be recovered.

To complement our work, we also decided to use the same scheme to analyze the song based on the duration of the notes. In MIDI files, a number defined in the header of the file identifies the time unit (2). Consider, for example, that the time unit has a resolution of 240, and that the music uses a 4/4 time signature. Table 2 shows how the values found in the MIDI file are transformed into values that identify the sequence of durations. To obtain a point corresponding to a sequence of durations, we apply the Haar transform over that sequence, in the same way it was done for the pitch sequence. Therefore, if we use k = 3, the sequence (8,4,4,4,8,8,8,16) generates the point (-20, -8, 4, -8, 0, 0, 4).

Table 2:
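One mapping consistent with the example sequence (8,4,4,4,8,8,8,16) is to encode each duration by its note-value denominator: with 240 ticks per quarter note, a quarter note maps to 4, an eighth note (120 ticks) to 8, and a sixteenth note (60 ticks) to 16. The sketch below assumes this mapping; the actual Table 2 may differ in detail.

```cpp
#include <cstdio>

// Hypothetical mapping from a MIDI duration in ticks to a note-value code
// (1 = whole, 2 = half, 4 = quarter, 8 = eighth, 16 = sixteenth), assuming
// the header resolution gives the number of ticks per quarter note.
int durationCode(int ticks, int resolution) {
    int whole = 4 * resolution;  // a whole note spans four quarter notes
    return whole / ticks;        // e.g., 120 ticks at resolution 240 -> 8
}

int main() {
    int ticks[] = { 120, 240, 240, 240, 120, 120, 120, 60 };
    for (int t : ticks)
        std::printf("%d ", durationCode(t, 240)); // prints: 8 4 4 4 8 8 8 16
    std::printf("\n");
}
```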

To conclude, the indexing of a MIDI file generates two sets of points, one from the pitch sequence, the other from the duration sequence. Both should then be stored in a data structure that is suitable for spatial searches. The next section discusses some of the options available, and the adopted solution.

3 Indexing Structures

Our fundamental requirement for searching and retrieving musical information is that the result should be a list of songs containing fragments similar to the one described by the user. In our scheme, this means providing a list of musical files containing points near the point described, and not only those containing the exact point. With this in mind, it is easy to realize that in a repository containing millions of points, it is impossible to naively calculate the distance of every point to a given point. We need to treat this as a spatial search problem. The solution is to use multi-dimensional access methods [7].

The methods known as bucket methods [14] fit this description. They were developed to collect points in sets, called buckets, corresponding to units of disk storage (i.e., pages). As a bucket gets filled, its structure is automatically reorganized, creating new buckets and redistributing the points among them. An additional directory structure is used to organize the access to these buckets, making it possible to find the bucket to which a register belongs. Grid files [12] and R-trees [10] are examples of bucket methods.

R-trees were designed to represent points grouped in regions that can overlap each other, i.e., non-disjoint regions. As a k-dimensional structure, an R-tree is formed by various k-dimensional regions called hyper-rectangles. In each level of an R-tree there is a set of hyper-rectangles, overlapping or not, that can contain other hyper-rectangles belonging to the next level of the tree. Figure 2 (a) shows a two-dimensional R-tree, where the larger rectangles belong to the first level and the smaller rectangles belong to the second level.

Figure 1: Applying the original transform to a fragment of "Happy Birthday"

Figure 2:

Initially, we considered using R-trees in the following manner. Each song, after being analyzed, generates a set of points. We could take the bounding box of each set of points as a hyper-rectangle representing the music. In that case, when searching for a point, we could navigate the R-tree structure and return, as a result, every hyper-rectangle containing that point.

However, nothing guarantees that those hyper-rectangles are representative of the music. The hyper-rectangles could be defined such that there is no differentiation between songs in the hyperspace (Figure 2 (b)). If this happens, when searching for a point, we would end up with a result with high recall but small precision.

Grid files [12], as originally proposed, left many open questions, such as how to reorganize the bucket access directory while maintaining linear dependence between the number of stored registers and the required space. Among the works that best solved this problem are BANG files (BAlanced aNd Nested Grid) [8], which use a partitioning strategy that avoids empty cells, keeping only references to non-empty buckets in the directory. In this way, BANG files are more adequate in cases where the distribution is not uniform. Figure 3 shows the difference in behavior between Grid and BANG files when this situation occurs. Notice that, although the partitioning follows the same algorithm, BANG files maintain a minimal number of cells, adapting to the concentration of points.

Figure 3:

Since in this work we do not know the distribution of points in advance, we opted for BANG files, also known as BD-trees [14], with the corresponding algorithms described by Dandamudi and Sorenson [2].

The localization of a point in a BD-tree is given by a binary sequence describing a region in a multidimensional space. This region is generated by simulating the partitioning of the space. Given a point, its coordinates are analyzed from left to right, in a cyclic fashion. For each coordinate of the point, the available interval is partitioned at its middle point. If the coordinate is below the middle point, a "0" is appended to the right of the sequence; otherwise, a "1" is appended. As a result, the sequence describes on which side of each successive dividing line the point lies. This sequence is called a DZE (Discriminator Zone Expression), and represents the set of points belonging to that region.
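A minimal sketch of the DZE construction follows; the per-dimension bounds and the number of partitioning bits are assumptions for the sake of the example.

```cpp
#include <cstdio>
#include <string>

// Sketch of DZE generation: cycle through the coordinates, bisect the
// current interval of each coordinate, and append '0' (below the middle
// point) or '1' (at or above it) to the sequence.
std::string makeDZE(const double* p, int dims, double lo0, double hi0,
                    int bits) {
    double lo[16], hi[16];  // per-dimension interval (dims <= 16 assumed)
    for (int d = 0; d < dims; ++d) { lo[d] = lo0; hi[d] = hi0; }

    std::string dze;
    for (int b = 0; b < bits; ++b) {
        int d = b % dims;   // coordinates are visited cyclically
        double mid = (lo[d] + hi[d]) / 2.0;
        if (p[d] < mid) { dze += '0'; hi[d] = mid; }
        else            { dze += '1'; lo[d] = mid; }
    }
    return dze;
}

int main() {
    double point[7] = { -7, 9, -2, 0, 1, 2, 0 };
    // Two partitioning rounds over 7 dimensions, bounds [-64, 64).
    std::printf("%s\n", makeDZE(point, 7, -64.0, 64.0, 14).c_str());
}
```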

A DZE labels each node in a BD-tree; the root has no label. Each node represents a subset of the space, specified recursively as follows: the root represents the complete space; if a node x represents a subset s of the space and is labeled by the DZE d, then the left child of x represents the intersection of s and d, while the right child of x represents the intersection of s with the complement of d (Figure 4).

Figure 4:

We can identify two types of nodes in a BD-tree: internal and external. Internal nodes are non-leaf nodes storing a DZE, and external nodes are the leaves of the tree, corresponding to the buckets storing the points. BD-trees are self-organizing, hence they can be built by a series of insertions.
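To make the two node types concrete, here is a minimal structural sketch; the field names are ours.

```cpp
#include <string>
#include <vector>

struct Point7 { int c[7]; };  // a transformed 7-dimensional point

// Sketch of a BD-tree node. Internal nodes carry a DZE and two subtrees:
// the left ("IN") child holds the points inside the DZE region, the right
// ("OUT") child holds the points in its complement. External nodes are
// buckets of points; when a bucket overflows, the node is split and
// becomes internal.
struct BDNode {
    bool isInternal = false;
    std::string dze;             // label (internal nodes only)
    BDNode* in = nullptr;        // left child (internal nodes only)
    BDNode* out = nullptr;       // right child (internal nodes only)
    std::vector<Point7> bucket;  // stored points (external nodes only)
};
```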

4 MIDIZ System

MIDIZ comprises two modules: Indexing and Querying. The Indexing Module performs the lexical analysis of a set of MIDI files and generates a tree structure (BD-tree) and a reference list, used for subsequent queries. In the Querying Module, the user is allowed to pose pitch and duration queries. Each query starts a search through the files generated by the Indexing Module, where the system finds the list of MIDI files satisfying that query.

The next sections describe the system modules in detail. First, however, for a better understanding, we introduce the MIDIZ data schema (Figure 5), which includes the indexing structure (BD-tree), using UML notation [6]. Basically, the classes of the MIDIZ schema are: BDtree, Node, Point, Reference and Song. The BDtree class is composed of a set of Nodes. Each Node is classified as internal or external. An internal Node is connected to two other nodes, one to the right, called OUT, and one to the left, called IN. An external Node contains a set of Points. Each Point can be referenced more than once in a Song, and by different Songs. Each Reference object contains the position, track and channel in which the point appears in the song.

Figure 5:
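A rough C++ rendering of the remaining schema classes may help to fix the relationships; the class names follow Figure 5, while the field types are our assumptions.

```cpp
#include <string>
#include <vector>

// Each occurrence of a point: where it appears inside which song.
struct Reference {
    int song;      // index of the song in the MIDIdocs file
    int position;  // relative position of the fragment in the song
    int track;     // MIDI track in which the fragment appears
    int channel;   // MIDI channel in which the fragment appears
};

// A transformed point together with every place it occurs; the same
// point may be referenced several times in one song and by many songs.
struct Point {
    int coord[7];
    std::vector<Reference> refs;
};

struct Song {
    std::string midiFileName;  // as recorded in MIDIdocs
};
```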

Figure 6: Indexing Module

4.1 Indexing Module

The Indexing Module itself comprises two modules, as shown in Figure 6: Lexical Analysis and Tree Creation.

The Lexical Analysis module is responsible for reading one or more user-selected MIDI files, whose names are recorded in the MIDIdocs file. Each MIDI file is analyzed, extracting pitches and durations, while discarding other MIDI format information. Each 8-pitch/8-duration sequence is converted into two 7-dimensional points, as described in Section 2.

The points generated by the Lexical Analysis module are the input to the Tree Creation module. Each point is inserted into the respective tree structure (pitch or duration tree). The insertion of a point starts a tree traversal, searching for the external node whose region corresponds to the point coordinates. Since each external node has a fixed capacity, if it becomes full, a recursive restructuring routine is called, creating new nodes as children of the overflowed node, which becomes an internal node.

After analyzing all the MIDI files, the indexing structures (trees) are complete. The generated files are the following: Nodes, InternalNodes, Points and References. The Nodes file contains a list of the internal and external nodes of the tree structure, sorted in pre-order. The InternalNodes file contains the internal nodes' DZEs. The Points file contains the point list for each external node. The References file contains, for each point, the numbers of the songs in which it appears, along with each occurrence's track, channel and relative position.

4.2 Querying Module

First, this module loads the index files. The query server then remains active, waiting for query requests. When it receives a query sequence, the module performs the lexical analysis of the sequence, identifying the pitch and duration sequences. Then, it applies the wavelet transform over both sequences, generating the corresponding points, as described in Section 2. Based on the similarity degree chosen by the user, it calculates the valid subspace limits for each point (pitch and duration). The upper limit is obtained by adding the similarity degree to the point coordinates, and the lower limit is obtained by subtracting the similarity degree from the point coordinates. These two limiting points are the basis for the tree traversal. After traversing both the pitch and duration trees, the result is a list of valid points, ordered by their distance to the submitted point. These lists must then be merged into a single ordered list. The final list is obtained by applying a weighting formula to each pair of points, i.e., the pair formed by each pitch point and its corresponding duration point. This formula uses coefficients based on the user-defined weights for the pitch and duration sequences. Figure 7 describes the formula used to obtain the similarity coefficient for each pair of points.

Figure 7:
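The range computation can be sketched directly from the description above. Since the exact formula of Figure 7 is not reproduced here, the similarity coefficient below is an assumed normalized weighted combination, not necessarily the formula MIDIZ uses.

```cpp
#include <array>
#include <cstdio>

struct Range { std::array<int, 7> lower, upper; };

// Valid subspace limits: the point coordinates shifted down and up by
// the user-chosen similarity degree.
Range queryRange(const std::array<int, 7>& p, int similarityDegree) {
    Range r;
    for (int i = 0; i < 7; ++i) {
        r.lower[i] = p[i] - similarityDegree;
        r.upper[i] = p[i] + similarityDegree;
    }
    return r;
}

// Hypothetical similarity coefficient: combine the pitch and duration
// distances of a candidate pair of points with user-defined weights.
double similarityCoefficient(double pitchDist, double durDist,
                             double wPitch, double wDur) {
    return (wPitch * pitchDist + wDur * durDist) / (wPitch + wDur);
}

int main() {
    std::array<int, 7> q = { -7, 9, -2, 0, 1, 2, 0 };
    Range r = queryRange(q, 2);
    std::printf("dim 0: [%d, %d]\n", r.lower[0], r.upper[0]); // [-9, -5]
    std::printf("score: %.2f\n", similarityCoefficient(2.0, 4.0, 0.7, 0.3));
}
```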

For each point found, a list of songs is retrieved from the References file. The MIDIdocs file is then searched for the MIDI file names. The resulting list also informs, for each song, the track and channel numbers and the relative position of the point in that song. This list is then returned to the user in HTML format. Figure 8 shows the Querying Module architecture.

Figure 8:

The Querying Applet provides the user with a musical notes bar, with a choice of whole note, half note, quarter note, etc. (Figure 9), which the user can select and paste over a grid. This grid is not a musical staff and has no clef, but it allows the user to establish the relative positions of the notes. Furthermore, the user may choose a query similarity degree, in order to allow more flexible queries. If the user is certain about the query, the similarity degree should be around zero. On the other hand, if the user has doubts, the similarity degree should be increased in order to state a more flexible query.

Figure 9:

The CGI works as an interface between the applet and the querying server. When the CGI receives the query results (already in HTML format) from the query server, it passes these results forward to the web browser that requested the query. Figure 10 shows an example of a query result.

Figure 10:

5 MIDIZ Implementing Issues

MIDIZ was implemented in a Unix environment, using the C++ and Java 1.1 programming languages. The Indexing Module and the Querying Server were implemented in C++, in order to guarantee good performance. The data schema (BD-tree) was also implemented in C++, so that it could be used both by the Indexing Module and by the Querying Server. The querying interface is a Java applet, which communicates with a CGI implemented in C++.

Some implementation issues related to our analysis process are worth mentioning: main melody identification, note overlay, and imprecise note durations. The origin of these issues is the fact that most of our MIDI files were generated directly from MIDI devices. MIDI files generated by music editing software, such as ENCORE, tend to have a much higher quality level. Next, we discuss each of these issues and the adopted solutions.

A MIDI file may contain many tracks, each containing a note sequence. A MIDIZ query begins with a simple 8-note sequence, which refers to the main melody. Therefore, it is important to identify, for each MIDI file, which track corresponds to the main melody. However, the MIDI standard does not offer the means to do this automatically, which left us with a tough task. We decided to analyze all the tracks except the standard percussion track (track 10). Even though this is a simple approach, it uses a lot of computational resources, such as memory, disk and processing time, decreasing the system's performance.

A note overlay happens when two or more notes sound simultaneously, which is very frequent in MIDI files. For indexing purposes, we have chosen to extract just one monophonic sequence from each song. Therefore, assuming that the highest voice usually corresponds to the main melody, whenever a note overlay happens we keep only the highest note in it.
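A minimal sketch of this reduction follows, under the simplifying assumption that notes overlap only when they share an onset tick; the event representation is ours.

```cpp
#include <map>
#include <vector>

struct NoteEvent { int startTick; int pitch; };

// Collapse a polyphonic note list into a monophonic sequence by keeping,
// at each onset time, only the highest sounding pitch (assuming the
// highest voice carries the main melody).
std::vector<NoteEvent> monophonic(const std::vector<NoteEvent>& notes) {
    std::map<int, int> highestAt;  // onset tick -> highest pitch so far
    for (const NoteEvent& n : notes) {
        auto it = highestAt.find(n.startTick);
        if (it == highestAt.end() || n.pitch > it->second)
            highestAt[n.startTick] = n.pitch;
    }
    std::vector<NoteEvent> melody;
    for (const auto& kv : highestAt)  // std::map iterates in tick order
        melody.push_back({ kv.first, kv.second });
    return melody;
}
```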

Another very typical problem in analyzing MIDI files is interpreting the duration of a note. Once the time unit on which the MIDI file is based is extracted from the header, it is possible to identify the notes' durations. However, the analyzed MIDI files contain many imprecise durations, i.e., the values found do not correspond exactly to the values that represent the musical figures (whole note, half note, quarter note, etc.), complicating the duration identification process. To solve this problem, we used a rounding-off strategy based on note dotting. This strategy, however, can generate some distortions, especially for short notes. Therefore, we chose to truncate imprecise duration values that were shorter than the time unit, and to round off imprecise duration values that were longer, dotting them when needed.
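The sketch below illustrates one way to implement this strategy; the candidate set (standard note values and their dotted variants) and the thresholds are our reading of the text, not necessarily MIDIZ's exact rules.

```cpp
#include <cstdlib>
#include <vector>

// Quantize an imprecise duration (in ticks) to the nearest standard or
// dotted note value; durations shorter than the time unit are truncated
// (never rounded up), longer ones are rounded to the nearest candidate.
int quantizeDuration(int ticks, int resolution /* ticks per beat */) {
    std::vector<int> candidates;
    for (int denom = 1; denom <= 32; denom *= 2) {
        int plain = 4 * resolution / denom;   // whole, half, quarter, ...
        candidates.push_back(plain);
        candidates.push_back(plain * 3 / 2);  // dotted (1.5x) variant
    }

    int best = -1;
    for (int c : candidates) {
        if (ticks < resolution && c > ticks)
            continue;  // short notes: truncate, never round up
        if (best < 0 || std::abs(c - ticks) < std::abs(best - ticks))
            best = c;  // otherwise keep the nearest candidate
    }
    return best >= 0 ? best : 4 * resolution / 32;  // fallback: 32nd note
}
```

For example, with a resolution of 240 ticks, a sloppy 130-tick note is truncated to 120 (an eighth note), while a 250-tick note is rounded to 240 (a quarter note).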

6 Results

Table 3 shows the numbers obtained from the indexing of a MIDI base of 100 songs. Note that the number of points identified is much greater than the number of points actually stored. This is due to the number of identical points found during the song analysis. This difference is greater in the analysis of durations.

Table 3:

Concerning the Querying Module, we have run some simulations in order to evaluate the use of the Euclidean distance as an error coefficient in a note sequence query. For each simulation we tested some common errors, described in Table 4. The errors introduced in a pitch sequence query correspond to a uniform or increasing mistake of at most 3 semitones. A uniform mistake means that each note has the same error difference; an increasing mistake means that the error difference grows throughout the sequence. The most serious errors considered are the G, H and J types. The errors introduced in a duration sequence query correspond to lengthening or shortening some durations.

Table 4:

Figure 11 shows the distance increasing with the severity of the error in the pitch sequence. Note that most of the error types with a 1-semitone difference correspond to a Euclidean distance of less than 4 units. For bigger mistakes (2 or 3 semitones), the curves become steeper, especially in the region where the most serious errors appear (G, H and J).

Figure 11:

Figure 12 shows the distance increasing with the severity of the error in the duration sequence. Note that error types in which the duration is reduced by half correspond to a Euclidean distance of less than 20 units. The error types where the duration is reduced to one quarter of its correct value correspond to a much greater Euclidean distance.

Figure 12:

Let d1 = |2x - x|, d2 = |x/2 - x| and d3 = |x/4 - x|; then d2 < d3 < d1. Therefore, mathematically speaking, we may say that reduced-duration errors can be considered less serious than increased-duration errors. According to this reasoning, the Euclidean distance seems coherent, because the distances for errors in which the note duration was doubled are much greater than those generated by reduced durations.

7 Conclusion

In this work we presented a prototype for content-based indexing and retrieval of MIDI files. We aimed at providing query flexibility by using a "sliding window" as the indexing unit. However, with this approach the identifying features of a song become blurred, because by considering all of its possible fragments we may be indexing meaningless ones. Therefore, queries posed to MIDIZ usually result in high recall and low precision. We tried to balance this by ranking the output according to the result of the weighting formula.

Better song characterization is one possible improvement for the next version of MIDIZ. This can be achieved if the indexing process includes a prior analysis of the point distribution. Frequently occurring points, i.e., points found in many songs, should make up a sort of stoplist. As in textual indexing applications, where frequent words like prepositions and articles are discarded (stopwords), frequent points in our MIDI analysis should also be discarded (stoppoints). In order to implement this idea, it should be verified whether the point distribution curve is adequate for stoplist generation, like the Zipf distribution [13]. Once this is verified, a boundary interval should be determined over the curve, and a feedback mechanism should be provided so that users can be warned when a query leads to a point that is not considered valid.

We identify some directions for future work. First of all, finding new ways of representing and indexing a MIDI file, taking care to maintain the flexibility provided by the use of wavelets while increasing the precision of the search results. One possible solution would be a deeper song analysis to identify standard patterns (e.g., the chorus) and to determine the main melody fragment.

Next, an important improvement would be MIDI file pre-processing, especially for files generated by direct recording, which usually do not give good indexing results.

The interface is another interesting improvement for systems like MIDIZ. Users would be allowed to query by simply singing a "fa, la, la"; MIDIZ would be responsible for translating the "fa, la, la" into MIDI format before proceeding with the query process.

Finally, we suggest a performance comparison among indexing structures. The MIDIZ and QPD [1] systems use different indexing structures, BD-trees and R-trees respectively. A reference MIDI file base should be used in order to provide better benchmarking.

Acknowledgements

This work was partially supported by CAPES and CNPq scholarships. The authors would like to thank Márcio de Souza Dias, André Braga and Ana Miccolis from COPPE/UFRJ, and Maria Luiza Campos and José Antônio Borges from IM-NCE/UFRJ for their support throughout the prototype development.

References

[1] D. Beeferman, D. Greentree, P. Larkin. QPD: Query by Pitch Dynamics. 15-829 Course Project, Carnegie Mellon University, 1997. http://www.link.cs.cmu.edu/qpd/doc/note.htm, August 1999.

[2] S. Dandamudi, P. Sorenson. Algorithms for BD Trees. Software Practice and Experience, 16(12):1077-1096, 1986.

[3] M. Dirst, A. Weigend. On Completing J. S. Bach's Last Fugue. In A. Weigend and N. Gershenfeld (eds.), Time Series Prediction: Forecasting the Future and Understanding the Past, pp. 151-172, Addison-Wesley, 1994.

[4] C. Faloutsos, M. Ranganathan, Y. Manolopoulos. Fast Subsequence Matching in Time-Series Databases. In Proceedings of SIGMOD'94, pp. 419-429, Minneapolis, USA, 1994.

[5] M. Figueiredo, C. Traina Jr., A. Traina. Representação e Recuperação Baseada em Conteúdo de Partituras Musicais em Bases de Dados Orientadas a Objetos [Content-Based Representation and Retrieval of Musical Scores in Object-Oriented Databases]. In Proceedings of IV Simpósio Brasileiro de Computação e Música, pp. 125-132, Brasília, DF, 1997.

[6] M. Fowler, K. Scott. UML Distilled: Applying the Standard Object Modeling Language. Addison-Wesley, 1997.

[7] V. Gaede, O. Günther. Multidimensional Access Methods. ACM Computing Surveys, 30(2):170-231, June 1998.

[8] M. Freeston. The BANG File: A New Kind of Grid File. In Proceedings of SIGMOD'87, pp. 260-277, San Francisco, USA, 1987.

[9] A. Ghias, J. Logan, D. Chamberlin, B. Smith. Query by Humming: Musical Information Retrieval in an Audio Database. In Proceedings of ACM Multimedia 95, pp. 231-236, San Francisco, California, 1995.

[10] A. Guttman. R-trees: A Dynamic Index Structure for Spatial Searching. In Proceedings of the ACM SIGMOD International Conference on Management of Data, pp. 47-57, Boston, Massachusetts, USA, 1984.

[11] J. Heckroth. Tutorial on MIDI and Music Synthesis. The MIDI Manufacturers Association, 1995. http://www.midi.org/about-midi/tutorial/tutor.htm, December 1999.

[12] J. Nievergelt, P. Widmayer. Spatial Data Structures: Concepts and Design Choices. In M. van Kreveld, J. Nievergelt, T. Roos, P. Widmayer (eds.), Algorithmic Foundations of Geographic Information Systems, Chapter 6, pp. 153-198, Springer-Verlag, 1997.

[13] G. Salton. Automatic Text Processing. Addison-Wesley, Reading, Massachusetts, 1995.

[14] H. Samet. The Design and Analysis of Spatial Data Structures. Addison-Wesley, Reading, Massachusetts, 1990.

(2) The note that corresponds to one beat in the music.
