The Echo Nest Musical Fingerprint (ENMFP) Code Generator
Release 3.1-2
May 12 2010


###### The Echo Nest Musical Fingerprint (ENMFP)

Read more: 

http://notes.variogr.am/post/544559482/the-echo-nest-musical-fingerprint-enmfp
http://beta.developer.echonest.com/song.html#alpha-identify-song

The ENMFP is a musical fingerprint that uses portions of the Echo Nest Analyze "machine listening" toolkit to extract musical features that allow copies of songs to be identified. This allows you to resolve an unknown song against a database of songs. The Echo Nest currently maintains the only song database compatible with ENMFP codes, however, the design of the service is such that anyone can run and maintain a mirror of the code database to avoid "lock in" to a specific vendor.

The ENMFP is robust to time and pitch modification, is resilient to drastic reductions in fidelity and can even detect uses of audio in remixes, samples and sometimes even cover songs. 

You only need to query for 20 seconds of audio (or less sometimes) to get a result.

The server component of the ENMFP is open source and will be released shortly.

The code generator component of the ENMFP is under the Echo Nest Community Music Code License, which allows free use for all applications provided that the codes are queried for and sent to an authorized server. The authorized servers mirror with each other to share all data. 

There are two ways to use the code generator: link the provided shared libraries with your application and pass it pointers to PCM data from songs, or use a precompiled binary that accepts a filename as an argument and outputs JSON suitable for querying the ENFMP servers.


###### Dynamic library -- libcodegen

The package contains binary libraries for Linux (i686 and x86_64), Mac and Windows. You can compile these libraries into your app to generate codes suitable for querying the ENMFP API based on PCM data. See test.cxx for a very simple test program that links with libcodegen. Put libcodegen somewhere that LD_LIBRARY_PATH knows about and compile the test program:

(this is an example for Linux i686, for other platforms change the library name accordingly)

g++ -o codegen.test test.cxx -lcodegen.Linux-i686

Run it with the example .raw file of 20 seconds of "Billie Jean":

./codegen.test billie.raw 10

Which will output code JSON suitable for querying the ENMFP API:

{"metadata":{"filename":"billie.raw", "samples_decoded":441208, "version":3.10}, "code_count":154, "code":"JxVlIuNwzAMQ1fxCDL133+xo1rnGqNAEcWy/ERa2aKeZmW...

You can POST or this JSON directly to alpha_identify_song, for example:

# curl -F "query=@post_string" http://beta.developer.echonest.com/api/alpha_identify_song?api_key=YOUR_KEY
{"fp_lookup_time_ms": 21, "results": [{"songID": "SOAFVGQ1280ED4E371", "match_type": "fp", "title": "Billie Jean", "artist": "Michael Jackson", 
"artistID": "ARXPPEY1187FB51DF4", "score": 63, "release": "Thriller"}]}

(you can also use GET, see the API description)

Notes about libcodegen:

Code generation takes a buffer of floating point PCM data sampled at 22050 and mono. 

Codegen * pCodegen = new Codegen(_pSamples, _NumberSamples, offset);

the "offset" parameter creates a hint to the server on where the sample is taken from in the original file. If you know this (for example, if you are automatically scanning a large library of audio, choosing specific 20 seconds chunks of audio) the server will use this information but it is not required.

After compute, you want to call pCodegen->getCodeString() to get the code string. (The code string is just a base64 encoding of a zlib compression of the original code string, which is a series of ASCII numbers.)


###### Example code generator

This package also contains an example code generator binary for all 3 platforms. This code generator has more features -- it will output ID3 tag information and uses ffmpeg to decode any type of file. If you don't need to compile libcodegen into your app you can rely on this. Note that you need to have ffmpeg installed and accessible on your path for this to work.

./codegen.Linux-x86_64 billie_jean.mp3 10 10

Will take 10 seconds of audio from 10 seconds into the file and output JSON suitable for querying:

{"metadata":{"artist":"Michael jackson", "release":"800 chansons des ann?es 80", "title":"Billie jean", "genre":"", "bitrate":192, 
"sample_rate":44100, "seconds":294, "filename":"billie_jean.mp3", "samples_decoded":220598, "given_duration":10, "start_offset":10, 
"version":3.10}, "code_count":76, "codes":"JxVlIuNwzAMQ1fxCDL133+xo1rnGqNAEcWy/ERa2aKeZmW...

You can POST this JSON directly to alpha_identify_song, for example:

# curl -F "query=@post_string" http://beta.developer.echonest.com/api/alpha_identify_song?api_key=YOUR_KEY
{"fp_lookup_time_ms": 21, "results": [{"songID": "SOAFVGQ1280ED4E371", "match_type": "fp", "title": "Billie Jean", "artist": "Michael Jackson", 
"artistID": "ARXPPEY1187FB51DF4", "score": 63, "release": "Thriller"}]

(you can also use GET, see the API description)

###### FAQ

Q: I get "Couldn't decode any samples with: ffmpeg" when running the example code generator.

A1: When running the example code generator (codegen.$(PLATFORM)) make sure ffmpeg is accessible to your path. Try running ffmpeg filename.mp3 on the file you are testing the code generator with. If it doesn't work, codegen won't work.

A2: Are you trying to decode billie.raw with the example code generator? billie.raw is only for test.cxx; it is just packed PCM, not a playable file. It is meant to illustrate how to use libcodegen.

Q: The API returned the wrong song as my top result!

A: Right now we have hardcoded show_all to be on. This ensures that all results come back. If you see this happen it is most likely because we do not have your song in our reference database. We are adding new songs from large catalogs as we speak. Also, show_all will be turned off in the near future and you will instead receive an error that there was no match.

Q: I got a warning of "no strong results from the ENMFP"

A: Right now this means of one two things. Either we don't have the song in our catalog or we have not finished de-duplication of our catalog (this is ongoing.) You can play with the &elbow parameter.

Q: The API returned a song by the same artist but the wrong song.

A: Use match_type=enmfp to ensure the API does not return results based on ID3 metadata. If we don't have the song in the catalog we will fall back on matches from metadata if match_type is set to both (which is the default.)


###### Questions, comments

Send to enmfp@googlecode.com


