Overview
AudioSimilarity uses Mel-Frequency Cepstral Coefficients (MFCC).
The processing steps are:
 Loading the music samples (WAV format) into an array
 Splitting the signal into 2-second windows that overlap each other by 1 second
 Applying a Hann window
 Computing the FFT (Fast Fourier Transform) to get the frequencies
 Filtering with 58 triangular filters spaced on the Mel scale
 Computing the distance between these 57 coefficients (the first one is not representative) for every pair of tracks
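The steps above can be sketched in Python with NumPy. This is a minimal sketch, not the AudioSimilarity code itself: the function names (`frame_signal`, `mel_filterbank`, `track_features`), the Hz-to-Mel formula, and the use of the power spectrum are assumptions made for illustration. WAV loading and the distance step are left out, and the signal is assumed to be a mono array already in memory. A textbook MFCC also applies a logarithm and a DCT to the filter energies; the list above does not mention them, so they are omitted here.

```python
import numpy as np

def hz_to_mel(hz):
    # Common Hz-to-Mel formula (an assumption; other variants exist).
    return 2595.0 * np.log10(1.0 + hz / 700.0)

def mel_to_hz(mel):
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)

def frame_signal(signal, rate, win_s=2.0, hop_s=1.0):
    """Split a 1-D signal into win_s-second windows starting every hop_s seconds."""
    win = int(win_s * rate)
    hop = int(hop_s * rate)
    if len(signal) < win:
        raise ValueError("signal is shorter than one window")
    starts = range(0, len(signal) - win + 1, hop)
    return np.stack([signal[s:s + win] for s in starts])

def mel_filterbank(n_filters, n_fft, rate):
    """Triangular filters whose edges are evenly spaced on the Mel scale."""
    # n_filters + 2 points: each triangle spans three consecutive points.
    mel_points = np.linspace(hz_to_mel(0.0), hz_to_mel(rate / 2.0), n_filters + 2)
    hz_points = mel_to_hz(mel_points)
    freqs = np.fft.rfftfreq(n_fft, d=1.0 / rate)
    bank = np.zeros((n_filters, len(freqs)))
    for i in range(n_filters):
        left, centre, right = hz_points[i:i + 3]
        rising = (freqs - left) / (centre - left)
        falling = (right - freqs) / (right - centre)
        # Triangle: the lower of the two slopes, clipped at zero outside the band.
        bank[i] = np.clip(np.minimum(rising, falling), 0.0, None)
    return bank

def track_features(signal, rate, n_filters=58):
    """Average filter-bank energies over all windows, without the first coefficient."""
    frames = frame_signal(signal, rate)
    windowed = frames * np.hanning(frames.shape[1])      # Hann window
    power = np.abs(np.fft.rfft(windowed, axis=1)) ** 2   # FFT power spectrum
    bank = mel_filterbank(n_filters, frames.shape[1], rate)
    energies = power @ bank.T                            # (n_windows, n_filters)
    return energies.mean(axis=0)[1:]                     # drop the first coefficient
```

For example, with a synthetic 5-second tone instead of a real track:

```python
rate = 8000
t = np.arange(5 * rate) / rate
features = track_features(np.sin(2 * np.pi * 440.0 * t), rate)
print(features.shape)  # (57,)
```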
Tips
I'm aware that, for a first version, some results are great and others less so. If you have any tips or ideas, please tell me, because there isn't much information available on this topic. It's difficult to know whether we are going in the right direction.
Problems
The script finds tracks with minor differences, such as a version without one instrument or a shorter edit. Sometimes, however, it changes the genre radically, going from a calm track to a more rhythmic one.
This problem might be due to:

Is the window duration (2 seconds, overlapping by 1 second) too short or too long?
(method: AudioSimilarity.AudioSimilarity.mfcc())

Can averaging the coefficients over all the windows lose information?
(method: AudioSimilarity.AudioSimilarity.mfcc())

Which is best? Analysing at 25%, 50%, and 75% of the track and comparing 3*57 coefficients?

What should we do about silence in a track? Currently, the script ignores silence at the beginning of the track, but only pure silence (value is 0). Should a threshold be added?

Are 57 coefficients enough? Should there be more or fewer?
(function: AudioSimilarity.bankTriangular())

Should the filters favor low-pitched or high-pitched sounds?
(function: AudioSimilarity.bankTriangular())

Should more coefficients be used on the Mel scale than on a linear scale, for low-pitched sounds?
(function: AudioSimilarity.bankTriangular())

Is there another way to calculate the distance between two tracks?
(function: AudioSimilarity.distance())
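
Some of the questions above can be tried out with small helpers like the ones below. This is only a sketch under assumptions: none of these functions exist in AudioSimilarity, their names and parameters are hypothetical, and the source does not say which metric AudioSimilarity.distance() currently uses. `per_window` is assumed to be an array with one row of coefficients per window.

```python
import numpy as np

def trim_leading_silence(signal, threshold=0.0):
    """Drop samples before the first one whose absolute value exceeds threshold.

    threshold=0.0 only removes pure silence (value 0); a small positive
    value also removes near-silent noise.
    """
    loud = np.flatnonzero(np.abs(signal) > threshold)
    if loud.size == 0:
        return signal[:0]  # the whole signal is silent
    return signal[loud[0]:]

def summarize_at_positions(per_window, positions=(0.25, 0.5, 0.75)):
    """Concatenate the coefficient rows found at the given relative positions."""
    n = per_window.shape[0]
    idx = [min(int(p * n), n - 1) for p in positions]
    return per_window[idx].reshape(-1)

def euclidean_distance(a, b):
    """Straight-line distance; sensitive to the overall energy of each track."""
    return float(np.linalg.norm(a - b))

def cosine_distance(a, b):
    """1 - cosine similarity; ignores overall scale, compares only the shape."""
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    if denom == 0.0:
        return 1.0  # convention chosen here for all-zero vectors
    return 1.0 - float(np.dot(a, b) / denom)
```

With 57 coefficients per window, `summarize_at_positions` returns 3*57 = 171 values per track, which can then be compared with either distance function. Comparing the rankings produced by the two metrics on a few known tracks is one way to see whether the choice of distance explains the unexpected genre changes.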
Documentary sources
I searched for so much information on the Internet that I can't remember all my sources, but I will try to gather them here: