Using audio fingerprinting to find game SFX files

This post continues my journey to extract the Red Dead Redemption 2 Blackjack dealer and player sounds from the game files. Yesterday I wrote down how I tried to use speech to text with a pre-trained, machine-learned model to get searchable text. That did not go so well. Please read the other post first, because I will pick up some references from it here. Otherwise you might not get what’s going on.

After speech to text failed I searched for another solution. Another approach is called “audio fingerprinting”. One popular service you might know is Shazam. When you are listening to a song and don’t know its name, you simply open the app, which records a few seconds, and chances are high that it spits out artist, song title and album. The cool thing is that background noise can be present and it still works. That’s exactly what I need. On one hand I have a huge collection of clean, studio-quality source files. On the other hand I have samples with background noise. So I need to set up a Shazam-like service. After googling around I found open source software promising exactly that. My focus fell on these two solutions:

Building my own Shazam

So I picked the second one, set it up and was ready to try it out. My assumption was that the files I wanted are located in the PEDS_* group. The job of the collect-fingerprints-of-songs.py application is to fingerprint each song in a given directory. But it can’t handle nested directories. My directory structure looks like this:

peds_00/ <-- Group name
  0x0D706B6A/ <-- AWC container name
    0x00AF5000.wav <-- Audio file in AWC container
    0x00F770F5.wav
    [...]
  0x1FE191EA/
    0x00AE6C2A.wav
    0x00CE9B27.wav
    [...]

I noticed pretty early that the wav file names are not unique. Moving all files from every subdirectory into a single directory was therefore not an option. You’d overwrite sound files. Something you don’t want when you are searching for a specific sound file. The application stores the file name in the database and returns it when a match occurs. Printing 0x00CE9B27.wav is simply not enough to identify a file reliably. My hope is that all dealer sounds are stored in one AWC directory. By identifying one sound file I’d probably stumble over the other sounds as well.
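
A quick way to confirm that file names really collide is to count how often each name occurs below the extraction directory. This is only a minimal sketch: the function name duplicate_basenames is made up for illustration and is not part of the fingerprinting application.

```python
import os
from collections import Counter


def duplicate_basenames(root):
    """Return {file name: count} for .wav names that occur more than once below root."""
    counts = Counter()
    for _dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if name.lower().endswith(".wav"):
                counts[name] += 1
    # keep only the names that would overwrite each other in a flat directory
    return {name: n for name, n in counts.items() if n > 1}
```

Calling duplicate_basenames("mp3/") on a tree like the one above returns every wav name that exists in more than one AWC container. An empty result would mean flattening the directories is safe.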

I adapted the Python application, and my first task was to collect all files recursively:

import os

# Collect files in folder
files = []
path = "mp3/"

for pedsDirectory in os.listdir(path):
  for awcDirectory in os.listdir(os.path.join(path, pedsDirectory)):
    for filename in os.listdir(os.path.join(path, pedsDirectory, awcDirectory)):
      if filename.endswith(".wav"):
        # [full path, display name that encodes group and AWC container]
        fileInfo = [ os.path.join(path, pedsDirectory, awcDirectory, filename), pedsDirectory + "-" + awcDirectory + "-" + filename ]

        files.append(fileInfo)

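If you look closely, I’m building one big array. Each field in there contains another array with two fields: the full path to the file, and a display name built from the group, the AWC container and the file name.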

Keep in mind that at this point in time I had no idea if I was even on the right track. But I knew for sure that only the file name from the database is printed out, without any path information. Encoding the path into the file name was necessary.
Btw: Don’t get confused by the mp3 folder. The developer focused on mp3 but the application can also handle wav files.
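
The same collection step can also be written with os.walk, which does not care how deep the nesting goes. The sketch below is an alternative, not what I actually ran. The function names are invented for illustration, and splitting the name back apart assumes that group and container names never contain a "-".

```python
import os


def collect_wav_files(root):
    """Return [full_path, display_name] pairs for every .wav file below root."""
    files = []
    for dirpath, dirnames, filenames in os.walk(root):
        dirnames.sort()  # visit subdirectories in a stable order
        for filename in sorted(filenames):
            if not filename.endswith(".wav"):
                continue
            full_path = os.path.join(dirpath, filename)
            # e.g. "peds_00/0x0D706B6A/0x00AF5000.wav" relative to root
            relative = os.path.relpath(full_path, root)
            display_name = "-".join(relative.split(os.sep))
            files.append([full_path, display_name])
    return files


def split_display_name(display_name):
    """Turn 'group-container-file.wav' back into its three parts."""
    group, container, filename = display_name.split("-", 2)
    return group, container, filename
```

With this, a match printed as peds_00-0x0D706B6A-0x00AF5000.wav can be mapped straight back to its group, AWC container and file.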

By storing all files in one big array I was also able to implement a basic progress meter:

# fingerprint all files in a directory
counter = 0
amountOfFiles = len(files)

for fileInfo in files:
  counter += 1

  if counter % 100 == 0:
    print('Status: %d/%d' % ( counter, amountOfFiles ))

  reader = FileReader(fileInfo[0])
  audio = reader.parse_audio()

  [...]
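
A progress meter like this can also estimate the remaining time from the speed so far. The following is a small sketch of that arithmetic. The helper names are hypothetical, and it assumes every file takes roughly the same time to process, which is only an approximation.

```python
def eta_seconds(done, total, elapsed):
    """Estimate the remaining seconds from the average speed so far."""
    if done <= 0:
        raise ValueError("need at least one processed file")
    return elapsed * (total - done) / done


def format_status(done, total, elapsed):
    """Build a status line with a rough estimate in minutes."""
    remaining = eta_seconds(done, total, elapsed)
    return "Status: %d/%d, about %d min left" % (done, total, round(remaining / 60))
```

In the loop, elapsed would be the difference between two time.time() readings, one taken before the loop and one at each status line.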

I also removed nearly all console output to speed up the process. I deleted the database and started the application. I measured the time again: processing 100 files took roughly 15 seconds. Quick look at the clock: 1 am. Time to go to bed and let it run. I woke up in the night and quickly checked the progress. The application had finished. A random thought crossed my mind: maybe it’s not in the PEDS_* group but in S_MISC. So I also extracted that group into the directory and started the application again before going back to sleep.
When I got up the new group had been imported as well and the application warned about “duplicate entries”. Nothing bad. Just the application telling me it had already analyzed the audio file. I cancelled the application.

How many files are now in the database? 312124. The audio files I want must be in there somewhere!

Getting audio samples and testing

Now that I have a database with the clean recordings it’s time to throw my audio samples with background noise like music, talking etc. against it. Instead of recording my own game I downloaded two videos of people playing Blackjack from YouTube:

Thanks to the uploaders for not talking! :-) I cut out samples and converted them into audio files. I collected a few numbers, “bets please”, “blackjack” and “insurance”. I tried to vary them in length. And sometimes the piano player stops playing! In the end I had 14 samples. I copied them to my Windows machine and tried processing them with recognize-from-file.py. To check how I’d pass the file name I opened the script, and to my surprise this was the content:

from libs.reader_microphone import FileReader

song = None
seconds = 5

r = FileReader(123)
r.recognize(seconds=seconds)

print(song)

IT IS INCOMPLETE! I had not checked it before. My heart dropped. I tried to read the code to see if I could fix it. But I have no idea about Python and audio processing. I noticed that there are many forks and that someone had tried to implement it. One that worked for me was this version by kusamops. Many thanks at this point! You saved me from throwing my PC out the window.

Time to throw the first file against it:

A match! And because the file path is included in the file name I know where to look: group peds_02, AWC container 0xCEAEDDB7, file 0x1D7231C2.wav. The offset value looks strange, but never mind. Opening it aaaaaaaand it’s a person speaking, but not what I searched for. I listened to it and compared it a couple of times and identified the problem: the piano matches the ups and downs of the voice.
Next sample: The same.
Another sample: The same.
Another sample: Same. Fucking shiiiiiiiittt! Can somebody please shoot the piano player?
Another sample: Same. Similar but not what I wanted. Are my samples too short? They are mostly just one second long.
I searched for a sample without the piano and found one where the dealer says “it’s a blackjack” (video 1, at around 07:25). Throwing that sample against the database again.

That’s strange. It’s not in the PEDS_* group. It’s in S_MISC. I opened the file in VLC and the content did not match. But it was similar and the voice seemed to be the same. So I listened to all files in that AWC container (183 sounds, 4 minutes in total). It starts strange. Screaming “god no please”, “where is the law”, etc. Doesn’t sound like a dealer. And then suddenly: “Insurance bet?”, “Blackjack!”, “Oh a new player”, etc. All the numbers, interrupted by screams and/or aggression. WONDERFUL! I found a dealer AWC! After all the hours wasted!

Well played Rockstar! You hid them well. If you are searching: Have a look at S_MISC -> 0x1E9CFF94.awc.

Was it luck? If you ask me: no. It was not a 100% match, but it pointed me in the right direction! I guess the main problem is that both the samples and the files I’m looking for are very short. That leaves very little data for the application to hold on to.
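
To illustrate why a short sample gives a matcher so little to hold on to, here is a toy sketch of the general idea behind peak-based fingerprinting: reduce each slice of audio to its strongest frequency, then hash neighbouring peaks. This is not the algorithm of the application I used, and not Shazam’s either. All names and values (FRAME, dominant_bins, fingerprint, tones) are made up for illustration.

```python
import cmath
import math
import random

FRAME = 64  # samples per analysis frame (arbitrary toy value)


def dominant_bins(samples):
    """Return the strongest frequency bin of every full frame."""
    peaks = []
    for start in range(0, len(samples) - FRAME + 1, FRAME):
        chunk = samples[start:start + FRAME]
        best_bin, best_power = 0, -1.0
        for k in range(1, FRAME // 2):
            # naive DFT of the frame at bin k
            acc = sum(chunk[n] * cmath.exp(-2j * math.pi * k * n / FRAME)
                      for n in range(FRAME))
            if abs(acc) > best_power:
                best_bin, best_power = k, abs(acc)
        peaks.append(best_bin)
    return peaks


def fingerprint(samples):
    """Hash neighbouring peaks into a set of (bin, next_bin) pairs."""
    peaks = dominant_bins(samples)
    return set(zip(peaks, peaks[1:]))


def tones(bins, noise=0.0, seed=0):
    """Synthesise one frame of sine wave per bin, optionally with noise."""
    rng = random.Random(seed)
    return [math.sin(2 * math.pi * k * n / FRAME) + noise * rng.uniform(-1, 1)
            for k in bins for n in range(FRAME)]


clean = fingerprint(tones([3, 7, 5, 9, 4, 11]))
noisy = fingerprint(tones([3, 7, 5, 9, 4, 11], noise=0.5, seed=1))
short = fingerprint(tones([3, 7], noise=0.5, seed=2))

print(len(clean & noisy), "of", len(clean), "hashes survive the noise")
print(len(short), "hash(es) from the short sample")
```

In this toy setup the noisy recording produces the same five hashes as the clean one, because the noise never overpowers the loudest frequency. The two-frame sample only produces a single hash, so a random collision with an unrelated file becomes much more likely.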

Taking it a step further: Find Nigel

What? You don’t know who Nigel is? Nigel is the one guy searching for Gavin. Yeah, now you know, right? I knew that I could obtain a letter from him but I … well … I did not want to encounter him again, so I just shot him in the head instead of lassoing him. But his sound files must also be in there.

Again two approaches:

All samples from the videos worked. Three of five samples of my own voice mimicking his voice also worked. What is the difference to the earlier samples? These are all a few seconds long. It seems like you get better matches with longer samples. You can find Nigel in SS_NZ -> 0xB560AA46.awc
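
Since length seems to matter that much, it can help to check the duration of the samples before throwing them against the database. Below is a minimal sketch using Python’s built-in wave module. It only works for plain PCM wav files, the function names are hypothetical, and the threshold of three seconds is just my guess for “a few seconds”.

```python
import wave


def wav_duration_seconds(path):
    """Return the length of a PCM wav file in seconds."""
    with wave.open(path, "rb") as wav:
        return wav.getnframes() / float(wav.getframerate())


def long_enough(paths, minimum_seconds=3.0):
    """Keep only the samples that are at least minimum_seconds long."""
    return [p for p in paths if wav_duration_seconds(p) >= minimum_seconds]
```

Samples that fall below the threshold are candidates for re-cutting with a bit more context around the spoken line.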

Conclusion

After hours I finally found them. I gave up several times in between, but in the end I’m happy that I was able to solve it and extract the files I wanted. I can continue with my own, private Blackjack now! I can’t publish the files or my game, since Rockstar of course holds the copyright. It may sound stupid to waste so much time finding them. But a game without sound is not nice.
Did I learn something? Yes!

Next time I use the WSL for something I’ll invest time to see if I can install and run an SSH server. Because all the necessary data was on Windows and I did not want to sync files or create network shares to work with, I went with the WSL. It’s okay, but I would not want to use it on a daily basis.

And finally: Thanks to Rockstar for developing Red Dead Redemption 2. It’s an amazing game and after 160 hours of gameplay I still find new stuff to try out. The SFX files alone were a big part and for sure cost a fortune to produce. I’m glad I bought the game and have never regretted paying 60 € for it. I understand that Rockstar wants to protect their game files and takes countermeasures like storing the files with hex/hashed names. I’d still like to see more support for the modding community. In the end OpenIV (actually used for modding Rockstar games) still finds a way in.

