Using audio fingerprinting to find game SFX files
Sunday, April 5 2020 · Reading time: 10 minutes · 1928 words
Warning! This article is more than a year old. The content may no longer be up to date!
This post continues my journey to extract the Red Dead Redemption 2 Blackjack dealer and player sounds from the game files. Yesterday I wrote down how I tried to use speech to text with a pre-trained, machine-learned model to get searchable text. That did not go so well. Please read the other post first, because I will pick up some references here. Otherwise you might not get what’s going on.
After speech to text failed I searched for another solution and found an approach called “audio fingerprinting”. One popular service you might know is Shazam. When you are listening to a song and don’t know its name, you simply open the app, which records a few seconds, and chances are high that it spits out artist, song title and album. The cool thing is that background noise can be present and it still works. That’s exactly what I need. On one hand I have a huge collection of clean, studio-quality source files. On the other hand I have samples with background noise. So I need to set up a Shazam-like service. After googling around I found open source software promising exactly that. My focus fell on these two solutions:
- dejavu (🖇️ 🔐) : Could have been nice if I had been able to run it. No matter which Python MySQL module I tried to install, it always told me that PyMySQL was not found. A venv and the apt package did not solve the problem either. I’d like to mention that this is not the first time I took part in the Python + MySQL rodeo. And then MySQL also behaved strangely in WSL. Next!
- audio-fingerprint-identifying-python (🖇️ 🔐) : Also python but it uses a sqlite database. Just start it and it’s running! So easy!
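To make the idea behind audio fingerprinting a bit more concrete, here is a heavily simplified sketch of the general “landmark hashing” approach that Shazam-like tools are commonly described as using. It is not the code of either project. The peak lists, FAN_OUT and all function names are made up for illustration; a real tool would extract the peaks from a spectrogram first. The point is the matching step: every hash that the noisy sample shares with a clean file votes for a time offset, and a real match piles its votes onto one offset while extra noise peaks simply find no partner.

```python
from collections import Counter, defaultdict

FAN_OUT = 3  # hypothetical: pair each peak with the next 3 peaks


def hashes_from_peaks(peaks):
    """Turn (time, frequency) peaks into (hash, anchor_time) pairs."""
    peaks = sorted(peaks)
    result = []
    for i, (t1, f1) in enumerate(peaks):
        for t2, f2 in peaks[i + 1:i + 1 + FAN_OUT]:
            # The hash only contains frequencies and the time difference,
            # so it does not depend on where in the file the pair occurs.
            result.append(((f1, f2, t2 - t1), t1))
    return result


def build_index(songs):
    """Map every hash to the (file name, anchor time) pairs where it occurs."""
    index = defaultdict(list)
    for name, peaks in songs.items():
        for h, t in hashes_from_peaks(peaks):
            index[h].append((name, t))
    return index


def best_match(index, sample_peaks):
    """Vote for (file, time offset) and return the winner with its votes."""
    votes = Counter()
    for h, sample_t in hashes_from_peaks(sample_peaks):
        for name, song_t in index.get(h, []):
            votes[(name, song_t - sample_t)] += 1
    if not votes:
        return None
    (name, offset), count = votes.most_common(1)[0]
    return name, offset, count


# Made-up peak lists standing in for the spectrogram peaks of two clean files.
songs = {
    "dealer.wav": [(0, 40), (1, 55), (2, 30), (3, 70), (4, 45), (5, 60), (6, 35)],
    "scream.wav": [(0, 20), (1, 80), (2, 25), (3, 90), (4, 15), (5, 85), (6, 10)],
}
index = build_index(songs)

# A noisy excerpt: the dealer peaks from t=2..5, re-timed to start at 0,
# plus one extra "piano" peak (1, 99) that is in none of the clean files.
sample = [(0, 30), (1, 70), (2, 45), (3, 60), (1, 99)]
result = best_match(index, sample)
print(result)
```

With these toy numbers the sample is matched to dealer.wav at offset 2 with 5 votes. A shorter sample produces fewer hashes and therefore fewer votes, which makes it easier for a coincidental match to win.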
Building my own Shazam
So I picked the second one, set it up and was ready to try it out. My assumption was that the files I wanted are located in the PEDS_* group. The job of the collect-fingerprints-of-songs.py application is to fingerprint each song in a given directory. But it can’t handle nested directories. My directory structure looks like this:
peds_00/                  <-- Group name
    0x0D706B6A/           <-- AWC container name
        0x00AF5000.wav    <-- Audio file in AWC container
        0x00F770F5.wav
        [...]
    0x1FE191EA/
        0x00AE6C2A.wav
        0x00CE9B27.wav
        [...]
I noticed pretty early that the wav filenames are not unique. Moving all files from every subdirectory into a single directory was therefore not an option. You’d overwrite sound files. Something you don’t want when you are searching for a specific sound file. The application stores the file name in the database and returns it when a match occurs. Printing 0x00CE9B27.wav is simply not enough to identify a file reliably. My hope is that all dealer sounds are stored in one AWC directory. By identifying one sound file I’d probably stumble over the other sounds as well.
I’ve adapted the python application and my first task was to collect all files recursively:
import os

# Collect files in folder (group directory / AWC directory / wav file)
files = []
path = "mp3/"
for pedsDirectory in os.listdir(path):
    for awcDirectory in os.listdir(os.path.join(path, pedsDirectory)):
        for filename in os.listdir(os.path.join(path, pedsDirectory, awcDirectory)):
            if filename.endswith(".wav"):
                # [full path to the file, unique name with the path encoded in it]
                fileInfo = [ os.path.join(path, pedsDirectory, awcDirectory, filename), pedsDirectory + "-" + awcDirectory + "-" + filename ]
                files.append(fileInfo)
If you look closely I’m building one big array. Each field in there contains another array with two fields:
- fileInfo[0] contains the full path to the file. For example: mp3/peds_00/0x0D706B6A/0x00AF5000.wav
- fileInfo[1] contains the path encoded in the file name: peds_00-0x0D706B6A-0x00AF5000.wav
Keep in mind that at this point I had no idea if I was even on the right track. But I knew for sure that only the file name from the database is printed out, without any path information. Encoding the path into the file name was necessary.
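The same collection step can also be written without hard-coding the nesting depth. The following is only a sketch of an alternative, not what the adapted script does: collect_wav_files and split_label are hypothetical helper names, and the label format assumes that none of the directory names contain a “-”.

```python
import os


def collect_wav_files(root):
    """Return [full_path, label] pairs for every .wav file below root."""
    files = []
    for directory, _subdirs, filenames in os.walk(root):
        for filename in sorted(filenames):
            if not filename.lower().endswith(".wav"):
                continue
            full_path = os.path.join(directory, filename)
            # Path relative to root, e.g. peds_00/0x0D706B6A/0x00AF5000.wav
            relative = os.path.relpath(full_path, root)
            # Encode the path into the name: peds_00-0x0D706B6A-0x00AF5000.wav
            label = relative.replace(os.sep, "-")
            files.append([full_path, label])
    return files


def split_label(label):
    """Split a label back into (group, AWC container, file name)."""
    group, container, filename = label.split("-", 2)
    return group, container, filename
```

Calling split_label("peds_00-0x0D706B6A-0x00AF5000.wav") gives back the group, the AWC container and the file name, which is exactly the information needed to find a matched file on disk again.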
Btw: Don’t get confused with the mp3 folder. The developer focused on mp3 but the application can also handle wav files.
By storing all files in one big array I was also able to implement a basic progress meter:
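# fingerprint all files in a directory
counter = 0
amountOfFiles = len(files)
for fileInfo in files:
    counter += 1
    # only print a status line every 100 files to keep the console quiet
    if counter % 100 == 0:
        print('Status: %d/%d' % ( counter, amountOfFiles ))
    # fileInfo[0] is the full path, fileInfo[1] the name with the encoded path
    reader = FileReader(fileInfo[0])
    audio = reader.parse_audio()
    [...]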
I also removed nearly all console output to speed up the process. I deleted the database and started the application. I measured the time again: processing 100 files took roughly 15 seconds. Quick look at the clock: 1 am. Time to go to bed and let it run. I woke up during the night and quickly checked the progress. The application had finished. A random thought crossed my mind: maybe it’s not in the PEDS_* group but in S_MISC. So I also extracted that group into the directory and started the application again before going back to sleep.
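With a measurement like “100 files in roughly 15 seconds” the progress meter could also print a rough estimate of the remaining time. The following is a minimal sketch of that arithmetic. The function names are made up and the total of 20000 files is purely an illustrative number, not the real size of the collection.

```python
def estimate_remaining_seconds(done, total, elapsed_seconds):
    """Extrapolate the remaining time from the average time per file so far."""
    if done <= 0:
        raise ValueError("need at least one processed file to estimate")
    return (total - done) * float(elapsed_seconds) / done


def format_duration(seconds):
    """Format a number of seconds as H:MM:SS."""
    seconds = int(round(seconds))
    hours, rest = divmod(seconds, 3600)
    minutes, secs = divmod(rest, 60)
    return "%d:%02d:%02d" % (hours, minutes, secs)


# 100 files after 15 seconds, 20000 files in total (made-up total).
remaining = estimate_remaining_seconds(done=100, total=20000, elapsed_seconds=15)
print("Estimated time left: %s" % format_duration(remaining))
```

For these numbers the estimate is 19900 files × 0.15 seconds = 2985 seconds, so the script prints 0:49:45.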
When I got up the new group was also imported and the application warned about “duplicate entries”. Nothing bad. Just the application telling me it already analyzed the audio file. I cancelled the application.
How many files are now in the database? 312124. My wanted audio files must be in there somewhere!
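I did not look into how the application detects its “duplicate entries”. One common way to do this is to store a hash of the file content and skip everything that is already known. The sketch below shows that idea together with the file count. The table layout, the column names and the helper functions are assumptions for illustration, not the real schema of the project.

```python
import hashlib
import sqlite3


def file_hash(data):
    """SHA-1 of the raw file content; identical files give identical hashes."""
    return hashlib.sha1(data).hexdigest()


def add_file(connection, label, data):
    """Store a file once; return False if the same content is already known."""
    digest = file_hash(data)
    known = connection.execute(
        "SELECT 1 FROM files WHERE filehash = ?", (digest,)
    ).fetchone()
    if known:
        return False
    connection.execute(
        "INSERT INTO files (filehash, label) VALUES (?, ?)", (digest, label)
    )
    connection.commit()
    return True


# Hypothetical schema for this sketch, kept in memory instead of a file.
connection = sqlite3.connect(":memory:")
connection.execute(
    "CREATE TABLE files (filehash TEXT PRIMARY KEY, label TEXT NOT NULL)"
)

first = add_file(connection, "peds_00-0x0D706B6A-0x00AF5000.wav", b"audio bytes A")
again = add_file(connection, "peds_00-0x0D706B6A-0x00AF5000.wav", b"audio bytes A")
other = add_file(connection, "peds_00-0x0D706B6A-0x00F770F5.wav", b"audio bytes B")

# How many files are in the database?
count = connection.execute("SELECT COUNT(*) FROM files").fetchone()[0]
print(count)
```

The second call with the same content is skipped, so only two rows end up in the table.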
Getting audio samples and testing
Now that I have a database with the clean recordings it’s time to throw my audio samples with background noise like music, talking etc. against it. Instead of recording my own game I downloaded two videos of people playing Blackjack from YouTube.
Thanks to the uploaders for not talking! :-) I’ve cut out samples and converted them into audio files. I’ve collected a few numbers, “bets please”, “blackjack” and “insurance”. I tried to vary them in length. And sometimes the piano player stops playing! In the end I had 14 samples. I copied them to my Windows machine and tried processing them with recognize-from-file.py. To check how I’d pass the file name I opened the script and to my surprise this was the content:
from libs.reader_microphone import FileReader
song = None
seconds = 5
r = FileReader(123)
r.recognize(seconds=seconds)
print(song)
IT IS INCOMPLETE! I had not checked it before. My heart dropped. I tried to read the code to see if I could fix it. But I have no idea about Python and audio processing. I noticed that there are many forks and that someone had tried to implement it. One that worked for me was this version (🖇️ 🔐) by kusamops. Many thanks at this point! You saved me from throwing my PC out the window.
Time to throw the first file against it:
A match! And because the file path is included in the file name I know where to look: group peds_02, AWC container 0xCEAEDDB7, file 0x1D7231C2.wav. But the offset value looks strange. Never mind. Opening it aaaaaaaand it’s a person speaking but not what I searched for. I listened and compared a couple of times and identified the problem: the piano matches the ups and downs of the voice.
Next sample: The same.
Another sample: The same.
Another sample: Same. Fucking shiiiiiiiittt! Can somebody please shoot the piano player?
Another sample: Same. Similar but not what I wanted. Are my samples too short? They are mostly just one second.
Searched for a sample without the piano and found one where the dealer says “it’s a blackjack” (video 1, at around 07:25). Throwing the sample against the database again.
That’s strange. It’s not in the PEDS_* group. It’s in S_MISC. I opened the file in VLC and the content did not match. But it was similar and the voice seemed to be the same. So I listened to all files in that AWC container (183 sounds, 4 minutes in total). It starts strange. Screaming “god no please”, “where is the law”, etc. Doesn’t sound like a dealer. And then suddenly: “Insurance bet?”, “Blackjack!”, “Oh a new player”, etc. All the numbers, interrupted by screams and/or aggression. WONDERFUL! I found a dealer AWC! After all the hours wasted!
Well played Rockstar! You hid them well. If you are searching: Have a look at S_MISC -> 0x1E9CFF94.awc.
Was it luck? If you ask me: no. It’s not a 100% match, but it pointed me in the right direction! I guess the main problem is that both the samples and the files I am looking for are very short. That gives the application very little data to work with.
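Since a single match only has to point to the right AWC container, it can help to count matches per container instead of looking at individual files. The following is a small sketch of that idea, assuming match results come back as labels in the group-container-file format built earlier. The function name is hypothetical and the list of matches is made up for illustration; these are not real results.

```python
from collections import Counter


def containers_by_hits(matched_labels):
    """Count matches per (group, AWC container), most frequent first."""
    hits = Counter()
    for label in matched_labels:
        group, container, _filename = label.split("-", 2)
        hits[(group, container)] += 1
    return hits.most_common()


# Made-up match results; only the label format is taken from this post.
matches = [
    "s_misc-0x1E9CFF94-0x0000AAAA.wav",
    "peds_02-0xCEAEDDB7-0x1D7231C2.wav",
    "s_misc-0x1E9CFF94-0x0000BBBB.wav",
    "s_misc-0x1E9CFF94-0x0000CCCC.wav",
]
ranking = containers_by_hits(matches)
print(ranking[0])
```

A container that shows up for several different samples is a much better candidate to listen through than one that matched only once.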
Taking it a step further: Find Nigel
What? You don’t know who Nigel is? Nigel is the one guy searching for Gavin. Yeah now you know, right? I knew that I could obtain a letter from him but I … well … I did not want to encounter him again so I just shot him in the head instead of lassoing him. But his sound files must also be in there.
Again two approaches:
- Three samples ripped from a YouTube video
- Actually tried to recreate the shout-outs and recorded my own voice (5 samples). I recorded them with my windows open so my neighbors probably think that I’m on drugs whining “Gaviiiiiin”, “This is ridiculous mate!” and “Someone must have seen an Englishman!” :-)
All samples from the videos worked. Three of the five samples of my own voice mimicking him also worked. What is the difference to the other samples? They are all a few seconds long. Seems like you get better matches with longer samples. You can find Nigel in SS_NZ -> 0xB560AA46.awc
Conclusion
After hours I finally found them. I’ve given up several times in between but at the end I’m happy that I finally was able to solve it and extract my wanted files. I can continue with my own, private Blackjack now! At the end I can’t publish the files or my game since Rockstar of course holds the copyright. It may sound stupid wasting so much time finding them. But a game without sound is not nice.
Did I learn something? Yes!
- Throwing money at a problem still works. It just depends how much you are willing to spend.
- Google’s speech to text is really good. So good that I somehow fear its power.
- I still dislike python
- The Windows Subsystem for Linux (WSL) is quickly installed and a Linux is started quickly. Accessing it via the Windows terminal is not that nice. Clicking into the window, for example, pauses the execution of the script. I wasted an hour of CPU time because I did not notice that the application had paused because of my click.
- Copy and paste from or to the Windows CMD is also a nightmare.
Next time I use the WSL for something I’ll invest time to see if I can install and run a ssh server. Because all the necessary data was on Windows and I did not want to sync files or create network shares to work with I went with the WSL. It’s okay but I’d not want to use it on a daily basis.
And at the end: Thanks to Rockstar for developing Red Dead Redemption 2. It’s an amazing game and after 160 hours of gameplay I still find new stuff to try out. The SFX files alone were a big part and for sure cost a fortune to produce. I’m glad I bought the game and never regretted paying 60 € for it. I understand that Rockstar wants to protect their game files and takes countermeasures like storing the files with hex/hashed names. I’d still like to see more support for the modding community. At the end OpenIV (actually used for modding Rockstar games) still finds a way in.
🖇️ = Link to another website
🔐 = Website uses HTTPS (encrypted transport)