VRSFX’s binaural audio brings true surround sound to existing headphones
Regular readers will know that presence is the holy grail the VR industry is working so hard to reach. It’s not just the difference between watching the game on TV and being at the park. Presence is actually being on the pitch. The Oculus Rift is very, very close to the fidelity required to subvert your visual cortex into suspending its critical faculty, but what of our ears? Who is taking on that challenge, and can it be met with commodity accessories? One of the companies delivering an answer is VRSFX, an LA-based outfit currently endeavouring to bring binaural sound effects to VR gaming.
We’ve covered some of the details of binaural audio in our previous article but, to dig a little deeper, we caught up with VRSFX CEO AJ Campbell to get a better idea of what VR sound effects are and what binaural audio can do to promote the sense of presence.
VR Games meets VR SFX
Unity and Unreal have already confirmed official VR support in the latest versions of their game engines. By replacing a single-lens camera with a binocular one, they’ve given VR HMDs access to the same environments. The same cannot be said for sound effects, which still rely on processor-intensive filtering to produce the kind of variance that real spatial geometry brings. This is about to change.
VRSFX are working on VR sound effect packages for the Unity and Unreal game engines. In a nutshell, how does it work?
AJ: We’re aiming to deploy our sample packages to Unity starting this month (July), and we want to be ready to deploy to the Unreal Marketplace before the end of 2014. We want game designers to be able to take advantage of omni-directional binaural audio, which is the highest quality standard for producing spatial audio presence. The equipment and techniques we use are very similar to those in the Chris Milk video Hello Again (a 360 degree music video starring Beck).
That video is simply amazing, and I get the sense that all videos will be shot this way in future (I’m on a mutha-f**king BOAT!). The recording device is a multi-faceted head (pictured below) that reminds me eerily of the Quintessons from The Transformers: The Movie (1986). I guess the sound adapts to HMD orientation by switching the ‘face’ being used?
AJ: Correct. Each of our samples is mixed down into four stereo sub-mixes, one for each facing orientation (north, south, east, and west). We will also include a plugin with each sample package to dynamically handle cross-fading from one sub-mix to the next according to the orientation of the HMD and of the audio source. That way, the playback in the user’s headphones will always coincide with the binaural mic that was facing the appropriate direction when the sound was recorded. Our plugin is currently being tested with the standard character controller included in the Unity package provided by Oculus VR, but it should also integrate easily with just about any character controller (even for platforms other than virtual reality).
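To make the cross-fading idea concrete, here is a minimal sketch of how weights for the four sub-mixes might be derived from head yaw. It is not the VRSFX plugin: the function name `submix_gains`, the convention that each sample was recorded with the source due north of the rig, the yaw-only treatment (pitch and roll are ignored) and the equal-power fade curve are all assumptions for illustration.

```python
import math

# Hypothetical convention: the sound was recorded with its source due north of
# the rig, so the "north" sub-mix is the one whose binaural pair faced the source.
SUBMIX_FACINGS = ["north", "east", "south", "west"]  # 0, 90, 180, 270 degrees


def submix_gains(listener_yaw_deg: float, source_bearing_deg: float) -> dict[str, float]:
    """Equal-power crossfade weights for the four sub-mixes (yaw only)."""
    # Listener heading measured clockwise from the direction of the source.
    relative = (listener_yaw_deg - source_bearing_deg) % 360.0
    if relative >= 360.0:  # guard against float rounding of tiny negative values
        relative = 0.0
    lower = int(relative // 90.0)          # sub-mix facing at or just below the heading
    t = (relative - lower * 90.0) / 90.0   # 0..1 progress toward the next facing
    gains = {name: 0.0 for name in SUBMIX_FACINGS}
    gains[SUBMIX_FACINGS[lower]] += math.cos(t * math.pi / 2)
    gains[SUBMIX_FACINGS[(lower + 1) % 4]] += math.sin(t * math.pi / 2)
    return gains
```

Each frame, a mixer would scale the four stereo streams by these weights and sum them. Equal-power is only one common choice; because the sub-mixes are recordings of the same event and may be strongly correlated, a linear fade could turn out to sound smoother, so the curve is something to tune by ear.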
Is there any latency as the sound adapts?
AJ: Introducing VRSFX binaural sounds into a project should not create any latency as compared to regular mono or stereo surround samples.
Does it get computationally expensive? That is, does the overhead go up dramatically with the number of sources, or does it scale linearly?
AJ: Definitely linearly. It’s actually possible to improve CPU performance in some cases when switching to VRSFX samples. This is because many game sound effects achieve spatial identity (i.e. a sense that the sound came from a certain direction) by executing filtering algorithms. These algorithms calculate their filters in real time based upon 3D data surrounding the character. They recalculate this information every frame, so they are very flexible, but we still live in a world where no amount of processing power can be squandered. Essentially, these algorithms are spending CPU in an attempt to digitally mimic the sound profile of a binaural mic for a sound that wasn’t recorded with a binaural mic. We saw this and said ‘why not just use an actual binaural mic instead?’
These algorithms can continue to improve spatial quality by spending more and more CPU. They usually capture a sound that is a close enough compromise without eating too much of the processor. At VRSFX, we currently create our samples exclusively with actual omni-binaural mics, not with algorithms. The spatial quality is baked into the audio track, so developers should turn off any other spatial processing plugins when using VRSFX samples because they are not needed (and they will hurt the sound and waste CPU if left on). This is why VRSFX samples tend to require less CPU even while they achieve maximum spatial identity.
What level of accuracy is realistically achievable and how close are you?
AJ: We are aiming for fist-point accuracy of spatial identity for all sound effects, and we have achieved it for most of the sound effects we’ve created so far. When the end user reacts instinctively to a sound because they have no doubt it came from directly behind them, we’ve done our job correctly. The notion of the sound coming from a certain direction is important, but it’s actually just a means to an end. Our goal is for the end user’s subconscious to be convinced the sound is not a simulation. In other words, our mission is to enable them to experience audio presence. The beautiful benefit of using high quality binaural equipment is that it captures audio presence intrinsically.
VRSFX have released a sensational binaural demo of some of their sci-fi sounds (grab your headphones before watching). I was skeptical about the vertical spatial field, but even with a dodgy right ear (thank you, rock and roll) I can safely say they nailed it! Check the video below and follow VRSFX on YouTube to get all their updates.
Is there a limit to the number of individual sound sources that can be handled?
AJ: This depends upon the technical needs of the project, and especially the platform on which it will deploy. For an Oculus Rift experience on a state-of-the-art gaming PC, there should be no fear of an upper limit. For mobile apps, we recommend that developers use our sound effects sparingly. While they are not too costly on the CPU, they do have a large storage footprint compared to other sound effects. Mobile app developers must keep the total storage footprint as low as possible to maximize over-the-air download speed. Adding dozens of VRSFX effects would have a noticeable impact on the total footprint of their compiled binaries.
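A rough back-of-the-envelope calculation shows why the storage footprint grows so quickly. The sketch below assumes uncompressed 16-bit, 44.1 kHz PCM and treats an omni-binaural effect as eight channels (four stereo sub-mixes); real packages will likely use compressed formats, and the effect count and length are made-up figures, so only the proportions are meaningful here.

```python
def pcm_bytes(seconds: float, channels: int, sample_rate: int = 44_100, bit_depth: int = 16) -> int:
    """Size of raw PCM audio in bytes (no file header, no compression)."""
    frames = round(seconds * sample_rate)
    return frames * channels * (bit_depth // 8)


MONO_CHANNELS = 1
OMNI_BINAURAL_CHANNELS = 4 * 2  # four stereo sub-mixes: north, east, south, west

if __name__ == "__main__":
    effects, length_s = 24, 3.0  # hypothetical: two dozen 3-second effects
    mono = effects * pcm_bytes(length_s, MONO_CHANNELS)
    omni = effects * pcm_bytes(length_s, OMNI_BINAURAL_CHANNELS)
    print(f"mono: {mono / 1e6:.1f} MB, omni-binaural: {omni / 1e6:.1f} MB")
```

Under these assumptions the script prints `mono: 6.4 MB, omni-binaural: 50.8 MB`: the same two dozen effects carry eight times the raw audio data, which is the kind of difference a mobile download would notice.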
What about environmental interaction? Can reflections and echoes be made to affect the sound?
AJ: We recommend avoiding post processing (reverb, delay, echo, etc.) of VRSFX samples. Since binaural spatial identity is baked into each track, there should rarely be a need for more processing. Any post changes could tamper with the spatial identity of the final mixes. If additional post work is needed, developers should feel free to contact us and see if we can help with custom work.
If additional post work is crucial in a certain case and a direct partnership is not possible, we recommend applying post processing along with a spatial filter to the non-omni-binaural copy of our sound effect, which is included in each package. This will likely result in spatial identity that is somewhat less accurate than the VRSFX omni-binaural version, but it will have the flexibility of enabling further post work. When faced with a choice between binaural equipment and spatial algorithms, we recognized the tradeoff was quality versus flexibility. Our samples ship with quality as the priority, but we ensure that they have flexibility as an option if needed.
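For a sense of what "a spatial filter on the non-binaural copy" trades away, here is a deliberately crude sketch that places a mono signal using just two cues: an interaural delay and a constant-power level difference. Real spatial audio plugins use far richer HRTF-based filtering; the function name, the 0.66 ms maximum delay and the sin-law delay model are assumptions chosen for illustration, not anything VRSFX ships.

```python
import math

MAX_ITD_S = 0.00066  # assumed largest interaural delay (source at 90 degrees)


def crude_spatialize(mono, azimuth_deg, sample_rate=48_000):
    """Return (left, right) sample lists for a mono signal placed at azimuth_deg.

    Only two cues are modelled: an interaural delay proportional to
    sin(azimuth) and a constant-power level difference. There is no pinna or
    HRTF filtering, so front/back and elevation cues are absent.
    azimuth_deg: 0 = ahead, +90 = hard right, -90 = hard left (clamped).
    """
    az = math.radians(max(-90.0, min(90.0, azimuth_deg)))
    delay = round(abs(math.sin(az)) * MAX_ITD_S * sample_rate)
    pan = (math.sin(az) + 1.0) / 2.0       # 0 = hard left, 1 = hard right
    gain_l = math.cos(pan * math.pi / 2)
    gain_r = math.sin(pan * math.pi / 2)
    near = list(mono) + [0.0] * delay      # ear closer to the source
    far = [0.0] * delay + list(mono)       # farther ear hears it later
    left, right = (far, near) if az >= 0 else (near, far)
    return [s * gain_l for s in left], [s * gain_r for s in right]
```

The flexibility the interview mentions shows up directly: reverb or echo could be applied to `mono` first, and the azimuth can be recomputed every frame as the source moves. The price is that nothing here captures the ear's own coloring of the sound, which a binaural recording gets for free.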
Also of note, some VRSFX sound effects include a baked echo option. This option is binaurally treated in the final mix, so it should work out-of-the-box. It may or may not suit a particular environment if the developer wants ultra-realistic timing, but it is much quicker than resorting to custom post work.
I can see that coming in handy. On that note: do materials baffle/warp or otherwise affect the sound?
AJ: We are experimenting with props as filters during recording. For instance, we are running a test right now where we encase our mic inside enclosures of various size and material while capturing audio from outside. We intend to put this technique to use in some of our Sci Fi packages, where the enclosure represents a cockpit that slightly muffles sounds from outside in space.
Sounds like a great fit for the new EVE: Valkyrie game!
Looking to the future, are there any plans for a web plugin for Unity and UE4’s JS-based export?
AJ: That is a good question. Since our samples are not algorithmic, we expect they will work just fine in any web deployment, but we will certainly be testing this.
Binaural Audio Renaissance
This isn’t the first time binaural audio has been seen as the next big thing in high fidelity; it has in fact been around for decades in one form or another. It is only now that it is being taken seriously in games.
What took so long? Was it processor speeds, the science or the code catching up?
AJ: The code we’re writing is fairly straightforward. We’re including it for convenience so that developers of all skill levels will have equal ease jumping right in and playing with our sounds. Processor speeds have also been fast enough to handle our samples for a long time. VRSFX is not utilizing a breakthrough in new technology, but rather an insight into new ways of using old technology. Binaural mics have been around for ages, but it was really Chris Milk who first decided to string them into an omni-directional array for virtual reality earlier this year. After that, it just took a game developer who also happened to be an audio nut to tie the pieces together, and VRSFX was born.
So the requirement just wasn’t there until the advent of VR?
AJ: That’s definitely part of it. Chris Milk’s Hello Again video may never have been imagined if the fine folks at Oculus hadn’t dared us all to dream again, and then VRSFX wouldn’t exist either.
Yep, it’s just a matter of time until Palmer Luckey is GQ’s man of the year! He’s rebooted a tired market and inspired a new generation of bedroom developers. If you’re looking to do some DIY binaural for your indie game, what are the hardware considerations?
AJ: To record custom samples for omni-binaural playback with the VRSFX plugin, one would need an omni binaural mic with a minimum of eight condensers, arranged as binaural pairs facing north, south, east and west. There are many options to string multiple binaural mics together in an omni array, but when we started VRSFX we wanted a single-unit rig like the one they built for the Hello Again video. In the process of researching a custom build, we discovered that Jeff Anderson of 3Dio Sound (audio engineer for the Hello Again video) was already building the exact design we needed into the FreeSpace Omni.
Is there any argument for specialist headphones, or are any decent pairs good enough to deliver the level of detail required by binaural?
AJ: We’ve heard arguments for every option under the sun, but our general impression is that binaural audio actually sounds great through any ear buds that have a full enough EQ range. The quality doesn’t necessarily degrade with over-ear headphones, but there is a theory going around that ear buds are ideal because they are the closest to mimicking the exact position of the mic relative to its ear replica diaphragm.
That’s good to know. We can’t wait to try it! Are you going to be releasing something any time soon?
AJ: Our first sample package will be Sci Fi themed, and it will be released before the end of July.
How do you see things progressing for VRSFX as a company?
AJ: It’s very early days for us. We are interested in partnerships, especially with game designers developing Oculus Rift projects. We are definitely interested in partnerships in the 360 degree video space too. The number of projects we take on will partially determine the speed at which we can release new sample packages, but we would like to release at least one new package per month for the foreseeable future. The more developers who download our samples, the faster we can scale up our team and release even more content. We have ideas for dozens of sample packages.
How do you plan to support yourselves with this, i.e. what are your plans for monetisation?
AJ: Our sample packages will be priced competitively with other samples on the Unity Asset Store (VRSFX will offer the only omni binaural samples available to date), and we will be offering promotional pricing for early adopters. Our plugins will always be included with each package. We also intend to release small promotional packages at no charge to help get developers familiar with the VRSFX sound. These will contain 1-3 full quality samples each.
Is this technology going to be expensive for indie game devs?
AJ: To capture samples of VRSFX quality, you need an experienced recordist/audio engineer who owns several thousand dollars’ worth of equipment specifically geared toward this purpose. If custom sounds are needed, we are happy to partner up and help developers avoid eating these high costs internally. Indie devs should feel free to contact us, especially this month. We will have limited ability to take on indie projects that don’t have budgets, and it will become more difficult as we scale.
That said, our sample packages are designed primarily with indie devs in mind, although any developer could use them for efficiency. If stock samples will work, our packages are priced under $100 each. They will vary in size, and in price accordingly. We may also eventually release some monstrous package bundles with quantity discounts, but that will be after we have grown our sample library across multiple categories.
Do you have any industry partners who’ll be showcasing your work?
AJ: We are currently working with a small handful of studios in the Los Angeles area, but it’s too early to announce anything official yet.
As a subjective guess, what level of accuracy can truly be achieved by this approach, given that the ear picks up sound across its entire surface (and even around the ear itself)?
AJ: Fist-point accuracy in 3D. You hit the nail on the head as to how it’s possible. The ear as a diaphragm is incredibly powerful as a spatial identifier because it colors the entire EQ spectrum in a unique way at every angle, largely due to its asymmetrical nature. The variation in time delay from one ear to the next is also key because it is unique from every direction in 3D.
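The timing cue AJ describes can be estimated with the Woodworth spherical-head model, a standard textbook approximation: ITD = (a / c)(θ + sin θ), where a is head radius, c the speed of sound and θ the angle off the median plane. The sketch below is that approximation, not how VRSFX measures anything, and the head radius is an assumed average value.

```python
import math

HEAD_RADIUS_M = 0.0875   # assumed average adult head radius
SPEED_OF_SOUND = 343.0   # metres per second in air at roughly 20 degrees C


def itd_seconds(azimuth_deg: float) -> float:
    """Interaural time difference from the Woodworth spherical-head model.

    azimuth_deg: 0 = straight ahead, +90 = right, -90 = left, 180 = behind.
    Positive results mean the right ear hears the sound first.
    """
    az = math.radians(azimuth_deg)
    # Angle off the median plane; 30 and 150 degrees map to the same value.
    lateral = math.asin(math.sin(az))
    theta = abs(lateral)
    itd = (HEAD_RADIUS_M / SPEED_OF_SOUND) * (theta + math.sin(theta))
    return math.copysign(itd, lateral)
```

For a source at 90 degrees this gives roughly 0.66 ms. Note that the model returns the same delay for mirror-image positions in front of and behind the listener, which is one reason the ear’s spectral coloring, mentioned above, matters so much.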
Can binaural audio handle differences in the vertical position of sound sources?
AJ: Binaural mics can definitely create accurate audio presence along the vertical axis. This is something that spatial algorithms have had trouble with (although they are getting much better).
Why is that?
AJ: It’s the nature of our biology that our ears are aligned horizontally and not vertically, which makes us naturally quicker at spatial identification along the horizontal. Only certain sound effects will actually require a distinctly unique signature along the vertical, but in those cases we play with mic position and post processing extensively until we arrive at a final mix that represents the vertical well. We do actually exercise vertical spatial identification regularly. We know a plane is above us because the sound has a signature that is unique along the vertical, and binaural mics are adept at capturing that signature.
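The horizontal alignment of the ears can be made concrete by computing each source’s lateral angle, i.e. how far it sits off the plane that splits the head into left and right. In a simple spherical-head picture, timing and level differences between the ears depend only on this angle, so sources that share it (a "cone of confusion") look identical to those cues and must be told apart by the ear’s spectral coloring. The axis convention and function name below are assumptions for illustration.

```python
import math


def lateral_angle_deg(x: float, y: float, z: float) -> float:
    """Angle between a source direction and the median plane, in degrees.

    Assumed head-centred axes: x toward the right ear, y up, z straight ahead.
    """
    norm = math.sqrt(x * x + y * y + z * z)
    if norm == 0.0:
        raise ValueError("direction vector must be non-zero")
    # Clamp to guard asin against rounding just outside [-1, 1].
    return math.degrees(math.asin(max(-1.0, min(1.0, x / norm))))


if __name__ == "__main__":
    for label, v in [("ahead", (0, 0, 1)), ("overhead", (0, 1, 0)), ("behind", (0, 0, -1)),
                     ("front-right", (1, 0, 1)), ("up-right", (1, 1, 0))]:
        print(f"{label:12s} lateral = {lateral_angle_deg(*v):5.1f} deg")
```

Straight ahead, overhead and behind all share a lateral angle of zero, and front-right and up-right share 45 degrees; interaural timing alone cannot separate them, which is why elevation is harder to pin down than left versus right.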
Do you think it will be possible to have a ‘Battlefield’-type scenario where every event (guns, bombs, planes, tanks, men etc.) is binaurally generated? If so, how far are we from this level of quality?
AJ: We’re already at or very close to this level of quality in some AAA titles (although none are treated in omni for VR yet to our knowledge). Most AAA teams use spatial algorithms primarily, so the quality standard is really high for gaming already (higher than other platforms like film and TV). Battlefield stands out as a franchise that is obviously already making spatial audio a priority in the experience. The audio becomes a tactical gameplay element since the player can use it to identify the direction of an approaching enemy before they are necessarily in view. Players who don’t play with headphones may not necessarily realize the benefit because the effect is not as pronounced through speakers.
Also, we don’t recommend assuming binaural treatment is best for ALL sound effects. In some situations, a spatial audio filter will create a better experience of presence due to its flexibility to recalculate according to the changing conditions of gameplay. This means that sounds that need to sustain for extended periods while their position is changing unpredictably are usually better treated with a filter. Generally speaking, though, projects using no binaural sound can always improve the audio experience by treating the appropriate sound effects binaurally. In our sample packages, we choose which sounds to include carefully, picking the ones that are sure to sound better in binaural than through a spatial filter, because those are the sounds that play to the strength of our techniques.
[Ed: There we have it. Like so much in life, moderation is good.]
VR Gaming would like to thank AJ and all at VRSFX for this comprehensive insight into virtual reality sound effects. Keep track of their progress on the VRSFX blog.
If you have any thoughts or questions about binaural audio, let us know in the comments.