
BSE

Give yourself a personal immersive audio environment.


Why binaural audio is stuck in the past


Binaural audio promises the ultimate spatial experience: full 3D positioning delivered through ordinary headphones. The technology has been understood for over a century. Your brain localizes sounds using interaural time differences, level differences, and spectral cues from your outer ears. Reproduce these cues accurately through headphones, and the brain perceives sound in three-dimensional space around you.


The traditional approach uses Head-Related Transfer Functions — measurements of how sound travels from a point in space to your eardrums. Capture these impulse responses for thousands of positions, store them in massive databases, and convolve incoming audio with the appropriate HRTFs in real-time. The math is straightforward; the results can be impressive.
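To make the computational shape of that approach concrete, here is a minimal sketch in Python/NumPy of block-based HRTF convolution for a single source. It is purely illustrative, not BSE code or any vendor's implementation, and it assumes a pair of measured impulse responses (hrir_left, hrir_right) is already loaded.

```python
# Illustrative sketch of conventional HRTF convolution for one source:
# FFT-based overlap-add, block by block, one impulse response per ear.
import numpy as np

def binauralize_hrtf(mono, hrir_left, hrir_right, block=1024):
    """Convolve a mono signal with a measured HRIR pair, one FFT block at a time."""
    n_fft = block + len(hrir_left) - 1
    H_l = np.fft.rfft(hrir_left, n_fft)
    H_r = np.fft.rfft(hrir_right, n_fft)
    out = np.zeros((2, len(mono) + len(hrir_left) - 1))
    for start in range(0, len(mono), block):
        x = mono[start:start + block]
        X = np.fft.rfft(x, n_fft)                  # forward FFT, every block
        y_l = np.fft.irfft(X * H_l, n_fft)         # spectral multiply + inverse FFT
        y_r = np.fft.irfft(X * H_r, n_fft)
        end = min(start + n_fft, out.shape[1])
        out[0, start:end] += y_l[:end - start]     # overlap-add
        out[1, start:end] += y_r[:end - start]
    return out
```

Every block of every source repeats the forward transform, two spectral multiplications, and two inverse transforms, and the positions covered are limited to whatever HRIRs were measured.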


But HRTF convolution carries fundamental limitations. The impulse responses are typically measured on a dummy head or a small sample of human subjects. But your head and your ears are unique. The spectral cues that provide elevation perception vary dramatically between individuals: what sounds "above" for one person might sound "in front" for another using the same generic HRTF.


Beyond personalization, convolution demands significant processing resources. Each source requires continuous FFT operations, multiplications in the frequency domain, and inverse FFTs. The CPU cost scales linearly with source count. Mobile devices struggle with more than a handful of sources. Embedded systems often can't run HRTF convolution at all. And the block-based nature of FFT processing introduces latency — typically 10-50 milliseconds — that breaks the connection between visual and audio events in interactive applications.
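As a rough worked example: a 1024-sample FFT block at 48 kHz corresponds to 1024 / 48000 ≈ 21 ms of buffering before processing can begin, and the forward transform, spectral multiply, and inverse transform are then paid again for every source on every block. The exact figure depends on block size and overlap scheme, but the latency is structural to the approach.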


The audio industry has accepted these limitations as inherent to binaural rendering. They're not. They're inherent to convolution-based approaches. A different approach yields different tradeoffs.










How BSE works


BSE abandons HRTF convolution entirely. Instead of convolving with measured impulse responses, BSE models the physical and perceptual mechanisms of spatial hearing directly, using parametric filters that reproduce the acoustic phenomena your brain relies on for localization.


This approach draws on decades of psychoacoustic research. We know that interaural time differences dominate localization for frequencies below 1500 Hz. We know that interaural level differences become significant above 1500 Hz as the head creates acoustic shadowing. We know that the pinna (your outer ear) creates frequency-dependent reflections that encode elevation information. We know that your torso and shoulders create reflections that contribute to front-back discrimination. We know that ground reflections provide additional height cues in many listening situations.


BSE models each of these phenomena explicitly. 
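As a rough illustration of what "parametric" means here, the two strongest lateral cues can be sketched in a few lines of Python: an azimuth-dependent interaural delay plus a simple low-pass "head shadow" on the far ear. This is a textbook-style toy under a spherical-head assumption, not BSE's actual filter design.

```python
# Toy parametric binaural cues (illustrative only, not BSE's filters):
# Woodworth-style interaural delay + one-pole head-shadow low-pass.
import numpy as np

def simple_parametric_binaural(mono, azimuth_deg, fs=48000, head_width_m=0.175):
    c = 343.0                                    # speed of sound in air, m/s
    a = head_width_m / 2.0                       # spherical-head radius
    theta = np.radians(azimuth_deg)              # 0 = front, +90 = hard to one side
    itd = (a / c) * (abs(np.sin(theta)) + abs(theta))   # Woodworth-style delay
    delay = int(round(itd * fs))                 # whole samples, for simplicity

    # Crude head shadow: a one-pole low-pass on the far ear,
    # stronger the further the source sits to one side.
    b = 1.0 - 0.7 * abs(np.sin(theta))
    shadowed = np.zeros(len(mono))
    y = 0.0
    for i, x in enumerate(mono):
        y = b * x + (1.0 - b) * y                # y[n] = b*x[n] + (1-b)*y[n-1]
        shadowed[i] = y

    near = np.asarray(mono, dtype=float)
    far = np.concatenate([np.zeros(delay), shadowed])[:len(mono)]
    left, right = (far, near) if azimuth_deg > 0 else (near, far)
    return np.stack([left, right])               # [left, right] binaural pair
```

The point of the sketch is the shape of the processing: per-sample filters driven by physical parameters, with no transforms, no blocks, and no stored measurement data.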


These models are implemented entirely as filters. There's no FFT, no convolution, no block processing. BSE adds just a few samples of latency at any sample rate. The CPU cost is trivial compared to convolution. A single DSP core can render dozens of sources simultaneously.

But here's where BSE becomes genuinely different: personalization. 


Because the processing is parametric rather than data-driven, the parameters can be adjusted. Head width affects interaural time differences. Head height affects torso and ground reflection timing. These aren't abstract numbers — they're physical dimensions that you can measure on your own body. Set them correctly, and BSE's spatial accuracy improves noticeably. 


The binaural rendering adapts to you rather than forcing you to adapt to a generic dummy head.










What sets BSE apart


No HRTF convolution


BSE doesn't convolve with impulse responses because it doesn't need to. The spatial cues are synthesized parametrically, modeling the physics of spatial hearing rather than replaying recorded measurements. This eliminates the CPU burden of FFT processing, the memory burden of HRTF databases, and the latency burden of block-based operations.



Ultra-lightweight processing



Simple filters only. That's the extent of BSE's processing requirements. Any platform that can run basic audio processing can run BSE. Mobile phones, embedded systems, Bluetooth headphones with onboard DSP, automotive head units — all viable BSE platforms. The CPU headroom that convolution would consume remains available for other processing or battery life.



True personalization



Generic HRTFs are a compromise that everyone accepts because personalized HRTF measurement requires expensive equipment and controlled conditions. BSE sidesteps this entirely. Measure your head width with a ruler. Set your height. Enter these values, and BSE's parametric models adapt to your anthropometry. The spatial accuracy improvement is immediate and significant — especially for elevation perception, where individual variation matters most.



Timbre preservation



HRTF convolution often colors the sound noticeably. The impulse responses capture room characteristics from the measurement environment, resonances from the dummy head, and artifacts from the measurement process itself. Some listeners describe HRTF-processed audio as "hollow" or "filtered." BSE's parametric approach avoids this coloration. The filters model spatial cues specifically, not the entire measurement chain. During development, we validated BSE against reference recordings to ensure that spatialization doesn't compromise tonal quality or speech intelligibility.



Unlimited sources



Each mono input becomes an independently positioned source. Add as many as your creative application requires. The processing cost per source is trivial; the practical limit is your audio interface's channel count and your CPU's overall capacity (which BSE barely touches).



HSR integration



BSE pairs naturally with HSR for stereo-to-binaural conversion. HSR analyzes the stereo input and extracts spatial components as discrete sources distributed around the listener. BSE renders each source binaurally. The result: immersive headphone listening from any stereo content, with genuine spatial distribution rather than crude stereo widening.



Head tracking ready



Source positions and head rotation update in real-time with no additional latency. Interactive applications track the listener's head movement and adjust the binaural rendering accordingly. VR systems integrate seamlessly. The few-sample processing delay means audio stays locked to visual spatial information.











Your head, your sound


BSE's personalization isn't a marketing feature — it's a functional necessity that dramatically improves spatial accuracy. The parameters directly modify the physical models underlying BSE's processing.



Head width



The distance between your ears determines interaural time differences. A wider head means larger maximum ITD; a narrower head means smaller. BSE's default assumes average adult head width, but "average" might be significantly different from yours. Measuring your actual head width and entering it into BSE ensures that sources positioned to one side arrive at your ears with timing differences that match what your brain expects from real-world experience.
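As a rough illustration using a textbook spherical-head approximation: a 15 cm head width gives a maximum interaural time difference of roughly 560 microseconds, while a 17.5 cm width gives roughly 660 microseconds. An error of that size is far larger than the just-noticeable difference for interaural timing, which is why a generic default can audibly smear lateral positioning.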



Head height



Your ear height relative to your shoulders affects torso reflection timing and ground reflection geometry. Taller listeners experience different vertical cues than shorter listeners. BSE's head height parameter adjusts the torso and ground reflection models to match your actual anatomy, improving front-back discrimination and elevation accuracy.
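As a rough geometric illustration: for a source 2 m away at ear height, a listener whose ears are 1.6 m above the floor hears a ground reflection that travels √(2² + 3.2²) ≈ 3.8 m instead of 2 m, arriving roughly 5 ms after the direct sound; for ears at 1.4 m the same reflection arrives closer to 4 ms after. Differences on that scale are exactly what the reflection models need to account for.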



Source positioning



Every source has full 3D positioning: azimuth (left-right angle), elevation (up-down angle), and distance. Position sources anywhere in the sphere around the listener. BSE renders appropriate spatial cues for each position, including the elevation-dependent pinna filtering that most binaural systems get wrong for off-horizontal sources.



Head rotation



For interactive applications, BSE accepts continuous head rotation input. As the listener turns their head, the sound field rotates appropriately. Yaw, pitch, and roll are all supported. The rotation applies to all sources simultaneously with no additional latency, keeping the auditory scene locked to real or virtual head position.
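The geometry involved is simple enough to sketch. The following illustrative Python (not the BSE API; the coordinate conventions are arbitrary choices for this example) re-expresses a world-space source direction in head coordinates given yaw, pitch, and roll, which is the step that keeps the auditory scene stationary while the head moves.

```python
# Illustrative head-tracking geometry: world-space source direction -> head-relative
# azimuth/elevation. Axes: x forward, y left, z up; azimuth counter-clockwise.
import numpy as np

def spherical_to_vec(azimuth_deg, elevation_deg):
    az, el = np.radians(azimuth_deg), np.radians(elevation_deg)
    return np.array([np.cos(el) * np.cos(az),
                     np.cos(el) * np.sin(az),
                     np.sin(el)])

def head_relative_direction(src_az, src_el, yaw, pitch, roll):
    """Rotate a world-space source direction into head coordinates."""
    y, p, r = np.radians([yaw, pitch, roll])
    Rz = np.array([[np.cos(y), -np.sin(y), 0], [np.sin(y), np.cos(y), 0], [0, 0, 1]])
    Ry = np.array([[np.cos(p), 0, np.sin(p)], [0, 1, 0], [-np.sin(p), 0, np.cos(p)]])
    Rx = np.array([[1, 0, 0], [0, np.cos(r), -np.sin(r)], [0, np.sin(r), np.cos(r)]])
    head = Rz @ Ry @ Rx                             # head orientation in world space
    v = head.T @ spherical_to_vec(src_az, src_el)   # inverse rotation into head space
    az = np.degrees(np.arctan2(v[1], v[0]))
    el = np.degrees(np.arcsin(np.clip(v[2], -1, 1)))
    return az, el

# Turning the head 30 degrees toward a source at +30 degrees azimuth
# brings it to roughly 0 degrees, i.e. straight ahead.
print(head_relative_direction(30, 0, 30, 0, 0))
```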



For OEM applications, all parameters can be pre-configured for specific devices. A headphone manufacturer might tune BSE for their particular driver characteristics. An automotive OEM might configure BSE for speaker-based binaural in their specific headrest geometry. A VR headset maker might integrate BSE with their head tracking system and calibration workflow.









From stereo to immersive binaural


The combination of HSR and BSE creates a complete stereo-to-binaural pipeline that no other approach matches.



The challenge is fundamental: stereo recordings encode spatial information as channel differences and level relationships. Headphone playback reproduces these differences directly into each ear, but the brain doesn't interpret this as spatial audio — it interprets it as sound inside your head. Traditional crossfeed attempts to fix this by blending channels, but the result is vague and unconvincing.



HSR + BSE takes a different approach. HSR analyzes the stereo recording and extracts its spatial content as discrete components distributed across the sound field. A vocal panned center becomes a source at 0 degrees. A guitar panned left becomes a source at -30 degrees. Ambient information that existed as diffuse stereo difference becomes sources distributed across the rear hemisphere.



These extracted sources then feed BSE, which renders each one binaurally with full spatial cues. The center vocal gets ITD and ILD appropriate for a frontal source. The left guitar gets cues for its lateral position. The ambient sources surround the listener as the original stereo mix intended, but now with actual perceptual spatialization rather than crude channel separation.
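Conceptually, the data flow is a short loop: each extracted component is rendered at its estimated position and the binaural outputs are summed. The sketch below reuses the toy renderer from earlier; the component list and its format are hypothetical stand-ins for what an HSR-style extraction stage would provide, not the actual product interfaces.

```python
# Conceptual HSR + BSE data flow (hypothetical interfaces, not product code).
def stereo_to_binaural(extracted_components, fs=48000):
    """extracted_components: list of (mono_signal, azimuth_deg) pairs of equal length,
    e.g. a centre vocal at 0, a guitar at -30, ambience at +/-110."""
    binaural = None
    for mono, azimuth in extracted_components:
        rendered = simple_parametric_binaural(mono, azimuth, fs)  # toy renderer from above
        binaural = rendered if binaural is None else binaural + rendered
    return binaural
```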



The workflow runs in real-time. Streaming music gains immersive headphone presentation with no content modification required. Legacy recordings suddenly sound spatial. The listening experience transforms without touching the source material.



Learn more about HSR








Specifications


Input: Mono per source (unlimited sources)
Output: Binaural stereo (headphones or speaker-based)
Latency: A few samples
Sample rates: 44.1 / 48 / 96 / 192 kHz
Bit depth: 16-bit, 24-bit, 32-bit float
Processing: Parametric (no FFT, no convolution)
Head tracking: Supported (yaw, pitch, roll)
CPU footprint: Minimal; suitable for mobile and embedded


Per-source parameters


  • Position (azimuth, elevation, distance)
  • Gain
  • Reverb send

Global parameters


  • Head width
  • Head height
  • Head rotation (real-time, yaw/pitch/roll)










Applications


Headphone Listening


The vast majority of audio consumption now happens on headphones. Commuters, travelers, office workers, home listeners — headphones dominate. Yet headphone presentation remains fundamentally inferior to speaker playback for spatial content. 


Stereo collapses to a line between your ears. Surround content downmixes to stereo and loses its spatial intent. Even "spatial audio" features in consumer devices often amount to crude DSP tricks rather than genuine binaural rendering.


BSE transforms headphone listening by rendering true spatial audio from any source. Stereo content, when processed through HSR + BSE, gains genuine spatial distribution. Multichannel content renders with full spatial positioning. Object-based audio places sources around the listener as intended. 


The headphone experience approaches — and in some ways exceeds — speaker playback for spatial impression.


The lightweight processing means BSE runs on the devices listeners actually use. A smartphone app can implement BSE without draining the battery. A portable DAC can include BSE in its DSP. Wireless earbuds with onboard processing can spatialize audio before it reaches the ear. The technology goes where the listeners are. 

Gaming & VR


Virtual reality promises complete immersion, but audio often undermines the visual illusion. Spatialization that doesn't match visual positioning breaks presence. Latency between head movement and audio response creates nausea and discomfort. 


CPU budget consumed by audio processing reduces frame rates and visual quality.

BSE addresses each of these problems. The physics-based spatial model produces positioning that matches visual expectations — a sound rendered at 3 meters distance sounds like it's at 3 meters because BSE models the acoustic reality of that distance. 


The few-sample latency means head tracking responses are imperceptible — turn your head, and the sound field turns with you instantly. The minimal CPU requirement leaves processing budget for graphics, physics, and gameplay.


For VR game developers, BSE integrates with standard audio middleware. Position sources using your existing spatial audio workflow; BSE handles the binaural rendering. The personalization parameters can connect to VR calibration systems, improving spatial accuracy based on the user's head measurements.
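In practice that integration tends to look like a per-frame (or per-audio-block) update loop. The sketch below is purely hypothetical: every name is a placeholder for whatever the middleware and SDK actually expose, not the real BSE interface.

```python
# Hypothetical per-block update loop for a head-tracked integration.
# Function and parameter names are placeholders, not the actual BSE API.
def on_audio_block(bse, head_tracker, sources, input_block):
    yaw, pitch, roll = head_tracker.orientation()        # from the HMD / head tracker
    bse.set_head_rotation(yaw, pitch, roll)              # rotates the whole scene
    for s in sources:
        bse.set_source_position(s.id, s.azimuth, s.elevation, s.distance)
    return bse.process(input_block)                      # binaural stereo out
```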


For AR applications, BSE's low latency and head tracking support enable audio augmentation of the real world. Virtual sounds blend with real environment sounds with consistent spatial behavior. An AR notification appears at a specific position in space and sounds like it's actually there. 


Automotive


Personal audio zones in vehicles present a unique opportunity for binaural technology. Headrest speakers or near-field transducers can deliver sound to individual passengers without disturbing others — if the spatial rendering convinces the ear-brain system that sound is coming from around the listener rather than from two points near their head.


BSE's speaker-based binaural mode addresses this application. The parametric models adapt to near-field speaker delivery, accounting for the different geometry compared to headphone presentation. Each passenger experiences personal spatial audio through speakers integrated into their seating position.


For automotive OEMs, BSE's embedded-friendly processing integrates into existing vehicle audio architectures. The low latency ensures compatibility with announcement systems, navigation prompts, and hands-free calling. The personalization parameters can tie into seat position memory, adjusting the spatial rendering when different drivers use the vehicle.


Aviation


Aircraft entertainment systems increasingly offer personal audio, but the listening experience remains compromised. Headphones provided by airlines are low-quality. Headrest speakers disturb neighboring passengers. Neither option delivers spatial audio that makes long-haul entertainment truly immersive.


BSE enables headrest-mounted speaker arrays to deliver personal binaural audio to each seat. The passenger experiences spatial sound without wearing headphones, without disturbing neighbors, and with quality that transforms the entertainment experience. The parametric processing runs on existing aircraft entertainment system hardware without modification.


For airline cabin designers, BSE represents a genuine differentiator in premium class experiences. First and business class passengers can enjoy immersive movie soundtracks, spatial music, and 3D audio content through speakers integrated into their suites. The technology exists today, ready for deployment.



OEM licensing


BSE is licensed as a one-time payment per brand. The licence gives you access to the BSE source code and DSP code, to use as you wish.


If you plan to sell parts of the BSE code in B2B solutions, you will need one licence per brand you sell to. For this type of use we offer a discount on multiple-licence purchases.


If you would like to evaluate BSE, contact us and we will send you a demo to test.





Contact us | Get an appointment