New publication in the IEEE/ACM Transactions on Audio, Speech, and Language Processing

The IT’S A DIVE project ends with a bang! The article “Estimation of spectral notches from pinna meshes: Insights from a simple computational model” (S. Spagnol, R. Miccini, M.G. Onofrei, R. Unnthorsson, S. Serafin) was recently published in the prestigious IEEE/ACM Transactions on Audio, Speech, and Language Processing. An early access version of the article is currently available at the following URL: https://doi.org/10.1109/TASLP.2021.3101928.

3D mesh of a left pinna and selected vertices (red) for a representative elevation angle.

While previous research on spatial sound perception investigated the physical mechanisms producing the most relevant elevation cues, how spectral notches are generated and related to the individual morphology of the human pinna is still a topic of debate. Correctly modeling these important elevation cues, and in particular the lowest frequency notches, is an essential step for individualizing Head-Related Transfer Functions (HRTFs). In this paper we propose a simple computational model able to predict the center frequencies of pinna notches from ear meshes. We apply such a model to a highly controlled HRTF dataset built with the specific purpose of understanding the contribution of the pinna to the HRTF. Results show that the computational model is able to approximate the lowest frequency notch with improved accuracy with respect to other state-of-the-art methods. By contrast, the model fails to predict higher-order pinna notches correctly. The proposed approximation supplements understanding of the morphology involved in generating spectral notches in experimental HRTFs.

Accepted paper @I3DA 2021 conference

Our paper “Evaluation of individualized HRTFs in a 3D shooter game” (J.S. Andersen, R. Miccini, S. Serafin, S. Spagnol) has been accepted for presentation at the 1st International Conference on Immersive and 3D Audio, which will be held fully virtually on September 8-10, 2021. We will be there to present the final results of IT’S A DIVE.

Two clusters of targets within the first-person shooter game.

The paper proposes a method of using in-game metrics to test the hypothesis that individualized HRTFs improve the experience of both expert and novice players in a First-Person Shooter (FPS) game on a desktop environment. The FPS game provides players with a localization task across three different audio renderings using the same acoustic spaces: stereo panning (control condition), generic binaural rendering, and individualized binaural rendering. Collected metrics from the game include localization error, spatial quality attributes, and an extensive questionnaire.

The individualized HRTFs for each participant were synthesized using the hybrid structural model described in our previous publication. Results show that the 22 participants performed significantly better in the localization task with their individualized HRTF. Increased localization accuracy with respect to the generic HRTF was recorded both in azimuth and elevation perception, and especially in the case of expert game players.

The Viking HRTF Dataset v2

It is our pleasure to announce that the Viking HRTF dataset v2 is now available in open access!

The Viking HRTF dataset v2 is a collection of head-related transfer functions (HRTFs) measured at the University of Iceland. It includes full-sphere HRTFs measured on a dense spatial grid (1513 positions) with a KEMAR mannequin with different pairs of artificial pinnae attached.

The 20 pairs of 35 Shore OO silicone pinna replicas

A first version of the dataset was released in May 2019. In this second version, the artificial pinnae were recast from the existing inverse molds with 35 Shore OO silicone for both the left and right channels of the KEMAR. Furthermore, the HRTF measurements were taken inside the anechoic chamber of the University of Iceland in Reykjavík and free-field compensated.

The HRTF measurement setup

The dataset, available in SOFA format, contains measurements for 20 different pairs of artificial pinna replicas (subjects A to T, where T is a pair of standard large KEMAR anthropometric pinnae replicas) plus a pair of flat baffles simulating a “pinna-less” condition (subject Z). 3D scans of the 20 left pinna replicas are also included.

More information is available on the official page of the dataset on Zenodo.

Accepted paper @SMC 2020 conference

Our paper “3D ear shape as an estimator of HRTF notch frequency” (M.G. Onofrei, R. Miccini, R. Unnthórsson, S. Serafin, S. Spagnol) has been accepted for presentation at the 17th Sound and Music Computing Conference, which will be held fully virtually on June 24-26, 2020. The conference has members of IT’S A DIVE among its organizers, and this year it will feature a dedicated oral session on spatial sound.

Measured and predicted N1 tracks for four test subjects.

The paper makes use of a new dataset of HRTFs (cf. previous post) containing high-resolution median-plane acoustical measurements of a KEMAR mannequin with 20 different left pinna models, as well as 3D scans of the same pinna models. This allows for an investigation of the relationship between 3D ear features and the first pinna notch N1 present in the HRTFs. We propose a method that takes the 3D pinna mesh and generates a dataset of depth maps of the pinna viewed from various median-plane elevation angles, each having an associated pinna notch frequency value as identified in the HRTF measurements. A multiple linear regression model is then fit to the depth maps, aiming to predict the corresponding N1. The results of the regression model show a moderate improvement over similar previous work built on global and elevation-dependent anthropometric pinna features extracted from 2D images.
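As a rough illustration of the regression step described above, here is a minimal sketch in Python with NumPy. It is not the paper's implementation: the depth maps and notch frequencies are synthetic stand-ins, the 4x4 map size is arbitrary, and the helper names `fit_linear_regression` and `predict` are hypothetical.

```python
import numpy as np

def fit_linear_regression(depth_maps, notch_hz):
    """Least-squares fit of notch frequency on flattened depth-map pixels."""
    X = depth_maps.reshape(len(depth_maps), -1)   # one row per depth map
    X = np.column_stack([np.ones(len(X)), X])     # prepend an intercept column
    coef, *_ = np.linalg.lstsq(X, notch_hz, rcond=None)
    return coef

def predict(coef, depth_maps):
    """Apply the fitted intercept and per-pixel weights to new depth maps."""
    X = depth_maps.reshape(len(depth_maps), -1)
    return coef[0] + X @ coef[1:]

# Synthetic stand-in data (NOT from the dataset): 200 depth maps of 4x4 pixels,
# with a notch frequency that is linear in the pixel values plus noise.
rng = np.random.default_rng(0)
maps = rng.random((200, 4, 4))
true_w = rng.normal(size=16)
n1 = 7000.0 + 500.0 * (maps.reshape(200, -1) @ true_w) + rng.normal(scale=20.0, size=200)

# Fit on the first 150 maps, evaluate on the remaining 50.
coef = fit_linear_regression(maps[:150], n1[:150])
mae = np.mean(np.abs(predict(coef, maps[150:]) - n1[150:]))
print(f"held-out mean absolute error: {mae:.1f} Hz")
```

With real depth maps the number of pixels can easily exceed the number of observations, so some form of downsampling or regularization would probably be needed; the sketch ignores that issue.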

New publication in IEEE Signal Processing Letters

The article “HRTF selection by anthropometric regression for improving horizontal localization accuracy” (S. Spagnol) was recently accepted for publication in IEEE Signal Processing Letters. An early access version of the article is currently available at the following URL: https://ieeexplore.ieee.org/document/9050904. It will be made open access in its final version.

Absolute lateral error for test set subjects with non-individual HRTFs: (1) best fitting HRTF according to the final regression model (top left), (2) best fitting HRTF according to an alternative regression model on the head dimensions only (top right), (3) HRTF selected by closest head width (bottom left), and (4) FABIAN dummy head HRTF (bottom right). Solid curves represent the approximate localization blur threshold.

The article focuses on objective HRTF selection from anthropometric measurements for minimizing localization error in the frontal half of the horizontal plane. Localization predictions for every pair of 90 subjects in the HUTUBS database are first computed through an interaural time difference-based auditory model, and an error metric based on the predicted lateral error is derived. A multiple stepwise linear regression model for predicting error from inter-subject anthropometric differences is then built on a subset of subjects and evaluated on a complementary test set. Results show that by using just three anthropometric parameters of the head and torso (head width, head depth, and shoulder circumference) the model is able to identify non-individual HRTFs whose predicted horizontal localization error generally lies below the localization blur. When using a lower number of anthropometric parameters, this result is not guaranteed.
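The selection idea can be sketched as follows, under loud assumptions: the anthropometric values and error figures below are synthetic (not HUTUBS data), the stepwise variable selection and the train/test split of the article are omitted, and the names `pair_features` and `select_hrtf` are hypothetical. The sketch only shows how a fitted linear model on inter-subject differences could rank candidate HRTFs.

```python
import numpy as np

rng = np.random.default_rng(2)
# Synthetic anthropometry in cm (NOT HUTUBS data):
# head width, head depth, shoulder circumference.
anthro = rng.normal(loc=[15.0, 19.0, 110.0], scale=[1.0, 1.0, 8.0], size=(40, 3))

def pair_features(a, b):
    """Absolute anthropometric differences between a listener and HRTF donors."""
    return np.abs(a - b)

# Synthetic "predicted lateral error" (degrees) for every ordered pair of subjects.
pairs = [(i, j) for i in range(40) for j in range(40) if i != j]
F = np.array([pair_features(anthro[i], anthro[j]) for i, j in pairs])
err = 2.0 + F @ np.array([1.5, 0.8, 0.1]) + rng.normal(scale=0.2, size=len(F))

# Ordinary least squares with an intercept.
A = np.column_stack([np.ones(len(F)), F])
coef, *_ = np.linalg.lstsq(A, err, rcond=None)

def select_hrtf(listener, donors):
    """Return the row index of the donor with the lowest predicted error."""
    feats = pair_features(listener, donors)
    return int(np.argmin(coef[0] + feats @ coef[1:]))

best = select_hrtf(anthro[0], anthro[1:])
print("best-fitting donor for subject 0:", best + 1)
```

Because only three body measurements are needed per listener, a selection of this kind could in principle run without any acoustic measurement of the new user.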

Accepted papers @IEEE conferences

The new year kicks off with two paper acceptances. “Auditory model based subsetting of head-related transfer function datasets” (S. Spagnol) will be presented at the 45th IEEE International Conference on Acoustics, Speech, and Signal Processing (IEEE ICASSP 2020) taking place in Barcelona, Spain in May. We will also be in Atlanta, US next month at the IEEE 5th VR Workshop on Sonic Interactions in Virtual Environments with the paper “HRTF individualization using deep learning” (R. Miccini, S. Spagnol), as part of the 27th IEEE Conference on Virtual Reality and 3D User Interfaces (IEEE VR 2020).

Quadrant error rate (QE) and root mean square local polar error (PE) matrices for the 373 catalogue subjects.

The former paper outlines a novel HRTF subset selection algorithm based on auditory-model vertical localization predictions and a greedy heuristic to identify an optimal subset of representative HRTFs from a catalogue including the three biggest open HRTF datasets currently available online (see figure above). An objective validation of the optimal subset on a fourth independent dataset, based again on auditory model predictions, is provided. The results show an overwhelming agreement on the choice of this optimal subset, supporting the idea that a large HRTF catalogue can be efficiently reduced by two orders of magnitude while preserving at least one HRTF fitting the very large majority of a pool of listeners in terms of localization error.
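To give a feel for what a greedy heuristic of this kind looks like, here is a small Python sketch. It is an assumption-laden illustration, not the algorithm from the paper: it treats the problem as a set-cover task, counts a listener as fitted when the predicted error falls below a hypothetical threshold, and uses a toy error matrix. The function name `greedy_subset` is also hypothetical.

```python
def greedy_subset(err, threshold, max_size):
    """Pick HRTFs one at a time, each covering the most still-unfitted listeners.

    err[i][j] is the predicted localization error of listener i with HRTF j.
    A listener counts as fitted when some chosen HRTF has error <= threshold.
    """
    n_listeners, n_hrtfs = len(err), len(err[0])
    unfitted = set(range(n_listeners))
    chosen = []
    while unfitted and len(chosen) < max_size:
        # Number of currently unfitted listeners each HRTF would cover.
        gains = [sum(1 for i in unfitted if err[i][j] <= threshold)
                 for j in range(n_hrtfs)]
        best = max(range(n_hrtfs), key=gains.__getitem__)
        if gains[best] == 0:
            break  # no remaining HRTF fits any remaining listener
        chosen.append(best)
        unfitted -= {i for i in unfitted if err[i][best] <= threshold}
    return chosen, unfitted

# Toy error matrix (made-up values): 5 listeners x 3 candidate HRTFs.
err = [
    [5, 30, 40],
    [8, 35, 12],
    [33, 9, 31],
    [7, 10, 50],
    [45, 41, 6],
]
chosen, unfitted = greedy_subset(err, threshold=15, max_size=2)
print(chosen, unfitted)  # prints: [0, 1] {4}
```

In the toy example, HRTF 0 fits three listeners and HRTF 1 then adds a fourth, so a subset of two covers four of the five listeners.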

True and reconstructed pinna depth maps using a convolutional variational autoencoder.

The research presented in the latter paper focuses on HRTF individualization using deep learning techniques. The rising availability of public HRTF data currently allows experimentation with different input data formats and various computational models. Accordingly, three research directions are investigated: (1) extraction of predictors from user data (see figure above); (2) unsupervised learning of HRTFs based on autoencoder networks; and (3) synthesis of HRTFs from anthropometric data using deep multilayer perceptrons. While none of the aforementioned investigations has shown outstanding results to date, the knowledge acquired throughout the development and troubleshooting phases highlights areas of improvement which are expected to pave the way to more accurate models for HRTF individualization.

New HRTF measurements

During the past month we have been working on the collection of a new set of HRTF measurements with the KEMAR mannequin. The measurements have been taken inside the recently built anechoic chamber at the University of Iceland, Reykjavík.

HRTF measurement setup inside the new University of Iceland anechoic chamber.

The new measurements are analogous to those included in the published Viking HRTF dataset. The main differences lie in the measurement environment (previously non-anechoic) and in the production of a sample that includes both left and right pinnae made out of soft (Shore 00-35) silicone. The new set of measurements will serve as an even stronger basis for our HRTF modelling stage.

The new set of pinna samples.

Accepted paper @Nordic SMC 2019 conference

Our paper “Estimation of pinna notch frequency from anthropometry: an improved linear model based on Principal Component Analysis and feature selection” (R. Miccini, S. Spagnol) has been accepted for presentation at the 1st Nordic Sound and Music Computing Conference that will take place in Stockholm, Sweden next month.

Box-and-whisker plot showing the mean absolute error (expressed in kHz) of baseline model M0 and the proposed model M1, for training and test sets respectively.

In the paper, anthropometric data from a database of HRTFs is used to estimate the frequency of the first pinna notch in the frontal part of the median plane. Given the presence of high correlations between some of the anthropometric features, as well as repeated values for the same subject observations, we propose the introduction of Principal Component Analysis (PCA) to project the features onto a space where they are more separated. We then construct a regression model employing forward step-wise feature selection to choose the principal components most capable of predicting notch frequencies. Our results show that by using a linear regression model with as few as three principal components (M1 in the above plot), we can predict notch frequencies with a cross-validation mean absolute error of just about 600 Hz.
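The pipeline above (PCA followed by forward feature selection and linear regression) can be sketched in a few lines of Python with NumPy. This is only an illustration under stated assumptions: the data are synthetic, the selection criterion is the training mean absolute error rather than the cross-validated error used in the paper, and the function names `pca_scores`, `fit_mae` and `forward_select` are hypothetical.

```python
import numpy as np

def pca_scores(X, n_components):
    """Project standardized features onto their leading principal components."""
    Z = (X - X.mean(axis=0)) / X.std(axis=0)
    # The right singular vectors of Z are the principal axes.
    _, _, vt = np.linalg.svd(Z, full_matrices=False)
    return Z @ vt[:n_components].T

def fit_mae(X, y):
    """Training mean absolute error of an ordinary least-squares fit."""
    A = np.column_stack([np.ones(len(X)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return np.mean(np.abs(A @ coef - y))

def forward_select(scores, y, k):
    """Greedily add the component that most reduces the error, k times."""
    chosen = []
    for _ in range(k):
        rest = [j for j in range(scores.shape[1]) if j not in chosen]
        best = min(rest, key=lambda j: fit_mae(scores[:, chosen + [j]], y))
        chosen.append(best)
    return chosen

# Synthetic, strongly correlated features (NOT real anthropometric data).
rng = np.random.default_rng(1)
base = rng.normal(size=(120, 4))
mixed = base @ rng.normal(size=(4, 4)) + 0.1 * rng.normal(size=(120, 4))
X = np.column_stack([base, mixed])

scores = pca_scores(X, 6)
# Synthetic notch frequency (kHz) driven by the 1st and 4th components only.
y = 8.0 + 1.5 * scores[:, 0] - 2.0 * scores[:, 3] + rng.normal(scale=0.05, size=120)

selected = forward_select(scores, y, 3)
print("selected components:", selected)
print(f"training MAE: {fit_mae(scores[:, selected], y):.3f} kHz")
```

Working on principal components rather than raw measurements means the candidate predictors are uncorrelated, which makes the contribution of each selected component easier to interpret.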

The Viking HRTF dataset

It is our pleasure to announce that the Viking HRTF dataset has been publicly released! It includes a collection of spatially dense head-related transfer functions (HRTFs) measured on a KEMAR mannequin with 20 different artificial left pinnae attached, one at a time. In the current release of the dataset you can find full HRIR and ITD data from the measurement sessions, as well as four sample 3D ear scans. More information is available on the dataset web page.