
Sound texture perception via statistics of the auditory periphery



Josh McDermott

Rainstorms, insect swarms, and galloping horses produce “sound
textures” – the collective result of many similar acoustic events.
Sound textures are distinguished by temporal homogeneity, suggesting
they could be recognized with time-averaged statistics. To test this
hypothesis, we processed real-world textures with an auditory model
containing filters tuned for sound frequencies and their modulations,
and measured statistics of the resulting decomposition. We then
assessed the realism and recognizability of novel sounds synthesized
to have matching statistics. Statistics of individual frequency
channels, capturing spectral power and sparsity, generally failed to
produce compelling synthetic textures. However, combining them with
correlations between channels produced identifiable and
natural-sounding textures. Synthesis quality declined if statistics
were computed from biologically implausible auditory models. The
results suggest that sound texture perception is mediated by
relatively simple statistics of early auditory representations,
presumably computed by downstream neural populations. The synthesis
methodology offers a powerful tool for their further investigation.
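The statistics named in the abstract (per-channel spectral power, sparsity, and correlations between channels) can be illustrated with a minimal sketch. The code below is a hypothetical simplification, not the actual model: it uses ordinary Butterworth bandpass filters in place of cochlear and modulation filters, a crude absolute-value envelope, and only three of the statistic classes the work measures.

```python
import numpy as np
from scipy.signal import butter, sosfilt

def subband_statistics(signal, fs, bands):
    """Time-averaged statistics of a bandpass decomposition.

    Returns, for each band: mean envelope power, a sparsity measure
    (normalized envelope variance, large for sparse/clicky textures),
    and the matrix of pairwise correlations between band envelopes.
    Illustrative only: the actual auditory model uses cochlear filters
    and compressive envelopes, not the simple filters used here.
    """
    envelopes = []
    for lo, hi in bands:
        # 4th-order Butterworth bandpass for this frequency channel
        sos = butter(4, [lo, hi], btype="bandpass", fs=fs, output="sos")
        sub = sosfilt(sos, signal)
        # crude amplitude envelope (stands in for rectification + lowpass)
        envelopes.append(np.abs(sub))
    env = np.array(envelopes)
    power = env.mean(axis=1)               # per-channel mean envelope
    sparsity = env.var(axis=1) / power**2  # normalized variance per channel
    corr = np.corrcoef(env)                # cross-channel envelope correlations
    return power, sparsity, corr

# Example: statistics of two seconds of white noise over four octave bands.
rng = np.random.default_rng(0)
fs = 16000
noise = rng.standard_normal(fs * 2)
bands = [(200, 400), (400, 800), (800, 1600), (1600, 3200)]
power, sparsity, corr = subband_statistics(noise, fs, bands)
```

Synthesis then works in the opposite direction: starting from noise, a sound is iteratively adjusted until its measured statistics match those of the target texture, which is what the listening tests described above evaluate.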

Speaker biography:

Josh McDermott is a perceptual scientist studying sound, hearing, and
music in the Center for Neural Science at New York University. His
research in hearing addresses sound representation and auditory scene
analysis using tools from experimental psychology, engineering, and
neuroscience. He is particularly interested in using the gap between
human and machine competence to both better understand biological
hearing and design better algorithms for analyzing sound. His
interests in music stem from the desire to understand why music is
pleasurable, why some things sound good while others do not, and why
we have music to begin with.
McDermott obtained a BA in Brain and Cognitive Science from Harvard,
an MPhil in Computational Neuroscience from the Gatsby Unit at
University College London, a PhD in Brain and Cognitive Science from
MIT, and postdoctoral training in psychoacoustics at the University of
Minnesota. He currently works in the Lab for Computational Vision at
NYU, using computational tools from image processing and computer
vision to explore auditory representation.


Attachment: mcdermottCasa11.pdf (12.05 MB)