
Probabilistic frameworks for Scene understanding




Jon Barker


The talk will consider (hearing-inspired) probabilistic frameworks for scene understanding. In particular, it will consider how individual sound sources can be understood in the presence of multiple competing sound sources. Bregman's Auditory Scene Analysis (ASA) account presents this demixing problem as a two-stage process in which innate primitive grouping 'rules' are balanced by learnt schema-driven processes. The way we perceive a scene is determined by the poorly understood balance between these processes. For example, our interpretation of speech in noise is a product of universally applied grouping forces, such as cross-frequency pitch grouping, and 'softer' expectations that depend on our learnt (and personal) knowledge of the patterns of speech and language. This talk will discuss the difficulty of integrating these contrasting organisational principles in a common probabilistic framework. As an example, the talk will feature an ASA-inspired approach to robust speech recognition, 'fragment decoding'. The talk will use the shortcomings of this approach to demonstrate what future frameworks need to be able to do better.

The talk will also defend the adoption of a human-inspired approach to scene understanding, i.e. why we might want to build planes with flapping wings even if they can't fly as fast.


Speaker Biography


Jon Barker is a Senior Lecturer in the Speech and Hearing Research group of the Computer Science department at the University of Sheffield. He has a long-standing interest in the perceptual organisation of complex acoustic scenes and in machine listening systems inspired by our understanding of Auditory Scene Analysis. He has a particular interest in the robust processing of speech in non-stationary noise environments. This interest faces in two directions: using insights gained from ASA to construct noise-robust automatic speech recognition systems, and using statistical modelling techniques, adopted from the speech recognition community, as a basis for understanding speech intelligibility.
Barker obtained a BA in Electrical and Information Science from the University of Cambridge and a PhD in Computer Science from the University of Sheffield under the supervision of Prof. Martin Cooke. He spent some time working on audio-visual speech perception at ICP in Grenoble before returning to Sheffield as a post-doctoral researcher. He has spent some twenty years researching in the areas of speech and hearing.



barkerCasa11.pdf (5.12 MB)