Project 9: Hybrid episodic-abstract computational modelling: ASR

Two fundamentals of traditional HMM ASR systems, namely generalization of phoneme-like units and a top-down search procedure, are in conflict with the current understanding of HSP, i.e. the preservation of FPD and the importance of bottom-up processing. Properties of the recognition of novel words and of L2 learning require early abstraction of (sub-)phoneme-like units, while other phenomena hint at detailed segmental matching. Project 9 seeks to further improve its recent combined episodic and abstractionist SR model. The focus for the first half of the project will be on weighting acoustic-phonetic information relative to more abstract knowledge. Later, the project may seek to incorporate systematic linguistic knowledge supplied by Project 10, using the bottom-up information in ways that humans are thought to use it.

Two recent projects at Leuven have tried to embed these concepts in new ASR architectures. Within the TEMPLATE project, the recognition units are of arbitrary length and the matching paradigm is exemplar-based. The FlaVoR project explores a new search paradigm for HMM-based ASR in which bottom-up (data-driven) and top-down (knowledge-driven) processes are combined. Recently, preliminary but impressive results have been obtained by combining the two systems. A bottom-up phonemic recognizer (abstractionist model) defines the search space. In the final recognition (ASR search process), scores from multiple knowledge sources (acoustic and linguistic) are combined. The acoustic contributors are the score from the phonemic recognizer and the score from an exemplar-based recognizer that incorporates fine phonetic detail and longer speech units into the recognition process.
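The score combination described above can be illustrated with a log-linear interpolation, a common way of merging knowledge sources when rescoring ASR hypotheses. The following is only a minimal sketch under stated assumptions, not the project's implementation: the `Hypothesis` fields, the weight names, and the example scores are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Hypothesis:
    words: str
    hmm_logp: float       # log score from the bottom-up phonemic (HMM) recognizer
    exemplar_logp: float  # log score from the exemplar-based recognizer
    lm_logp: float        # log score from the linguistic knowledge source


def combined_score(h, w_hmm=1.0, w_exemplar=1.0, w_lm=1.0):
    """Weighted sum of log scores (log-linear combination); weights are hypothetical."""
    return w_hmm * h.hmm_logp + w_exemplar * h.exemplar_logp + w_lm * h.lm_logp


def rescore(hypotheses, **weights):
    """Return the hypothesis with the highest combined score."""
    return max(hypotheses, key=lambda h: combined_score(h, **weights))


# Invented scores for two competing hypotheses in the search space.
hyps = [
    Hypothesis("recognize speech", hmm_logp=-120.0, exemplar_logp=-95.0, lm_logp=-8.0),
    Hypothesis("wreck a nice beach", hmm_logp=-118.0, exemplar_logp=-110.0, lm_logp=-14.0),
]
best = rescore(hyps, w_hmm=0.5, w_exemplar=0.5, w_lm=1.0)
print(best.words)
```

In this invented example the HMM score alone slightly prefers the second hypothesis, while the exemplar and linguistic scores tip the combined decision towards the first.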

Methods: The bottom-up phoneme recognizer can rely on gross properties (traditional HMM) or it can examine FPD (e.g. segmental recognition). The former is likely to generalize best, but the latter will work best when sufficiently similar examples are present in the database. The advantages and shortcomings of both approaches will be evaluated. The final recognition requires a weighting of the HMM and exemplar scores, and we will investigate which factors influence this weighting function. The L1/L2 situation forms a specific test case in which some or all of the components may be fully derived from L1, while only a limited number of the components are adapted to L2.
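One simple way to study the weighting of HMM and exemplar scores is to reduce it to a single interpolation weight and select that weight on held-out data. The sketch below assumes such a setup purely for illustration; the weight `lam`, the grid, the helper names, and the toy development set are hypothetical and do not come from the project.

```python
def pick(candidates, lam):
    """Return the label with the highest interpolated score.

    candidates: list of (label, hmm_logp, exemplar_logp) tuples.
    lam: weight on the exemplar score; (1 - lam) goes to the HMM score.
    """
    return max(candidates, key=lambda c: (1 - lam) * c[1] + lam * c[2])[0]


def error_rate(dev_set, lam):
    """Fraction of development items whose top-scoring label is wrong."""
    errors = sum(pick(cands, lam) != ref for ref, cands in dev_set)
    return errors / len(dev_set)


def tune_weight(dev_set, grid):
    """Grid search: return the first weight with the lowest error rate."""
    return min(grid, key=lambda lam: error_rate(dev_set, lam))


# Toy development set with invented scores: (reference, candidates).
dev_set = [
    # HMM score favours the reference "a"; exemplar score favours "b".
    ("a", [("a", -10.0, -22.0), ("b", -14.0, -20.0)]),
    # HMM score favours "d"; exemplar score favours the reference "c".
    ("c", [("c", -15.0, -5.0), ("d", -11.0, -25.0)]),
]
grid = [i / 10 for i in range(11)]
best_lam = tune_weight(dev_set, grid)
print(best_lam, error_rate(dev_set, best_lam))
```

In this toy data neither score alone recognizes both items correctly, whereas an intermediate weight does. A fuller study might make the weight depend on factors such as how close the nearest exemplars are, which is the kind of question the text raises.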

This work is highly innovative, especially the aim of comparing L1 and L2 recognition. In addition, most Fellows engaged in computational projects are likely to visit Leuven, so the ‘resident’ Leuven Fellow will play a significant mentoring and helping role throughout Theme IV, as well as for other projects, e.g. Project 5.

Young researchers: One ER (Demange) is based at Leuven. He is likely to spend substantial periods at Cambridge, learning acoustic phonetics, including elements of the linguistic structuring of FPD, with Sarah Hawkins, and HSP with Dennis Norris and Sarah Hawkins. He is also likely to visit Sheffield to enrich his experience of ASR approaches to modelling episodic representations, with Roger Moore.

Links: This project links to Projects 7, 10, and 11. It links engineering/computer science with phonetics.

Working on this project:
» Prof Dirk Van Compernolle
» Prof Roger Moore
» Dr Sébastien Demange
» Dr Kris Demuynck
» Dr Dino Seppi
