Determining an optimal set of flesh points on tongue, lips, and jaw for continuous silent speech recognition

Jun Wang, Seongjun Hahm, Ted Mau

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution

11 Scopus citations

Abstract

Articulatory data have attracted increasing interest in speech recognition, with or without acoustic data. Electromagnetic articulograph (EMA) is one of the affordable techniques currently used for tracking the movement of flesh points on articulators (e.g., the tongue) during speech. Because attaching sensors to the tongue and other intraoral articulators is inconvenient, particularly for patients with neurological diseases, determining an optimal set of sensors is important for the clinical application of EMA data. A recent study found an optimal set of four sensors on the tongue and lips (tongue tip, tongue body back, upper lip, and lower lip) for classifying isolated phonemes, words, or short phrases from articulatory movement data. This four-sensor set, however, has not been verified in continuous silent speech recognition. In this paper, we investigated the use of data from sensor combinations in continuous speech recognition to verify the finding, using the publicly available MOCHA-TIMIT data set. A long-standing speech recognition approach, the Gaussian mixture model (GMM)-hidden Markov model (HMM), and a more recent approach, the deep neural network (DNN)-HMM, were used as the recognizers. Experimental results confirmed that the four-sensor set is optimal among the full set of sensors on the tongue, lips, and jaw. Adding upper incisor and/or velum data further improved recognition performance slightly.
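
The search over sensor combinations described in the abstract can be pictured with a minimal Python sketch. It is not the authors' code: the sensor labels in `SENSORS` are hypothetical placeholders rather than MOCHA-TIMIT channel names, and `error_rate` stands for a caller-supplied function that would train and evaluate a recognizer on the chosen sensors and return its error.

```python
from itertools import combinations

# Hypothetical sensor labels, for illustration only (not the data set's channel names).
SENSORS = (
    "tongue_tip", "tongue_blade", "tongue_body_back",
    "upper_lip", "lower_lip", "jaw",
)


def sensor_subsets(sensors, size=None):
    """Yield every non-empty subset of sensors, or only subsets of `size` if given."""
    sizes = range(1, len(sensors) + 1) if size is None else (size,)
    for k in sizes:
        yield from combinations(sensors, k)


def best_subset(sensors, error_rate):
    """Return the subset with the lowest error.

    `error_rate` is an assumed callback: it takes a tuple of sensor labels
    and returns a number, e.g. the error of a recognizer trained on them.
    """
    return min(sensor_subsets(sensors), key=error_rate)
```

With six candidate sensors an exhaustive search covers 63 non-empty subsets, 15 of which contain exactly four sensors, so each call to `error_rate` can afford to be a full training and evaluation run.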

Original language: English (US)
Title of host publication: SLPAT 2015 - 6th Workshop on Speech and Language Processing for Assistive Technologies, Proceedings
Editors: Jan Alexandersson, Ercan Altinsoy, Heidi Christensen, Peter Ljunglof, Francois Portet, Frank Rudzicz
Publisher: Association for Computational Linguistics (ACL)
Pages: 79-85
Number of pages: 7
ISBN (Electronic): 9781941643792
State: Published - 2015
Event: 6th Workshop on Speech and Language Processing for Assistive Technologies, SLPAT 2015 - Dresden, Germany
Duration: Sep 11 2015 → …

Publication series

Name: SLPAT 2015 - 6th Workshop on Speech and Language Processing for Assistive Technologies, Proceedings

Conference

Conference: 6th Workshop on Speech and Language Processing for Assistive Technologies, SLPAT 2015
Country/Territory: Germany
City: Dresden
Period: 9/11/15 → …

Keywords

  • Articulation
  • Deep neural network
  • Dysarthria
  • Electromagnetic articulograph
  • Hidden Markov model
  • Silent speech recognition

ASJC Scopus subject areas

  • Language and Linguistics
  • Computer Science Applications
  • Signal Processing
  • Linguistics and Language
