Articulation-to-speech synthesis using articulatory flesh point sensors' orientation information

Beiming Cao, Myungjong Kim, Jun R. Wang, Jan Van Santen, Ted Mau, Jun Wang

Research output: Contribution to journal, Conference article, peer-review

19 Scopus citations

Abstract

Articulation-to-speech (ATS) synthesis generates an audio waveform directly from articulatory information. Existing work in ATS has used articulatory movement information (spatial coordinates) only. The orientation information of articulatory flesh points has rarely been used, although some devices (e.g., electromagnetic articulography) provide it. Previous work indicated that orientation information contains significant information for speech production. In this paper, we explored the performance of applying orientation information of flesh points on articulators (i.e., tongue, lips, and jaw) in ATS. Experiments using articulators' movement information with or without orientation information were conducted using standard deep neural networks (DNNs) and long short-term memory recurrent neural networks (LSTM-RNNs). Both objective and subjective evaluations indicated that adding orientation information of flesh points on articulators to the movement information generated higher-quality speech output than using movement information only.
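
The two input conditions compared in the abstract (movement only versus movement plus orientation) can be illustrated with a minimal sketch of per-frame feature construction. Everything specific here is an assumption made for illustration and does not come from the paper: the sensor names, the three position values and two orientation values per sensor, the dictionary layout, and the helper `build_features` are all hypothetical.

```python
# Hypothetical sketch: per-frame ATS input vectors with and without orientation.
# Sensor names, field layout, and all numbers are illustrative assumptions,
# not values taken from the paper.
from typing import Dict, List, Sequence

Frame = Dict[str, Dict[str, Sequence[float]]]


def build_features(frame: Frame, sensors: Sequence[str],
                   use_orientation: bool) -> List[float]:
    """Concatenate each sensor's position (and optionally its orientation)."""
    features: List[float] = []
    for name in sensors:
        reading = frame[name]
        features.extend(reading["position"])         # spatial coordinates
        if use_orientation:
            features.extend(reading["orientation"])  # orientation values
    return features


# Assumed sensor set: flesh points on the tongue and lips.
SENSORS = ["tongue_tip", "tongue_back", "upper_lip", "lower_lip"]

# One made-up frame: 3 position values and 2 orientation values per sensor.
frame: Frame = {
    name: {"position": (0.0, 1.0, 2.0), "orientation": (0.1, 0.2)}
    for name in SENSORS
}

movement_only = build_features(frame, SENSORS, use_orientation=False)
movement_plus_orientation = build_features(frame, SENSORS, use_orientation=True)
```

Under these assumptions the second vector is simply the first with the orientation values interleaved per sensor. Either vector would then serve as the input to the DNN or LSTM-RNN, which is not shown here.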

Original language: English (US)
Pages (from-to): 3152-3156
Number of pages: 5
Journal: Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH
Volume: 2018-September
State: Published - 2018
Event: 19th Annual Conference of the International Speech Communication Association, INTERSPEECH 2018 - Hyderabad, India
Duration: Sep 2 2018 to Sep 6 2018

Keywords

  • Articulation-to-speech synthesis
  • Deep neural network
  • Orientation information

ASJC Scopus subject areas

  • Language and Linguistics
  • Human-Computer Interaction
  • Signal Processing
  • Software
  • Modeling and Simulation
