An Iterative Deep Learning Framework for Unsupervised Discovery of Speech Features and Linguistic Units with Applications on Spoken Term Detection

2016-02-01
In this work we aim to discover high quality speech features and linguistic units directly from unlabeled speech data in a zero resource scenario. The results are evaluated using the metrics and corpora proposed in the Zero Resource Speech Challenge organized at Interspeech 2015. A Multi-layered Acoustic Tokenizer (MAT) is proposed for automatic discovery of multiple sets of acoustic tokens from the given corpus. Each acoustic token set is specified by a set of hyperparameters that describe the model configuration. These sets of acoustic tokens capture different characteristics of the given corpus and the language behind it, and thus can mutually reinforce one another. The multiple sets of token labels are then used as the targets of a Multi-target Deep Neural Network (MDNN) trained on low-level acoustic features. Bottleneck features extracted from the MDNN are then fed back as input to both the MAT and the MDNN itself in the next iteration. We call this iterative deep learning framework the Multi-layered Acoustic Tokenizing Deep Neural Network (MAT-DNN); it generates both high quality speech features for Track 1 of the Challenge and acoustic tokens for Track 2. In addition, we performed extra experiments on the same corpora on query-by-example spoken term detection. The experimental results show that the iterative MAT-DNN framework improves detection performance because of the better underlying speech features and acoustic tokens.
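The feedback loop described in the abstract can be sketched in a few lines of Python. This is only an illustrative toy under stated assumptions, not the authors' implementation: k-means clustering stands in for the acoustic tokenizer (the abstract does not specify the tokenizer model here), the only hyperparameter per token set is the number of tokens, the MDNN is reduced to a single tanh bottleneck layer with one softmax head per token set, and the bottleneck features are concatenated with the raw features before being fed back to the network. All function and parameter names (`kmeans_tokenize`, `train_mdnn`, `mat_dnn`, `token_set_sizes`) are hypothetical.

```python
import numpy as np


def kmeans_tokenize(feats, n_tokens, rng, n_iter=10):
    """Stand-in tokenizer (assumption): label each frame with its nearest of n_tokens centroids."""
    centers = feats[rng.choice(len(feats), n_tokens, replace=False)]
    for _ in range(n_iter):
        dists = ((feats[:, None, :] - centers[None, :, :]) ** 2).sum(-1)
        labels = dists.argmin(1)
        for k in range(n_tokens):
            if np.any(labels == k):
                centers[k] = feats[labels == k].mean(0)
    return labels


def softmax(z):
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)


def train_mdnn(feats, label_sets, n_bottleneck, rng, epochs=50, lr=0.1):
    """Toy multi-target network: one shared bottleneck layer, one softmax head per token set."""
    n, d = feats.shape
    w_in = rng.normal(0, 0.1, (d, n_bottleneck))
    heads = [rng.normal(0, 0.1, (n_bottleneck, int(labels.max()) + 1))
             for labels in label_sets]
    for _ in range(epochs):
        hidden = np.tanh(feats @ w_in)
        grad_hidden = np.zeros_like(hidden)
        for i, labels in enumerate(label_sets):
            probs = softmax(hidden @ heads[i])
            probs[np.arange(n), labels] -= 1.0  # gradient of cross-entropy w.r.t. logits
            grad_logits = probs / n
            grad_hidden += grad_logits @ heads[i].T  # sum the losses of all targets
            heads[i] -= lr * hidden.T @ grad_logits
        w_in -= lr * feats.T @ (grad_hidden * (1 - hidden ** 2))
    return np.tanh(feats @ w_in)  # bottleneck features


def mat_dnn(raw_feats, token_set_sizes=(4, 8), n_bottleneck=3, n_iterations=2, seed=0):
    """Alternate between tokenizing and multi-target training, feeding features back."""
    rng = np.random.default_rng(seed)
    tokenizer_input = raw_feats
    mdnn_input = raw_feats
    for _ in range(n_iterations):
        # One token set per hyperparameter configuration (here: just the token count).
        label_sets = [kmeans_tokenize(tokenizer_input, k, rng) for k in token_set_sizes]
        bottleneck = train_mdnn(mdnn_input, label_sets, n_bottleneck, rng)
        # Feedback: the tokenizer sees the bottleneck features in the next iteration;
        # the network sees raw features plus bottleneck features (an assumption).
        tokenizer_input = bottleneck
        mdnn_input = np.hstack([raw_feats, bottleneck])
    return bottleneck, label_sets


# Random vectors stand in for low-level acoustic frames (200 frames, 13 dimensions).
features, token_sets = mat_dnn(np.random.default_rng(1).normal(size=(200, 13)))
print(features.shape, [labels.shape for labels in token_sets])
```

The point of the sketch is the data flow: several label sets supervise one shared bottleneck, and that bottleneck becomes the input for the next round of tokenization.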

Related Articles

2016-11-04

Recurrent neural networks have been very successful at predicting sequences of words in tasks such a…

2017-03-18

Recent papers have shown that neural networks obtain state-of-the-art performance on several differe…

2016-06-22

Zero-resource speech technology is a growing research area that aims to develop methods for speech p…

2016-07-13

Environmental audio tagging aims to predict only the presence or absence of certain acoustic events …

2019-01-25

We consider the task of unsupervised extraction of meaningful latent representations of speech by ap…

2018-09-10
1809.03391 | cs.CL

Previous work in Indonesian part-of-speech (POS) tagging are hard to compare as they are not evaluat…

2018-06-29

For spoken dialog systems to conduct fluid conversational interactions with users, the systems must …

2015-08-20
1508.04999 | cs.LG

Feature learning and deep learning have drawn great attention in recent years as a way of transformi…

2018-06-19
1806.07506 | cs.SD

In the past, Acoustic Scene Classification systems have been based on hand crafting audio features t…

2018-03-23
1803.08863 | cs.CL

How can we effectively develop speech technology for languages where no transcribed data is availabl…

2016-06-10

Objective: Patient notes in electronic health records (EHRs) may contain critical information for me…

2018-03-27

This paper describes a new baseline system for automatic speech recognition (ASR) in the CHiME-4 cha…

2015-10-05

Recent studies have been revisiting whole words as the basic modelling unit in speech recognition an…

2019-04-26

Automatic measuring of speaker sincerity degree is a novel research problem in computational paralin…

2019-04-01

When the available data of a target speaker is insufficient to train a high quality speaker-dependen…

2019-04-23

In this paper we present GumDrop, Georgetown University's entry at the DISRPT 2019 Shared Task on au…