Lipreading with Long Short-Term Memory

2016-01-29
Lipreading, i.e. speech recognition from visual-only recordings of a speaker's face, can be achieved with a processing pipeline based solely on neural networks, yielding significantly better accuracy than conventional methods. Feed-forward and recurrent neural network layers (namely Long Short-Term Memory; LSTM) are stacked to form a single structure which is trained by back-propagating error gradients through all the layers. The performance of such a stacked network was experimentally evaluated and compared to a standard Support Vector Machine classifier using conventional computer vision features (Eigenlips and Histograms of Oriented Gradients). The evaluation was performed on data from 19 speakers of the publicly available GRID corpus. With 51 different words to classify, we report a best word accuracy on held-out evaluation speakers of 79.6% using the end-to-end neural network-based solution (11.6% improvement over the best feature-based solution evaluated).
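The stacked structure described in the abstract can be illustrated with a minimal forward-pass sketch in NumPy. This is not the authors' implementation: the 40x40 frame size, the layer widths, the tanh non-linearity, and classifying from the final LSTM hidden state are all assumptions made for illustration, and the names `init_params` and `forward` are hypothetical. Only the 51 output classes come from the abstract. Training by back-propagating error gradients through all layers is not shown.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def init_params(rng, n_in, n_ff, n_hid, n_classes):
    """Random weights for one feed-forward layer, one LSTM layer, and a softmax output."""
    s = 0.1  # assumed initialisation scale
    return {
        "W_ff": rng.normal(0, s, (n_ff, n_in)), "b_ff": np.zeros(n_ff),
        # LSTM weights for the four gates (input, forget, candidate, output), stacked row-wise.
        "W_x": rng.normal(0, s, (4 * n_hid, n_ff)),
        "W_h": rng.normal(0, s, (4 * n_hid, n_hid)),
        "b": np.zeros(4 * n_hid),
        "W_out": rng.normal(0, s, (n_classes, n_hid)), "b_out": np.zeros(n_classes),
    }

def forward(params, frames):
    """Map a (T, n_in) sequence of flattened frames to word-class probabilities."""
    n_hid = params["W_h"].shape[1]
    h = np.zeros(n_hid)  # LSTM hidden state
    c = np.zeros(n_hid)  # LSTM cell state
    for x in frames:
        # Feed-forward layer applied to each frame independently.
        z = np.tanh(params["W_ff"] @ x + params["b_ff"])
        # LSTM step: compute all four gate pre-activations, then split them.
        gates = params["W_x"] @ z + params["W_h"] @ h + params["b"]
        i, f, g, o = np.split(gates, 4)
        c = sigmoid(f) * c + sigmoid(i) * np.tanh(g)
        h = sigmoid(o) * np.tanh(c)
    # Softmax over word classes, computed from the last hidden state (an assumption).
    logits = params["W_out"] @ h + params["b_out"]
    e = np.exp(logits - logits.max())
    return e / e.sum()

rng = np.random.default_rng(0)
params = init_params(rng, n_in=40 * 40, n_ff=128, n_hid=64, n_classes=51)
frames = rng.normal(size=(6, 40 * 40))  # random placeholder for a 6-frame clip
probs = forward(params, frames)
print(probs.shape, int(probs.argmax()))
```

Because every operation in the stack is differentiable, a single loss on `probs` could in principle send gradients back through the output layer, the LSTM, and the feed-forward layer together, which is the sense in which the abstract calls the network a single structure.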

Related Articles

2015-11-04
1511.01432 | cs.LG

We present two approaches that use unlabeled data to improve sequence learning with recurrent networ…

2015-08-09
1508.01991 | cs.CL

In this paper, we propose a variety of Long Short-Term Memory (LSTM) based models for sequence taggi…

2016-11-05

Lipreading is the task of decoding text from the movement of a speaker's mouth. Traditional approach…

2015-11-19

One of the challenges in modeling cognitive events from electroencephalogram (EEG) data is finding r…

2015-11-27

Neural network models have been demonstrated to be capable of achieving remarkable performance in se…

2014-11-17

Models based on deep convolutional networks have dominated recent image interpretation tasks; we inv…

2014-12-17

We present a state-of-the-art speech recognition system developed using end-to-end deep learning. Ou…

2016-09-14
1609.04301 | cs.SD

TristouNet is a neural network architecture based on Long Short-Term Memory recurrent networks, mean…

2015-10-21

Bidirectional Long Short-Term Memory Recurrent Neural Network (BLSTM-RNN) has been shown to be very …

2017-03-12

We propose an end-to-end deep learning architecture for word-level visual speech recognition. The sy…

2016-02-18
1602.05875 | stat.ML

Traditional convolutional layers extract features from patches of data by applying a non-linearity o…

2016-04-04

In this paper we present an approach to polyphonic sound event detection in real life recordings bas…

2015-12-09
1512.02949 | cs.CV

In this paper, we describe the system for generating textual descriptions of short video clips using…

2017-07-06

The Recurrent Neural Networks and their variants have shown promising performances in sequence model…

2018-07-12

Recently, recurrent neural networks have become state-of-the-art in acoustic modeling for automatic …

2017-03-23
1703.08089 | cs.CV

The traditional bag-of-words approach has found a wide range of applications in computer vision. The…