Hands-on ASR with Kaldi

If you are not satisfied with the format of the book you bought, please email us for a free full-color PDF version.

With growing demand from in-car systems, health care, the military, telephony, and everyday life, the job market for Automatic Speech Recognition (ASR) is booming right now.

As the leading open-source toolkit in the ASR field, Kaldi may be the best starting point.

We can learn the core concepts and technologies by building and running a Kaldi model and by applying it in the real world. We don’t yet know how far this trend will go, but if you are a software developer, now might be the time to capitalize on rising job opportunities as major apps work to integrate Kaldi.

ASR, Natural Language Processing (NLP), and Machine Learning are hot topics. From installation to the final results, we go through the whole life cycle of Kaldi development using the TIMIT corpus. You will build a real ASR model and can apply it in your own working environment. Every step of building the Kaldi/TIMIT model is recorded with screenshots, code, and output, so you will not get lost.

Chapter 1 discusses the background of ASR. In my humble opinion, knowing the history and context of a new topic is the best way to understand it. Always ask: What is it? Where did it come from? Where is it going? Once these core questions have been answered, we get to install Kaldi.

Chapter 2, Installation, explains the Kaldi environment and the installation process, whether you are on a Mac, PC, Linux, Windows, or any other platform. We test the Kaldi installation with small recipes such as yes/no. Another recipe, 10-digit speech recognition, is also good for testing purposes but is not included in this book.

Chapter 3 downloads and sets up TIMIT in Kaldi with specific environment parameters.

Chapter 4 prepares the data for TIMIT. Along the way, we learn about FSTs, the pronunciation dictionary, and other relevant concepts.

Chapter 5 extracts features. MFCC (Mel-frequency cepstral coefficients) and CMVN (cepstral mean and variance normalization) are discussed in detail.

Chapter 6 runs the monophone model for TIMIT and explains the fundamental ASR concepts.

Chapters 7 to 9 run the triphone models tri1, tri2, and tri3.

Chapter 10 runs the SGMM2 model.

Chapter 11 runs MMI + SGMM2.

Chapter 12 runs Dan’s DNN.

Chapter 13 covers all the stages of Karel’s DNN: storing features, pre-training, frame-level cross-entropy training, sequence-discriminative training, and sMBR iterations.

Chapter 14 presents the final results of the whole TIMIT recipe, which can be used as a template for comparison. By the end of the book, you will be able to run Kaldi ASR neural network models.

This book gives you a starting point to pursue higher goals in the world of Artificial Intelligence.

