Sunday, April 12, 2015

Day 1: Getting familiar with the framework

[1] A little history:
Kaldi began its existence in the 2009 Johns Hopkins University workshop, cumbersomely titled "Low Development Cost, High Quality Speech Recognition for New Languages and Domains".

[2] Makefile
You can edit settings for different compilation options (debugging, speed, performance, precision, etc.) in ~/kaldi-trunk/src/kaldi.mk. (link)
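For instance, a debug build usually means adding symbols and dropping optimization, and precision is controlled by a single flag. A hedged sketch of the kind of edits involved (the exact variable names in your checkout of kaldi.mk may differ, so check the file itself):

```makefile
# Sketch of possible kaldi.mk edits (verify against your own kaldi.mk):

# Use double precision for BaseFloat instead of single precision.
DOUBLE_PRECISION = 1

# Debug build: keep symbols, disable optimization.
CXXFLAGS += -g -O0
```

After changing these options you generally need to recompile the affected directories (`make clean; make`) for them to take effect.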

[3] Matrix library
The matrix library is heavily used in Kaldi; it is mostly a C++ wrapper for standard BLAS and LAPACK linear algebra routines. (check this out)
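To make the "C++ wrapper" idea concrete, here is a minimal sketch (this is not Kaldi's actual API): a small matrix class that owns its storage and exposes a multiply method. In a real wrapper the multiply would forward to `cblas_dgemm`; here a plain triple loop stands in so the example is self-contained.

```cpp
#include <cassert>
#include <vector>

// Hypothetical sketch of a BLAS-wrapper-style matrix class (not Kaldi's
// real Matrix). Row-major storage in a flat std::vector.
class Matrix {
 public:
  Matrix(int rows, int cols)
      : rows_(rows), cols_(cols), data_(rows * cols, 0.0) {}

  double &operator()(int r, int c) { return data_[r * cols_ + c]; }
  double operator()(int r, int c) const { return data_[r * cols_ + c]; }
  int NumRows() const { return rows_; }
  int NumCols() const { return cols_; }

  // C = A * B. A real wrapper would call cblas_dgemm here; the triple
  // loop below computes the same result without an external library.
  static Matrix Multiply(const Matrix &a, const Matrix &b) {
    Matrix c(a.rows_, b.cols_);
    for (int i = 0; i < a.rows_; ++i)
      for (int k = 0; k < a.cols_; ++k)
        for (int j = 0; j < b.cols_; ++j)
          c(i, j) += a(i, k) * b(k, j);
    return c;
  }

 private:
  int rows_, cols_;
  std::vector<double> data_;
};
```

The point of the wrapper layer is that user code sees a clean C++ interface while the heavy lifting is delegated to whatever optimized BLAS/LAPACK implementation the build links against.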

[4] GPU
Kaldi includes a CUDA matrix library. In ~/kaldi-trunk/src/cudamatrix, you can search for "#if HAVE_CUDA==1" in the .cc files. (for more info)
In their implementation, they only run specific tasks on the GPU, mainly neural-net training.
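The guard pattern those .cc files use looks roughly like the sketch below (the function here is hypothetical, not taken from Kaldi): the GPU path is only compiled in when the build system defines HAVE_CUDA as 1, and a CPU fallback handles the rest.

```cpp
#include <vector>

#ifndef HAVE_CUDA
#define HAVE_CUDA 0  // the build system defines this as 1 when CUDA is available
#endif

// Hypothetical example of the compile-time guard pattern used in
// cudamatrix: GPU code is only compiled when HAVE_CUDA == 1.
double SumVector(const std::vector<double> &v) {
#if HAVE_CUDA == 1
  // In Kaldi, the data would be copied into a CuVector here and the
  // reduction would run on the device.
#endif
  // CPU fallback (and the only path in a non-CUDA build).
  double sum = 0.0;
  for (double x : v) sum += x;
  return sum;
}
```

Because the check happens at preprocessing time, a Kaldi binary built without CUDA carries no GPU code at all rather than branching at runtime.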

Kaldi is intended to run with the GPU in "exclusive mode"; whether it is exclusive-process or exclusive-thread does not matter. You can find out what mode your GPU is running in as follows:

# nvidia-smi --query | grep 'Compute Mode'
Compute Mode : Exclusive_Thread

You can set the correct mode by typing nvidia-smi -c 1. You might want to do this in a startup script so it happens each time you reboot.

    -c,   --compute-mode=       Set MODE for compute applications:
                                0/DEFAULT, 1/EXCLUSIVE_THREAD,
                                2/PROHIBITED, 3/EXCLUSIVE_PROCESS

[5] online decoding (link)
By "online decoding" we mean decoding where the features are coming in in real time, and you don't want to wait until all the audio is captured before starting the online decoding. (We're not using the phrase "real-time decoding" because "real-time decoding" can also be used to mean decoding whose speed is not slower than real time, even if it is applied in batch mode).
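The distinction can be sketched as a loop that consumes audio chunk by chunk as it arrives, rather than one call on the full recording. The class and method names below are hypothetical stand-ins, not Kaldi's real online2 classes:

```cpp
#include <vector>

// Hypothetical stand-in for an online decoder (not Kaldi's real API):
// it accepts audio in chunks as they arrive and decodes the complete
// frames each chunk provides, instead of waiting for all the audio.
struct StreamingDecoder {
  int frames_decoded = 0;

  void AcceptChunk(const std::vector<float> &chunk, int frame_size) {
    // A real decoder would compute features for these samples and
    // advance its search lattice here; we just count whole frames.
    frames_decoded += static_cast<int>(chunk.size()) / frame_size;
  }
};
```

In batch (offline) decoding the equivalent call would receive the entire utterance at once; the online setup trades some convenience for the ability to emit partial results while audio is still being captured.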

The approach that the Kaldi team took was to focus for the first few years on offline recognition, in order to reach state-of-the-art performance as quickly as possible. Now they are making more of an effort to support online decoding.

There are two online-decoding setups: the "old" online-decoding setup, in the subdirectories online/ and onlinebin/, and the "new" setup, in online2/ and online2bin/. The "old" online-decoding setup is now deprecated and may eventually be removed from the trunk (but will remain in ^/branches/complete).

[6] Keyword Search
The paper focuses on word-level keyword search for simplicity, but the implementation naturally supports both word-level and subword-level keyword search – both the LVCSR module and the KWS module are implemented using weighted finite-state transducers (WFSTs), and the algorithm works as long as the symbol table properly maps words/subwords to integers.

There is a tutorial on YouTube that covers the basic structure of FSTs. (you can start watching at 10:30)


Mapping between strings and integers saves RAM: each word is stored once, and everywhere else only its integer id is passed around.
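A minimal sketch of such a symbol table (not OpenFst's actual SymbolTable class, just the underlying idea): a hash map for string-to-id lookup plus a vector for id-to-string lookup.

```cpp
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical minimal symbol table (not OpenFst's real SymbolTable):
// maps each distinct string to a small integer id and back.
class SymbolTable {
 public:
  // Returns the existing id for s, or assigns the next free id.
  int AddSymbol(const std::string &s) {
    auto it = id_of_.find(s);
    if (it != id_of_.end()) return it->second;
    int id = static_cast<int>(symbols_.size());
    id_of_[s] = id;
    symbols_.push_back(s);
    return id;
  }

  // Recovers the string for an id (id must have been assigned).
  const std::string &Find(int id) const { return symbols_[id]; }

 private:
  std::unordered_map<std::string, int> id_of_;  // string -> id
  std::vector<std::string> symbols_;            // id -> string
};
```

With this in place, FST arcs and lattice entries only carry integers, and the table is consulted once at the edges (reading input, printing output).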


