HTK Tutorial

Giampiero Salvi

KTH (Royal Institute of Technology), Dep. of Speech, Music and Hearing,
Drottning Kristinas v. 31, SE-100 44, Stockholm, Sweden

giampi@kth.se

Nov. 2003

© 2008 Giampiero Salvi

Outline

• Introduction
• Data formats and manipulation
• Data visualization
• Training
• Recognition

HTK: What is it?

• A toolkit for Hidden Markov Modeling
• General purpose, but optimized for Speech Recognition
• Very flexible and complete (active development)
• Very good documentation (HTKBook)


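ASR Overview

[Figure: block diagrams of the two phases.
Training: audio → feature extraction → features; features, labels, dictionary and prototype HMM → training → model set.
Recognition: audio → feature extraction → features; features, model set, dictionary and grammar → decoding → labels.]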

Things that you should have before you start

• familiarity with a Unix-like shell
    • cd, ls, pwd, mkdir, cp, foreach
• text processing tools
    • perl, perl, perl, perl, perl
    • grep, gawk, tr, sed, find, cat, wc
• lots of patience
• the fabulous HTK Book
• a look at the RefRec scripts


The HTK tools

• data manipulation tools: HCopy, HQuant, HLEd, HHEd, HDMan, HBuild
• data visualization tools: HSLab, HList, HSGen
• training tools: HCompV, HInit, HRest, HERest, HEAdapt, HSmooth
• recognition tools: HLStats, HParse, HVite, HResults


The HTK data formats

data       formats
audio      many common formats plus HTK binary
features   HTK binary
labels     HTK (single or Master Label files), text
models     HTK (single or Master Macro files), text or binary
other      HTK text

[Figure: the ASR overview block diagram (training and recognition), repeated.]

Usage example (HList)

> HList

USAGE: HList [options] file

 Option                                    Default

 -d      Coerce observation to VQ symbols  off
 -e N    End at sample N                   0
 -h      Print source header info          off
 -i N    Set items per line to N           10
 -n N    Set num streams to N              1
 -o      Print observation structure       off
 -p      Playback audio                    off
 -r      Write raw output                  off
 -s N    Start at sample N                 0
 -t      Print target header info          off
 -z      Suppress printing data            on
 -A      Print command line arguments      off
 -C cf   Set config file to cf             default
 -D      Display configuration variables   off

Command line switches and options

> HList -e 1 -o -h feature_file

Source: feature_file
  Sample Bytes:  26       Sample Kind:   MFCC_0
  Num Comps:     13       Sample Period: 10000.0 us
  Num Samples:   336      File Format:   HTK
-------------------- Observation Structure ---------------------
x:      MFCC-1  MFCC-2  MFCC-3  MFCC-4  MFCC-5  MFCC-6  MFCC-7
        MFCC-8  MFCC-9 MFCC-10 MFCC-11 MFCC-12      C0
------------------------ Samples: 0->1 -------------------------
0:     -14.314  -3.318  -6.263  -7.245   7.192   4.997   0.830
         3.293   5.428   6.831   5.819   5.606  40.734
1:     -13.591  -4.756  -6.037  -3.362   3.541   3.510   2.867
         0.812   0.630   5.285   1.054   8.375  40.778
----------------------------- END ------------------------------
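The header fields that HList prints can also be read directly from an HTK binary file. The sketch below assumes the commonly documented 12-byte big-endian header (number of samples, sample period in 100 ns units, bytes per sample, parameter kind code) and a partial table of kind codes and qualifier bits recalled from the HTK Book; the function name `parse_htk_header` is hypothetical, and the tables should be checked against the HTK Book before being relied on.

```python
import struct

# Base parameter kinds (low 6 bits of the kind code); assumed from the HTK Book.
BASE_KINDS = ["WAVEFORM", "LPC", "LPREFC", "LPCEPSTRA", "LPDELCEP", "IREFC",
              "MFCC", "FBANK", "MELSPEC", "USER", "DISCRETE", "PLP"]

# Qualifier bits (octal) and their suffix letters; a partial, assumed table.
QUALIFIERS = [(0o000100, "E"), (0o000200, "N"), (0o000400, "D"),
              (0o001000, "A"), (0o002000, "C"), (0o004000, "Z"),
              (0o010000, "K"), (0o020000, "0")]


def parse_htk_header(header_bytes):
    """Decode the first 12 bytes of an HTK parameter file."""
    n_samples, samp_period, samp_size, parm_kind = struct.unpack(
        ">iihH", header_bytes[:12])
    kind = BASE_KINDS[parm_kind & 0o77]
    for bit, letter in QUALIFIERS:
        if parm_kind & bit:
            kind += "_" + letter
    return {
        "n_samples": n_samples,
        "frame_period_ms": samp_period / 10000.0,  # 100 ns units -> ms
        "sample_bytes": samp_size,
        "kind": kind,
    }


if __name__ == "__main__":
    # Synthetic header mirroring the values printed by HList above.
    demo = struct.pack(">iihH", 336, 100000, 26, 6 | 0o020000)
    print(parse_htk_header(demo))
```

In a real file the header is followed by the sample data, so passing `open(path, "rb").read(12)` would be the typical usage.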

Configuration file

> cat config_file
SOURCEKIND = MFCC_0
TARGETKIND = MFCC_0_D_A

> HList -C config_file -e 0 -o -h feature_file

Source: feature_file
  Sample Bytes:  26       Sample Kind:   MFCC_0
  Num Comps:     13       Sample Period: 10000.0 us
  Num Samples:   336      File Format:   HTK
-------------------- Observation Structure ---------------------
x:      MFCC-1  MFCC-2  MFCC-3  MFCC-4  MFCC-5  MFCC-6  MFCC-7
        MFCC-8  MFCC-9 MFCC-10 MFCC-11 MFCC-12      C0   Del-1
         Del-2   Del-3   Del-4   Del-5   Del-6   Del-7   Del-8
         Del-9  Del-10  Del-11  Del-12   DelC0   Acc-1   Acc-2
         Acc-3   Acc-4   Acc-5   Acc-6   Acc-7   Acc-8   Acc-9
        Acc-10  Acc-11  Acc-12   AccC0
------------------------ Samples: 0->1 -------------------------
0:     -14.314  -3.318  -6.263  -7.245   7.192   4.997   0.830
         3.293   5.428   6.831   5.819   5.606  40.734  -0.107
        -0.180   0.731   1.134  -0.723  -0.676   1.083  -0.552
        -0.387  -0.592  -2.172  -0.030  -0.170   0.236   0.170
        -0.241  -0.226  -0.517  -0.244  -0.053   0.213  -0.029
         0.097   0.225  -0.294   0.051
----------------------------- END ------------------------------
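The jump from 13 to 39 components comes from appending delta (_D) and acceleration (_A) coefficients to the static features. The sketch below illustrates one way to compute them, assuming the regression formula usually quoted for HTK, d_t = Σ θ (c_{t+θ} − c_{t−θ}) / (2 Σ θ²), a window of 2, and edge handling by repeating the first and last frame. These choices, and the function names, are assumptions for illustration rather than a copy of what HTK does internally.

```python
def regression_deltas(frames, window=2):
    """Delta coefficients by linear regression over +/- `window` frames.

    `frames` is a list of equal-length lists of floats. Frames beyond the
    ends are replaced by the first/last frame (assumed edge handling).
    """
    n = len(frames)
    denom = 2 * sum(theta * theta for theta in range(1, window + 1))
    deltas = []
    for t in range(n):
        row = []
        for k in range(len(frames[t])):
            acc = 0.0
            for theta in range(1, window + 1):
                ahead = frames[min(t + theta, n - 1)][k]
                behind = frames[max(t - theta, 0)][k]
                acc += theta * (ahead - behind)
            row.append(acc / denom)
        deltas.append(row)
    return deltas


def add_dynamic_features(static):
    """Return frames laid out as [static, deltas, accelerations]."""
    delta = regression_deltas(static)
    accel = regression_deltas(delta)
    return [s + d + a for s, d, a in zip(static, delta, accel)]


if __name__ == "__main__":
    # Six synthetic 13-dimensional frames forming a linear ramp.
    static = [[float(t)] * 13 for t in range(6)]
    print(len(add_dynamic_features(static)[0]))
```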

File manipulation tools

• HCopy: converts from/to various data formats (audio, features)
• HQuant: quantizes speech (audio)
• HLEd: edits label and master label files
• HDMan: edits dictionary files
• HHEd: edits model and master macro files
• HBuild: converts language models in different formats (more in recognition section)

Computing feature files (HCopy)

> cat config_file
# Feature configuration
TARGETKIND = MFCC_0
TARGETRATE = 100000.0
SAVECOMPRESSED = T
SAVEWITHCRC = T
WINDOWSIZE = 250000.0
USEHAMMING = T
PREEMCOEF = 0.97
NUMCHANS = 26
CEPLIFTER = 22
NUMCEPS = 12
ENORMALISE = F
# input file format (headerless 8 kHz 16 bit linear PCM)
SOURCEKIND = WAVEFORM
SOURCEFORMAT = NOHEAD
SOURCERATE = 1250

> HCopy -C config_file audio_file1 param_file1 audio_file2 ...

> HCopy -C config_file -S file_list
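All the time values in the configuration above are expressed in units of 100 ns, which is easy to get wrong by a factor of ten. The helper functions below (hypothetical names, not part of HTK) show the arithmetic behind SOURCERATE = 1250 for 8 kHz audio, a 25 ms window and a 10 ms frame shift. The frame count formula is the usual sliding-window estimate and is only an approximation of what HCopy produces.

```python
HTK_UNITS_PER_SECOND = 10_000_000  # one HTK time unit is 100 ns


def seconds_to_htk(seconds):
    """Convert a duration in seconds to HTK 100 ns units."""
    return round(seconds * HTK_UNITS_PER_SECOND)


def source_rate_for(sample_rate_hz):
    """Sample period in 100 ns units for a given sampling frequency."""
    return HTK_UNITS_PER_SECOND / sample_rate_hz


def approx_num_frames(n_samples, sample_rate_hz, window_htk, shift_htk):
    """Sliding-window frame count (an approximation, not HCopy's exact rule)."""
    window = window_htk * sample_rate_hz // HTK_UNITS_PER_SECOND
    shift = shift_htk * sample_rate_hz // HTK_UNITS_PER_SECOND
    if n_samples < window:
        return 0
    return 1 + (n_samples - window) // shift


if __name__ == "__main__":
    print(source_rate_for(8000))      # SOURCERATE for 8 kHz audio
    print(seconds_to_htk(0.025))      # WINDOWSIZE for a 25 ms window
    print(seconds_to_htk(0.010))      # TARGETRATE for a 10 ms shift
    print(approx_num_frames(8000, 8000, 250000, 100000))
```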

Label files

#!MLF!#
"filename1"
[start1 [end1]] label1 [score] {auxlabel [auxscore]} [comment]
[start2 [end2]] label2 [score] {auxlabel [auxscore]} [comment]
...
[startN [endN]] labelN [score] {auxlabel [auxscore]} [comment]
.
"filename2"
...

• [.] = optional (0 or 1)
• {.} = possible repetition (0, 1, 2, ...)
• time stamps are in 100 ns units (!): 10 ms = 100000

Label file example 1

> cat aligned.mlf
#!MLF!#
"a10001a1.rec"
0 6400000 sil <sil>
6400000 8600000 f forra
8600000 10400000 oe
10400000 11700000 r
11700000 14100000 a
14100000 14100000 sp
14100000 29800001 sil <sil>
.
"a10001i1.rec"
0 2600000 sil <sil>
2600000 4900000 S sju
4900000 8300000 uh
8300000 8600000 a
8600000 8600000 sp
8600000 21600000 sil <sil>
.
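An aligned MLF like the one above is straightforward to read from a script. The following sketch is a minimal parser, assuming every label line carries both a start and an end time, that file names are quoted, and that each block ends with a line containing a single period; `parse_mlf` is a hypothetical helper, not an HTK tool. Times are converted from 100 ns units to seconds.

```python
def parse_mlf(text):
    """Parse a time-aligned MLF into {filename: [(start_s, end_s, label, aux)]}."""
    entries = {}
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line == "#!MLF!#":
            continue
        if line.startswith('"'):
            current = line.strip('"')     # start of a new label block
            entries[current] = []
        elif line == ".":
            current = None                # end of the current block
        elif current is not None:
            fields = line.split()
            start = int(fields[0]) / 1e7  # 100 ns units -> seconds
            end = int(fields[1]) / 1e7
            entries[current].append((start, end, fields[2], fields[3:]))
    return entries


SAMPLE = """#!MLF!#
"a10001a1.rec"
0 6400000 sil <sil>
6400000 8600000 f forra
8600000 10400000 oe
.
"""

if __name__ == "__main__":
    for segment in parse_mlf(SAMPLE)["a10001a1.rec"]:
        print(segment)
```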

Label file example 2 (HLEd)

> HLEd -l '*' -d lex.dic -i phones.mlf words2phones.led words.mlf

> cat words.mlf
#!MLF!#
"a10001a1.rec"
forra
.
"a10001i1.rec"
sju
.

> cat words2phones.led
EX
IS sil sil

> cat phones.mlf
#!MLF!#
"a10001a1.rec"
sil
f
oe
r
a
sp
sil
.
"a10001i1.rec"
sil
S
uh
a
sp
sil
.
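The effect of the two edit commands in this example can be mimicked in a few lines: EX replaces each word by its phone sequence from the dictionary, and IS sil sil adds a sil label at the start and end of each utterance. The sketch below is an illustration only; it assumes the first pronunciation is used when a word has several, and the names `LEXICON` and `expand_words` are hypothetical.

```python
# Dictionary contents taken from the lex.dic example; each word maps to a
# list of pronunciations, of which only the first is used here (assumption).
LEXICON = {
    "forra": [["f", "oe", "r", "a", "sp"]],
    "sju": [["S", "uh", "a", "sp"]],
}


def expand_words(words, lexicon, boundary="sil"):
    """Expand a word transcription to phones and add boundary silences."""
    phones = [boundary]
    for word in words:
        phones.extend(lexicon[word][0])
    phones.append(boundary)
    return phones


if __name__ == "__main__":
    print(" ".join(expand_words(["forra"], LEXICON)))
    print(" ".join(expand_words(["sju"], LEXICON)))
```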

Dictionary (HDMan)

WORD [OUTSYM] PRONPROB P1 P2 P3 P4 ...

> cat lex.dic
forra f oe r a sp
sju S uh a sp

> cat lex2.dic
<sil> [] sil
forra f oe r a sp
sju 0.3 S uh a sp
sju 0.7 S uh sp
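To make the optional fields concrete, here is a small sketch that splits one dictionary line into word, output symbol, pronunciation probability and phones. It assumes whitespace-separated fields, an output symbol written in square brackets, and a probability that parses as a number; `parse_dict_line` is a hypothetical helper, and a phone whose name happens to look like a number would be misread.

```python
def parse_dict_line(line):
    """Split a dictionary line into (word, outsym, prob, phones)."""
    fields = line.split()
    word, rest = fields[0], fields[1:]
    outsym = None   # None means "no output symbol given"
    prob = 1.0      # assumed default when no probability is given
    if rest and rest[0].startswith("[") and rest[0].endswith("]"):
        outsym = rest[0][1:-1]
        rest = rest[1:]
    if rest:
        try:
            prob = float(rest[0])
            rest = rest[1:]
        except ValueError:
            pass    # first remaining field is a phone, not a probability
    return word, outsym, prob, rest


if __name__ == "__main__":
    for entry in ["<sil> [] sil", "forra f oe r a sp",
                  "sju 0.3 S uh a sp", "sju 0.7 S uh sp"]:
        print(parse_dict_line(entry))
```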

HMM definition files (HHEd)

~h "hmm_name"
<BEGINHMM>
  <NUMSTATES> 5
  <STATE> 2
    <NUMMIXES> 2
    <MIXTURE> 1 0.8
      <MEAN> 4
        0.1 0.0 0.7 0.3
      <VARIANCE> 4
        0.2 0.1 0.1 0.1
    <MIXTURE> 2 0.2
      <MEAN> 4
        0.2 0.3 0.4 0.0
      <VARIANCE> 4
        0.1 0.1 0.1 0.2
  <STATE> 3
    ~s "state_name"
  <STATE> 4
    <NUMMIXES> 2
    <MIXTURE> 1 0.7
      ~m "mix_name"
    <MIXTURE> 2 0.3
      <MEAN> 4
        ~u "mean_name"
      <VARIANCE> 4
        ~v "variance_name"
  <TRANSP>
    ~t "transition_name"
<ENDHMM>

[Figure: five-state HMM topology, with states 2, 3 and 4 between state 1 and state 5.]

Macro types used in the definition:

• ~h: HMM definition

• ~s: State definition

• ~m: Gaussian mixture component definition

• ~u: Mean vector definition
• ~v: Diagonal variance vector definition

• ~t: Transition matrix definition
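The numbers in the definition of state 2 describe a two-component Gaussian mixture with diagonal covariance. As an illustration of how such a state scores an observation, the sketch below evaluates b(o) = Σ_m c_m N(o; μ_m, diag(σ²_m)) with the weights, means and variances copied from the example. The function names are hypothetical and the code is a plain textbook evaluation, not HTK's implementation.

```python
import math

# (weight, mean, variance) for each mixture component of state 2.
STATE2 = [
    (0.8, [0.1, 0.0, 0.7, 0.3], [0.2, 0.1, 0.1, 0.1]),
    (0.2, [0.2, 0.3, 0.4, 0.0], [0.1, 0.1, 0.1, 0.2]),
]


def diag_gaussian(obs, mean, var):
    """Density of a Gaussian with diagonal covariance, computed via logs."""
    log_p = 0.0
    for o, m, v in zip(obs, mean, var):
        log_p += -0.5 * (math.log(2.0 * math.pi * v) + (o - m) ** 2 / v)
    return math.exp(log_p)


def state_likelihood(obs, mixtures):
    """Weighted sum of the component densities for one observation."""
    return sum(w * diag_gaussian(obs, m, v) for w, m, v in mixtures)


if __name__ == "__main__":
    # Score an observation lying exactly on the first component's mean.
    print(state_likelihood([0.1, 0.0, 0.7, 0.3], STATE2))
```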

• HSLab: graphical tool to label speech (use WaveSurfer instead)
• HList: gives information about audio and feature files
• HSGen: generates random sentences out of a regular grammar

Intermezzo: what do we know so far?

[Figure: the training and recognition block diagrams. Training: audio, feature extraction, features, prototype HMM, dictionary, labels, training, model set. Recognition: audio, feature extraction, features, grammar, model set, dictionary, decoding, labels.]

Model initialization

The initialization procedure depends on the information available at that time.

• HCompV: computes the overall mean and variance.
  Input: a prototype HMM
• HInit: Viterbi segmentation + parameter estimation. For mixture distributions it uses K-means.
  Input: a prototype HMM, time-aligned transcriptions
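The idea behind initializing from the overall mean and variance can be shown on a toy data set. The sketch below computes the per-dimension mean and (maximum likelihood) variance over all frames and copies them into every emitting state of a prototype, which is what is often called a flat start. The data layout and the names `global_mean_variance` and `flat_start` are assumptions made for the example.

```python
def global_mean_variance(frames):
    """Per-dimension mean and variance over all feature frames."""
    n = len(frames)
    dim = len(frames[0])
    mean = [sum(f[k] for f in frames) / n for k in range(dim)]
    var = [sum((f[k] - mean[k]) ** 2 for f in frames) / n for k in range(dim)]
    return mean, var


def flat_start(frames, num_emitting_states=3):
    """Give every emitting state the same global mean and variance."""
    mean, var = global_mean_variance(frames)
    return [{"mean": list(mean), "variance": list(var)}
            for _ in range(num_emitting_states)]


if __name__ == "__main__":
    toy_frames = [[1.0, 2.0], [3.0, 6.0]]
    for state in flat_start(toy_frames):
        print(state)
```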

Training tools

• HRest: Baum-Welch re-estimation.
  Input: an initialized model set, time-aligned transcriptions
• HERest: performs embedded Baum-Welch training.
  Input: an initialized model set, timeless transcriptions
• HEAdapt: performs adaptation on a limited set of data
• HSmooth: smooths a set of context-dependent models according to the context-independent counterpart

Training example (RefRec)

first pass:
prototype HMM → HCompV → cloning → HERest → realignment (HVite) → HERest → increase MC (HHEd) → HERest → realignment (HVite)

second pass:
prototype HMM + aligned transcriptions → HInit → HRest → HERest → increase MC (HHEd) → HERest

Recognition tools

grammar generation
• HLStats: creates a bigram from training data
• HParse: parses a user-defined grammar to produce a lattice

decoding
• HVite: performs Viterbi decoding

evaluation
• HResults: evaluates recognition results
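For readers who have not met Viterbi decoding before, the sketch below runs the algorithm on a tiny two-state left-to-right model in the log domain. It is a generic textbook version over precomputed emission scores, not the token-passing decoder used inside HVite, and the toy probabilities are invented for the example.

```python
import math


def log_or_neg_inf(p):
    """Natural log that maps probability 0 to -inf instead of raising."""
    return math.log(p) if p > 0 else float("-inf")


def viterbi(log_init, log_trans, log_emit):
    """Return (best state path, its log probability).

    log_emit[t][s] is the log emission score of state s at frame t.
    """
    n_states = len(log_init)
    score = [log_init[s] + log_emit[0][s] for s in range(n_states)]
    backpointers = []
    for t in range(1, len(log_emit)):
        new_score, pointers = [], []
        for s in range(n_states):
            best_prev = max(range(n_states),
                            key=lambda p: score[p] + log_trans[p][s])
            pointers.append(best_prev)
            new_score.append(score[best_prev] + log_trans[best_prev][s]
                             + log_emit[t][s])
        score = new_score
        backpointers.append(pointers)
    state = max(range(n_states), key=lambda s: score[s])
    best_log_prob = score[state]
    path = [state]
    for pointers in reversed(backpointers):   # trace back from the last frame
        state = pointers[state]
        path.append(state)
    path.reverse()
    return path, best_log_prob


# Toy model: start in state 0, may stay or move on to state 1.
INIT = [log_or_neg_inf(p) for p in (1.0, 0.0)]
TRANS = [[log_or_neg_inf(p) for p in row] for row in ((0.5, 0.5), (0.0, 1.0))]
EMIT = [[log_or_neg_inf(p) for p in row]
        for row in ((0.9, 0.1), (0.8, 0.2), (0.1, 0.9))]

if __name__ == "__main__":
    print(viterbi(INIT, TRANS, EMIT))
```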

Grammar definition (HParse)

[Figure: word network for the example grammar. Nodes: sil, fil, spk (noise, appearing three times); move with up, down, left, right; top; bottom; delete with char, word, line, page; insert; end insert; quit.]

Grammar definition (HParse)

> cat grammar.bnf
$dir = up | down | left | right;
$mcmd = move $dir | top | bottom;
$item = char | word | line | page;
$dcmd = delete [$item];
$icmd = insert;
$ecmd = end [insert];
$cmd = $mcmd | $dcmd | $icmd | $ecmd;
$noise = sil | fil | spk;
({$noise} < $cmd {$noise} > quit {$noise})

• [] optional
• {} zero or more
• () block
• <> loop
• <<>> context dep. loop
• | alternative
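One way to check an understanding of the notation is to translate the grammar by hand into a regular expression and test a few word sequences against it. The sketch below does this in Python, reading {} as zero or more, [] as optional and <> as one or more repetitions. The translation is a manual one made for illustration (HParse itself produces a lattice), and `accepts` is a hypothetical helper.

```python
import re

DIR = r"(?:up|down|left|right)"
MCMD = rf"(?:move {DIR}|top|bottom)"
ITEM = r"(?:char|word|line|page)"
DCMD = rf"(?:delete(?: {ITEM})?)"       # [$item] -> optional
ICMD = r"(?:insert)"
ECMD = r"(?:end(?: insert)?)"           # [insert] -> optional
CMD = rf"(?:{MCMD}|{DCMD}|{ICMD}|{ECMD})"
NOISE = r"(?:sil|fil|spk)"

# ({$noise} < $cmd {$noise} > quit {$noise}) over space-separated words.
SENTENCE = re.compile(
    rf"(?:{NOISE} )*(?:{CMD} (?:{NOISE} )*)+quit(?: {NOISE})*")


def accepts(words):
    """True if the word sequence is a sentence of the example grammar."""
    return SENTENCE.fullmatch(" ".join(words)) is not None


if __name__ == "__main__":
    for text in ["move up quit",
                 "sil delete word fil end insert quit sil",
                 "move quit",
                 "quit"]:
        print(text, "->", accepts(text.split()))
```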


Grammar parsing (HParse) and recognition (HVite)

Parse grammar
> HParse grammar.bnf grammar.slf

Run recognition on file(s)
> HVite -C offline.cfg -H mono_32_2.mmf -w grammar.slf -y lab dict.txt phones.lis audio_file.wav

Run recognition live
> HVite -C live.cfg -H mono_32_2.mmf -w grammar.slf -y lab dict.txt phones.lis

Evaluation (HResults)

> HResults -I reference.mlf word.lst recognized.mlf

====================== HTK Results Analysis =======================
  Date: Thu Jan 18 16:17:53 2001
  Ref : nworkdir_traintestsetmlf
  Rec : nresults_trainmono_32_2recmlf
------------------------ Overall Results --------------------------
SENT: %Correct=74.07 [H=994, S=348, N=1342]
WORD: %Corr=94.69, Acc=94.37 [H=9202, D=196, S=320, I=31, N=9718]
-------------------------------------------------------------------

N = total number, I = insertions, S = substitutions, D = deletions

correct: H = N - S - D
%correct: %Corr = H/N
accuracy: Acc = (H - I)/N = (N - S - D - I)/N
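The formulas above can be checked against the counts in the printed results. The short sketch below recomputes the word and sentence scores from N, D, S and I, expressed as percentages; the function names are hypothetical.

```python
def word_scores(n, d, s, i):
    """Return hits, %Corr and Acc (in percent) from the error counts."""
    hits = n - d - s
    return {
        "H": hits,
        "Corr": 100.0 * hits / n,
        "Acc": 100.0 * (hits - i) / n,
    }


def sentence_correct(n, s):
    """Percentage of sentences recognized without any error."""
    return 100.0 * (n - s) / n


if __name__ == "__main__":
    scores = word_scores(n=9718, d=196, s=320, i=31)
    print(scores["H"], round(scores["Corr"], 2), round(scores["Acc"], 2))
    print(round(sentence_correct(n=1342, s=348), 2))
```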

  • Outline
  • Introduction
  • Data formats and manipulation
  • Data visualization
  • Training
  • Recognition
Page 2: Giampiero Salvi - KTHgiampi/teaching/htk_tutorial.pdf · c 2008 Giampiero Salvi 1/47 HTK Tutorial Giampiero Salvi KTH (Royal Institute of Technology), Dep. of Speech, Music and Hearing,

ccopy2008 Giampiero Salvi 247

Introduction

Data formats and manipulation

Data visualization

Training

Recognition

ccopy2008 Giampiero Salvi 347

HTK What is it

I A toolkit for Hidden Markov Modeling

I General purpose but

I optimized for Speech Recognition

I Very flexible and complete (active development)

I Very good documentation (HTKBook)

ccopy2008 Giampiero Salvi 447

HTK What is it

I A toolkit for Hidden Markov Modeling

I General purpose but

I optimized for Speech Recognition

I Very flexible and complete (active development)

I Very good documentation (HTKBook)

ccopy2008 Giampiero Salvi 547

HTK What is it

I A toolkit for Hidden Markov Modeling

I General purpose but

I optimized for Speech Recognition

I Very flexible and complete (active development)

I Very good documentation (HTKBook)

ccopy2008 Giampiero Salvi 647

HTK What is it

I A toolkit for Hidden Markov Modeling

I General purpose but

I optimized for Speech Recognition

I Very flexible and complete (active development)

I Very good documentation (HTKBook)

ccopy2008 Giampiero Salvi 747

HTK What is it

I A toolkit for Hidden Markov Modeling

I General purpose but

I optimized for Speech Recognition

I Very flexible and complete (active development)

I Very good documentation (HTKBook)

ccopy2008 Giampiero Salvi 847

ASR Overview

Training

audiofeatureextraction

training model set

prototype HMM

features

dictionary

labels

Recognition

audiofeatureextraction

features

dictionary

model set

labelsdecoding

grammar

ccopy2008 Giampiero Salvi 947

Things that you should have before you start

I familiarity with Unix-like shellI cd ls pwd mkdir cp foreach

I text processing toolsI perl perl perl perl perlI grep gawk tr sed find cat wc

I lots of patience

I the fabulous HTK Book

I a look at the RefRec scripts

ccopy2008 Giampiero Salvi 1047

Things that you should have before you start

I familiarity with Unix-like shellI cd ls pwd mkdir cp foreach

I text processing toolsI perl perl perl perl perlI grep gawk tr sed find cat wc

I lots of patience

I the fabulous HTK Book

I a look at the RefRec scripts

ccopy2008 Giampiero Salvi 1147

Things that you should have before you start

I familiarity with Unix-like shellI cd ls pwd mkdir cp foreach

I text processing toolsI perl perl perl perl perlI grep gawk tr sed find cat wc

I lots of patience

I the fabulous HTK Book

I a look at the RefRec scripts

ccopy2008 Giampiero Salvi 1247

Things that you should have before you start

I familiarity with Unix-like shellI cd ls pwd mkdir cp foreach

I text processing toolsI perl perl perl perl perlI grep gawk tr sed find cat wc

I lots of patience

I the fabulous HTK Book

I a look at the RefRec scripts

ccopy2008 Giampiero Salvi 1347

Things that you should have before you start

I familiarity with Unix-like shellI cd ls pwd mkdir cp foreach

I text processing toolsI perl perl perl perl perlI grep gawk tr sed find cat wc

I lots of patience

I the fabulous HTK Book

I a look at the RefRec scripts

ccopy2008 Giampiero Salvi 1447

The HTK tools

I data manipulation toolsHCopy HQuant HLEd HHEd HDMan HBuild

I data visualization toolsHSLab HList HSGen

I training toolsHCompV HInit HRest HERest HEAdapt HSmooth

I recognition toolsHLStats HParse HVite HResults

ccopy2008 Giampiero Salvi 1547

The HTK tools

I data manipulation toolsHCopy HQuant HLEd HHEd HDMan HBuild

I data visualization toolsHSLab HList HSGen

I training toolsHCompV HInit HRest HERest HEAdapt HSmooth

I recognition toolsHLStats HParse HVite HResults

ccopy2008 Giampiero Salvi 1647

The HTK tools

I data manipulation toolsHCopy HQuant HLEd HHEd HDMan HBuild

I data visualization toolsHSLab HList HSGen

I training toolsHCompV HInit HRest HERest HEAdapt HSmooth

I recognition toolsHLStats HParse HVite HResults

ccopy2008 Giampiero Salvi 1747

The HTK tools

I data manipulation toolsHCopy HQuant HLEd HHEd HDMan HBuild

I data visualization toolsHSLab HList HSGen

I training toolsHCompV HInit HRest HERest HEAdapt HSmooth

I recognition toolsHLStats HParse HVite HResults

ccopy2008 Giampiero Salvi 1847

The HTK data formats

data formatsaudio many common formats plus HTK binaryfeatures HTK binarylabels HTK (single or Master Label files) textmodels HTK (single or Master Macro files) text or binaryother HTK text

audiofeatureextraction

training model set

prototype HMM

features

dictionary

labels

audiofeatureextraction

features

dictionary

model set

labelsdecoding

grammar

ccopy2008 Giampiero Salvi 1947

Usage example (HList)

gt HList

USAGE HList [options] file

Option Default

-d Coerce observation to VQ symbols off-e N End at sample N 0-h Print source header info off-i N Set items per line to N 10-n N Set num streams to N 1-o Print observation structure off-p Playback audio off-r Write raw output off-s N Start at sample N 0-t Print target header info off-z Suppress printing data on-A Print command line arguments off-C cf Set config file to cf default-D Display configuration variables off

ccopy2008 Giampiero Salvi 2047

Command line switches and options

gt HList -e 1 -o -h feature_file

Source feature_file

Sample Bytes 26 Sample Kind MFCC_0

Num Comps 13 Sample Period 100000 us

Num Samples 336 File Format HTK

-------------------- Observation Structure ---------------------

x MFCC-1 MFCC-2 MFCC-3 MFCC-4 MFCC-5 MFCC-6 MFCC-7

MFCC-8 MFCC-9 MFCC-10 MFCC-11 MFCC-12 C0

------------------------ Samples 0-gt1 -------------------------

0 -14314 -3318 -6263 -7245 7192 4997 0830

3293 5428 6831 5819 5606 40734

1 -13591 -4756 -6037 -3362 3541 3510 2867

0812 0630 5285 1054 8375 40778

----------------------------- END ------------------------------

ccopy2008 Giampiero Salvi 2147

Configuration file

gt cat config_file

SOURCEKIND = MFCC_0TARGETKIND = MFCC_0_D_A

gt HList -C config_file -e 0 -o -h feature_file

Source feature_fileSample Bytes 26 Sample Kind MFCC_0Num Comps 13 Sample Period 100000 usNum Samples 336 File Format HTK

-------------------- Observation Structure ---------------------x MFCC-1 MFCC-2 MFCC-3 MFCC-4 MFCC-5 MFCC-6 MFCC-7

MFCC-8 MFCC-9 MFCC-10 MFCC-11 MFCC-12 C0 Del-1Del-2 Del-3 Del-4 Del-5 Del-6 Del-7 Del-8Del-9 Del-10 Del-11 Del-12 DelC0 Acc-1 Acc-2Acc-3 Acc-4 Acc-5 Acc-6 Acc-7 Acc-8 Acc-9Acc-10 Acc-11 Acc-12 AccC0

------------------------ Samples 0-gt1 -------------------------0 -14314 -3318 -6263 -7245 7192 4997 0830

3293 5428 6831 5819 5606 40734 -0107-0180 0731 1134 -0723 -0676 1083 -0552-0387 -0592 -2172 -0030 -0170 0236 0170-0241 -0226 -0517 -0244 -0053 0213 -00290097 0225 -0294 0051

----------------------------- END ------------------------------

ccopy2008 Giampiero Salvi 2247

File manipulation tools

I HCopy converts fromto various data formats (audiofeatures)

I HQuant quantizes speech (audio)

I HLEd edits label and master label files

I HDMan edits dictionary files

I HHEd edits model and master macro files

I HBuild converts language models in different formats (morein recognition section)

ccopy2008 Giampiero Salvi 2347

Computing feature files (HCopy)

gt cat config_file

Feature configurationTARGETKIND = MFCC_0TARGETRATE = 1000000SAVECOMPRESSED = TSAVEWITHCRC = TWINDOWSIZE = 2500000USEHAMMING = TPREEMCOEF = 097NUMCHANS = 26CEPLIFTER = 22NUMCEPS = 12ENORMALISE = F input file format (headerless 8 kHz 16 bit linear PCM)SOURCEKIND = WAVEFORMSOURCEFORMAT = NOHEADSOURCERATE = 1250

gt HCopy -C config_file audio_file1 param_file1 audio_file2

gt HCopy -C config_file -S file_list

ccopy2008 Giampiero Salvi 2447

Label files

MLF

filename1[start1 [end1]] label1 [score] auxlabel [auxscore] [comment][start2 [end2]] label2 [score] auxlabel [auxscore] [comment][startN [endN]] labelN [score] auxlabel [auxscore] [comment]

filename2

I [] = optional (0 or 1)

I = possible repetition (0 1 2)

I time stamps are in 100ns units () 10ms = 100000

ccopy2008 Giampiero Salvi 2547

Label file example 1

gt cat alignedmlf

MLFa10001a1rec

0 6400000 sil ltsilgt6400000 8600000 f forra8600000 10400000 oe

10400000 11700000 r11700000 14100000 a14100000 14100000 sp14100000 29800001 sil ltsilgta10001i1rec

0 2600000 sil ltsilgt2600000 4900000 S sju4900000 8300000 uh8300000 8600000 a8600000 8600000 sp8600000 21600000 sil ltsilgt

ccopy2008 Giampiero Salvi 2647

Label file example 2 (HLEd)

gt HLEd -l rsquorsquo -d lexdic -i phonesmlf words2phonesled wordsmlf

gt cat wordsmlf

MLF

a10001a1rec

forra

a10001i1rec

sju

gt cat words2phonesled

EX

IS sil sil

gt cat phonesmlf

MLFa10001a1recsilfoeraspsila10001i1recsilSuhaspsil

ccopy2008 Giampiero Salvi 2747

Dictionary (HDMan)

WORD [OUTSYM] PRONPROB P1 P2 P3 P4

gt cat lexdic

forra f oe r a spsju S uh a sp

gt cat lex2dic

ltsilgt [] silforra f oe r a spsju 03 S uh a spsju 07 S uh sp

ccopy2008 Giampiero Salvi 2847

HMM definition files (HHEd)

~h hmm_nameltBEGINHMMgt

ltNUMSTATESgt 5ltSTATEgt 2

ltNUMMIXESgt 2ltMIXTUREgt 1 08

ltMEANgt 401 00 07 03

ltVARIANCEgt 402 01 01 01

ltMIXTUREgt 2 02ltMEANgt 4

02 03 04 00ltVARIANCEgt 4

01 01 01 02ltSTATEgt 3

~s state_nameltSTATEgt 4

ltNUMMIXESgt 2ltMIXTUREgt 1 07

~m mix_nameltMIXTUREgt 2 03

ltMEANgt 4~u mean_name

ltVARIANCEgt 4~v variance_name

ltTRANSPgt~t transition_name

ltENDHMMgt

HMM definition (~h)

1

2 3 4

5

ccopy2008 Giampiero Salvi 2947

HMM definition files (HHEd)

~h hmm_nameltBEGINHMMgt

ltNUMSTATESgt 5ltSTATEgt 2

ltNUMMIXESgt 2ltMIXTUREgt 1 08

ltMEANgt 401 00 07 03

ltVARIANCEgt 402 01 01 01

ltMIXTUREgt 2 02ltMEANgt 4

02 03 04 00ltVARIANCEgt 4

01 01 01 02ltSTATEgt 3

~s state_nameltSTATEgt 4

ltNUMMIXESgt 2ltMIXTUREgt 1 07

~m mix_nameltMIXTUREgt 2 03

ltMEANgt 4~u mean_name

ltVARIANCEgt 4~v variance_name

ltTRANSPgt~t transition_name

ltENDHMMgt

State definition (~s)

1

2 3 4

5

2

ccopy2008 Giampiero Salvi 3047

HMM definition files (HHEd)

~h hmm_nameltBEGINHMMgt

ltNUMSTATESgt 5ltSTATEgt 2

ltNUMMIXESgt 2ltMIXTUREgt 1 08

ltMEANgt 401 00 07 03

ltVARIANCEgt 402 01 01 01

ltMIXTUREgt 2 02ltMEANgt 4

02 03 04 00ltVARIANCEgt 4

01 01 01 02ltSTATEgt 3

~s state_nameltSTATEgt 4

ltNUMMIXESgt 2ltMIXTUREgt 1 07

~m mix_nameltMIXTUREgt 2 03

ltMEANgt 4~u mean_name

ltVARIANCEgt 4~v variance_name

ltTRANSPgt~t transition_name

ltENDHMMgt

Gaussian mixture component definition(~m)

1

2 3 4

5

ccopy2008 Giampiero Salvi 3147

HMM definition files (HHEd)

~h hmm_nameltBEGINHMMgt

ltNUMSTATESgt 5ltSTATEgt 2

ltNUMMIXESgt 2ltMIXTUREgt 1 08

ltMEANgt 401 00 07 03

ltVARIANCEgt 402 01 01 01

ltMIXTUREgt 2 02ltMEANgt 4

02 03 04 00ltVARIANCEgt 4

01 01 01 02ltSTATEgt 3

~s state_nameltSTATEgt 4

ltNUMMIXESgt 2ltMIXTUREgt 1 07

~m mix_nameltMIXTUREgt 2 03

ltMEANgt 4~u mean_name

ltVARIANCEgt 4~v variance_name

ltTRANSPgt~t transition_name

ltENDHMMgt

Mean vector definition (~u)Diagonal variance vector definition (~v)

1

2 3 4

5

ccopy2008 Giampiero Salvi 3247

HMM definition files (HHEd)

~h hmm_nameltBEGINHMMgt

ltNUMSTATESgt 5ltSTATEgt 2

ltNUMMIXESgt 2ltMIXTUREgt 1 08

ltMEANgt 401 00 07 03

ltVARIANCEgt 402 01 01 01

ltMIXTUREgt 2 02ltMEANgt 4

02 03 04 00ltVARIANCEgt 4

01 01 01 02ltSTATEgt 3

~s state_nameltSTATEgt 4

ltNUMMIXESgt 2ltMIXTUREgt 1 07

~m mix_nameltMIXTUREgt 2 03

ltMEANgt 4~u mean_name

ltVARIANCEgt 4~v variance_name

ltTRANSPgt~t transition_name

ltENDHMMgt

Transition matrix definition (~t)

1

2 3 4

5

ccopy2008 Giampiero Salvi 3347

I HSLab graphical tool to label speech (use WaveSurferinstead)

I HList gives information about audio and feature files

I HSGen generates random sentences out of a regular grammar

ccopy2008 Giampiero Salvi 3447

Intermezzo what do we know so far

Training

audiofeatureextraction

prototype HMM

features

dictionary

labels

model settraining

Recognition

grammar

decodingaudiofeatureextraction

features

model set

labels

dictionary

ccopy2008 Giampiero Salvi 3547

model initialization

Initialization procedure depends on the information avaliable atthat time

I HCompV computes the overall mean and varianceInput a prototype HMM

I HInit Viterbi segmentation + parameter estimation Formixture distribution uses K-means

Input a prototype HMM time aligned transcriptions

ccopy2008 Giampiero Salvi 3647

Traning tools

I HRest Baum-Welch re-estimation

Input an initialized model set time aligned transcriptions

I HERest performs embedded Baum-Welch trainingInput an initialized model set timeless transcriptions

I HEAdapt performs adaptation on a limited set of data

I HSmooth smoots a set of context-dependent modelsaccording to the context-independent counterpart

ccopy2008 Giampiero Salvi 3747

Training example RefRec

first pass

prototype HMM rarr HCompV rarr cloning rarr HERest rarr

realignment (HVite) rarr HERest rarr increase MC (HHEd) rarr

HERest rarr realignment (HVite)

second pass

prototype HMM + aligned transcriptions rarr HInit rarr HRest

rarr HERest rarr increase MC (HHEd) rarr HERest

ccopy2008 Giampiero Salvi 3847

Recognition tools

grammar generation

I HLStats creates bigram from training data

I HParse parses a user defined grammar to produce a lattice

decoding

I HVite performs Viterbi decoding

evaluation

I HResults evaluates recognition results

ccopy2008 Giampiero Salvi 3947

Grammar definition (HParse)

delete

page

line

word

char

insert

end insert

bottom

top

movedown

left

up

right

sil

fil

spk

sil

fil

spk

quit

sil

fil

spk

ccopy2008 Giampiero Salvi 4047

Grammar definition (HParse)

gt cat grammarbnf$dir = up | down | left | right$mcmd = move $dir | top | bottom$item = char | word | line | page$dcmd = delete [$item]$icmd = insert$ecmd = end [insert]$cmd = $mcmd | $dcmd | $icmd | $ecmd$noise = sil | fil | spk($noise lt $cmd $noise gt quit $noise)

I [] optional

I zero or moreI () blockI ltgt loop

I ltltgtgt context dep loop

I | alternative


Grammar parsing (HParse) and recognition (HVite)

Parse grammar:
> HParse grammar.bnf grammar.slf

Run recognition on file(s):
> HVite -C offline.cfg -H mono_32_2.mmf -w grammar.slf -y lab dict.txt phones.lis audio_file.wav

Run recognition live:
> HVite -C live.cfg -H mono_32_2.mmf -w grammar.slf -y lab dict.txt phones.lis

Evaluation (HResults)

> HResults -I reference.mlf word.lst recognized.mlf
====================== HTK Results Analysis =======================
  Date: Thu Jan 18 16:17:53 2001
  Ref : nworkdir_traintestsetmlf
  Rec : nresults_trainmono_32_2recmlf
------------------------ Overall Results --------------------------
SENT: %Correct=74.07 [H=994, S=348, N=1342]
WORD: %Corr=94.69, Acc=94.37 [H=9202, D=196, S=320, I=31, N=9718]
-------------------------------------------------------------------

N = total number, I = insertions, S = substitutions, D = deletions
correct: $H = N - S - D$
percent correct: $\mathrm{Corr} = H/N$
accuracy: $\mathrm{Acc} = (H - I)/N = (N - S - D - I)/N$
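The word-level figures in the report can be reproduced from the raw counts. The short Python sketch below applies the formulas to the numbers shown in the output; the function name `hresults_word_scores` is a hypothetical helper, not part of HTK.

```python
# Recompute the HResults word scores from the counts in the report.
# n = total words, d = deletions, s = substitutions, i = insertions.

def hresults_word_scores(n, d, s, i):
    h = n - s - d                  # correctly recognized words
    corr = 100.0 * h / n           # percent correct
    acc = 100.0 * (h - i) / n      # accuracy also penalizes insertions
    return h, corr, acc


h, corr, acc = hresults_word_scores(n=9718, d=196, s=320, i=31)
print(f"H={h} %Corr={corr:.2f} Acc={acc:.2f}")
# prints: H=9202 %Corr=94.69 Acc=94.37

# Sentence-level score: 994 correct sentences out of 1342.
sent_correct = 100.0 * 994 / 1342
print(f"SENT %Correct={sent_correct:.2f}")
# prints: SENT %Correct=74.07
```

Because insertions are subtracted, accuracy can never exceed percent correct, and it can even become negative when a recognizer inserts many spurious words.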

Page 4: Giampiero Salvi - KTHgiampi/teaching/htk_tutorial.pdf · c 2008 Giampiero Salvi 1/47 HTK Tutorial Giampiero Salvi KTH (Royal Institute of Technology), Dep. of Speech, Music and Hearing,

ccopy2008 Giampiero Salvi 447

HTK What is it

I A toolkit for Hidden Markov Modeling

I General purpose but

I optimized for Speech Recognition

I Very flexible and complete (active development)

I Very good documentation (HTKBook)

ccopy2008 Giampiero Salvi 547

HTK What is it

I A toolkit for Hidden Markov Modeling

I General purpose but

I optimized for Speech Recognition

I Very flexible and complete (active development)

I Very good documentation (HTKBook)

ccopy2008 Giampiero Salvi 647

HTK What is it

I A toolkit for Hidden Markov Modeling

I General purpose but

I optimized for Speech Recognition

I Very flexible and complete (active development)

I Very good documentation (HTKBook)

ccopy2008 Giampiero Salvi 747

HTK What is it

I A toolkit for Hidden Markov Modeling

I General purpose but

I optimized for Speech Recognition

I Very flexible and complete (active development)

I Very good documentation (HTKBook)

ccopy2008 Giampiero Salvi 847

ASR Overview

Training

audiofeatureextraction

training model set

prototype HMM

features

dictionary

labels

Recognition

audiofeatureextraction

features

dictionary

model set

labelsdecoding

grammar

ccopy2008 Giampiero Salvi 947

Things that you should have before you start

I familiarity with Unix-like shellI cd ls pwd mkdir cp foreach

I text processing toolsI perl perl perl perl perlI grep gawk tr sed find cat wc

I lots of patience

I the fabulous HTK Book

I a look at the RefRec scripts

ccopy2008 Giampiero Salvi 1047

Things that you should have before you start

I familiarity with Unix-like shellI cd ls pwd mkdir cp foreach

I text processing toolsI perl perl perl perl perlI grep gawk tr sed find cat wc

I lots of patience

I the fabulous HTK Book

I a look at the RefRec scripts

ccopy2008 Giampiero Salvi 1147

Things that you should have before you start

I familiarity with Unix-like shellI cd ls pwd mkdir cp foreach

I text processing toolsI perl perl perl perl perlI grep gawk tr sed find cat wc

I lots of patience

I the fabulous HTK Book

I a look at the RefRec scripts

ccopy2008 Giampiero Salvi 1247

Things that you should have before you start

I familiarity with Unix-like shellI cd ls pwd mkdir cp foreach

I text processing toolsI perl perl perl perl perlI grep gawk tr sed find cat wc

I lots of patience

I the fabulous HTK Book

I a look at the RefRec scripts

ccopy2008 Giampiero Salvi 1347

Things that you should have before you start

I familiarity with Unix-like shellI cd ls pwd mkdir cp foreach

I text processing toolsI perl perl perl perl perlI grep gawk tr sed find cat wc

I lots of patience

I the fabulous HTK Book

I a look at the RefRec scripts

ccopy2008 Giampiero Salvi 1447

The HTK tools

I data manipulation toolsHCopy HQuant HLEd HHEd HDMan HBuild

I data visualization toolsHSLab HList HSGen

I training toolsHCompV HInit HRest HERest HEAdapt HSmooth

I recognition toolsHLStats HParse HVite HResults

ccopy2008 Giampiero Salvi 1547

The HTK tools

I data manipulation toolsHCopy HQuant HLEd HHEd HDMan HBuild

I data visualization toolsHSLab HList HSGen

I training toolsHCompV HInit HRest HERest HEAdapt HSmooth

I recognition toolsHLStats HParse HVite HResults

ccopy2008 Giampiero Salvi 1647

The HTK tools

I data manipulation toolsHCopy HQuant HLEd HHEd HDMan HBuild

I data visualization toolsHSLab HList HSGen

I training toolsHCompV HInit HRest HERest HEAdapt HSmooth

I recognition toolsHLStats HParse HVite HResults

ccopy2008 Giampiero Salvi 1747

The HTK tools

I data manipulation toolsHCopy HQuant HLEd HHEd HDMan HBuild

I data visualization toolsHSLab HList HSGen

I training toolsHCompV HInit HRest HERest HEAdapt HSmooth

I recognition toolsHLStats HParse HVite HResults

ccopy2008 Giampiero Salvi 1847

The HTK data formats

data formatsaudio many common formats plus HTK binaryfeatures HTK binarylabels HTK (single or Master Label files) textmodels HTK (single or Master Macro files) text or binaryother HTK text

audiofeatureextraction

training model set

prototype HMM

features

dictionary

labels

audiofeatureextraction

features

dictionary

model set

labelsdecoding

grammar

ccopy2008 Giampiero Salvi 1947

Usage example (HList)

gt HList

USAGE HList [options] file

Option Default

-d Coerce observation to VQ symbols off-e N End at sample N 0-h Print source header info off-i N Set items per line to N 10-n N Set num streams to N 1-o Print observation structure off-p Playback audio off-r Write raw output off-s N Start at sample N 0-t Print target header info off-z Suppress printing data on-A Print command line arguments off-C cf Set config file to cf default-D Display configuration variables off

ccopy2008 Giampiero Salvi 2047

Command line switches and options

gt HList -e 1 -o -h feature_file

Source feature_file

Sample Bytes 26 Sample Kind MFCC_0

Num Comps 13 Sample Period 100000 us

Num Samples 336 File Format HTK

-------------------- Observation Structure ---------------------

x MFCC-1 MFCC-2 MFCC-3 MFCC-4 MFCC-5 MFCC-6 MFCC-7

MFCC-8 MFCC-9 MFCC-10 MFCC-11 MFCC-12 C0

------------------------ Samples 0-gt1 -------------------------

0 -14314 -3318 -6263 -7245 7192 4997 0830

3293 5428 6831 5819 5606 40734

1 -13591 -4756 -6037 -3362 3541 3510 2867

0812 0630 5285 1054 8375 40778

----------------------------- END ------------------------------

ccopy2008 Giampiero Salvi 2147

Configuration file

gt cat config_file

SOURCEKIND = MFCC_0TARGETKIND = MFCC_0_D_A

gt HList -C config_file -e 0 -o -h feature_file

Source feature_fileSample Bytes 26 Sample Kind MFCC_0Num Comps 13 Sample Period 100000 usNum Samples 336 File Format HTK

-------------------- Observation Structure ---------------------x MFCC-1 MFCC-2 MFCC-3 MFCC-4 MFCC-5 MFCC-6 MFCC-7

MFCC-8 MFCC-9 MFCC-10 MFCC-11 MFCC-12 C0 Del-1Del-2 Del-3 Del-4 Del-5 Del-6 Del-7 Del-8Del-9 Del-10 Del-11 Del-12 DelC0 Acc-1 Acc-2Acc-3 Acc-4 Acc-5 Acc-6 Acc-7 Acc-8 Acc-9Acc-10 Acc-11 Acc-12 AccC0

------------------------ Samples 0-gt1 -------------------------0 -14314 -3318 -6263 -7245 7192 4997 0830

3293 5428 6831 5819 5606 40734 -0107-0180 0731 1134 -0723 -0676 1083 -0552-0387 -0592 -2172 -0030 -0170 0236 0170-0241 -0226 -0517 -0244 -0053 0213 -00290097 0225 -0294 0051

----------------------------- END ------------------------------

ccopy2008 Giampiero Salvi 2247

File manipulation tools

I HCopy converts fromto various data formats (audiofeatures)

I HQuant quantizes speech (audio)

I HLEd edits label and master label files

I HDMan edits dictionary files

I HHEd edits model and master macro files

I HBuild converts language models in different formats (morein recognition section)

ccopy2008 Giampiero Salvi 2347

Computing feature files (HCopy)

gt cat config_file

Feature configurationTARGETKIND = MFCC_0TARGETRATE = 1000000SAVECOMPRESSED = TSAVEWITHCRC = TWINDOWSIZE = 2500000USEHAMMING = TPREEMCOEF = 097NUMCHANS = 26CEPLIFTER = 22NUMCEPS = 12ENORMALISE = F input file format (headerless 8 kHz 16 bit linear PCM)SOURCEKIND = WAVEFORMSOURCEFORMAT = NOHEADSOURCERATE = 1250

gt HCopy -C config_file audio_file1 param_file1 audio_file2

gt HCopy -C config_file -S file_list

ccopy2008 Giampiero Salvi 2447

Label files

MLF

filename1[start1 [end1]] label1 [score] auxlabel [auxscore] [comment][start2 [end2]] label2 [score] auxlabel [auxscore] [comment][startN [endN]] labelN [score] auxlabel [auxscore] [comment]

filename2

I [] = optional (0 or 1)

I = possible repetition (0 1 2)

I time stamps are in 100ns units () 10ms = 100000

ccopy2008 Giampiero Salvi 2547

Label file example 1

gt cat alignedmlf

MLFa10001a1rec

0 6400000 sil ltsilgt6400000 8600000 f forra8600000 10400000 oe

10400000 11700000 r11700000 14100000 a14100000 14100000 sp14100000 29800001 sil ltsilgta10001i1rec

0 2600000 sil ltsilgt2600000 4900000 S sju4900000 8300000 uh8300000 8600000 a8600000 8600000 sp8600000 21600000 sil ltsilgt

ccopy2008 Giampiero Salvi 2647

Label file example 2 (HLEd)

gt HLEd -l rsquorsquo -d lexdic -i phonesmlf words2phonesled wordsmlf

gt cat wordsmlf

MLF

a10001a1rec

forra

a10001i1rec

sju

gt cat words2phonesled

EX

IS sil sil

gt cat phonesmlf

MLFa10001a1recsilfoeraspsila10001i1recsilSuhaspsil

ccopy2008 Giampiero Salvi 2747

Dictionary (HDMan)

WORD [OUTSYM] PRONPROB P1 P2 P3 P4

gt cat lexdic

forra f oe r a spsju S uh a sp

gt cat lex2dic

ltsilgt [] silforra f oe r a spsju 03 S uh a spsju 07 S uh sp

ccopy2008 Giampiero Salvi 2847

HMM definition files (HHEd)

~h hmm_nameltBEGINHMMgt

ltNUMSTATESgt 5ltSTATEgt 2

ltNUMMIXESgt 2ltMIXTUREgt 1 08

ltMEANgt 401 00 07 03

ltVARIANCEgt 402 01 01 01

ltMIXTUREgt 2 02ltMEANgt 4

02 03 04 00ltVARIANCEgt 4

01 01 01 02ltSTATEgt 3

~s state_nameltSTATEgt 4

ltNUMMIXESgt 2ltMIXTUREgt 1 07

~m mix_nameltMIXTUREgt 2 03

ltMEANgt 4~u mean_name

ltVARIANCEgt 4~v variance_name

ltTRANSPgt~t transition_name

ltENDHMMgt

HMM definition (~h)

1

2 3 4

5

ccopy2008 Giampiero Salvi 2947

HMM definition files (HHEd)

~h hmm_nameltBEGINHMMgt

ltNUMSTATESgt 5ltSTATEgt 2

ltNUMMIXESgt 2ltMIXTUREgt 1 08

ltMEANgt 401 00 07 03

ltVARIANCEgt 402 01 01 01

ltMIXTUREgt 2 02ltMEANgt 4

02 03 04 00ltVARIANCEgt 4

01 01 01 02ltSTATEgt 3

~s state_nameltSTATEgt 4

ltNUMMIXESgt 2ltMIXTUREgt 1 07

~m mix_nameltMIXTUREgt 2 03

ltMEANgt 4~u mean_name

ltVARIANCEgt 4~v variance_name

ltTRANSPgt~t transition_name

ltENDHMMgt

State definition (~s)

1

2 3 4

5

2

ccopy2008 Giampiero Salvi 3047

HMM definition files (HHEd)

~h hmm_nameltBEGINHMMgt

ltNUMSTATESgt 5ltSTATEgt 2

ltNUMMIXESgt 2ltMIXTUREgt 1 08

ltMEANgt 401 00 07 03

ltVARIANCEgt 402 01 01 01

ltMIXTUREgt 2 02ltMEANgt 4

02 03 04 00ltVARIANCEgt 4

01 01 01 02ltSTATEgt 3

~s state_nameltSTATEgt 4

ltNUMMIXESgt 2ltMIXTUREgt 1 07

~m mix_nameltMIXTUREgt 2 03

ltMEANgt 4~u mean_name

ltVARIANCEgt 4~v variance_name

ltTRANSPgt~t transition_name

ltENDHMMgt

Gaussian mixture component definition(~m)

1

2 3 4

5

ccopy2008 Giampiero Salvi 3147

HMM definition files (HHEd)

~h hmm_nameltBEGINHMMgt

ltNUMSTATESgt 5ltSTATEgt 2

ltNUMMIXESgt 2ltMIXTUREgt 1 08

ltMEANgt 401 00 07 03

ltVARIANCEgt 402 01 01 01

ltMIXTUREgt 2 02ltMEANgt 4

02 03 04 00ltVARIANCEgt 4

01 01 01 02ltSTATEgt 3

~s state_nameltSTATEgt 4

ltNUMMIXESgt 2ltMIXTUREgt 1 07

~m mix_nameltMIXTUREgt 2 03

ltMEANgt 4~u mean_name

ltVARIANCEgt 4~v variance_name

ltTRANSPgt~t transition_name

ltENDHMMgt

Mean vector definition (~u)Diagonal variance vector definition (~v)

1

2 3 4

5

ccopy2008 Giampiero Salvi 3247

HMM definition files (HHEd)

~h hmm_nameltBEGINHMMgt

ltNUMSTATESgt 5ltSTATEgt 2

ltNUMMIXESgt 2ltMIXTUREgt 1 08

ltMEANgt 401 00 07 03

ltVARIANCEgt 402 01 01 01

ltMIXTUREgt 2 02ltMEANgt 4

02 03 04 00ltVARIANCEgt 4

01 01 01 02ltSTATEgt 3

~s state_nameltSTATEgt 4

ltNUMMIXESgt 2ltMIXTUREgt 1 07

~m mix_nameltMIXTUREgt 2 03

ltMEANgt 4~u mean_name

ltVARIANCEgt 4~v variance_name

ltTRANSPgt~t transition_name

ltENDHMMgt

Transition matrix definition (~t)

1

2 3 4

5

ccopy2008 Giampiero Salvi 3347

I HSLab graphical tool to label speech (use WaveSurferinstead)

I HList gives information about audio and feature files

I HSGen generates random sentences out of a regular grammar

ccopy2008 Giampiero Salvi 3447

Intermezzo: what do we know so far?

[Figure: ASR overview.
 Training: audio, feature extraction, features, labels, dictionary, prototype HMM, training, model set.
 Recognition: audio, feature extraction, features, model set, dictionary, grammar, decoding, labels.]

Model initialization

The initialization procedure depends on the information available at that time.

• HCompV: computes the overall mean and variance.
  Input: a prototype HMM.
• HInit: Viterbi segmentation + parameter estimation. For mixture distributions it uses K-means.
  Input: a prototype HMM, time-aligned transcriptions.

Training tools

• HRest: Baum-Welch re-estimation.
  Input: an initialized model set, time-aligned transcriptions.
• HERest: performs embedded Baum-Welch training.
  Input: an initialized model set, transcriptions without time stamps.
• HEAdapt: performs adaptation on a limited set of data.
• HSmooth: smooths a set of context-dependent models according to their context-independent counterparts.

Training example (RefRec)

First pass:
prototype HMM → HCompV → cloning → HERest → realignment (HVite) → HERest → increase MC (HHEd) → HERest → realignment (HVite)

Second pass:
prototype HMM + aligned transcriptions → HInit → HRest → HERest → increase MC (HHEd) → HERest

Recognition tools

Grammar generation
• HLStats: creates a bigram from training data
• HParse: parses a user-defined grammar to produce a lattice

Decoding
• HVite: performs Viterbi decoding

Evaluation
• HResults: evaluates recognition results
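Viterbi decoding finds the single most likely state sequence for a series of observations. The sketch below is a textbook version on a toy two-state model with made-up probabilities. It illustrates the algorithm only: HVite itself works on a word network built from the grammar, the dictionary and the model set.

```python
import math


def viterbi(log_init, log_trans, log_emit):
    """Return (best state path, its log probability).

    log_init[j]     log P(start in state j)
    log_trans[i][j] log P(state i -> state j)
    log_emit[t][j]  log P(observation at time t | state j)
    """
    n_states = len(log_init)
    score = [log_init[j] + log_emit[0][j] for j in range(n_states)]
    back = []  # back[t-1][j] = best predecessor of state j at time t
    for t in range(1, len(log_emit)):
        prev, new = [], []
        for j in range(n_states):
            best = max(range(n_states), key=lambda i: score[i] + log_trans[i][j])
            prev.append(best)
            new.append(score[best] + log_trans[best][j] + log_emit[t][j])
        back.append(prev)
        score = new
    state = max(range(n_states), key=lambda j: score[j])
    best_score = score[state]
    path = [state]
    for prev in reversed(back):  # trace the back-pointers from the end
        state = prev[state]
        path.append(state)
    return path[::-1], best_score


if __name__ == "__main__":
    def to_log(rows):
        return [[math.log(p) for p in row] for row in rows]

    init = [math.log(0.5), math.log(0.5)]
    trans = to_log([[0.7, 0.3], [0.3, 0.7]])
    emit = to_log([[0.9, 0.1], [0.9, 0.1], [0.1, 0.9]])  # one row per frame
    print(viterbi(init, trans, emit))
```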

Grammar definition (HParse)

[Figure: word network of the command grammar. Nodes: move (up, down, left, right), top, bottom; delete (char, word, line, page); insert; end insert; quit; noise models sil, fil, spk.]

> cat grammar.bnf
$dir   = up | down | left | right;
$mcmd  = move $dir | top | bottom;
$item  = char | word | line | page;
$dcmd  = delete [$item];
$icmd  = insert;
$ecmd  = end [insert];
$cmd   = $mcmd | $dcmd | $icmd | $ecmd;
$noise = sil | fil | spk;
({$noise} < $cmd {$noise} > quit {$noise})

• [] optional
• {} zero or more
• () block
• <> loop
• <<>> context-dependent loop
• | alternative
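One way to check a grammar like this is to enumerate what a rule can produce. The sketch below encodes the rules up to `$cmd` as plain Python data and expands them into every command phrase. The encoding (`RULES`, the `("opt", x)` marker) is an ad hoc assumption for this example, not HParse's internal representation, and the `{}` and `<>` loops of the final network are left out because they generate unbounded sequences.

```python
from itertools import product

# Each rule is a list of alternatives; each alternative is a sequence of
# items. An item is a word, a "$name" reference, or ("opt", item).
RULES = {
    "$dir": [["up"], ["down"], ["left"], ["right"]],
    "$mcmd": [["move", "$dir"], ["top"], ["bottom"]],
    "$item": [["char"], ["word"], ["line"], ["page"]],
    "$dcmd": [["delete", ("opt", "$item")]],
    "$icmd": [["insert"]],
    "$ecmd": [["end", ("opt", "insert")]],
    "$cmd": [["$mcmd"], ["$dcmd"], ["$icmd"], ["$ecmd"]],
}


def expand(item):
    """Return every word sequence (as a tuple) that an item can produce."""
    if isinstance(item, tuple):        # ("opt", x): either nothing or x
        return [()] + expand(item[1])
    if item.startswith("$"):           # reference to another rule
        out = []
        for alternative in RULES[item]:
            parts = [expand(i) for i in alternative]
            out += [sum(combo, ()) for combo in product(*parts)]
        return out
    return [(item,)]                   # plain word


COMMANDS = [" ".join(words) for words in expand("$cmd")]

if __name__ == "__main__":
    for command in COMMANDS:
        print(command)
```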


Grammar parsing (HParse) and recognition (HVite)

Parse grammar:
> HParse grammar.bnf grammar.slf

Run recognition on file(s):
> HVite -C offline.cfg -H mono_32_2.mmf -w grammar.slf -y lab dict.txt phones.lis audio_file.wav

Run recognition live:
> HVite -C live.cfg -H mono_32_2.mmf -w grammar.slf -y lab dict.txt phones.lis

Evaluation (HResults)

> HResults -I reference.mlf word.lst recognized.mlf
====================== HTK Results Analysis =======================
  Date: Thu Jan 18 16:17:53 2001
  Ref : nworkdir_traintestsetmlf
  Rec : nresults_trainmono_32_2recmlf
------------------------ Overall Results --------------------------
SENT: %Correct=74.07 [H=994, S=348, N=1342]
WORD: %Corr=94.69, Acc=94.37 [H=9202, D=196, S=320, I=31, N=9718]
-------------------------------------------------------------------

N = total number, I = insertions, S = substitutions, D = deletions
correct: H = N - S - D
%correct: %Corr = H / N
accuracy: Acc = (H - I) / N = (N - S - D - I) / N
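The word-level figures can be reproduced from the counts in the report. The sketch below applies the definitions above to the numbers shown (N=9718, D=196, S=320, I=31); the function name is illustrative.

```python
def word_scores(n, d, s, i):
    """Return (H, %Corr, Acc) from HResults-style counts, as percentages."""
    h = n - d - s                      # correctly recognized words
    corr = 100.0 * h / n               # ignores insertions
    acc = 100.0 * (h - i) / n          # penalizes insertions as well
    return h, corr, acc


h, corr, acc = word_scores(n=9718, d=196, s=320, i=31)
print(f"H={h} %Corr={corr:.2f} Acc={acc:.2f}")  # H=9202 %Corr=94.69 Acc=94.37
```

Accuracy is never higher than %Corr, and the gap between the two grows with the number of insertions.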

Page 6: Giampiero Salvi - KTHgiampi/teaching/htk_tutorial.pdf · c 2008 Giampiero Salvi 1/47 HTK Tutorial Giampiero Salvi KTH (Royal Institute of Technology), Dep. of Speech, Music and Hearing,

ccopy2008 Giampiero Salvi 647

HTK What is it

I A toolkit for Hidden Markov Modeling

I General purpose but

I optimized for Speech Recognition

I Very flexible and complete (active development)

I Very good documentation (HTKBook)

ccopy2008 Giampiero Salvi 747

HTK What is it

I A toolkit for Hidden Markov Modeling

I General purpose but

I optimized for Speech Recognition

I Very flexible and complete (active development)

I Very good documentation (HTKBook)

ccopy2008 Giampiero Salvi 847

ASR Overview

Training

audiofeatureextraction

training model set

prototype HMM

features

dictionary

labels

Recognition

audiofeatureextraction

features

dictionary

model set

labelsdecoding

grammar

ccopy2008 Giampiero Salvi 947

Things that you should have before you start

I familiarity with Unix-like shellI cd ls pwd mkdir cp foreach

I text processing toolsI perl perl perl perl perlI grep gawk tr sed find cat wc

I lots of patience

I the fabulous HTK Book

I a look at the RefRec scripts

ccopy2008 Giampiero Salvi 1047

Things that you should have before you start

I familiarity with Unix-like shellI cd ls pwd mkdir cp foreach

I text processing toolsI perl perl perl perl perlI grep gawk tr sed find cat wc

I lots of patience

I the fabulous HTK Book

I a look at the RefRec scripts

ccopy2008 Giampiero Salvi 1147

Things that you should have before you start

I familiarity with Unix-like shellI cd ls pwd mkdir cp foreach

I text processing toolsI perl perl perl perl perlI grep gawk tr sed find cat wc

I lots of patience

I the fabulous HTK Book

I a look at the RefRec scripts

ccopy2008 Giampiero Salvi 1247

Things that you should have before you start

I familiarity with Unix-like shellI cd ls pwd mkdir cp foreach

I text processing toolsI perl perl perl perl perlI grep gawk tr sed find cat wc

I lots of patience

I the fabulous HTK Book

I a look at the RefRec scripts

ccopy2008 Giampiero Salvi 1347

Things that you should have before you start

I familiarity with Unix-like shellI cd ls pwd mkdir cp foreach

I text processing toolsI perl perl perl perl perlI grep gawk tr sed find cat wc

I lots of patience

I the fabulous HTK Book

I a look at the RefRec scripts

ccopy2008 Giampiero Salvi 1447

The HTK tools

I data manipulation toolsHCopy HQuant HLEd HHEd HDMan HBuild

I data visualization toolsHSLab HList HSGen

I training toolsHCompV HInit HRest HERest HEAdapt HSmooth

I recognition toolsHLStats HParse HVite HResults

ccopy2008 Giampiero Salvi 1547

The HTK tools

I data manipulation toolsHCopy HQuant HLEd HHEd HDMan HBuild

I data visualization toolsHSLab HList HSGen

I training toolsHCompV HInit HRest HERest HEAdapt HSmooth

I recognition toolsHLStats HParse HVite HResults

ccopy2008 Giampiero Salvi 1647

The HTK tools

I data manipulation toolsHCopy HQuant HLEd HHEd HDMan HBuild

I data visualization toolsHSLab HList HSGen

I training toolsHCompV HInit HRest HERest HEAdapt HSmooth

I recognition toolsHLStats HParse HVite HResults

ccopy2008 Giampiero Salvi 1747

The HTK tools

I data manipulation toolsHCopy HQuant HLEd HHEd HDMan HBuild

I data visualization toolsHSLab HList HSGen

I training toolsHCompV HInit HRest HERest HEAdapt HSmooth

I recognition toolsHLStats HParse HVite HResults

ccopy2008 Giampiero Salvi 1847

The HTK data formats

data formatsaudio many common formats plus HTK binaryfeatures HTK binarylabels HTK (single or Master Label files) textmodels HTK (single or Master Macro files) text or binaryother HTK text

audiofeatureextraction

training model set

prototype HMM

features

dictionary

labels

audiofeatureextraction

features

dictionary

model set

labelsdecoding

grammar

ccopy2008 Giampiero Salvi 1947

Usage example (HList)

gt HList

USAGE HList [options] file

Option Default

-d Coerce observation to VQ symbols off-e N End at sample N 0-h Print source header info off-i N Set items per line to N 10-n N Set num streams to N 1-o Print observation structure off-p Playback audio off-r Write raw output off-s N Start at sample N 0-t Print target header info off-z Suppress printing data on-A Print command line arguments off-C cf Set config file to cf default-D Display configuration variables off

ccopy2008 Giampiero Salvi 2047

Command line switches and options

gt HList -e 1 -o -h feature_file

Source feature_file

Sample Bytes 26 Sample Kind MFCC_0

Num Comps 13 Sample Period 100000 us

Num Samples 336 File Format HTK

-------------------- Observation Structure ---------------------

x MFCC-1 MFCC-2 MFCC-3 MFCC-4 MFCC-5 MFCC-6 MFCC-7

MFCC-8 MFCC-9 MFCC-10 MFCC-11 MFCC-12 C0

------------------------ Samples 0-gt1 -------------------------

0 -14314 -3318 -6263 -7245 7192 4997 0830

3293 5428 6831 5819 5606 40734

1 -13591 -4756 -6037 -3362 3541 3510 2867

0812 0630 5285 1054 8375 40778

----------------------------- END ------------------------------

ccopy2008 Giampiero Salvi 2147

Configuration file

gt cat config_file

SOURCEKIND = MFCC_0TARGETKIND = MFCC_0_D_A

gt HList -C config_file -e 0 -o -h feature_file

Source feature_fileSample Bytes 26 Sample Kind MFCC_0Num Comps 13 Sample Period 100000 usNum Samples 336 File Format HTK

-------------------- Observation Structure ---------------------x MFCC-1 MFCC-2 MFCC-3 MFCC-4 MFCC-5 MFCC-6 MFCC-7

MFCC-8 MFCC-9 MFCC-10 MFCC-11 MFCC-12 C0 Del-1Del-2 Del-3 Del-4 Del-5 Del-6 Del-7 Del-8Del-9 Del-10 Del-11 Del-12 DelC0 Acc-1 Acc-2Acc-3 Acc-4 Acc-5 Acc-6 Acc-7 Acc-8 Acc-9Acc-10 Acc-11 Acc-12 AccC0

------------------------ Samples 0-gt1 -------------------------0 -14314 -3318 -6263 -7245 7192 4997 0830

3293 5428 6831 5819 5606 40734 -0107-0180 0731 1134 -0723 -0676 1083 -0552-0387 -0592 -2172 -0030 -0170 0236 0170-0241 -0226 -0517 -0244 -0053 0213 -00290097 0225 -0294 0051

----------------------------- END ------------------------------

ccopy2008 Giampiero Salvi 2247

File manipulation tools

I HCopy converts fromto various data formats (audiofeatures)

I HQuant quantizes speech (audio)

I HLEd edits label and master label files

I HDMan edits dictionary files

I HHEd edits model and master macro files

I HBuild converts language models in different formats (morein recognition section)

ccopy2008 Giampiero Salvi 2347

Computing feature files (HCopy)

gt cat config_file

Feature configurationTARGETKIND = MFCC_0TARGETRATE = 1000000SAVECOMPRESSED = TSAVEWITHCRC = TWINDOWSIZE = 2500000USEHAMMING = TPREEMCOEF = 097NUMCHANS = 26CEPLIFTER = 22NUMCEPS = 12ENORMALISE = F input file format (headerless 8 kHz 16 bit linear PCM)SOURCEKIND = WAVEFORMSOURCEFORMAT = NOHEADSOURCERATE = 1250

gt HCopy -C config_file audio_file1 param_file1 audio_file2

gt HCopy -C config_file -S file_list

ccopy2008 Giampiero Salvi 2447

Label files

MLF

filename1[start1 [end1]] label1 [score] auxlabel [auxscore] [comment][start2 [end2]] label2 [score] auxlabel [auxscore] [comment][startN [endN]] labelN [score] auxlabel [auxscore] [comment]

filename2

I [] = optional (0 or 1)

I = possible repetition (0 1 2)

I time stamps are in 100ns units () 10ms = 100000

ccopy2008 Giampiero Salvi 2547

Label file example 1

gt cat alignedmlf

MLFa10001a1rec

0 6400000 sil ltsilgt6400000 8600000 f forra8600000 10400000 oe

10400000 11700000 r11700000 14100000 a14100000 14100000 sp14100000 29800001 sil ltsilgta10001i1rec

0 2600000 sil ltsilgt2600000 4900000 S sju4900000 8300000 uh8300000 8600000 a8600000 8600000 sp8600000 21600000 sil ltsilgt

ccopy2008 Giampiero Salvi 2647

Label file example 2 (HLEd)

gt HLEd -l rsquorsquo -d lexdic -i phonesmlf words2phonesled wordsmlf

gt cat wordsmlf

MLF

a10001a1rec

forra

a10001i1rec

sju

gt cat words2phonesled

EX

IS sil sil

gt cat phonesmlf

MLFa10001a1recsilfoeraspsila10001i1recsilSuhaspsil

ccopy2008 Giampiero Salvi 2747

Dictionary (HDMan)

WORD [OUTSYM] PRONPROB P1 P2 P3 P4

gt cat lexdic

forra f oe r a spsju S uh a sp

gt cat lex2dic

ltsilgt [] silforra f oe r a spsju 03 S uh a spsju 07 S uh sp

HMM definition files (HHEd)

    ~h "hmm_name"
    <BEGINHMM>
    <NUMSTATES> 5
    <STATE> 2
    <NUMMIXES> 2
    <MIXTURE> 1 0.8
    <MEAN> 4
    0.1 0.0 0.7 0.3
    <VARIANCE> 4
    0.2 0.1 0.1 0.1
    <MIXTURE> 2 0.2
    <MEAN> 4
    0.2 0.3 0.4 0.0
    <VARIANCE> 4
    0.1 0.1 0.1 0.2
    <STATE> 3
    ~s "state_name"
    <STATE> 4
    <NUMMIXES> 2
    <MIXTURE> 1 0.7
    ~m "mix_name"
    <MIXTURE> 2 0.3
    <MEAN> 4
    ~u "mean_name"
    <VARIANCE> 4
    ~v "variance_name"
    <TRANSP>
    ~t "transition_name"
    <ENDHMM>

Macro types used in the definition:

• ~h: HMM definition
• ~s: state definition
• ~m: Gaussian mixture component definition
• ~u: mean vector definition
• ~v: diagonal variance vector definition
• ~t: transition matrix definition

[Figure: HMM topology with states numbered 1 to 5]
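To make the numbers in state 2 concrete, here is a minimal sketch of how a mixture of diagonal-covariance Gaussians assigns a log likelihood to one observation vector. The helper names are hypothetical and the code only illustrates the standard formula; it is not taken from HTK.

```python
import math

# State 2 from the definition above: (weight, mean, diagonal variance).
STATE2 = [
    (0.8, [0.1, 0.0, 0.7, 0.3], [0.2, 0.1, 0.1, 0.1]),
    (0.2, [0.2, 0.3, 0.4, 0.0], [0.1, 0.1, 0.1, 0.2]),
]

def diag_gaussian_logpdf(x, mean, var):
    """Log density of a Gaussian with diagonal covariance."""
    return sum(-0.5 * (math.log(2 * math.pi * v) + (xi - m) ** 2 / v)
               for xi, m, v in zip(x, mean, var))

def state_log_likelihood(x, mixtures):
    """log sum_m c_m N(x; mu_m, var_m), computed with log-sum-exp."""
    logs = [math.log(w) + diag_gaussian_logpdf(x, m, v)
            for w, m, v in mixtures]
    top = max(logs)
    return top + math.log(sum(math.exp(l - top) for l in logs))

observation = [0.1, 0.0, 0.7, 0.3]  # equals the mean of mixture 1
print(state_log_likelihood(observation, STATE2))
```

Working in the log domain avoids numerical underflow when observations lie far from every mixture component.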

Data visualization tools

• HSLab: graphical tool to label speech (use WaveSurfer instead)
• HList: gives information about audio and feature files
• HSGen: generates random sentences out of a regular grammar

Intermezzo: what do we know so far?

Training:
• audio → feature extraction → features
• features + prototype HMM + dictionary + labels → training → model set

Recognition:
• audio → feature extraction → features
• features + model set + dictionary + grammar → decoding → labels

Model initialization

The initialization procedure depends on the information available at that time.

• HCompV: computes the overall mean and variance
  Input: a prototype HMM
• HInit: Viterbi segmentation + parameter estimation; for mixture distributions it uses K-means
  Input: a prototype HMM, time-aligned transcriptions
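The "overall mean and variance" computed for initialization can be sketched as follows. This is an illustration of the statistic only, with a hypothetical function name and toy data; the idea is that every state of the prototype starts from the same global values (a so-called flat start). It does not reproduce HCompV itself.

```python
def global_mean_and_variance(frames):
    """Per-dimension mean and variance over all feature frames."""
    n = len(frames)
    dim = len(frames[0])
    mean = [sum(f[d] for f in frames) / n for d in range(dim)]
    var = [sum((f[d] - mean[d]) ** 2 for f in frames) / n
           for d in range(dim)]
    return mean, var

# Toy data: two frames with two coefficients each (not real MFCCs).
frames = [[1.0, 2.0], [3.0, 6.0]]
mean, var = global_mean_and_variance(frames)
print(mean, var)
```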

Training tools

• HRest: Baum-Welch re-estimation
  Input: an initialized model set, time-aligned transcriptions
• HERest: performs embedded Baum-Welch training
  Input: an initialized model set, timeless transcriptions
• HEAdapt: performs adaptation on a limited set of data
• HSmooth: smooths a set of context-dependent models according to the context-independent counterpart

Training example (RefRec)

first pass:
prototype HMM → HCompV → cloning → HERest → realignment (HVite) → HERest → increase MC (mixture components, HHEd) → HERest → realignment (HVite)

second pass:
prototype HMM + aligned transcriptions → HInit → HRest → HERest → increase MC (HHEd) → HERest

Recognition tools

grammar generation:
• HLStats: creates a bigram from training data
• HParse: parses a user-defined grammar to produce a lattice

decoding:
• HVite: performs Viterbi decoding

evaluation:
• HResults: evaluates recognition results
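As a rough illustration of what "creating a bigram from training data" means, the sketch below estimates bigram probabilities by simple counting. It is a plain maximum-likelihood estimate without any discounting or back-off; the sentence markers `<s>` and `</s>`, the function name and the toy sentences are assumptions made for this example, and the actual behavior of HLStats should be checked in the HTK Book.

```python
from collections import Counter

def bigram_probabilities(sentences):
    """Maximum-likelihood estimate P(w2 | w1) = count(w1 w2) / count(w1)."""
    unigrams, bigrams = Counter(), Counter()
    for words in sentences:
        padded = ["<s>"] + words + ["</s>"]
        unigrams.update(padded[:-1])               # context words
        bigrams.update(zip(padded[:-1], padded[1:]))
    return {pair: count / unigrams[pair[0]]
            for pair, count in bigrams.items()}

TRAINING = [
    ["move", "up", "quit"],
    ["move", "down", "quit"],
    ["delete", "word", "quit"],
]
probs = bigram_probabilities(TRAINING)
print(probs[("move", "up")])
```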

Grammar definition (HParse)

    > cat grammar.bnf
    $dir = up | down | left | right;
    $mcmd = move $dir | top | bottom;
    $item = char | word | line | page;
    $dcmd = delete [$item];
    $icmd = insert;
    $ecmd = end [insert];
    $cmd = $mcmd | $dcmd | $icmd | $ecmd;
    $noise = sil | fil | spk;
    ({$noise} < $cmd {$noise} > quit {$noise})

• [] optional
• {} zero or more
• () block
• <> loop
• <<>> context-dependent loop
• | alternative

[Figure: word network for the grammar, with sub-networks for the move commands (move up | down | left | right, top, bottom), the delete commands (delete char | word | line | page), insert and end insert, and the noise models (sil, fil, spk), ending in quit]
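One way to check the reading of the grammar above is to generate random sentences from it, which is the job HSGen does on the compiled network. The sketch below hard-codes the rules in Python; the repetition and option probabilities (0.3 and 0.5) are arbitrary assumptions, the loop `< >` is taken to mean one or more repetitions, and the function names are hypothetical.

```python
import random

DIR = ["up", "down", "left", "right"]
ITEM = ["char", "word", "line", "page"]
NOISE = ["sil", "fil", "spk"]

def noise(rng):
    """{$noise}: zero or more noise words."""
    out = []
    while rng.random() < 0.3:
        out.append(rng.choice(NOISE))
    return out

def command(rng):
    """$cmd = $mcmd | $dcmd | $icmd | $ecmd"""
    kind = rng.choice(["mcmd", "dcmd", "icmd", "ecmd"])
    if kind == "mcmd":
        return rng.choice([["move", rng.choice(DIR)], ["top"], ["bottom"]])
    if kind == "dcmd":
        return ["delete"] + ([rng.choice(ITEM)] if rng.random() < 0.5 else [])
    if kind == "icmd":
        return ["insert"]
    return ["end"] + (["insert"] if rng.random() < 0.5 else [])

def sentence(rng):
    """({$noise} < $cmd {$noise} > quit {$noise})"""
    words = noise(rng)
    while True:                      # loop: at least one command
        words += command(rng) + noise(rng)
        if rng.random() < 0.5:
            break
    return words + ["quit"] + noise(rng)

print(" ".join(sentence(random.Random(0))))
```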

Grammar parsing (HParse) and recognition (HVite)

Parse grammar:

    > HParse grammar.bnf grammar.slf

Run recognition on file(s):

    > HVite -C offline.cfg -H mono_32_2.mmf -w grammar.slf \
          -y lab dict.txt phones.lis audio_file.wav

Run recognition live:

    > HVite -C live.cfg -H mono_32_2.mmf -w grammar.slf \
          -y lab dict.txt phones.lis

Evaluation (HResults)

    > HResults -I reference.mlf word.lst recognized.mlf
    ====================== HTK Results Analysis =======================
      Date: Thu Jan 18 16:17:53 2001
      Ref : nworkdir_traintestsetmlf
      Rec : nresults_trainmono_32_2recmlf
    ------------------------ Overall Results --------------------------
    SENT: %Correct=74.07 [H=994, S=348, N=1342]
    WORD: %Corr=94.69, Acc=94.37 [H=9202, D=196, S=320, I=31, N=9718]
    -------------------------------------------------------------------

• N = total number, I = insertions, S = substitutions, D = deletions
• correct: \( H = N - S - D \)
• %correct: \( \%\mathrm{Corr} = \frac{H}{N} \times 100\% \)
• accuracy: \( \mathrm{Acc} = \frac{H - I}{N} \times 100\% = \frac{N - S - D - I}{N} \times 100\% \)
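The figures in the report can be checked directly from the counts. The short sketch below applies the definitions to the WORD and SENT lines shown above; the function name `word_scores` is only illustrative.

```python
def word_scores(H, D, S, I, N):
    """Return (%Corr, Acc) in percent from HResults-style counts."""
    assert H == N - S - D, "counts are inconsistent"
    corr = 100.0 * H / N
    acc = 100.0 * (H - I) / N
    return corr, acc

corr, acc = word_scores(H=9202, D=196, S=320, I=31, N=9718)
print(f"%Corr={corr:.2f} Acc={acc:.2f}")      # %Corr=94.69 Acc=94.37

sent_correct = 100.0 * 994 / 1342
print(f"SENT %Correct={sent_correct:.2f}")    # SENT %Correct=74.07
```

Accuracy is lower than %Corr because insertions are penalized; with many insertions it can even become negative.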



ASR Overview

Training:
• audio → feature extraction → features
• features + prototype HMM + dictionary + labels → training → model set

Recognition:
• audio → feature extraction → features
• features + model set + dictionary + grammar → decoding → labels

Things that you should have before you start

• familiarity with a Unix-like shell
  • cd, ls, pwd, mkdir, cp, foreach
• text processing tools
  • perl, perl, perl, perl, perl
  • grep, gawk, tr, sed, find, cat, wc
• lots of patience
• the fabulous HTK Book
• a look at the RefRec scripts

The HTK tools

• data manipulation tools: HCopy, HQuant, HLEd, HHEd, HDMan, HBuild
• data visualization tools: HSLab, HList, HSGen
• training tools: HCompV, HInit, HRest, HERest, HEAdapt, HSmooth
• recognition tools: HLStats, HParse, HVite, HResults

The HTK data formats

    data        formats
    audio       many common formats plus HTK binary
    features    HTK binary
    labels      HTK (single or Master Label files), text
    models      HTK (single or Master Macro files), text or binary
    other       HTK text

[Figure: the ASR overview diagram (training and recognition data flow)]

Usage example (HList)

    > HList

    USAGE: HList [options] file

     Option                                       Default
     -d      Coerce observation to VQ symbols     off
     -e N    End at sample N                      0
     -h      Print source header info             off
     -i N    Set items per line to N              10
     -n N    Set num streams to N                 1
     -o      Print observation structure          off
     -p      Playback audio                       off
     -r      Write raw output                     off
     -s N    Start at sample N                    0
     -t      Print target header info             off
     -z      Suppress printing data               on
     -A      Print command line arguments         off
     -C cf   Set config file to cf                default
     -D      Display configuration variables      off

Command line switches and options

    > HList -e 1 -o -h feature_file
    Source: feature_file
      Sample Bytes:  26        Sample Kind:   MFCC_0
      Num Comps:     13        Sample Period: 10000.0 us
      Num Samples:   336       File Format:   HTK
    -------------------- Observation Structure ---------------------
    x:    MFCC-1  MFCC-2  MFCC-3  MFCC-4  MFCC-5  MFCC-6  MFCC-7
          MFCC-8  MFCC-9 MFCC-10 MFCC-11 MFCC-12      C0
    ------------------------ Samples: 0->1 -------------------------
    0:   -14.314  -3.318  -6.263  -7.245   7.192   4.997   0.830
           3.293   5.428   6.831   5.819   5.606  40.734
    1:   -13.591  -4.756  -6.037  -3.362   3.541   3.510   2.867
           0.812   0.630   5.285   1.054   8.375  40.778
    ----------------------------- END ------------------------------

Configuration file

    > cat config_file
    SOURCEKIND = MFCC_0
    TARGETKIND = MFCC_0_D_A

    > HList -C config_file -e 0 -o -h feature_file
    Source: feature_file
      Sample Bytes:  26        Sample Kind:   MFCC_0
      Num Comps:     13        Sample Period: 10000.0 us
      Num Samples:   336       File Format:   HTK
    -------------------- Observation Structure ---------------------
    x:    MFCC-1  MFCC-2  MFCC-3  MFCC-4  MFCC-5  MFCC-6  MFCC-7
          MFCC-8  MFCC-9 MFCC-10 MFCC-11 MFCC-12      C0   Del-1
           Del-2   Del-3   Del-4   Del-5   Del-6   Del-7   Del-8
           Del-9  Del-10  Del-11  Del-12   DelC0   Acc-1   Acc-2
           Acc-3   Acc-4   Acc-5   Acc-6   Acc-7   Acc-8   Acc-9
          Acc-10  Acc-11  Acc-12   AccC0
    ------------------------ Samples: 0->1 -------------------------
    0:   -14.314  -3.318  -6.263  -7.245   7.192   4.997   0.830
           3.293   5.428   6.831   5.819   5.606  40.734  -0.107
          -0.180   0.731   1.134  -0.723  -0.676   1.083  -0.552
          -0.387  -0.592  -2.172  -0.030  -0.170   0.236   0.170
          -0.241  -0.226  -0.517  -0.244  -0.053   0.213  -0.029
           0.097   0.225  -0.294   0.051
    ----------------------------- END ------------------------------
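The `_D` and `_A` qualifiers append delta and acceleration coefficients, which is why the 13 static values grow to 39. The sketch below shows the regression formula commonly documented for delta coefficients, d_t = Σ_k k (c_{t+k} − c_{t−k}) / (2 Σ_k k²). The window of 2 frames and the replication of the first and last frame at the edges are assumptions of this example, and the function names are hypothetical; the HTK Book is the reference for the exact behavior.

```python
def deltas(frames, window=2):
    """Regression-based delta coefficients for a list of feature frames."""
    n = len(frames)
    dim = len(frames[0])
    denom = 2 * sum(k * k for k in range(1, window + 1))
    out = []
    for t in range(n):
        row = []
        for d in range(dim):
            # Edge frames are replicated by clamping the index (assumption).
            num = sum(k * (frames[min(t + k, n - 1)][d]
                           - frames[max(t - k, 0)][d])
                      for k in range(1, window + 1))
            row.append(num / denom)
        out.append(row)
    return out

def add_dynamic(frames, window=2):
    """Stack static, delta and acceleration coefficients per frame."""
    d = deltas(frames, window)
    a = deltas(d, window)
    return [s + dd + aa for s, dd, aa in zip(frames, d, a)]

ramp = [[float(t)] for t in range(5)]   # a signal rising by 1 per frame
print(deltas(ramp)[2])
```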

File manipulation tools

• HCopy: converts from/to various data formats (audio, features)
• HQuant: quantizes speech (audio)
• HLEd: edits label and master label files
• HDMan: edits dictionary files
• HHEd: edits model and master macro files
• HBuild: converts language models in different formats (more in recognition section)

Computing feature files (HCopy)

    > cat config_file
    # Feature configuration
    TARGETKIND = MFCC_0
    TARGETRATE = 100000.0
    SAVECOMPRESSED = T
    SAVEWITHCRC = T
    WINDOWSIZE = 250000.0
    USEHAMMING = T
    PREEMCOEF = 0.97
    NUMCHANS = 26
    CEPLIFTER = 22
    NUMCEPS = 12
    ENORMALISE = F
    # input file format (headerless 8 kHz 16 bit linear PCM)
    SOURCEKIND = WAVEFORM
    SOURCEFORMAT = NOHEAD
    SOURCERATE = 1250

    > HCopy -C config_file audio_file1 param_file1 audio_file2 ...

    > HCopy -C config_file -S file_list
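The rate and window values in the configuration are expressed in units of 100 ns, which is easy to misread. The following sketch converts the three values above into more familiar units; the helper names are hypothetical.

```python
HTK_UNITS_PER_SECOND = 10_000_000   # one HTK time unit = 100 ns

def to_ms(htk_value):
    """Convert a duration in 100 ns units to milliseconds."""
    return htk_value / 10_000

def sample_rate_hz(source_rate):
    """Convert a sample period in 100 ns units to a sampling rate in Hz."""
    return HTK_UNITS_PER_SECOND / source_rate

SOURCERATE, TARGETRATE, WINDOWSIZE = 1250, 100000, 250000

print(sample_rate_hz(SOURCERATE))   # 8000.0 (8 kHz audio)
print(to_ms(TARGETRATE))            # 10.0 (one frame every 10 ms)
print(to_ms(WINDOWSIZE))            # 25.0 (25 ms analysis window)
print(WINDOWSIZE / SOURCERATE)      # 200.0 samples per window
```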

Label files

    #!MLF!#
    "filename1"
    [start1 [end1]] label1 [score] {auxlabel [auxscore]} [comment]
    [start2 [end2]] label2 [score] {auxlabel [auxscore]} [comment]
    ...
    [startN [endN]] labelN [score] {auxlabel [auxscore]} [comment]
    .
    "filename2"
    ...

• [.] = optional (0 or 1)
• {.} = possible repetition (0, 1, 2, ...)
• time stamps are in 100 ns units: 10 ms = 100000

Label file example 1

    > cat aligned.mlf
    #!MLF!#
    "*/a10001a1.rec"
    0 6400000 sil <sil>
    6400000 8600000 f forra
    8600000 10400000 oe
    10400000 11700000 r
    11700000 14100000 a
    14100000 14100000 sp
    14100000 29800001 sil <sil>
    .
    "*/a10001i1.rec"
    0 2600000 sil <sil>
    2600000 4900000 S sju
    4900000 8300000 uh
    8300000 8600000 a
    8600000 8600000 sp
    8600000 21600000 sil <sil>
    .
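A time-aligned MLF such as the one above is simple to read programmatically. The sketch below is a minimal reader that handles only lines with both start and end times and converts them from 100 ns units to seconds. The function name `parse_mlf` is hypothetical, and the quoted `"*/..."` file pattern in the sample follows the usual MLF convention rather than anything confirmed by the slide.

```python
def parse_mlf(text):
    """Return {file pattern: [(start_s, end_s, label), ...]}."""
    lines = text.strip().splitlines()
    assert lines[0].strip() == "#!MLF!#", "missing MLF header"
    result, current = {}, None
    for line in lines[1:]:
        line = line.strip()
        if line.startswith('"'):          # start of a new label file entry
            current = result.setdefault(line.strip('"'), [])
        elif line == ".":                 # end of the current entry
            current = None
        elif line and current is not None:
            fields = line.split()
            start, end, label = int(fields[0]), int(fields[1]), fields[2]
            current.append((start / 1e7, end / 1e7, label))
    return result

SAMPLE = '''#!MLF!#
"*/a10001i1.rec"
0 2600000 sil <sil>
2600000 4900000 S sju
4900000 8300000 uh
.
'''
for start, end, label in parse_mlf(SAMPLE)["*/a10001i1.rec"]:
    print(f"{start:.2f} {end:.2f} {label}")
```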


  • Outline
  • Introduction
  • Data formats and manipulation
  • Data visualization
  • Training
  • Recognition
Page 8: Giampiero Salvi - KTHgiampi/teaching/htk_tutorial.pdf · c 2008 Giampiero Salvi 1/47 HTK Tutorial Giampiero Salvi KTH (Royal Institute of Technology), Dep. of Speech, Music and Hearing,

ccopy2008 Giampiero Salvi 847

ASR Overview

Training

audiofeatureextraction

training model set

prototype HMM

features

dictionary

labels

Recognition

audiofeatureextraction

features

dictionary

model set

labelsdecoding

grammar

ccopy2008 Giampiero Salvi 947

Things that you should have before you start

I familiarity with Unix-like shellI cd ls pwd mkdir cp foreach

I text processing toolsI perl perl perl perl perlI grep gawk tr sed find cat wc

I lots of patience

I the fabulous HTK Book

I a look at the RefRec scripts

ccopy2008 Giampiero Salvi 1047

Things that you should have before you start

I familiarity with Unix-like shellI cd ls pwd mkdir cp foreach

I text processing toolsI perl perl perl perl perlI grep gawk tr sed find cat wc

I lots of patience

I the fabulous HTK Book

I a look at the RefRec scripts

ccopy2008 Giampiero Salvi 1147

Things that you should have before you start

I familiarity with Unix-like shellI cd ls pwd mkdir cp foreach

I text processing toolsI perl perl perl perl perlI grep gawk tr sed find cat wc

I lots of patience

I the fabulous HTK Book

I a look at the RefRec scripts

ccopy2008 Giampiero Salvi 1247

Things that you should have before you start

I familiarity with Unix-like shellI cd ls pwd mkdir cp foreach

I text processing toolsI perl perl perl perl perlI grep gawk tr sed find cat wc

I lots of patience

I the fabulous HTK Book

I a look at the RefRec scripts

ccopy2008 Giampiero Salvi 1347

Things that you should have before you start

I familiarity with Unix-like shellI cd ls pwd mkdir cp foreach

I text processing toolsI perl perl perl perl perlI grep gawk tr sed find cat wc

I lots of patience

I the fabulous HTK Book

I a look at the RefRec scripts

ccopy2008 Giampiero Salvi 1447

The HTK tools

I data manipulation toolsHCopy HQuant HLEd HHEd HDMan HBuild

I data visualization toolsHSLab HList HSGen

I training toolsHCompV HInit HRest HERest HEAdapt HSmooth

I recognition toolsHLStats HParse HVite HResults

ccopy2008 Giampiero Salvi 1547

The HTK tools

I data manipulation toolsHCopy HQuant HLEd HHEd HDMan HBuild

I data visualization toolsHSLab HList HSGen

I training toolsHCompV HInit HRest HERest HEAdapt HSmooth

I recognition toolsHLStats HParse HVite HResults

ccopy2008 Giampiero Salvi 1647

The HTK tools

I data manipulation toolsHCopy HQuant HLEd HHEd HDMan HBuild

I data visualization toolsHSLab HList HSGen

I training toolsHCompV HInit HRest HERest HEAdapt HSmooth

I recognition toolsHLStats HParse HVite HResults

ccopy2008 Giampiero Salvi 1747

The HTK tools

I data manipulation toolsHCopy HQuant HLEd HHEd HDMan HBuild

I data visualization toolsHSLab HList HSGen

I training toolsHCompV HInit HRest HERest HEAdapt HSmooth

I recognition toolsHLStats HParse HVite HResults

ccopy2008 Giampiero Salvi 1847

The HTK data formats

data formatsaudio many common formats plus HTK binaryfeatures HTK binarylabels HTK (single or Master Label files) textmodels HTK (single or Master Macro files) text or binaryother HTK text

audiofeatureextraction

training model set

prototype HMM

features

dictionary

labels

audiofeatureextraction

features

dictionary

model set

labelsdecoding

grammar

ccopy2008 Giampiero Salvi 1947

Usage example (HList)

gt HList

USAGE HList [options] file

Option Default

-d Coerce observation to VQ symbols off-e N End at sample N 0-h Print source header info off-i N Set items per line to N 10-n N Set num streams to N 1-o Print observation structure off-p Playback audio off-r Write raw output off-s N Start at sample N 0-t Print target header info off-z Suppress printing data on-A Print command line arguments off-C cf Set config file to cf default-D Display configuration variables off

ccopy2008 Giampiero Salvi 2047

Command line switches and options

gt HList -e 1 -o -h feature_file

Source feature_file

Sample Bytes 26 Sample Kind MFCC_0

Num Comps 13 Sample Period 100000 us

Num Samples 336 File Format HTK

-------------------- Observation Structure ---------------------

x MFCC-1 MFCC-2 MFCC-3 MFCC-4 MFCC-5 MFCC-6 MFCC-7

MFCC-8 MFCC-9 MFCC-10 MFCC-11 MFCC-12 C0

------------------------ Samples 0-gt1 -------------------------

0 -14314 -3318 -6263 -7245 7192 4997 0830

3293 5428 6831 5819 5606 40734

1 -13591 -4756 -6037 -3362 3541 3510 2867

0812 0630 5285 1054 8375 40778

----------------------------- END ------------------------------

ccopy2008 Giampiero Salvi 2147

Configuration file

gt cat config_file

SOURCEKIND = MFCC_0TARGETKIND = MFCC_0_D_A

gt HList -C config_file -e 0 -o -h feature_file

Source feature_fileSample Bytes 26 Sample Kind MFCC_0Num Comps 13 Sample Period 100000 usNum Samples 336 File Format HTK

-------------------- Observation Structure ---------------------x MFCC-1 MFCC-2 MFCC-3 MFCC-4 MFCC-5 MFCC-6 MFCC-7

MFCC-8 MFCC-9 MFCC-10 MFCC-11 MFCC-12 C0 Del-1Del-2 Del-3 Del-4 Del-5 Del-6 Del-7 Del-8Del-9 Del-10 Del-11 Del-12 DelC0 Acc-1 Acc-2Acc-3 Acc-4 Acc-5 Acc-6 Acc-7 Acc-8 Acc-9Acc-10 Acc-11 Acc-12 AccC0

------------------------ Samples 0-gt1 -------------------------0 -14314 -3318 -6263 -7245 7192 4997 0830

3293 5428 6831 5819 5606 40734 -0107-0180 0731 1134 -0723 -0676 1083 -0552-0387 -0592 -2172 -0030 -0170 0236 0170-0241 -0226 -0517 -0244 -0053 0213 -00290097 0225 -0294 0051

----------------------------- END ------------------------------

ccopy2008 Giampiero Salvi 2247

File manipulation tools

I HCopy converts fromto various data formats (audiofeatures)

I HQuant quantizes speech (audio)

I HLEd edits label and master label files

I HDMan edits dictionary files

I HHEd edits model and master macro files

I HBuild converts language models in different formats (morein recognition section)

ccopy2008 Giampiero Salvi 2347

Computing feature files (HCopy)

gt cat config_file

Feature configurationTARGETKIND = MFCC_0TARGETRATE = 1000000SAVECOMPRESSED = TSAVEWITHCRC = TWINDOWSIZE = 2500000USEHAMMING = TPREEMCOEF = 097NUMCHANS = 26CEPLIFTER = 22NUMCEPS = 12ENORMALISE = F input file format (headerless 8 kHz 16 bit linear PCM)SOURCEKIND = WAVEFORMSOURCEFORMAT = NOHEADSOURCERATE = 1250

gt HCopy -C config_file audio_file1 param_file1 audio_file2

gt HCopy -C config_file -S file_list

ccopy2008 Giampiero Salvi 2447

Label files

MLF

filename1[start1 [end1]] label1 [score] auxlabel [auxscore] [comment][start2 [end2]] label2 [score] auxlabel [auxscore] [comment][startN [endN]] labelN [score] auxlabel [auxscore] [comment]

filename2

I [] = optional (0 or 1)

I = possible repetition (0 1 2)

I time stamps are in 100ns units () 10ms = 100000

ccopy2008 Giampiero Salvi 2547

Label file example 1

gt cat alignedmlf

MLFa10001a1rec

0 6400000 sil ltsilgt6400000 8600000 f forra8600000 10400000 oe

10400000 11700000 r11700000 14100000 a14100000 14100000 sp14100000 29800001 sil ltsilgta10001i1rec

0 2600000 sil ltsilgt2600000 4900000 S sju4900000 8300000 uh8300000 8600000 a8600000 8600000 sp8600000 21600000 sil ltsilgt

ccopy2008 Giampiero Salvi 2647

Label file example 2 (HLEd)

gt HLEd -l rsquorsquo -d lexdic -i phonesmlf words2phonesled wordsmlf

gt cat wordsmlf

MLF

a10001a1rec

forra

a10001i1rec

sju

gt cat words2phonesled

EX

IS sil sil

gt cat phonesmlf

MLFa10001a1recsilfoeraspsila10001i1recsilSuhaspsil

ccopy2008 Giampiero Salvi 2747

Dictionary (HDMan)

WORD [OUTSYM] PRONPROB P1 P2 P3 P4

gt cat lexdic

forra f oe r a spsju S uh a sp

gt cat lex2dic

ltsilgt [] silforra f oe r a spsju 03 S uh a spsju 07 S uh sp

ccopy2008 Giampiero Salvi 2847

HMM definition files (HHEd)

~h "hmm_name"
<BEGINHMM>
  <NUMSTATES> 5
  <STATE> 2
    <NUMMIXES> 2
    <MIXTURE> 1 0.8
      <MEAN> 4
        0.1 0.0 0.7 0.3
      <VARIANCE> 4
        0.2 0.1 0.1 0.1
    <MIXTURE> 2 0.2
      <MEAN> 4
        0.2 0.3 0.4 0.0
      <VARIANCE> 4
        0.1 0.1 0.1 0.2
  <STATE> 3
    ~s "state_name"
  <STATE> 4
    <NUMMIXES> 2
    <MIXTURE> 1 0.7
      ~m "mix_name"
    <MIXTURE> 2 0.3
      <MEAN> 4
        ~u "mean_name"
      <VARIANCE> 4
        ~v "variance_name"
  <TRANSP>
    ~t "transition_name"
<ENDHMM>

Macro types used above:

- ~h: HMM definition
- ~s: state definition
- ~m: Gaussian mixture component definition
- ~u: mean vector definition
- ~v: diagonal variance vector definition
- ~t: transition matrix definition

[Figure: HMM state diagram with states 1 to 5]
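To make the numbers in state 2 of this definition concrete, the sketch below evaluates the log likelihood of one observation under a two-component Gaussian mixture with diagonal variances, using the weights 0.8 and 0.2 and the mean and variance vectors listed for that state. The function names are hypothetical and the observation is made up; this illustrates the standard mixture density, not HTK's internal code.

```python
import math


def log_gauss_diag(x, mean, var):
    """Log density of a Gaussian with diagonal covariance."""
    return -0.5 * sum(
        math.log(2.0 * math.pi * v) + (xi - m) ** 2 / v
        for xi, m, v in zip(x, mean, var)
    )


def state_log_likelihood(x, mixtures):
    """Log of sum_k w_k N(x; mean_k, var_k), computed with log-sum-exp."""
    logs = [math.log(w) + log_gauss_diag(x, m, v) for w, m, v in mixtures]
    top = max(logs)
    return top + math.log(sum(math.exp(value - top) for value in logs))


# (weight, mean, variance) for the two mixture components of state 2.
STATE2 = [
    (0.8, [0.1, 0.0, 0.7, 0.3], [0.2, 0.1, 0.1, 0.1]),
    (0.2, [0.2, 0.3, 0.4, 0.0], [0.1, 0.1, 0.1, 0.2]),
]

if __name__ == "__main__":
    observation = [0.1, 0.1, 0.6, 0.2]   # made-up 4-dimensional feature vector
    print(state_log_likelihood(observation, STATE2))
```

The mixture weights of a state must sum to one (0.8 + 0.2 here, 0.7 + 0.3 in state 4), which is a quick sanity check when editing definitions by hand.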

- HSLab: graphical tool to label speech (use WaveSurfer instead)
- HList: gives information about audio and feature files
- HSGen: generates random sentences out of a regular grammar

Intermezzo: what do we know so far?

[Figure: data flow diagram]

Training: audio → feature extraction → features; features, prototype HMM, dictionary and labels → training → model set

Recognition: audio → feature extraction → features; features, model set, dictionary and grammar → decoding → labels

Model initialization

The initialization procedure depends on the information available at that time.

- HCompV: computes the overall mean and variance.
  Input: a prototype HMM
- HInit: Viterbi segmentation + parameter estimation. For mixture distributions it uses K-means.
  Input: a prototype HMM, time-aligned transcriptions
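The "overall mean and variance" idea behind HCompV can be sketched in a few lines: pool every feature frame, regardless of its label, and compute one mean and one variance per dimension. The function `global_mean_var` is a hypothetical name, and the sketch divides by the number of frames; the exact normalization used by HCompV is not stated in the slides.

```python
def global_mean_var(frames):
    """Per-dimension mean and variance over all frames (a flat start).

    frames: list of equal-length feature vectors.
    """
    n = len(frames)
    dim = len(frames[0])
    mean = [sum(f[d] for f in frames) / n for d in range(dim)]
    var = [sum((f[d] - mean[d]) ** 2 for f in frames) / n for d in range(dim)]
    return mean, var


if __name__ == "__main__":
    toy_frames = [[1.0, 2.0], [3.0, 6.0]]   # made-up two-dimensional features
    mean, var = global_mean_var(toy_frames)
    print(mean, var)
```

Every state of the prototype would then start from these same values, which is why this route needs no time-aligned transcriptions, in contrast to HInit.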

Training tools

- HRest: Baum-Welch re-estimation.
  Input: an initialized model set, time-aligned transcriptions
- HERest: performs embedded Baum-Welch training.
  Input: an initialized model set, transcriptions without time stamps
- HEAdapt: performs adaptation on a limited set of data
- HSmooth: smooths a set of context-dependent models according to the context-independent counterpart

Training example: RefRec

first pass:
prototype HMM → HCompV → cloning → HERest → realignment (HVite) → HERest → increase MC (HHEd) → HERest → realignment (HVite)

second pass:
prototype HMM + aligned transcriptions → HInit → HRest → HERest → increase MC (HHEd) → HERest

Recognition tools

grammar generation
- HLStats: creates a bigram from training data
- HParse: parses a user-defined grammar to produce a lattice

decoding
- HVite: performs Viterbi decoding

evaluation
- HResults: evaluates recognition results

Grammar definition (HParse)

[Figure: word network for the grammar, with nodes for the commands (move up/down/left/right, top, bottom, delete char/word/line/page, insert, end insert), quit, and three groups of noise models (sil, fil, spk)]

> cat grammar.bnf
$dir = up | down | left | right;
$mcmd = move $dir | top | bottom;
$item = char | word | line | page;
$dcmd = delete [$item];
$icmd = insert;
$ecmd = end [insert];
$cmd = $mcmd | $dcmd | $icmd | $ecmd;
$noise = sil | fil | spk;
({$noise} < $cmd {$noise} > quit {$noise})

- [ ] optional
- { } zero or more
- ( ) block
- < > loop
- << >> context-dependent loop
- | alternative
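One way to get a feel for what this grammar accepts is to hand-translate it into a regular expression over space-separated words. The sketch below does that in Python. It is an illustration only and does not use HParse. It also rests on an assumption: that the noise variables in the last grammar line are wrapped in the "zero or more" brackets and that the loop means "one or more". All names are hypothetical.

```python
import re

NOISE = r"(?:sil|fil|spk)"
DIR = r"(?:up|down|left|right)"
ITEM = r"(?:char|word|line|page)"
# $cmd = $mcmd | $dcmd | $icmd | $ecmd
CMD = rf"(?:move {DIR}|top|bottom|delete(?: {ITEM})?|insert|end(?: insert)?)"
# optional noise, one or more commands each followed by optional noise,
# then 'quit' and optional trailing noise
SENTENCE = re.compile(
    rf"(?:{NOISE} )*(?:{CMD} (?:{NOISE} )*)+quit(?: {NOISE})*"
)


def accepts(sentence):
    """True if the word sequence is covered by the hand-translated grammar."""
    normalized = " ".join(sentence.split())
    return SENTENCE.fullmatch(normalized) is not None


if __name__ == "__main__":
    for s in ["sil move up sil delete word quit sil", "top quit", "quit", "move quit"]:
        print(f"{s!r}: {accepts(s)}")
```

A regular expression is enough here because the grammar has no recursion; HParse compiles the same kind of description into a word network rather than a pattern.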

Grammar parsing (HParse) and recognition (HVite)

Parse grammar:
> HParse grammar.bnf grammar.slf

Run recognition on file(s):
> HVite -C offline.cfg -H mono_32_2.mmf -w grammar.slf -y lab dict.txt phones.lis audio_file.wav

Run recognition live:
> HVite -C live.cfg -H mono_32_2.mmf -w grammar.slf -y lab dict.txt phones.lis

Evaluation (HResults)

> HResults -I reference.mlf word.lst recognized.mlf
====================== HTK Results Analysis =======================
  Date: Thu Jan 18 16:17:53 2001
  Ref : nworkdir_train/testset.mlf
  Rec : nresults_train/mono_32_2/rec.mlf
------------------------ Overall Results --------------------------
SENT: %Correct=74.07 [H=994, S=348, N=1342]
WORD: %Corr=94.69, Acc=94.37 [H=9202, D=196, S=320, I=31, N=9718]
-------------------------------------------------------------------

N = total number, I = insertions, S = substitutions, D = deletions

correct: H = N − S − D

%correct: %Corr = H/N
accuracy: Acc = (H − I)/N = (N − S − D − I)/N
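The figures in this report can be checked against the definitions of %Corr and Acc with a short calculation. The sketch below uses the counts from the output above; the function name `word_scores` is hypothetical.

```python
def word_scores(hits, deletions, substitutions, insertions):
    """Return (%Corr, Acc) in percent, with N = H + D + S."""
    n = hits + deletions + substitutions
    corr = 100.0 * hits / n
    acc = 100.0 * (hits - insertions) / n
    return corr, acc


if __name__ == "__main__":
    corr, acc = word_scores(hits=9202, deletions=196, substitutions=320, insertions=31)
    print(f"WORD: %Corr={corr:.2f}, Acc={acc:.2f}")
    print(f"SENT: %Correct={100.0 * 994 / 1342:.2f}")
```

Accuracy is lower than %Corr because insertions are subtracted from the hits; with many insertions it can even become negative.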


Things that you should have before you start

- familiarity with a Unix-like shell
  - cd, ls, pwd, mkdir, cp, foreach
- text processing tools
  - perl, perl, perl, perl, perl
  - grep, gawk, tr, sed, find, cat, wc
- lots of patience
- the fabulous HTK Book
- a look at the RefRec scripts

The HTK tools

- data manipulation tools: HCopy, HQuant, HLEd, HHEd, HDMan, HBuild
- data visualization tools: HSLab, HList, HSGen
- training tools: HCompV, HInit, HRest, HERest, HEAdapt, HSmooth
- recognition tools: HLStats, HParse, HVite, HResults

The HTK data formats

data      formats
audio     many common formats plus HTK binary
features  HTK binary
labels    HTK (single or Master Label files), text
models    HTK (single or Master Macro files), text or binary
other     HTK text

[Figure: data flow. Training: audio → feature extraction → features; features, prototype HMM, dictionary and labels → training → model set. Recognition: audio → feature extraction → features; features, model set, dictionary and grammar → decoding → labels.]

Usage example (HList)

> HList

USAGE: HList [options] file

 Option                                    Default

 -d     Coerce observation to VQ symbols   off
 -e N   End at sample N                    0
 -h     Print source header info           off
 -i N   Set items per line to N            10
 -n N   Set num streams to N               1
 -o     Print observation structure        off
 -p     Playback audio                     off
 -r     Write raw output                   off
 -s N   Start at sample N                  0
 -t     Print target header info           off
 -z     Suppress printing data             on
 -A     Print command line arguments       off
 -C cf  Set config file to cf              default
 -D     Display configuration variables    off

Command line switches and options

> HList -e 1 -o -h feature_file
Source: feature_file
  Sample Bytes:  26       Sample Kind:   MFCC_0
  Num Comps:     13       Sample Period: 10000.0 us
  Num Samples:   336      File Format:   HTK
-------------------- Observation Structure ---------------------
x:      MFCC-1  MFCC-2  MFCC-3  MFCC-4  MFCC-5  MFCC-6  MFCC-7
        MFCC-8  MFCC-9 MFCC-10 MFCC-11 MFCC-12      C0
------------------------ Samples: 0->1 -------------------------
0:     -14.314  -3.318  -6.263  -7.245   7.192   4.997   0.830
         3.293   5.428   6.831   5.819   5.606  40.734
1:     -13.591  -4.756  -6.037  -3.362   3.541   3.510   2.867
         0.812   0.630   5.285   1.054   8.375  40.778
----------------------------- END ------------------------------

Configuration file

> cat config_file
SOURCEKIND = MFCC_0
TARGETKIND = MFCC_0_D_A

> HList -C config_file -e 0 -o -h feature_file
Source: feature_file
  Sample Bytes:  26       Sample Kind:   MFCC_0
  Num Comps:     13       Sample Period: 10000.0 us
  Num Samples:   336      File Format:   HTK
-------------------- Observation Structure ---------------------
x:      MFCC-1  MFCC-2  MFCC-3  MFCC-4  MFCC-5  MFCC-6  MFCC-7
        MFCC-8  MFCC-9 MFCC-10 MFCC-11 MFCC-12      C0   Del-1
         Del-2   Del-3   Del-4   Del-5   Del-6   Del-7   Del-8
         Del-9  Del-10  Del-11  Del-12   DelC0   Acc-1   Acc-2
         Acc-3   Acc-4   Acc-5   Acc-6   Acc-7   Acc-8   Acc-9
        Acc-10  Acc-11  Acc-12   AccC0
------------------------ Samples: 0->1 -------------------------
0:     -14.314  -3.318  -6.263  -7.245   7.192   4.997   0.830
         3.293   5.428   6.831   5.819   5.606  40.734  -0.107
        -0.180   0.731   1.134  -0.723  -0.676   1.083  -0.552
        -0.387  -0.592  -2.172  -0.030  -0.170   0.236   0.170
        -0.241  -0.226  -0.517  -0.244  -0.053   0.213  -0.029
         0.097   0.225  -0.294   0.051
----------------------------- END ------------------------------
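With the target kind MFCC_0_D_A, each 13-component vector grows to 39 components: the static coefficients, their first-order differences (Del) and second-order differences (Acc). The sketch below shows one common way to compute such coefficients, the regression formula d_t = Σ θ (c_{t+θ} − c_{t−θ}) / (2 Σ θ²). The window of 2 frames and the replication of the first and last frame at the edges are assumptions, not something stated in these slides, and the function names are hypothetical.

```python
def deltas(frames, window=2):
    """Regression-based differences over +/- `window` frames.

    Edges are handled by repeating the first or last frame (an assumption).
    """
    n = len(frames)
    dim = len(frames[0])
    denom = 2.0 * sum(th * th for th in range(1, window + 1))
    out = []
    for t in range(n):
        row = []
        for d in range(dim):
            num = 0.0
            for th in range(1, window + 1):
                ahead = frames[min(t + th, n - 1)][d]
                behind = frames[max(t - th, 0)][d]
                num += th * (ahead - behind)
            row.append(num / denom)
        out.append(row)
    return out


def append_dynamic(frames, window=2):
    """Stack static, delta and acceleration coefficients per frame."""
    d1 = deltas(frames, window)
    d2 = deltas(d1, window)
    return [s + a + b for s, a, b in zip(frames, d1, d2)]


if __name__ == "__main__":
    ramp = [[float(t)] * 13 for t in range(5)]   # made-up 13-dimensional frames
    print(len(append_dynamic(ramp)[0]))          # components per frame
```

On a linear ramp the delta in the middle of the sequence equals the slope, which is a convenient check that the weights and the denominator agree.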

File manipulation tools

- HCopy: converts from/to various data formats (audio, features)
- HQuant: quantizes speech (audio)
- HLEd: edits label and master label files
- HDMan: edits dictionary files
- HHEd: edits model and master macro files
- HBuild: converts language models in different formats (more in the recognition section)

Computing feature files (HCopy)

> cat config_file
# Feature configuration
TARGETKIND = MFCC_0
TARGETRATE = 100000.0
SAVECOMPRESSED = T
SAVEWITHCRC = T
WINDOWSIZE = 250000.0
USEHAMMING = T
PREEMCOEF = 0.97
NUMCHANS = 26
CEPLIFTER = 22
NUMCEPS = 12
ENORMALISE = F
# input file format (headerless 8 kHz 16 bit linear PCM)
SOURCEKIND = WAVEFORM
SOURCEFORMAT = NOHEAD
SOURCERATE = 1250

> HCopy -C config_file audio_file1 param_file1 audio_file2 ...

> HCopy -C config_file -S file_list
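The rate and window settings in this configuration are expressed in 100 ns units, which is easy to misread. The sketch below converts them to more familiar quantities. It assumes the values SOURCERATE = 1250, WINDOWSIZE = 250000.0 and TARGETRATE = 100000.0; the helper name `to_seconds` is hypothetical.

```python
HTK_UNIT_S = 100e-9   # one HTK time unit = 100 ns

SOURCERATE = 1250        # sample period of the input audio
WINDOWSIZE = 250000.0    # analysis window length
TARGETRATE = 100000.0    # frame period of the output features


def to_seconds(htk_time):
    """Convert a duration in 100 ns units to seconds."""
    return htk_time * HTK_UNIT_S


sample_rate_hz = 1.0 / to_seconds(SOURCERATE)
window_ms = to_seconds(WINDOWSIZE) * 1000.0
shift_ms = to_seconds(TARGETRATE) * 1000.0
window_samples = WINDOWSIZE / SOURCERATE
shift_samples = TARGETRATE / SOURCERATE

if __name__ == "__main__":
    print(f"sampling rate : {sample_rate_hz:.0f} Hz")
    print(f"window        : {window_ms:.0f} ms = {window_samples:.0f} samples")
    print(f"frame shift   : {shift_ms:.0f} ms = {shift_samples:.0f} samples")
```

The resulting 8 kHz sampling rate agrees with the comment in the configuration about the input file format.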

Label files

#!MLF!#
"filename1"
[start1 [end1]] label1 [score] {auxlabel [auxscore]} [comment]
[start2 [end2]] label2 [score] {auxlabel [auxscore]} [comment]
...
[startN [endN]] labelN [score] {auxlabel [auxscore]} [comment]
.
"filename2"
...

- [ ] = optional (0 or 1)
- { } = possible repetition (0, 1, 2, ...)
- time stamps are in 100 ns units (!): 10 ms = 100000


ccopy2008 Giampiero Salvi 1047

Things that you should have before you start

I familiarity with Unix-like shellI cd ls pwd mkdir cp foreach

I text processing toolsI perl perl perl perl perlI grep gawk tr sed find cat wc

I lots of patience

I the fabulous HTK Book

I a look at the RefRec scripts

ccopy2008 Giampiero Salvi 1147

Things that you should have before you start

I familiarity with Unix-like shellI cd ls pwd mkdir cp foreach

I text processing toolsI perl perl perl perl perlI grep gawk tr sed find cat wc

I lots of patience

I the fabulous HTK Book

I a look at the RefRec scripts

ccopy2008 Giampiero Salvi 1247

Things that you should have before you start

I familiarity with Unix-like shellI cd ls pwd mkdir cp foreach

I text processing toolsI perl perl perl perl perlI grep gawk tr sed find cat wc

I lots of patience

I the fabulous HTK Book

I a look at the RefRec scripts

ccopy2008 Giampiero Salvi 1347

Things that you should have before you start

I familiarity with Unix-like shellI cd ls pwd mkdir cp foreach

I text processing toolsI perl perl perl perl perlI grep gawk tr sed find cat wc

I lots of patience

I the fabulous HTK Book

I a look at the RefRec scripts

ccopy2008 Giampiero Salvi 1447

The HTK tools

I data manipulation toolsHCopy HQuant HLEd HHEd HDMan HBuild

I data visualization toolsHSLab HList HSGen

I training toolsHCompV HInit HRest HERest HEAdapt HSmooth

I recognition toolsHLStats HParse HVite HResults

ccopy2008 Giampiero Salvi 1547

The HTK tools

I data manipulation toolsHCopy HQuant HLEd HHEd HDMan HBuild

I data visualization toolsHSLab HList HSGen

I training toolsHCompV HInit HRest HERest HEAdapt HSmooth

I recognition toolsHLStats HParse HVite HResults

ccopy2008 Giampiero Salvi 1647

The HTK tools

I data manipulation toolsHCopy HQuant HLEd HHEd HDMan HBuild

I data visualization toolsHSLab HList HSGen

I training toolsHCompV HInit HRest HERest HEAdapt HSmooth

I recognition toolsHLStats HParse HVite HResults

ccopy2008 Giampiero Salvi 1747

The HTK tools

I data manipulation toolsHCopy HQuant HLEd HHEd HDMan HBuild

I data visualization toolsHSLab HList HSGen

I training toolsHCompV HInit HRest HERest HEAdapt HSmooth

I recognition toolsHLStats HParse HVite HResults

ccopy2008 Giampiero Salvi 1847

The HTK data formats

data formatsaudio many common formats plus HTK binaryfeatures HTK binarylabels HTK (single or Master Label files) textmodels HTK (single or Master Macro files) text or binaryother HTK text

audiofeatureextraction

training model set

prototype HMM

features

dictionary

labels

audiofeatureextraction

features

dictionary

model set

labelsdecoding

grammar

ccopy2008 Giampiero Salvi 1947

Usage example (HList)

gt HList

USAGE HList [options] file

Option Default

-d Coerce observation to VQ symbols off-e N End at sample N 0-h Print source header info off-i N Set items per line to N 10-n N Set num streams to N 1-o Print observation structure off-p Playback audio off-r Write raw output off-s N Start at sample N 0-t Print target header info off-z Suppress printing data on-A Print command line arguments off-C cf Set config file to cf default-D Display configuration variables off

ccopy2008 Giampiero Salvi 2047

Command line switches and options

gt HList -e 1 -o -h feature_file

Source feature_file

Sample Bytes 26 Sample Kind MFCC_0

Num Comps 13 Sample Period 100000 us

Num Samples 336 File Format HTK

-------------------- Observation Structure ---------------------

x MFCC-1 MFCC-2 MFCC-3 MFCC-4 MFCC-5 MFCC-6 MFCC-7

MFCC-8 MFCC-9 MFCC-10 MFCC-11 MFCC-12 C0

------------------------ Samples 0-gt1 -------------------------

0 -14314 -3318 -6263 -7245 7192 4997 0830

3293 5428 6831 5819 5606 40734

1 -13591 -4756 -6037 -3362 3541 3510 2867

0812 0630 5285 1054 8375 40778

----------------------------- END ------------------------------

ccopy2008 Giampiero Salvi 2147

Configuration file

gt cat config_file

SOURCEKIND = MFCC_0TARGETKIND = MFCC_0_D_A

gt HList -C config_file -e 0 -o -h feature_file

Source feature_fileSample Bytes 26 Sample Kind MFCC_0Num Comps 13 Sample Period 100000 usNum Samples 336 File Format HTK

-------------------- Observation Structure ---------------------x MFCC-1 MFCC-2 MFCC-3 MFCC-4 MFCC-5 MFCC-6 MFCC-7

MFCC-8 MFCC-9 MFCC-10 MFCC-11 MFCC-12 C0 Del-1Del-2 Del-3 Del-4 Del-5 Del-6 Del-7 Del-8Del-9 Del-10 Del-11 Del-12 DelC0 Acc-1 Acc-2Acc-3 Acc-4 Acc-5 Acc-6 Acc-7 Acc-8 Acc-9Acc-10 Acc-11 Acc-12 AccC0

------------------------ Samples 0-gt1 -------------------------0 -14314 -3318 -6263 -7245 7192 4997 0830

3293 5428 6831 5819 5606 40734 -0107-0180 0731 1134 -0723 -0676 1083 -0552-0387 -0592 -2172 -0030 -0170 0236 0170-0241 -0226 -0517 -0244 -0053 0213 -00290097 0225 -0294 0051

----------------------------- END ------------------------------

ccopy2008 Giampiero Salvi 2247

File manipulation tools

I HCopy converts fromto various data formats (audiofeatures)

I HQuant quantizes speech (audio)

I HLEd edits label and master label files

I HDMan edits dictionary files

I HHEd edits model and master macro files

I HBuild converts language models in different formats (morein recognition section)

ccopy2008 Giampiero Salvi 2347

Computing feature files (HCopy)

gt cat config_file

Feature configurationTARGETKIND = MFCC_0TARGETRATE = 1000000SAVECOMPRESSED = TSAVEWITHCRC = TWINDOWSIZE = 2500000USEHAMMING = TPREEMCOEF = 097NUMCHANS = 26CEPLIFTER = 22NUMCEPS = 12ENORMALISE = F input file format (headerless 8 kHz 16 bit linear PCM)SOURCEKIND = WAVEFORMSOURCEFORMAT = NOHEADSOURCERATE = 1250

gt HCopy -C config_file audio_file1 param_file1 audio_file2

gt HCopy -C config_file -S file_list

ccopy2008 Giampiero Salvi 2447

Label files

MLF

filename1[start1 [end1]] label1 [score] auxlabel [auxscore] [comment][start2 [end2]] label2 [score] auxlabel [auxscore] [comment][startN [endN]] labelN [score] auxlabel [auxscore] [comment]

filename2

I [] = optional (0 or 1)

I = possible repetition (0 1 2)

I time stamps are in 100ns units () 10ms = 100000

ccopy2008 Giampiero Salvi 2547

Label file example 1

gt cat alignedmlf

MLFa10001a1rec

0 6400000 sil ltsilgt6400000 8600000 f forra8600000 10400000 oe

10400000 11700000 r11700000 14100000 a14100000 14100000 sp14100000 29800001 sil ltsilgta10001i1rec

0 2600000 sil ltsilgt2600000 4900000 S sju4900000 8300000 uh8300000 8600000 a8600000 8600000 sp8600000 21600000 sil ltsilgt

ccopy2008 Giampiero Salvi 2647

Label file example 2 (HLEd)

gt HLEd -l rsquorsquo -d lexdic -i phonesmlf words2phonesled wordsmlf

gt cat wordsmlf

MLF

a10001a1rec

forra

a10001i1rec

sju

gt cat words2phonesled

EX

IS sil sil

gt cat phonesmlf

MLFa10001a1recsilfoeraspsila10001i1recsilSuhaspsil

ccopy2008 Giampiero Salvi 2747

Dictionary (HDMan)

WORD [OUTSYM] PRONPROB P1 P2 P3 P4

gt cat lexdic

forra f oe r a spsju S uh a sp

gt cat lex2dic

ltsilgt [] silforra f oe r a spsju 03 S uh a spsju 07 S uh sp

HMM definition files (HHEd)

    ~h "hmm_name"
    <BEGINHMM>
      <NUMSTATES> 5
      <STATE> 2
        <NUMMIXES> 2
        <MIXTURE> 1 0.8
          <MEAN> 4
            0.1 0.0 0.7 0.3
          <VARIANCE> 4
            0.2 0.1 0.1 0.1
        <MIXTURE> 2 0.2
          <MEAN> 4
            0.2 0.3 0.4 0.0
          <VARIANCE> 4
            0.1 0.1 0.1 0.2
      <STATE> 3
        ~s "state_name"
      <STATE> 4
        <NUMMIXES> 2
        <MIXTURE> 1 0.7
          ~m "mix_name"
        <MIXTURE> 2 0.3
          <MEAN> 4
            ~u "mean_name"
          <VARIANCE> 4
            ~v "variance_name"
      <TRANSP>
        ~t "transition_name"
    <ENDHMM>

[Figure: HMM state diagram with states 1 to 5]

Macro types highlighted in the definition:

  • ~h: HMM definition
  • ~s: state definition
  • ~m: Gaussian mixture component definition
  • ~u: mean vector definition
  • ~v: diagonal variance vector definition
  • ~t: transition matrix definition
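To make the role of the mixture weights, means and diagonal variances concrete, here is a small Python sketch that evaluates the output log likelihood of state 2 for one observation vector. It assumes the slide's numbers read as weights 0.8 and 0.2 with four-dimensional means and variances; the function names are illustrative and are not HTK calls.

```python
import math


def log_gaussian_diag(x, mean, var):
    """Log density of a Gaussian with diagonal covariance."""
    return sum(
        -0.5 * (math.log(2.0 * math.pi * v) + (xi - m) ** 2 / v)
        for xi, m, v in zip(x, mean, var)
    )


def log_gmm(x, mixtures):
    """Log output probability of a state: log sum_k w_k N(x; mean_k, var_k)."""
    terms = [math.log(w) + log_gaussian_diag(x, m, v) for w, m, v in mixtures]
    top = max(terms)                      # log-sum-exp for numerical stability
    return top + math.log(sum(math.exp(t - top) for t in terms))


# State 2 from the slide: (weight, mean, variance) per mixture component.
state2 = [
    (0.8, [0.1, 0.0, 0.7, 0.3], [0.2, 0.1, 0.1, 0.1]),
    (0.2, [0.2, 0.3, 0.4, 0.0], [0.1, 0.1, 0.1, 0.2]),
]
print(log_gmm([0.1, 0.0, 0.7, 0.3], state2))
```

Working in the log domain is a design choice of this sketch; it avoids underflow when the feature vectors have many dimensions.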

  • HSLab: graphical tool to label speech (use WaveSurfer instead)
  • HList: gives information about audio and feature files
  • HSGen: generates random sentences out of a regular grammar

Intermezzo: what do we know so far?

[Figure: training flow. audio → feature extraction → features; features, labels, dictionary and prototype HMM → training → model set]

[Figure: recognition flow. audio → feature extraction → features; features, model set, dictionary and grammar → decoding → labels]

Model initialization

The initialization procedure depends on the information available at that time.

  • HCompV: computes the overall mean and variance.
    Input: a prototype HMM
  • HInit: Viterbi segmentation + parameter estimation. For mixture distributions it uses K-means.
    Input: a prototype HMM, time-aligned transcriptions
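As an illustration of the "overall mean and variance" that a flat start relies on, the following Python sketch computes per-dimension statistics over all frames of a toy data set. It is not HCompV itself: the function name is hypothetical, and dividing by the number of frames (rather than one less) is an assumption of this sketch.

```python
def global_mean_var(frames):
    """Per-dimension mean and variance over all frames (equal-length vectors)."""
    n = len(frames)
    dim = len(frames[0])
    mean = [sum(f[d] for f in frames) / n for d in range(dim)]
    var = [sum((f[d] - mean[d]) ** 2 for f in frames) / n for d in range(dim)]
    return mean, var


frames = [[1.0, 2.0], [3.0, 2.0], [5.0, 8.0]]
mean, var = global_mean_var(frames)
print(mean, var)
```

Every state of the prototype would then start from the same mean and variance, which is why no time-aligned transcriptions are needed at this stage.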

Training tools

  • HRest: Baum-Welch re-estimation.
    Input: an initialized model set, time-aligned transcriptions
  • HERest: performs embedded Baum-Welch training.
    Input: an initialized model set, transcriptions without time alignment
  • HEAdapt: performs adaptation on a limited set of data
  • HSmooth: smooths a set of context-dependent models according to the context-independent counterpart

Training example (RefRec)

First pass:

    prototype HMM → HCompV → cloning → HERest → realignment (HVite)
    → HERest → increase mixture components (HHEd) → HERest → realignment (HVite)

Second pass:

    prototype HMM + aligned transcriptions → HInit → HRest → HERest
    → increase mixture components (HHEd) → HERest

Recognition tools

Grammar generation:

  • HLStats: creates a bigram from training data
  • HParse: parses a user-defined grammar to produce a lattice

Decoding:

  • HVite: performs Viterbi decoding

Evaluation:

  • HResults: evaluates recognition results
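For readers who have not met Viterbi decoding before, the idea can be sketched in a few lines of Python. This is a generic textbook version over a small state space in the log domain, not HVite's token-passing implementation; all names and the toy probabilities are illustrative.

```python
import math

NEG_INF = float("-inf")


def viterbi(log_init, log_trans, log_obs):
    """Return (best_log_prob, best_state_path).

    log_init[j]     : log probability of starting in state j
    log_trans[i][j] : log probability of moving from state i to state j
    log_obs[t][j]   : log likelihood of frame t in state j
    """
    n_states = len(log_init)
    score = [log_init[j] + log_obs[0][j] for j in range(n_states)]
    backptr = []
    for t in range(1, len(log_obs)):
        prev, score, ptr = score, [], []
        for j in range(n_states):
            best_i = max(range(n_states), key=lambda i: prev[i] + log_trans[i][j])
            score.append(prev[best_i] + log_trans[best_i][j] + log_obs[t][j])
            ptr.append(best_i)
        backptr.append(ptr)
    state = max(range(n_states), key=lambda j: score[j])
    best = score[state]
    path = [state]
    for ptr in reversed(backptr):       # follow the back pointers to the start
        state = ptr[state]
        path.append(state)
    return best, path[::-1]


def log(p):
    return math.log(p) if p > 0 else NEG_INF


log_init = [log(1.0), log(0.0)]
log_trans = [[log(0.5), log(0.5)], [log(0.0), log(1.0)]]
log_obs = [[log(0.9), log(0.1)], [log(0.8), log(0.2)], [log(0.1), log(0.9)]]
best, path = viterbi(log_init, log_trans, log_obs)
print(path, math.exp(best))
```

In a recognizer the states come from the HMMs of the words allowed by the grammar, and the best path is read back as a word or phone sequence.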

Grammar definition (HParse)

[Figure: word network for the grammar below, with branches for move (up, down, left, right), top, bottom, delete (char, word, line, page), insert, end insert and quit, and sil/fil/spk noise nodes before, between and after the commands]

    > cat grammar.bnf
    $dir = up | down | left | right;
    $mcmd = move $dir | top | bottom;
    $item = char | word | line | page;
    $dcmd = delete [$item];
    $icmd = insert;
    $ecmd = end [insert];
    $cmd = $mcmd | $dcmd | $icmd | $ecmd;
    $noise = sil | fil | spk;
    ({$noise} < $cmd {$noise} > quit {$noise})

  • [ ] optional
  • { } zero or more
  • ( ) block
  • < > loop
  • << >> context-dependent loop
  • | alternative
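One way to check that a grammar accepts what you expect is to generate random sentences from it, which is the job of HSGen in HTK. The Python sketch below hard-codes the grammar from the slide. It assumes that the noise terms are zero-or-more repetitions and that the loop repeats one or more times; the stopping probabilities (0.3 and 0.5) are arbitrary choices of this sketch.

```python
import random


def noise(rng):
    """{$noise}: zero or more of sil | fil | spk."""
    out = []
    while rng.random() < 0.3:
        out.append(rng.choice(["sil", "fil", "spk"]))
    return out


def command(rng):
    """$cmd = $mcmd | $dcmd | $icmd | $ecmd."""
    kind = rng.choice(["mcmd", "dcmd", "icmd", "ecmd"])
    if kind == "mcmd":   # move $dir | top | bottom
        direction = rng.choice(["up", "down", "left", "right"])
        return rng.choice([["move", direction], ["top"], ["bottom"]])
    if kind == "dcmd":   # delete [$item]
        return ["delete"] + rng.choice([[], ["char"], ["word"], ["line"], ["page"]])
    if kind == "icmd":
        return ["insert"]
    return ["end"] + rng.choice([[], ["insert"]])   # end [insert]


def sentence(rng):
    """({$noise} < $cmd {$noise} > quit {$noise})"""
    words = noise(rng)
    while True:                       # < ... > : one or more repetitions
        words += command(rng) + noise(rng)
        if rng.random() < 0.5:
            break
    return words + ["quit"] + noise(rng)


rng = random.Random(0)
for _ in range(3):
    print(" ".join(sentence(rng)))
```

Every generated sentence contains at least one command followed by "quit", with optional noise words around them.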

Grammar parsing (HParse) and recognition (HVite)

Parse grammar:

    > HParse grammar.bnf grammar.slf

Run recognition on file(s):

    > HVite -C offline.cfg -H mono_32_2.mmf -w grammar.slf \
        -y lab dict.txt phones.lis audio_file.wav

Run recognition live:

    > HVite -C live.cfg -H mono_32_2.mmf -w grammar.slf \
        -y lab dict.txt phones.lis

Evaluation (HResults)

    > HResults -I reference.mlf word.lst recognized.mlf
    ====================== HTK Results Analysis =======================
      Date: Thu Jan 18 16:17:53 2001
      Ref : nworkdir_traintestset.mlf
      Rec : nresults_trainmono_32_2rec.mlf
    ------------------------ Overall Results --------------------------
    SENT: %Correct=74.07 [H=994, S=348, N=1342]
    WORD: %Corr=94.69, Acc=94.37 [H=9202, D=196, S=320, I=31, N=9718]
    -------------------------------------------------------------------

  • N = total number, I = insertions, S = substitutions, D = deletions
  • correct: $H = N - S - D$
  • %correct: $\%\mathrm{Corr} = H / N$
  • accuracy: $\mathrm{Acc} = (H - I) / N = (N - S - D - I) / N$
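The definitions above can be checked against the numbers in the example output. A minimal Python sketch follows; the function name is illustrative, and the factor 100 assumes that HResults reports the ratios as percentages, which is consistent with the printed values.

```python
def word_scores(n, d, s, i):
    """Return (H, %Corr, Acc) from the reference count N and the error counts."""
    h = n - s - d                     # correct = N - S - D
    corr = 100.0 * h / n              # %Corr = H / N
    acc = 100.0 * (h - i) / n         # Acc = (H - I) / N
    return h, corr, acc


h, corr, acc = word_scores(n=9718, d=196, s=320, i=31)
print(h, round(corr, 2), round(acc, 2))

# Sentence level: 994 correct sentences out of 1342.
sent_correct = 100.0 * 994 / 1342
print(round(sent_correct, 2))
```

Accuracy is always lower than or equal to %Corr because insertions are penalized only in the accuracy figure.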


Things that you should have before you start

  • familiarity with a Unix-like shell
      • cd, ls, pwd, mkdir, cp, foreach, ...
  • text processing tools
      • perl, perl, perl, perl, perl
      • grep, gawk, tr, sed, find, cat, wc, ...
  • lots of patience
  • the fabulous HTK Book
  • a look at the RefRec scripts

The HTK tools

  • data manipulation tools: HCopy, HQuant, HLEd, HHEd, HDMan, HBuild
  • data visualization tools: HSLab, HList, HSGen
  • training tools: HCompV, HInit, HRest, HERest, HEAdapt, HSmooth
  • recognition tools: HLStats, HParse, HVite, HResults

The HTK data formats

    data      formats
    audio     many common formats plus HTK binary
    features  HTK binary
    labels    HTK (single or Master Label files), text
    models    HTK (single or Master Macro files), text or binary
    other     HTK text

[Figure: data flow. Training: audio → feature extraction → features; features, labels, dictionary and prototype HMM → training → model set. Recognition: audio → feature extraction → features; features, model set, dictionary and grammar → decoding → labels]

Usage example (HList)

    > HList

    USAGE: HList [options] file

     Option                                     Default

     -d      Coerce observation to VQ symbols   off
     -e N    End at sample N                    0
     -h      Print source header info           off
     -i N    Set items per line to N            10
     -n N    Set num streams to N               1
     -o      Print observation structure        off
     -p      Playback audio                     off
     -r      Write raw output                   off
     -s N    Start at sample N                  0
     -t      Print target header info           off
     -z      Suppress printing data             on
     -A      Print command line arguments       off
     -C cf   Set config file to cf              default
     -D      Display configuration variables    off

Command line switches and options

    > HList -e 1 -o -h feature_file
      Source: feature_file
      Sample Bytes:  26       Sample Kind:   MFCC_0
      Num Comps:     13       Sample Period: 10000.0 us
      Num Samples:   336      File Format:   HTK
    -------------------- Observation Structure ---------------------
    x:      MFCC-1  MFCC-2  MFCC-3  MFCC-4  MFCC-5  MFCC-6  MFCC-7
            MFCC-8  MFCC-9 MFCC-10 MFCC-11 MFCC-12      C0
    ------------------------ Samples: 0->1 -------------------------
    0:     -14.314  -3.318  -6.263  -7.245   7.192   4.997   0.830
             3.293   5.428   6.831   5.819   5.606  40.734
    1:     -13.591  -4.756  -6.037  -3.362   3.541   3.510   2.867
             0.812   0.630   5.285   1.054   8.375  40.778
    ----------------------------- END ------------------------------

Configuration file

    > cat config_file
    SOURCEKIND = MFCC_0
    TARGETKIND = MFCC_0_D_A

    > HList -C config_file -e 0 -o -h feature_file
      Source: feature_file
      Sample Bytes:  26       Sample Kind:   MFCC_0
      Num Comps:     13       Sample Period: 10000.0 us
      Num Samples:   336      File Format:   HTK
    -------------------- Observation Structure ---------------------
    x:      MFCC-1  MFCC-2  MFCC-3  MFCC-4  MFCC-5  MFCC-6  MFCC-7
            MFCC-8  MFCC-9 MFCC-10 MFCC-11 MFCC-12      C0   Del-1
             Del-2   Del-3   Del-4   Del-5   Del-6   Del-7   Del-8
             Del-9  Del-10  Del-11  Del-12   DelC0   Acc-1   Acc-2
             Acc-3   Acc-4   Acc-5   Acc-6   Acc-7   Acc-8   Acc-9
            Acc-10  Acc-11  Acc-12   AccC0
    ------------------------ Samples: 0->1 -------------------------
    0:     -14.314  -3.318  -6.263  -7.245   7.192   4.997   0.830
             3.293   5.428   6.831   5.819   5.606  40.734  -0.107
            -0.180   0.731   1.134  -0.723  -0.676   1.083  -0.552
            -0.387  -0.592  -2.172  -0.030  -0.170   0.236   0.170
            -0.241  -0.226  -0.517  -0.244  -0.053   0.213  -0.029
             0.097   0.225  -0.294   0.051
    ----------------------------- END ------------------------------
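The _D and _A qualifiers ask for delta and acceleration coefficients to be appended on the fly, which is why the 13 static coefficients become 39 values per frame. The Python sketch below shows the usual regression formula for deltas. The window of 2 frames and the repetition of the first and last frame at the edges are assumptions about the default behaviour, and the function names are illustrative.

```python
def deltas(frames, window=2):
    """Regression-based delta coefficients for a list of feature vectors."""
    n = len(frames)
    denom = 2.0 * sum(th * th for th in range(1, window + 1))
    out = []
    for t in range(n):
        row = []
        for d in range(len(frames[0])):
            num = 0.0
            for th in range(1, window + 1):
                ahead = frames[min(t + th, n - 1)][d]    # repeat last frame at the end
                behind = frames[max(t - th, 0)][d]       # repeat first frame at the start
                num += th * (ahead - behind)
            row.append(num / denom)
        out.append(row)
    return out


def add_d_a(frames):
    """Append deltas (_D) and accelerations (_A) to each static vector."""
    d = deltas(frames)
    a = deltas(d)                     # accelerations are deltas of the deltas
    return [s + dd + aa for s, dd, aa in zip(frames, d, a)]


# Toy input: 10 frames of 13 coefficients that grow by 1 per frame.
static = [[float(t)] * 13 for t in range(10)]
full = add_d_a(static)
print(len(full[5]), full[5][13], full[5][26])
```

For this linear ramp the deltas away from the edges equal the slope and the accelerations are zero.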

File manipulation tools

  • HCopy: converts from/to various data formats (audio, features)
  • HQuant: quantizes speech (audio)
  • HLEd: edits label and master label files
  • HDMan: edits dictionary files
  • HHEd: edits model and master macro files
  • HBuild: converts language models between different formats (more in the recognition section)

Computing feature files (HCopy)

    > cat config_file
    # Feature configuration
    TARGETKIND = MFCC_0
    TARGETRATE = 100000.0
    SAVECOMPRESSED = T
    SAVEWITHCRC = T
    WINDOWSIZE = 250000.0
    USEHAMMING = T
    PREEMCOEF = 0.97
    NUMCHANS = 26
    CEPLIFTER = 22
    NUMCEPS = 12
    ENORMALISE = F
    # input file format (headerless 8 kHz 16 bit linear PCM)
    SOURCEKIND = WAVEFORM
    SOURCEFORMAT = NOHEAD
    SOURCERATE = 1250

    > HCopy -C config_file audio_file1 param_file1 audio_file2 ...
    > HCopy -C config_file -S file_list
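The rate and window settings are easier to interpret once converted from 100 ns units. The Python sketch below does that conversion, assuming the configuration means a 10 ms frame shift (TARGETRATE = 100000.0), a 25 ms window (WINDOWSIZE = 250000.0) and a 125 µs sample period (SOURCERATE = 1250). The frame count formula is a simple approximation of this sketch and may differ from what HCopy produces.

```python
HTK_UNIT = 100e-9   # seconds per HTK time unit (100 ns)


def describe(source_rate, window_size, target_rate):
    """Convert HTK time settings into a sample rate and sizes in samples."""
    sample_period = source_rate * HTK_UNIT            # seconds per audio sample
    return {
        "sample_rate_hz": round(1.0 / sample_period),
        "window_samples": round(window_size / source_rate),
        "shift_samples": round(target_rate / source_rate),
    }


def num_frames(n_samples, window_samples, shift_samples):
    """Number of full analysis windows that fit in n_samples (assumed framing)."""
    if n_samples < window_samples:
        return 0
    return 1 + (n_samples - window_samples) // shift_samples


info = describe(source_rate=1250, window_size=250000.0, target_rate=100000.0)
print(info)
print(num_frames(8000, info["window_samples"], info["shift_samples"]))
```

With these settings one second of 8 kHz audio is cut into 200-sample windows that advance by 80 samples.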

Label files

    #!MLF!#
    "filename1"
    [start1 [end1]] label1 [score] {auxlabel [auxscore]} [comment]
    [start2 [end2]] label2 [score] {auxlabel [auxscore]} [comment]
    ...
    [startN [endN]] labelN [score] {auxlabel [auxscore]} [comment]
    .
    "filename2"
    ...

  • [ ] = optional (0 or 1)
  • { } = possible repetition (0, 1, 2, ...)
  • time stamps are in 100 ns units: 10 ms = 100000
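Because the 100 ns unit is easy to get wrong, a few helper functions can make label times readable. This is a minimal Python sketch; the helper names are hypothetical, and the 10 ms frame shift is an assumption taken from the feature configuration used in this tutorial.

```python
HTK_UNITS_PER_SECOND = 10_000_000   # one HTK time unit is 100 ns


def htk_to_seconds(t):
    """Convert an HTK time stamp to seconds."""
    return t / HTK_UNITS_PER_SECOND


def seconds_to_htk(sec):
    """Convert seconds to the nearest HTK time stamp."""
    return round(sec * HTK_UNITS_PER_SECOND)


def htk_to_frame(t, frame_shift=100_000):
    """Index of the frame starting at HTK time t, for a 10 ms frame shift."""
    return t // frame_shift


print(seconds_to_htk(0.010))      # 10 ms expressed in HTK units
print(htk_to_seconds(6_400_000))  # a label boundary expressed in seconds
print(htk_to_frame(6_400_000))    # the same boundary as a frame index
```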

Label file example 1

    > cat aligned.mlf
    #!MLF!#
    "a10001a1.rec"
    0 6400000 sil <sil>
    6400000 8600000 f forra
    8600000 10400000 oe
    10400000 11700000 r
    11700000 14100000 a
    14100000 14100000 sp
    14100000 29800001 sil <sil>
    .
    "a10001i1.rec"
    0 2600000 sil <sil>
    2600000 4900000 S sju
    4900000 8300000 uh
    8300000 8600000 a
    8600000 8600000 sp
    8600000 21600000 sil <sil>
    .

Label file example 2 (HLEd)

    > HLEd -l '' -d lex.dic -i phones.mlf words2phones.led words.mlf

    > cat words.mlf
    #!MLF!#
    "a10001a1.rec"
    forra
    .
    "a10001i1.rec"
    sju
    .

    > cat words2phones.led
    EX
    IS sil sil

    > cat phones.mlf
    #!MLF!#
    "a10001a1.rec"
    sil
    f
    oe
    r
    a
    sp
    sil
    .
    "a10001i1.rec"
    sil
    S
    uh
    a
    sp
    sil
    .
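The effect of the two edit commands can be mimicked in a few lines of Python: EX expands each word into the phones of its dictionary pronunciation, and IS sil sil inserts sil at the start and end of the transcription. This is a sketch of the behaviour seen in the example, not of HLEd itself; `expand_words` and the in-memory lexicon are illustrative.

```python
def expand_words(words, lexicon):
    """Mimic 'EX' followed by 'IS sil sil' on one word-level transcription."""
    phones = []
    for w in words:
        phones.extend(lexicon[w])      # EX: replace the word by its pronunciation
    return ["sil"] + phones + ["sil"]  # IS sil sil: add sil at start and end


# Pronunciations as listed in lex.dic.
lexicon = {"forra": ["f", "oe", "r", "a", "sp"], "sju": ["S", "uh", "a", "sp"]}
print(expand_words(["forra"], lexicon))
print(expand_words(["sju"], lexicon))
```

The two printed phone sequences match the entries in phones.mlf.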

Page 12: Giampiero Salvi - KTHgiampi/teaching/htk_tutorial.pdf · c 2008 Giampiero Salvi 1/47 HTK Tutorial Giampiero Salvi KTH (Royal Institute of Technology), Dep. of Speech, Music and Hearing,

ccopy2008 Giampiero Salvi 1247

Things that you should have before you start

I familiarity with Unix-like shellI cd ls pwd mkdir cp foreach

I text processing toolsI perl perl perl perl perlI grep gawk tr sed find cat wc

I lots of patience

I the fabulous HTK Book

I a look at the RefRec scripts

ccopy2008 Giampiero Salvi 1347

Things that you should have before you start

I familiarity with Unix-like shellI cd ls pwd mkdir cp foreach

I text processing toolsI perl perl perl perl perlI grep gawk tr sed find cat wc

I lots of patience

I the fabulous HTK Book

I a look at the RefRec scripts

ccopy2008 Giampiero Salvi 1447

The HTK tools

I data manipulation toolsHCopy HQuant HLEd HHEd HDMan HBuild

I data visualization toolsHSLab HList HSGen

I training toolsHCompV HInit HRest HERest HEAdapt HSmooth

I recognition toolsHLStats HParse HVite HResults

ccopy2008 Giampiero Salvi 1547

The HTK tools

I data manipulation toolsHCopy HQuant HLEd HHEd HDMan HBuild

I data visualization toolsHSLab HList HSGen

I training toolsHCompV HInit HRest HERest HEAdapt HSmooth

I recognition toolsHLStats HParse HVite HResults

ccopy2008 Giampiero Salvi 1647

The HTK tools

I data manipulation toolsHCopy HQuant HLEd HHEd HDMan HBuild

I data visualization toolsHSLab HList HSGen

I training toolsHCompV HInit HRest HERest HEAdapt HSmooth

I recognition toolsHLStats HParse HVite HResults

ccopy2008 Giampiero Salvi 1747

The HTK tools

I data manipulation toolsHCopy HQuant HLEd HHEd HDMan HBuild

I data visualization toolsHSLab HList HSGen

I training toolsHCompV HInit HRest HERest HEAdapt HSmooth

I recognition toolsHLStats HParse HVite HResults

ccopy2008 Giampiero Salvi 1847

The HTK data formats

data formatsaudio many common formats plus HTK binaryfeatures HTK binarylabels HTK (single or Master Label files) textmodels HTK (single or Master Macro files) text or binaryother HTK text

audiofeatureextraction

training model set

prototype HMM

features

dictionary

labels

audiofeatureextraction

features

dictionary

model set

labelsdecoding

grammar

ccopy2008 Giampiero Salvi 1947

Usage example (HList)

gt HList

USAGE HList [options] file

Option Default

-d Coerce observation to VQ symbols off-e N End at sample N 0-h Print source header info off-i N Set items per line to N 10-n N Set num streams to N 1-o Print observation structure off-p Playback audio off-r Write raw output off-s N Start at sample N 0-t Print target header info off-z Suppress printing data on-A Print command line arguments off-C cf Set config file to cf default-D Display configuration variables off

ccopy2008 Giampiero Salvi 2047

Command line switches and options

gt HList -e 1 -o -h feature_file

Source feature_file

Sample Bytes 26 Sample Kind MFCC_0

Num Comps 13 Sample Period 100000 us

Num Samples 336 File Format HTK

-------------------- Observation Structure ---------------------

x MFCC-1 MFCC-2 MFCC-3 MFCC-4 MFCC-5 MFCC-6 MFCC-7

MFCC-8 MFCC-9 MFCC-10 MFCC-11 MFCC-12 C0

------------------------ Samples 0-gt1 -------------------------

0 -14314 -3318 -6263 -7245 7192 4997 0830

3293 5428 6831 5819 5606 40734

1 -13591 -4756 -6037 -3362 3541 3510 2867

0812 0630 5285 1054 8375 40778

----------------------------- END ------------------------------

ccopy2008 Giampiero Salvi 2147

Configuration file

gt cat config_file

SOURCEKIND = MFCC_0TARGETKIND = MFCC_0_D_A

gt HList -C config_file -e 0 -o -h feature_file

Source feature_fileSample Bytes 26 Sample Kind MFCC_0Num Comps 13 Sample Period 100000 usNum Samples 336 File Format HTK

-------------------- Observation Structure ---------------------x MFCC-1 MFCC-2 MFCC-3 MFCC-4 MFCC-5 MFCC-6 MFCC-7

MFCC-8 MFCC-9 MFCC-10 MFCC-11 MFCC-12 C0 Del-1Del-2 Del-3 Del-4 Del-5 Del-6 Del-7 Del-8Del-9 Del-10 Del-11 Del-12 DelC0 Acc-1 Acc-2Acc-3 Acc-4 Acc-5 Acc-6 Acc-7 Acc-8 Acc-9Acc-10 Acc-11 Acc-12 AccC0

------------------------ Samples 0-gt1 -------------------------0 -14314 -3318 -6263 -7245 7192 4997 0830

3293 5428 6831 5819 5606 40734 -0107-0180 0731 1134 -0723 -0676 1083 -0552-0387 -0592 -2172 -0030 -0170 0236 0170-0241 -0226 -0517 -0244 -0053 0213 -00290097 0225 -0294 0051

----------------------------- END ------------------------------

ccopy2008 Giampiero Salvi 2247

File manipulation tools

I HCopy converts fromto various data formats (audiofeatures)

I HQuant quantizes speech (audio)

I HLEd edits label and master label files

I HDMan edits dictionary files

I HHEd edits model and master macro files

I HBuild converts language models in different formats (morein recognition section)

ccopy2008 Giampiero Salvi 2347

Computing feature files (HCopy)

gt cat config_file

Feature configurationTARGETKIND = MFCC_0TARGETRATE = 1000000SAVECOMPRESSED = TSAVEWITHCRC = TWINDOWSIZE = 2500000USEHAMMING = TPREEMCOEF = 097NUMCHANS = 26CEPLIFTER = 22NUMCEPS = 12ENORMALISE = F input file format (headerless 8 kHz 16 bit linear PCM)SOURCEKIND = WAVEFORMSOURCEFORMAT = NOHEADSOURCERATE = 1250

gt HCopy -C config_file audio_file1 param_file1 audio_file2

gt HCopy -C config_file -S file_list

ccopy2008 Giampiero Salvi 2447

Label files

MLF

filename1[start1 [end1]] label1 [score] auxlabel [auxscore] [comment][start2 [end2]] label2 [score] auxlabel [auxscore] [comment][startN [endN]] labelN [score] auxlabel [auxscore] [comment]

filename2

I [] = optional (0 or 1)

I = possible repetition (0 1 2)

I time stamps are in 100ns units () 10ms = 100000

ccopy2008 Giampiero Salvi 2547

Label file example 1

gt cat alignedmlf

MLFa10001a1rec

0 6400000 sil ltsilgt6400000 8600000 f forra8600000 10400000 oe

10400000 11700000 r11700000 14100000 a14100000 14100000 sp14100000 29800001 sil ltsilgta10001i1rec

0 2600000 sil ltsilgt2600000 4900000 S sju4900000 8300000 uh8300000 8600000 a8600000 8600000 sp8600000 21600000 sil ltsilgt

ccopy2008 Giampiero Salvi 2647

Label file example 2 (HLEd)

gt HLEd -l rsquorsquo -d lexdic -i phonesmlf words2phonesled wordsmlf

gt cat wordsmlf

MLF

a10001a1rec

forra

a10001i1rec

sju

gt cat words2phonesled

EX

IS sil sil

gt cat phonesmlf

MLFa10001a1recsilfoeraspsila10001i1recsilSuhaspsil

ccopy2008 Giampiero Salvi 2747

Dictionary (HDMan)

WORD [OUTSYM] PRONPROB P1 P2 P3 P4

gt cat lexdic

forra f oe r a spsju S uh a sp

gt cat lex2dic

ltsilgt [] silforra f oe r a spsju 03 S uh a spsju 07 S uh sp

ccopy2008 Giampiero Salvi 2847

HMM definition files (HHEd)

~h hmm_nameltBEGINHMMgt

ltNUMSTATESgt 5ltSTATEgt 2

ltNUMMIXESgt 2ltMIXTUREgt 1 08

ltMEANgt 401 00 07 03

ltVARIANCEgt 402 01 01 01

ltMIXTUREgt 2 02ltMEANgt 4

02 03 04 00ltVARIANCEgt 4

01 01 01 02ltSTATEgt 3

~s state_nameltSTATEgt 4

ltNUMMIXESgt 2ltMIXTUREgt 1 07

~m mix_nameltMIXTUREgt 2 03

ltMEANgt 4~u mean_name

ltVARIANCEgt 4~v variance_name

ltTRANSPgt~t transition_name

ltENDHMMgt

HMM definition (~h)

1

2 3 4

5

ccopy2008 Giampiero Salvi 2947

HMM definition files (HHEd)

~h hmm_nameltBEGINHMMgt

ltNUMSTATESgt 5ltSTATEgt 2

ltNUMMIXESgt 2ltMIXTUREgt 1 08

ltMEANgt 401 00 07 03

ltVARIANCEgt 402 01 01 01

ltMIXTUREgt 2 02ltMEANgt 4

02 03 04 00ltVARIANCEgt 4

01 01 01 02ltSTATEgt 3

~s state_nameltSTATEgt 4

ltNUMMIXESgt 2ltMIXTUREgt 1 07

~m mix_nameltMIXTUREgt 2 03

ltMEANgt 4~u mean_name

ltVARIANCEgt 4~v variance_name

ltTRANSPgt~t transition_name

ltENDHMMgt

State definition (~s)

1

2 3 4

5

2

ccopy2008 Giampiero Salvi 3047

HMM definition files (HHEd)

~h hmm_nameltBEGINHMMgt

ltNUMSTATESgt 5ltSTATEgt 2

ltNUMMIXESgt 2ltMIXTUREgt 1 08

ltMEANgt 401 00 07 03

ltVARIANCEgt 402 01 01 01

ltMIXTUREgt 2 02ltMEANgt 4

02 03 04 00ltVARIANCEgt 4

01 01 01 02ltSTATEgt 3

~s state_nameltSTATEgt 4

ltNUMMIXESgt 2ltMIXTUREgt 1 07

~m mix_nameltMIXTUREgt 2 03

ltMEANgt 4~u mean_name

ltVARIANCEgt 4~v variance_name

ltTRANSPgt~t transition_name

ltENDHMMgt

Gaussian mixture component definition(~m)

1

2 3 4

5

ccopy2008 Giampiero Salvi 3147

HMM definition files (HHEd)

~h hmm_nameltBEGINHMMgt

ltNUMSTATESgt 5ltSTATEgt 2

ltNUMMIXESgt 2ltMIXTUREgt 1 08

ltMEANgt 401 00 07 03

ltVARIANCEgt 402 01 01 01

ltMIXTUREgt 2 02ltMEANgt 4

02 03 04 00ltVARIANCEgt 4

01 01 01 02ltSTATEgt 3

~s state_nameltSTATEgt 4

ltNUMMIXESgt 2ltMIXTUREgt 1 07

~m mix_nameltMIXTUREgt 2 03

ltMEANgt 4~u mean_name

ltVARIANCEgt 4~v variance_name

ltTRANSPgt~t transition_name

ltENDHMMgt

Mean vector definition (~u)Diagonal variance vector definition (~v)

1

2 3 4

5

ccopy2008 Giampiero Salvi 3247

HMM definition files (HHEd)

~h hmm_nameltBEGINHMMgt

ltNUMSTATESgt 5ltSTATEgt 2

ltNUMMIXESgt 2ltMIXTUREgt 1 08

ltMEANgt 401 00 07 03

ltVARIANCEgt 402 01 01 01

ltMIXTUREgt 2 02ltMEANgt 4

02 03 04 00ltVARIANCEgt 4

01 01 01 02ltSTATEgt 3

~s state_nameltSTATEgt 4

ltNUMMIXESgt 2ltMIXTUREgt 1 07

~m mix_nameltMIXTUREgt 2 03

ltMEANgt 4~u mean_name

ltVARIANCEgt 4~v variance_name

ltTRANSPgt~t transition_name

ltENDHMMgt

Transition matrix definition (~t)

1

2 3 4

5

ccopy2008 Giampiero Salvi 3347

I HSLab graphical tool to label speech (use WaveSurferinstead)

I HList gives information about audio and feature files

I HSGen generates random sentences out of a regular grammar

ccopy2008 Giampiero Salvi 3447

Intermezzo what do we know so far

Training

audiofeatureextraction

prototype HMM

features

dictionary

labels

model settraining

Recognition

grammar

decodingaudiofeatureextraction

features

model set

labels

dictionary

ccopy2008 Giampiero Salvi 3547

model initialization

Initialization procedure depends on the information avaliable atthat time

I HCompV computes the overall mean and varianceInput a prototype HMM

I HInit Viterbi segmentation + parameter estimation Formixture distribution uses K-means

Input a prototype HMM time aligned transcriptions

Training tools

• HRest: Baum-Welch re-estimation.
  Input: an initialized model set, time-aligned transcriptions
• HERest: performs embedded Baum-Welch training.
  Input: an initialized model set, timeless transcriptions
• HEAdapt: performs adaptation on a limited set of data
• HSmooth: smooths a set of context-dependent models according to the context-independent counterpart

Training example: RefRec

first pass:
prototype HMM → HCompV → cloning → HERest → realignment (HVite) → HERest → increase MC (HHEd) → HERest → realignment (HVite)

second pass:
prototype HMM + aligned transcriptions → HInit → HRest → HERest → increase MC (HHEd) → HERest

Recognition tools

grammar generation
• HLStats: creates a bigram from training data
• HParse: parses a user-defined grammar to produce a lattice

decoding
• HVite: performs Viterbi decoding

evaluation
• HResults: evaluates recognition results

Grammar definition (HParse)

[Figure: word network. Node labels: sil, fil, spk (three groups); move with up, down, left, right; top; bottom; delete with page, line, word, char; insert; end insert; quit]

> cat grammar.bnf
$dir = up | down | left | right;
$mcmd = move $dir | top | bottom;
$item = char | word | line | page;
$dcmd = delete [$item];
$icmd = insert;
$ecmd = end [insert];
$cmd = $mcmd | $dcmd | $icmd | $ecmd;
$noise = sil | fil | spk;
( $noise < $cmd $noise > quit $noise )

• [] optional
• {} zero or more
• () block
• <> loop
• <<>> context dep. loop
• | alternative
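To make the notation concrete, the sketch below hand-translates the grammar above into Python and draws one random sentence from it, loosely in the spirit of HSGen. The function names, the 50% chance for optional parts and the cap on loop repetitions are assumptions of this sketch, not HTK behaviour.

```python
import random

NOISE = ["sil", "fil", "spk"]
DIRS = ["up", "down", "left", "right"]
ITEMS = ["char", "word", "line", "page"]


def random_command(rng):
    """One expansion of $cmd = $mcmd | $dcmd | $icmd | $ecmd."""
    kind = rng.choice(["mcmd", "dcmd", "icmd", "ecmd"])
    if kind == "mcmd":                       # move $dir | top | bottom
        alt = rng.choice(["move", "top", "bottom"])
        return ["move", rng.choice(DIRS)] if alt == "move" else [alt]
    if kind == "dcmd":                       # delete [$item]
        return ["delete"] + ([rng.choice(ITEMS)] if rng.random() < 0.5 else [])
    if kind == "icmd":                       # insert
        return ["insert"]
    return ["end"] + (["insert"] if rng.random() < 0.5 else [])  # end [insert]


def random_sentence(rng, max_commands=3):
    """( $noise < $cmd $noise > quit $noise ): the loop runs at least once."""
    words = [rng.choice(NOISE)]
    for _ in range(rng.randint(1, max_commands)):
        words += random_command(rng) + [rng.choice(NOISE)]
    return words + ["quit", rng.choice(NOISE)]


sentence = random_sentence(random.Random(0))
print(" ".join(sentence))
```

Every generated sentence starts with a noise word and ends with "quit" followed by a noise word, which is exactly what the top-level expression of the grammar requires.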


Grammar parsing (HParse) and recognition (HVite)

Parse grammar:
> HParse grammar.bnf grammar.slf

Run recognition on file(s):
> HVite -C offline.cfg -H mono_32_2.mmf -w grammar.slf -y lab dict.txt phones.lis audio_file.wav

Run recognition live:
> HVite -C live.cfg -H mono_32_2.mmf -w grammar.slf -y lab dict.txt phones.lis

Evaluation (HResults)

> HResults -I reference.mlf word.lst recognized.mlf
====================== HTK Results Analysis =======================
  Date: Thu Jan 18 16:17:53 2001
  Ref : nworkdir_traintestsetmlf
  Rec : nresults_trainmono_32_2recmlf
------------------------ Overall Results --------------------------
SENT: %Correct=74.07 [H=994, S=348, N=1342]
WORD: %Corr=94.69, Acc=94.37 [H=9202, D=196, S=320, I=31, N=9718]
-------------------------------------------------------------------

N = total number, I = insertions, S = substitutions, D = deletions

correct: $H = N - S - D$

%correct: $\%\mathrm{Corr} = H/N$, accuracy: $\mathrm{Acc} = (H - I)/N = (N - S - D - I)/N$
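The formulas can be checked against the WORD line of the report. The sketch below is plain Python arithmetic, not part of HTK; the function name `word_scores` is made up for this example.

```python
def word_scores(n, d, s, i):
    """Return (H, %Corr, Acc) from the counts reported by HResults."""
    h = n - s - d                  # correctly recognized words
    corr = 100.0 * h / n           # ignores insertions
    acc = 100.0 * (h - i) / n      # penalizes insertions as well
    return h, corr, acc


h, corr, acc = word_scores(n=9718, d=196, s=320, i=31)
print(f"H={h} %Corr={corr:.2f} Acc={acc:.2f}")
# prints: H=9202 %Corr=94.69 Acc=94.37
```

Accuracy is never higher than %Corr, and it can even become negative when there are many insertions.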


Things that you should have before you start

• familiarity with a Unix-like shell
  • cd, ls, pwd, mkdir, cp, foreach, ...
• text processing tools
  • perl, perl, perl, perl, perl
  • grep, gawk, tr, sed, find, cat, wc, ...
• lots of patience
• the fabulous HTK Book
• a look at the RefRec scripts

The HTK tools

• data manipulation tools: HCopy, HQuant, HLEd, HHEd, HDMan, HBuild
• data visualization tools: HSLab, HList, HSGen
• training tools: HCompV, HInit, HRest, HERest, HEAdapt, HSmooth
• recognition tools: HLStats, HParse, HVite, HResults


The HTK data formats

data      formats
audio     many common formats plus HTK binary
features  HTK binary
labels    HTK (single or Master Label files), text
models    HTK (single or Master Macro files), text or binary
other     HTK text

[Figure: two block diagrams. Training: audio, feature extraction, features, labels, dictionary, prototype HMM, training, model set. Decoding: audio, feature extraction, features, model set, dictionary, grammar, decoding, labels.]

Usage example (HList)

> HList

USAGE: HList [options] file ...

 Option                                    Default

 -d     Coerce observation to VQ symbols   off
 -e N   End at sample N                    0
 -h     Print source header info           off
 -i N   Set items per line to N            10
 -n N   Set num streams to N               1
 -o     Print observation structure        off
 -p     Playback audio                     off
 -r     Write raw output                   off
 -s N   Start at sample N                  0
 -t     Print target header info           off
 -z     Suppress printing data             on
 -A     Print command line arguments       off
 -C cf  Set config file to cf              default
 -D     Display configuration variables    off

Command line switches and options

> HList -e 1 -o -h feature_file

Source: feature_file
  Sample Bytes:  26       Sample Kind:   MFCC_0
  Num Comps:     13       Sample Period: 10000.0 us
  Num Samples:   336      File Format:   HTK
-------------------- Observation Structure ---------------------
x:    MFCC-1  MFCC-2  MFCC-3  MFCC-4  MFCC-5  MFCC-6  MFCC-7
      MFCC-8  MFCC-9 MFCC-10 MFCC-11 MFCC-12      C0
------------------------ Samples: 0->1 -------------------------
0:   -14.314  -3.318  -6.263  -7.245   7.192   4.997   0.830
       3.293   5.428   6.831   5.819   5.606  40.734
1:   -13.591  -4.756  -6.037  -3.362   3.541   3.510   2.867
       0.812   0.630   5.285   1.054   8.375  40.778
----------------------------- END ------------------------------

Configuration file

> cat config_file
SOURCEKIND = MFCC_0
TARGETKIND = MFCC_0_D_A

> HList -C config_file -e 0 -o -h feature_file

Source: feature_file
  Sample Bytes:  26       Sample Kind:   MFCC_0
  Num Comps:     13       Sample Period: 10000.0 us
  Num Samples:   336      File Format:   HTK
-------------------- Observation Structure ---------------------
x:    MFCC-1  MFCC-2  MFCC-3  MFCC-4  MFCC-5  MFCC-6  MFCC-7
      MFCC-8  MFCC-9 MFCC-10 MFCC-11 MFCC-12      C0   Del-1
       Del-2   Del-3   Del-4   Del-5   Del-6   Del-7   Del-8
       Del-9  Del-10  Del-11  Del-12   DelC0   Acc-1   Acc-2
       Acc-3   Acc-4   Acc-5   Acc-6   Acc-7   Acc-8   Acc-9
      Acc-10  Acc-11  Acc-12   AccC0
------------------------ Samples: 0->1 -------------------------
0:   -14.314  -3.318  -6.263  -7.245   7.192   4.997   0.830
       3.293   5.428   6.831   5.819   5.606  40.734  -0.107
      -0.180   0.731   1.134  -0.723  -0.676   1.083  -0.552
      -0.387  -0.592  -2.172  -0.030  -0.170   0.236   0.170
      -0.241  -0.226  -0.517  -0.244  -0.053   0.213  -0.029
       0.097   0.225  -0.294   0.051
----------------------------- END ------------------------------
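The _D and _A qualifiers append delta and acceleration coefficients, which is why the observation grows from 13 to 39 components. The sketch below shows the regression formula the HTK Book gives for deltas, d_t = Σ θ (c_{t+θ} − c_{t−θ}) / (2 Σ θ²), in plain Python. The window of 2 and the replication of the first and last frames at the edges are assumptions of this sketch, so do not expect it to reproduce HTK's numbers exactly.

```python
def deltas(seq, window=2):
    """Regression-based deltas for a list of frames (each a list of floats)."""
    n = len(seq)
    denom = 2.0 * sum(th * th for th in range(1, window + 1))
    out = []
    for t in range(n):
        frame = []
        for d in range(len(seq[t])):
            num = 0.0
            for th in range(1, window + 1):
                ahead = seq[min(t + th, n - 1)][d]   # replicate last frame
                behind = seq[max(t - th, 0)][d]      # replicate first frame
                num += th * (ahead - behind)
            frame.append(num / denom)
        out.append(frame)
    return out


# A 1-dimensional ramp: away from the edges its slope is exactly 1 per frame
static = [[float(t)] for t in range(6)]
delta = deltas(static)
accel = deltas(delta)   # accelerations are deltas of the deltas
```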

File manipulation tools

• HCopy: converts from/to various data formats (audio, features)
• HQuant: quantizes speech (audio)
• HLEd: edits label and master label files
• HDMan: edits dictionary files
• HHEd: edits model and master macro files
• HBuild: converts language models in different formats (more in the recognition section)

Computing feature files (HCopy)

> cat config_file
# Feature configuration
TARGETKIND = MFCC_0
TARGETRATE = 100000.0
SAVECOMPRESSED = T
SAVEWITHCRC = T
WINDOWSIZE = 250000.0
USEHAMMING = T
PREEMCOEF = 0.97
NUMCHANS = 26
CEPLIFTER = 22
NUMCEPS = 12
ENORMALISE = F
# input file format (headerless 8 kHz 16 bit linear PCM)
SOURCEKIND = WAVEFORM
SOURCEFORMAT = NOHEAD
SOURCERATE = 1250

> HCopy -C config_file audio_file1 param_file1 audio_file2 ...

> HCopy -C config_file -S file_list
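The rate and window settings are easy to misread because HTK expresses times in units of 100 ns. The sketch below converts the three values from the configuration above (read as SOURCERATE = 1250, WINDOWSIZE = 250000.0, TARGETRATE = 100000.0) into more familiar quantities. The frame-count formula is the usual one for framing without padding and is an assumption here; HTK's exact count may differ.

```python
HTK_UNIT = 100e-9  # HTK expresses times in units of 100 ns


def to_seconds(htk_time):
    """Convert a time in HTK units to seconds."""
    return htk_time * HTK_UNIT


source_rate = 1250        # SOURCERATE: sample period of the audio
window_size = 250000.0    # WINDOWSIZE: analysis window length
target_rate = 100000.0    # TARGETRATE: frame shift of the features

sample_rate_hz = round(1.0 / to_seconds(source_rate))   # 8 kHz audio
window_samples = round(window_size / source_rate)       # 25 ms window
shift_samples = round(target_rate / source_rate)        # 10 ms shift


def num_frames(num_samples):
    """Number of full windows that fit into num_samples (no padding)."""
    if num_samples < window_samples:
        return 0
    return 1 + (num_samples - window_samples) // shift_samples
```

With these settings one second of audio (8000 samples) yields 98 frames of 200 samples each, shifted by 80 samples.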

Label files

#!MLF!#
"filename1"
[start1 [end1]] label1 [score] {auxlabel [auxscore]} [comment]
[start2 [end2]] label2 [score] {auxlabel [auxscore]} [comment]
...
[startN [endN]] labelN [score] {auxlabel [auxscore]} [comment]
.
"filename2"
...

• [] = optional (0 or 1)
• {} = possible repetition (0, 1, 2, ...)
• time stamps are in 100 ns units (!): 10 ms = 100000

Label file example 1

> cat aligned.mlf
#!MLF!#
"a10001a1.rec"
0 6400000 sil <sil>
6400000 8600000 f forra
8600000 10400000 oe
10400000 11700000 r
11700000 14100000 a
14100000 14100000 sp
14100000 29800001 sil <sil>
.
"a10001i1.rec"
0 2600000 sil <sil>
2600000 4900000 S sju
4900000 8300000 uh
8300000 8600000 a
8600000 8600000 sp
8600000 21600000 sil <sil>
.
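A few lines of Python are enough to read label lines like these and convert the time stamps to seconds. This is only a sketch for the layout used in this example (start, end, phone label and an optional word as auxiliary label); it does not handle scores, comments or missing time stamps, and the function name is made up.

```python
def parse_label_line(line):
    """Split one 'start end label [auxlabel]' line of an aligned label file."""
    fields = line.split()
    start, end = int(fields[0]), int(fields[1])
    return {
        "start_s": start * 100e-9,   # time stamps are in 100 ns units
        "end_s": end * 100e-9,
        "label": fields[2],
        "word": fields[3] if len(fields) > 3 else None,
    }


lines = [
    "0 6400000 sil <sil>",
    "6400000 8600000 f forra",
    "8600000 10400000 oe",
]
segments = [parse_label_line(line) for line in lines]
```

The word appears only on the first phone of each word, so a missing fourth field means that the phone continues the current word.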

Label file example 2 (HLEd)

> HLEd -l '*' -d lex.dic -i phones.mlf words2phones.led words.mlf

> cat words.mlf
#!MLF!#
"a10001a1.rec"
forra
.
"a10001i1.rec"
sju
.

> cat words2phones.led
EX
IS sil sil

> cat phones.mlf
#!MLF!#
"a10001a1.rec"
sil
f
oe
r
a
sp
sil
.
"a10001i1.rec"
sil
S
uh
a
sp
sil
.
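What the two edit commands do in this example can be imitated in a few lines of Python: EX expands each word into its phones using the dictionary, and IS inserts a label at the start and at the end of the transcription. This is a rough analogue for illustration, not HLEd itself; the function names and the in-memory lexicon are assumptions.

```python
# Pronunciations as in lex.dic
LEXICON = {
    "forra": ["f", "oe", "r", "a", "sp"],
    "sju": ["S", "uh", "a", "sp"],
}


def expand(words, lexicon):
    """Rough analogue of EX: replace each word by its pronunciation."""
    phones = []
    for word in words:
        phones.extend(lexicon[word])
    return phones


def insert_start_end(labels, start, end):
    """Rough analogue of IS: add one label at the start and one at the end."""
    return [start] + labels + [end]


phones = insert_start_end(expand(["forra"], LEXICON), "sil", "sil")
print(" ".join(phones))
# prints: sil f oe r a sp sil
```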

Dictionary (HDMan)

WORD [OUTSYM] PRONPROB P1 P2 P3 P4 ...

> cat lex.dic
forra f oe r a sp
sju S uh a sp

> cat lex2.dic
<sil> [] sil
forra f oe r a sp
sju 0.3 S uh a sp
sju 0.7 S uh sp

HMM definition files (HHEd)

~h hmm_name
<BEGINHMM>
<NUMSTATES> 5
<STATE> 2
  <NUMMIXES> 2
  <MIXTURE> 1 0.8
    <MEAN> 4
      0.1 0.0 0.7 0.3
    <VARIANCE> 4
      0.2 0.1 0.1 0.1
  <MIXTURE> 2 0.2
    <MEAN> 4
      0.2 0.3 0.4 0.0
    <VARIANCE> 4
      0.1 0.1 0.1 0.2
<STATE> 3
  ~s state_name
<STATE> 4
  <NUMMIXES> 2
  <MIXTURE> 1 0.7
    ~m mix_name
  <MIXTURE> 2 0.3
    <MEAN> 4
      ~u mean_name
    <VARIANCE> 4
      ~v variance_name
<TRANSP>
  ~t transition_name
<ENDHMM>

HMM definition (~h)

[Figure: HMM state diagram, states 1 to 5]
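To see what the numbers under state 2 mean, the sketch below evaluates that state's output density: a mixture of two Gaussians with diagonal covariance, weighted 0.8 and 0.2. The decimal points are read back into the slide's values (for example "08" as 0.8), which is an assumption, and the code is plain Python rather than anything taken from HTK.

```python
import math


def diag_gaussian(x, mean, var):
    """Density of a Gaussian with diagonal covariance, evaluated at x."""
    log_p = 0.0
    for xi, m, v in zip(x, mean, var):
        log_p += -0.5 * (math.log(2.0 * math.pi * v) + (xi - m) ** 2 / v)
    return math.exp(log_p)


# (weight, mean vector, variance vector) for the two mixtures of state 2
STATE2 = [
    (0.8, [0.1, 0.0, 0.7, 0.3], [0.2, 0.1, 0.1, 0.1]),
    (0.2, [0.2, 0.3, 0.4, 0.0], [0.1, 0.1, 0.1, 0.2]),
]


def state_likelihood(x, mixtures):
    """Weighted sum of the component densities."""
    return sum(w * diag_gaussian(x, m, v) for w, m, v in mixtures)


near = state_likelihood([0.1, 0.0, 0.7, 0.3], STATE2)   # at the first mean
far = state_likelihood([5.0, 5.0, 5.0, 5.0], STATE2)    # far from both means
```

The mixture weights have to sum to one, and an observation close to one of the means scores much higher than one far away from both.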

The same definition is shown with different parts highlighted, one per macro type:

• State definition (~s)
• Gaussian mixture component definition (~m)
• Mean vector definition (~u) and diagonal variance vector definition (~v)
• Transition matrix definition (~t)


ccopy2008 Giampiero Salvi 1447

The HTK tools

I data manipulation toolsHCopy HQuant HLEd HHEd HDMan HBuild

I data visualization toolsHSLab HList HSGen

I training toolsHCompV HInit HRest HERest HEAdapt HSmooth

I recognition toolsHLStats HParse HVite HResults

ccopy2008 Giampiero Salvi 1547

The HTK tools

I data manipulation toolsHCopy HQuant HLEd HHEd HDMan HBuild

I data visualization toolsHSLab HList HSGen

I training toolsHCompV HInit HRest HERest HEAdapt HSmooth

I recognition toolsHLStats HParse HVite HResults

ccopy2008 Giampiero Salvi 1647

The HTK tools

I data manipulation toolsHCopy HQuant HLEd HHEd HDMan HBuild

I data visualization toolsHSLab HList HSGen

I training toolsHCompV HInit HRest HERest HEAdapt HSmooth

I recognition toolsHLStats HParse HVite HResults

ccopy2008 Giampiero Salvi 1747

The HTK tools

I data manipulation toolsHCopy HQuant HLEd HHEd HDMan HBuild

I data visualization toolsHSLab HList HSGen

I training toolsHCompV HInit HRest HERest HEAdapt HSmooth

I recognition toolsHLStats HParse HVite HResults

ccopy2008 Giampiero Salvi 1847

The HTK data formats

data formatsaudio many common formats plus HTK binaryfeatures HTK binarylabels HTK (single or Master Label files) textmodels HTK (single or Master Macro files) text or binaryother HTK text

audiofeatureextraction

training model set

prototype HMM

features

dictionary

labels

audiofeatureextraction

features

dictionary

model set

labelsdecoding

grammar

ccopy2008 Giampiero Salvi 1947

Usage example (HList)

gt HList

USAGE HList [options] file

Option Default

-d Coerce observation to VQ symbols off-e N End at sample N 0-h Print source header info off-i N Set items per line to N 10-n N Set num streams to N 1-o Print observation structure off-p Playback audio off-r Write raw output off-s N Start at sample N 0-t Print target header info off-z Suppress printing data on-A Print command line arguments off-C cf Set config file to cf default-D Display configuration variables off

ccopy2008 Giampiero Salvi 2047

Command line switches and options

gt HList -e 1 -o -h feature_file

Source feature_file

Sample Bytes 26 Sample Kind MFCC_0

Num Comps 13 Sample Period 100000 us

Num Samples 336 File Format HTK

-------------------- Observation Structure ---------------------

x MFCC-1 MFCC-2 MFCC-3 MFCC-4 MFCC-5 MFCC-6 MFCC-7

MFCC-8 MFCC-9 MFCC-10 MFCC-11 MFCC-12 C0

------------------------ Samples 0-gt1 -------------------------

0 -14314 -3318 -6263 -7245 7192 4997 0830

3293 5428 6831 5819 5606 40734

1 -13591 -4756 -6037 -3362 3541 3510 2867

0812 0630 5285 1054 8375 40778

----------------------------- END ------------------------------

ccopy2008 Giampiero Salvi 2147

Configuration file

gt cat config_file

SOURCEKIND = MFCC_0TARGETKIND = MFCC_0_D_A

gt HList -C config_file -e 0 -o -h feature_file

Source feature_fileSample Bytes 26 Sample Kind MFCC_0Num Comps 13 Sample Period 100000 usNum Samples 336 File Format HTK

-------------------- Observation Structure ---------------------x MFCC-1 MFCC-2 MFCC-3 MFCC-4 MFCC-5 MFCC-6 MFCC-7

MFCC-8 MFCC-9 MFCC-10 MFCC-11 MFCC-12 C0 Del-1Del-2 Del-3 Del-4 Del-5 Del-6 Del-7 Del-8Del-9 Del-10 Del-11 Del-12 DelC0 Acc-1 Acc-2Acc-3 Acc-4 Acc-5 Acc-6 Acc-7 Acc-8 Acc-9Acc-10 Acc-11 Acc-12 AccC0

------------------------ Samples 0-gt1 -------------------------0 -14314 -3318 -6263 -7245 7192 4997 0830

3293 5428 6831 5819 5606 40734 -0107-0180 0731 1134 -0723 -0676 1083 -0552-0387 -0592 -2172 -0030 -0170 0236 0170-0241 -0226 -0517 -0244 -0053 0213 -00290097 0225 -0294 0051

----------------------------- END ------------------------------

ccopy2008 Giampiero Salvi 2247

File manipulation tools

I HCopy converts fromto various data formats (audiofeatures)

I HQuant quantizes speech (audio)

I HLEd edits label and master label files

I HDMan edits dictionary files

I HHEd edits model and master macro files

I HBuild converts language models in different formats (morein recognition section)

ccopy2008 Giampiero Salvi 2347

Computing feature files (HCopy)

gt cat config_file

Feature configurationTARGETKIND = MFCC_0TARGETRATE = 1000000SAVECOMPRESSED = TSAVEWITHCRC = TWINDOWSIZE = 2500000USEHAMMING = TPREEMCOEF = 097NUMCHANS = 26CEPLIFTER = 22NUMCEPS = 12ENORMALISE = F input file format (headerless 8 kHz 16 bit linear PCM)SOURCEKIND = WAVEFORMSOURCEFORMAT = NOHEADSOURCERATE = 1250

gt HCopy -C config_file audio_file1 param_file1 audio_file2

gt HCopy -C config_file -S file_list

ccopy2008 Giampiero Salvi 2447

Label files

MLF

filename1[start1 [end1]] label1 [score] auxlabel [auxscore] [comment][start2 [end2]] label2 [score] auxlabel [auxscore] [comment][startN [endN]] labelN [score] auxlabel [auxscore] [comment]

filename2

I [] = optional (0 or 1)

I = possible repetition (0 1 2)

I time stamps are in 100ns units () 10ms = 100000

ccopy2008 Giampiero Salvi 2547

Label file example 1

gt cat alignedmlf

MLFa10001a1rec

0 6400000 sil ltsilgt6400000 8600000 f forra8600000 10400000 oe

10400000 11700000 r11700000 14100000 a14100000 14100000 sp14100000 29800001 sil ltsilgta10001i1rec

0 2600000 sil ltsilgt2600000 4900000 S sju4900000 8300000 uh8300000 8600000 a8600000 8600000 sp8600000 21600000 sil ltsilgt

ccopy2008 Giampiero Salvi 2647

Label file example 2 (HLEd)

gt HLEd -l rsquorsquo -d lexdic -i phonesmlf words2phonesled wordsmlf

gt cat wordsmlf

MLF

a10001a1rec

forra

a10001i1rec

sju

gt cat words2phonesled

EX

IS sil sil

gt cat phonesmlf

MLFa10001a1recsilfoeraspsila10001i1recsilSuhaspsil

ccopy2008 Giampiero Salvi 2747

Dictionary (HDMan)

WORD [OUTSYM] PRONPROB P1 P2 P3 P4

gt cat lexdic

forra f oe r a spsju S uh a sp

gt cat lex2dic

ltsilgt [] silforra f oe r a spsju 03 S uh a spsju 07 S uh sp

ccopy2008 Giampiero Salvi 2847

HMM definition files (HHEd)

~h hmm_nameltBEGINHMMgt

ltNUMSTATESgt 5ltSTATEgt 2

ltNUMMIXESgt 2ltMIXTUREgt 1 08

ltMEANgt 401 00 07 03

ltVARIANCEgt 402 01 01 01

ltMIXTUREgt 2 02ltMEANgt 4

02 03 04 00ltVARIANCEgt 4

01 01 01 02ltSTATEgt 3

~s state_nameltSTATEgt 4

ltNUMMIXESgt 2ltMIXTUREgt 1 07

~m mix_nameltMIXTUREgt 2 03

ltMEANgt 4~u mean_name

ltVARIANCEgt 4~v variance_name

ltTRANSPgt~t transition_name

ltENDHMMgt

HMM definition (~h)

1

2 3 4

5

ccopy2008 Giampiero Salvi 2947

HMM definition files (HHEd)

~h hmm_nameltBEGINHMMgt

ltNUMSTATESgt 5ltSTATEgt 2

ltNUMMIXESgt 2ltMIXTUREgt 1 08

ltMEANgt 401 00 07 03

ltVARIANCEgt 402 01 01 01

ltMIXTUREgt 2 02ltMEANgt 4

02 03 04 00ltVARIANCEgt 4

01 01 01 02ltSTATEgt 3

~s state_nameltSTATEgt 4

ltNUMMIXESgt 2ltMIXTUREgt 1 07

~m mix_nameltMIXTUREgt 2 03

ltMEANgt 4~u mean_name

ltVARIANCEgt 4~v variance_name

ltTRANSPgt~t transition_name

ltENDHMMgt

State definition (~s)

1

2 3 4

5

2

ccopy2008 Giampiero Salvi 3047

HMM definition files (HHEd)

~h hmm_nameltBEGINHMMgt

ltNUMSTATESgt 5ltSTATEgt 2

ltNUMMIXESgt 2ltMIXTUREgt 1 08

ltMEANgt 401 00 07 03

ltVARIANCEgt 402 01 01 01

ltMIXTUREgt 2 02ltMEANgt 4

02 03 04 00ltVARIANCEgt 4

01 01 01 02ltSTATEgt 3

~s state_nameltSTATEgt 4

ltNUMMIXESgt 2ltMIXTUREgt 1 07

~m mix_nameltMIXTUREgt 2 03

ltMEANgt 4~u mean_name

ltVARIANCEgt 4~v variance_name

ltTRANSPgt~t transition_name

ltENDHMMgt

Gaussian mixture component definition(~m)

1

2 3 4

5

ccopy2008 Giampiero Salvi 3147

HMM definition files (HHEd)

~h hmm_nameltBEGINHMMgt

ltNUMSTATESgt 5ltSTATEgt 2

ltNUMMIXESgt 2ltMIXTUREgt 1 08

ltMEANgt 401 00 07 03

ltVARIANCEgt 402 01 01 01

ltMIXTUREgt 2 02ltMEANgt 4

02 03 04 00ltVARIANCEgt 4

01 01 01 02ltSTATEgt 3

~s state_nameltSTATEgt 4

ltNUMMIXESgt 2ltMIXTUREgt 1 07

~m mix_nameltMIXTUREgt 2 03

ltMEANgt 4~u mean_name

ltVARIANCEgt 4~v variance_name

ltTRANSPgt~t transition_name

ltENDHMMgt

Mean vector definition (~u)Diagonal variance vector definition (~v)

1

2 3 4

5

ccopy2008 Giampiero Salvi 3247

HMM definition files (HHEd)

~h hmm_nameltBEGINHMMgt

ltNUMSTATESgt 5ltSTATEgt 2

ltNUMMIXESgt 2ltMIXTUREgt 1 08

ltMEANgt 401 00 07 03

ltVARIANCEgt 402 01 01 01

ltMIXTUREgt 2 02ltMEANgt 4

02 03 04 00ltVARIANCEgt 4

01 01 01 02ltSTATEgt 3

~s state_nameltSTATEgt 4

ltNUMMIXESgt 2ltMIXTUREgt 1 07

~m mix_nameltMIXTUREgt 2 03

ltMEANgt 4~u mean_name

ltVARIANCEgt 4~v variance_name

ltTRANSPgt~t transition_name

ltENDHMMgt

Transition matrix definition (~t)

1

2 3 4

5

ccopy2008 Giampiero Salvi 3347

I HSLab graphical tool to label speech (use WaveSurferinstead)

I HList gives information about audio and feature files

I HSGen generates random sentences out of a regular grammar

ccopy2008 Giampiero Salvi 3447

Intermezzo what do we know so far

Training

audiofeatureextraction

prototype HMM

features

dictionary

labels

model settraining

Recognition

grammar

decodingaudiofeatureextraction

features

model set

labels

dictionary

ccopy2008 Giampiero Salvi 3547

model initialization

Initialization procedure depends on the information avaliable atthat time

I HCompV computes the overall mean and varianceInput a prototype HMM

I HInit Viterbi segmentation + parameter estimation Formixture distribution uses K-means

Input a prototype HMM time aligned transcriptions

ccopy2008 Giampiero Salvi 3647

Traning tools

I HRest Baum-Welch re-estimation

Input an initialized model set time aligned transcriptions

I HERest performs embedded Baum-Welch trainingInput an initialized model set timeless transcriptions

I HEAdapt performs adaptation on a limited set of data

I HSmooth smoots a set of context-dependent modelsaccording to the context-independent counterpart

ccopy2008 Giampiero Salvi 3747

Training example RefRec

first pass

prototype HMM rarr HCompV rarr cloning rarr HERest rarr

realignment (HVite) rarr HERest rarr increase MC (HHEd) rarr

HERest rarr realignment (HVite)

second pass

prototype HMM + aligned transcriptions rarr HInit rarr HRest

rarr HERest rarr increase MC (HHEd) rarr HERest

ccopy2008 Giampiero Salvi 3847

Recognition tools

grammar generation

I HLStats creates bigram from training data

I HParse parses a user defined grammar to produce a lattice

decoding

I HVite performs Viterbi decoding

evaluation

I HResults evaluates recognition results

ccopy2008 Giampiero Salvi 3947

Grammar definition (HParse)

delete

page

line

word

char

insert

end insert

bottom

top

movedown

left

up

right

sil

fil

spk

sil

fil

spk

quit

sil

fil

spk

ccopy2008 Giampiero Salvi 4047

Grammar definition (HParse)

gt cat grammarbnf$dir = up | down | left | right$mcmd = move $dir | top | bottom$item = char | word | line | page$dcmd = delete [$item]$icmd = insert$ecmd = end [insert]$cmd = $mcmd | $dcmd | $icmd | $ecmd$noise = sil | fil | spk($noise lt $cmd $noise gt quit $noise)

I [] optional

I zero or moreI () blockI ltgt loop

I ltltgtgt context dep loop

I | alternative

[Figures: sub-networks of the grammar, shown one per slide: $mcmd (move followed by up, down, left or right; top; bottom), $dcmd (delete, optionally followed by char, word, line or page), $icmd and $ecmd (insert; end, optionally followed by insert), and $noise (sil, fil, spk).]

Grammar parsing (HParse) and recognition (HVite)

Parse grammar:
> HParse grammar.bnf grammar.slf

Run recognition on file(s):
> HVite -C offline.cfg -H mono_32_2.mmf -w grammar.slf -y lab dict.txt phones.lis audio_file.wav

Run recognition live:
> HVite -C live.cfg -H mono_32_2.mmf -w grammar.slf -y lab dict.txt phones.lis

Evaluation (HResults)

> HResults -I reference.mlf word.lst recognized.mlf
====================== HTK Results Analysis =======================
  Date: Thu Jan 18 16:17:53 2001
  Ref : nworkdir_traintestsetmlf
  Rec : nresults_trainmono_32_2recmlf
------------------------ Overall Results --------------------------
SENT: %Correct=74.07 [H=994, S=348, N=1342]
WORD: %Corr=94.69, Acc=94.37 [H=9202, D=196, S=320, I=31, N=9718]
-------------------------------------------------------------------

- N = total number, I = insertions, S = substitutions, D = deletions
- correct: $H = N - S - D$
- %correct: $\%\mathrm{Corr} = H / N$
- accuracy: $\mathrm{Acc} = (H - I) / N = (N - S - D - I) / N$
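The definitions above can be checked against the numbers in the WORD line of the report. The sketch below is a plain re-computation of the formulas, not HResults itself; the function name `word_scores` is made up for this example.

```python
def word_scores(n, s, d, i):
    """Return (H, %Corr, Acc) from the reference word count N,
    substitutions S, deletions D and insertions I."""
    h = n - s - d                  # correctly recognized words
    corr = 100.0 * h / n           # %Corr = H / N
    acc = 100.0 * (h - i) / n      # Acc = (H - I) / N
    return h, corr, acc


# Counts taken from the WORD line of the report above.
h, corr, acc = word_scores(n=9718, s=320, d=196, i=31)
print(h, round(corr, 2), round(acc, 2))
```

Accuracy is lower than %Corr because insertions are penalized only in the former.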


The HTK tools

- data manipulation tools: HCopy, HQuant, HLEd, HHEd, HDMan, HBuild
- data visualization tools: HSLab, HList, HSGen
- training tools: HCompV, HInit, HRest, HERest, HEAdapt, HSmooth
- recognition tools: HLStats, HParse, HVite, HResults


The HTK data formats

data      formats
audio     many common formats plus HTK binary
features  HTK binary
labels    HTK (single or Master Label files), text
models    HTK (single or Master Macro files), text or binary
other     HTK text

[Figure, training: audio -> feature extraction -> features; features, labels, dictionary and prototype HMM -> training -> model set.]
[Figure, decoding: audio -> feature extraction -> features; features, model set, dictionary and grammar -> decoding -> labels.]

Usage example (HList)

> HList

USAGE: HList [options] file

 Option                                     Default

 -d      Coerce observation to VQ symbols   off
 -e N    End at sample N                    0
 -h      Print source header info           off
 -i N    Set items per line to N            10
 -n N    Set num streams to N               1
 -o      Print observation structure        off
 -p      Playback audio                     off
 -r      Write raw output                   off
 -s N    Start at sample N                  0
 -t      Print target header info           off
 -z      Suppress printing data             on
 -A      Print command line arguments       off
 -C cf   Set config file to cf              default
 -D      Display configuration variables    off

Command line switches and options

> HList -e 1 -o -h feature_file
Source: feature_file
  Sample Bytes:  26       Sample Kind:   MFCC_0
  Num Comps:     13       Sample Period: 10000.0 us
  Num Samples:   336      File Format:   HTK
-------------------- Observation Structure ---------------------
x:      MFCC-1  MFCC-2  MFCC-3  MFCC-4  MFCC-5  MFCC-6  MFCC-7
        MFCC-8  MFCC-9 MFCC-10 MFCC-11 MFCC-12      C0
------------------------ Samples: 0->1 -------------------------
0:     -14.314  -3.318  -6.263  -7.245   7.192   4.997   0.830
         3.293   5.428   6.831   5.819   5.606  40.734
1:     -13.591  -4.756  -6.037  -3.362   3.541   3.510   2.867
         0.812   0.630   5.285   1.054   8.375  40.778
----------------------------- END ------------------------------
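The header fields printed by `-h` come from the fixed 12-byte header of an HTK parameter file: number of samples (32-bit integer), sample period in 100 ns units (32-bit integer), bytes per sample (16-bit integer) and parameter kind code (16-bit integer), big-endian by default. The sketch below decodes such a header; the function name and the synthetic example bytes are assumptions for illustration, and the qualifier bits of the kind code (such as `_0`) are not decoded.

```python
import struct


def parse_htk_header(raw):
    """Decode the 12-byte header of an HTK parameter file (big-endian)."""
    n_samples, samp_period, samp_size, parm_kind = struct.unpack(">iihh", raw[:12])
    return {
        "num_samples": n_samples,
        "sample_period_us": samp_period / 10.0,  # stored in 100 ns units
        "sample_bytes": samp_size,
        "parm_kind": parm_kind,                  # raw code, not decoded here
    }


# Synthetic header resembling the listing above: 336 samples, 10 ms period,
# 26 bytes per sample. The kind code 6 is only a placeholder value.
example = struct.pack(">iihh", 336, 100000, 26, 6)
info = parse_htk_header(example)
print(info)
```

With a real file, the same function could be applied to the first 12 bytes read in binary mode.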

Configuration file

> cat config_file
SOURCEKIND = MFCC_0
TARGETKIND = MFCC_0_D_A

> HList -C config_file -e 0 -o -h feature_file
Source: feature_file
  Sample Bytes:  26       Sample Kind:   MFCC_0
  Num Comps:     13       Sample Period: 10000.0 us
  Num Samples:   336      File Format:   HTK
-------------------- Observation Structure ---------------------
x:      MFCC-1  MFCC-2  MFCC-3  MFCC-4  MFCC-5  MFCC-6  MFCC-7
        MFCC-8  MFCC-9 MFCC-10 MFCC-11 MFCC-12      C0   Del-1
         Del-2   Del-3   Del-4   Del-5   Del-6   Del-7   Del-8
         Del-9  Del-10  Del-11  Del-12   DelC0   Acc-1   Acc-2
         Acc-3   Acc-4   Acc-5   Acc-6   Acc-7   Acc-8   Acc-9
        Acc-10  Acc-11  Acc-12   AccC0
------------------------ Samples: 0->1 -------------------------
0:     -14.314  -3.318  -6.263  -7.245   7.192   4.997   0.830
         3.293   5.428   6.831   5.819   5.606  40.734  -0.107
        -0.180   0.731   1.134  -0.723  -0.676   1.083  -0.552
        -0.387  -0.592  -2.172  -0.030  -0.170   0.236   0.170
        -0.241  -0.226  -0.517  -0.244  -0.053   0.213  -0.029
         0.097   0.225  -0.294   0.051
----------------------------- END ------------------------------

File manipulation tools

- HCopy: converts from/to various data formats (audio/features)
- HQuant: quantizes speech (audio)
- HLEd: edits label and master label files
- HDMan: edits dictionary files
- HHEd: edits model and master macro files
- HBuild: converts language models between different formats (more in the recognition section)

Computing feature files (HCopy)

> cat config_file
# Feature configuration
TARGETKIND = MFCC_0
TARGETRATE = 100000.0
SAVECOMPRESSED = T
SAVEWITHCRC = T
WINDOWSIZE = 250000.0
USEHAMMING = T
PREEMCOEF = 0.97
NUMCHANS = 26
CEPLIFTER = 22
NUMCEPS = 12
ENORMALISE = F
# input file format (headerless 8 kHz 16 bit linear PCM)
SOURCEKIND = WAVEFORM
SOURCEFORMAT = NOHEAD
SOURCERATE = 1250

> HCopy -C config_file audio_file1 param_file1 audio_file2 ...
> HCopy -C config_file -S file_list
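All time values in this configuration are expressed in units of 100 ns, which is easy to misread. The short sketch below converts the three rate and size settings into more familiar quantities; the helper name `samples` and the variable names are chosen for this example only.

```python
HTK_UNIT = 100e-9  # seconds per HTK time unit (100 ns)


def samples(duration_htk, source_rate_htk):
    """Number of waveform samples spanned by a duration.

    Both arguments are given in 100 ns units.
    """
    return round(duration_htk / source_rate_htk)


source_rate = 1250       # SOURCERATE: sample period of 125 us
window_size = 250000.0   # WINDOWSIZE: 25 ms analysis window
target_rate = 100000.0   # TARGETRATE: 10 ms frame shift

sampling_hz = 1.0 / (source_rate * HTK_UNIT)
window_samples = samples(window_size, source_rate)
shift_samples = samples(target_rate, source_rate)
print(round(sampling_hz), window_samples, shift_samples)
```

With these settings each feature vector is computed from a 200-sample window, and consecutive windows start 80 samples apart.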

Label files

#!MLF!#
"filename1"
[start1 [end1]] label1 [score] {auxlabel [auxscore]} [comment]
[start2 [end2]] label2 [score] {auxlabel [auxscore]} [comment]
...
[startN [endN]] labelN [score] {auxlabel [auxscore]} [comment]
.
"filename2"
...

- [] = optional (0 or 1)
- {} = possible repetition (0, 1, 2, ...)
- time stamps are in 100 ns units (!): 10 ms = 100000

Label file example 1

> cat aligned.mlf
#!MLF!#
"a10001a1.rec"
0 6400000 sil <sil>
6400000 8600000 f forra
8600000 10400000 oe
10400000 11700000 r
11700000 14100000 a
14100000 14100000 sp
14100000 29800001 sil <sil>
.
"a10001i1.rec"
0 2600000 sil <sil>
2600000 4900000 S sju
4900000 8300000 uh
8300000 8600000 a
8600000 8600000 sp
8600000 21600000 sil <sil>
.
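A Master Label File of this shape is simple to read with a few lines of code. The sketch below is a minimal reader under stated assumptions: it handles optional start and end times, a label and auxiliary labels, but not scores or comments, and it would misread a label that consists only of digits. The function name `parse_mlf` is hypothetical.

```python
def parse_mlf(text):
    """Parse a Master Label File into {filename: [(start, end, label, aux)]}.

    Times are converted from 100 ns units to seconds. Scores and comments
    are not supported in this sketch.
    """
    entries, current = {}, None
    for line in text.splitlines():
        line = line.strip()
        if not line or line == "#!MLF!#":
            continue
        if line.startswith('"'):          # a quoted name opens a new entry
            current = entries.setdefault(line.strip('"'), [])
        elif line == ".":                 # a single dot closes the entry
            current = None
        elif current is not None:
            fields = line.split()
            times = []
            while fields and fields[0].isdigit() and len(times) < 2:
                times.append(int(fields.pop(0)) * 1e-7)
            start = times[0] if times else None
            end = times[1] if len(times) > 1 else None
            current.append((start, end, fields[0], fields[1:]))
    return entries


example = '''#!MLF!#
"a10001a1.rec"
0 6400000 sil <sil>
6400000 8600000 f forra
8600000 10400000 oe
.
'''
labels = parse_mlf(example)
print(labels["a10001a1.rec"][1])
```

In this layout the auxiliary label on a phone line marks the word that starts at that phone.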

Label file example 2 (HLEd)

> HLEd -l '*' -d lex.dic -i phones.mlf words2phones.led words.mlf

> cat words.mlf
#!MLF!#
"a10001a1.rec"
forra
.
"a10001i1.rec"
sju
.

> cat words2phones.led
EX
IS sil sil

> cat phones.mlf
#!MLF!#
"a10001a1.rec"
sil
f
oe
r
a
sp
sil
.
"a10001i1.rec"
sil
S
uh
a
sp
sil
.

Dictionary (HDMan)

WORD [ '['OUTSYM']' ] [PRONPROB] P1 P2 P3 P4 ...

> cat lex.dic
forra   f oe r a sp
sju     S uh a sp

> cat lex2.dic
<sil>   []    sil
forra         f oe r a sp
sju     0.3   S uh a sp
sju     0.7   S uh sp
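The optional fields make dictionary lines slightly awkward to split by hand. The sketch below separates one line into word, output symbol, pronunciation probability and phone list. It is an illustration only: the function name is hypothetical, a missing probability is assumed to mean 1.0, and a phone whose name parses as a number would be misread.

```python
def parse_dict_line(line):
    """Split one dictionary line into (word, outsym, prob, phones).

    outsym is None when no [OUTSYM] field is present; "[]" yields the
    empty string. prob defaults to 1.0 (assumption) when not given.
    """
    fields = line.split()
    word, rest = fields[0], fields[1:]
    outsym = None
    if rest and rest[0].startswith("[") and rest[0].endswith("]"):
        outsym = rest.pop(0)[1:-1]
    prob = 1.0
    if rest:
        try:
            prob = float(rest[0])
            rest = rest[1:]
        except ValueError:
            pass  # the first remaining field is already a phone
    return word, outsym, prob, rest


for entry in ["<sil> [] sil", "forra f oe r a sp", "sju 0.3 S uh a sp"]:
    print(parse_dict_line(entry))
```

The two `sju` entries in lex2.dic then become two pronunciation variants of the same word with different probabilities.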

HMM definition files (HHEd)

~h "hmm_name"
<BEGINHMM>
  <NUMSTATES> 5
  <STATE> 2
    <NUMMIXES> 2
    <MIXTURE> 1 0.8
      <MEAN> 4
        0.1 0.0 0.7 0.3
      <VARIANCE> 4
        0.2 0.1 0.1 0.1
    <MIXTURE> 2 0.2
      <MEAN> 4
        0.2 0.3 0.4 0.0
      <VARIANCE> 4
        0.1 0.1 0.1 0.2
  <STATE> 3
    ~s "state_name"
  <STATE> 4
    <NUMMIXES> 2
    <MIXTURE> 1 0.7
      ~m "mix_name"
    <MIXTURE> 2 0.3
      <MEAN> 4
        ~u "mean_name"
      <VARIANCE> 4
        ~v "variance_name"
  <TRANSP>
    ~t "transition_name"
<ENDHMM>

[Figure: state diagram of the HMM, states 1 to 5.]

Macro types:
- ~h HMM definition
- ~s state definition
- ~m Gaussian mixture component definition
- ~u mean vector definition
- ~v diagonal variance vector definition
- ~t transition matrix definition
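To make the numbers in the definition concrete, the sketch below evaluates the output probability of state 2 for one observation vector, as a weighted sum of Gaussians with diagonal covariance. This is a textbook formulation written for illustration, not code taken from HTK; the function and variable names are assumptions.

```python
import math


def log_gauss_diag(x, mean, var):
    """Log density of a Gaussian with diagonal covariance."""
    return sum(
        -0.5 * (math.log(2.0 * math.pi * v) + (xi - m) ** 2 / v)
        for xi, m, v in zip(x, mean, var)
    )


def log_mixture(x, mixture):
    """Log output probability: log sum_k w_k N(x; mean_k, var_k)."""
    terms = [math.log(w) + log_gauss_diag(x, m, v) for w, m, v in mixture]
    top = max(terms)  # log-sum-exp for numerical stability
    return top + math.log(sum(math.exp(t - top) for t in terms))


# State 2 of the example definition: (weight, mean, variance) per mixture.
state2 = [
    (0.8, [0.1, 0.0, 0.7, 0.3], [0.2, 0.1, 0.1, 0.1]),
    (0.2, [0.2, 0.3, 0.4, 0.0], [0.1, 0.1, 0.1, 0.2]),
]

x = [0.1, 0.0, 0.7, 0.3]  # observation placed at the first mixture mean
print(log_mixture(x, state2))
```

Working in the log domain avoids underflow when the feature vectors have many more than four components.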

Data visualization tools

- HSLab: graphical tool to label speech (use WaveSurfer instead)
- HList: gives information about audio and feature files
- HSGen: generates random sentences out of a regular grammar

Intermezzo: what do we know so far?

[Figure, Training: audio -> feature extraction -> features; features, labels, dictionary and prototype HMM -> training -> model set.]
[Figure, Recognition: audio -> feature extraction -> features; features, model set, dictionary and grammar -> decoding -> labels.]

Model initialization

The initialization procedure depends on the information available at that time.

- HCompV: computes the overall mean and variance.
  Input: a prototype HMM
- HInit: Viterbi segmentation + parameter estimation. For mixture distributions it uses K-means.
  Input: a prototype HMM, time aligned transcriptions
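The statistic that HCompV provides can be sketched in a few lines: a per-dimension mean and variance over all training frames, which can then serve as the initial parameters of every state. The code below is a simplified illustration with made-up data; it normalizes by the number of frames, which is an assumption, and the function name is hypothetical.

```python
def global_mean_variance(frames):
    """Per-dimension mean and variance over all feature vectors."""
    n = len(frames)
    dim = len(frames[0])
    mean = [sum(f[d] for f in frames) / n for d in range(dim)]
    var = [sum((f[d] - mean[d]) ** 2 for f in frames) / n for d in range(dim)]
    return mean, var


# Three two-dimensional toy frames (not real MFCC values).
frames = [[1.0, 2.0], [3.0, 2.0], [5.0, 8.0]]
mean, var = global_mean_variance(frames)
print(mean, var)
```

No transcriptions are needed for this step, which is why it suits the case where only a prototype HMM is available.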

Training tools

- HRest: Baum-Welch re-estimation.
  Input: an initialized model set, time aligned transcriptions
- HERest: performs embedded Baum-Welch training.
  Input: an initialized model set, timeless transcriptions
- HEAdapt: performs adaptation on a limited set of data
- HSmooth: smooths a set of context-dependent models according to the context-independent counterpart

Training example: RefRec

first pass:
prototype HMM -> HCompV -> cloning -> HERest -> realignment (HVite) -> HERest -> increase MC (HHEd) -> HERest -> realignment (HVite)

second pass:
prototype HMM + aligned transcriptions -> HInit -> HRest -> HERest -> increase MC (HHEd) -> HERest

Recognition tools

grammar generation:
- HLStats: creates bigram from training data
- HParse: parses a user defined grammar to produce a lattice

decoding:
- HVite: performs Viterbi decoding

evaluation:
- HResults: evaluates recognition results
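Viterbi decoding finds the single most likely state sequence for a sequence of observations. The sketch below is a generic textbook version in the log domain, working on a small hand-made model. It is not HVite's implementation, and all names and numbers in it are chosen for illustration.

```python
import math


def viterbi(log_init, log_trans, log_obs):
    """Most likely state sequence of an HMM, all scores in the log domain.

    log_init[j]     : log probability of starting in state j
    log_trans[i][j] : log probability of moving from state i to state j
    log_obs[t][j]   : log likelihood of frame t in state j
    """
    n_states = len(log_init)
    score = [log_init[j] + log_obs[0][j] for j in range(n_states)]
    back = []
    for t in range(1, len(log_obs)):
        prev, score, pointers = score, [], []
        for j in range(n_states):
            best = max(range(n_states), key=lambda i: prev[i] + log_trans[i][j])
            pointers.append(best)
            score.append(prev[best] + log_trans[best][j] + log_obs[t][j])
        back.append(pointers)
    # Trace back from the best final state.
    state = max(range(n_states), key=lambda j: score[j])
    path = [state]
    for pointers in reversed(back):
        state = pointers[state]
        path.append(state)
    return path[::-1], max(score)


NEG = float("-inf")
log = math.log
path, best = viterbi(
    log_init=[0.0, NEG],
    log_trans=[[log(0.5), log(0.5)], [NEG, 0.0]],
    log_obs=[[log(0.9), log(0.1)], [log(0.2), log(0.8)], [log(0.1), log(0.9)]],
)
print(path, math.exp(best))
```

In a recognizer the states would come from the HMMs of the words allowed by the grammar network, and the observation scores from the mixture densities of each state.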

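Grammar definition (HParse)

[Figure: word network of the example grammar. Noise words (sil, fil, spk) precede and follow a loop over the commands (move with up, down, left or right; top; bottom; delete with optional char, word, line or page; insert; end with optional insert), and the network ends with quit followed by further noise words.]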



ccopy2008 Giampiero Salvi 1747

The HTK tools

I data manipulation toolsHCopy HQuant HLEd HHEd HDMan HBuild

I data visualization toolsHSLab HList HSGen

I training toolsHCompV HInit HRest HERest HEAdapt HSmooth

I recognition toolsHLStats HParse HVite HResults

ccopy2008 Giampiero Salvi 1847

The HTK data formats

data formatsaudio many common formats plus HTK binaryfeatures HTK binarylabels HTK (single or Master Label files) textmodels HTK (single or Master Macro files) text or binaryother HTK text

audiofeatureextraction

training model set

prototype HMM

features

dictionary

labels

audiofeatureextraction

features

dictionary

model set

labelsdecoding

grammar

ccopy2008 Giampiero Salvi 1947

Usage example (HList)

gt HList

USAGE HList [options] file

Option Default

-d Coerce observation to VQ symbols off-e N End at sample N 0-h Print source header info off-i N Set items per line to N 10-n N Set num streams to N 1-o Print observation structure off-p Playback audio off-r Write raw output off-s N Start at sample N 0-t Print target header info off-z Suppress printing data on-A Print command line arguments off-C cf Set config file to cf default-D Display configuration variables off

ccopy2008 Giampiero Salvi 2047

Command line switches and options

gt HList -e 1 -o -h feature_file

Source feature_file

Sample Bytes 26 Sample Kind MFCC_0

Num Comps 13 Sample Period 100000 us

Num Samples 336 File Format HTK

-------------------- Observation Structure ---------------------

x MFCC-1 MFCC-2 MFCC-3 MFCC-4 MFCC-5 MFCC-6 MFCC-7

MFCC-8 MFCC-9 MFCC-10 MFCC-11 MFCC-12 C0

------------------------ Samples 0-gt1 -------------------------

0 -14314 -3318 -6263 -7245 7192 4997 0830

3293 5428 6831 5819 5606 40734

1 -13591 -4756 -6037 -3362 3541 3510 2867

0812 0630 5285 1054 8375 40778

----------------------------- END ------------------------------

Configuration file

> cat config_file
SOURCEKIND = MFCC_0
TARGETKIND = MFCC_0_D_A

> HList -C config_file -e 0 -o -h feature_file

Source: feature_file
  Sample Bytes:  26       Sample Kind:   MFCC_0
  Num Comps:     13       Sample Period: 10000.0 us
  Num Samples:   336      File Format:   HTK
-------------------- Observation Structure ---------------------
x:      MFCC-1  MFCC-2  MFCC-3  MFCC-4  MFCC-5  MFCC-6  MFCC-7
        MFCC-8  MFCC-9 MFCC-10 MFCC-11 MFCC-12      C0   Del-1
         Del-2   Del-3   Del-4   Del-5   Del-6   Del-7   Del-8
         Del-9  Del-10  Del-11  Del-12   DelC0   Acc-1   Acc-2
         Acc-3   Acc-4   Acc-5   Acc-6   Acc-7   Acc-8   Acc-9
        Acc-10  Acc-11  Acc-12   AccC0
------------------------ Samples: 0->1 -------------------------
0:     -14.314  -3.318  -6.263  -7.245   7.192   4.997   0.830
         3.293   5.428   6.831   5.819   5.606  40.734  -0.107
        -0.180   0.731   1.134  -0.723  -0.676   1.083  -0.552
        -0.387  -0.592  -2.172  -0.030  -0.170   0.236   0.170
        -0.241  -0.226  -0.517  -0.244  -0.053   0.213  -0.029
         0.097   0.225  -0.294   0.051
----------------------------- END ------------------------------
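The _D and _A qualifiers extend each 13-component vector to 39 components by appending delta and acceleration coefficients. The sketch below shows the usual regression formula for such coefficients in Python. The window length of 2 and the repetition of the first and last frame at the edges are assumptions about the defaults, and the function names are hypothetical; the code is not meant to reproduce the numbers on the slide.

```python
def deltas(frames, window=2):
    """Regression-based deltas for a list of equal-length feature vectors."""
    n = len(frames)
    denom = 2 * sum(theta * theta for theta in range(1, window + 1))
    out = []
    for t in range(n):
        row = []
        for k in range(len(frames[t])):
            num = 0.0
            for theta in range(1, window + 1):
                ahead = frames[min(t + theta, n - 1)][k]  # repeat last frame at the end
                behind = frames[max(t - theta, 0)][k]     # repeat first frame at the start
                num += theta * (ahead - behind)
            row.append(num / denom)
        out.append(row)
    return out


def add_deltas(frames, window=2):
    """Append deltas and accelerations (deltas of deltas) to every frame."""
    d = deltas(frames, window)
    a = deltas(d, window)
    return [f + dk + ak for f, dk, ak in zip(frames, d, a)]


# Toy input: 6 frames of 13 identical components that grow linearly in time.
static = [[float(t)] * 13 for t in range(6)]
extended = add_deltas(static)
print(len(extended[0]))  # number of components per frame after extension
```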

File manipulation tools

• HCopy: converts from/to various data formats (audio/features)
• HQuant: quantizes speech (audio)
• HLEd: edits label and master label files
• HDMan: edits dictionary files
• HHEd: edits model and master macro files
• HBuild: converts language models in different formats (more in the recognition section)

Computing feature files (HCopy)

> cat config_file
# Feature configuration
TARGETKIND = MFCC_0
TARGETRATE = 100000.0
SAVECOMPRESSED = T
SAVEWITHCRC = T
WINDOWSIZE = 250000.0
USEHAMMING = T
PREEMCOEF = 0.97
NUMCHANS = 26
CEPLIFTER = 22
NUMCEPS = 12
ENORMALISE = F
# input file format (headerless 8 kHz 16 bit linear PCM)
SOURCEKIND = WAVEFORM
SOURCEFORMAT = NOHEAD
SOURCERATE = 1250

> HCopy -C config_file audio_file1 param_file1 audio_file2 ...

> HCopy -C config_file -S file_list
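All rates and durations in the configuration are expressed in units of 100 ns. The short Python sketch below converts the three time-related values above into more familiar units. The frame-count formula at the end is a common convention assumed here for illustration, not something stated on the slide.

```python
UNITS_PER_SECOND = 10_000_000  # one HTK time unit is 100 ns

SOURCERATE = 1250      # sample period of the audio
TARGETRATE = 100000    # frame shift
WINDOWSIZE = 250000    # analysis window length

sample_rate_hz = UNITS_PER_SECOND / SOURCERATE   # 8 kHz audio
frame_shift_ms = TARGETRATE / 10_000             # 10 ms between frames
window_ms = WINDOWSIZE / 10_000                  # 25 ms analysis window

window_samples = WINDOWSIZE // SOURCERATE        # samples per window
shift_samples = TARGETRATE // SOURCERATE         # samples per frame shift


def num_frames(num_samples):
    """Frames obtained when only complete windows are kept (assumed convention)."""
    if num_samples < window_samples:
        return 0
    return 1 + (num_samples - window_samples) // shift_samples


print(sample_rate_hz, frame_shift_ms, window_ms, window_samples, shift_samples)
print(num_frames(8000))  # one second of 8 kHz audio
```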

Label files

#!MLF!#
"filename1"
[start1 [end1]] label1 [score] {auxlabel [auxscore]} [comment]
[start2 [end2]] label2 [score] {auxlabel [auxscore]} [comment]
...
[startN [endN]] labelN [score] {auxlabel [auxscore]} [comment]
.
"filename2"
...

• [] = optional (0 or 1)
• {} = possible repetition (0, 1, 2, ...)
• time stamps are in 100 ns units (!): 10 ms = 100000

Label file example 1

> cat aligned.mlf
#!MLF!#
"a10001a1.rec"
0 6400000 sil <sil>
6400000 8600000 f forra
8600000 10400000 oe
10400000 11700000 r
11700000 14100000 a
14100000 14100000 sp
14100000 29800001 sil <sil>
.
"a10001i1.rec"
0 2600000 sil <sil>
2600000 4900000 S sju
4900000 8300000 uh
8300000 8600000 a
8600000 8600000 sp
8600000 21600000 sil <sil>
.
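A time-aligned MLF of this kind is easy to read programmatically. The following Python sketch parses entries that carry both a start and an end time and converts the 100 ns time stamps to seconds. It assumes the usual MLF framing (a `#!MLF!#` header, quoted file names, a lone `.` closing each entry); the function name `parse_mlf` and the sample text are for illustration only, and scores, comments and entries without time stamps are not handled.

```python
def parse_mlf(text):
    """Return {file name: [(start_s, end_s, label, aux_label), ...]}."""
    entries, current = {}, None
    for line in text.splitlines():
        line = line.strip()
        if not line or line == "#!MLF!#":
            continue
        if line.startswith('"'):            # a new file entry begins
            current = entries.setdefault(line.strip('"'), [])
        elif line == ".":                   # end of the current entry
            current = None
        elif current is not None:
            fields = line.split()
            start, end, label = int(fields[0]), int(fields[1]), fields[2]
            aux = fields[3] if len(fields) > 3 else None
            current.append((start / 1e7, end / 1e7, label, aux))
    return entries


sample = """#!MLF!#
"a10001a1.rec"
0 6400000 sil <sil>
6400000 8600000 f forra
8600000 10400000 oe
.
"""
segments = parse_mlf(sample)["a10001a1.rec"]
for start, end, label, aux in segments:
    print(f"{start:.2f} {end:.2f} {label} {aux or ''}")
```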

Label file example 2 (HLEd)

> HLEd -l '*' -d lex.dic -i phones.mlf words2phones.led words.mlf

> cat words.mlf
#!MLF!#
"a10001a1.rec"
forra
.
"a10001i1.rec"
sju
.

> cat words2phones.led
EX
IS sil sil

> cat phones.mlf
#!MLF!#
"a10001a1.rec"
sil
f
oe
r
a
sp
sil
.
"a10001i1.rec"
sil
S
uh
a
sp
sil
.
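The effect of the two edit commands can be mimicked in a few lines of Python: EX expands every word into its dictionary pronunciation, and IS sil sil adds a sil label at the start and at the end of each transcription. This is only a sketch of the behaviour seen in the example, with a hard-coded lexicon and a hypothetical function name; it is not a reimplementation of HLEd.

```python
# Pronunciations as listed in lex.dic in the example.
LEXICON = {
    "forra": ["f", "oe", "r", "a", "sp"],
    "sju": ["S", "uh", "a", "sp"],
}


def words_to_phones(words, lexicon, boundary="sil"):
    """Expand words to phones (EX) and add boundary labels (IS sil sil)."""
    phones = [boundary]
    for word in words:
        phones.extend(lexicon[word])   # raises KeyError for unknown words
    phones.append(boundary)
    return phones


print(words_to_phones(["forra"], LEXICON))
print(words_to_phones(["sju"], LEXICON))
```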

Dictionary (HDMan)

WORD [OUTSYM] PRONPROB P1 P2 P3 P4

> cat lex.dic
forra f oe r a sp
sju S uh a sp

> cat lex2.dic
<sil> [] sil
forra f oe r a sp
sju 0.3 S uh a sp
sju 0.7 S uh sp
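A dictionary line of this form can be split into its parts as sketched below in Python. The helper `parse_dict_line` is hypothetical. It treats a bracketed second field as the output symbol and a numeric field as the pronunciation probability, defaulting to 1.0 when none is given. A phone whose name happens to parse as a number (for example "nan" or "inf") would be misread by this simple approach.

```python
def parse_dict_line(line):
    """Split 'WORD [OUTSYM] PRONPROB P1 P2 ...' into its components."""
    fields = line.split()
    word, rest = fields[0], fields[1:]
    outsym = None
    if rest and rest[0].startswith("[") and rest[0].endswith("]"):
        outsym, rest = rest[0][1:-1], rest[1:]   # "[]" gives an empty output symbol
    prob = 1.0
    if rest:
        try:
            prob = float(rest[0])
            rest = rest[1:]
        except ValueError:
            pass                                  # first field is already a phone
    return word, outsym, prob, rest


lex2 = ["<sil> [] sil", "forra f oe r a sp", "sju 0.3 S uh a sp", "sju 0.7 S uh sp"]
parsed = [parse_dict_line(line) for line in lex2]
for entry in parsed:
    print(entry)
```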

HMM definition files (HHEd)

~h hmm_name
<BEGINHMM>
  <NUMSTATES> 5
  <STATE> 2
    <NUMMIXES> 2
    <MIXTURE> 1 0.8
      <MEAN> 4
        0.1 0.0 0.7 0.3
      <VARIANCE> 4
        0.2 0.1 0.1 0.1
    <MIXTURE> 2 0.2
      <MEAN> 4
        0.2 0.3 0.4 0.0
      <VARIANCE> 4
        0.1 0.1 0.1 0.2
  <STATE> 3
    ~s state_name
  <STATE> 4
    <NUMMIXES> 2
    <MIXTURE> 1 0.7
      ~m mix_name
    <MIXTURE> 2 0.3
      <MEAN> 4
        ~u mean_name
      <VARIANCE> 4
        ~v variance_name
  <TRANSP>
    ~t transition_name
<ENDHMM>

[Figure: five-state HMM (states 1 to 5); the part of the model covered by each macro type is highlighted in turn.]

HMM definition (~h)

State definition (~s)


Gaussian mixture component definition (~m)


Mean vector definition (~u)

Diagonal variance vector definition (~v)


Transition matrix definition (~t)
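To make the role of the mixture weights, mean vectors and diagonal variance vectors concrete, the Python sketch below evaluates the output log-likelihood of a state defined as a Gaussian mixture with diagonal covariances. The parameter values are those of state 2 in the example definition, read as weights 0.8 and 0.2 with four-dimensional means and variances. The function names are hypothetical, and the code illustrates the model only; it is not HTK's implementation.

```python
import math


def log_gaussian_diag(x, mean, var):
    """Log density of a Gaussian with diagonal covariance."""
    return -0.5 * sum(math.log(2 * math.pi * v) + (xi - m) ** 2 / v
                      for xi, m, v in zip(x, mean, var))


def state_log_likelihood(x, mixtures):
    """mixtures: list of (weight, mean, variance); returns log sum_m w_m N(x; mu_m, var_m)."""
    terms = [math.log(w) + log_gaussian_diag(x, m, v) for w, m, v in mixtures]
    peak = max(terms)
    return peak + math.log(sum(math.exp(t - peak) for t in terms))  # log-sum-exp


STATE_2 = [
    (0.8, [0.1, 0.0, 0.7, 0.3], [0.2, 0.1, 0.1, 0.1]),
    (0.2, [0.2, 0.3, 0.4, 0.0], [0.1, 0.1, 0.1, 0.2]),
]

near = state_log_likelihood([0.1, 0.0, 0.7, 0.3], STATE_2)  # at the first mean
far = state_log_likelihood([5.0, 5.0, 5.0, 5.0], STATE_2)   # far from both means
print(near, far)
```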


• HSLab: graphical tool to label speech (use WaveSurfer instead)
• HList: gives information about audio and feature files
• HSGen: generates random sentences out of a regular grammar

Intermezzo: what do we know so far?

[Figure: the two block diagrams from "The HTK data formats". Training: audio, feature extraction, features, labels, dictionary, prototype HMM, training, model set. Recognition: audio, feature extraction, features, model set, dictionary, grammar, decoding, labels.]

Model initialization

The initialization procedure depends on the information available at that time.

• HCompV: computes the overall mean and variance.
  Input: a prototype HMM.
• HInit: Viterbi segmentation + parameter estimation. For mixture distributions it uses K-means.
  Input: a prototype HMM, time-aligned transcriptions.
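The idea behind initializing from the overall mean and variance (a "flat start") can be sketched in Python as follows: the global statistics of all training frames are computed once and copied into every emitting state of the prototype. The function names are hypothetical and the variance is the plain population variance; this illustrates the principle rather than the exact computation performed by HCompV.

```python
def global_mean_variance(frames):
    """Per-dimension mean and variance over all frames."""
    n, dim = len(frames), len(frames[0])
    mean = [sum(f[k] for f in frames) / n for k in range(dim)]
    var = [sum((f[k] - mean[k]) ** 2 for f in frames) / n for k in range(dim)]
    return mean, var


def flat_start(frames, num_emitting_states):
    """Give every emitting state the same global mean and variance."""
    mean, var = global_mean_variance(frames)
    return [{"mean": list(mean), "variance": list(var)}
            for _ in range(num_emitting_states)]


# Toy data: two 2-dimensional frames; a 5-state prototype has 3 emitting states.
toy_frames = [[1.0, 2.0], [3.0, 6.0]]
states = flat_start(toy_frames, num_emitting_states=3)
print(states[0])
```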

Training tools

• HRest: Baum-Welch re-estimation.
  Input: an initialized model set, time-aligned transcriptions.
• HERest: performs embedded Baum-Welch training.
  Input: an initialized model set, transcriptions without time stamps.
• HEAdapt: performs adaptation on a limited set of data.
• HSmooth: smooths a set of context-dependent models according to the context-independent counterpart.

Training example: RefRec

First pass:
prototype HMM → HCompV → cloning → HERest → realignment (HVite) → HERest → increase MC (HHEd) → HERest → realignment (HVite)

Second pass:
prototype HMM + aligned transcriptions → HInit → HRest → HERest → increase MC (HHEd) → HERest

Recognition tools

Grammar generation:
• HLStats: creates a bigram from training data
• HParse: parses a user-defined grammar to produce a lattice

Decoding:
• HVite: performs Viterbi decoding

Evaluation:
• HResults: evaluates recognition results

Grammar definition (HParse)

[Figure: word network for the grammar. Optional noise words (sil, fil, spk), then a loop over the commands (move up/down/left/right, top, bottom, delete char/word/line/page, insert, end insert) with optional noise words, then quit, then optional noise words.]

> cat grammar.bnf
$dir = up | down | left | right;
$mcmd = move $dir | top | bottom;
$item = char | word | line | page;
$dcmd = delete [$item];
$icmd = insert;
$ecmd = end [insert];
$cmd = $mcmd | $dcmd | $icmd | $ecmd;
$noise = sil | fil | spk;
({$noise} < $cmd {$noise} > quit {$noise})

• [] optional
• {} zero or more
• () block
• <> loop
• <<>> context dep. loop
• | alternative
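Since this grammar is regular, the set of sentences it accepts can also be written as a regular expression. The Python sketch below builds one rule by rule, assuming the noise words are optional repetitions ({$noise}) before, between and after the commands, and that the loop requires at least one command before quit. The helper names are hypothetical, and the code only checks word sequences; it does not produce the lattice that HParse writes.

```python
import re


def word(*alternatives):
    """Regex for one of the given words followed by a single space."""
    return "(?:" + "|".join(alternatives) + ") "


DIR = word("up", "down", "left", "right")
MCMD = "(?:move " + DIR + "|" + word("top", "bottom") + ")"
ITEM = word("char", "word", "line", "page")
DCMD = "delete (?:" + ITEM + ")?"          # [$item] is optional
ICMD = "insert "
ECMD = "end (?:insert )?"                  # [insert] is optional
CMD = "(?:" + "|".join([MCMD, DCMD, ICMD, ECMD]) + ")"
NOISE = word("sil", "fil", "spk")

# ({$noise} < $cmd {$noise} > quit {$noise})
SENTENCE = re.compile(
    "(?:" + NOISE + ")*(?:" + CMD + "(?:" + NOISE + ")*)+quit (?:" + NOISE + ")*"
)


def accepts(sentence):
    """True if the space-separated word sequence is generated by the grammar."""
    normalized = " ".join(sentence.split()) + " "
    return SENTENCE.fullmatch(normalized) is not None


print(accepts("sil move up fil delete word insert end insert quit sil"))
print(accepts("move quit"))
```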


Grammar parsing (HParse) and recognition (HVite)

Parse grammar:
> HParse grammar.bnf grammar.slf

Run recognition on file(s):
> HVite -C offline.cfg -H mono_32_2.mmf -w grammar.slf \
    -y lab dict.txt phones.lis audio_file.wav

Run recognition live:
> HVite -C live.cfg -H mono_32_2.mmf -w grammar.slf \
    -y lab dict.txt phones.lis

Evaluation (HResults)

> HResults -I reference.mlf word.lst recognized.mlf
====================== HTK Results Analysis =======================
  Date: Thu Jan 18 16:17:53 2001
  Ref : nworkdir_traintestsetmlf
  Rec : nresults_trainmono_32_2recmlf
------------------------ Overall Results --------------------------
SENT: %Correct=74.07 [H=994, S=348, N=1342]
WORD: %Corr=94.69, Acc=94.37 [H=9202, D=196, S=320, I=31, N=9718]
-------------------------------------------------------------------

N = total number, I = insertions, S = substitutions, D = deletions

correct:   H = N - S - D
%correct:  %Corr = H / N
accuracy:  Acc = (H - I) / N = (N - S - D - I) / N
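The figures in the report can be checked directly from the definitions. The Python sketch below recomputes the word-level and sentence-level scores from the counts shown above, expressed as percentages. The function name is hypothetical.

```python
def word_scores(n, s, d, i):
    """Hits, percent correct and accuracy from HResults-style counts."""
    h = n - s - d                      # correct words
    return {
        "H": h,
        "Corr": 100.0 * h / n,         # ignores insertions
        "Acc": 100.0 * (h - i) / n,    # penalizes insertions
    }


words = word_scores(n=9718, s=320, d=196, i=31)
sentence_correct = 100.0 * 994 / 1342  # H / N at sentence level

print(f"WORD: %Corr={words['Corr']:.2f}, Acc={words['Acc']:.2f} [H={words['H']}]")
print(f"SENT: %Correct={sentence_correct:.2f}")
```

Accuracy is always at most the percent correct, because insertions are subtracted only in the former.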


I [] optional

I zero or moreI () blockI ltgt loop

I ltltgtgt context dep loop

I | alternative

bottom

top

movedown

left

up

right

ccopy2008 Giampiero Salvi 4247

Grammar definition (HParse)

gt cat grammarbnf$dir = up | down | left | right$mcmd = move $dir | top | bottom$item = char | word | line | page$dcmd = delete [$item]$icmd = insert$ecmd = end [insert]$cmd = $mcmd | $dcmd | $icmd | $ecmd$noise = sil | fil | spk($noise lt $cmd $noise gt quit $noise)

I [] optional

I zero or moreI () blockI ltgt loop

I ltltgtgt context dep loop

I | alternative

delete

page

line

word

char

ccopy2008 Giampiero Salvi 4347

Grammar definition (HParse)

gt cat grammarbnf$dir = up | down | left | right$mcmd = move $dir | top | bottom$item = char | word | line | page$dcmd = delete [$item]$icmd = insert$ecmd = end [insert]$cmd = $mcmd | $dcmd | $icmd | $ecmd$noise = sil | fil | spk($noise lt $cmd $noise gt quit $noise)

I [] optional

I zero or moreI () blockI ltgt loop

I ltltgtgt context dep loop

I | alternative

insert

end insert

ccopy2008 Giampiero Salvi 4447

Grammar definition (HParse)

gt cat grammarbnf$dir = up | down | left | right$mcmd = move $dir | top | bottom$item = char | word | line | page$dcmd = delete [$item]$icmd = insert$ecmd = end [insert]$cmd = $mcmd | $dcmd | $icmd | $ecmd$noise = sil | fil | spk($noise lt $cmd $noise gt quit $noise)

I [] optional

I zero or moreI () blockI ltgt loop

I ltltgtgt context dep loop

I | alternative

sil

fil

spk

ccopy2008 Giampiero Salvi 4547

Grammar definition (HParse)

gt cat grammarbnf$dir = up | down | left | right$mcmd = move $dir | top | bottom$item = char | word | line | page$dcmd = delete [$item]$icmd = insert$ecmd = end [insert]$cmd = $mcmd | $dcmd | $icmd | $ecmd$noise = sil | fil | spk($noise lt $cmd $noise gt quit $noise)

I [] optional

I zero or moreI () blockI ltgt loop

I ltltgtgt context dep loop

I | alternative

sil

fil

spk

ccopy2008 Giampiero Salvi 4647

Grammar parsing (HParse) and recognition (HVite)

Parse grammargt HParse grammarbnf grammarslf

Run recognition on file(s)gt HVite -C offlinecfg -H mono_32_2mmf -w grammarslf

-y lab dicttxt phoneslis audio_filewav

Run recognition livegt HVite -C livecfg -H mono_32_2mmf -w grammarslf

-y lab dicttxt phoneslis

ccopy2008 Giampiero Salvi 4747

Evaluation (HResults)

gt HResults -I referencemlf wordlst recognizedmlf

====================== HTK Results Analysis =======================

Date Thu Jan 18 161753 2001

Ref nworkdir_traintestsetmlf

Rec nresults_trainmono_32_2recmlf

------------------------ Overall Results --------------------------

SENT Correct=7407 [H=994 S=348 N=1342]

WORD Corr=9469 Acc=9437 [H=9202 D=196 S=320 I=31 N=9718]

-------------------------------------------------------------------

N = total number I = insertions S = substitutions D = deletions

correct H = N minus S minus D

correct Corr = HN accuracy Acc = HminusIN

= NminusSminusDminusIN

  • Outline
  • Introduction
  • Data formats and manipulation
  • Data visualization
  • Training
  • Recognition
Page 20: Giampiero Salvi - KTHgiampi/teaching/htk_tutorial.pdf · c 2008 Giampiero Salvi 1/47 HTK Tutorial Giampiero Salvi KTH (Royal Institute of Technology), Dep. of Speech, Music and Hearing,

Command line switches and options

> HList -e 1 -o -h feature_file

Source: feature_file
Sample Bytes:  26       Sample Kind:   MFCC_0
Num Comps:     13       Sample Period: 10000.0 us
Num Samples:   336      File Format:   HTK
-------------------- Observation Structure ---------------------
x:   MFCC-1  MFCC-2  MFCC-3  MFCC-4  MFCC-5  MFCC-6  MFCC-7
     MFCC-8  MFCC-9 MFCC-10 MFCC-11 MFCC-12      C0
------------------------ Samples: 0->1 -------------------------
0:  -14.314  -3.318  -6.263  -7.245   7.192   4.997   0.830
      3.293   5.428   6.831   5.819   5.606  40.734
1:  -13.591  -4.756  -6.037  -3.362   3.541   3.510   2.867
      0.812   0.630   5.285   1.054   8.375  40.778
----------------------------- END ------------------------------

Configuration file

> cat config_file
SOURCEKIND = MFCC_0
TARGETKIND = MFCC_0_D_A

> HList -C config_file -e 0 -o -h feature_file

Source: feature_file
Sample Bytes:  26       Sample Kind:   MFCC_0
Num Comps:     13       Sample Period: 10000.0 us
Num Samples:   336      File Format:   HTK
-------------------- Observation Structure ---------------------
x:   MFCC-1  MFCC-2  MFCC-3  MFCC-4  MFCC-5  MFCC-6  MFCC-7
     MFCC-8  MFCC-9 MFCC-10 MFCC-11 MFCC-12      C0   Del-1
      Del-2   Del-3   Del-4   Del-5   Del-6   Del-7   Del-8
      Del-9  Del-10  Del-11  Del-12   DelC0   Acc-1   Acc-2
      Acc-3   Acc-4   Acc-5   Acc-6   Acc-7   Acc-8   Acc-9
     Acc-10  Acc-11  Acc-12   AccC0
------------------------ Samples: 0->1 -------------------------
0:  -14.314  -3.318  -6.263  -7.245   7.192   4.997   0.830
      3.293   5.428   6.831   5.819   5.606  40.734  -0.107
     -0.180   0.731   1.134  -0.723  -0.676   1.083  -0.552
     -0.387  -0.592  -2.172  -0.030  -0.170   0.236   0.170
     -0.241  -0.226  -0.517  -0.244  -0.053   0.213  -0.029
      0.097   0.225  -0.294   0.051
----------------------------- END ------------------------------

File manipulation tools

• HCopy: converts from/to various data formats (audio, features)
• HQuant: quantizes speech (audio)
• HLEd: edits label and master label files
• HDMan: edits dictionary files
• HHEd: edits model and master macro files
• HBuild: converts language models in different formats (more in the recognition section)

Computing feature files (HCopy)

> cat config_file
# Feature configuration
TARGETKIND = MFCC_0
TARGETRATE = 100000.0
SAVECOMPRESSED = T
SAVEWITHCRC = T
WINDOWSIZE = 250000.0
USEHAMMING = T
PREEMCOEF = 0.97
NUMCHANS = 26
CEPLIFTER = 22
NUMCEPS = 12
ENORMALISE = F
# input file format (headerless 8 kHz 16 bit linear PCM)
SOURCEKIND = WAVEFORM
SOURCEFORMAT = NOHEAD
SOURCERATE = 1250

> HCopy -C config_file audio_file1 param_file1 audio_file2 ...
> HCopy -C config_file -S file_list
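
The time-related values in the configuration above are expressed in units of 100 ns. As a rough check of those units, the following Python sketch converts them into a sampling rate and a frame count. The function names are hypothetical, and the frame-count formula (whole windows only, no padding at the edges) is an assumption, not a statement of what HCopy does internally.

```python
HTK_UNIT = 1e-7  # HTK expresses times in units of 100 ns


def htk_to_seconds(value):
    """Convert a duration given in 100 ns units to seconds."""
    return value * HTK_UNIT


def sampling_rate_hz(source_rate):
    """SOURCERATE is the sample period in 100 ns units."""
    return 1.0 / htk_to_seconds(source_rate)


def frame_count(n_samples, source_rate, window_size, target_rate):
    """Whole analysis windows that fit in n_samples (assumed: no edge padding)."""
    win = round(window_size / source_rate)    # samples per window
    shift = round(target_rate / source_rate)  # samples per frame shift
    if n_samples < win:
        return 0
    return (n_samples - win) // shift + 1


if __name__ == "__main__":
    # Values taken from the configuration file above.
    print(sampling_rate_hz(1250))
    print(frame_count(8000, 1250, 250000.0, 100000.0))
```

With SOURCERATE = 1250 the sample period is 125 µs, which corresponds to the 8 kHz stated in the configuration comment; WINDOWSIZE and TARGETRATE then correspond to a 25 ms window and a 10 ms frame shift.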

Label files

#!MLF!#
"filename1"
[start1 [end1]] label1 [score] {auxlabel [auxscore]} [comment]
[start2 [end2]] label2 [score] {auxlabel [auxscore]} [comment]
...
[startN [endN]] labelN [score] {auxlabel [auxscore]} [comment]
.
"filename2"
...

• [.] = optional (0 or 1)
• {.} = possible repetition (0, 1, 2, ...)
• time stamps are in 100 ns units (!): 10 ms = 100000

Label file example 1

> cat aligned.mlf
#!MLF!#
"a10001a1.rec"
0 6400000 sil <sil>
6400000 8600000 f forra
8600000 10400000 oe
10400000 11700000 r
11700000 14100000 a
14100000 14100000 sp
14100000 29800001 sil <sil>
.
"a10001i1.rec"
0 2600000 sil <sil>
2600000 4900000 S sju
4900000 8300000 uh
8300000 8600000 a
8600000 8600000 sp
8600000 21600000 sil <sil>
.
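
To make the layout of an aligned master label file concrete, here is a minimal Python sketch that reads such a file into a dictionary and converts the time stamps to seconds. The function name `parse_mlf` is hypothetical, and the parser assumes the simple layout of this example: quoted file names, a lone period closing each entry, no score fields, and labels that are not purely numeric.

```python
def parse_mlf(text):
    """Parse MLF text into {filename: [(start_s, end_s, label, aux), ...]}."""
    entries, current = {}, None
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line == "#!MLF!#":
            continue
        if line.startswith('"'):            # a quoted file name opens an entry
            current = entries.setdefault(line.strip('"'), [])
        elif line == ".":                   # a lone period closes the entry
            current = None
        elif current is not None:
            fields = line.split()
            times = []
            # Up to two leading integers are start and end times (100 ns units).
            while fields and fields[0].isdigit() and len(times) < 2:
                times.append(int(fields.pop(0)) * 1e-7)
            if not fields:
                continue                    # no label on this line, skip it
            start = times[0] if times else None
            end = times[1] if len(times) > 1 else None
            aux = fields[1] if len(fields) > 1 else None
            current.append((start, end, fields[0], aux))
    return entries


if __name__ == "__main__":
    sample = '#!MLF!#\n"a10001i1.rec"\n0 2600000 sil <sil>\n2600000 4900000 S sju\n.\n'
    for segment in parse_mlf(sample)["a10001i1.rec"]:
        print(segment)
```

The auxiliary label (here the word) is only present on the first phone of each word, so the fourth element of the tuple is `None` for the remaining phones.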

Label file example 2 (HLEd)

> HLEd -l '*' -d lex.dic -i phones.mlf words2phones.led words.mlf

> cat words.mlf
#!MLF!#
"a10001a1.rec"
forra
.
"a10001i1.rec"
sju
.

> cat words2phones.led
EX
IS sil sil

> cat phones.mlf
#!MLF!#
"a10001a1.rec"
sil
f
oe
r
a
sp
sil
.
"a10001i1.rec"
sil
S
uh
a
sp
sil
.
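
The effect of the two edit commands in this example (EX expands each word into its pronunciation, IS sil sil adds a silence label at the start and end) can be imitated in a few lines of Python. This is only an illustration of the transformation, not of how HLEd is implemented; the function names are hypothetical and the dictionary is assumed to contain plain "word phone phone ..." lines, with the first pronunciation taken when a word is listed more than once.

```python
def load_dict(lines):
    """Map each word to its first listed pronunciation (a list of phones)."""
    lex = {}
    for line in lines:
        parts = line.split()
        if parts:
            lex.setdefault(parts[0], parts[1:])
    return lex


def words_to_phones(words, lex, boundary="sil"):
    """Expand words into phones, then add a boundary label at both ends."""
    phones = [p for w in words for p in lex[w]]
    return [boundary] + phones + [boundary]


if __name__ == "__main__":
    lex = load_dict(["forra f oe r a sp", "sju S uh a sp"])
    print(words_to_phones(["forra"], lex))
    print(words_to_phones(["sju"], lex))
```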

Dictionary (HDMan)

WORD [OUTSYM] PRONPROB P1 P2 P3 P4 ...

> cat lex.dic
forra f oe r a sp
sju S uh a sp

> cat lex2.dic
<sil> [] sil
forra f oe r a sp
sju 0.3 S uh a sp
sju 0.7 S uh sp
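
The optional fields of a dictionary line can be separated with a small Python sketch like the one below. The function name is hypothetical, and the defaults used when a field is missing (the word itself as output symbol, probability 1.0) are stated here as assumptions; the sketch also assumes that no phone name looks like a number.

```python
def parse_dict_line(line):
    """Split a dictionary line into (word, outsym, pronprob, phones)."""
    fields = line.split()
    word, rest = fields[0], fields[1:]
    outsym, prob = word, 1.0           # assumed defaults when the fields are absent
    if rest and rest[0].startswith("[") and rest[0].endswith("]"):
        outsym = rest.pop(0)[1:-1]     # "[]" yields an empty output symbol
    if rest:
        try:
            prob = float(rest[0])
            rest.pop(0)
        except ValueError:
            pass                       # first field is a phone, not a probability
    return word, outsym, prob, rest


if __name__ == "__main__":
    for entry in ["<sil> [] sil", "forra f oe r a sp", "sju 0.3 S uh a sp"]:
        print(parse_dict_line(entry))
```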

HMM definition files (HHEd)

~h hmm_name
<BEGINHMM>
<NUMSTATES> 5
<STATE> 2
<NUMMIXES> 2
<MIXTURE> 1 0.8
<MEAN> 4
0.1 0.0 0.7 0.3
<VARIANCE> 4
0.2 0.1 0.1 0.1
<MIXTURE> 2 0.2
<MEAN> 4
0.2 0.3 0.4 0.0
<VARIANCE> 4
0.1 0.1 0.1 0.2
<STATE> 3
~s state_name
<STATE> 4
<NUMMIXES> 2
<MIXTURE> 1 0.7
~m mix_name
<MIXTURE> 2 0.3
<MEAN> 4
~u mean_name
<VARIANCE> 4
~v variance_name
<TRANSP>
~t transition_name
<ENDHMM>

[Figure: HMM state diagram with states 1 to 5]

The following slides repeat this definition and highlight, in turn, each kind of macro it contains:

• ~h: HMM definition
• ~s: state definition
• ~m: Gaussian mixture component definition
• ~u: mean vector definition
• ~v: diagonal variance vector definition
• ~t: transition matrix definition
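
To connect the numbers in the definition to what they are used for, the following Python sketch evaluates the output density of state 2 (two Gaussian components with diagonal variances, weights 0.8 and 0.2) for a 4-dimensional observation. The function names are hypothetical and the sketch only illustrates the standard mixture formula; it makes no claim about how HTK computes it internally.

```python
import math


def log_gauss_diag(x, mean, var):
    """Log density of a Gaussian with diagonal covariance."""
    return -0.5 * sum(
        math.log(2 * math.pi * v) + (xi - m) ** 2 / v
        for xi, m, v in zip(x, mean, var)
    )


def log_mixture(x, components):
    """Log of sum_k w_k N(x; mean_k, var_k), using the log-sum-exp trick."""
    terms = [math.log(w) + log_gauss_diag(x, m, v) for w, m, v in components]
    top = max(terms)
    return top + math.log(sum(math.exp(t - top) for t in terms))


# State 2 of the example definition: (weight, mean, variance) per component.
STATE2 = [
    (0.8, [0.1, 0.0, 0.7, 0.3], [0.2, 0.1, 0.1, 0.1]),
    (0.2, [0.2, 0.3, 0.4, 0.0], [0.1, 0.1, 0.1, 0.2]),
]

if __name__ == "__main__":
    print(log_mixture([0.1, 0.0, 0.7, 0.3], STATE2))
```

Working in the log domain avoids numerical underflow when many such densities are multiplied along a state sequence.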

• HSLab: graphical tool to label speech (use WaveSurfer instead)
• HList: gives information about audio and feature files
• HSGen: generates random sentences out of a regular grammar

Intermezzo: what do we know so far?

[Figure: two block diagrams]
Training: audio → feature extraction → features; features, prototype HMM, dictionary and labels → training → model set
Recognition: audio → feature extraction → features; features, grammar, model set and dictionary → decoding → labels

Model initialization

The initialization procedure depends on the information available at that time.

• HCompV: computes the overall mean and variance.
  Input: a prototype HMM
• HInit: Viterbi segmentation + parameter estimation. For mixture distributions it uses K-means.
  Input: a prototype HMM, time-aligned transcriptions
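
The "overall mean and variance" used for this kind of flat initialization is simply the per-dimension mean and variance of all training frames pooled together. A minimal Python sketch, with a hypothetical function name and the maximum-likelihood variance (division by the number of frames) chosen as an assumption:

```python
def global_mean_variance(frames):
    """Per-dimension mean and variance over all frames (lists of equal length)."""
    n, dim = len(frames), len(frames[0])
    mean = [sum(f[d] for f in frames) / n for d in range(dim)]
    var = [sum((f[d] - mean[d]) ** 2 for f in frames) / n for d in range(dim)]
    return mean, var


if __name__ == "__main__":
    frames = [[0.0, 2.0], [2.0, 4.0], [1.0, 3.0]]
    print(global_mean_variance(frames))
```

Every state of the prototype would then start from the same mean and variance vectors, which later re-estimation passes pull apart.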

Training tools

• HRest: Baum-Welch re-estimation.
  Input: an initialized model set, time-aligned transcriptions
• HERest: performs embedded Baum-Welch training.
  Input: an initialized model set, timeless transcriptions
• HEAdapt: performs adaptation on a limited set of data
• HSmooth: smooths a set of context-dependent models according to the context-independent counterpart

Training example (RefRec)

first pass:
prototype HMM → HCompV → cloning → HERest → realignment (HVite) → HERest → increase MC (HHEd) → HERest → realignment (HVite)

second pass:
prototype HMM + aligned transcriptions → HInit → HRest → HERest → increase MC (HHEd) → HERest

Recognition tools

grammar generation
• HLStats: creates a bigram from training data
• HParse: parses a user-defined grammar to produce a lattice

decoding
• HVite: performs Viterbi decoding

evaluation
• HResults: evaluates recognition results

Grammar definition (HParse)

[Figure: word network for the grammar below, with branches for the move, delete, insert and end commands, the noise words sil, fil and spk around each command, and the final word quit. The following slides highlight the sub-networks for $mcmd, $dcmd, $icmd/$ecmd and $noise in turn.]

> cat grammar.bnf
$dir = up | down | left | right;
$mcmd = move $dir | top | bottom;
$item = char | word | line | page;
$dcmd = delete [$item];
$icmd = insert;
$ecmd = end [insert];
$cmd = $mcmd | $dcmd | $icmd | $ecmd;
$noise = sil | fil | spk;
({$noise} < $cmd {$noise} > quit {$noise})

• [] optional
• {} zero or more
• () block
• <> loop
• <<>> context dep. loop
• | alternative
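
Because the grammar above is regular, the set of word sequences it generates can also be written as an ordinary regular expression. The Python sketch below does this for illustration only: it is a hand translation of the rules, not output of HParse, and the names `SENTENCE` and `accepts` are hypothetical.

```python
import re

DIR = r"(?:up|down|left|right)"
ITEM = r"(?:char|word|line|page)"
CMD = rf"(?:move {DIR}|top|bottom|delete(?: {ITEM})?|insert|end(?: insert)?)"
NOISE = r"(?:sil|fil|spk)"

# ({$noise} < $cmd {$noise} > quit {$noise}), over space-separated words:
# any noise, one or more commands each followed by any noise, quit, any noise.
SENTENCE = re.compile(
    rf"(?:{NOISE} )*(?:{CMD} (?:{NOISE} )*)+quit(?: {NOISE})*"
)


def accepts(sentence):
    """True if the word sequence is generated by the grammar."""
    return SENTENCE.fullmatch(" ".join(sentence.split())) is not None


if __name__ == "__main__":
    for s in ["move up quit", "sil delete word fil insert quit spk", "move quit"]:
        print(s, "->", accepts(s))
```

The square brackets of the grammar become `?`, the braces become `*`, and the angle brackets become `+`.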

Grammar parsing (HParse) and recognition (HVite)

Parse grammar:
> HParse grammar.bnf grammar.slf

Run recognition on file(s):
> HVite -C offline.cfg -H mono_32_2.mmf -w grammar.slf -y lab dict.txt phones.lis audio_file.wav

Run recognition live:
> HVite -C live.cfg -H mono_32_2.mmf -w grammar.slf -y lab dict.txt phones.lis

Evaluation (HResults)

> HResults -I reference.mlf word.lst recognized.mlf

====================== HTK Results Analysis =======================
  Date: Thu Jan 18 16:17:53 2001
  Ref : nworkdir_traintestsetmlf
  Rec : nresults_trainmono_32_2recmlf
------------------------ Overall Results --------------------------
SENT: %Correct=74.07 [H=994, S=348, N=1342]
WORD: %Corr=94.69, Acc=94.37 [H=9202, D=196, S=320, I=31, N=9718]
-------------------------------------------------------------------

N = total number, I = insertions, S = substitutions, D = deletions

correct: $H = N - S - D$
%correct: $\%\mathrm{Corr} = H / N$
accuracy: $\mathrm{Acc} = (H - I) / N = (N - S - D - I) / N$
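
The figures in the report can be checked against these definitions with a few lines of Python. The function name is hypothetical; the counts are the ones printed in the WORD line above.

```python
def word_scores(n, s, d, i):
    """Return (H, %Corr, Acc) computed from the counts N, S, D and I."""
    h = n - s - d                      # correctly recognized words
    corr = 100.0 * h / n               # ignores insertions
    acc = 100.0 * (h - i) / n          # penalizes insertions as well
    return h, corr, acc


if __name__ == "__main__":
    h, corr, acc = word_scores(n=9718, s=320, d=196, i=31)
    print(h, round(corr, 2), round(acc, 2))  # prints 9202 94.69 94.37
```

Accuracy is never higher than %Corr, since it subtracts the insertions that %Corr ignores.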

  • Outline
  • Introduction
  • Data formats and manipulation
  • Data visualization
  • Training
  • Recognition
Page 21: Giampiero Salvi - KTHgiampi/teaching/htk_tutorial.pdf · c 2008 Giampiero Salvi 1/47 HTK Tutorial Giampiero Salvi KTH (Royal Institute of Technology), Dep. of Speech, Music and Hearing,

ccopy2008 Giampiero Salvi 2147

Configuration file

gt cat config_file

SOURCEKIND = MFCC_0TARGETKIND = MFCC_0_D_A

gt HList -C config_file -e 0 -o -h feature_file

Source feature_fileSample Bytes 26 Sample Kind MFCC_0Num Comps 13 Sample Period 100000 usNum Samples 336 File Format HTK

-------------------- Observation Structure ---------------------x MFCC-1 MFCC-2 MFCC-3 MFCC-4 MFCC-5 MFCC-6 MFCC-7

MFCC-8 MFCC-9 MFCC-10 MFCC-11 MFCC-12 C0 Del-1Del-2 Del-3 Del-4 Del-5 Del-6 Del-7 Del-8Del-9 Del-10 Del-11 Del-12 DelC0 Acc-1 Acc-2Acc-3 Acc-4 Acc-5 Acc-6 Acc-7 Acc-8 Acc-9Acc-10 Acc-11 Acc-12 AccC0

------------------------ Samples 0-gt1 -------------------------0 -14314 -3318 -6263 -7245 7192 4997 0830

3293 5428 6831 5819 5606 40734 -0107-0180 0731 1134 -0723 -0676 1083 -0552-0387 -0592 -2172 -0030 -0170 0236 0170-0241 -0226 -0517 -0244 -0053 0213 -00290097 0225 -0294 0051

----------------------------- END ------------------------------

ccopy2008 Giampiero Salvi 2247

File manipulation tools

I HCopy converts fromto various data formats (audiofeatures)

I HQuant quantizes speech (audio)

I HLEd edits label and master label files

I HDMan edits dictionary files

I HHEd edits model and master macro files

I HBuild converts language models in different formats (morein recognition section)

ccopy2008 Giampiero Salvi 2347

Computing feature files (HCopy)

gt cat config_file

Feature configurationTARGETKIND = MFCC_0TARGETRATE = 1000000SAVECOMPRESSED = TSAVEWITHCRC = TWINDOWSIZE = 2500000USEHAMMING = TPREEMCOEF = 097NUMCHANS = 26CEPLIFTER = 22NUMCEPS = 12ENORMALISE = F input file format (headerless 8 kHz 16 bit linear PCM)SOURCEKIND = WAVEFORMSOURCEFORMAT = NOHEADSOURCERATE = 1250

gt HCopy -C config_file audio_file1 param_file1 audio_file2

gt HCopy -C config_file -S file_list

ccopy2008 Giampiero Salvi 2447

Label files

MLF

filename1[start1 [end1]] label1 [score] auxlabel [auxscore] [comment][start2 [end2]] label2 [score] auxlabel [auxscore] [comment][startN [endN]] labelN [score] auxlabel [auxscore] [comment]

filename2

I [] = optional (0 or 1)

I = possible repetition (0 1 2)

I time stamps are in 100ns units () 10ms = 100000

ccopy2008 Giampiero Salvi 2547

Label file example 1

gt cat alignedmlf

MLFa10001a1rec

0 6400000 sil ltsilgt6400000 8600000 f forra8600000 10400000 oe

10400000 11700000 r11700000 14100000 a14100000 14100000 sp14100000 29800001 sil ltsilgta10001i1rec

0 2600000 sil ltsilgt2600000 4900000 S sju4900000 8300000 uh8300000 8600000 a8600000 8600000 sp8600000 21600000 sil ltsilgt

ccopy2008 Giampiero Salvi 2647

Label file example 2 (HLEd)

gt HLEd -l rsquorsquo -d lexdic -i phonesmlf words2phonesled wordsmlf

gt cat wordsmlf

MLF

a10001a1rec

forra

a10001i1rec

sju

gt cat words2phonesled

EX

IS sil sil

gt cat phonesmlf

MLFa10001a1recsilfoeraspsila10001i1recsilSuhaspsil

ccopy2008 Giampiero Salvi 2747

Dictionary (HDMan)

WORD [OUTSYM] PRONPROB P1 P2 P3 P4

gt cat lexdic

forra f oe r a spsju S uh a sp

gt cat lex2dic

ltsilgt [] silforra f oe r a spsju 03 S uh a spsju 07 S uh sp

ccopy2008 Giampiero Salvi 2847

HMM definition files (HHEd)

~h hmm_nameltBEGINHMMgt

ltNUMSTATESgt 5ltSTATEgt 2

ltNUMMIXESgt 2ltMIXTUREgt 1 08

ltMEANgt 401 00 07 03

ltVARIANCEgt 402 01 01 01

ltMIXTUREgt 2 02ltMEANgt 4

02 03 04 00ltVARIANCEgt 4

01 01 01 02ltSTATEgt 3

~s state_nameltSTATEgt 4

ltNUMMIXESgt 2ltMIXTUREgt 1 07

~m mix_nameltMIXTUREgt 2 03

ltMEANgt 4~u mean_name

ltVARIANCEgt 4~v variance_name

ltTRANSPgt~t transition_name

ltENDHMMgt

HMM definition (~h)

1

2 3 4

5

ccopy2008 Giampiero Salvi 2947

HMM definition files (HHEd)

~h hmm_nameltBEGINHMMgt

ltNUMSTATESgt 5ltSTATEgt 2

ltNUMMIXESgt 2ltMIXTUREgt 1 08

ltMEANgt 401 00 07 03

ltVARIANCEgt 402 01 01 01

ltMIXTUREgt 2 02ltMEANgt 4

02 03 04 00ltVARIANCEgt 4

01 01 01 02ltSTATEgt 3

~s state_nameltSTATEgt 4

ltNUMMIXESgt 2ltMIXTUREgt 1 07

~m mix_nameltMIXTUREgt 2 03

ltMEANgt 4~u mean_name

ltVARIANCEgt 4~v variance_name

ltTRANSPgt~t transition_name

ltENDHMMgt

State definition (~s)

1

2 3 4

5

2

ccopy2008 Giampiero Salvi 3047

HMM definition files (HHEd)

~h hmm_nameltBEGINHMMgt

ltNUMSTATESgt 5ltSTATEgt 2

ltNUMMIXESgt 2ltMIXTUREgt 1 08

ltMEANgt 401 00 07 03

ltVARIANCEgt 402 01 01 01

ltMIXTUREgt 2 02ltMEANgt 4

02 03 04 00ltVARIANCEgt 4

01 01 01 02ltSTATEgt 3

~s state_nameltSTATEgt 4

ltNUMMIXESgt 2ltMIXTUREgt 1 07

~m mix_nameltMIXTUREgt 2 03

ltMEANgt 4~u mean_name

ltVARIANCEgt 4~v variance_name

ltTRANSPgt~t transition_name

ltENDHMMgt

Gaussian mixture component definition(~m)

1

2 3 4

5

ccopy2008 Giampiero Salvi 3147

HMM definition files (HHEd)

~h hmm_nameltBEGINHMMgt

ltNUMSTATESgt 5ltSTATEgt 2

ltNUMMIXESgt 2ltMIXTUREgt 1 08

ltMEANgt 401 00 07 03

ltVARIANCEgt 402 01 01 01

ltMIXTUREgt 2 02ltMEANgt 4

02 03 04 00ltVARIANCEgt 4

01 01 01 02ltSTATEgt 3

~s state_nameltSTATEgt 4

ltNUMMIXESgt 2ltMIXTUREgt 1 07

~m mix_nameltMIXTUREgt 2 03

ltMEANgt 4~u mean_name

ltVARIANCEgt 4~v variance_name

ltTRANSPgt~t transition_name

ltENDHMMgt

Mean vector definition (~u)Diagonal variance vector definition (~v)

1

2 3 4

5

ccopy2008 Giampiero Salvi 3247

HMM definition files (HHEd)

~h hmm_nameltBEGINHMMgt

ltNUMSTATESgt 5ltSTATEgt 2

ltNUMMIXESgt 2ltMIXTUREgt 1 08

ltMEANgt 401 00 07 03

ltVARIANCEgt 402 01 01 01

ltMIXTUREgt 2 02ltMEANgt 4

02 03 04 00ltVARIANCEgt 4

01 01 01 02ltSTATEgt 3

~s state_nameltSTATEgt 4

ltNUMMIXESgt 2ltMIXTUREgt 1 07

~m mix_nameltMIXTUREgt 2 03

ltMEANgt 4~u mean_name

ltVARIANCEgt 4~v variance_name

ltTRANSPgt~t transition_name

ltENDHMMgt

Transition matrix definition (~t)

1

2 3 4

5

ccopy2008 Giampiero Salvi 3347

I HSLab graphical tool to label speech (use WaveSurferinstead)

I HList gives information about audio and feature files

I HSGen generates random sentences out of a regular grammar

ccopy2008 Giampiero Salvi 3447

Intermezzo what do we know so far

Training

audiofeatureextraction

prototype HMM

features

dictionary

labels

model settraining

Recognition

grammar

decodingaudiofeatureextraction

features

model set

labels

dictionary

ccopy2008 Giampiero Salvi 3547

model initialization

Initialization procedure depends on the information avaliable atthat time

I HCompV computes the overall mean and varianceInput a prototype HMM

I HInit Viterbi segmentation + parameter estimation Formixture distribution uses K-means

Input a prototype HMM time aligned transcriptions

ccopy2008 Giampiero Salvi 3647

Traning tools

I HRest Baum-Welch re-estimation

Input an initialized model set time aligned transcriptions

I HERest performs embedded Baum-Welch trainingInput an initialized model set timeless transcriptions

I HEAdapt performs adaptation on a limited set of data

I HSmooth smoots a set of context-dependent modelsaccording to the context-independent counterpart

ccopy2008 Giampiero Salvi 3747

Training example RefRec

first pass

prototype HMM rarr HCompV rarr cloning rarr HERest rarr

realignment (HVite) rarr HERest rarr increase MC (HHEd) rarr

HERest rarr realignment (HVite)

second pass

prototype HMM + aligned transcriptions rarr HInit rarr HRest

rarr HERest rarr increase MC (HHEd) rarr HERest

ccopy2008 Giampiero Salvi 3847

Recognition tools

grammar generation

I HLStats creates bigram from training data

I HParse parses a user defined grammar to produce a lattice

decoding

I HVite performs Viterbi decoding

evaluation

I HResults evaluates recognition results

ccopy2008 Giampiero Salvi 3947

Grammar definition (HParse)

delete

page

line

word

char

insert

end insert

bottom

top

movedown

left

up

right

sil

fil

spk

sil

fil

spk

quit

sil

fil

spk

ccopy2008 Giampiero Salvi 4047

Grammar definition (HParse)

gt cat grammarbnf$dir = up | down | left | right$mcmd = move $dir | top | bottom$item = char | word | line | page$dcmd = delete [$item]$icmd = insert$ecmd = end [insert]$cmd = $mcmd | $dcmd | $icmd | $ecmd$noise = sil | fil | spk($noise lt $cmd $noise gt quit $noise)

I [] optional

I zero or moreI () blockI ltgt loop

I ltltgtgt context dep loop

I | alternative

ccopy2008 Giampiero Salvi 4147

Grammar definition (HParse)

gt cat grammarbnf$dir = up | down | left | right$mcmd = move $dir | top | bottom$item = char | word | line | page$dcmd = delete [$item]$icmd = insert$ecmd = end [insert]$cmd = $mcmd | $dcmd | $icmd | $ecmd$noise = sil | fil | spk($noise lt $cmd $noise gt quit $noise)

I [] optional

I zero or moreI () blockI ltgt loop

I ltltgtgt context dep loop

I | alternative

bottom

top

movedown

left

up

right

ccopy2008 Giampiero Salvi 4247

Grammar definition (HParse)

gt cat grammarbnf$dir = up | down | left | right$mcmd = move $dir | top | bottom$item = char | word | line | page$dcmd = delete [$item]$icmd = insert$ecmd = end [insert]$cmd = $mcmd | $dcmd | $icmd | $ecmd$noise = sil | fil | spk($noise lt $cmd $noise gt quit $noise)

I [] optional

I zero or moreI () blockI ltgt loop

I ltltgtgt context dep loop

I | alternative

delete

page

line

word

char

ccopy2008 Giampiero Salvi 4347

Grammar definition (HParse)

gt cat grammarbnf$dir = up | down | left | right$mcmd = move $dir | top | bottom$item = char | word | line | page$dcmd = delete [$item]$icmd = insert$ecmd = end [insert]$cmd = $mcmd | $dcmd | $icmd | $ecmd$noise = sil | fil | spk($noise lt $cmd $noise gt quit $noise)

I [] optional

I zero or moreI () blockI ltgt loop

I ltltgtgt context dep loop

I | alternative

insert

end insert

ccopy2008 Giampiero Salvi 4447

Grammar definition (HParse)

gt cat grammarbnf$dir = up | down | left | right$mcmd = move $dir | top | bottom$item = char | word | line | page$dcmd = delete [$item]$icmd = insert$ecmd = end [insert]$cmd = $mcmd | $dcmd | $icmd | $ecmd$noise = sil | fil | spk($noise lt $cmd $noise gt quit $noise)

I [] optional

I zero or moreI () blockI ltgt loop

I ltltgtgt context dep loop

I | alternative

sil

fil

spk

ccopy2008 Giampiero Salvi 4547

Grammar definition (HParse)

gt cat grammarbnf$dir = up | down | left | right$mcmd = move $dir | top | bottom$item = char | word | line | page$dcmd = delete [$item]$icmd = insert$ecmd = end [insert]$cmd = $mcmd | $dcmd | $icmd | $ecmd$noise = sil | fil | spk($noise lt $cmd $noise gt quit $noise)

I [] optional

I zero or moreI () blockI ltgt loop

I ltltgtgt context dep loop

I | alternative

sil

fil

spk

ccopy2008 Giampiero Salvi 4647

Grammar parsing (HParse) and recognition (HVite)

Parse grammargt HParse grammarbnf grammarslf

Run recognition on file(s)gt HVite -C offlinecfg -H mono_32_2mmf -w grammarslf

-y lab dicttxt phoneslis audio_filewav

Run recognition livegt HVite -C livecfg -H mono_32_2mmf -w grammarslf

-y lab dicttxt phoneslis

ccopy2008 Giampiero Salvi 4747

Evaluation (HResults)

gt HResults -I referencemlf wordlst recognizedmlf

====================== HTK Results Analysis =======================

Date Thu Jan 18 161753 2001

Ref nworkdir_traintestsetmlf

Rec nresults_trainmono_32_2recmlf

------------------------ Overall Results --------------------------

SENT Correct=7407 [H=994 S=348 N=1342]

WORD Corr=9469 Acc=9437 [H=9202 D=196 S=320 I=31 N=9718]

-------------------------------------------------------------------

N = total number I = insertions S = substitutions D = deletions

correct H = N minus S minus D

correct Corr = HN accuracy Acc = HminusIN

= NminusSminusDminusIN

  • Outline
  • Introduction
  • Data formats and manipulation
  • Data visualization
  • Training
  • Recognition
Page 22: Giampiero Salvi - KTHgiampi/teaching/htk_tutorial.pdf · c 2008 Giampiero Salvi 1/47 HTK Tutorial Giampiero Salvi KTH (Royal Institute of Technology), Dep. of Speech, Music and Hearing,

ccopy2008 Giampiero Salvi 2247

File manipulation tools

I HCopy converts fromto various data formats (audiofeatures)

I HQuant quantizes speech (audio)

I HLEd edits label and master label files

I HDMan edits dictionary files

I HHEd edits model and master macro files

I HBuild converts language models in different formats (morein recognition section)

ccopy2008 Giampiero Salvi 2347

Computing feature files (HCopy)

gt cat config_file

Feature configurationTARGETKIND = MFCC_0TARGETRATE = 1000000SAVECOMPRESSED = TSAVEWITHCRC = TWINDOWSIZE = 2500000USEHAMMING = TPREEMCOEF = 097NUMCHANS = 26CEPLIFTER = 22NUMCEPS = 12ENORMALISE = F input file format (headerless 8 kHz 16 bit linear PCM)SOURCEKIND = WAVEFORMSOURCEFORMAT = NOHEADSOURCERATE = 1250

gt HCopy -C config_file audio_file1 param_file1 audio_file2

gt HCopy -C config_file -S file_list

ccopy2008 Giampiero Salvi 2447

Label files

MLF

filename1[start1 [end1]] label1 [score] auxlabel [auxscore] [comment][start2 [end2]] label2 [score] auxlabel [auxscore] [comment][startN [endN]] labelN [score] auxlabel [auxscore] [comment]

filename2

I [] = optional (0 or 1)

I = possible repetition (0 1 2)

I time stamps are in 100ns units () 10ms = 100000

ccopy2008 Giampiero Salvi 2547

Label file example 1

gt cat alignedmlf

MLFa10001a1rec

0 6400000 sil ltsilgt6400000 8600000 f forra8600000 10400000 oe

10400000 11700000 r11700000 14100000 a14100000 14100000 sp14100000 29800001 sil ltsilgta10001i1rec

0 2600000 sil ltsilgt2600000 4900000 S sju4900000 8300000 uh8300000 8600000 a8600000 8600000 sp8600000 21600000 sil ltsilgt

ccopy2008 Giampiero Salvi 2647

Label file example 2 (HLEd)

gt HLEd -l rsquorsquo -d lexdic -i phonesmlf words2phonesled wordsmlf

gt cat wordsmlf

MLF

a10001a1rec

forra

a10001i1rec

sju

gt cat words2phonesled

EX

IS sil sil

gt cat phonesmlf

MLFa10001a1recsilfoeraspsila10001i1recsilSuhaspsil

ccopy2008 Giampiero Salvi 2747

Dictionary (HDMan)

WORD [OUTSYM] PRONPROB P1 P2 P3 P4

gt cat lexdic

forra f oe r a spsju S uh a sp

gt cat lex2dic

ltsilgt [] silforra f oe r a spsju 03 S uh a spsju 07 S uh sp

HMM definition files (HHEd)

~h "hmm_name"
<BEGINHMM>
  <NUMSTATES> 5
  <STATE> 2
    <NUMMIXES> 2
    <MIXTURE> 1 0.8
      <MEAN> 4
        0.1 0.0 0.7 0.3
      <VARIANCE> 4
        0.2 0.1 0.1 0.1
    <MIXTURE> 2 0.2
      <MEAN> 4
        0.2 0.3 0.4 0.0
      <VARIANCE> 4
        0.1 0.1 0.1 0.2
  <STATE> 3
    ~s "state_name"
  <STATE> 4
    <NUMMIXES> 2
    <MIXTURE> 1 0.7
      ~m "mix_name"
    <MIXTURE> 2 0.3
      <MEAN> 4
        ~u "mean_name"
      <VARIANCE> 4
        ~v "variance_name"
  <TRANSP>
    ~t "transition_name"
<ENDHMM>

[Figure: diagram of the five HMM states, numbered 1 to 5]

Macro types used in the definition:

• ~h: HMM definition

• ~s: state definition

• ~m: Gaussian mixture component definition

• ~u: mean vector definition

• ~v: diagonal variance vector definition

• ~t: transition matrix definition
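For a state described by mixture weights, mean vectors and diagonal variance vectors, the output density is a weighted sum of diagonal-covariance Gaussians. The Python sketch below evaluates its log-likelihood for one feature vector, reading the numbers of state 2 in the example as weights 0.8 and 0.2 with four-dimensional means and variances. The function names are illustrative and do not correspond to anything in HTK.

```python
import math


def diag_gauss_logpdf(x, mean, var):
    """Log density of a Gaussian with diagonal covariance."""
    return -0.5 * sum(
        math.log(2 * math.pi * v) + (xi - m) ** 2 / v
        for xi, m, v in zip(x, mean, var)
    )


def gmm_loglik(x, mixtures):
    """Log of sum_k w_k N(x; mean_k, var_k), computed with log-sum-exp."""
    logs = [math.log(w) + diag_gauss_logpdf(x, m, v) for w, m, v in mixtures]
    top = max(logs)
    return top + math.log(sum(math.exp(value - top) for value in logs))


# (weight, mean, variance) for the two components of state 2
STATE2 = [
    (0.8, [0.1, 0.0, 0.7, 0.3], [0.2, 0.1, 0.1, 0.1]),
    (0.2, [0.2, 0.3, 0.4, 0.0], [0.1, 0.1, 0.1, 0.2]),
]

print(gmm_loglik([0.1, 0.0, 0.7, 0.3], STATE2))
```

Working in the log domain avoids numerical underflow when the feature dimension is larger than in this toy example.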

• HSLab: graphical tool to label speech (use WaveSurfer instead)

• HList: gives information about audio and feature files

• HSGen: generates random sentences out of a regular grammar

Intermezzo: what do we know so far?

[Figure: two block diagrams. Training: audio, feature extraction, features, prototype HMM, dictionary, labels, training, model set. Recognition: audio, feature extraction, features, model set, grammar, dictionary, decoding, labels.]

Model initialization

The initialization procedure depends on the information available at that time.

• HCompV: computes the overall mean and variance.
  Input: a prototype HMM

• HInit: Viterbi segmentation + parameter estimation. For mixture distributions it uses K-means.
  Input: a prototype HMM, time-aligned transcriptions
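The "overall mean and variance" used for this kind of flat initialization is simply the per-dimension mean and variance over all training frames, which every state of the prototype can then share. A minimal Python sketch of that computation, assuming the features are available as a list of equal-length vectors (the function name is made up for illustration):

```python
def global_mean_variance(frames):
    """Per-dimension mean and variance over all feature frames."""
    n = len(frames)
    dim = len(frames[0])
    mean = [sum(frame[d] for frame in frames) / n for d in range(dim)]
    var = [sum((frame[d] - mean[d]) ** 2 for frame in frames) / n for d in range(dim)]
    return mean, var


frames = [[1.0, 2.0], [3.0, 6.0]]
print(global_mean_variance(frames))
```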

Training tools

• HRest: Baum-Welch re-estimation.
  Input: an initialized model set, time-aligned transcriptions

• HERest: performs embedded Baum-Welch training.
  Input: an initialized model set, timeless transcriptions

• HEAdapt: performs adaptation on a limited set of data

• HSmooth: smooths a set of context-dependent models according to the context-independent counterpart

Training example (RefRec)

first pass:

prototype HMM → HCompV → cloning → HERest → realignment (HVite) → HERest → increase MC (HHEd) → HERest → realignment (HVite)

second pass:

prototype HMM + aligned transcriptions → HInit → HRest → HERest → increase MC (HHEd) → HERest

Recognition tools

grammar generation:

• HLStats: creates a bigram from training data

• HParse: parses a user-defined grammar to produce a lattice

decoding:

• HVite: performs Viterbi decoding

evaluation:

• HResults: evaluates recognition results

Grammar definition (HParse)

[Figure: word network for the command grammar, containing the words delete, page, line, word, char, insert, end insert, move, up, down, left, right, top, bottom and quit, with the noise models sil, fil and spk]

Grammar definition (HParse)

> cat grammar.bnf
$dir = up | down | left | right;
$mcmd = move $dir | top | bottom;
$item = char | word | line | page;
$dcmd = delete [$item];
$icmd = insert;
$ecmd = end [insert];
$cmd = $mcmd | $dcmd | $icmd | $ecmd;
$noise = sil | fil | spk;
({$noise} < $cmd {$noise} > quit {$noise})

• [ ] optional

• { } zero or more

• ( ) block

• < > loop

• << >> context dep. loop

• | alternative

[Figures: sub-networks of the grammar. $mcmd: move followed by up, down, left or right; top; bottom. $dcmd: delete optionally followed by page, line, word or char. $icmd and $ecmd: insert; end insert. $noise: sil, fil, spk.]
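One way to get a feeling for what such a grammar accepts is to generate random sentences from it. The Python sketch below hard-codes the command grammar of this example as small functions; it is an illustration only, not a reimplementation of HParse or HSGen. It reads the loop around the command as one or more repetitions and the noise models as zero or more repetitions, and the stopping probabilities (0.3 and 0.5) are arbitrary choices made for this sketch.

```python
import random

NOISE = ["sil", "fil", "spk"]
DIRS = ["up", "down", "left", "right"]
ITEMS = ["char", "word", "line", "page"]


def noise(rng):
    """Zero or more noise tokens."""
    out = []
    while rng.random() < 0.3:
        out.append(rng.choice(NOISE))
    return out


def command(rng):
    """$cmd = $mcmd | $dcmd | $icmd | $ecmd"""
    kind = rng.choice(["mcmd", "dcmd", "icmd", "ecmd"])
    if kind == "mcmd":
        return rng.choice([["move", rng.choice(DIRS)], ["top"], ["bottom"]])
    if kind == "dcmd":
        return ["delete"] + ([rng.choice(ITEMS)] if rng.random() < 0.5 else [])
    if kind == "icmd":
        return ["insert"]
    return ["end"] + (["insert"] if rng.random() < 0.5 else [])


def sentence(rng):
    """noise, one or more (command, noise) groups, quit, noise."""
    words = noise(rng)
    while True:
        words += command(rng) + noise(rng)
        if rng.random() < 0.5:
            break
    return words + ["quit"] + noise(rng)


rng = random.Random(0)
for _ in range(3):
    print(" ".join(sentence(rng)))
```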

Grammar parsing (HParse) and recognition (HVite)

Parse grammar:
> HParse grammar.bnf grammar.slf

Run recognition on file(s):
> HVite -C offline.cfg -H mono_32_2.mmf -w grammar.slf -y lab dict.txt phones.lis audio_file.wav

Run recognition live:
> HVite -C live.cfg -H mono_32_2.mmf -w grammar.slf -y lab dict.txt phones.lis

Evaluation (HResults)

> HResults -I reference.mlf word.lst recognized.mlf
====================== HTK Results Analysis =======================
  Date: Thu Jan 18 16:17:53 2001
  Ref : nworkdir_traintestsetmlf
  Rec : nresults_trainmono_32_2recmlf
------------------------ Overall Results --------------------------
SENT: %Correct=74.07 [H=994, S=348, N=1342]
WORD: %Corr=94.69, Acc=94.37 [H=9202, D=196, S=320, I=31, N=9718]
-------------------------------------------------------------------

N = total number, I = insertions, S = substitutions, D = deletions

correct: H = N - S - D

%correct: %Corr = H/N, accuracy: Acc = (H - I)/N = (N - S - D - I)/N
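The word-level figures in this report follow directly from the counts in brackets. As a quick check, the Python sketch below recomputes them from H, D, S and I using the definitions given here; `word_scores` is a made-up helper name, not part of HTK.

```python
def word_scores(hits, deletions, substitutions, insertions):
    """Return (N, %Corr, Acc) from the error counts."""
    n = hits + deletions + substitutions          # H = N - S - D
    corr = 100.0 * hits / n                       # %Corr = H / N
    acc = 100.0 * (hits - insertions) / n         # Acc = (H - I) / N
    return n, corr, acc


n, corr, acc = word_scores(hits=9202, deletions=196, substitutions=320, insertions=31)
print(n, round(corr, 2), round(acc, 2))

# Sentence level: correct sentences over total sentences
print(round(100.0 * 994 / 1342, 2))
```

Accuracy is always at most the percentage correct, because insertions are penalized only in the former.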

Page 25: Giampiero Salvi - KTHgiampi/teaching/htk_tutorial.pdf · c 2008 Giampiero Salvi 1/47 HTK Tutorial Giampiero Salvi KTH (Royal Institute of Technology), Dep. of Speech, Music and Hearing,

ccopy2008 Giampiero Salvi 2547

Label file example 1

gt cat alignedmlf

MLFa10001a1rec

0 6400000 sil ltsilgt6400000 8600000 f forra8600000 10400000 oe

10400000 11700000 r11700000 14100000 a14100000 14100000 sp14100000 29800001 sil ltsilgta10001i1rec

0 2600000 sil ltsilgt2600000 4900000 S sju4900000 8300000 uh8300000 8600000 a8600000 8600000 sp8600000 21600000 sil ltsilgt

ccopy2008 Giampiero Salvi 2647

Label file example 2 (HLEd)

gt HLEd -l rsquorsquo -d lexdic -i phonesmlf words2phonesled wordsmlf

gt cat wordsmlf

MLF

a10001a1rec

forra

a10001i1rec

sju

gt cat words2phonesled

EX

IS sil sil

gt cat phonesmlf

MLFa10001a1recsilfoeraspsila10001i1recsilSuhaspsil

ccopy2008 Giampiero Salvi 2747

Dictionary (HDMan)

WORD [OUTSYM] PRONPROB P1 P2 P3 P4

gt cat lexdic

forra f oe r a spsju S uh a sp

gt cat lex2dic

ltsilgt [] silforra f oe r a spsju 03 S uh a spsju 07 S uh sp

ccopy2008 Giampiero Salvi 2847

HMM definition files (HHEd)

~h hmm_nameltBEGINHMMgt

ltNUMSTATESgt 5ltSTATEgt 2

ltNUMMIXESgt 2ltMIXTUREgt 1 08

ltMEANgt 401 00 07 03

ltVARIANCEgt 402 01 01 01

ltMIXTUREgt 2 02ltMEANgt 4

02 03 04 00ltVARIANCEgt 4

01 01 01 02ltSTATEgt 3

~s state_nameltSTATEgt 4

ltNUMMIXESgt 2ltMIXTUREgt 1 07

~m mix_nameltMIXTUREgt 2 03

ltMEANgt 4~u mean_name

ltVARIANCEgt 4~v variance_name

ltTRANSPgt~t transition_name

ltENDHMMgt

HMM definition (~h)

1

2 3 4

5

ccopy2008 Giampiero Salvi 2947

HMM definition files (HHEd)

~h hmm_nameltBEGINHMMgt

ltNUMSTATESgt 5ltSTATEgt 2

ltNUMMIXESgt 2ltMIXTUREgt 1 08

ltMEANgt 401 00 07 03

ltVARIANCEgt 402 01 01 01

ltMIXTUREgt 2 02ltMEANgt 4

02 03 04 00ltVARIANCEgt 4

01 01 01 02ltSTATEgt 3

~s state_nameltSTATEgt 4

ltNUMMIXESgt 2ltMIXTUREgt 1 07

~m mix_nameltMIXTUREgt 2 03

ltMEANgt 4~u mean_name

ltVARIANCEgt 4~v variance_name

ltTRANSPgt~t transition_name

ltENDHMMgt

State definition (~s)

1

2 3 4

5

2

ccopy2008 Giampiero Salvi 3047

HMM definition files (HHEd)

~h hmm_nameltBEGINHMMgt

ltNUMSTATESgt 5ltSTATEgt 2

ltNUMMIXESgt 2ltMIXTUREgt 1 08

ltMEANgt 401 00 07 03

ltVARIANCEgt 402 01 01 01

ltMIXTUREgt 2 02ltMEANgt 4

02 03 04 00ltVARIANCEgt 4

01 01 01 02ltSTATEgt 3

~s state_nameltSTATEgt 4

ltNUMMIXESgt 2ltMIXTUREgt 1 07

~m mix_nameltMIXTUREgt 2 03

ltMEANgt 4~u mean_name

ltVARIANCEgt 4~v variance_name

ltTRANSPgt~t transition_name

ltENDHMMgt

Gaussian mixture component definition(~m)

1

2 3 4

5

ccopy2008 Giampiero Salvi 3147

HMM definition files (HHEd)

~h hmm_nameltBEGINHMMgt

ltNUMSTATESgt 5ltSTATEgt 2

ltNUMMIXESgt 2ltMIXTUREgt 1 08

ltMEANgt 401 00 07 03

ltVARIANCEgt 402 01 01 01

ltMIXTUREgt 2 02ltMEANgt 4

02 03 04 00ltVARIANCEgt 4

01 01 01 02ltSTATEgt 3

~s state_nameltSTATEgt 4

ltNUMMIXESgt 2ltMIXTUREgt 1 07

~m mix_nameltMIXTUREgt 2 03

ltMEANgt 4~u mean_name

ltVARIANCEgt 4~v variance_name

ltTRANSPgt~t transition_name

ltENDHMMgt

Mean vector definition (~u)Diagonal variance vector definition (~v)

1

2 3 4

5

ccopy2008 Giampiero Salvi 3247

HMM definition files (HHEd)

~h hmm_nameltBEGINHMMgt

ltNUMSTATESgt 5ltSTATEgt 2

ltNUMMIXESgt 2ltMIXTUREgt 1 08

ltMEANgt 401 00 07 03

ltVARIANCEgt 402 01 01 01

ltMIXTUREgt 2 02ltMEANgt 4

02 03 04 00ltVARIANCEgt 4

01 01 01 02ltSTATEgt 3

~s state_nameltSTATEgt 4

ltNUMMIXESgt 2ltMIXTUREgt 1 07

~m mix_nameltMIXTUREgt 2 03

ltMEANgt 4~u mean_name

ltVARIANCEgt 4~v variance_name

ltTRANSPgt~t transition_name

ltENDHMMgt

Transition matrix definition (~t)

1

2 3 4

5

ccopy2008 Giampiero Salvi 3347

I HSLab graphical tool to label speech (use WaveSurferinstead)

I HList gives information about audio and feature files

I HSGen generates random sentences out of a regular grammar

Intermezzo: what do we know so far?

Training:
audio → feature extraction → features
features + prototype HMM + dictionary + labels → training → model set

Recognition:
audio → feature extraction → features
features + grammar + model set + dictionary → decoding → labels

Model initialization

The initialization procedure depends on the information available at that time.

• HCompV: computes the overall mean and variance.
  Input: a prototype HMM
• HInit: Viterbi segmentation + parameter estimation. For mixture distributions it uses K-means.
  Input: a prototype HMM, time-aligned transcriptions
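The idea behind the HCompV step (one overall mean and variance for all the data) can be sketched in a few lines of Python. This is only an illustration: the feature frames are made up, the function name is hypothetical, and the variance is normalized by the number of frames, which is an assumption rather than a statement about HCompV's exact estimator.

```python
def global_mean_variance(frames):
    """Per-dimension mean and variance over all feature frames."""
    n = len(frames)
    dim = len(frames[0])
    mean = [sum(f[d] for f in frames) / n for d in range(dim)]
    var = [sum((f[d] - mean[d]) ** 2 for f in frames) / n for d in range(dim)]
    return mean, var

# Hypothetical 2-dimensional feature frames.
frames = [[1.0, 2.0], [3.0, 2.0], [5.0, 8.0]]
mean, var = global_mean_variance(frames)
print(mean, var)
```

Every state of the prototype HMM can then start from the same mean and variance, which is why this step needs no transcriptions.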

Training tools

• HRest: Baum-Welch re-estimation.
  Input: an initialized model set, time-aligned transcriptions
• HERest: performs embedded Baum-Welch training.
  Input: an initialized model set, timeless (unaligned) transcriptions
• HEAdapt: performs adaptation on a limited set of data
• HSmooth: smooths a set of context-dependent models according to the context-independent counterpart

Training example: RefRec

First pass:
prototype HMM → HCompV → cloning → HERest → realignment (HVite) → HERest → increase MC (HHEd) → HERest → realignment (HVite)

Second pass:
prototype HMM + aligned transcriptions → HInit → HRest → HERest → increase MC (HHEd) → HERest

Recognition tools

Grammar generation:
• HLStats: creates a bigram from training data
• HParse: parses a user-defined grammar to produce a lattice

Decoding:
• HVite: performs Viterbi decoding

Evaluation:
• HResults: evaluates recognition results

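Grammar definition (HParse)

[Figure: word network for the grammar. A noise node (sil | fil | spk) is followed by a loop over the commands (move up | down | left | right, top, bottom, delete char | word | line | page, insert, end insert), each followed by a noise node, then quit and a final noise node.]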

Grammar definition (HParse)

> cat grammar.bnf
$dir = up | down | left | right;
$mcmd = move $dir | top | bottom;
$item = char | word | line | page;
$dcmd = delete [$item];
$icmd = insert;
$ecmd = end [insert];
$cmd = $mcmd | $dcmd | $icmd | $ecmd;
$noise = sil | fil | spk;
($noise < $cmd $noise > quit $noise)

• [] optional
• {} zero or more
• () block
• <> loop
• << >> context dep. loop
• | alternative
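To see which word sequences this grammar accepts, the following Python sketch generates random sentences from it, similar in spirit to what HSGen is described as doing. It is a hand-written illustration, not HTK code: the function names and the `max_cmds` limit are hypothetical, and the `<>` loop is treated as one or more repetitions.

```python
import random

NOISE = ["sil", "fil", "spk"]
DIRS = ["up", "down", "left", "right"]
ITEMS = ["char", "word", "line", "page"]

def random_command(rng):
    """Expand $cmd = $mcmd | $dcmd | $icmd | $ecmd into a list of words."""
    kind = rng.choice(["mcmd", "dcmd", "icmd", "ecmd"])
    if kind == "mcmd":  # move $dir | top | bottom
        option = rng.choice(["move", "top", "bottom"])
        return ["move", rng.choice(DIRS)] if option == "move" else [option]
    if kind == "dcmd":  # delete [$item]
        return ["delete"] + ([rng.choice(ITEMS)] if rng.random() < 0.5 else [])
    if kind == "icmd":  # insert
        return ["insert"]
    return ["end"] + (["insert"] if rng.random() < 0.5 else [])  # end [insert]

def random_sentence(rng, max_cmds=3):
    """Expand ($noise < $cmd $noise > quit $noise) with 1..max_cmds commands."""
    words = [rng.choice(NOISE)]
    for _ in range(rng.randint(1, max_cmds)):
        words += random_command(rng)
        words.append(rng.choice(NOISE))
    words += ["quit", rng.choice(NOISE)]
    return words

print(" ".join(random_sentence(random.Random(0))))
```

Every generated sentence starts with a noise symbol, contains at least one command, and ends with quit followed by a noise symbol, which mirrors the last line of the grammar file.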


Grammar parsing (HParse) and recognition (HVite)

Parse grammar:
> HParse grammar.bnf grammar.slf

Run recognition on file(s):
> HVite -C offline.cfg -H mono_32_2.mmf -w grammar.slf -y lab dict.txt phones.lis audio_file.wav

Run recognition live:
> HVite -C live.cfg -H mono_32_2.mmf -w grammar.slf -y lab dict.txt phones.lis

Evaluation (HResults)

> HResults -I reference.mlf word.lst recognized.mlf
====================== HTK Results Analysis =======================
  Date: Thu Jan 18 16:17:53 2001
  Ref : nworkdir_traintestsetmlf
  Rec : nresults_trainmono_32_2recmlf
------------------------ Overall Results --------------------------
SENT: %Correct=74.07 [H=994, S=348, N=1342]
WORD: %Corr=94.69, Acc=94.37 [H=9202, D=196, S=320, I=31, N=9718]
-------------------------------------------------------------------

N = total number, I = insertions, S = substitutions, D = deletions
correct: $H = N - S - D$
%correct: $\mathrm{Corr} = H/N$, accuracy: $\mathrm{Acc} = (H - I)/N = (N - S - D - I)/N$
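The word-level figures in the report can be checked directly from the counts using the definitions above. A small Python sketch (the function name is hypothetical; scores are expressed as percentages, as in the report):

```python
def hresults_scores(n, d, s, i):
    """Return (H, %Corr, Acc) from total, deletion, substitution and insertion counts."""
    h = n - s - d                 # correct: H = N - S - D
    corr = 100.0 * h / n          # %correct: H / N
    acc = 100.0 * (h - i) / n     # accuracy: (H - I) / N
    return h, corr, acc

h, corr, acc = hresults_scores(n=9718, d=196, s=320, i=31)
print(f"H={h} Corr={corr:.2f} Acc={acc:.2f}")  # H=9202 Corr=94.69 Acc=94.37
```

Accuracy is lower than the correct rate because insertions are penalized only in Acc.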


Label file example 2 (HLEd)

> HLEd -l '*' -d lex.dic -i phones.mlf words2phones.led words.mlf

> cat words.mlf
#!MLF!#
"a10001a1.rec"
forra
.
"a10001i1.rec"
sju
.

> cat words2phones.led
EX
IS sil sil

> cat phones.mlf
#!MLF!#
"a10001a1.rec"
sil
f
oe
r
a
sp
sil
.
"a10001i1.rec"
sil
S
uh
a
sp
sil
.
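The effect of the two edit commands in this example (EX expands each word through the dictionary, IS sil sil inserts sil at the start and end) can be mimicked in a few lines of Python. This is a sketch of the transformation only, not of HLEd itself; the lexicon is copied from the lex.dic example and the function name is hypothetical.

```python
# Pronunciations taken from the lex.dic example.
LEXICON = {
    "forra": ["f", "oe", "r", "a", "sp"],
    "sju": ["S", "uh", "a", "sp"],
}

def words_to_phones(words, lexicon, boundary="sil"):
    """Expand words to phones (EX) and add boundary silence (IS sil sil)."""
    phones = [boundary]
    for word in words:
        phones.extend(lexicon[word])
    phones.append(boundary)
    return phones

print(words_to_phones(["forra"], LEXICON))
print(words_to_phones(["sju"], LEXICON))
```

The two printed lists correspond to the two entries of phones.mlf above.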

Dictionary (HDMan)

WORD [OUTSYM] PRONPROB P1 P2 P3 P4 ...

> cat lex.dic
forra f oe r a sp
sju S uh a sp

> cat lex2.dic
<sil> [] sil
forra f oe r a sp
sju 0.3 S uh a sp
sju 0.7 S uh sp
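As an illustration of the line format WORD [OUTSYM] PRONPROB P1 P2 ..., here is a small Python parser for dictionary lines like the ones above. It is a sketch under assumptions: the function name is hypothetical, a missing probability is taken to be 1.0, and a second field that parses as a number is always read as the probability (so phone names that look like numbers are not handled).

```python
def parse_dict_line(line):
    """Split one dictionary line into (word, outsym, prob, phones)."""
    tokens = line.split()
    word, rest = tokens[0], tokens[1:]
    outsym = None   # None means no [OUTSYM] field was given
    prob = 1.0      # assumed default when PRONPROB is omitted
    if rest and rest[0].startswith("[") and rest[0].endswith("]"):
        outsym = rest[0][1:-1]
        rest = rest[1:]
    if rest:
        try:
            prob = float(rest[0])
            rest = rest[1:]
        except ValueError:
            pass    # first remaining token is a phone, not a probability
    return word, outsym, prob, rest

for line in ["<sil> [] sil", "forra f oe r a sp", "sju 0.3 S uh a sp", "sju 0.7 S uh sp"]:
    print(parse_dict_line(line))
```

In lex2.dic the two pronunciations of sju carry probabilities 0.3 and 0.7, and the empty output symbol [] on the silence entry means nothing is printed for it.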

HMM definition files (HHEd)

HMM definition (~h): the ~h macro names the whole model, from <BEGINHMM> to <ENDHMM>.

Page 29: Giampiero Salvi - KTHgiampi/teaching/htk_tutorial.pdf · c 2008 Giampiero Salvi 1/47 HTK Tutorial Giampiero Salvi KTH (Royal Institute of Technology), Dep. of Speech, Music and Hearing,

HMM definition files (HHEd)

    ~h hmm_name
    <BEGINHMM>
      <NUMSTATES> 5
      <STATE> 2
        <NUMMIXES> 2
        <MIXTURE> 1 0.8
          <MEAN> 4
            0.1 0.0 0.7 0.3
          <VARIANCE> 4
            0.2 0.1 0.1 0.1
        <MIXTURE> 2 0.2
          <MEAN> 4
            0.2 0.3 0.4 0.0
          <VARIANCE> 4
            0.1 0.1 0.1 0.2
      <STATE> 3
        ~s state_name
      <STATE> 4
        <NUMMIXES> 2
        <MIXTURE> 1 0.7
          ~m mix_name
        <MIXTURE> 2 0.3
          <MEAN> 4
            ~u mean_name
          <VARIANCE> 4
            ~v variance_name
      <TRANSP>
        ~t transition_name
    <ENDHMM>

Macro references used in the definition:

  • ~s state definition
  • ~m Gaussian mixture component definition
  • ~u mean vector definition
  • ~v diagonal variance vector definition
  • ~t transition matrix definition

[Figure: HMM state diagram with states numbered 1 to 5]
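To make the role of the mixture parameters concrete, here is a minimal Python sketch of the output density that a state such as <STATE> 2 describes: a weighted sum of Gaussians with diagonal covariance. The numbers are read off the slide assuming the decimal points were lost in extraction (weights 0.8 and 0.2). The names STATE_2, log_gaussian_diag and log_output_prob are illustrative only and are not part of HTK.

```python
import math

# State 2 of the example HMM: (weight, mean, diagonal variance) per mixture.
# Values are taken from the slide; the data structure itself is an assumption.
STATE_2 = [
    (0.8, [0.1, 0.0, 0.7, 0.3], [0.2, 0.1, 0.1, 0.1]),
    (0.2, [0.2, 0.3, 0.4, 0.0], [0.1, 0.1, 0.1, 0.2]),
]


def log_gaussian_diag(obs, mean, var):
    """Log density of a Gaussian with diagonal covariance."""
    log_p = 0.0
    for o, m, v in zip(obs, mean, var):
        log_p += -0.5 * (math.log(2.0 * math.pi * v) + (o - m) ** 2 / v)
    return log_p


def log_output_prob(obs, mixtures):
    """Log of the mixture density, combined with log-sum-exp for stability."""
    terms = [math.log(w) + log_gaussian_diag(obs, m, v) for w, m, v in mixtures]
    top = max(terms)
    return top + math.log(sum(math.exp(t - top) for t in terms))


if __name__ == "__main__":
    # An observation at the mean of the first mixture component.
    print(log_output_prob([0.1, 0.0, 0.7, 0.3], STATE_2))
```

Working in the log domain avoids underflow when the feature vectors have many dimensions, which is the usual situation with real speech features.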

  • HSLab: graphical tool to label speech (use WaveSurfer instead)

  • HList: gives information about audio and feature files

  • HSGen: generates random sentences out of a regular grammar

Intermezzo: what do we know so far?

[Figure: two block diagrams.
Training: audio → feature extraction → features; inputs to training: features, prototype HMM, dictionary, labels; output: model set.
Recognition: audio → feature extraction → features; inputs to decoding: features, grammar, model set, dictionary; output: labels.]

Model initialization

The initialization procedure depends on the information available at that time.

  • HCompV: computes the overall mean and variance.
    Input: a prototype HMM.

  • HInit: Viterbi segmentation + parameter estimation. For mixture distributions it uses K-means.
    Input: a prototype HMM, time-aligned transcriptions.
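The "overall mean and variance" step can be sketched in a few lines of Python. This is only an illustration of the idea of a flat start, where every emitting state receives the same global statistics. It is not HCompV itself: the function names, the list-of-lists feature layout, and the choice of dividing by the number of frames (rather than applying any flooring or correction) are all assumptions.

```python
def global_mean_and_variance(utterances):
    """Per-dimension mean and variance over all frames of all utterances.

    `utterances` is a list of utterances, each a list of feature vectors.
    """
    frames = [frame for utt in utterances for frame in utt]
    n = len(frames)
    dim = len(frames[0])
    mean = [sum(f[d] for f in frames) / n for d in range(dim)]
    var = [sum((f[d] - mean[d]) ** 2 for f in frames) / n for d in range(dim)]
    return mean, var


def flat_start(utterances, num_emitting_states):
    """Give every emitting state its own copy of the global statistics."""
    mean, var = global_mean_and_variance(utterances)
    return [
        {"mean": list(mean), "variance": list(var)}
        for _ in range(num_emitting_states)
    ]


if __name__ == "__main__":
    data = [[[0.0, 2.0], [2.0, 2.0]], [[4.0, 2.0]]]
    for state in flat_start(data, 3):
        print(state)
```

Because no alignment is needed, this kind of initialization only requires the features and a prototype that fixes the model topology, which matches the input listed for HCompV above.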

Training tools

  • HRest: Baum-Welch re-estimation.
    Input: an initialized model set, time-aligned transcriptions.

  • HERest: performs embedded Baum-Welch training.
    Input: an initialized model set, transcriptions without time alignment.

  • HEAdapt: performs adaptation on a limited set of data.

  • HSmooth: smooths a set of context-dependent models according to the context-independent counterpart.

Training example: RefRec

First pass:

prototype HMM → HCompV → cloning → HERest → realignment (HVite) → HERest → increase MC (HHEd) → HERest → realignment (HVite)

Second pass:

prototype HMM + aligned transcriptions → HInit → HRest → HERest → increase MC (HHEd) → HERest

Recognition tools

Grammar generation:

  • HLStats: creates a bigram from training data

  • HParse: parses a user-defined grammar to produce a lattice

Decoding:

  • HVite: performs Viterbi decoding

Evaluation:

  • HResults: evaluates recognition results

Grammar definition (HParse)

[Figure: word network of the grammar below. Noise (sil | fil | spk) precedes a loop over the commands (move up/down/left/right, top, bottom, delete char/word/line/page, insert, end insert), each followed by noise; the network ends with quit and noise.]

    > cat grammar.bnf
    $dir   = up | down | left | right;
    $mcmd  = move $dir | top | bottom;
    $item  = char | word | line | page;
    $dcmd  = delete [$item];
    $icmd  = insert;
    $ecmd  = end [insert];
    $cmd   = $mcmd | $dcmd | $icmd | $ecmd;
    $noise = sil | fil | spk;
    ( $noise < $cmd $noise > quit $noise )

Notation:

  • [] optional
  • {} zero or more
  • () block
  • <> loop
  • <<>> context dep. loop
  • | alternative
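One way to check one's reading of the grammar is to translate it into a regular expression and test a few word sequences against it. The Python sketch below does this by hand. It assumes that the angle-bracket loop means one or more repetitions and that square brackets mark optional items. The names SENTENCE and accepts are hypothetical; HParse itself produces a word lattice rather than a regular expression.

```python
import re

# Each grammar variable becomes a non-capturing regex group over
# space-separated words.
NOISE = r"(?:sil|fil|spk)"
DIR = r"(?:up|down|left|right)"
ITEM = r"(?:char|word|line|page)"
MCMD = rf"(?:move {DIR}|top|bottom)"
DCMD = rf"(?:delete(?: {ITEM})?)"   # [$item] is optional
ICMD = r"(?:insert)"
ECMD = r"(?:end(?: insert)?)"       # [insert] is optional
CMD = rf"(?:{MCMD}|{DCMD}|{ICMD}|{ECMD})"

# ( $noise < $cmd $noise > quit $noise ), loop taken as "one or more".
SENTENCE = re.compile(rf"{NOISE}(?: {CMD} {NOISE})+ quit {NOISE}")


def accepts(words):
    """Return True if the word sequence is a sentence of the grammar."""
    return SENTENCE.fullmatch(" ".join(words)) is not None


if __name__ == "__main__":
    print(accepts("sil move up sil delete word fil quit sil".split()))
    print(accepts("sil quit sil".split()))
```

The second call is rejected under the stated assumption because the loop has to be traversed at least once before quit.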

Grammar parsing (HParse) and recognition (HVite)

Parse grammar:

    > HParse grammar.bnf grammar.slf

Run recognition on file(s):

    > HVite -C offline.cfg -H mono_32_2.mmf -w grammar.slf -y lab dict.txt phones.lis audio_file.wav

Run recognition live:

    > HVite -C live.cfg -H mono_32_2.mmf -w grammar.slf -y lab dict.txt phones.lis

Evaluation (HResults)

    > HResults -I reference.mlf word.lst recognized.mlf
    ====================== HTK Results Analysis =======================
      Date: Thu Jan 18 16:17:53 2001
      Ref : nworkdir_traintestsetmlf
      Rec : nresults_trainmono_32_2recmlf
    ------------------------ Overall Results --------------------------
    SENT: %Correct=74.07 [H=994, S=348, N=1342]
    WORD: %Corr=94.69, Acc=94.37 [H=9202, D=196, S=320, I=31, N=9718]
    -------------------------------------------------------------------

N = total number, I = insertions, S = substitutions, D = deletions

correct: $H = N - S - D$

%correct: $\mathrm{Corr} = \dfrac{H}{N}$

accuracy: $\mathrm{Acc} = \dfrac{H - I}{N} = \dfrac{N - S - D - I}{N}$

(Corr and Acc are reported as percentages.)
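The figures in the word-level line can be reproduced directly from the definitions above. The short Python sketch below does so for the counts shown in the example output; the function name hresults_scores is illustrative and is not an HTK interface.

```python
def hresults_scores(n, d, s, i):
    """Return (H, %Corr, Acc) from total, deletions, substitutions, insertions."""
    h = n - s - d                  # correct labels
    corr = 100.0 * h / n           # percent correct, ignores insertions
    acc = 100.0 * (h - i) / n      # accuracy, penalises insertions
    return h, corr, acc


if __name__ == "__main__":
    h, corr, acc = hresults_scores(n=9718, d=196, s=320, i=31)
    print(f"H={h} Corr={corr:.2f} Acc={acc:.2f}")
```

Since insertions are subtracted only in the accuracy, Acc can never exceed Corr, and it becomes negative when a recognizer inserts more words than it gets right.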

  • Outline
  • Introduction
  • Data formats and manipulation
  • Data visualization
  • Training
  • Recognition
Page 33: Giampiero Salvi - KTHgiampi/teaching/htk_tutorial.pdf · c 2008 Giampiero Salvi 1/47 HTK Tutorial Giampiero Salvi KTH (Royal Institute of Technology), Dep. of Speech, Music and Hearing,

ccopy2008 Giampiero Salvi 3347

I HSLab graphical tool to label speech (use WaveSurferinstead)

I HList gives information about audio and feature files

I HSGen generates random sentences out of a regular grammar

ccopy2008 Giampiero Salvi 3447

Intermezzo what do we know so far

Training

audiofeatureextraction

prototype HMM

features

dictionary

labels

model settraining

Recognition

grammar

decodingaudiofeatureextraction

features

model set

labels

dictionary

ccopy2008 Giampiero Salvi 3547

model initialization

Initialization procedure depends on the information avaliable atthat time

I HCompV computes the overall mean and varianceInput a prototype HMM

I HInit Viterbi segmentation + parameter estimation Formixture distribution uses K-means

Input a prototype HMM time aligned transcriptions

ccopy2008 Giampiero Salvi 3647

Traning tools

I HRest Baum-Welch re-estimation

Input an initialized model set time aligned transcriptions

I HERest performs embedded Baum-Welch trainingInput an initialized model set timeless transcriptions

I HEAdapt performs adaptation on a limited set of data

I HSmooth smoots a set of context-dependent modelsaccording to the context-independent counterpart

ccopy2008 Giampiero Salvi 3747

Training example RefRec

first pass

prototype HMM rarr HCompV rarr cloning rarr HERest rarr

realignment (HVite) rarr HERest rarr increase MC (HHEd) rarr

HERest rarr realignment (HVite)

second pass

prototype HMM + aligned transcriptions rarr HInit rarr HRest

rarr HERest rarr increase MC (HHEd) rarr HERest

ccopy2008 Giampiero Salvi 3847

Recognition tools

grammar generation

I HLStats creates bigram from training data

I HParse parses a user defined grammar to produce a lattice

decoding

I HVite performs Viterbi decoding

evaluation

I HResults evaluates recognition results

ccopy2008 Giampiero Salvi 3947

Grammar definition (HParse)

delete

page

line

word

char

insert

end insert

bottom

top

movedown

left

up

right

sil

fil

spk

sil

fil

spk

quit

sil

fil

spk

> cat grammar.bnf
$dir = up | down | left | right;
$mcmd = move $dir | top | bottom;
$item = char | word | line | page;
$dcmd = delete [$item];
$icmd = insert;
$ecmd = end [insert];
$cmd = $mcmd | $dcmd | $icmd | $ecmd;
$noise = sil | fil | spk;
({$noise} < $cmd {$noise} > quit {$noise})

• [] optional
• {} zero or more
• () block
• <> loop
• <<>> context-dependent loop
• | alternative
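One way to check one's reading of the grammar is to translate it into a regular expression over space-separated words and test a few sentences against it. The sketch below does that in Python; it is not HParse and produces no lattice. It assumes that the noise models are optional and repeatable (zero or more) and that the loop requires at least one command before quit; the names `NOISE`, `CMD`, `SENTENCE` and `accepts` are made up for this example.

```python
import re

# Sub-expressions mirroring the grammar variables.
NOISE = r"(?:sil|fil|spk)"
DIR = r"(?:up|down|left|right)"
ITEM = r"(?:char|word|line|page)"
CMD = rf"(?:move {DIR}|top|bottom|delete(?: {ITEM})?|insert|end(?: insert)?)"

# noise*  (command noise*)+  quit  noise*
SENTENCE = re.compile(
    rf"(?:{NOISE} )*(?:{CMD} (?:{NOISE} )*)+quit(?: {NOISE})*"
)


def accepts(sentence):
    """Return True if the word sequence is allowed by the toy grammar."""
    return SENTENCE.fullmatch(sentence) is not None


for example in ["move up quit",
                "sil delete word fil insert end quit spk",
                "quit",
                "move quit"]:
    print(f"{example!r}: {accepts(example)}")
```

Sentences such as "quit" alone or "move quit" are rejected under these assumptions, because the loop needs at least one complete command and "move" requires a direction.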

Grammar parsing (HParse) and recognition (HVite)

Parse the grammar:
> HParse grammar.bnf grammar.slf

Run recognition on file(s):
> HVite -C offline.cfg -H mono_32_2.mmf -w grammar.slf -y lab dict.txt phones.lis audio_file.wav

Run recognition live:
> HVite -C live.cfg -H mono_32_2.mmf -w grammar.slf -y lab dict.txt phones.lis

Evaluation (HResults)

> HResults -I reference.mlf word.lst recognized.mlf
====================== HTK Results Analysis =======================
  Date: Thu Jan 18 16:17:53 2001
  Ref : nworkdir_traintestsetmlf
  Rec : nresults_trainmono_32_2recmlf
------------------------ Overall Results --------------------------
SENT: %Correct=74.07 [H=994, S=348, N=1342]
WORD: %Corr=94.69, Acc=94.37 [H=9202, D=196, S=320, I=31, N=9718]
-------------------------------------------------------------------

N = total number, I = insertions, S = substitutions, D = deletions

Correct: $H = N - S - D$

Percent correct: $\%\mathrm{Corr} = \frac{H}{N} \times 100\%$

Accuracy: $\mathrm{Acc} = \frac{H - I}{N} \times 100\% = \frac{N - S - D - I}{N} \times 100\%$
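The formulas above can be checked against the counts in the sample output. The short sketch below recomputes the word-level figures from N, D, S and I; the function name `word_scores` is made up for this example and is not part of HTK.

```python
def word_scores(n, d, s, i):
    """Return (H, percent correct, accuracy) from the error counts."""
    h = n - s - d                    # correctly recognized words
    corr = 100.0 * h / n             # ignores insertions
    acc = 100.0 * (h - i) / n        # penalizes insertions as well
    return h, corr, acc


h, corr, acc = word_scores(n=9718, d=196, s=320, i=31)
print(f"H={h} %Corr={corr:.2f} Acc={acc:.2f}")  # H=9202 %Corr=94.69 Acc=94.37
```

Accuracy is never higher than percent correct, since it additionally subtracts the insertions; with many insertions it can even become negative.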

Page 39: Giampiero Salvi - KTHgiampi/teaching/htk_tutorial.pdf · c 2008 Giampiero Salvi 1/47 HTK Tutorial Giampiero Salvi KTH (Royal Institute of Technology), Dep. of Speech, Music and Hearing,

ccopy2008 Giampiero Salvi 3947

Grammar definition (HParse)

delete

page

line

word

char

insert

end insert

bottom

top

movedown

left

up

right

sil

fil

spk

sil

fil

spk

quit

sil

fil

spk

ccopy2008 Giampiero Salvi 4047

Grammar definition (HParse)

gt cat grammarbnf$dir = up | down | left | right$mcmd = move $dir | top | bottom$item = char | word | line | page$dcmd = delete [$item]$icmd = insert$ecmd = end [insert]$cmd = $mcmd | $dcmd | $icmd | $ecmd$noise = sil | fil | spk($noise lt $cmd $noise gt quit $noise)

I [] optional

I zero or moreI () blockI ltgt loop

I ltltgtgt context dep loop

I | alternative

ccopy2008 Giampiero Salvi 4147

Grammar definition (HParse)

gt cat grammarbnf$dir = up | down | left | right$mcmd = move $dir | top | bottom$item = char | word | line | page$dcmd = delete [$item]$icmd = insert$ecmd = end [insert]$cmd = $mcmd | $dcmd | $icmd | $ecmd$noise = sil | fil | spk($noise lt $cmd $noise gt quit $noise)

I [] optional

I zero or moreI () blockI ltgt loop

I ltltgtgt context dep loop

I | alternative

bottom

top

movedown

left

up

right

ccopy2008 Giampiero Salvi 4247

Grammar definition (HParse)

gt cat grammarbnf$dir = up | down | left | right$mcmd = move $dir | top | bottom$item = char | word | line | page$dcmd = delete [$item]$icmd = insert$ecmd = end [insert]$cmd = $mcmd | $dcmd | $icmd | $ecmd$noise = sil | fil | spk($noise lt $cmd $noise gt quit $noise)

I [] optional

I zero or moreI () blockI ltgt loop

I ltltgtgt context dep loop

I | alternative

delete

page

line

word

char

ccopy2008 Giampiero Salvi 4347

Grammar definition (HParse)

gt cat grammarbnf$dir = up | down | left | right$mcmd = move $dir | top | bottom$item = char | word | line | page$dcmd = delete [$item]$icmd = insert$ecmd = end [insert]$cmd = $mcmd | $dcmd | $icmd | $ecmd$noise = sil | fil | spk($noise lt $cmd $noise gt quit $noise)

I [] optional

I zero or moreI () blockI ltgt loop

I ltltgtgt context dep loop

I | alternative

insert

end insert

[Figure: word network for $noise: sil, fil, or spk]


Grammar parsing (HParse) and recognition (HVite)

Parse grammar:
> HParse grammar.bnf grammar.slf

Run recognition on file(s):
> HVite -C offline.cfg -H mono_32_2.mmf -w grammar.slf -y lab dict.txt phones.lis audio_file.wav

Run recognition live:
> HVite -C live.cfg -H mono_32_2.mmf -w grammar.slf -y lab dict.txt phones.lis
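The two HVite calls above differ only in the configuration file and in whether audio files are listed at the end. A minimal Python sketch of that difference follows. The helper name `hvite_command` is hypothetical, the file names are assumed to carry the extensions `.cfg`, `.mmf`, `.slf`, `.txt`, `.lis` and `.wav`, and nothing is executed: the code only assembles and prints the argument lists.

```python
import shlex

def hvite_command(config, wav_files=()):
    """Assemble the HVite argument list from the slide; nothing is run here."""
    args = [
        "HVite",
        "-C", config,            # configuration file (offline or live)
        "-H", "mono_32_2.mmf",   # model file
        "-w", "grammar.slf",     # word network produced by HParse
        "-y", "lab",             # extension for the output label files
        "dict.txt",              # pronunciation dictionary
        "phones.lis",            # list of models
    ]
    # Offline recognition appends the audio files; live recognition has none.
    return args + list(wav_files)

offline = hvite_command("offline.cfg", ["audio_file.wav"])
live = hvite_command("live.cfg")

print(shlex.join(offline))
print(shlex.join(live))
```

Keeping the shared arguments in one place makes it harder for the offline and live setups to drift apart, for example by pointing at different model files.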

Evaluation (HResults)

> HResults -I reference.mlf word.lst recognized.mlf
====================== HTK Results Analysis =======================
  Date: Thu Jan 18 16:17:53 2001
  Ref : nworkdir_traintestsetmlf
  Rec : nresults_trainmono_32_2recmlf
------------------------ Overall Results --------------------------
SENT: %Correct=74.07 [H=994, S=348, N=1342]
WORD: %Corr=94.69, Acc=94.37 [H=9202, D=196, S=320, I=31, N=9718]
-------------------------------------------------------------------

N = total number, I = insertions, S = substitutions, D = deletions

correct:
\[ H = N - S - D \]

%correct and accuracy:
\[ \%\mathrm{Corr} = \frac{H}{N}, \qquad \mathrm{Acc} = \frac{H - I}{N} = \frac{N - S - D - I}{N} \]
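As a check on the definitions above, the following short Python sketch recomputes the word-level scores from the counts in the sample output. The function name `word_scores` is an assumption made for this example, and the scores are multiplied by 100 to match the percentages that HResults prints.

```python
def word_scores(n, s, d, i):
    """Return (H, %Corr, Acc) from HResults counts, with scores as percentages."""
    h = n - s - d                  # correct: H = N - S - D
    corr = 100.0 * h / n           # %Corr = H / N
    acc = 100.0 * (h - i) / n      # Acc = (H - I) / N
    return h, corr, acc

# Counts taken from the WORD line of the sample output.
h, corr, acc = word_scores(n=9718, s=320, d=196, i=31)
print(f"H={h} %Corr={corr:.2f} Acc={acc:.2f}")
```

Accuracy is never higher than %Corr because it also penalizes insertions, so a recognizer that inserts many spurious words can show a high %Corr and still a low Acc.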
