CommonVoiceCs
CommonVoiceCs is a Czech subset of the Common Voice speech dataset.
The task is to transcribe a given audio sample into a sentence. The dataset contains recordings of people speaking in Czech, with the input sound waves preprocessed in the usual way by computing Mel-frequency cepstral coefficients (MFCCs). You can repeat this preprocessing on any audio using the load_audio and mfcc_extract methods.
The dataset is downloaded automatically if necessary, but note that it is 200MB in size, so the download might take a while. Furthermore, you can listen to the development portion of the dataset.
Each dataset element is a Python dictionary with the following keys:
- "mfccs": a sequence of MFCCs with shape [sequence_length, CommonVoiceCs.MFCC_DIM=13];
- "sentence": a string with the transcription of the audio sample.
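To make the element layout concrete, here is a minimal sketch that builds a mock element and checks its shape. The values are made up, and nested Python lists stand in for whatever array or tensor type the real dataset returns; the helper `describe` is hypothetical and not part of npfl138.

```python
MFCC_DIM = 13  # mirrors CommonVoiceCs.MFCC_DIM

# A mock element with made-up values: 5 frames of 13 MFCCs each.
# Assumption: nested lists stand in for the real array/tensor type.
element = {
    "mfccs": [[0.0] * MFCC_DIM for _ in range(5)],
    "sentence": "ahoj",
}


def describe(element: dict) -> tuple[int, int, int]:
    """Return (sequence_length, mfcc_dim, sentence_length) of an element."""
    frames = element["mfccs"]
    # Every frame must carry exactly MFCC_DIM coefficients.
    if any(len(frame) != MFCC_DIM for frame in frames):
        raise ValueError("every MFCC frame must have MFCC_DIM values")
    return len(frames), MFCC_DIM, len(element["sentence"])
```

Note that the number of MFCC frames and the number of letters in the sentence are generally different, so a model has to align the two sequences itself.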
The dataset is split into:
- train: 9,773 utterances for training;
- dev: 904 utterances for development (validation);
- test: 3,240 utterances for testing.
npfl138.datasets.common_voice_cs.CommonVoiceCs
Source code in npfl138/datasets/common_voice_cs.py
PAD
class-attribute
instance-attribute
PAD: int = 0
The index of the padding token in the vocabulary.
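Because index 0 is reserved for padding, sequences of letter indices with different lengths can be padded to a common length when forming a batch. The following is a minimal sketch in plain Python; the helper `pad_batch` is hypothetical and not part of npfl138, and a real training pipeline would more likely pad tensors.

```python
PAD = 0  # mirrors CommonVoiceCs.PAD


def pad_batch(sequences: list[list[int]], pad: int = PAD) -> list[list[int]]:
    """Right-pad every sequence with `pad` to the length of the longest one."""
    max_length = max((len(sequence) for sequence in sequences), default=0)
    # Build new lists so that the input sequences are left unmodified.
    return [sequence + [pad] * (max_length - len(sequence)) for sequence in sequences]
```

For example, `pad_batch([[2, 16], [5]])` returns `[[2, 16], [5, 0]]`.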
MFCC_DIM
class-attribute
instance-attribute
MFCC_DIM: int = 13
The dimensionality of the MFCC features.
LETTERS
class-attribute
instance-attribute
LETTERS: int = 48
The number of letters used in the dataset.
LETTER_NAMES
class-attribute
instance-attribute
LETTER_NAMES: list[str] = [
"[PAD]",
" ",
"a",
"á",
"ä",
"b",
"c",
"č",
"d",
"ď",
"e",
"é",
"è",
"ě",
"f",
"g",
"h",
"i",
"í",
"ï",
"j",
"k",
"l",
"m",
"n",
"ň",
"o",
"ó",
"ö",
"p",
"q",
"r",
"ř",
"s",
"š",
"t",
"ť",
"u",
"ú",
"ů",
"ü",
"v",
"w",
"x",
"y",
"ý",
"z",
"ž",
]
The list of letter strings used in the dataset.
LETTERS_VOCAB
class-attribute
instance-attribute
LETTERS_VOCAB: Vocabulary = Vocabulary(LETTER_NAMES)
The npfl138.Vocabulary object of the letters used in the dataset.
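The mapping between letters and indices can be illustrated without the library. The sketch below assumes that a letter's index is simply its position in LETTER_NAMES; it uses a plain dictionary instead of npfl138.Vocabulary, whose exact methods are not shown here, and the helpers `encode` and `decode` are hypothetical.

```python
# Copy of the LETTER_NAMES list documented above.
LETTER_NAMES = [
    "[PAD]", " ", "a", "á", "ä", "b", "c", "č", "d", "ď", "e", "é",
    "è", "ě", "f", "g", "h", "i", "í", "ï", "j", "k", "l", "m",
    "n", "ň", "o", "ó", "ö", "p", "q", "r", "ř", "s", "š", "t",
    "ť", "u", "ú", "ů", "ü", "v", "w", "x", "y", "ý", "z", "ž",
]
PAD = 0

# Assumption: the index of a letter is its position in LETTER_NAMES.
LETTER_TO_INDEX = {letter: index for index, letter in enumerate(LETTER_NAMES)}


def encode(sentence: str) -> list[int]:
    """Map a sentence to letter indices; raises KeyError on unknown letters."""
    return [LETTER_TO_INDEX[letter] for letter in sentence]


def decode(indices: list[int]) -> str:
    """Map letter indices back to a string, skipping padding."""
    return "".join(LETTER_NAMES[index] for index in indices if index != PAD)
```

With this mapping, `encode("ahoj")` gives `[2, 16, 26, 20]`, and decoding a padded sequence drops the trailing zeros.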
Element
class-attribute
instance-attribute
The type of a single dataset element.
Dataset
Bases: TFRecordDataset
__len__
__len__() -> int
Return the number of elements in the dataset.
__init__
__init__(decode_on_demand: bool = False) -> None
Load the CommonVoiceCs dataset, downloading it if necessary.
load_audio
Load an audio file and return the audio tensor and sample rate.
Optionally resample the audio to the target sample rate.
mfcc_extract
Extract MFCC features from an audio tensor.
This function can be used to extract MFCC features from any audio sample, which makes it possible to perform speech recognition on arbitrary audio.
EditDistanceMetric
class-attribute
instance-attribute
EditDistanceMetric = EditDistance
The edit distance metric used for evaluation.
evaluate
staticmethod
Evaluate the predictions against the gold dataset.
Returns:
- edit_distance (float): The average edit distance of the predictions.
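As an illustration of what an average edit distance measures, here is a minimal sketch of character-level Levenshtein distance averaged over prediction and gold pairs. It is not the library's EditDistance implementation: whether the real metric normalizes by the gold length or works on characters at all is an assumption left open here, and both function names are hypothetical.

```python
def edit_distance(prediction: str, gold: str) -> int:
    """Unnormalized Levenshtein distance between two strings."""
    # previous[j] is the distance between the processed prefix of
    # `prediction` and the first j characters of `gold`.
    previous = list(range(len(gold) + 1))
    for i, p in enumerate(prediction, start=1):
        current = [i]
        for j, g in enumerate(gold, start=1):
            current.append(min(
                previous[j] + 1,             # delete p from the prediction
                current[j - 1] + 1,          # insert g into the prediction
                previous[j - 1] + (p != g),  # substitute p by g, or match
            ))
        previous = current
    return previous[-1]


def average_edit_distance(predictions: list[str], golds: list[str]) -> float:
    """Mean edit distance over aligned prediction and gold sentences."""
    if not golds or len(predictions) != len(golds):
        raise ValueError("predictions and golds must be non-empty and aligned")
    total = sum(edit_distance(p, g) for p, g in zip(predictions, golds))
    return total / len(golds)
```

A lower value is better, and an average of 0 means every prediction matches its gold sentence exactly.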
evaluate_file
staticmethod
Evaluate the file with predictions against the gold dataset.
Returns:
- edit_distance (float): The average edit distance of the predictions.