CommonVoiceCs
npfl138.datasets.common_voice_cs.CommonVoiceCs
Source code in npfl138/datasets/common_voice_cs.py
PAD
class-attribute
instance-attribute
PAD: int = 0
The index of the padding token in the vocabulary.
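A padding index is typically used to bring variable-length letter sequences to a common length before batching. The following is a minimal sketch of that idea in plain Python; the `pad_batch` helper is hypothetical and not part of the dataset module, and the local `PAD` constant only mirrors the value documented above.

```python
PAD = 0  # mirrors the documented CommonVoiceCs.PAD value


def pad_batch(sequences: list[list[int]], pad: int = PAD) -> list[list[int]]:
    """Right-pad every sequence with `pad` to the length of the longest one."""
    max_len = max((len(seq) for seq in sequences), default=0)
    return [seq + [pad] * (max_len - len(seq)) for seq in sequences]


# Three letter-index sequences of different lengths.
batch = pad_batch([[5, 27, 17], [2], [30, 11]])
```

Because index 0 is reserved for padding, no real letter can be confused with a padded position.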
MFCC_DIM
class-attribute
instance-attribute
MFCC_DIM: int = 13
The dimensionality of the MFCC features.
LETTERS
class-attribute
instance-attribute
LETTERS: int = 48
The number of letters used in the dataset.
LETTER_NAMES
class-attribute
instance-attribute
LETTER_NAMES: list[str] = [
"[PAD]",
" ",
"a",
"á",
"ä",
"b",
"c",
"č",
"d",
"ď",
"e",
"é",
"è",
"ě",
"f",
"g",
"h",
"i",
"í",
"ï",
"j",
"k",
"l",
"m",
"n",
"ň",
"o",
"ó",
"ö",
"p",
"q",
"r",
"ř",
"s",
"š",
"t",
"ť",
"u",
"ú",
"ů",
"ü",
"v",
"w",
"x",
"y",
"ý",
"z",
"ž",
]
The list of letter strings used in the dataset.
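The list order defines the index of each letter, with "[PAD]" at index 0. As an illustration, the sketch below maps text to indices and back using a plain dictionary. The helpers `encode` and `decode` are hypothetical; the dataset itself exposes an npfl138.Vocabulary object for this purpose, whose API is not shown here.

```python
# A local copy of the documented letter list: "[PAD]" followed by 47 characters.
LETTER_NAMES = ["[PAD]"] + list(" aáäbcčdďeéèěfghiíïjklmnňoóöpqrřsštťuúůüvwxyýzž")
assert len(LETTER_NAMES) == 48

PAD = 0
LETTER_TO_INDEX = {letter: index for index, letter in enumerate(LETTER_NAMES)}


def encode(text: str) -> list[int]:
    """Map each character to its index; raises KeyError on unknown characters."""
    return [LETTER_TO_INDEX[char] for char in text]


def decode(indices: list[int]) -> str:
    """Map indices back to text, skipping padding positions."""
    return "".join(LETTER_NAMES[index] for index in indices if index != PAD)


indices = encode("ahoj")
text = decode(indices + [PAD, PAD])
```

Here the input is assumed to be already lowercased and restricted to the listed characters.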
Element
class-attribute
instance-attribute
The type of a single dataset element.
Dataset
Bases: TFRecordDataset
Source code in npfl138/datasets/common_voice_cs.py
__len__
__len__() -> int
Return the number of elements in the dataset.
Source code in npfl138/datasets/common_voice_cs.py
__init__
__init__(decode_on_demand: bool = False) -> None
Load the CommonVoiceCs dataset, downloading it if necessary.
Source code in npfl138/datasets/common_voice_cs.py
letters_vocab
property
letters_vocab: Vocabulary
The npfl138.Vocabulary object of the letters used in the dataset.
load_audio
Load an audio file and return the audio tensor and sample rate.
Optionally resample the audio to the target sample rate.
Source code in npfl138/datasets/common_voice_cs.py
mfcc_extract
Extract MFCC features from an audio tensor.
This function can be used to extract MFCC features from any audio sample, which makes it possible to run speech recognition on arbitrary recordings.
Source code in npfl138/datasets/common_voice_cs.py
EditDistanceMetric
class-attribute
instance-attribute
EditDistanceMetric = EditDistance
The edit distance metric used for evaluation.
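To make the metric concrete, here is a minimal sketch of a character-level Levenshtein distance and an average reported in percent. Both functions are hypothetical illustrations, not the EditDistance implementation; in particular, normalizing each distance by the length of the gold string is an assumption.

```python
def levenshtein(pred: str, gold: str) -> int:
    """Minimum number of insertions, deletions and substitutions turning pred into gold."""
    previous = list(range(len(gold) + 1))
    for i, p in enumerate(pred, start=1):
        current = [i]
        for j, g in enumerate(gold, start=1):
            current.append(min(
                previous[j] + 1,             # delete p from pred
                current[j - 1] + 1,          # insert g into pred
                previous[j - 1] + (p != g),  # substitute, or match at no cost
            ))
        previous = current
    return previous[-1]


def average_edit_distance_percent(predictions: list[str], golds: list[str]) -> float:
    """Average of per-example distances, each normalized by gold length (assumption)."""
    if len(predictions) != len(golds):
        raise ValueError("predictions and golds must have the same length")
    ratios = [levenshtein(p, g) / max(len(g), 1) for p, g in zip(predictions, golds)]
    return 100 * sum(ratios) / len(ratios)


score = average_edit_distance_percent(["ahoj", "den"], ["ahoj", "dne"])
```

In this example the first prediction is exact and the second needs two substitutions out of three characters, so the average is one third, or about 33.3 percent.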
evaluate
staticmethod
Evaluate the predictions against the gold dataset.
Returns:
- edit_distance (float) – The average edit distance of the predictions in percentages.
Source code in npfl138/datasets/common_voice_cs.py
evaluate_file
staticmethod
Evaluate the file with predictions against the gold dataset.
Returns:
- edit_distance (float) – The average edit distance of the predictions in percentages.
Source code in npfl138/datasets/common_voice_cs.py