TextClassificationDataset
The TextClassificationDataset class represents a text classification dataset.
Each dataset element is a Python dictionary with the keys:
- "document": the document as a string
- "label": the document label as a string
The dataset is split into:
- train
- dev
- test
Each split is a torch.utils.data.Dataset providing:
- __len__: the number of documents in the dataset
- __getitem__: returns the requested document as an Element dictionary with keys "document" and "label"
- label_vocab: an npfl138.Vocabulary instance with the label mapping
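As a rough illustration of the structure described above, the following sketch builds a tiny in-memory stand-in using only the standard library. It does not use npfl138 or torch: the names ToyElement and ToySplit, the example documents and labels, and the plain dict used in place of an npfl138.Vocabulary are all hypothetical.

```python
from typing import TypedDict


class ToyElement(TypedDict):
    """Hypothetical stand-in for TextClassificationDataset.Element."""
    document: str
    label: str


class ToySplit:
    """Hypothetical stand-in for one split (train, dev, or test).

    The real splits are torch.utils.data.Dataset instances; this class
    only mimics the __len__/__getitem__ protocol described above.
    """

    def __init__(self, data: list[ToyElement]) -> None:
        self._data = data
        # Assumption: a plain dict stands in for the npfl138.Vocabulary
        # label mapping; the real class has its own interface.
        labels = sorted({element["label"] for element in data})
        self.label_vocab = {label: index for index, label in enumerate(labels)}

    def __len__(self) -> int:
        # Number of documents in this split.
        return len(self._data)

    def __getitem__(self, index: int) -> ToyElement:
        # The requested document with keys "document" and "label".
        return self._data[index]


# Made-up example data, not taken from any real dataset.
train = ToySplit([
    {"document": "Great service, thank you!", "label": "p"},
    {"document": "The order never arrived.", "label": "n"},
    {"document": "Opening hours are 9 to 5.", "label": "0"},
])

first = train[0]
# Map every string label to an integer index via the toy label mapping.
label_ids = [train.label_vocab[train[i]["label"]] for i in range(len(train))]
```

Keeping labels as strings in each element and mapping them to indices separately mirrors the split between the element dictionaries and label_vocab described above.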
npfl138.datasets.text_classification_dataset.TextClassificationDataset
Source code in npfl138/datasets/text_classification_dataset.py
Element
class-attribute
instance-attribute
The type of a single dataset element, i.e., a single document and its label.
Elements
class-attribute
instance-attribute
The type of the whole dataset, i.e., a corpus of documents.
Dataset
Bases: Dataset
data
property
data: Elements
Return the whole dataset as a TextClassificationDataset.Elements object.
__len__
__len__() -> int
Return the number of documents in the dataset.
__getitem__
Return the index-th element of the dataset as a dictionary.
__init__
__init__(dataset: str) -> None
Load the dataset named by the dataset argument, downloading it if necessary.
Parameters:
dataset: The name of the dataset, for example czech_facebook.
evaluate
staticmethod
Evaluate the predictions against the gold dataset.
Returns:
- accuracy (float): The accuracy of the predictions.
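The description above does not spell out how the accuracy is computed or on what scale it is reported. A minimal sketch, assuming it is the fraction of predicted labels that equal the gold labels; the function toy_accuracy and its signature are hypothetical and are not the library's evaluate method.

```python
from typing import Sequence


def toy_accuracy(gold_labels: Sequence[str], predictions: Sequence[str]) -> float:
    """Hypothetical helper: fraction of predictions matching the gold labels."""
    if len(gold_labels) != len(predictions):
        raise ValueError("The predictions must have the same length as the gold labels.")
    if not gold_labels:
        raise ValueError("Cannot compute accuracy of an empty dataset.")
    # Count positions where the predicted label equals the gold label.
    correct = sum(gold == predicted for gold, predicted in zip(gold_labels, predictions))
    return correct / len(gold_labels)


# Made-up labels: three of the four predictions match.
score = toy_accuracy(["p", "n", "0", "p"], ["p", "n", "p", "p"])
```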
evaluate_file
staticmethod
Evaluate the file with predictions against the gold dataset.
Returns:
- accuracy (float): The accuracy of the predictions.
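The expected format of the predictions file is not stated here. The sketch below assumes one predicted label per line, in the same order as the gold documents, and reads from any text stream. The function toy_evaluate_file is hypothetical and only illustrates the idea; it is not the library's evaluate_file.

```python
import io
from typing import Sequence, TextIO


def toy_evaluate_file(gold_labels: Sequence[str], predictions_file: TextIO) -> float:
    """Hypothetical helper: accuracy of the labels read from a text stream."""
    # Assumption: one predicted label per line; line endings are stripped.
    predictions = [line.rstrip("\r\n") for line in predictions_file]
    if len(predictions) != len(gold_labels):
        raise ValueError("The predictions must have the same length as the gold labels.")
    if not gold_labels:
        raise ValueError("Cannot compute accuracy of an empty dataset.")
    correct = sum(gold == predicted for gold, predicted in zip(gold_labels, predictions))
    return correct / len(gold_labels)


# An in-memory stream stands in for an opened predictions file.
predictions_file = io.StringIO("p\nn\nn\n")
file_score = toy_evaluate_file(["p", "n", "0"], predictions_file)
```

Accepting an already opened stream rather than a path keeps the sketch easy to try with io.StringIO as well as with a real file opened in text mode.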