TextClassificationDataset
The TextClassificationDataset class represents a text classification dataset.
- Loads a text classification dataset in a vertical format.
- The data consists of three datasets: train, dev, test
- Each dataset is a torch.utils.data.Dataset providing
  - __len__: number of sentences in the dataset
  - __getitem__: return the requested sentence as an Element instance, which is a dictionary with keys "document" and "label", each being a string
  - data: a dictionary of type Elements, with keys "documents" and "labels"
  - label_vocab, a npfl138.Vocabulary instance with the label mapping
npfl138.datasets.text_classification_dataset.TextClassificationDataset
Source code in npfl138/datasets/text_classification_dataset.py
Element
class-attribute
instance-attribute
The type of a single dataset element, i.e., a single document and its label.
Elements
class-attribute
instance-attribute
The type of the whole dataset, i.e., a corpus of documents.
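The two types can be pictured with a small sketch. The key names come from the description above; the value types of Elements (parallel lists of strings) and the helper element_at are assumptions made for illustration, not the library's actual definitions.

```python
from typing import TypedDict


class Element(TypedDict):
    """A single document and its label, both plain strings."""
    document: str
    label: str


class Elements(TypedDict):
    """A whole corpus. Parallel lists are an assumption of this sketch."""
    documents: list[str]
    labels: list[str]


def element_at(data: Elements, index: int) -> Element:
    """Hypothetical helper: build one Element from the corpus-level dictionary."""
    return {"document": data["documents"][index], "label": data["labels"][index]}


# Made-up example data, only to show the two shapes side by side.
corpus: Elements = {
    "documents": ["great phone", "battery died in a week"],
    "labels": ["p", "n"],
}
first = element_at(corpus, 0)
```

Note the singular keys ("document", "label") on a single element and the plural keys ("documents", "labels") on the whole corpus.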
Dataset
Bases: torch.utils.data.Dataset
data
property
data: Elements
Return the whole dataset as a TextClassificationDataset.Elements object.
__len__
__len__() -> int
Return the number of documents in the dataset.
__getitem__
Return the index-th element of the dataset as a dictionary.
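How data, __len__, and __getitem__ relate can be sketched with a plain-Python stand-in. SplitSketch is a hypothetical class written for this illustration; the real class derives from torch.utils.data.Dataset and its internals may differ.

```python
class SplitSketch:
    """Hypothetical stand-in for one split (train, dev, or test)."""

    def __init__(self, documents, labels):
        if len(documents) != len(labels):
            raise ValueError("documents and labels must have the same length")
        # Corpus-level dictionary with the plural keys described in this page.
        self._data = {"documents": list(documents), "labels": list(labels)}

    @property
    def data(self):
        """Return the whole split as one dictionary."""
        return self._data

    def __len__(self):
        """Return the number of documents in the split."""
        return len(self._data["documents"])

    def __getitem__(self, index):
        """Return the index-th element as a dictionary with singular keys."""
        return {
            "document": self._data["documents"][index],
            "label": self._data["labels"][index],
        }


# Made-up data for illustration.
split = SplitSketch(["great phone", "battery died in a week"], ["p", "n"])
second = split[1]
```

Because only __len__ and __getitem__ are involved, an object of this shape is what a map-style PyTorch data loader expects to index into.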
__init__
__init__(name: str) -> None
Create the dataset from the given filename, downloading it if necessary.
evaluate
staticmethod
Evaluate the predictions against the gold dataset.
Returns:
- accuracy (float) – The accuracy of the predictions in percentages.
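The signature of evaluate is not shown on this page, so the following is only a sketch of what accuracy in percentages means. The function name accuracy_percent and its list-of-strings arguments are assumptions, not the library's API.

```python
def accuracy_percent(gold_labels, predictions):
    """Return the share of predictions equal to the gold labels, in percent."""
    if len(gold_labels) != len(predictions):
        raise ValueError("gold labels and predictions must have the same length")
    if not gold_labels:
        raise ValueError("cannot compute accuracy of an empty dataset")
    correct = sum(gold == predicted for gold, predicted in zip(gold_labels, predictions))
    return 100 * correct / len(gold_labels)


# Three of the four made-up predictions match the gold labels.
score = accuracy_percent(["p", "n", "p", "0"], ["p", "n", "n", "0"])
```

The result lies between 0 and 100 rather than between 0 and 1, which matters when comparing it against thresholds.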
evaluate_file
staticmethod
Evaluate the file with predictions against the gold dataset.
Returns:
- accuracy (float) – The accuracy of the predictions in percentages.
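The expected layout of the predictions file is not stated on this page. The sketch below assumes one predicted label per line, in the same order as the gold documents; accuracy_from_file is a hypothetical name, and io.StringIO stands in for an opened file.

```python
import io


def accuracy_from_file(gold_labels, predictions_file):
    """Read one label per line (an assumed format) and return accuracy in percent."""
    # Strip only line terminators so labels containing spaces survive intact.
    predictions = [line.rstrip("\r\n") for line in predictions_file]
    if len(predictions) != len(gold_labels):
        raise ValueError("the file must contain exactly one prediction per gold label")
    if not gold_labels:
        raise ValueError("cannot compute accuracy of an empty dataset")
    correct = sum(gold == predicted for gold, predicted in zip(gold_labels, predictions))
    return 100 * correct / len(gold_labels)


# Made-up predictions: the first two lines match the gold labels, the last two do not.
fake_file = io.StringIO("p\nn\nn\np\n")
file_score = accuracy_from_file(["p", "n", "p", "n"], fake_file)
```

Checking the line count before scoring catches truncated prediction files, which would otherwise be silently scored on a prefix by zip.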