PADT Analytical Data

Data Set   Tokens   Paras   Docs   Token / Para   Para / Doc   Token / Doc
+++        113700    2995    420         37.963        7.131       270.714
ALH         10098     215     25         46.967        8.600       403.920
ANN         12613     209     17         60.349       12.294       741.941
XIA         26338     888    111         29.660        8.000       237.279
AFP         12931     374     50         34.575        7.480       258.620
UMH         38378     881    132         43.562        6.674       290.742
XIN         13342     428     85         31.173        5.035       156.965
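
The three ratio columns follow directly from the three count columns, and the +++ row is the sum over the six data sets. The short Python sketch below recomputes them as a consistency check. It assumes only the counts listed above; the names COUNTS, ratios and totals are illustrative and not part of any PADT tooling.

```python
# Counts copied from the summary table: (tokens, paragraphs, documents).
# In the per-source listings these are non-root nodes, trees and files.
COUNTS = {
    "ALH": (10098, 215, 25),
    "ANN": (12613, 209, 17),
    "XIA": (26338, 888, 111),
    "AFP": (12931, 374, 50),
    "UMH": (38378, 881, 132),
    "XIN": (13342, 428, 85),
}


def ratios(tokens, paras, docs):
    """Return (tokens per paragraph, paragraphs per document, tokens per document)."""
    return tokens / paras, paras / docs, tokens / docs


def totals(counts):
    """Sum each count column over all data sets (the '+++' row)."""
    return tuple(sum(column) for column in zip(*counts.values()))


if __name__ == "__main__":
    # Print the totals row first, then one row per data set.
    rows = {"+++": totals(COUNTS), **COUNTS}
    for name, (tokens, paras, docs) in rows.items():
        per_para, per_doc, tokens_per_doc = ratios(tokens, paras, docs)
        print(f"{name:<8} {tokens:>7} {paras:>6} {docs:>5} "
              f"{per_para:>8.3f} {per_doc:>7.3f} {tokens_per_doc:>9.3f}")
```

Note that the ratios in the +++ row are computed from the summed counts, not as an average of the per-source ratios.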

ALH Al Hayat News Agency

10098 non-root nodes = tokens
215 trees = paragraphs
25 files
46.9674 nodes per tree
8.6000 trees per file
403.9200 nodes per file

ANN An Nahar News Agency

12613 non-root nodes = tokens
209 trees = paragraphs
17 files
60.3493 nodes per tree
12.2941 trees per file
741.9412 nodes per file

XIA Xinhua News Agency

26338 non-root nodes = tokens
888 trees = paragraphs
111 files
29.6599 nodes per tree
8.0000 trees per file
237.2793 nodes per file

AFP Agence France Presse

12931 non-root nodes = tokens
374 trees = paragraphs
50 files
34.5749 nodes per tree
7.4800 trees per file
258.6200 nodes per file

UMH Ummah Press Service

38378 non-root nodes = tokens
881 trees = paragraphs
132 files
43.5619 nodes per tree
6.6742 trees per file
290.7424 nodes per file

XIN Xinhua News Agency

13342 non-root nodes = tokens
428 trees = paragraphs
85 files
31.1729 nodes per tree
5.0353 trees per file
156.9647 nodes per file