W2C - Web To Corpus

Info

  • Released: 20th December 2011
  • Languages: 120
  • Size: 54.78GB

Download

Examples

# Run wikiCorpora.sh once for each entry in the first column of the langList.sh output
for i in $(langList.sh -w 1000000 | cut -f1); do
	wikiCorpora.sh "$i"
done

Creates a corpus from each Wikipedia whose size is at least the threshold passed with -w (1000000 in the command above).
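
To preview which Wikipedias the loop would process before building anything, a dry run can help. The sketch below is only an illustration: sample_lang_list is a hypothetical stand-in for langList.sh, and its two-column, tab-separated output (code, then size) is an assumption, not the documented format of the real script.

```shell
# Hypothetical stand-in for `langList.sh -w 1000000`.
# Assumed layout: language code, TAB, size. The real output may differ.
sample_lang_list() {
    printf 'eng\t2500000\n'
    printf 'deu\t1800000\n'
}

# cut -f1 keeps only the first tab-separated field of each line.
codes=$(sample_lang_list | cut -f1)

for code in $codes; do
    # Dry run: the real loop would call wikiCorpora.sh "$code" here.
    echo "would build corpus for: $code"
done
```

Replacing sample_lang_list with the real langList.sh call and the echo with wikiCorpora.sh "$code" gives back the loop from the example.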

License

Creative Commons License
This work is licensed under a Creative Commons Attribution-ShareAlike 3.0 Unported License. If you make use of W2C data, please cite the following paper.