The latest release of CzEng is:

For reproducibility of past experiments, we also provide previous CzEng releases:

  • CzEng 1.7 (~58 M parallel sentences in plaintext): used in WMT18 and WMT19.
  • CzEng 1.6 (~62.5 M parallel sentences, fully automatically annotated): used in WMT17. Note that the text basis is identical to CzEng 1.6pre; the increased number of sentence pairs is due only to the document-level (as opposed to segment-level) deduplication used in CzEng 1.6.
  • CzEng 1.6pre (~51.4 M parallel sentences in plaintext): used in WMT16.
  • CzEng 1.0 (~15.0 M parallel sentences with deep syntax): used in WMT12, WMT13 and WMT14.
  • CzEng 0.9 (~8.0 M parallel sentences with deep syntax): used in WMT10 and WMT11.
  • CzEng 0.7 (~1.0 M parallel sentences): used in WMT08 and WMT09.
  • CzEng 0.5 (~0.9 M parallel sentences): the first public release.