If the author of the text misspelled a foreign name (e.g., by replacing a non-Czech character with a Czech one, as in Milošević written as Miloševič), this is a low-level error and should be corrected.
Sometimes foreign characters have been garbled (e.g., Fran?oise for Françoise). This not only produces an unknown word but may also mislead the tokenizer, which then splits the word into three tokens (Fran, ?, oise). Since most work before the release of PDT 2.0 was done in the ISO Latin 2 encoding, letters not contained in Latin 2 are a problem. Such letters should be written as HTML entities, although the corresponding accent-free character is also acceptable.
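The two acceptable ways of writing letters outside Latin 2 can be illustrated with a small Python sketch. It assumes that "HTML entities" means the named HTML 4 entities known to Python's `html.entities.codepoint2name`, with a numeric character reference as a fallback when no name exists; the function names `to_latin2_safe` and `_fits_latin2` are hypothetical and not part of any PDT tool. Characters that fit in ISO Latin 2 (including Czech letters and, e.g., the ć in Milošević) are left untouched.

```python
import unicodedata
from html.entities import codepoint2name


def _fits_latin2(s: str) -> bool:
    """Return True if s can be encoded in ISO Latin 2 (ISO-8859-2)."""
    try:
        s.encode("iso-8859-2")
        return True
    except UnicodeEncodeError:
        return False


def to_latin2_safe(text: str, use_entities: bool = True) -> str:
    """Hypothetical helper: rewrite letters missing from Latin 2.

    use_entities=True  -> HTML entity (named if available, else numeric)
    use_entities=False -> accent-free base character where one exists
    """
    out = []
    for ch in text:
        if _fits_latin2(ch):
            out.append(ch)  # e.g. š, č, ć pass through unchanged
        elif use_entities:
            name = codepoint2name.get(ord(ch))
            # Named HTML entity if one exists, else a numeric character reference
            out.append(f"&{name};" if name else f"&#{ord(ch)};")
        else:
            # Decompose (NFD) and drop combining accents, e.g. 'ñ' -> 'n'
            base = "".join(c for c in unicodedata.normalize("NFD", ch)
                           if not unicodedata.combining(c))
            # Letters with no decomposition (e.g. 'ø') still do not fit,
            # so fall back to a numeric character reference for them
            out.append(base if base and _fits_latin2(base) else f"&#{ord(ch)};")
    return "".join(out)


print(to_latin2_safe("Muñoz"))                      # Mu&ntilde;oz
print(to_latin2_safe("Muñoz", use_entities=False))  # Munoz
```

Entities keep the original spelling recoverable, which is presumably why the guidelines prefer them; the accent-free variant loses information, and some letters (such as ø) have no accent to strip, so the sketch falls back to an entity there.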