Abstract: The belief has recently become widespread that the properties of language needed to process it for useful purposes will emerge if sufficiently large quantities of raw text and speech are analyzed automatically using sufficiently sophisticated techniques. On this view, the kind of understanding that a linguist attempts to achieve by examining individual specimens at close range has little value, at least for practical purposes. But information can be caused to emerge from the raw data only if it is in there in the first place, and it has long been known that this is not the case. A language is a code, that is, a system of arbitrary relations between symbols and things in worlds, real and imaginary. No amount of time or effort invested in examining the symbols will reveal these relations to one who does not know the code. If this is true, then we must ask why statistically based machine translation, for example, has come as far as it has, and how much further it can expect to go.