In early February 2011, there were 1,008,879 English words according to the Global Language Monitor (GLM), and 1.022 million according to the Harvard/Google study, which counted the words in 15 million books and reveals that the lexical universe is expanding at a rate of 160 words per week. These counts, however questionable in principle and in the way they are marketed, are probably realistic. As one lexicographer has already noted, there are at least a million insects and each has a name...
Compared with these Web-based counts, the Oxford English Dictionary (OED, 1989 print edition) offers only 301,100 entries in its 2005 edition. This number doubles (616,500) when derived words, compounds, etc. are taken into account. The online edition (ODO), freed from material constraints, grows at a rate of 2,500 words per quarter (including revisions) and sticks to synchrony, while the OED covers diachronic aspects (how words change). On this last point, see OED vs. CDO.
The GLM has an even more open policy than the OED, accepting nearly all words, including those that arise from blending English words with words from other languages (Chinese, English, Hindi, etc.), terms coined by the film industry (hollywords), etc.
- The boundaries of the corpus from which a dictionary is built are blurred. What about a word that is understood only by some? At what point does a rare word stop being counted? What frequency of use, what geographical spread is required? What about the words of experts (scientific, technical), regionalisms, sociolects and geolects? The delimitation is arbitrary, and it is not easy to establish the perimeter of the words of a language.
- In most cases only written words are counted, since the corpus is drawn from books and newspapers. But how are the words we speak and hear to be recorded? Few word lists are based on spoken language; Gougenheim's Français fondamental, developed in the 1950s for teaching purposes, was revolutionary in this respect.
- There cannot be a representative sample of the words of a language, because no sampling frame exists from which words could be drawn at random.
Le Grand Robert has 100,000 words (and 800,000 inflected forms, an average of 8 inflections per word). It is estimated that French vocabulary exceeds 700,000 terms, and more when technical French and various recent lexical creations are included. Thus, between a traditional dictionary and a record of all writing practices on the Web, the word count varies by a factor of 1 to 10.
Yet of all these words, French speakers are said to use on average only 3,000 to 30,000, depending on their cultural and educational capital.
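As a rough check on these orders of magnitude, the figures quoted above can be compared directly. The sketch below uses only numbers from the text (100,000 headwords, 800,000 inflected forms, 700,000 estimated terms, 3,000 to 30,000 words in active use); the variable names are illustrative, not taken from any source.

```python
# Figures quoted in the text; variable names are illustrative.
headwords = 100_000          # entries in Le Grand Robert
inflected_forms = 800_000    # inflected forms of those entries
estimated_lexicon = 700_000  # lower bound quoted for French vocabulary
active_low, active_high = 3_000, 30_000  # words actually used by a speaker

forms_per_headword = inflected_forms / headwords
ratio_low = estimated_lexicon / active_low
ratio_high = estimated_lexicon / active_high

print(f"inflected forms per headword: {forms_per_headword:.0f}")
print(f"lexicon / smallest active vocabulary: {ratio_low:.0f}")
print(f"lexicon / largest active vocabulary: {ratio_high:.0f}")
```

With the 700,000 lower bound, the ratios come out near 233 and 23; a lexicon of about 750,000 terms would give exactly 250 and 25.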
- The lexicon of the Internet is 250 times richer than everyday vocabulary (25 times richer than that of the so-called educated public). Word frequency is therefore crucial, and so is lemmatization. The lower the frequency of a word (that is, the higher its rank), the more discriminating it is likely to be (cf. the work of Zipf, revisited by Mandelbrot, building on Shannon's results).
- Without lemmatization, spelling scrambles all the calculations; mastery of spelling varies with age and cultural capital (cf. the Dubois-Buyse scale). Some spelling "mistakes" have a sociolinguistic source (and target); others are typos (though these typos are said to pay big dividends in the form of typosquatting!).
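The two points above (rank versus frequency, and the way surface forms fragment the counts) can be illustrated with a minimal sketch. It assumes a toy sentence and a hand-written lemma table; `LEMMAS`, `tokenize` and `rank_frequency` are hypothetical names, and a real study would need a proper lemmatizer and a large corpus. Under Zipf's law, frequency is roughly inversely proportional to rank, which a text this small cannot demonstrate; the sketch only shows how ranks and frequencies are computed.

```python
import re
from collections import Counter

# Hypothetical, hand-written lemma table mapping inflected forms to a lemma.
LEMMAS = {"words": "word", "counted": "count", "counts": "count", "counting": "count"}

def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())

def rank_frequency(tokens, lemmatize=False):
    """Return (rank, form, frequency) tuples, most frequent first."""
    if lemmatize:
        tokens = [LEMMAS.get(t, t) for t in tokens]
    counts = Counter(tokens)
    return [
        (rank, form, freq)
        for rank, (form, freq) in enumerate(counts.most_common(), start=1)
    ]

text = "Counting words: one word counted, two words counted, and the counts differ."

raw = rank_frequency(tokenize(text))
lemmatized = rank_frequency(tokenize(text), lemmatize=True)

print(len(raw), "distinct forms without lemmatization")
print(len(lemmatized), "distinct forms with lemmatization")
for rank, form, freq in lemmatized[:3]:
    print(rank, form, freq)
```

Without the lemma table, "counting", "counted" and "counts" each occupy their own low-frequency rank; once merged, they form a single high-frequency entry, which changes both the size of the lexicon and the shape of the rank-frequency curve.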
References
Claude E. Shannon, Warren Weaver, The Mathematical Theory of Communication, 1949
François Ters, Georges Mayer, Daniel Reichenbach, Dubois-Buyse scale, 1988
Pierre Bourdieu et al., Rapport pédagogique et communication, 1968
Léon Brillouin, Science and Information Theory, 1959 (Editions Jacques Gabay, 1988)
Benoit Mandelbrot, "Information Theory and Psycholinguistics", in Language, ed. R. C. Oldfield and J. C. Marshall, 1968, pp. 263-275.