An Introduction to The National Language Research Institute: A Sketch of its Achievements
Third Edition(1988)/ HTML Version(1997)


II.3.6 Vocabulary and Chinese Characters in Ninety Magazines of Today

(1. Report 21, 1962. 321 pages; 2. Report 22, 1963. 256 pages; 3. Report 25, 1964. 337 pages)
After the two preceding surveys, we planned to extend the scope to the entire field of magazines. This series is a report on one such additional survey. The following criteria were adopted in the selection of magazines: 1) The magazine should be for adults; 2) It should be on open sale, and be neither a house organ nor one for specialists; 3) It should have a large circulation compared to others of its kind. Such magazines were then classified into five strata (or simply "groups"):
I. Review, Literature and Art ("Tyu~o~ko~ron," "Gunzo~," "Geizyutu Sintyo~," etc.);
II. Popular Reading ("Bungei Syunzyu~," "Sunday Mainiti," etc.);
III. Business and Popular Science ("To~yo~ Keizai Sinpo~," "Kagaku Asahi," etc.);
IV. Housekeeping ("Syuhu no Tomo," etc.);
V. Amusements, Hobbies and Sports ("All Yomimono," "Eiga Fan," "Igo," "Yakyu~-kai," etc.).
The ninety selected magazines were published quarterly, monthly, semimonthly, every ten days or weekly. The "universe" of this survey was the complete text of the issues published in 1956 (227,000 pages in total). The number of running words was estimated at some 160 million b-units, including sixty million occurrences of zyosi and zyodosi. We investigated some 440 thousand words (not counting zyosi and zyodosi), and some 100 thousand zyosi and zyodosi.
For this survey we devised a new sampling plan to guarantee the precision of estimates for low relative frequencies, on the order of 1/10,000. This plan is a kind of stratified cluster sampling, in which each cluster within a stratum is formed by randomly combining one-eighth-page-size parts of text in such a way that the number of running words in any cluster is approximately equal to a certain constant. We believe that our method, including this sampling plan, makes it possible to complete the statistical side of a word count by hand.
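The cluster-forming step of such a sampling plan can be sketched as follows. This is only a minimal illustration, not the Institute's actual procedure: the greedy packing rule, the function names form_clusters and sample_clusters, the target size and the number of clusters drawn per stratum are all assumptions made for the example.

```python
import random

def form_clusters(part_sizes, target, rng):
    """Randomly combine text parts into clusters of roughly `target` running words.

    part_sizes[i] is the number of running words in the i-th one-eighth-page part
    of one stratum. The greedy packing rule is an assumption for illustration.
    """
    order = list(range(len(part_sizes)))
    rng.shuffle(order)                      # random combination of parts
    clusters, current, total = [], [], 0
    for i in order:
        current.append(i)
        total += part_sizes[i]
        if total >= target:                 # cluster has reached the target size
            clusters.append(current)
            current, total = [], 0
    if current:                             # leftover parts join the last cluster
        if clusters:
            clusters[-1].extend(current)
        else:
            clusters.append(current)
    return clusters

def sample_clusters(strata, target, n_per_stratum, seed=0):
    """Draw clusters at random, separately within each stratum."""
    rng = random.Random(seed)
    sample = {}
    for name, part_sizes in strata.items():
        clusters = form_clusters(part_sizes, target, rng)
        k = min(n_per_stratum, len(clusters))
        sample[name] = rng.sample(clusters, k)
    return sample
```

Because every cluster holds about the same number of running words, a word's frequency within a cluster can be compared directly across clusters, which is what keeps hand computation of the estimates simple.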
Report 21 gives both a general description, with a full discussion of how our sampling-estimation method was applied, and frequency tables of words which occurred seven times or more in the sample, separately for zyosi and zyodosi and for other words. The tables are arranged in the order of the kana-syllabary for the former (140 entries are listed), and, for the latter, both in the order of the kana-syllabary and in order of frequency, in the whole and in every stratum. For words with sample frequencies over 49, 95% confidence intervals and estimation precisions are given in addition to their relative frequencies.
Frequency distribution:
Sample Frequency    Different Words    Percentage of Running Words
1-6                      32,782                  14
7-                        7,234                  86
(50-)                    (1,220)                (63)
Total                    40,016                 100
Report 22 gives a frequency table of 1,995 Chinese characters which occurred nine times or more in a smaller sample (two-thirds of the total) drawn at random from the initial sample, a list of these characters showing their uses classified by on and kun reading, and some analyses. An index to all the Chinese characters occurring in the sample is appended.
Frequency distribution:
Sample Frequency    Different Characters    Percentage of Running Characters
1-8                       1,333                     1.4
9-                        1,995*                   98.6
Total                     3,328**                 100.0
* Including 1,673 To~yo~ Kanzi.
** In the entire sample used for the above-mentioned word count, the number of different characters amounted to 3,505.
Report 25 contains the following sections:
1) Fundamentalities of words: The fundamentality function, f = a + b log p + c log sc, is fitted by the method of least squares to twenty-five sets of trial data (each consisting of the experts' evaluation of a set of quantitatively similar words, the averaged relative frequency, and the averaged degree of scattering). This section contains a table of the fundamentalities of the 1,200 most frequent words and a semantic classification of the 700 most fundamental words.
2) Statistical structure of the vocabulary: Three topics are discussed here: (1) How many different words belong to each word-frequency grade, and what proportion of the total occurrences is covered by the cumulative number of such different words; (2) Distributional differences among parts of speech and among classes by word origin; (3) The distribution of inflectional forms of verbs and adjectives.
3) Usage of zyosi and zyodosi: Frequency tables according to their meanings and to their combinational forms in a pause group are given. Differences in usage among synonymous zyosi and zyodosi are discussed. Some quantitative considerations of zyosi and zyodosi as syntactic markers are also given.
4) Word-construction: A table of 4,381 compound words and an analysis of them are given.
5) On the problem of discriminating whether formally similar words are recognized as the same or as different words: This problem is discussed from two points of view, with a word list (974 headings) relating to it.
This volume also contains an index to subjects, an outline of the data, and a table of contents for all three volumes.
This project was carried out cooperatively by HAYASI Oki, KENBO Hidetosi, SAIGA Hideo, MIZUTANI Sizuo, ISIWATA Tosio, MIYAZIMA Tatuo and MATUMOTO Akira. Some articles connected with the above-mentioned vocabulary surveys have been published in the Annual Report.
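As an illustration of the least-squares fit mentioned under section 1) of Report 25, the following sketch estimates the coefficients a, b and c of the fundamentality function from a set of observations. It rests on assumptions not stated in the report: that f is the experts' evaluation, p the averaged relative frequency and sc the averaged degree of scattering, and that the logarithm is natural. The function name fit_fundamentality is hypothetical, and the report's own computation may have differed.

```python
import math

def fit_fundamentality(evals, freqs, scatters):
    """Least-squares fit of f = a + b*log(p) + c*log(sc); returns (a, b, c).

    Assumed mapping: evals -> f, freqs -> p, scatters -> sc; natural log assumed.
    """
    rows = [(1.0, math.log(p), math.log(s)) for p, s in zip(freqs, scatters)]
    # Normal equations: (X^T X) beta = X^T f
    A = [[sum(r[i] * r[j] for r in rows) for j in range(3)] for i in range(3)]
    y = [sum(r[i] * f for r, f in zip(rows, evals)) for i in range(3)]
    # Solve the 3x3 system by Gaussian elimination with partial pivoting
    M = [A[i] + [y[i]] for i in range(3)]
    for col in range(3):
        piv = max(range(col, 3), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, 3):
            factor = M[r][col] / M[col][col]
            for c in range(col, 4):
                M[r][c] -= factor * M[col][c]
    beta = [0.0, 0.0, 0.0]
    for i in reversed(range(3)):
        rest = sum(M[i][j] * beta[j] for j in range(i + 1, 3))
        beta[i] = (M[i][3] - rest) / M[i][i]
    return tuple(beta)
```

With twenty-five observations, as in the report, the three lists would each have twenty-five entries. The fit needs at least three observations whose log p and log sc values are not linearly dependent.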
