An Introduction to The National Language Research Institute:
A Sketch of its Achievements
Third Edition(1988)/
HTML Version(1997)
II.3.6 Vocabulary and Chinese Characters in Ninety Magazines of Today
(1. Report 21, 1962. 321 pages; 2. Report 22, 1963. 256 pages;
3. Report 25, 1964. 337 pages)
After the two preceding surveys, we planned to extend the
scope to the entire field of magazines. This series is a
report on one such additional survey.
The following criteria were adopted in the selection of
magazines: 1) The magazine should be for adults; 2) It should
be on open sale, but not a house organ nor one for
specialists; 3) It should have a large circulation compared
to others of its kind. Such magazines were then classified
into five strata (or simply "groups"): I. Review, Literature
and Art ("Tyu~o~ko~ron," "Gunzo~," "Geizyutu Sintyo~,"
etc.); II. Popular Reading ("Bungei Syunzyu~," "Sunday Mainiti,"
etc.); III. Business and Popular Science ("To~yo~ Keizai
Sinpo~," "Kagaku Asahi," etc.); IV. Housekeeping ("Syuhu no
Tomo," etc.); V. Amusements, Hobbies and Sports ("All
Yomimono," "Eiga Fan," "Igo," "Yakyu~-kai," etc.). The ninety
selected magazines were published quarterly, monthly,
semimonthly, every ten days or weekly. The "universe" of this
survey was the complete text of the issues published in 1956
(total 227,000 pages). The number of running words was estimated
at some 160 million b-units, including sixty million occurrences
of zyosi and zyodosi. We investigated some 440 thousand words
(not counting zyosi and zyodosi), and some 100 thousand zyosi
and zyodosi.
For this survey we devised a new sampling plan to
guarantee the precision of estimates even for relative
frequencies as low as the order of 1/10,000. This plan is a kind
of stratified cluster sampling, where each cluster in the same
stratum is formed by randomly combining one-eighth-page-size
parts of texts in such a way that the number of running words in
any cluster is approximately equal to a certain constant. We
believe that our method, including this sampling plan, makes it
possible to complete the statistical side of a word count by hand.
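
The cluster-forming step described above can be illustrated with a
minimal Python sketch. It is not the Institute's actual procedure: the
function names, the greedy rule of closing a cluster once it reaches
the target size, and the per-part word counts are all assumptions made
for illustration.

```python
import random


def form_clusters(part_word_counts, target_words, seed=0):
    """Randomly combine text parts into clusters of about target_words words.

    part_word_counts: running-word count of each one-eighth-page part
    (hypothetical input). Returns a list of clusters, each a list of
    part indices.
    """
    rng = random.Random(seed)
    order = list(range(len(part_word_counts)))
    rng.shuffle(order)  # random combination of parts
    clusters, current, size = [], [], 0
    for idx in order:
        current.append(idx)
        size += part_word_counts[idx]
        if size >= target_words:  # assumed rule: close the cluster at the target
            clusters.append(current)
            current, size = [], 0
    if current:
        clusters.append(current)  # leftover parts form a last, smaller cluster
    return clusters


def sample_clusters(strata, target_words, n_per_stratum, seed=0):
    """Form clusters within each stratum, then draw some of them at random."""
    rng = random.Random(seed)
    sample = {}
    for name, counts in strata.items():
        clusters = form_clusters(counts, target_words, seed=rng.randrange(10**9))
        sample[name] = rng.sample(clusters, min(n_per_stratum, len(clusters)))
    return sample


# Made-up word counts for two strata; real counts would come from the texts.
strata = {"I": [100] * 80, "II": [120] * 40}
sample = sample_clusters(strata, target_words=1000, n_per_stratum=3)
print({name: len(clusters) for name, clusters in sample.items()})
```

Because every cluster holds roughly the same number of running words,
the sampled clusters can be treated as units of equal size when
frequencies are estimated within a stratum.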
Report 21 gives both a general description, with a full
discussion of how our sampling-estimation method was
applied, and frequency tables of words which occurred seven
times or more in the sample, separately for zyosi and zyodosi and
for other words. The tables are arranged in kana-syllabary order
for the former (140 entries are listed), and for the latter both
in kana-syllabary order and in order of frequency, in the
whole and in every stratum. For words with sample frequencies
over 49, 95% confidence intervals and
estimation precisions are given in addition to their relative
frequencies.
Frequency distribution:

Sample Frequency   Different Words   Percentage of Running Words
1-6                     32,782                  14
7-                       7,234                  86
(50- )                  (1,220)                (63)
Total                   40,016                 100
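
As a rough illustration of what a 95% confidence interval and an
estimation precision for a relative frequency look like, the following
Python sketch uses the simple normal approximation for a binomial
proportion. This is an assumption: the report's own estimator rests on
its stratified cluster design and is not reproduced here. The function
name is hypothetical, and the sample size of 440,000 is only the
approximate figure quoted above.

```python
import math


def relative_frequency_interval(count, sample_size, z=1.96):
    """Normal-approximation interval for a relative frequency.

    Returns (p, lower, upper, relative_precision), where relative_precision
    is the half-width of the interval divided by p. Assumes simple random
    sampling, which is NOT the design used in the survey.
    """
    p = count / sample_size
    half_width = z * math.sqrt(p * (1 - p) / sample_size)
    relative_precision = half_width / p if p > 0 else float("inf")
    return p, max(0.0, p - half_width), p + half_width, relative_precision


# A word observed 50 times among roughly 440,000 sampled words.
p, lower, upper, precision = relative_frequency_interval(50, 440_000)
print(f"p = {p:.6f}, 95% interval = [{lower:.6f}, {upper:.6f}]")
print(f"relative precision = {precision:.3f}")
```

Under this approximation the relative half-width for a word seen 50
times is about 1.96 divided by the square root of 50, or roughly 28
percent, and it shrinks as the sample frequency grows.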
Report 22 gives a frequency table of 1,995 Chinese
characters which occurred nine times or more in a smaller
sample (two-thirds of the total) drawn at random from the
initial sample, a list of these characters showing their uses
classified by on and kun readings, and some analyses. An index to
all the Chinese characters occurring in the sample is appended.
Frequency distribution:

Sample Frequency   Different Characters   Percentage of Running Characters
1-8                      1,333                       1.4
9-                       1,995*                     98.6
Total                    3,328**                   100.0

*  Including 1,673 To~yo~ Kanzi.
** In the entire sample used for the above-mentioned word
   count, the number of different characters amounted to 3,505.
Report 25 contains the following sections:
1) Fundamentalities of words: The fundamentality function,
f = a + b log p + c log sc, is fitted by the least-squares method
to twenty-five trial sets (each consisting of the
experts' evaluation of a set of quantitatively similar
words, their averaged relative frequency, and their averaged
degree of scattering). This chapter contains a table of the
fundamentalities of the 1,200 most frequent words and a semantic
classification of the 700 most fundamental words.
2) Statistical structure of the vocabulary: Three topics
are discussed here: (1) How many different words belong to
each word-frequency grade, and what proportion of the total
occurrences is covered by the cumulative number of such
different words; (2) Distributional differences among parts
of speech and among classes by word origin; (3) The
distribution of inflectional forms of verbs and adjectives.
3) Usage of zyosi and zyodosi: Frequency tables are given
according to their meanings and to their combinational forms
in a pause group. Differences in usage among synonymous zyosi
and zyodosi are discussed. Some quantitative
considerations of zyosi and zyodosi as syntactic markers are also
given.
4) Word construction: A table of 4,381 compound words and
an analysis of them are given.
5) On the problem of deciding whether formally similar
words are to be recognized as the same word or as different
words: This problem is discussed from two
points of view, with a word list (974 headings)
relating to the problem.
This volume also contains an index to subjects, an outline
of the data, and a table of contents for all three volumes.
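
The least-squares fit of the fundamentality function mentioned in
section 1) of Report 25 can be sketched in Python as follows. This is
only an illustration under stated assumptions: p is taken to be the
averaged relative frequency and sc the averaged degree of scattering,
the logarithm is assumed to be natural, the function names are
hypothetical, and the five data points are synthetic rather than the
twenty-five trial sets used in the report.

```python
import math


def solve3(matrix, rhs):
    """Solve a 3x3 linear system by Gaussian elimination with partial pivoting."""
    a = [row[:] + [value] for row, value in zip(matrix, rhs)]
    n = 3
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        for r in range(col + 1, n):
            factor = a[r][col] / a[col][col]
            for k in range(col, n + 1):
                a[r][k] -= factor * a[col][k]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(a[i][k] * x[k] for k in range(i + 1, n))
        x[i] = (a[i][n] - tail) / a[i][i]
    return tuple(x)


def fit_fundamentality(evaluations, frequencies, scatterings):
    """Fit f = a + b*log(p) + c*log(sc) by ordinary least squares.

    Builds the normal equations (X^T X) beta = X^T f and returns (a, b, c).
    """
    rows = [(1.0, math.log(p), math.log(sc))
            for p, sc in zip(frequencies, scatterings)]
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(3)] for i in range(3)]
    xtf = [sum(r[i] * f for r, f in zip(rows, evaluations)) for i in range(3)]
    return solve3(xtx, xtf)


# Synthetic data generated from known coefficients, to check the fit.
frequencies = [1e-2, 5e-3, 1e-3, 5e-4, 1e-4]
scatterings = [0.9, 0.5, 0.8, 0.3, 0.6]
evaluations = [2.0 + 0.5 * math.log(p) + 0.3 * math.log(sc)
               for p, sc in zip(frequencies, scatterings)]
a, b, c = fit_fundamentality(evaluations, frequencies, scatterings)
print(f"a = {a:.4f}, b = {b:.4f}, c = {c:.4f}")
```

Once a, b and c are estimated, the same formula gives a fundamentality
value for any word whose relative frequency and degree of scattering
are known.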
This project was carried out cooperatively by HAYASI
Oki, KENBO Hidetosi, SAIGA Hideo, MIZUTANI Sizuo, ISIWATA
Tosio, MIYAZIMA Tatuo and MATUMOTO Akira.
Some articles connected with the above-mentioned
vocabulary surveys have been published in the Annual Report.