What is C-JAS?
C-JAS stands for "Corpus of Japanese As a Second language." It is a corpus of speech produced by learners of Japanese as a second language. It is intended for researchers who study how non-native speakers learn Japanese, as well as for Japanese language teachers.
Here are three characteristics of this corpus:
- The data were collected by following the same learners, each a native speaker of one of two languages, for about three years.
- The data consist of natural conversations collected for the purpose of studying the acquisition of grammar.
- It can be utilized via the Chunagon corpus search application.
Second-language acquisition research studies the varied phenomena involved in learning and acquiring a foreign language (second language) in addition to one's mother tongue, and such research inevitably requires data. We hope that this corpus will contribute to research in this area and serve as reference material for Japanese language teaching.
C-JAS was previously published through another search system and has since been transferred to the Chunagon corpus search application. In the course of this transfer, the following revisions were made to the data:
- Misused tags were eliminated and the same tags as in the International Corpus of Japanese as a Second Language (I-JAS) were added for reanalysis.
- Third-party speaker codes were changed from "NNS1," "NNS2," ... to "L2," "L3," ...
- The notation of withheld personal information was revised (to make it consistent with I-JAS).
- Transcription errors and typos were corrected.
(June 2021)
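For users who still hold material that uses the old third-party speaker codes, the renaming can be sketched as follows. This is a minimal illustration only: the helper name `convert_speaker_code` is hypothetical, and the rule that "NNS" followed by n becomes "L" followed by n+1 is an assumption inferred from the two examples given above ("NNS1" to "L2," "NNS2" to "L3"). The actual transcript format is not described here, so the sketch simply rewrites the codes wherever they appear as whole words.

```python
import re

# Assumption: old codes look like NNS1, NNS2, ... and map to L2, L3, ...
# (i.e. the number is shifted up by one). This is inferred from the two
# examples in the revision list, not from an official specification.
_OLD_CODE = re.compile(r"\bNNS(\d+)\b")


def convert_speaker_code(text: str) -> str:
    """Rewrite old third-party speaker codes in `text` to the new scheme."""
    return _OLD_CODE.sub(lambda m: f"L{int(m.group(1)) + 1}", text)


if __name__ == "__main__":
    # Hypothetical line of text containing old speaker codes.
    print(convert_speaker_code("NNS1 and NNS2 joined the conversation"))
```

Only whole-word matches are rewritten, so other strings that merely contain "NNS" are left untouched.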
Outline of the data
Here is an outline of the C-JAS data:
(1) Outline of learners
- 3 native Chinese speakers (C1-C3: female learners)
- 3 native Korean speakers (K1-K3: 1 female and 2 male learners)
(2) Environment
- Learning Japanese in a classroom environment in Japan
- First year: the learners studied at a Japanese language school.
- Second and later years: each learner advanced to higher education (university/vocational school/language school).
(3) Research period
- About 3 years, starting after about 3 months of Japanese learning (surveyed every 3-4 months)
(4) Breakdown of the data
- Surveys: 7-8 surveys per person (about 60 minutes each)
- Amount of data: 47 recordings (about 46 hours and 30 minutes in total, about 570,000 words)
- Survey method: free conversations with native Japanese speakers (common topics were set separately for each survey period)
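As a rough consistency check on the figures in this breakdown, the per-recording averages can be worked out as in the sketch below. All inputs are the approximate values stated above, and the variable names are illustrative. The source does not say whether the word count covers only the learners or also their Japanese conversation partners, so the words-per-minute figure should be read as an average over everything counted.

```python
# Approximate figures taken from the outline above.
NUM_LEARNERS = 6                # C1-C3 and K1-K3
NUM_RECORDINGS = 47             # total number of survey recordings
TOTAL_MINUTES = 46 * 60 + 30    # about 46 hours 30 minutes
TOTAL_WORDS = 570_000           # about 570,000 words

# Derived averages.
recordings_per_learner = NUM_RECORDINGS / NUM_LEARNERS
minutes_per_recording = TOTAL_MINUTES / NUM_RECORDINGS
words_per_recording = TOTAL_WORDS / NUM_RECORDINGS
words_per_minute = TOTAL_WORDS / TOTAL_MINUTES

print(f"recordings per learner: {recordings_per_learner:.1f}")
print(f"minutes per recording:  {minutes_per_recording:.1f}")
print(f"words per recording:    {words_per_recording:.0f}")
print(f"words per minute:       {words_per_minute:.0f}")
```

The averages come out at just under 8 recordings per learner and just under 60 minutes per recording, in line with the stated "7-8 surveys per person" and "about 60 minutes each."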