C-JAS logo

What is C-JAS?

C-JAS stands for "Corpus of Japanese As a Second language." It is a speech corpus of learners of Japanese as a second language. It is intended for researchers and students interested in how non-native speakers learn Japanese, and for teachers of Japanese.

The corpus has three main characteristics:

  1. It is longitudinal data, collected over about three years from the same learners, whose mother tongues are two different languages.
  2. It consists of natural conversations collected for the purpose of studying the acquisition of grammar.
  3. It can be searched through the Chunagon corpus search application.

Second-language acquisition is an area of research that studies the varied phenomena involved in learning and acquiring a foreign language (second language) in addition to one's mother tongue. Such research inevitably requires data. We hope that this corpus will contribute to research in this area and serve as reference material for Japanese language teaching.

C-JAS, which used to be published through another search system, has been transferred to the Chunagon corpus search application. In the course of this transfer, the following revisions have been made to the data:

  1. Incorrectly applied tags were removed, and the same tags used in the International Corpus of Japanese as a Second Language (I-JAS) were added for reanalysis.
  2. Third-party speaker codes were changed from "NNS1," "NNS2," ... to "L2," "L3," ...
  3. The notation of withheld personal information was revised (to make it consistent with I-JAS).
  4. Transcription errors and typos were corrected.

(June 2021)
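
For readers who still hold transcripts in the older notation, revision 2 above could be applied with a short script. This is only a minimal sketch under stated assumptions: it assumes the ellipses in the list mean the numbering continues with an offset of one ("NNS3" becomes "L4", and so on), and the function name and sample lines are hypothetical, not taken from the C-JAS files.

```python
import re

# Assumption: old third-party codes look like "NNS<number>" and the revised
# code is "L<number + 1>" (NNS1 -> L2, NNS2 -> L3, ...), as the list suggests.
# The lookbehind avoids touching longer Latin-letter strings that merely end in "NNS<n>".
_OLD_CODE = re.compile(r"(?<![A-Za-z])NNS([0-9]+)")


def convert_speaker_codes(text: str) -> str:
    """Rewrite old third-party speaker codes in `text` to the revised notation."""
    return _OLD_CODE.sub(lambda m: f"L{int(m.group(1)) + 1}", text)


if __name__ == "__main__":
    # Hypothetical transcript lines, for illustration only.
    for line in ["NNS1:はい", "NNS2:そうですね"]:
        print(convert_speaker_codes(line))
```

The published transcripts already use the revised codes, so a conversion like this would only matter when comparing them against data obtained from the earlier search system.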


Outline of the data

Here is an outline of the C-JAS data:

(1) Outline of learners
(2) Environment
First year:
They studied at a Japanese language school.
Second and later years:
Each went on to further education (university, vocational school, or language school).
(3) Research period
(4) Breakdown of the data
Surveys:
7-8 surveys per person (about 60 minutes each)
Amount of data:
47 recordings (about 46 hours and 30 minutes in total, about 570,000 words)
Survey method:
Free conversation with native Japanese speakers (a common topic was set separately for each survey period)

Downloads

  1. Transcripts
  2. Outline of the data

The C-JAS transcripts are licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.

Search system

  1. Chunagon corpus search application

Conditions of use

If you use C-JAS in publishing research results or in any other publication, you must state clearly that you used C-JAS and cite the following reference:

  1. Kumiko Sakoda, Aiko Sasaki (Kinoshita), Madoka Konishi & Jae-ho Lee (2014), "Report on the Construction of Corpus of Japanese as a Second Language (C-JAS)," Center for JSL Research and Information, National Institute for Japanese Language and Linguistics, National Institutes for the Humanities (Inter-University Research Institute Corporation)
    *The report can be downloaded from the following URL:
    https://www2.ninjal.ac.jp/jll/lsaj/wp-content/uploads/2015/06/064c14345bdf7c2916b3fd86250e6a2f.pdf

Projects

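National Institute for Japanese Language and Linguistics collaborative research project "Research on Japanese Language Education in a Multicultural Society" (多文化共生社会における日本語教育研究), fiscal years 2009-2015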