The acquisition of definiteness: Analysis of child language data

In a nutshell

  • What and why: This lesson, which includes a home assignment, focuses on the acquisition of definiteness and how it can be understood by analysing child language data (the framework can easily be adapted to lessons in other disciplines). In this topic-specific lesson, students are given a two-part assignment in which they analyse results on this topic, taken from different types of published research data. This activity is particularly useful for fields where data collection is diverse in nature. The activity aims to improve the students’ understanding of the field of research and of how different types of data can complement each other. It also aims to further the students’ knowledge and skills regarding the structuring and documentation of data.
  • Prerequisite knowledge: A basic knowledge of the discipline-specific topic, a basic knowledge of research methodology.
  • Stuff you need: Access to the internet.
  • Estimated time: 1 hour + 1 week.

Suggested intended learning outcomes

  • Discipline/methods outcomes:
    • Apply theoretical knowledge in the analysis of child language data
    • Synthesise different empirical data on a given phenomenon
    • Assess the value of experimental and naturalistic data for linguistic analysis
    • Understand the mechanisms underlying the acquisition of definiteness
    • Understand typological variation in the acquisition of definiteness
  • Open Data outcomes:
    • Search for research data using keywords
    • Understand how to find information about access to and reuse of data
    • Understand the structure and components of a dataset post
    • Download datasets and open data files

Example dataset

Suggested activity and instructions

You are teaching a course on language acquisition, and you are preparing the session that focuses on the acquisition of morpho-syntax. In the presentation of the subject, you plan to use examples from a variety of languages, taken from peer-reviewed scientific literature. In the second part of the session, you want the students to start working on a home assignment, where the task is to analyse noun phrases in child language data, with the aim of reaching a deeper understanding of the acquisition of definiteness. You want the students to learn about typological variation and therefore plan to introduce them to Baltic vs. Germanic language data. At the same time, you want to further their knowledge of acquisition research methods and therefore plan to introduce them to experimental vs. naturalistic data.

In order to find experimental data, you do a search on “acquisition; definiteness” in the CLARIN Virtual Language Observatory, which you know displays (or ‘harvests’) metadata for datasets in a wide variety of linguistics archives. One of the results is an openly available dataset located in the Tromsø Repository for Language and Linguistics (TROLLing), containing experimental data from Latvian children: Acquisition of definiteness marking in monolingual and bilingual Latvian-speaking children, authored by Urek et al. (2017). The related publication is not yet available, but the readme file that is published alongside the data files explains the rough details of the methodology, which are sufficient to prepare the in-class exercise for the students. You download the data file and the readme file, and write an introductory text that explains the exercise, which is to examine the responses from the monolingual informants, identify the pattern of correct vs. incorrect answers, and provide a possible explanation for the results.

The students work on this exercise in groups.
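
For groups that prefer to count responses with a script rather than by hand, the tally of correct vs. incorrect answers could be sketched roughly as follows. This is only an illustration: the column names (participant, group, condition, correct) and the sample rows are hypothetical stand-ins and are not taken from the TROLLing dataset, whose actual layout and coding are described in its readme file.

```python
import csv
import io
from collections import Counter

# Hypothetical stand-in for a downloaded data file. The real file's
# column names and coding are documented in its readme and may differ.
SAMPLE = """participant,group,condition,correct
P01,monolingual,definite,1
P01,monolingual,indefinite,0
P02,monolingual,definite,1
P02,monolingual,indefinite,1
P03,bilingual,definite,0
"""


def accuracy_by_condition(rows, group="monolingual"):
    """Return the proportion of correct answers per condition for one group."""
    totals = Counter()
    hits = Counter()
    for row in rows:
        if row["group"] != group:
            continue  # skip informants outside the group of interest
        totals[row["condition"]] += 1
        hits[row["condition"]] += int(row["correct"])
    return {cond: hits[cond] / totals[cond] for cond in totals}


rows = list(csv.DictReader(io.StringIO(SAMPLE)))
result = accuracy_by_condition(rows)
print(result)
```

With the real data, the students would replace SAMPLE with the contents of the downloaded file and adjust the column names to match the readme. The explanation of the pattern remains the linguistic part of the exercise.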

The second part of the home assignment is to examine the acquisition of definiteness in Germanic. In order to help the students find naturalistic speech data, you do a search on, filtering on Linguistics. You go through the results, the list of available linguistics archives, and find Childes: Child Language Data Exchange System, a well-established multilingual database with naturalistic speech corpora. The task given to the students is to go to Childes, examine the index of the database and select one speaker from a corpus of their own choice. Childes contains data from a rich variety of Germanic languages, which together represent a certain variability when it comes to definiteness marking. In order to prepare the introductory text to the exercise, you download data from one speaker yourself. The files have the extension .cha, which seems to require a specific program in order to be readable. You go back to Childes and see that the transcriptions are made in a format called CHAT, and need to be opened in a program called Clan. The program is available on the Childes webpage and is free to download and use. You download it and open one of the transcription files without any problem.

You see that the transcription is based on orthography and is thus easily accessible to the students. To let them concentrate on the scientific part of the exercise, you write a short introductory text explaining how to retrieve the data. The second part of the introductory text contains the exercise itself, which is to extract noun phrases from the transcription and identify patterns of target-like vs. non-target-like forms.

The students are encouraged to work on this exercise in the same groups as in the classroom.
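
Since .cha files are plain text, students who are comfortable with scripting could also pull out the child's utterances programmatically before looking for noun phrases. The following is a minimal sketch, not a replacement for Clan: it assumes the usual CHAT layout in which main tiers start with an asterisk and a speaker code (here CHI for the target child), dependent tiers start with a percent sign, and continuation lines start with a tab. The sample transcript and the article list are invented for illustration, and the article filter is a crude heuristic that only narrows down where to look.

```python
# Invented mini-transcript in CHAT-like layout (not taken from Childes).
SAMPLE_CHAT = (
    "@Begin\n"
    "@Participants:\tCHI Target_Child, MOT Mother\n"
    "*MOT:\twhere is the ball ?\n"
    "*CHI:\tball there .\n"
    "%com:\tpoints at toy\n"
    "*CHI:\tthe dog\n"
    "\tis big .\n"
    "@End\n"
)


def utterances(chat_text, speaker="CHI"):
    """Collect the main-tier utterances of one speaker from CHAT text."""
    found = []
    current = None  # [speaker code, text] of the main tier being read

    def flush():
        if current is not None and current[0] == speaker:
            found.append(current[1])

    for line in chat_text.splitlines():
        if line.startswith("*"):
            flush()
            code, _, text = line[1:].partition(":")
            current = [code, text.strip()]
        elif line.startswith("\t") and current is not None:
            # Continuation of the main tier currently being read.
            current[1] += " " + line.strip()
        else:
            # Header (@) or dependent tier (%): the main tier has ended.
            flush()
            current = None
    flush()
    return found


def with_articles(utts, articles):
    """Keep utterances containing one of the given articles as a whole word."""
    return [u for u in utts if set(u.lower().split()) & set(articles)]


child = utterances(SAMPLE_CHAT)
candidates = with_articles(child, {"the", "a", "an"})
print(child)
print(candidates)
```

Note that utterances without an overt article, such as "ball there" in the sample, are exactly the ones that may show non-target-like omission, so the full list of child utterances still needs to be read through manually. The article list must also be adapted to the Germanic language chosen, including suffixed definiteness marking, which a word-based filter like this does not capture.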

The final part of the home assignment is to summarise the results from the two exercises and to place them in a typological and developmental context. This requires the students to search for literature on definiteness in the two language families, as well as on the acquisition of definiteness. You direct them to the University Library’s homepage, where they can perform the literature search using the library’s search system.

Options and adaptations

This activity can easily be adapted to other sub-disciplines of linguistics, and to other phenomena within the field. To find linguistic archives with data from other populations (not children), go to and filter on Linguistics. The activity can also be adapted to other disciplines, where data on a given topic can be collected using different methods, and where there is variation across populations.

