About the data

The data in the Scots Syntax Atlas was collected between December 2015 and July 2018 from over 500 speakers in 146 locations across Scotland, and consists of over 100,000 acceptability judgments and over 275 hours of transcribed sociolinguistic interviews.


The locations

Target locations were determined with an implementation of the methodology in Buchstaller and Alvanides (2013). First, we used the Office for National Statistics’ ‘Travel To Work Area’ maps to identify potential dialect subregions. We then chose the number of target locations per area on the basis of population density and our assessment of the potential dialect diversity of the area.


The fieldworkers

Fieldworkers were community insiders, primarily recruited from English Language and Linguistics undergraduate programmes in Glasgow, Edinburgh and Aberdeen. This insider status is particularly crucial in the collection of dialect data, where certain non-standard forms may be heavily stigmatised. All fieldworkers received training in delivering the acceptability judgment questionnaire and in conducting sociolinguistic interviews. Each fieldworker collected data in their hometown from people whom they knew well. Some also conducted further interviews in nearby towns where they had connections.

All data collection was conducted in the participants’ homes in order to collect relaxed, naturalistic data in a comfortable setting.


The participants

In each location, pairs of participants were recruited by the local fieldworker in two age groups: 65+ and 18-25. Participants met a set of standard sociolinguistic criteria (e.g. Labov 1984):

  • Born and brought up in the area;
  • No significant time away from the area;
  • Parents were also from the area;
  • Had not gone on to higher education.

This profile ensures that the speakers exhibit the vernacular norms of the community in question. These criteria were met for the vast majority of participants in the study.


The questionnaire

The questionnaire was developed by the atlas team on the basis of a review of existing literature on the syntax of Scots, which they used to identify significant syntactic features to test. The team created test sentences which controlled for extraneous factors, basing them on attested examples where possible.

A full version of the questionnaire was piloted with linguists from 10 locations around Scotland, and was adapted in light of their feedback. A core set of 182 features was judged by all participants in all locations. Other features were judged in certain locations only, based on existing knowledge from the literature and in order to keep the questionnaire a manageable length. Each questionnaire contained approximately 200 examples to be judged.


The judgment task

The questionnaire was delivered using an adapted version of the “interview method” (see e.g. Barbiers & Bennis 2007). Each target example was preceded by a short scenario that gave it a naturalistic context (see ex. 1).

  1. You’re telling me you saw me and a friend earlier. You say:
    I saw youse earlier on.

The fieldworker read out the contexts and examples to the first participant in each pair. The participant rated each example on a Likert scale (see ex. 2). Each point on the scale was labelled, and participants were given a copy of the scale to refer to throughout the task. Scores were recorded by the fieldworker.

The first participant subsequently delivered the same questionnaire to the second participant of the pair, carrying out the same process of reading out the contexts and examples for judgment using the Likert scale. The fieldworker remained present for this second half of the task and was able to provide help or correction if necessary.

The acceptability judgment task took around an hour per participant, and all the discussion around the task was recorded.


The conversation data

The second part of the data collection process was a standard sociolinguistic interview (e.g. Labov 1984), which aims to tap the vernacular, the most systematic form of spontaneous speech data. In order to mitigate the Observer’s Paradox (Labov 1972) as much as possible, pairs of participants conversed with each other on topics ranging from jobs, school life, friends and family to local gossip. The recorded conversations lasted approximately one hour.

The recordings were transcribed in full in Transcriber, following a set of transcription conventions designed to capture morphosyntactic variation in Scots. Lexical and phonological variation was not included in the transcription. The full corpus was subsequently anonymised.

The transcription and anonymisation work was carried out by students at the University of Glasgow.