SUBTLEX-PT-BR

SUBTLEX-PT-BR is a 61-million-word corpus of conversational Brazilian Portuguese, compiled from subtitle texts.

Availability: Three versions of the SUBTLEX-PT-BR are available below:

Citation:
Tang, K. (2012) A 61 Million Word Corpus of Brazilian Portuguese Film Subtitles as a Resource for Linguistic Research. UCL Working Papers in Linguistics 24. [pdf] [bib]

Creative Commons License
This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.

SUBTLEX-KR

SUBTLEX-KR is a 90-million-word corpus of conversational Korean, compiled from subtitle texts.

Availability: Forthcoming. Please email kevin.tang.10@ucl.ac.uk if you would like to be notified when it becomes available.

Citation:
Tang, K., de Chene, B. (2014, forthcoming) A New Corpus of Colloquial Korean and its Applications. The 14th Laboratory Phonology Conference (LabPhon 14), Tachikawa, Tokyo, Japan. July 2014. [poster]

Creative Commons License
This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.

PRAAT TOOLKIT

A selection of Praat scripts written for specific purposes. Please see the individual files for details.

Silence Inserter: This script inserts a silence between two adjacent labels in a TextGrid. It modifies both the Sound file and the TextGrid file. It works on all the files in specified directories. [Download]

Channel Extractor: This script extracts a single channel from every file in a specified directory. [Download]

Mean Formant Analyser: This script calculates the mean formant values (F1, F2, F3) of each labelled segment in each pair of Sound and TextGrid files (the two files in a pair must have the same name). The mean is calculated over a chosen region of the label, e.g. the beginning (20-30%) of a diphthong. It works on all the files in the specified directories. [Download]
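
The region-based averaging can be illustrated outside Praat. The Python sketch below is not the toolkit's script; it only shows the arithmetic of turning a percentage region (such as 20-30%) into a time window and averaging the formant samples that fall inside it. The function names and the F1 track are hypothetical and exist purely for illustration.

```python
def region_bounds(start, end, lo_pct=20.0, hi_pct=30.0):
    """Return the (t0, t1) time window covering lo_pct..hi_pct of a label."""
    if not 0.0 <= lo_pct < hi_pct <= 100.0:
        raise ValueError("need 0 <= lo_pct < hi_pct <= 100")
    duration = end - start
    t0 = start + duration * lo_pct / 100.0
    t1 = start + duration * hi_pct / 100.0
    return t0, t1


def mean_in_region(track, t0, t1):
    """Mean of the formant values whose timestamps fall inside [t0, t1].

    `track` is a list of (time_in_seconds, frequency_in_hz) pairs.
    Returns None when no sample falls inside the window.
    """
    values = [freq for time, freq in track if t0 <= time <= t1]
    return sum(values) / len(values) if values else None


# Hypothetical F1 track for a diphthong labelled from 1.0 s to 2.0 s.
f1_track = [(1.05, 720.0), (1.22, 700.0), (1.25, 690.0),
            (1.28, 680.0), (1.50, 520.0)]

t0, t1 = region_bounds(1.0, 2.0, lo_pct=20.0, hi_pct=30.0)
print(t0, t1, mean_in_region(f1_track, t0, t1))
```

Only the three samples between 20% and 30% of the label contribute to the mean; the samples at 1.05 s and 1.50 s are ignored.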

Citation:
Tang, K. (2014-2015). Praat Toolkit. http://tang-kevin.github.io/Tools.html. [bib]

Creative Commons License
This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.

LINGER TOOLKIT

Linger is a software package for performing reading, listening, and other sentence processing experiments.
Even though Linger is an excellent piece of software, its analytical tools (Lingalyzer, Lingrapher and Subjector) are becoming increasingly outdated. The Linger Toolkit aims to fill this gap. Please email kevin.tang.10@ucl.ac.uk if you have any questions or suggestions.

Linger-Summarizer: A Python script that summarizes .dat files from Linger into a tab-delimited format. [Download]
How to use:
The script has a command-line interface. It was tested with Python 2.7.6.
Input: a) a directory containing .dat files, b) the path of the output file.
Optional input: a) the encoding of the .dat files, b) the upper bound of the RT (default 2500 ms), c) the lower bound of the RT (default 100 ms).
e.g. "python ./Linger_Summarizer_Tang_2014.py -i ./dats/ -o ./dats.summary.tab"
See all options with "python ./Linger_Summarizer_Tang_2014.py -h"
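
The page does not say what the script does with reading times outside the RT bounds. As a rough illustration only, the Python sketch below assumes that such values are simply dropped and that both bounds are inclusive; the function names and sample values are hypothetical and are not taken from Linger-Summarizer.

```python
def within_rt_bounds(rt_ms, lower=100.0, upper=2500.0):
    """True when a reading time (in ms) lies inside the inclusive bounds."""
    return lower <= rt_ms <= upper


def drop_out_of_range(rts, lower=100.0, upper=2500.0):
    """Assumed behaviour: discard reading times outside [lower, upper]."""
    return [rt for rt in rts if within_rt_bounds(rt, lower, upper)]


# Hypothetical reading times in milliseconds.
sample_rts = [85.0, 342.0, 518.0, 2500.0, 3120.0]
kept_rts = drop_out_of_range(sample_rts)
print(kept_rts)
```

With the default bounds of 100 ms and 2500 ms, the 85 ms and 3120 ms values are discarded and the rest are kept.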

Linger-Filter: An R script that filters a summarised Linger text file using the mean and standard deviation calculated within groups defined by one or more variables (Version 1.4). [Download]
Description:
This script allows the user to filter a variable, call it VarX (e.g. logRT or Correct), by a subset of one or more variables. For instance, to filter RT by subject, by condition and by region, the "grouping variables" would be subject, condition and region. The filtering strategy is as follows: 1) The mean and standard deviation of VarX are calculated for each unique combination of the grouping variables. 2) Values of VarX that lie more than N standard deviations above or below the mean of their group are filtered out.
How to use: Open the script in R, complete the user specification section (see below for details), save the script and finally run the whole script.
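
The two-step filtering strategy described above can be sketched in a few lines. The following is a minimal Python illustration, not the R script itself: the function name, the column names and the threshold of 2.5 standard deviations are hypothetical, and the sketch assumes the sample standard deviation (as computed by R's sd()) and inclusive cut-offs.

```python
from collections import defaultdict
from statistics import mean, stdev


def filter_by_group_sd(rows, var, group_vars, n_sd=2.5):
    """Keep rows whose `var` lies within n_sd SDs of its group mean."""
    # Step 1: collect the values of `var` for each unique combination
    # of the grouping variables.
    groups = defaultdict(list)
    for row in rows:
        key = tuple(row[g] for g in group_vars)
        groups[key].append(row[var])

    # Per-group cut-offs; a group with a single value has no spread.
    bounds = {}
    for key, values in groups.items():
        m = mean(values)
        sd = stdev(values) if len(values) > 1 else 0.0
        bounds[key] = (m - n_sd * sd, m + n_sd * sd)

    # Step 2: drop rows falling outside their own group's cut-offs.
    kept = []
    for row in rows:
        lo, hi = bounds[tuple(row[g] for g in group_vars)]
        if lo <= row[var] <= hi:
            kept.append(row)
    return kept


# Hypothetical data: subject s1 has one extreme RT, subject s2 is slower overall.
s1_rts = [400, 410, 420, 405, 415, 395, 425, 410, 400, 2000]
s2_rts = [800, 820, 810]
rows = ([{"subject": "s1", "condition": "A", "RT": rt} for rt in s1_rts] +
        [{"subject": "s2", "condition": "A", "RT": rt} for rt in s2_rts])

kept = filter_by_group_sd(rows, "RT", ["subject", "condition"], n_sd=2.5)
print(len(rows), len(kept))
```

Because the cut-offs are computed per group, the uniformly slower subject s2 is kept in full, while the single extreme value for s1 is removed relative to that subject's own mean.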
Citation:
Tang, K. (2014-2015). Linger Toolkit. http://tang-kevin.github.io/Tools.html. [bib]

Creative Commons License
This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.