CLOTH Dataset

Leaderboard

BERT model results and code on the CLOTH dataset

We tested the BERT model on the CLOTH dataset. It achieves 86.0% accuracy, which is on par with the performance of Amazon Turkers. The code is released on GitHub, and the detailed results are available in the Leaderboard.

Paper

Large-scale Cloze Test Dataset Designed by Teachers

Qizhe Xie*, Guokun Lai*, Zihang Dai, Eduard Hovy (*: equal contribution)

Description

CLOTH is a large-scale cloze test dataset with 7,131 passages and 99,433 questions. The dataset is collected from middle-school and high-school English examinations in China. CLOTH evaluates a machine's understanding of multiple aspects of natural language, including vocabulary, reasoning, and grammar. In addition, CLOTH can be used to evaluate language models' ability to model long contexts.

Terms and Conditions

1. The CLOTH dataset is available for non-commercial research purposes only.

2. All passages are obtained from the Internet and are not the property of Carnegie Mellon University. We are not responsible for the content or the meaning of these passages.

3. You agree not to reproduce, duplicate, copy, sell, trade, resell or exploit any portion of the contexts and any portion of the derived data for commercial purposes.

4. We reserve the right to terminate your access to the CLOTH dataset at any time.

Download

Please use this link to download the dataset.

Contact

Please contact Guokun Lai and Qizhe Xie for questions about the dataset.

Data Format

Each passage is encoded as a JSON file, which contains the following fields:

article: A string. There are several blanks (denoted as "_") within each passage, where each blank represents a cloze question.
options: A list of options for each question. There are four options for each blank.
answers: A list of the gold labels for the questions. Each answer is A, B, C or D.
id: A unique ID of the passage.
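
The fields above can be read with a few lines of Python. The following is only a minimal sketch, not the authors' loader. It assumes that options is a list with one inner list of four strings per blank, in the same order as the blanks in article, and that "_" appears in article only as a blank marker. The sample passage and its id are made up for illustration, and the helper names pair_questions and fill_blanks are hypothetical.

```python
import json

LETTERS = "ABCD"  # answer labels described in the data format


def pair_questions(passage):
    """Pair each blank with its options and gold answer.

    Assumes passage["options"][i] and passage["answers"][i] both
    belong to the i-th blank in passage["article"].
    """
    options = passage["options"]
    answers = passage["answers"]
    if len(options) != len(answers):
        raise ValueError("options and answers differ in length")
    questions = []
    for blank_index, (opts, letter) in enumerate(zip(options, answers)):
        gold = LETTERS.index(letter)  # "A" -> 0, ..., "D" -> 3
        questions.append({
            "blank_index": blank_index,
            "options": opts,
            "answer_index": gold,
            "answer_text": opts[gold],
        })
    return questions


def fill_blanks(passage):
    """Return the article with every "_" replaced by its gold answer.

    Assumes "_" occurs only as a blank marker in the article.
    """
    questions = pair_questions(passage)
    parts = passage["article"].split("_")
    if len(parts) - 1 != len(questions):
        raise ValueError("number of blanks does not match number of questions")
    filled = parts[0]
    for question, rest in zip(questions, parts[1:]):
        filled += question["answer_text"] + rest
    return filled


# Made-up passage in the documented format (not taken from CLOTH).
sample = json.loads("""
{
  "article": "I _ to school every day. My sister _ by bus.",
  "options": [["go", "goes", "going", "gone"],
              ["travel", "travels", "traveling", "traveled"]],
  "answers": ["A", "B"],
  "id": "sample-0001"
}
""")

questions = pair_questions(sample)
filled = fill_blanks(sample)
print(filled)
```

To read a real passage, replace the inline string with the contents of one of the downloaded JSON files, for example via json.load on an open file handle. The length checks raise an error early if a file does not match the assumptions stated above.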