Abstract
Data-driven model building is an important task in scientific
discovery, and one where discovery programs have been developed and
applied with real success. Most efforts have targeted fields
of natural science in which the hypothesis spaces are specialized and
deal with domains having considerable formal structure. Less work has
been directed toward qualitative areas of social science, in which
model building also arises. This paper reports the first automation
of a modelling task from linguistic anthropology: the analysis of
natural-language kinship terminologies in terms of simpler semantic
components. Our approach applies three generic simplicity criteria to
find exhaustively all the simplest models that are consistent with the
kinship data. We have reproduced results from the linguistics
literature and, in some cases, have found simpler models. The task
has strong generic elements: to illustrate this potential, extracts of
the code are applied to other data sets.