I am interested in the relationship between linguistics (especially
usage-based and cognitive approaches) on the one hand, and natural
language processing on the other. The neural turn in NLP has realized
many ideas proposed much earlier in the literature on connectionist
modelling. Yet it remains unclear how well state-of-the-art models
actually map onto cognitive and linguistic reality. In my research, I
explore how linguistic knowledge emerges in human language learners
and in neural language models – mostly from a usage-based and
constructionist perspective. Lately, my focus in this direction has
been on very small language models trained on small amounts of data,
and on how their learning compares to child language development.
Conference papers
Bastian Bunzeck, Daniel Duran, Leonie Schade, and
Sina Zarrieß. 2025. Small language models also work with small
vocabularies: Probing the linguistic abilities of grapheme- and
phoneme-based baby llamas. In Proceedings of the 31st International
Conference on Computational Linguistics, pages 6039–6048, Abu
Dhabi, UAE. Association for Computational Linguistics. https://aclanthology.org/2025.coling-main.404/
Bastian Bunzeck and Sina Zarrieß. 2024. The SlayQA
benchmark of social reasoning: Testing gender-inclusive generalization
with neopronouns. In Proceedings of the 2nd GenBench Workshop on
Generalisation (Benchmarking) in NLP, pages 42–53, Miami, Florida,
USA. Association for Computational Linguistics. https://aclanthology.org/2024.genbench-1.3/
Bastian Bunzeck and Sina Zarrieß. 2024. Fifty
shapes of BLiMP: Syntactic learning curves in language models are not
uniform, but sometimes unruly. In Proceedings of the 2024 CLASP
Conference on Multimodality and Interaction in Language Learning,
pages 39–55, Gothenburg, Sweden. Association for Computational
Linguistics. https://aclanthology.org/2024.clasp-1.7/
Bastian Bunzeck and Sina Zarrieß. 2023. GPT-wee:
How small can a small language model really get? In Proceedings of
the BabyLM Challenge at the 27th Conference on Computational Natural
Language Learning, pages 7–18, Singapore. Association for
Computational Linguistics. https://aclanthology.org/2023.conll-babylm.2/
Bastian Bunzeck and Sina Zarrieß. 2023.
Entrenchment matters: Investigating positional and constructional
sensitivity in small and large language models. In Proceedings of
the 2023 CLASP Conference on Learning with Small Data (LSD), pages
25–37, Gothenburg, Sweden. Association for Computational Linguistics. https://aclanthology.org/2023.clasp-1.3/
Journal papers
Bastian Bunzeck and Holger Diessel. 2024. The
richness of the stimulus: Constructional variation and development in
child-directed speech. First Language. https://doi.org/10.1177/01427237241303225
Paula Wojcik, Bastian Bunzeck, and Sina Zarrieß.
2023. The Wikipedia Republic of Literary Characters. Journal of
Cultural Analytics, 8(2). https://doi.org/10.22148/001c.70251
Stephan Druskat, Thomas Krause, Clara Lachenmaier, and
Bastian Bunzeck. 2023. Hexatomic: An extensible,
OS-independent platform for deep multi-layer linguistic annotation of
corpora. Journal of Open Source Software, 8(86):4825. https://doi.org/10.21105/joss.04825
Talks and presentations
2024
Fifty shapes of BLiMP: syntactic learning curves in language
models are not uniform, but sometimes unruly, (non-archival poster
presentation), BlackboxNLP 2024 at EMNLP 2024, Miami, Florida (USA)
Constructions in child-directed speech (with Holger
Diessel), (peer-reviewed oral presentation), 10th International
Conference of the German Cognitive Linguistics Association, Osnabrück
University (Germany)
Generating authentic child speech from little data, (poster
presentation), NLG in the Lowlands 2024, Bielefeld University
(Germany)
2023
GPT-wee: Experiments in downscaling and curriculum
learning, (poster presentation), SAIL Workshop on Fundamental
Limits of Large Language Models, Bielefeld University (Germany)
From Byte to Babel: Large Language Models and the Tower of
Linguistic Knowledge, (peer-reviewed oral presentation), META-LING
2023 – Methodological Exploration and Technological Advances in
Linguistics, University of Bamberg (Germany)
Where and How Do Literary Characters Figure in Wikipedia?
(with Sina Zarrieß), (invited presentation), International Workshop |
Wikipedia, Wikidata and Wikibase: Usage Scenarios for Literary Studies,
Free University of Berlin (Germany)
Teaching
Summer term 2025
Neural nets in language technology – seminar (taught in
English)
Winter term 2024/2025
Introduction to computational linguistics (Einführung in die
Computerlinguistik) – practical sessions, accompanying lectures by
Sina Zarrieß
Methods of applied computational linguistics (Methoden der
angewandten Computerlinguistik) – lectures and practical
sessions
Summer term 2024
Methods of applied computational linguistics (Methoden der
angewandten Computerlinguistik) – lectures and practical
sessions
Neural nets in language technology (Neuronale Netze in der
Sprachverarbeitung) – practical sessions, accompanying lectures by
Sina Zarrieß
Winter term 2023/2024
Introduction to computational linguistics (Einführung in die
Computerlinguistik) – practical sessions, accompanying lectures by
Sina Zarrieß
Project seminar: Modeling and analysis of dialogue
(Projektseminar: Modellierung und Analyse von sprachlichen
Dialogen), taught jointly with Simeon Schüz
Summer term 2023
Methods of applied computational linguistics (Methoden der
angewandten Computerlinguistik) – practical sessions, accompanying
lectures by Sina Zarrieß