I am interested in the relationship between linguistics (especially
usage-based and cognitive approaches) on the one hand, and natural
language processing on the other. The neural turn in NLP has realized
many ideas proposed much earlier in the literature on connectionist
modelling. Yet it remains unclear how well state-of-the-art models
actually map onto cognitive and linguistic reality. In my research, I
explore how linguistic knowledge emerges in human language learners
and in neural language models – mostly from a usage-based and
constructionist perspective. Lately, my focus in this direction has
been on very small language models trained on small amounts of data,
and on how their learning compares to child language development.
Conference papers
Bastian Bunzeck, Daniel Duran, Leonie Schade, and
Sina Zarrieß. 2025. Small language models also work with small
vocabularies: Probing the linguistic abilities of grapheme- and
phoneme-based baby llamas. In Proceedings of the 31st International
Conference on Computational Linguistics, pages 6039–6048, Abu
Dhabi, UAE. Association for Computational Linguistics. https://aclanthology.org/2025.coling-main.404/
Bastian Bunzeck and Sina Zarrieß. 2024. The SlayQA
benchmark of social reasoning: Testing gender-inclusive generalization
with neopronouns. In Proceedings of the 2nd GenBench Workshop on
Generalisation (Benchmarking) in NLP, pages 42–53, Miami, Florida,
USA. Association for Computational Linguistics. https://aclanthology.org/2024.genbench-1.3/
Bastian Bunzeck and Sina Zarrieß. 2024. Fifty
shapes of BLiMP: Syntactic learning curves in language models are not
uniform, but sometimes unruly. In Proceedings of the 2024 CLASP
Conference on Multimodality and Interaction in Language Learning,
pages 39–55, Gothenburg, Sweden. Association for Computational
Linguistics. https://aclanthology.org/2024.clasp-1.7/
Bastian Bunzeck and Sina Zarrieß. 2023. GPT-wee:
How small can a small language model really get? In Proceedings of
the BabyLM Challenge at the 27th Conference on Computational Natural
Language Learning, pages 7–18, Singapore. Association for
Computational Linguistics. https://aclanthology.org/2023.conll-babylm.2/
Bastian Bunzeck and Sina Zarrieß. 2023.
Entrenchment matters: Investigating positional and constructional
sensitivity in small and large language models. In Proceedings of
the 2023 CLASP Conference on Learning with Small Data (LSD), pages
25–37, Gothenburg, Sweden. Association for Computational Linguistics. https://aclanthology.org/2023.clasp-1.3/
Journal papers
Bastian Bunzeck and Holger Diessel. 2024. The
richness of the stimulus: Constructional variation and development in
child-directed speech. First Language. https://doi.org/10.1177/01427237241303225
Paula Wojcik, Bastian Bunzeck, and Sina Zarrieß.
2023. The Wikipedia Republic of Literary Characters. Journal of
Cultural Analytics, 8(2). https://doi.org/10.22148/001c.70251
Stephan Druskat, Thomas Krause, Clara Lachenmaier, and
Bastian Bunzeck. 2023. Hexatomic: An extensible,
OS-independent platform for deep multi-layer linguistic annotation of
corpora. Journal of Open Source Software, 8(86):4825. https://doi.org/10.21105/joss.04825
Talks and presentations
2024
Fifty shapes of BLiMP: syntactic learning curves in language
models are not uniform, but sometimes unruly, (non-archival poster
presentation), BlackboxNLP 2024 at EMNLP 2024, Miami, Florida (USA)
Constructions in child-directed speech (with Holger
Diessel), (peer-reviewed oral presentation), 10th International
Conference of the German Cognitive Linguistics Association, Osnabrück
University (Germany)
Generating authentic child speech from little data, (poster
presentation), NLG in the Lowlands 2024, Bielefeld University
(Germany)
2023
GPT-wee: Experiments in downscaling and curriculum
learning, (poster presentation), SAIL Workshop on Fundamental
Limits of Large Language Models, Bielefeld University (Germany)
From Byte to Babel: Large Language Models and the Tower of
Linguistic Knowledge, (peer-reviewed oral presentation), META-LING
2023 – Methodological Exploration and Technological Advances in
Linguistics, University of Bamberg (Germany)
Where and How Do Literary Characters Figure in Wikipedia?
(with Sina Zarrieß), (invited presentation), International Workshop |
Wikipedia, Wikidata and Wikibase: Usage Scenarios for Literary Studies,
Free University of Berlin (Germany)
Teaching
Summer term 2025
Neural nets in language technology – seminar (taught in
English)
Winter term 2024/2025
Introduction to computational linguistics (Einführung in die
Computerlinguistik) – practical sessions, accompanying lectures by
Sina Zarrieß
Methods of applied computational linguistics (Methoden der
angewandten Computerlinguistik) – lectures and practical
sessions
Summer term 2024
Methods of applied computational linguistics (Methoden der
angewandten Computerlinguistik) – lectures and practical
sessions
Neural nets in language technology (Neuronale Netze in der
Sprachverarbeitung) – practical sessions, accompanying lectures by
Sina Zarrieß
Winter term 2023/2024
Introduction to computational linguistics (Einführung in die
Computerlinguistik) – practical sessions, accompanying lectures by
Sina Zarrieß
Project seminar: Modeling and analysis of dialogue
(Projektseminar: Modellierung und Analyse von sprachlichen
Dialogen), taught jointly with Simeon Schüz
Summer term 2023
Methods of applied computational linguistics (Methoden der
angewandten Computerlinguistik) – practical sessions, accompanying
lectures by Sina Zarrieß