About me
Hi! I am a fourth-year PhD student, currently visiting Alex
Warstadt’s LeM🍋N Lab at UCSD!
Normally, I work at Bielefeld
University in the Computational Linguistics group (CLAUSE), supervised by
Sina Zarrieß. I am also a member of the Collaborative Research Center (CRC) 1646 –
Linguistic Creativity in Communication in Bielefeld. Before that, I studied
English/American Studies and Computer Science at Friedrich Schiller University Jena
in Germany and Katholieke
Universiteit Leuven in Belgium.
I am interested in the relationship between linguistics (especially
usage-based and cognitive approaches) on the one hand and NLP on the
other. The neural turn in ML has realized many ideas originating
from old-school connectionism. Yet it remains unclear how well
state-of-the-art models and cognitive reality actually map onto each other. In my
research, I explore the ways in which linguistic knowledge emerges in
human language learners and neural language models, with a focus on
small LMs trained with little data, and their comparability to child
language development.
If you want to contact me, check out the
Bielefeld University staff directory or send me an email
(firstname.lastname@uni-bielefeld.de).
Blog
I am thinking about starting a blog.
Content (probably) coming soon!
Publications
For up-to-date overviews, also check Google
Scholar, PUB (Publications at Bielefeld University), and my ORCID page.
Preprints
Conference/Workshop Papers
- BabyBabelLM: A
Multilingual Benchmark of Developmentally Plausible Training
Data [2026]
Jaap Jumelet, Abdellah
Fourtassi, Akari Haga, Bastian Bunzeck, Bhargav
Shandilya, Diana Galvan-Sosa, Faiz Ghifari Haznitrama, Francesca
Padovani, Francois Meyer, Hai Hu, Julen Etxaniz, Laurent Prevot, Linyang
He, María Grandury, Mila Marcheva, Negar Foroutan, Nikitas
Theodoropoulos, Pouya Sadeghi, Siyuan Song, Suchir Salhan, Susana Zhou,
Yurii Paniv, Ziyin Zhang, Arianna Bisazza, Alex Warstadt, and Leshem
Choshen. EACL 2026.
- Dialogue
is not enough to make a communicative BabyLM (but neither is
developmentally inspired reinforcement learning) [2025]
Francesca Padovani, Bastian Bunzeck, Manar Ali, Omar Momen, Arianna Bisazza, Hendrik Buschmeier,
and Sina Zarrieß. BabyLM Workshop
2025.
- Do
Construction Distributions Shape Formal Language Learning In German
BabyLMs? [2025]
Bastian Bunzeck, Daniel
Duran, and Sina Zarrieß. CoNLL
2025.
- Subword
models struggle with word learning, but surprisal hides it
[2025]
Bastian Bunzeck and Sina Zarrieß.
ACL 2025.
- Small
language models also work with small vocabularies: Probing the
linguistic abilities of grapheme- and phoneme-based baby llamas
[2025]
Bastian Bunzeck, Daniel Duran, Leonie
Schade, and Sina Zarrieß. COLING
2025.
- Graphemes
vs. phonemes: battling it out in character-based language
models [2024]
Bastian Bunzeck, Daniel
Duran, Leonie Schade, and Sina Zarrieß. BabyLM
Challenge @ CoNLL 2024.
- The SlayQA
benchmark of social reasoning: Testing gender-inclusive generalization
with neopronouns [2024]
Bastian Bunzeck
and Sina Zarrieß. GenBench Workshop
2024.
- Fifty shapes
of BLiMP: Syntactic learning curves in language models are not uniform,
but sometimes unruly [2024]
Bastian Bunzeck and Sina Zarrieß. CLASP 2024
(MILLing).
- GPT-wee:
How small can a small language model really get? [2023]
Bastian Bunzeck and Sina Zarrieß.
BabyLM Challenge @ CoNLL 2023.
- Entrenchment
matters: Investigating positional and constructional sensitivity in
small and large language models [2023]
Bastian Bunzeck and Sina Zarrieß. CLASP 2023
(LSD).
Journal papers
Miscellaneous
Talks and presentations
2026
- Constructions in German child-directed and child-available
language, (peer-reviewed poster presentation), 14th International
Conference on Construction Grammar, Princeton University (US)
- Constructing a Language Model’s Language, (poster
presentation), HumanCLAIM Workshop, University of Göttingen
(Germany)
- Towards Communicative BabyLMs (with Sina Zarrieß), (invited
talk), HumanCLAIM Workshop, University of Göttingen (Germany)
2025
- Do Construction Distributions Shape Formal Language Learning In
German BabyLMs?, (non-archival poster presentation), The Second
International Workshop on Construction Grammars and NLP (CxGs+NLP 2025),
Düsseldorf (Germany)
- Developmentally plausible pretraining, now also auf Deutsch: a
BabyLM Dataset for German, (non-archival poster presentation),
KONVENS 2025, University of Hildesheim (Germany)
- Child-directed speech is fine-tuned to children’s developmental
needs, (peer-reviewed poster presentation), Bialogue 2025 – The
29th Workshop on the Semantics and Pragmatics of Dialogue, Bielefeld
University (Germany)
- What LLMs can do for linguistics…and what linguistics can do for
LLMs, (invited guest lecture, seminar on empirical linguistics),
Heinrich Heine Universität Düsseldorf (Germany)
- Word learning in LMs: A trilogy in four parts, (oral
presentation), 1st RTG SFB 1646 & Friends Symposium, Bielefeld
University (Germany)
- Word learning in (all kinds of) German and English BabyLMs,
(poster presentation), HumanCLAIM Workshop, University of Göttingen
(Germany)
2024
- Fifty shapes of BLiMP: syntactic learning curves in language
models are not uniform, but sometimes unruly, (non-archival poster
presentation), BlackboxNLP 2024 at EMNLP 2024, Miami/Florida (US)
- Constructions in child-directed speech (with Holger
Diessel), (peer-reviewed oral presentation), 10th International
Conference of the German Cognitive Linguistics Association, Osnabrück
University (Germany)
- Generating authentic child speech from little data, (poster
presentation), NLG in the Lowlands 2024, Bielefeld University
(Germany)
2023
- GPT-wee: Experiments in downscaling and curriculum
learning, (poster presentation), SAIL Workshop on Fundamental
Limits of Large Language Models, Bielefeld University (Germany)
- From Byte to Babel: Large Language Models and the Tower of
Linguistic Knowledge, (peer-reviewed oral presentation), META-LING
2023 - Methodological Exploration and Technological Advances in
Linguistics, University of Bamberg (Germany)
- Where and How Do Literary Characters Figure in Wikipedia?
(with Sina Zarrieß), (invited presentation), International Workshop |
Wikipedia, Wikidata and Wikibase: Usage Scenarios for Literary Studies,
Free University of Berlin (Germany)
Teaching
Winter 2026/2027
- Methods of applied computational linguistics (Methoden der
angewandten Computerlinguistik) – lectures and practical
sessions
Summer 2026
- Methods of applied computational linguistics (Methoden der
angewandten Computerlinguistik) – lectures and practical
sessions
Winter 2025/2026
- Introduction to computational linguistics (Einführung in die
Computerlinguistik) – practical sessions, accompanying lectures by
Sina Zarrieß
- Methods of applied computational linguistics (Methoden der
angewandten Computerlinguistik) – lectures and practical sessions,
taught jointly with Sina Zarrieß
Summer 2025
- Neural nets in language technology – seminar (taught in
English)
Winter 2024/2025
- Introduction to computational linguistics (Einführung in die
Computerlinguistik) – practical sessions, accompanying lectures by
Sina Zarrieß
- Methods of applied computational linguistics (Methoden der
angewandten Computerlinguistik) – lectures and practical
sessions
Summer 2024
- Methods of applied computational linguistics (Methoden der
angewandten Computerlinguistik) – lectures and practical
sessions
- Neural nets in language technology (Neuronale Netze in der
Sprachverarbeitung) – practical sessions, accompanying lectures by
Sina Zarrieß
Winter 2023/2024
- Introduction to computational linguistics (Einführung in die
Computerlinguistik) – practical sessions, accompanying lectures by
Sina Zarrieß
- Project seminar: Modeling and analysis of dialogue
(Projektseminar: Modellierung und Analyse von sprachlichen
Dialogen), taught jointly with Simeon Schüz
Summer 2023
- Methods of applied computational linguistics (Methoden der
angewandten Computerlinguistik) – practical sessions, accompanying
lectures by Sina Zarrieß
Design adapted from Oskar Wickström’s The Monospace
Web