The 13th International Conference on Grammatical Inference

will be held in Delft, The Netherlands, from October 5 to October 7, 2016.

ICGI is a conference on all aspects of grammatical inference, including (but not limited to) theoretical and experimental analysis of different models of grammar induction, and algorithms for induction of different classes of languages and automata.

Co-located with ICGI 2016 is SPiCe, a competition on predicting the next element in a sequence.

Visit the website »

Registration is now open!

Register now »

ICGI 2016 seeks to provide a forum for presentation and discussion of original research papers on all aspects of grammatical inference.

Key interests are machine-learning methods applied to discrete combinatorial structures such as strings, trees, or graphs, and algorithms for learning symbolic models such as grammars, automata, Markov models, or pattern languages.

Topics of interest include, but are not limited to:

Program chairs

Program committee

Submission for regular papers is now closed.

We invite researchers to submit original, unpublished research that fits the scope of ICGI 2016 by June 12, 2016 AOE through EasyChair. The conference proceedings will be published in JMLR: Workshop and Conference Proceedings.

Important Dates

We invite three types of papers:

There are no restrictions on the domain of application as long as the paper provides sufficient background information.

Each paper should contain a title, the authors' names and affiliations, a mailing address, a brief abstract describing the work, and at least three keywords describing its content.

Papers must be submitted in PDF format and must not exceed 12 pages (A4). The use of LaTeX is strongly encouraged; prospective authors are advised to use the JMLR style file for LaTeX.

We are looking forward to your submissions. For questions about your submission or the conference in general, you can contact Rick Smetsers.

Submission for Work in Progress papers is now closed.

We are now calling for work in progress and ideas that may be of interest to the grammatical inference community.

You will get the opportunity to present and discuss your work in progress or ideas in a designated session of ICGI. Abstracts of work in progress will not appear in the proceedings (as they will not be peer reviewed), but will appear in a conference booklet and on the website.

All abstracts should be submitted by email to rick (AT) cs (DOT) ru (DOT) nl by July 24, 2016.

Important Dates

We invite abstracts on work in progress, which can be either theoretical or experimental, fundamental or application-oriented, solving or proposing important problems.

Prospective authors are invited to submit an abstract which represents original and previously unpublished work. Simultaneous submission to other conferences with published proceedings is not allowed.

Each abstract should contain a title, the authors' names and affiliations, a mailing address, and at least three keywords describing its content.

Abstracts must be submitted in PDF format and must not exceed 2 pages (A4). The use of LaTeX is strongly encouraged; prospective authors are advised to use the JMLR style file for LaTeX.

Abstracts will appear in a conference booklet and on the website, and authors will present their work in a dedicated session.


We are looking forward to your submissions!

The following papers have been accepted for ICGI 2016. The authors will present their work at the conference.

In addition, a collection of work-in-progress extended abstracts will be presented. The list of accepted work-in-progress extended abstracts will be made available later.

The following top researchers will be keynote speakers at ICGI 2016:

Borja Balle

Borja Balle is currently a Lecturer in Data Science at Lancaster University. He received his PhD from Universitat Politècnica de Catalunya in 2013 and then spent two years as a postdoctoral fellow at McGill University.

His research focuses on the design and analysis of machine learning algorithms for structured data like sequences, trees, and graphs. On the theoretical side, his goal is to advance our understanding of the foundations of data science by identifying essential trade-offs between statistical and computational efficiency.

Besides machine learning, Borja also conducts research in related areas like automata theory, streaming algorithms, and data privacy. Practical applications of his research include efficient algorithms for solving large-scale problems in natural language processing and reinforcement learning.

The problem of learning automata or weighted automata dates back to the origins of computer science. This talk presents a series of recent fundamental data-dependent learning guarantees for this problem based on the notion of Rademacher complexity. These learning-theoretic results can guide the design of new algorithms that benefit from a strong theoretical justification.
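As general background (an illustration of the notion, not a result specific to this talk), Rademacher-complexity guarantees classically take the following shape: with probability at least 1 − δ over an i.i.d. sample of size m, every hypothesis h in a class H with loss bounded in [0, 1] satisfies

```latex
R(h) \;\le\; \widehat{R}(h) \;+\; 2\,\mathfrak{R}_m(H) \;+\; \sqrt{\frac{\ln(1/\delta)}{2m}},
```

where R is the true risk, R-hat the empirical risk, and ℜ_m(H) the Rademacher complexity of H. Data-dependent versions of such bounds replace ℜ_m(H) with its empirical counterpart, computed on the observed sample.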

Hendrik Blockeel

Hendrik Blockeel is a professor at the Katholieke Universiteit Leuven (Belgium) and a part-time associate professor at Leiden University (The Netherlands). His research interests include theory and algorithms for machine learning and data mining in general, with a particular focus on relational learning, graph mining, probabilistic logics, inductive knowledge bases, and applications of these techniques in computer science, bioinformatics, and medical informatics.

Prof. Blockeel's main research results include an efficient and versatile relational decision tree learning tool that has been used in many relational learning applications, a framework for symbolic machine learning that generalizes decision tree and rule learning, and experiment databases for machine learning.

Language learning has been studied for decades. For a long time, the focus was on learning the grammatical structure of a language from sentences, or learning the semantics of sentences from examples of sentence/meaning pairs. More recently, there has been increasing interest in grounded language learning, where the language is learned by observing sentences used in a particular context, and trying to link elements of these sentences to elements of the context.

This talk is about an approach called relational grounded language learning. In this approach, the semantics of a sentence is a relational structure, and this structure is learned from sentence/context pairs in which the context is represented in a relational format. Once a model of the link between sentences and semantic structures is in place, it can be used for a variety of purposes: generating sentences describing a given scene, identifying the elements in a scene that a sentence refers to, translating a sentence from one language to another through its semantic representation, and more. The potential of this approach for all these uses has been demonstrated on some simple problems. Although the approach is clearly still in its infancy, we believe it has much potential in terms of helping us understand how humans learn their first language, as well as improving natural language processing technology.

Valentin Spitkovsky

Valentin Spitkovsky completed a doctoral dissertation in Computational Linguistics at Stanford's Artificial Intelligence Laboratory in 2014. His focus has been on unsupervised parsing and grammar induction.

Since then, he has been doing research at Google on Natural Language Processing, Data Mining and Modeling, Machine Intelligence, and Information Retrieval.

Unsupervised learning of hierarchical syntactic structure from free-form natural language text is an important and difficult problem, with implications for scientific goals, such as understanding human language acquisition, or engineering applications, including question answering, machine translation and speech recognition. As is the case with many unsupervised settings in machine learning, grammar induction usually reduces to a non-convex optimization problem. In the first part of the talk, I will review a collection of search heuristics to make expectation-maximization algorithms less sensitive to local optima. However, a deeper challenge of unsupervised learning is that even the locations of global optima of the non-convex objectives, the intrinsic metrics being optimized, are often at best loosely correlated with extrinsic metrics, such as accuracies with respect to reference parse trees. The second part of the talk will cover a suite of constraints on possible valid parses, which can be derived from unparsed surface text forms, to address this concern by helping guide language learners towards linguistically more plausible syntactic constructions.

These results will be used, in the final parts of the talk, to define a family of dependency-and-boundary parsing models and a curriculum learning strategy, co-designed to effectively induce dependency grammars in an unsupervised fashion. The models are parameterized to exploit, as much as possible, observable state, such as words at sentence boundaries, which limits the proliferation of local maxima that is ordinarily caused by presence of latent variables. The training strategy is then to split incoming data into simpler text fragments, in a way that is consistent with the parsing constraints, thus facilitating bootstrapping by increasing numbers of visible edges. Using fragment length as a proxy, the optimization strategy gradually exposes learners to more complex data, starting from just single-word fragments, whose unique parses correspond to a globally optimal initial solution. This grammar induction pipeline attains state-of-the-art accuracies against standard evaluation sets that span a total of nineteen natural languages from several disparate families.

Experimental results presented in this talk strongly suggest that complex learning tasks like grammar induction can be coaxed to more reliably escape local optima and to discover substantially more correct substructures by pursuing learning strategies that begin with simple data and basic models and carefully progress to more complex data instances and expressive model parameterizations.
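The curriculum strategy described above (ordering text fragments by length and gradually admitting longer ones, starting from single-word fragments) can be sketched in a few lines of Python. The corpus and function names here are hypothetical illustrations, not the implementation used in the talk:

```python
# Illustrative sketch of a length-based curriculum: order training fragments
# from shortest to longest and expose the learner to progressively larger
# subsets, starting from single-word fragments.

def curriculum_stages(fragments, max_len):
    """Return one training set per stage: stage k holds all fragments
    of length at most k, so stage 1 contains only single words."""
    by_len = sorted(fragments, key=len)
    return [[f for f in by_len if len(f) <= k] for k in range(1, max_len + 1)]

corpus = [
    ["big", "dogs", "bark", "loudly"],
    ["dogs", "bark"],
    ["dogs"],
]
stages = curriculum_stages(corpus, max_len=4)
# stages[0] holds only the single-word fragment; stages[3] holds the full corpus
```

Single-word fragments are the natural starting point because, as the abstract notes, their parses are unique, giving the learner an unambiguous initial solution.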

ICGI 2016 will be held in Delft, The Netherlands.

Delft is a historic town, known for its canals, iconic pottery, painter Johannes Vermeer, microbiologist Antony van Leeuwenhoek, and its association with the Dutch Royal House. The image shows an aerial view of Delft with, from left to right, three churches, a university tower building, and an iconic windmill.

Delft is situated in the province of South Holland between the cities of Rotterdam and The Hague, in the central (western) part of the Netherlands.

The nearest international airport (Amsterdam Airport Schiphol, AMS) is approximately 40 minutes away by public transport.

Trains from Schiphol to Delft are operated by NS and typically run throughout the day and night. We suggest you use NS Journeyplanner for planning your trip.

NS Journeyplanner »

Conference Venue

The conference will be held in "De Mekelzaal", which is part of Science Centre Delft.

The address of the venue is:

Science Centre Delft
Mijnbouwstraat 120
2628 RX Delft


We have arranged special conference rates for the following hotels:


If you need a visa to visit the Netherlands, we suggest you apply for a (standard) tourist visa.

Registration for ICGI 2016 is now open. You are registered once you receive a confirmation e-mail.

Registration form »

Registration costs EUR 360 per person (covering participation in the conference, coffee breaks and lunches, the conference dinner, and the social events) and EUR 110 per guest (covering the conference dinner and the social events).

Wednesday, October 5, 2016

Time Author(s) Title Type
08:30 Coffee and registration
09:00 Denis Arrivault, Dominique Benielli, François Denis and Rémi Eyraud Sp2Learn: A Toolbox for the spectral learning of weighted automata Regular
09:30 Kristina Strother-Garcia, Jeffrey Heinz and Hyun Jin Hwangbo Using model theory for grammatical inference: a case study from phonology Regular
10:00 Olgierd Unold, Łukasz Culer and Agnieszka Kaczmarek Visualizing context-free grammar induction with grammar-based classifier system Work-in-progress
10:15 Amos Yeo, John Howroyd, Mark Bishop Using grammar inference for classification Work-in-progress
10:30 Coffee
11:00 Hendrik Blockeel Relational grounded language learning Keynote
12:30 Lunch
14:30 Roland Groz and Catherine Oriat Inferring Non-resettable Mealy machines with n States Regular
15:00 Adrien Boiret, Aurélien Lemay and Joachim Niehren Learning Top-Down Tree Transducers with Regular Domain Inspection Regular
15:30 Witold Dyrka, Francois Coste, Olgierd Unold, Lukasz Culer, Agnieszka Kaczmarek How to Measure the Topological Quality of Protein Grammars? Work-in-progress
15:45 Tobias Endres Determining Syntactical Invariants Within the DOM-Structure of Ajax Applications through XML Grammar Inference Work-in-progress
16:00 Coffee
16:30 Adrian-Horia Dediu, Joana M. Matos and Claudio Moraga Query Learning Automata with Helpful Labels Regular
17:00 Gaetano Pellegrino, Christian Albert Hammerschmidt and Sicco Verwer Learning Deterministic Finite Automata from Infinite Alphabets Regular
17:30 Michael Siebers Inferring Languages of Multi-Dimensional Observations Work-in-progress
17:45 End of day 1

Thursday, October 6, 2016

Time Author(s) Title Type
08:30 Coffee
09:00 Annie Foret and Denis Bechet Simple K-star Categorial Dependency Grammars and their Inference Regular
09:30 Shouhei Fukunaga, Yoshimasa Takabatake, Tomohiro I and Hiroshi Sakamoto Online Grammar Compression for Frequent Pattern Discovery Regular
10:00 Qin Lin and Sicco Verwer Probabilistic Model Learning from Noisy Data Work-in-progress
10:15 Chihiro Shibata, Ryo Yoshinaka Towards Learning Generalized Residual Finite State Automata Work-in-progress
10:30 Coffee
11:00 Borja Balle Theoretical Guarantees for Learning Weighted Automata Keynote
12:30 Lunch
14:30 Social event and conference dinner

Friday, October 7, 2016

Time Author(s) Title Type
08:30 Coffee
09:00 Payam Siyari and Matthias Gallé The Generalized Smallest Grammar Problem Regular
09:30 Alexander Clark Testing Distributional Properties of Context-Free Grammars Regular
10:00 Francois Coste, Mikail Demirdelen A Refined Parsing Graph Approach to Learn Smaller Contextually Substitutable Grammars With Less Data Work-in-progress
10:15 Tomoko Ochi, Ryo Yoshinaka and Akihiro Yamamoto Polynomial time inference of generalization of non-cross pattern languages to term tree languages Work-in-progress
10:30 Coffee
11:00 Valentin Spitkovsky Grammar Induction and Parsing with Dependency-and-Boundary Models Keynote
12:30 Lunch
14:30 SPiCe workshop
16:00 Coffee
16:30 SPiCe workshop
18:00 Closing statement