January 2018 Coordinated Project @ ILLC

Scroll down for a detailed description of this project's motivation and structure.

If you are interested in this project, please contact the instructor -- Shane Steinert-Threlkeld -- by e-mail: S.N.M.Steinert-Threlkeld (at) uva.nl.

- 23 Jan: reading list updated
- 22 Jan: updated notebook with predictions/output, which will be useful for many of your projects
- 22 Jan: updated reading list
- 19 Jan: updates to projects list and Jupyter notebook
- 18 Jan: uploaded list of possible projects (access information provided in e-mail)
- 18 Jan: reading list updated
- 17 Jan: added Jupyter notebook introducing TensorFlow and neural networks, which can be used to familiarize yourself and experiment with TF.
- 17 Jan: refactoring of shanest/quantifier-rnn-learning has been completed. Feel free to inspect the code now!
- 15 Jan: updated reading list
- 19 Dec: updated reading list
- 13 Dec: updated reading list
- 06 Dec: website created!

In this project, students will develop tools and run (computational) experiments to test the hypothesis that *semantic universals* arise because expressions satisfying them are easier to learn than those that do not.

A semantic universal is a property of meaning shared by (almost) all natural languages (possibly conditional on the languages having additional properties). Because languages vary quite a bit, when one finds a universal, one naturally wonders whether there's an explanation for it. Why do all languages have this semantic property? We are interested in exploring the following hypothesis.

**Hypothesis:** Semantic universals arise because they make meaning systems easier to learn.

Of course, the hypothesis can only be supported or refuted when a model of learning a semantic system has been specified. Thus, the hypothesis naturally gives rise to a challenge.

**Challenge:** Provide a model of learning which makes good on the Hypothesis (at least, for some semantic universals).

In recent work, Steinert-Threlkeld and Szymanik attempt to meet the Challenge by training *recurrent neural networks* to learn the meanings of *quantifiers*, a domain where many semantic universals have been posited. They use this framework to explain universals like *monotonicity* and *quantity*. The monotonicity universal works as follows. Consider the following two sentences.

1. Many French people smoke cigarettes.
2. Many French people smoke.

Sentence (1) entails sentence (2): the former cannot be true without the latter being true. Notice that all we have done is replace the term "smoke cigarettes" with the strictly more general term "smoke". Also notice that the inference pattern holds for any choice of restrictor (not just "French people") and for any pair of nuclear scopes that stand in the same specific-general relation. Because of this, we say that the quantifier "many" is *upward monotone*. If "many" is replaced by "few", the inference pattern reverses: "few" is *downward monotone*. The proposed universal then states:

**Monotonicity:** All simple determiners are monotone.
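
To make the definition concrete, here is a minimal sketch that checks monotonicity in the scope argument by brute force over a small finite universe. It treats a quantifier as a function from a restrictor set `a` and a scope set `b` to a truth value. The helper names (`powerset`, `is_upward_monotone`, `is_downward_monotone`) and the toy quantifiers are illustrative assumptions; simple numerical quantifiers stand in for vague ones like "many" and "few", and none of this is taken from the paper's code.

```python
from itertools import chain, combinations


def powerset(universe):
    """Return all subsets of `universe` as frozensets."""
    items = list(universe)
    subsets = chain.from_iterable(
        combinations(items, r) for r in range(len(items) + 1)
    )
    return [frozenset(s) for s in subsets]


def is_upward_monotone(quantifier, universe):
    """True if Q(A, B) and B <= B' always imply Q(A, B')."""
    sets = powerset(universe)
    return all(
        quantifier(a, b_big)
        for a in sets
        for b in sets
        for b_big in sets
        if b <= b_big and quantifier(a, b)
    )


def is_downward_monotone(quantifier, universe):
    """True if Q(A, B) and B' <= B always imply Q(A, B')."""
    sets = powerset(universe)
    return all(
        quantifier(a, b_small)
        for a in sets
        for b in sets
        for b_small in sets
        if b_small <= b and quantifier(a, b)
    )


# Toy quantifiers (hypothetical stand-ins, chosen for illustration only).
def at_least_two(a, b):
    return len(a & b) >= 2


def at_most_one(a, b):
    return len(a & b) <= 1


def exactly_two(a, b):
    return len(a & b) == 2


UNIVERSE = range(4)  # small enough for exhaustive checking

if __name__ == "__main__":
    for q in (at_least_two, at_most_one, exactly_two):
        print(
            q.__name__,
            "upward:", is_upward_monotone(q, UNIVERSE),
            "downward:", is_downward_monotone(q, UNIVERSE),
        )
```

On this universe, `at_least_two` comes out upward monotone, `at_most_one` downward monotone, and `exactly_two` neither, which matches the intuition that "exactly two" is not a monotone quantifier.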

In the paper, neural networks are trained to learn monotone and non-monotone quantifiers and it is shown that the former are learned significantly faster than the latter. We also show that this pattern holds for another universal called Quantity, which states that quantifiers only care about the sizes of sets, and not the identity of objects or their position in a structure. (This is related to a conception of logicality.)
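
The Quantity property can be illustrated in the same brute-force style. The sketch below assumes the standard formulation that a quantifier satisfying Quantity depends only on the sizes of the four zones A∩B, A\B, B\A, and the rest of the universe. The function names and the two example quantifiers are hypothetical and serve only as an illustration.

```python
from itertools import product


def all_pairs(universe):
    """Yield every pair (A, B) of subsets of `universe`."""
    items = list(universe)
    # Assign each element to one of four zones:
    # 0 = in A and B, 1 = in A only, 2 = in B only, 3 = in neither.
    for zones in product(range(4), repeat=len(items)):
        a = frozenset(x for x, z in zip(items, zones) if z in (0, 1))
        b = frozenset(x for x, z in zip(items, zones) if z in (0, 2))
        yield a, b


def satisfies_quantity(quantifier, universe):
    """True if the quantifier's value depends only on the four zone sizes."""
    items = frozenset(universe)
    seen = {}  # zone sizes -> truth value observed for those sizes
    for a, b in all_pairs(items):
        sizes = (len(a & b), len(a - b), len(b - a), len(items - (a | b)))
        value = quantifier(a, b)
        if seen.setdefault(sizes, value) != value:
            return False
    return True


def most(a, b):
    """Depends only on set sizes."""
    return len(a & b) > len(a - b)


def first_is_in_scope(a, b):
    """Depends on which element comes first, so it is sensitive to identity."""
    return bool(a) and min(a) in b


if __name__ == "__main__":
    print("most:", satisfies_quantity(most, range(4)))
    print("first_is_in_scope:", satisfies_quantity(first_is_in_scope, range(4)))
```

Here `most` passes the check, while `first_is_in_scope` fails it: with A = {0, 1}, the scopes B = {0} and B = {1} have identical zone sizes but yield different truth values.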

In this project, students will develop new tools and run more experiments in order to further develop the explanation of semantic universals in terms of learnability. They will be doing original research that could become (part of) publications. Existing code as well as access to computing infrastructure will be provided. Possible topics that can be addressed are:

- Running larger experiments, with many more quantifiers being learned at once, using their semantic properties as statistical factors in their learning rate.
- Developing tools to look inside the resulting neural networks and see how they actually process quantifiers. Do they behave similarly to other proposals from the semantics literature, like semantic automata?
- Extending the framework to explain more universals for quantifiers (such as Extensionality).
- Extending the framework to explain semantic universals in other domains.
- Developing theoretical connections between learnability in neural networks and semantic properties.
- A project of your own choice!
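
For readers unfamiliar with the semantic automata mentioned in the second topic, the following is a minimal sketch of the idea. It assumes the usual encoding in which each element of the restrictor A contributes one symbol, `1` if it is also in the scope B and `0` otherwise, and a quantifier corresponds to an automaton accepting the strings of the models in which it is true. The `DFA` class, the `encode` helper, and the state names are hypothetical and are not part of the provided code.

```python
from dataclasses import dataclass


@dataclass
class DFA:
    """A deterministic finite automaton over the alphabet {"0", "1"}."""
    start: str
    accepting: frozenset
    transitions: dict  # maps (state, symbol) -> next state

    def accepts(self, word):
        state = self.start
        for symbol in word:
            state = self.transitions[(state, symbol)]
        return state in self.accepting


def encode(a, b):
    """Encode a model as a string: one symbol per element of A."""
    return "".join("1" if x in b else "0" for x in sorted(a))


# "every": reject as soon as some element of A is outside B.
EVERY = DFA(
    start="ok",
    accepting=frozenset({"ok"}),
    transitions={
        ("ok", "1"): "ok",
        ("ok", "0"): "fail",
        ("fail", "0"): "fail",
        ("fail", "1"): "fail",
    },
)

# "some": accept as soon as some element of A is inside B.
SOME = DFA(
    start="none",
    accepting=frozenset({"found"}),
    transitions={
        ("none", "0"): "none",
        ("none", "1"): "found",
        ("found", "0"): "found",
        ("found", "1"): "found",
    },
)

if __name__ == "__main__":
    restrictor, scope = {1, 2, 3}, {2, 3, 4}
    word = encode(restrictor, scope)
    print(word, "every:", EVERY.accepts(word), "some:", SOME.accepts(word))
```

One way to approach the question in that topic would be to compare the states of such hand-built automata with the internal states of a trained network on the same input strings.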

The class will meet 3 times a week in the four weeks from January 8 to February 2. We will provide necessary background in the first two weeks, then transition into coding / experimenting sessions in the last two weeks. The course should be mostly self-contained, though some pre-requisites are listed.

**Week 1:** theoretical background on semantic universals and quantifiers (possibly color as well)

**Week 2:** background on training neural networks, tutorials on how to run your own experiments by modifying provided code

**Week 3:** run experiments! We will have in-class coding sessions for support and question answering.

**Week 4:** finish experiments; write up the results and deliver a short presentation

Working knowledge of Python will be very valuable. Knowledge of specific libraries (NumPy, TensorFlow/Keras) will be helpful, but can be learned on the fly. While we will cover neural networks and their training and evaluation in the second week, familiarity with those topics will also help.

Students will be conducting their own (computational) experiments. They will be expected to produce a short write-up (~5 pages) of at least one experimental result, explaining the motivation for their experiment and what they found. On the final day of class, there will be short presentations of results and discussion.

Here we include both reading and coding resources.

- Barwise and Cooper 1981, "Generalized Quantifiers and Natural Language"

This very influential paper introduced generalized quantifiers into natural language semantics and developed many semantic universals.

- Steinert-Threlkeld and Szymanik 2017/2018, "Learnability and Semantic Universals"

We attempt to meet the Challenge above by training neural networks to learn the meanings of quantifiers. This paper forms the motivational basis for this project.

- van Benthem 1986, "Essays in Logical Semantics"

Chapter 6 introduces the semantic automata framework and contains many important results. Earlier chapters also provide very nice background for many of the semantic universals we discuss.

- Partee, ter Meulen, and Wall 1990, "Mathematical Methods in Linguistics"

Terrific introduction to syntax, semantics, and the mathematics required to study them. Chapter 14 covers much of the ground on quantifiers that we need. The book also includes an introduction to automata theory and an appendix on semantic automata.

- Westerståhl 2011, "Generalized Quantifiers"

A modern overview of generalized quantifiers.

- Szymanik 2016, *Quantifiers and Cognition* (especially chapter 4)

An overview of quantifiers integrating logic and cognitive science. Chapter 4 introduces the semantic automata approach, which could be connected to the approach in our paper above.

- 3Blue1Brown 2017, "Deep learning"

Very well-produced videos introducing neural networks, gradient descent, and back-propagation.

- Nielsen 2015, "Neural Networks and Deep Learning"

Good e-book explaining the basic concepts behind neural networks and their training. (I find the notation can be a bit cumbersome here.)

- LeCun, Bengio, & Hinton 2015, "Deep learning"

A great scientific review of deep learning in *Nature*.

- Colah 2015, "Understanding LSTM Networks"

A step-by-step walk-through of the LSTM computation, showing how it can be seen as gated reading/writing to a form of memory.

- Goodfellow, Bengio, and Courville 2016, "Deep Learning"

A fantastic textbook introduction to neural networks and deep learning.

- Transfer learning:
  - Andrew Ng lecture: introduces the idea of transfer learning.
  - Donahue et al 2013, "DeCAF: A Deep Convolutional Activation Feature for Generic Visual Recognition"

    A very influential example of transfer learning for computer vision.

  - Yosinski et al 2014, "How transferable are features in deep neural networks?"

    Another case study of transfer learning in vision, showing that early layers in a deep network are more 'general'.

- Relationship between automata and RNNs:
  - Hupkes, Veldhoen, and Zuidema 2017, "Visualisation and 'diagnostic classifiers' reveal how recurrent and recursive neural networks process hierarchical structure"

    Their method of diagnostic classifiers for an RNN could be used to test whether the LSTMs trained in Steinert-Threlkeld & Szymanik 2017/2018 are implementing a strategy like semantic automata.

  - Weiss, Goldberg, and Yahav 2017, "Extracting Automata from Recurrent Neural Networks Using Queries and Counterexamples"

    A general method for extracting automata from RNNs. Do the automata extracted from our trained LSTMs match semantic automata?

- Looking inside the "black box" of a network:
  - Karpathy, Johnson, and Fei-Fei 2015, "Visualizing and Understanding Recurrent Networks"

    Some general methods for 'looking inside' and understanding what RNNs are doing.

- shanest/quantifier-rnn-learning: source code for running experiments in the Steinert-Threlkeld and Szymanik paper.

**Note:** I refactored the code very heavily to make it easier to read, extend, and modify. The exact version from the paper can still be found on the semantics-paper branch of the repository.

- Keras: library for building and training neural network models
- TensorFlow: Google's open source machine learning library; this is what we used in our paper and is the back-end behind Keras
- TensorFlow Estimators paper: section 3 does a good job explaining the Estimators API that we are using to train networks.