Can Recurrent Neural Networks Learn Nested Recursion?

Jean-Philippe Bernardy


Context-free grammars (CFGs) were one of the first formal tools used to model natural languages, and they remain relevant today as the basis of several frameworks. A key ingredient of CFGs is the presence of nested recursion.

In this paper, we investigate experimentally the capability of several recurrent neural networks (RNNs) to learn nested recursion. More precisely, we measure an upper bound of their capability to do so, by simplifying the task to learning a generalized Dyck language, namely one composed of matching parentheses of various kinds. To do so, we present the RNNs with a set of random strings having a given maximum nesting depth and test their ability to predict the kind of closing parenthesis when faced with more deeply nested strings. We report mixed results: when generalizing to deeper nesting levels, the accuracy of standard RNNs is significantly higher than random, but still far from perfect. Additionally, we propose some non-standard stack-based models which can approach perfect accuracy, at the cost of robustness.
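The task described in the abstract can be sketched in code. The following is an illustrative reconstruction, not the paper's actual data generator: it samples random strings of a generalized Dyck language (several kinds of matching parentheses) up to a given maximum nesting depth, and extracts the prediction examples in which a model must choose the kind of closing parenthesis from the preceding prefix. The sampling probabilities and string lengths are assumptions.

```python
import random

# Three kinds of matching parentheses, as in a generalized Dyck language.
PAIRS = {"(": ")", "[": "]", "{": "}"}

def gen_dyck(max_depth, target_len, rng=random):
    """Generate one well-nested string whose nesting depth never
    exceeds max_depth; length is approximately target_len."""
    out, stack = [], []
    while len(out) < target_len:
        can_open = len(stack) < max_depth
        can_close = bool(stack)
        if can_open and (not can_close or rng.random() < 0.5):
            op = rng.choice(list(PAIRS))  # open a random kind of bracket
            stack.append(op)
            out.append(op)
        else:
            out.append(PAIRS[stack.pop()])  # close the innermost bracket
    while stack:  # close any brackets still open
        out.append(PAIRS[stack.pop()])
    return "".join(out)

def closing_targets(s):
    """For each closing bracket in s, yield (prefix, correct closer):
    the model sees the prefix and must predict which kind of
    closing parenthesis comes next."""
    stack, examples = [], []
    for i, c in enumerate(s):
        if c in PAIRS:
            stack.append(c)
        else:
            examples.append((s[:i], PAIRS[stack.pop()]))
    return examples
```

Training data would be drawn with a small `max_depth`, and generalization measured on strings generated with a larger one; for instance, `closing_targets("([])")` yields the examples `("([", "]")` and `("([]", ")")`.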


RNN; CFG; nested recursion; machine learning; grammar


