From Abstract Syntax to Universal Dependencies

Prasanth Kolachina, Aarne Ranta

Abstract


Abstract syntax is a semantic tree representation that lies between parse trees and logical forms. It abstracts away from word order and lexical items, but contains enough information to generate both surface strings and logical forms. Abstract syntax is commonly used in compilers as an intermediate between source and target languages. Grammatical Framework (GF) is a grammar formalism that generalizes the idea to natural languages, to capture cross-lingual generalizations and perform interlingual translation. As one of the main results, the GF Resource Grammar Library (GF-RGL) has implemented a shared abstract syntax for over 30 languages. Each language has its own set of concrete syntax rules (morphology and syntax), by which it can be generated from the abstract syntax and parsed into it.

This paper presents a conversion method from abstract syntax trees to dependency trees. The method is applied for converting GF-RGL trees to Universal Dependencies (UD), which uses a common set of labels for different languages. The correspondence between GF-RGL and UD turns out to be good, and the relatively few discrepancies give rise to interesting questions about universality. The conversion also has potential for practical applications: (1) it makes the GF parser usable as a rule-based dependency parser; (2) it enables bootstrapping UD treebanks from GF treebanks; (3) it defines formal criteria to assess the informal annotation schemes of UD; (4) it gives a method to check the consistency of manually annotated UD trees with respect to the annotation schemes; (5) it makes information from UD treebanks available for the construction and ranking of GF trees, which can improve GF applications such as machine translation. The conversion is tested and evaluated by bootstrapping two small treebanks for 31 languages, as well as comparing a GF version of the English Penn treebank with the UD version.
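
To make the idea of converting an abstract syntax tree into a dependency tree concrete, here is a minimal sketch in Python. It is not the paper's implementation. It assumes that each abstract syntax function comes with a small configuration naming one argument as the head and assigning a dependency label to every other argument. The `Tree` class, the `LABELS` table, the function names (`PredVP`, `DetCN`, `ComplV2`) and the labels attached to them are illustrative assumptions chosen to resemble GF-RGL and UD v1 conventions. Leaves are treated directly as words, which skips the concrete syntax step entirely.

```python
from dataclasses import dataclass, field


@dataclass
class Tree:
    """A toy abstract syntax tree: a function name applied to argument trees.

    A node without arguments is treated as a word (an assumption of this sketch).
    """
    fun: str
    args: list = field(default_factory=list)


# Hypothetical labelling configuration: one entry per argument of each function.
# "head" marks the argument whose head word becomes the head of the whole phrase.
LABELS = {
    "PredVP": ["nsubj", "head"],   # subject NP + verb phrase
    "DetCN": ["det", "head"],      # determiner + common noun
    "ComplV2": ["head", "dobj"],   # transitive verb + object NP
}


def to_dependencies(tree):
    """Return (head_word, edges); each edge is a (dependent, label, head) triple."""
    if not tree.args:
        return tree.fun, []
    labels = LABELS[tree.fun]
    results = [to_dependencies(arg) for arg in tree.args]
    # The head word of the phrase is the head word of its head argument.
    head_word = results[labels.index("head")][0]
    edges = []
    for (word, sub_edges), label in zip(results, labels):
        edges.extend(sub_edges)
        if label != "head":
            # Attach the head word of each non-head argument to the phrase head.
            edges.append((word, label, head_word))
    return head_word, edges


example = Tree("PredVP", [
    Tree("DetCN", [Tree("the"), Tree("cat")]),
    Tree("ComplV2", [Tree("sees"), Tree("DetCN", [Tree("a"), Tree("dog")])]),
])

root, edges = to_dependencies(example)
print("root:", root)
for dependent, label, head in edges:
    print(f"{label}({head}, {dependent})")
```

In this sketch the labelling table is the only part that mentions dependency labels, so the same tree could be relabelled under a different annotation scheme by swapping the table.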


Keywords


multilingual grammars; universal dependencies; abstract syntax
