Distinguishing Voices in The Waste Land using Computational Statistics

Julian Brooke, Adam Hammond, Graeme Hirst

Abstract


T. S. Eliot’s poem The Waste Land is a notoriously challenging example of modernist poetry, mixing the independent viewpoints of over ten distinct characters without any clear demarcation of which voice is speaking when. In this work, we apply unsupervised techniques in computational stylistics to distinguish the particular styles of these voices, offering a computer’s perspective on longstanding debates in literary analysis. Our work includes a model for stylistic segmentation that looks for points of maximum stylistic variation, a k-means clustering model for detecting non-contiguous speech from the same voice, and a stylistic profiling approach that makes use of lexical resources built from a much larger collection of literary texts. Evaluating against an expert interpretation, we show clear progress in distinguishing the voices of The Waste Land as compared to appropriate baselines, and we also offer quantitative evidence both for and against that particular interpretation.
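
As a rough illustration of what "looking for points of maximum stylistic variation" can mean in practice, the sketch below scores each gap between lines by the distance between the average feature vectors of the lines just before and just after it, then keeps the local peaks. It is a toy under stated assumptions, not the model used in this work: the two features, their values, the window size, the threshold, and the function names `change_curve` and `pick_boundaries` are all hypothetical.

```python
from math import sqrt


def change_curve(features, window=2):
    """Score each gap i by the Euclidean distance between the mean feature
    vector of the `window` lines before it and the `window` lines after it."""
    def mean(rows):
        return [sum(col) / len(rows) for col in zip(*rows)]

    curve = {}
    for i in range(window, len(features) - window + 1):
        left = mean(features[i - window:i])
        right = mean(features[i:i + window])
        curve[i] = sqrt(sum((a - b) ** 2 for a, b in zip(left, right)))
    return curve


def pick_boundaries(curve, threshold):
    """Keep gaps whose score is a local maximum above the threshold."""
    boundaries = []
    for i, value in curve.items():
        prev_v = curve.get(i - 1, float("-inf"))
        next_v = curve.get(i + 1, float("-inf"))
        if value > threshold and value >= prev_v and value > next_v:
            boundaries.append(i)
    return boundaries


# Hypothetical per-line features: [formality score, average word length].
# The first three lines imitate one voice, the last three another.
lines = [[0.9, 5.0], [0.8, 5.2], [0.9, 5.1],
         [0.2, 3.0], [0.1, 3.1], [0.2, 2.9]]

curve = change_curve(lines, window=2)
boundaries = pick_boundaries(curve, threshold=1.0)
print(boundaries)  # index of the line that starts each new segment
```

The resulting spans could then be grouped by a clustering step such as k-means, so that non-contiguous passages with similar feature vectors are assigned to the same voice.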


Keywords


poetry; statistical text analysis; literary analysis; stylistic segmentation

