Clustering of Novels Represented as Social Networks

Mariona Coll Ardanuy, Caroline Sporleder


Within the field of literary analysis, few branches are as confusing as genre theory. Literary criticism has so far failed to reach a consensus on what makes a genre a genre. In this paper, we examine the degree to which the character structure of a novel is indicative of the genre it belongs to. On the premise that novels are societies in miniature, we build static and dynamic social networks of characters as a strategy to represent the narrative structure of novels in a quantifiable manner. For each novel, we compute a vector of literarily motivated features extracted from its network representation. We perform clustering on the vectors and analyze the resulting clusters in terms of genre and authorship.


novel; social networks; narrative structure
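
The pipeline outlined in the abstract (character network, feature vector, clustering) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and is not the authors' method: the scene lists are invented toy data, the two features (network density and the share of interaction weight involving the most connected character) are stand-ins for the paper's feature set, and single-linkage agglomerative clustering is swapped in for whatever algorithm the paper uses. Only a static network is built; the dynamic networks mentioned in the abstract are not covered.

```python
import math
from collections import Counter
from itertools import combinations


def build_network(scenes):
    """Static co-occurrence network: edge weight = number of scenes two characters share."""
    edges = Counter()
    for scene in scenes:
        for a, b in combinations(sorted(set(scene)), 2):
            edges[(a, b)] += 1
    return edges


def features(edges):
    """Toy feature vector (hypothetical): [density, share of weight on the top character]."""
    nodes = {name for pair in edges for name in pair}
    n = len(nodes)
    density = len(edges) / (n * (n - 1) / 2) if n > 1 else 0.0
    strength = Counter()
    for (a, b), weight in edges.items():
        strength[a] += weight
        strength[b] += weight
    total = sum(edges.values())
    top_share = max(strength.values()) / total if total else 0.0
    return [density, top_share]


def cluster(vectors, k):
    """Single-linkage agglomerative clustering down to k clusters of vector indices."""
    clusters = [[i] for i in range(len(vectors))]

    def linkage(c1, c2):
        return min(math.dist(vectors[i], vectors[j]) for i in c1 for j in c2)

    while len(clusters) > k:
        i, j = min(
            combinations(range(len(clusters)), 2),
            key=lambda pair: linkage(clusters[pair[0]], clusters[pair[1]]),
        )
        clusters[i] += clusters[j]  # merge the closest pair (i < j)
        del clusters[j]
    return clusters


# Invented toy "novels": each is a list of scenes, each scene a list of characters.
novels = {
    "ensemble_a": [["A", "B", "C"], ["A", "B"], ["B", "C"], ["A", "C"]],
    "ensemble_b": [["P", "Q", "R"], ["P", "Q", "R"]],
    "hero_a": [["H", "X"], ["H", "Y"], ["H", "Z"]],
    "hero_b": [["K", "M"], ["K", "N"], ["K", "O"], ["K", "M"]],
}
vectors = [features(build_network(scenes)) for scenes in novels.values()]
clusters = cluster(vectors, k=2)
```

In this toy data, the two ensemble-style novels and the two protagonist-centred novels end up in separate clusters, which is the kind of grouping one would then compare against genre and authorship labels.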





