“Collating Places and Words with TopoText” by Randa El Khatib

1. Introduction

The software prototype presented in this article — TopoText — is a digital translation of an interdisciplinary spatial methodology, geocriticism, into concrete digital functionalities through critical interpretation processes. [1] By transmediating [2] spatial content expressed in textual forms to GIS-based visualizations in the form of digital maps, the digital artifact functions as a prism through which the text can be deformed in order to give rise to novel interpretations. Concurrently, a second transmediation occurs by translating literary cartographies of authors into visual literary geographies of readers through a combination of text analysis, natural language processing, and digital mapping. Within the framework of prototyping as theory, the findings of TopoText are meant to carry out geocriticism’s central aim, which is to facilitate a deeper understanding of the spatiality of a text through a platial focus.

2. Geocriticism

Geocriticism is an interdisciplinary approach towards the study of literature that necessarily includes a spatial or geographical dimension. The term was coined by Bertrand Westphal in Geocriticism: Real and Fictional Spaces; a geocritical approach adopts a multifocal and geocentered perspective to explore the significance of a place, whether real or fictional. A multifocal approach is achieved by representing the place from more than one dimension, perspective, or genre; this could include all types of texts, such as travel brochures, historical texts, or a series of literary works, that focus on a particular place. Shifting to a geocentric approach from a more traditional egocentric one involves shifting the focus to the representation of a particular place, rather than the subject, as the central point of inquiry. Westphal acknowledges that a degree of egocentrism will unquestionably remain due to the logical impossibility of representing a place outside of a subject’s perspective. Rather, geocriticism aims to combine numerous subjective representations of a central platial focus in order to gain a richer historical or literary comprehension of the place in question. Elaborating on the conceptual difference between space and place is Yi-Fu Tuan, who argues that place not only signifies a unit of location within the larger frame of space, but also “a reality to be clarified and understood from the perspectives of the people who have given it meaning” (387). By extending the focus beyond the place name mention to the different meanings given to it through textual narratives, the digital geocritical methodology outlined in this paper accommodates both platial and spatial explorations.

TopoText draws inspiration directly from Westphal, who argues that a multifocal approach would require working with many texts simultaneously, meaning that a branch of geocritical research will necessarily be both digital and collaborative. Building on this notion, TopoText explores the broader concept of geocriticism in the prototyping process itself, and embodies concrete elements of the methodology in its functionalities.

According to Robert Tally in Spatiality, there are two kinds of spatialities in a literary work – literary cartography and literary geography. He explains literary cartography by drawing a parallel between the act of writing and mapping cartographically by stating that “Like the mapmaker, the writer must survey territory, determining which features of a given landscape to include, to emphasize, or to diminish” (45). From this perspective, the writer acts as a mapmaker in creating a fictional world of their choosing by selecting what elements of that world they want to represent in their writing. Literary geography is what is at play at the receiving end of the spectrum. It refers to the reader who focuses on spatiality when reading a text by being conscious of how spatial configurations change over time and how this affects literature, keeping in mind that space is a formation of history, “and the history of spatial formation often overlaps with the history of narrative forms” (Spatiality 80). While literary geography offers a way to approach texts from a spatial perspective, digital mapping probes the forms it can take and offers a visual entry point to explore its significance through a cartographic medium.

3. Theoretical Framework

The concept of prototyping employed in this article follows Stephen Ramsay and Geoffrey Rockwell’s argument that conceives of a digital prototype as a theory, where theory is defined from a humanities perspective as that which “promises deeper understanding of something already given, like historical events or a literary work. To say that software is a theory is to say that digital works convey knowledge the way a theory does, in this more general sense” (77). Ramsay and Rockwell draw a parallel between the process of prototyping and the act of writing, and argue that what classifies a work as scholarship is whether it proves to be a worthwhile or insightful intervention in the field, rather than the particular medium in which it is presented, which has traditionally been textual. A similar perspective on prototyping in the digital humanities is proposed by Alan Galey and Stan Ruecker, who lay the foundation for a more formal acknowledgement of prototypes as original contributions to knowledge in themselves. They propose various criteria by which to peer review a prototype, singling out that it is “contestable, defensible, and substantive,” and elaborate on how each of these applies within a prototyping framework (412). [3] The process of designing a digital artifact, according to the authors, could simultaneously be used for critical interpretation. By translating elements of the geocritical methodology into a digital prototype, the prototype conveys knowledge about geocriticism, while simultaneously resulting from a series of critical interpretations of it.

A second way of viewing digital artifacts as theories, according to Ramsay and Rockwell, is by approaching them as “hermeneutical instruments through which we can interpret other phenomena. Digital artifacts like tools could then be considered as ‘telescopes for the mind’ that show us something in a new light” (79). This suggests that digital artifacts can serve as a way of altering the form of a work in order to look at it from different angles, evoking Lisa Samuels and Jerome McGann’s notion of deformance. [4] Deformance facilitates a critical interpretive act and can give rise to novel meanings that may have remained unseen in the original form of the text. Switching to a GIS-based interpretation of literary spatiality is an attempt to play on this uncanny representation of literary place in order to create novel forms of knowledge production and interpretation. In literary geography, “the critical reader becomes a kind of geographer who actively interprets the literary map in such a way as to present new, hitherto unforeseen mappings” (Spatiality 79). Actually mapping these literary places is its own manifestation of literary geography that reconstructs the literary cartography of the author in an altered, or deformed, manner.

4. TopoText

TopoText was created through collaboration between the English and Computer Science departments at the American University of Beirut in an undergraduate Software Engineering class. The collaboration was initiated as a pedagogical experiment in order to facilitate communication between humanities and computer science students, and to provide opportunities for student-led digital humanities tool development that would have immediate practical application in the field. The next section addresses the prototyping process, including the design decisions made for expressing geocriticism in computational terms.

5. The Prototyping Process

TopoText’s design and coding processes were carried out entirely by undergraduate and graduate students. Initially, the pedagogical experiment was conducted by providing teams of undergraduate computer science students in a Software Engineering class with identical lists of potential functionalities of a tool. Based on this information and some guidance, the teams simultaneously developed multiple versions of the prototype, after which the strongest one was chosen. However, a number of original and useful functionalities from the other prototypes have been documented and will be incorporated in future iterations. The prototype was built by remixing features of open source tools.

After a text is input in plain text (.txt) format, TopoText identifies place names using the Stanford Named Entity Recognition (NER) Tagger, matches all unambiguous place names with geographical coordinates, and displays them on a map interface. Matching is carried out through the Google Maps Application Programming Interface (API), and the results are placed onto a Google Maps Engine base map. Paired with the map are text analysis and concordance tools that provide the context in which the place names occur and allow users to manipulate the content for analysis. Manipulation of text is carried out by collocating place name occurrences and the words that appear around them, thereby enacting a geocentric approach. Users can specify the ratio of words to collocate around a place name, and can facet this according to part of speech; this is carried out through an embedded Stanford Part-of-Speech (POS) Tagger that can extract nouns, verbs, adjectives, or adverbs. These collocations can be localized to the specific instance in which the place name appears, or generated across the entire text by counting the most frequent word collocations around a selected place name. The target user for this tool is anyone interested in automatically visualizing places mentioned in a text and the language used in the vicinity of these places. TopoText was purposefully designed to read plain text format in order to allow users to work with any text of their choosing and not limit them to a predetermined corpus or library. This decision was made in order to facilitate a multifocal approach and to purposefully leave potential applications of the tool open to different disciplines.
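The collocation step described above can be sketched in a few lines of Python. This is a minimal illustration and not TopoText’s actual code: it assumes the place name has already been identified (the role played by the NER tagger), uses a naive regular-expression tokenizer, and leaves out part-of-speech faceting. The function names `tokenize` and `collocates`, the `window` parameter, and the sample sentence are all hypothetical.

```python
import re
from collections import Counter


def tokenize(text):
    """Split text into lowercase word tokens (a naive stand-in for a real tokenizer)."""
    return re.findall(r"[a-z']+", text.lower())


def collocates(text, place, window=4, stopwords=frozenset()):
    """Count words occurring within `window` tokens on either side of `place`.

    Assumption: `place` is a single-token place name. Multi-word names
    would need a phrase matcher, which this sketch leaves out.
    """
    tokens = tokenize(text)
    target = place.lower()
    counts = Counter()
    for i, tok in enumerate(tokens):
        if tok != target:
            continue
        start = max(0, i - window)
        # Words before and after this occurrence of the place name.
        neighbours = tokens[start:i] + tokens[i + 1:i + 1 + window]
        counts.update(w for w in neighbours
                      if w not in stopwords and w != target)
    return counts


# Invented sample text, for illustration only.
sample = ("The fog lay heavy over London. In London the streets were dark, "
          "and the fog crept into every court of London.")
stop = {"the", "in", "of", "and", "over", "were", "into", "every"}
top = collocates(sample, "London", window=4, stopwords=stop)
print(top.most_common(3))
```

Counting over the whole text, as here, corresponds to the text-wide collocation mode; restricting the loop to a single occurrence would give the localized mode.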

Simply put, the prototype allows users to locate patterns and trends in relation to places and to trace how they evolve over time. Resulting maps give a more concrete understanding of the spatial scope of the work, which is often far more encompassing than close reading may suggest. Tracing place name occurrences helps facilitate research questions, such as where specific clusters of place names are found and what causes interest in that area at a specific time or with a certain author. The map can be navigated at scales ranging from fine granularity, to countries, to the entire world. By adjusting the scope of the text analysis component, users can analyze the entire text or zoom in on specific passages. Here, the unit of analysis is the word, where computational methods collocate word types or the most frequent words used in relation to place name occurrences in a text. A straightforward application would be to investigate Charles Dickens’ portrayal of London throughout his writing career, and whether his conception of the city changed as his career advanced. This could be carried out by running the novels through the prototype and identifying the most frequent words related to London in his works. By faceting the text analysis tool according to the nouns collocated with London, TopoText can pick up on the general themes related to the city and how those change over the course of a single novel or an entire canon. If an interesting collocation appears in the word cloud, the user can then switch to the concordance tool in order to close read a passage to further advance or contradict an argument or observation. Balancing word-place collocations on the one hand and the concordance tool with the full content on the other allows for a more meaningful geocentered application in which place names are contextualized with their content. This contextualization helps transform points on a map into places of investigation by providing different narratives associated with them. Resulting narrative forms can appear linearly, in the concordance tool, or as deformances, in the word-place collocates.
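The linear view offered by a concordance tool can be illustrated with a short keyword-in-context routine. The sketch below is an assumption about how such a view might be produced, not a description of TopoText’s implementation; the function name `concordance`, its `width` parameter, and the sample sentences are hypothetical.

```python
import re


def concordance(text, keyword, width=30):
    """Return keyword-in-context lines with `width` characters either side of each match."""
    flat = " ".join(text.split())  # collapse line breaks so the context reads as one line
    pattern = re.compile(r"\b" + re.escape(keyword) + r"\b", re.IGNORECASE)
    lines = []
    for m in pattern.finditer(flat):
        left = flat[max(0, m.start() - width):m.start()]
        right = flat[m.end():m.end() + width]
        # Right-align the left context so every keyword starts in the same column.
        lines.append(f"{left:>{width}}[{m.group(0)}]{right}")
    return lines


# Invented sample text, for illustration only.
passage = ("He came up to London in the spring.\n"
           "Years later, London seemed smaller to him.")
kwic = concordance(passage, "London")
for line in kwic:
    print(line)
```

Aligning the keyword in a fixed column lets a reader scan successive occurrences of a place name while keeping the surrounding sentence in view.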

Figure 1 A world map of Thackeray’s Vanity Fair automatically generated by TopoText

Some noteworthy challenges in the prototyping phase arose mainly from the inherent limitations associated with automation and close reading. In other words, a compromise had to be made in favor of either speed or accuracy. Automatic geocoding, the process of connecting a location with its corresponding geographical coordinates, bypasses what is considered to be one of the most tedious aspects of digital mapping, or any large data-driven research for that matter, namely the gathering and assembling of data for accuracy and for readability by digital tools. However, automatic parsing methods introduce a set of limitations, especially in terms of accuracy. Presently, most automatic geocoding methods do not disambiguate a place name that corresponds to more than one geographical location; the point that actually appears on the map is determined by an invisible ad hoc algorithm that can be incorrect. For example, Figure 1 is a visualization of a map of William Thackeray’s Vanity Fair automatically generated by TopoText. Although the majority of the matching is accurate, some of the points are misplaced, such as the points in Australia and New Zealand, which in the context of the novel actually refer to places in England with the same names. A majority of place names actually refer to more than one location, which makes complete accuracy in automatic matching hard to achieve. Most current geocoding technologies are also intolerant of spelling variations and are only able to locate the standardized spelling of a place name, meaning that only unambiguous place names are actually mapped. This limits automatic digital mapping to more contemporary texts or texts with modernized spelling, typically starting with the nineteenth century, and accuracy is higher for places that have rich GIS data. In the critical making process of this software prototype, these and other limitations were documented and taken into account in the modelling process for the next iteration.

6. Future Directions

Modelling the first prototype largely consisted of decisions related to rendering geocritical terms into computational tasks; with the skeleton in place, a future iteration will adopt a more humanities-based performance that will allow for more accuracy and human intervention. The concordance and word-place collocation tools allow switching between close and distant reading; however, neither of them is open or interactive in a way that allows for human input. Johanna Drucker expresses concern about applying tools and methods originally designed for other fields to the humanities since the quantitative restrictiveness often does not capture the complexity, temporality, and ambiguity inherent to the humanities. The next prototype will embed more subjective and open-ended approaches by including a human-in-the-loop functionality and retaining the ambiguities that are more meaningful to humanities research, which do not need to be quantified or resolved. TopoText 2.0 will support collaborative knowledge production by allowing researchers to continuously populate the datasets, as well as create and share richer maps. It will also more closely intertwine place and text by implementing a function that will allow users to add and save information in the form of text on the map itself that can be easily shared and visualized.

A solution for the parsing method will be to combine speed and accuracy by relying on automatic methods in the initial parsing process, and then including human intervention in the post-matching stage. This will be done by generating a list of alternative locations that share the same place name, together with their corresponding geo-coordinates. When the algorithm places a location incorrectly, the user can simply select the accurate place name and the geographical information from the existing list embedded in the prototype. Such a combination is an optimal negotiation between automatic and manual geocoding that could ensure a higher level of accuracy while sparing the researcher from the tedious process of manually searching for the coordinates of a place name and preparing the dataset for visualization. The API itself will be switched from the Google Maps API, which primarily deals with modern place names, to GeoNames, which is one of the largest open gazetteers and includes historical place names and alternative place name spellings. This could open up the scope of automatic parsing and facilitate more research on text in less standardized language, such as pre- or early modern texts, translated texts, or texts that deal with places with more ambiguous place name spellings. Finally, TopoText 2.0 will have an export function that will output the geoparsed data along with the annotations and other relevant information into a separate CSV file that can be reused on other platforms.
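The proposed combination of automatic matching, human correction, and CSV export can be sketched as follows. Everything here is an assumption made for illustration: the tiny in-memory `GAZETTEER` stands in for a real gazetteer service and does not call GeoNames or any other API, the coordinates are approximate, and the function names `resolve` and `to_csv` and the CSV column names are hypothetical.

```python
import csv
import io

# Hypothetical miniature gazetteer: each name maps to candidate locations.
# Coordinates are approximate. The first candidate plays the role of the
# automatic default, which may be wrong for a given text.
GAZETTEER = {
    "Brighton": [
        {"country": "AU", "lat": -37.91, "lon": 145.00},
        {"country": "GB", "lat": 50.82, "lon": -0.14},
    ],
    "Richmond": [
        {"country": "GB", "lat": 51.46, "lon": -0.30},
        {"country": "US", "lat": 37.54, "lon": -77.44},
    ],
}


def resolve(names, choices=None):
    """Match names automatically, then apply any human corrections.

    `choices` maps a place name to the index of the candidate a reader
    selected from the list of alternatives; unlisted names keep the default.
    """
    choices = choices or {}
    rows = []
    for name in names:
        candidates = GAZETTEER.get(name, [])
        if not candidates:
            continue  # names with no match are left off the map
        picked = candidates[choices.get(name, 0)]
        rows.append({"place": name, **picked,
                     "alternatives": len(candidates) - 1})
    return rows


def to_csv(rows):
    """Serialize resolved rows as CSV text for reuse on other platforms."""
    buf = io.StringIO()
    fields = ["place", "country", "lat", "lon", "alternatives"]
    writer = csv.DictWriter(buf, fieldnames=fields)
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()


names = ["Brighton", "Richmond", "Nowhere"]
automatic = resolve(names)                           # Brighton lands in Australia
corrected = resolve(names, choices={"Brighton": 1})  # reader picks the English town
print(to_csv(corrected))
```

Keeping the count of alternatives in each row preserves a trace of the ambiguity even after a reader has chosen one location.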

7. Conclusion

TopoText is a form of critical inquiry into modes of spatial representation and meaning formation through deformance. The prototype ultimately attempts to reconstruct geocriticism in digital terms, where design-related decisions are critical interpretive acts. Maps are one way of manifesting a visual literary geography that may serve as an entry point into further spatial explorations. The challenges and findings involved in developing the first model serve as a solid foundation on which to build a second iteration. Since TopoText is primarily designed for humanities research, the second version will allow for human input and embed more subjective elements central to humanities practices within a broader quantitative realm.

Notes

[1] The collaboration on TopoText (https://github.com/rkhatib/topotext) was initiated at the American University of Beirut by David Wrisley (English department), Wassim El-Hajj and Shady Elbassuoni (Computer Science department). The prototype was designed by Randa El Khatib and coded by Julia El Zini, Bilal Abi Farraj, Houda Nasser, Shadia Barada, and Yasmin Kadah in Mohammad Jaber’s Software Engineering class.

[2] See Øyvind Eide, Media Boundaries and Conceptual Modelling: Between Texts and Maps (Basingstoke, Hampshire: Palgrave Macmillan, 2015).

[3] Terminology adopted from Wayne C. Booth, Gregory G. Colomb, and Joseph M. Williams’ The Craft of Research (Chicago: University of Chicago Press, 2008) on the components of a good thesis.

[4] See Travis, Charles. “Bloomsday’s Big Data: GIS, Social Media and James Joyce’s Ulysses.” for a description of the application of deformance in a mapping context.

Works Cited

 

  • Booth, Wayne C., Gregory G. Colomb, and Joseph M. Williams. 2008. The Craft of Research. 3rd ed. Chicago: University of Chicago Press.
  • Drucker, Johanna. 2012. “Humanistic Theory and Digital Scholarship.” In Debates in the Digital Humanities, edited by Matthew Gold, 85-95. Minneapolis: University of Minnesota Press. http://dhdebates.gc.cuny.edu/debates/text/34.
  • Eide, Øyvind. 2015. Media Boundaries and Conceptual Modelling: Between Texts and Maps. London: Palgrave Macmillan UK.
  • Feinberg, Jonathan. 2014. Wordle. http://www.wordle.net/.
  • Galey, Alan and Stan Ruecker. 2010. “How a Prototype Argues.” Literary and Linguistic Computing 25 (4): 405-24.
  • GeoNames. n.d. http://geonames.org/.
  • Google Developers. n.d. Google Maps API. https://developers.google.com/maps/.
  • Moretti, Franco. 2003. Graphs, Maps, Trees. London: New Left Review Ltd.
  • Ramsay, Stephen and Geoffrey Rockwell. 2012. “Developing Things: Notes Towards an Epistemology of Building in the Digital Humanities.” In Debates in the Digital Humanities, edited by Matthew Gold, 75-84. Minneapolis: University of Minnesota Press. http://dhdebates.gc.cuny.edu/debates/text/11.
  • Samuels, Lisa and Jerome McGann. 1999. “Deformance and Interpretation.” New Literary History 30 (1): 25-56.
  • Stanford Natural Language Processing Group. n.d. Stanford Named Entity Recognizer. http://nlp.stanford.edu.
  • Tally, Robert. 2012. Spatiality. New York: Routledge.
  • Thackeray, William. 2008. Vanity Fair. Project Gutenberg.
  • Travis, Charles. “Bloomsday’s Big Data: GIS, Social Media and James Joyce’s Ulysses.” In Literary Mapping in the Digital Age, edited by David Cooper, Christopher Donaldson, and Patricia Murrieta-Flores. New York: Routledge.
  • Tuan, Yi-Fu. 1979. “Space and Place: Humanistic Perspective.” In Philosophy in Geography, 387-427. Dordrecht: Springer Netherlands.
  • Westphal, Bertrand. 2011. Geocriticism: Real and Fictional Spaces. Translated by Robert Tally. New York: Palgrave Macmillan.