1.vi Crowdsourcing

Carletti, Laura, Derek McAuley, Dominic Price, Gabriella Giannachi, and Steve Benford. 2013. “Digital Humanities and Crowdsourcing: An Exploration.” Museums and the Web 2013 Conference. Portland: Museums and the Web LLC. http://mw2013.museumsandtheweb.com/paper/digital-humanities-and-crowdsourcing-an-exploration-4/.

Carletti, McAuley, Price, Giannachi, and Benford survey and identify emerging practices in current crowdsourcing projects in the digital humanities. They base their understanding of crowdsourcing on an earlier 2012 publication that defined crowdsourcing as an online, voluntary activity connecting individuals to an initiative via an open call. This definition was used to select the case studies for the current research. The researchers found two major trends in the 36 initiatives included in the study: crowdsourcing projects use the crowd to either (a) integrate/enrich/configure existing resources or (b) create/contribute new resources. Generally, crowdsourcing projects asked volunteers to contribute in terms of curating, revising, locating, sharing, documenting, or enriching materials. The 36 initiatives surveyed divided naturally into three categories in terms of project aims: public engagement, enriching resources, and building resources.

Causer, Tim, and Melissa Terras. 2014. “Crowdsourcing Bentham: Beyond the Traditional Boundaries of Academic History.” International Journal of Humanities and Arts Computing 8 (1): 46–64. doi:10.3366/ijhac.2014.0119.

Causer and Terras look back on some of the key discoveries that have been made from the Transcribe Bentham crowdsourced initiative. Transcribe Bentham was launched with the intention of demonstrating that crowdsourcing can be used successfully for both scholarly work and public engagement by allowing all types of participants to access and explore cultural material. Causer and Terras note that the majority of the work on Transcribe Bentham was undertaken by a small percentage of users, or “super transcribers.” Only 15 per cent of the users have completed any transcription, and approximately 66 per cent of those users have transcribed only a single document—leaving a very select number of individuals responsible for the core of the project’s production. Causer and Terras illustrate how some of the user transcription has contributed to our understanding of some of Jeremy Bentham’s central values: animal rights, politics, and prison conditions. Overall, Causer and Terras demonstrate how scholarly transcription undertaken by a wide, online audience can uncover essential material.

Causer, Tim, Justin Tonra, and Valerie Wallace. 2012. “Transcription Maximized; Expense Minimized? Crowdsourcing and Editing The Collected Works of Jeremy Bentham.” Digital Scholarship in the Humanities (formerly Literary and Linguistic Computing) 27 (2): 119–. doi:10.1093/llc/fqs004.

Causer, Tonra, and Wallace discuss the advantages and disadvantages of user-generated manuscript transcription using the Transcribe Bentham project as a case study. The intention of the project is to engage the public with the thoughts and works of Jeremy Bentham by creating a digital, searchable repository of his manuscript writings. Causer, Tonra, and Wallace preface this article by setting out five key factors the team hoped to assess in terms of the potential benefits of crowdsourcing: cost effectiveness, exploitation, quality control, sustainability, and success. Evidence from the project showcases the great potential for open access TEI-XML transcriptions in creating a long-term, sustainable archive. Additionally, users reported that they were motivated by a sense of contributing to a greater good and/or recognition. In the experience of Transcribe Bentham, crowdsourcing transcription may not have been the cheapest, quickest, or easiest route; the authors argue, however, that projects with a longer time frame may find this method both self-sufficient and cost-effective.

Causer, Tim, and Valerie Wallace. 2012. “Building a Volunteer Community: Results and Findings from Transcribe Bentham.” Digital Humanities Quarterly 6 (2): n.p. http://digitalhumanities.org:8081/dhq/vol/6/2/000125/000125.html.

Causer and Wallace reflect on the experience of generating users and materials for the crowdsourced Transcribe Bentham project. The purpose of the Transcribe Bentham project is to create an open source repository of Jeremy Bentham’s papers that relies on volunteers transcribing the manuscripts. Causer and Wallace argue that crowdsourcing is a viable and effective strategy only if it is well facilitated and gathers a group of willing volunteers. They found that retaining users was just as integral to the success of the project as was recruiting. It was important, therefore, that they build a sense of community through outreach, social media, and reward systems. The number of active users involved in Transcribe Bentham was greatly affected by media publicity. Users reported that friendly competition motivated them to participate, but that an overall lack of time limited their contributions.

Fitzpatrick, Kathleen. 2012. “Beyond Metrics: Community Authorization and Open Peer Review.” In Debates in the Digital Humanities, edited by Matthew K. Gold, 452–59. Minneapolis: University of Minnesota Press. http://dhdebates.gc.cuny.edu/debates/text/7.

Fitzpatrick calls for a reform of scholarly communication via open peer review. She argues that the Internet has provoked a conceptual shift wherein (textual) authority is no longer measured by a respected publisher’s stamp; rather, she contends, authority is now located in the community. As concepts of authority change and evolve in the digital sphere, so should methods. Peer review should be opened to various scholars in a field, as well as to non-experts from other fields and citizen scholars. Fitzpatrick claims that this sort of crowdsourcing of peer review could more accurately represent scholarly and non-scholarly reaction, contribution, and understanding. Digital humanities and new media scholars already have the tools to measure digital engagement with a work; now, a better model of peer review should be implemented to take advantage of the myriad, social, networked ways scholarship is (or could be) produced.

Franklin, Michael J., Donald Kossmann, Tim Kraska, Sukriti Ramesh, and Reynold Xin. 2011. “CrowdDB: Answering Queries with Crowdsourcing.” In Proceedings of the 2011 ACM SIGMOD International Conference on Management of Data (SIGMOD/PODS ’11), 61–72. New York: ACM.

Franklin, Kossmann, Kraska, Ramesh, and Xin discuss the importance of including human input in query processing, since fully automated systems handle certain subjective tasks poorly and often return inaccurate results. The authors propose CrowdDB, a system that incorporates crowdsourced input when dealing with incomplete data and subjective comparisons. They discuss the benefits and limitations of combining human effort with machine processing, and offer a number of suggestions for optimizing the workflow. The authors see the combination of human input and computer processing as a rich area for research because it improves existing models and enables new ones.
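
CrowdDB itself exposes crowd input through SQL extensions, and those details are not reproduced here. As a loose, illustrative sketch of the underlying idea only—assuming a hypothetical ask_crowd helper that posts a microtask and returns a worker’s answer—the Python fragment below shows a lookup routine that answers from stored data when it can and routes the gap to people when it cannot.

```python
# Illustrative sketch only: route unanswerable parts of a query to people.
# The ask_crowd plumbing is a hypothetical stand-in, not CrowdDB's interface.

from typing import Callable, Optional

def resolve_field(record: dict, field: str,
                  ask_crowd: Callable[[str], str]) -> str:
    """Return a field value, asking human workers when the data store lacks it."""
    value: Optional[str] = record.get(field)
    if value:                       # machine-resolvable: answer from stored data
        return value
    # Incomplete data: turn the gap into a microtask for human workers instead.
    question = f"What is the {field} of '{record.get('name', 'this record')}'?"
    answer = ask_crowd(question)    # e.g., post to a task queue and await consensus
    record[field] = answer          # cache the crowd's answer for future queries
    return answer

if __name__ == "__main__":
    company = {"name": "IBM"}       # no headquarters value stored
    print(resolve_field(company, "headquarters",
                        ask_crowd=lambda q: input(q + " ")))
```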

Ghosh, Arpita, Satyen Kale, and Preston McAfee. 2011. “Who Moderates the Moderators? Crowdsourcing Abuse Detection in User-Generated Content.” In Proceedings of the 12th ACM Conference on Electronic Commerce (EC ’11), 167–76. New York: ACM.

Ghosh, Kale, and McAfee address the issue of how to moderate the ratings of users whose reliability is unknown. They propose an algorithm for detecting abusive content and spam that, starting from a single example of known good content, achieves approximately 50 per cent accuracy and approaches complete accuracy as more entries accumulate, using machine-learning techniques. They argue that rating each individual contribution is a better approach than rating the users themselves based on their past behaviour, as most platforms do. According to the authors, this algorithm may be a stepping stone toward more complex ratings by users of unknown reliability.
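
The published algorithm is more involved than can be shown here, but its core intuition—score each contribution rather than each user, calibrating raters against a single item known to be good—can be roughly sketched as follows. The weights, names, and data are illustrative assumptions, not the authors’ method.

```python
# Rough sketch of the intuition only, not the published algorithm:
# calibrate raters against one known-good item, then weight their votes.

def estimate_reliability(ratings_on_gold: dict[str, int]) -> dict[str, float]:
    """Raters who marked the known-good item as good (1) get full weight."""
    return {rater: 1.0 if vote == 1 else 0.25
            for rater, vote in ratings_on_gold.items()}

def score_item(item_ratings: dict[str, int],
               reliability: dict[str, float]) -> float:
    """Weighted vote in [0, 1]; higher means more likely legitimate content."""
    total = sum(reliability.get(rater, 0.5) for rater in item_ratings)
    if total == 0:
        return 0.5                                   # no information: stay neutral
    good = sum(reliability.get(rater, 0.5)
               for rater, vote in item_ratings.items() if vote == 1)
    return good / total

if __name__ == "__main__":
    gold_votes = {"ann": 1, "bob": 1, "spammer": 0}  # votes on the known-good item
    weights = estimate_reliability(gold_votes)
    suspect = {"ann": 0, "bob": 0, "spammer": 1}     # votes on a new submission
    print(round(score_item(suspect, weights), 2))    # low score: likely abusive
```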

Holley, Rose. 2010. “Crowdsourcing: How and Why Should Libraries Do It?” D-Lib Magazine 16 (3/4): n.p. doi:10.1045/march2010-holley.

Holley defines crowdsourcing, and makes a number of practical suggestions to assist with launching a crowdsourcing project. She asserts that crowdsourcing uses social engagement techniques to help a group of people work together on a shared, usually significant initiative. The fundamental principle of a crowdsourcing project is that it usually entails greater effort, time, and intellectual input than is available from a single individual, thereby requiring broader social engagement. Holley argues that libraries are already proficient at public engagement, but need to improve how they work toward shared group goals. She suggests ten basic practices to assist libraries in successfully implementing crowdsourcing. Many of these recommendations centre on project transparency and motivating users.

*Kittur, Aniket, and Robert E. Kraut. 2008. “Harnessing the Wisdom of the Crowds in Wikipedia: Quality Through Coordination.” In Proceedings of the 2008 ACM Conference on Computer Supported Cooperative Work (CSCW ’08), 37–46. New York: ACM.

Kittur and Kraut study the correlation between the number of editors on a Wikipedia page and the quality of that page’s content. Significantly, they argue that an increased number of editors on a given page will prove productive only if some sort of coordination apparatus is in place. Articles are even more successful, content-wise, if a small group of experts manages the majority of the work. This argument runs counter to the crowdsourcing ethos of Wikipedia, which dictates that, generally, the more editors at work, the better the quality of the article. The authors argue, however, that a smaller group of editors working under a semi-authoritative organizational system facilitates peer-to-peer communication—a benefit that is often lost when large groups of uncoordinated individuals are involved.

Manzo, Christina, Geoff Kaufman, Sukdith Punjasthitkul, and Mary Flanagan. 2015. “‘By the People, For the People’: Assessing the Value of Crowdsourced, User-Generated Metadata.” Digital Humanities Quarterly 9 (1): n.p. http://www.digitalhumanities.org/dhq/vol/9/1/000204/000204.html.

Manzo, Kaufman, Punjasthitkul, and Flanagan make a case for the usefulness of folksonomy tagging when combined with categorical tagging in crowdsourced projects. The authors open with a defence of categorization by arguing that classification systems reflect collection qualities while allowing for efficient retrieval of materials. However, they admit that these positive effects are often diminished by the use of folksonomy tagging, which promotes self-referential and personal task-organizing labels. The authors suggest that a mixed system of folksonomic and controlled vocabularies be put into play in order to maximize the benefits of both approaches while minimizing their challenges. This is demonstrated through an empirical experiment in labelling images from the Leslie Jones Collection of the Boston Public Library, followed by evaluating the helpfulness of the tags using a revised version of the Voorbij and Kipp scale.
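
A minimal sketch of what such a mixed system might look like in practice appears below; the controlled vocabulary, tag data, and function names are invented for illustration and are not drawn from the authors’ experiment.

```python
# Minimal sketch of a mixed tagging scheme: keep free-form (folksonomic) tags,
# but separate out those that match a controlled vocabulary.
# The vocabulary and example tags are invented for illustration.

CONTROLLED_VOCABULARY = {"portrait", "aviation", "baseball", "street scene"}

def split_tags(user_tags: list[str]) -> dict[str, list[str]]:
    """Partition user-supplied tags into controlled terms and folksonomic ones."""
    normalized = [tag.strip().lower() for tag in user_tags if tag.strip()]
    return {
        "controlled": sorted(t for t in normalized if t in CONTROLLED_VOCABULARY),
        "folksonomic": sorted(t for t in normalized if t not in CONTROLLED_VOCABULARY),
    }

if __name__ == "__main__":
    print(split_tags(["Aviation", "my favourite", "biplane over harbour"]))
    # {'controlled': ['aviation'], 'folksonomic': ['biplane over harbour', 'my favourite']}
```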

McKinley, Donelle. 2012. “Practical Management Strategies for Crowdsourcing in Libraries, Archives and Museums.” Report for the School of Information Management, Faculty of Commerce and Administration, Victoria University of Wellington, New Zealand. http://nonprofitcrowd.org/wp-content/uploads/2014/11/McKinley-2012-Crowdsourcing-management-strategies.pdf.

McKinley reviews the literature and theory on crowdsourcing, and considers how it relates to the research initiatives of libraries, archives, and museums. She begins by claiming that burgeoning digital technologies have contributed to an increase in participatory culture. Furthermore, she argues that this is evinced by the growing number of libraries, archives, and museums using crowdsourcing. McKinley cites five different categories of crowdsourcing: collective intelligence, crowd creation, crowd voting, crowdfunding, and games. By way of conclusion, McKinley makes the following recommendations for crowdsourcing projects: (a) understand the context and convey the project’s benefits; (b) choose an approach with clearly defined objectives; (c) identify the crowd and understand its motivations; (d) support participation; (e) evaluate implementation.

Moyle, Martin, Justin Tonra, and Valerie Wallace. 2011. “Manuscript Transcription by Crowdsourcing: Transcribe Bentham.” Liber Quarterly 20 (3–4): 347–56. doi:10.18352/lq.7999.

Moyle, Tonra, and Wallace outline the objectives of the Transcribe Bentham project from its initial stages. Transcribe Bentham hopes to harness the power of crowdsourcing to develop an open source repository of Jeremy Bentham’s manuscripts. Beyond digitizing and transcribing the manuscripts, the project aims to create a transcription interface, promote community volunteerism, and roll out a TEI transcription tool, among other things. The authors work through the design concept for the transcription interface and the TEI toolbar, both of which are meant to mask the complexity of the markup. The project team hopes that this initiative will stimulate further public engagement in scholarly archives and that it will introduce Bentham’s work to new audiences.

*OpenStreetMap Foundation. n.d. OpenStreetMap. https://www.openstreetmap.org.

OpenStreetMap is an editable map of the world that consists of a vast amount of location information, ranging from bus routes and bicycle trails to cafés and restaurants. It owes much of its success to its open access values, which circumvent the widespread commercialization of geospatial information. OpenStreetMap is a collaborative project with more than two million users who crowdsource data through a number of resources, such as GPS devices and aerial photography. The OpenStreetMap Foundation—whose mission is to provide an infrastructure for openly reusable digital geospatial information—supports this project.

Ridge, Mia. 2013. “From Tagging to Theorizing: Deepening Engagement with Cultural Heritage through Crowdsourcing.” Curator: The Museum Journal 56 (4): 435–50. doi:10.1111/cura.12046.

Ridge examines how crowdsourcing projects have the potential to assist museums, libraries, and archives with the resource-intensive tasks of creating or improving content about collections. She argues that a well-designed crowdsourcing project aligns with the core values and missions of museums by helping to connect people with culture and history through meaningful activities. Ridge synthesizes several definitions of crowdsourcing to present an understanding of the term as a form of engagement in which individuals contribute toward a shared and significant goal through completing a series of small, manageable tasks. She points toward several examples of such projects to illustrate her definition. Ridge argues that scaffolding the project by setting up boundaries and clearly defining activities helps to increase user engagement by making participants feel comfortable completing the given tasks. She sees scaffolding as a key component of mounting a successful crowdsourcing project that offers truly deep and valuable engagement with cultural heritage.

Rockwell, Geoffrey. 2012. “Crowdsourcing the Humanities: Social Research and Collaboration.” In Collaborative Research in the Digital Humanities, edited by Marilyn Deegan and Willard McCarty, 135–54. Farnham, UK, and Burlington, VT: Ashgate.

Rockwell demonstrates how crowdsourcing can facilitate collaboration by examining two humanities computing initiatives. He exposes the paradox of collaborative work in the humanities by summarizing the “lone ranger” past of the humanist scholar. Rockwell asserts that the digital humanities are, conversely, characterized by collaboration because they require a diverse range of skills. He views collaboration as an achievable value of digital humanities rather than a transcendent one. Case studies of the projects Dictionary and Day in the Life of Digital Humanities illustrate the limitations and promises of crowdsourcing in the humanities. Rockwell argues that the main challenge of collaboration is the organization of professional scholarship. Crowdsourcing projects provide structured ways to implement a social, counterculture research model that involves a larger community of individuals.

Ross, Stephen, Alex Christie, and Jentery Sayers. 2014. “Expert/Crowdsourcing for the Linked Modernisms Project.” Scholarly and Research Communication 5 (4): n.p. http://src-online.ca/index.php/src/article/viewFile/186/368.

Ross, Christie, and Sayers discuss the creation and evolution of the Social Sciences and Humanities Research Council (SSHRC)-funded Linked Modernisms Project. The authors demonstrate how the project negotiates the productive study of both individual works and the larger field of cultural modernism through the use of digital, visual, and networked methods. Linked Modernisms employs a four-tier information matrix to accumulate user-generated survey data about modernist materials. The authors argue that the resulting information allows serendipitous encounters with data, and emphasizes discoverability. Linked Modernisms is focused on developing modes of scholarly publication that line up with the dynamic nature of the data and comply with the principles of open access.

Saklofske, Jon, with the INKE Research Group. 2012. “Fluid Layering: Reimagining Digital Literary Archives Through Dynamic, User-Generated Content.” Scholarly and Research Communication 3 (4): n.p. http://src-online.ca/index.php/src/article/viewFile/70/181.

Saklofske argues that while the majority of print and digital editions exist as isolated collections of information, changing practices in textual scholarship are moving toward a new model of production. He uses the example of NewRadial, a prototype information visualization application, to showcase the potential of a more active public archive. Specifically, Saklofske focuses on making room for user-generated data that transforms the edition from a static repository into a dynamic and co-developed space. He champions the argument that the digital archive should place user-generated content in a more prominent position through reimagining the archive as a site of critical engagement, dialogue, argument, commentary, and response. In closing, Saklofske poses five open-ended questions to the community at large as a way of kickstarting a conversation regarding the challenges of redesigning the digital archive.

Walsh, Brandon, Claire Maiers, Gwen Nally, Jeremy Boggs, and Praxis Program Team. 2014. “Crowdsourcing Individual Interpretations: Between Microtasking and Macrotasking.” Digital Scholarship in the Humanities (formerly Literary and Linguistic Computing) 29 (3): 379–. doi:10.1093/llc/fqu030.

Walsh, Maiers, Nally, Boggs, and the Praxis Program Team track the creation of Prism, an individual text markup tool developed by the Praxis Program at the University of Virginia. Prism was conceived in response to Jerome McGann’s call for textual markup tools that foreground subjectivity; the tool illustrates how different groups of readers engage with a text. Prism is designed to assist with projects that blend two approaches to crowdsourcing: microtasking and macrotasking. A compelling quality of Prism is that it balances the constraint necessary for generating productive metadata with the flexibility necessary for facilitating social, negotiable interactions with the textual object. In this way, Prism is poised to redefine crowdsourcing in the digital humanities.
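
Prism’s actual interface and data model are not reproduced here. As a rough illustration of the microtasking side of that balance—many readers applying a fixed set of categories to the same text, with the results aggregated—the sketch below tallies hypothetical readers’ word-level category choices; the category names and data are invented.

```python
# Illustration of aggregating crowd markup in general (not Prism's data model):
# tally how many readers assigned each word position to each category.

from collections import Counter, defaultdict

def aggregate_highlights(markings: list[dict[int, str]]) -> dict[int, Counter]:
    """markings holds one dict per reader, mapping word index -> chosen category."""
    tallies: dict[int, Counter] = defaultdict(Counter)
    for reader in markings:
        for word_index, category in reader.items():
            tallies[word_index][category] += 1
    return dict(tallies)

if __name__ == "__main__":
    readers = [
        {0: "rhetoric", 1: "rhetoric", 4: "sound"},   # invented categories
        {0: "rhetoric", 4: "rhetoric"},
    ]
    for index, counts in sorted(aggregate_highlights(readers).items()):
        print(index, counts.most_common())
```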