Posted on 2016/10/31 by

Bot Buccaneers or Non-Human Privateers: Thoughts on Writing Algorithms

This probe was written by a human, and as such, its text is invariably subject to the shortcomings and excesses of its author, to the digressive effluent of his human idioms and emotions. If this text survives long enough to be assimilated by a writing algorithm, then this introduction will likely be excised, its content deemed superfluous by a more discerning editorial eye.

There are, as of 2016, several independent services specializing in the use of algorithms that can be ‘commissioned’ to write texts on virtually any subject. Prices and turnover times vary, but the principle is the same: the algorithm does the research, the writing, the editing, and the typesetting, and from there a .PDF can be generated and printed by a print-on-demand service.

Such entities, or more accurately such assemblages, have the potential to outcompete not only digital ‘pirates’ but any and all digital producers and curators of knowledge. Such prospects stir up age-old anxieties about the impending obsolescence of humanity in the face of our increasingly efficient artificial progeny. In his book Piracy: The Intellectual Property Wars from Gutenberg to Gates, Adrian Johns points out that

  • so serious has the prospect of piracy become for [industries] that in the United States the Digital Millennium Copyright Act has even outlawed the promulgation of algorithms that might be used to disable or circumvent copy-protected devices. A graduate student coming to Nevada to present a technical paper can be arrested, not for pirating anything himself, but for divulging principles that might allow others to do so. (Johns 3)

What kinds of algorithms might be defined by the broad language of such legislation? What is meant by ‘circumvent’ here? When considering the explosion of writing algorithms in the past 5-10 years, and their increased use by journalistic entities and businesses alike, it is imperative to consider not only their effect on the ‘business’ and ‘creative’ sides of writing and everything in-between, but also the effect of such legislation on the ways in which such algorithms are used.

Even the most rudimentary of these types of algorithms act as hyper-curators, hyper-authors, and hyper-nodes within information networks. They are able to commune with and collate, on an unprecedented level, the information they manage. These algorithms do not need to fret over funding and their only basic needs are electricity and an internet connection; the prospect of competition with such an entity would make even the most stringently ascetic grad student twitch with jealous fear.

Already, a storm of hard questions threatens to sink this probe as it shoves off. How and where do algorithms that manage and produce texts fit into debates about copying, copyright legislation, authorship, and piracy? What will happen as such programs are deployed to produce academic papers? What would the peer-review process of such works be? Is the author of a species of algorithms (assuming such people continue to be credited with the works their creations produce) qualified to teach a course on something that they have never researched or written about themselves? Some such questions seem to flirt with absurdity, but they beg to be asked and addressed nonetheless.

Addressing the above set of questions in turn necessitates answering a whole other series of queries about how such algorithms operate. Research is even more painfully scant here, if not absent altogether: to what extent could such algorithms be labelled ‘pirates’ if they plunder as many sources as they have access to (including Guerilla Open Access, or GOA, databases such as Sci-Hub) and then paraphrase and/or cite the information that they use from such sources to assemble ‘new’ texts?

The behaviour of these bots is also somewhat shrouded in the complexity of the code that makes them up, the content of their ‘DNA’, so to speak. This bears in-depth discussion. How do these algorithms determine whether the texts they are using as grist for the mill of their process are in the public domain or are subject to terms of fair dealing? What criteria or formulae, embedded in the effective ‘bodies’ of these algorithms, determine how or to what extent a source passage should be paraphrased?

Finally, how good are these algorithms at writing ‘creative’ works, compared to more explanatory, analytical, or summatory texts, and how do they fit into the networks they plunder for material? How do they and their creators function as authors and curators?

Balázs Bodó asks an oddly naïve question in his article “Pirates in the Library – An Inquiry into the Guerilla Open Access Movement”. He wonders why, circa mid-2016, “scholarly publication is affected [by GOA archives] and fiction is not?” (Bodó 4). It seems remarkable that Bodó failed to acknowledge the prevalence of textual piracy of all kinds, as well as the piracy of comics, manga, and graphic novels (the latter, studied by Darren Wershler, Kalervo Sinervo, and Shannon Tien, will come up again later on).

I remember finding and downloading, in the late 2000s, a massive archive of literature and fiction of all kinds, several dozen gigabytes in size and containing hundreds of thousands of e-books, .pdfs, .CBRs, etc. The torrent took a few days to download. Unfortunately, I lost the books years later when one of my backup drives crashed, reinforcing Wershler, Sinervo, and Tien’s argument that digital archiving is extremely unstable, and that an ‘ecosystem’ model is preferable as a perspective for viewing this way of managing information (Wershler et al.).

If this ‘ecosystem’ model holds true (it certainly chimes with Hall’s ideas of ‘articulation’, and also with actor-network theory, or ANT), then where do writing algorithms fit into the information ‘food chain’ of this ecosystem? Are they predators, prey, or scavengers? Do they function as any or all of the three depending upon their use case, or do they represent a new kind of organism entirely?

Journalistic responses to the proliferation of writing bots touch on some of the waves of questions that crashed on our decks earlier, while raising various anxieties having to do with ideas of authorship, legitimacy, and piracy. In a Wired article, for example, Mark Allen Miller chronicles the evolution of the company Narrative Science, and its algorithmic writing engine, Quill.

Quill began as the pet project of founder Kristian Hammond in conjunction with Roger Schank. At MIT, the two created a piece of software that could write journalistic pieces following sports games, which became Quill. However, like other algorithmic writing engines, Quill still requires user-generated examples of text in order to produce new texts. As Miller notes, “news stories, particularly about subjects like sports or finance, hew to a pretty predictable formula, and so it’s a relatively simple matter for the meta-writers to create a framework for the articles.” (Miller)
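The ‘framework’ idea can be made concrete with a minimal sketch in Python. This is not Quill’s actual method, which the sources cited here do not describe in detail; the function name `write_recap`, the margin thresholds, and the team names are all hypothetical, chosen only to show how a fixed sentence frame can be filled from a box score.

```python
def write_recap(home, away, home_score, away_score):
    """Fill a fixed sentence frame from a box score (illustrative only)."""
    if home_score == away_score:
        return f"{home} and {away} played to a {home_score}-{away_score} draw."
    # Decide winner and loser from the raw numbers.
    if home_score > away_score:
        winner, loser = home, away
    else:
        winner, loser = away, home
    high = max(home_score, away_score)
    low = min(home_score, away_score)
    margin = high - low
    # Hypothetical thresholds: the 'angle' of the story is chosen by rule.
    if margin >= 10:
        verb = "routed"
    elif margin <= 2:
        verb = "edged"
    else:
        verb = "beat"
    return f"{winner} {verb} {loser} {high}-{low}."


print(write_recap("Rovers", "Corsairs", 5, 2))
```

Even in this toy version, the human contribution sits in the frame and the thresholds rather than in any individual sentence, which is roughly the division of labour that the ‘meta-writer’ label suggests.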

The Quill website comes across as very ‘corporate’, not unlike the shining, smile-ridden façade of a fictional company from a dystopic sci-fi novel. On the other hand, the website of business scholar Phillip M. Parker’s company ICON Group International resembles a bare-bones version of Amazon. There is even a wish list creation option. While Parker points out that writing a given algorithm can take years depending upon the bot’s intended task, once written, said algorithms can generate texts in about 15 minutes. Browsing through the texts already available on the ICON site, one finds prices ranging from typical textbook fare to nearly a thousand dollars for more lengthy and extensive compilations.

In an interview with Parker published by ReadWrite, interviewer Adam Popescu notes that Parker “claims [that] he’s basically applying 19th century Taylorism to the publishing industry,” a somewhat chilling comparison to make despite the evident lack of human abuse in Parker’s system (Popescu). In fact, the term “Digital Taylorism”, aka “New Taylorism”, has already been coined. In the ReadWrite interview, Parker describes what his bots do as ‘econometrics’ and credits the field of economics for laying the groundwork for what his bots do (Popescu). Parker further posits that

  • a lot of that process could be reverse engineered and basically characterized by algorithms and be used in an automated fashion. The methodologies are extremely old, just like the methodologies of writing haiku poetry are very old. An Elizabethan sonnet is 14 lines – that is a line of code if you think of it that way. The code is constrained. So all genres, no matter what the genres are, are a form of constrained writing. (Popescu)
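Parker’s remark that a form’s line count is “a line of code” can be taken almost literally. The following is a minimal sketch in Python, under loud assumptions: the table `FORM_LINE_COUNTS` and the function `fits_form` are hypothetical names, and the check covers line count only, ignoring metre, rhyme, and syllables. It does not represent Parker’s patented system.

```python
# Hypothetical table of forms: line count only; metre and rhyme are ignored.
FORM_LINE_COUNTS = {"sonnet": 14, "haiku": 3}


def fits_form(text, form):
    """Return True if the number of non-blank lines matches the named form."""
    lines = [line for line in text.splitlines() if line.strip()]
    return len(lines) == FORM_LINE_COUNTS[form]


draft = "first line of a draft\nsecond line of a draft\nthird line of a draft"
print(fits_form(draft, "haiku"))
print(fits_form(draft, "sonnet"))
```

A real genre would need many such constraints layered together, but each one can in principle be written as a rule of this kind, which is the substance of Parker’s claim that all genres are forms of constrained writing.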

Parker makes it clear that the majority of the writing his bots are doing is fairly dry, straightforward, and utilitarian in its applicability. Up-to-date statistical analyses and compendiums of similar information not available all in one text are popular order items, and Parker points out that these texts have short print runs due to the rapid pace at which they become out-of-date (Popescu). Parker’s bots seem to be meeting a demand for texts that required far more human time and drudgery to produce, but the implications of such advanced software for writing in general should not be ignored.

Parker also notes that, for commissioned research endeavours, firms will usually “pass off the editorial analysis to a group of people who do formatting and copy editing and graphic design, who then pass it off to another group of people who do metadata, covers, spines, all that. All we did is reverse engineer that. But the methodology to do that already existed before the books existed.” (Popescu)

It appears that Parker is not blind to the irony of the publicity he has received, or to the echoes of copying (also called piracy, depending upon how threatening the activity becomes to financially entrenched industries) in the journalistic pieces written in response to his work. Parker wryly observes that

  • there’s been in the last 2 weeks about 10 articles written about what I’ve done and none of them talked to me about it. They’re all copy and pasting from each other. I think it’s a very interesting observation that they’re using a formulaic method to deliver content and put their name on a byline, when in fact they’ve done a formulaic cut-and-paste. (Popescu)

In a brief piece published by Popular Science, Francie Diep points out that Forbes has begun using the software developed by Narrative Science to generate business-related articles, and that the LA Times has reportedly begun using bots developed by one of its own staff (Diep). Algorithmic writers are scuttling into big-name institutions that cater to millions of readers, and almost none of these readers are aware that what they may be reading was not written by a human. Some of these readers might ask, “why does it matter?” Why indeed.

In terms of addressing anxieties about bots replacing human writers, Diep points out that “the biggest argument for robot journalism is that it frees human reporters to do the kind of deeper reporting only people can do…auto-writers are able to accurately process an inhuman amount of data, then present it in a way that humans like to see: in words.” (Diep). Parker’s algorithms even open a Word document on their own and output information, as his video on the subject shows.

David J. Hill, writing for Singularity Hub, comments that “Parker is not so much an author as a compiler” (Hill). Questions of authorship and prestige are also brought up by Adrian Johns, who reminds us of pre-modern debates about the authenticity of authorship, debates that are resurging as we begin considering the author function of the algorithm and the algorithm’s programmer (Johns 495). Antiquated notions of ‘prestige’ re-emerging in the face of copying-framed-as-piracy bear an eerie resemblance to comments about a bot article’s ability to pass itself off as a text written by a human. ‘Sounding human’ is the ‘new prestige’ that developers of writing algorithms have to strive for.

In terms of prestige, the notion of algorithms attempting to emulate a ‘human-sounding’ standard tone is a complex one to approach. What do we mean when we say things like “this reads like it was written by a human”? Is this ‘human’ tone a western one? An English-speaking one? Is it masculine, feminine, or androgynous? How is this ‘voice’ inflected? Donna Haraway’s Cyborg Manifesto, and how its arguments can be applied to entities such as Microsoft’s Cortana and the mobile secretary Siri, would be useful in addressing the issue of writing algorithms in more detail. But that’s for another time.

In an article published by the American Journalism Review, Hille van der Kaa and Emiel Krahmer touch on the issue of credibility, which brings to mind Adrian Johns’s treatment of the transition from earlier economies of ‘credibility’ to our current economy of information. Credibility again rears its head when studies such as the one Van der Kaa and Krahmer describe are conducted on algorithmic writing. The authors reveal that

  • we randomly showed a story to 232 native Dutch speakers (among them 64 journalists) and asked them to evaluate the perceived expertise and trustworthiness of the news writer and the contents of the story. As previously stated, our study found no differences in the perceptions that news consumers held regarding the credibility of machine-written stories versus articles they thought were created by humans. (Kaa & Krahmer)

Beyond questions of authenticity in the age of instant copying, sharing, and pirating of information, it is also imperative to note that Parker’s method of assembling texts via algorithms is patented. Technically, if the information in most of the algorithms’ books is openly available and factual in nature, it could be difficult to argue that the books violate copyright in a ‘piracy’ sense. Alternatively, in an article published by EDUCAUSE Review, Penn State director of educational technology Kyle Bowen frames algorithmic writing assemblages as potential allies of ‘open education’, while also espousing their potential to be used to write textbooks for the academy (DePaul).

However, as Bodó’s examples of OA entities sued into oblivion illustrate, when a company feels sufficiently threatened by something it can label as ‘piracy’, it will do all it can to apply that label, at least when it has no equivalent but legitimate service with which to out-pirate the pirate (Bodó 15).

Algorithms trawling so much information so rapidly, from the greatest amount and variety of sources they can find, and generating or paraphrasing text to suit a particular project, seem like the framework for an anti-pirate’s nightmare. However, industry has proven time and again, whether it be in the case of the radio pirates who challenged the BBC in the 1960s, the hacktivists and FOSS advocates of the mid-1980s and 1990s, or their present-day equivalents, that it can and will respond to pirates’ subversive efforts in one of two ways. If industry can’t beat the pirates’ systems with a better service of its own, then it will make allies or privateers of them.

If we ignore the role of algorithms in processes of digital text generation, copying/piracy, distribution, curation, and citation, we run the risk of being blindsided by unforeseen conflicts and complications between such entities and the more antiquated and potentially exploit-ridden portions of the networks they emerged within. As with real-life pirates, the same planks used to build any ship can also be used in the execution rituals of mutinous crewmembers.

Works Cited

Bodó, Balázs. “Pirates in the Library – An Inquiry into the Guerilla Open Access Movement.” Paper prepared for the 8th Annual Workshop of the International Society for the History and Theory of Intellectual Property, CREATe, University of Glasgow, UK, July 6-8, 2016. Available at SSRN: https://ssrn.com/abstract=2816925

DePaul, Kristi. “Kyle Bowen: Robot Writers, Open Education, and the Future of Edtech.” EDUCAUSE Review. EDUCAUSE, 05 Apr. 2016. Web. 31 Oct. 2016.

Diep, Francie. “Associated Press Will Use Robots To Write Articles.” Popular Science. Bonnier Corporation, 1 July 2014. Web. 31 Oct. 2016.

Hill, David J. “Patented Book Writing System Creates, Sells Hundreds Of Thousands Of Books On Amazon.” Singularity HUB. Singularity University, 13 Dec. 2012. Web. 31 Oct. 2016.

Johns, Adrian. Piracy: The Intellectual Property Wars from Gutenberg to Gates. Chicago: U of Chicago, 2009. Print.

Kaa, Hille Van Der, and Emiel Krahmer. “Robot Reporters or Human Journalists: Who Do You Trust More?” American Journalism Review. Philip Merrill College of Journalism, 24 Oct. 2014. Web. 31 Oct. 2016.

Miller, Mark Allen. “Can an Algorithm Write a Better News Story Than a Human Reporter?” Wired.com. Conde Nast Digital, 04 Apr. 2012. Web. 31 Oct. 2016.

Popescu, Adam. “Why Write Your Own Book When An Algorithm Can Do It For You?” ReadWrite. ReadWrite, 15 Jan. 2013. Web. 31 Oct. 2016.

Wershler, Darren, Kalervo Sinervo, & Shannon Tien. “A Network Archaeology of Unauthorized Comic Book Scans.” Amodern 2 (2013).
