Vol. 5, No. 4 - May/June 1995 FOCUS: Academic Computing  
Repossession: An Academic Romance
The Rossetti Archive and the Quest to Revive Scholarly Editing
By Steven Johnson

IT'S A BRISK, SUNNY MARCH morning in Charlottesville, and Jerome McGann is talking animatedly about the manuscript history of Dante Gabriel Rossetti's 1870 Poems. It's a reasonable enough scenario: McGann is an English professor at the University of Virginia, a specialist in nineteenth-century British poetry, and the editor of the seven-volume Collected Works of Byron. His research in recent years has increasingly centered on the work of Rossetti, the Pre-Raphaelite poet and painter. But there's more to McGann's archival musings than meets the eye. His words have as much to do with the hard edges of information technology as with the dreamy fantasies of the Pre-Raphaelite Brotherhood.

The story goes that Rossetti decided somewhere in the middle of proofing the poems that each page should contain only twenty-four lines, rather than the initial twenty-nine. Though this information might seem trivial, McGann wants us to see the line-count change as a significant event in Rossetti's work, perhaps as significant as the poet's editorial changes to the words themselves. "The page airs itself out -- it's a completely different page," he says. McGann speaks with the air of a committed bibliophile. Listening to him explain the complicated history of Rossetti's poems, you half imagine him to be gingerly leafing through a yellowed, brittle manuscript in a dark study, walls lined with leather-bound first editions.

But the nostalgic, ghost-of-scholarship-past trip doesn't last long with McGann. As it turns out, there's not a book in sight as he argues for the relevance of Rossetti's line breaks. He's huddled over an IBM RS-6000 workstation, scrolling through the virtual pages on a twenty-inch monitor. And the high-tech gear is more than just scholarly window dressing. McGann has devoted the last three years to compiling a computerized edition of Rossetti's collected works, a massive compendium of zeros and ones that mutates a late-Victorian artist into binary code. Hence the disorientation experienced upon visiting McGann in his home environment, like watching a film that's been ineptly dubbed: The language evokes the rich, textured history of small presses and collector's editions, while the visuals might as well be outtakes from a futuristic AT&T ad.

Although it's only half-completed, McGann's Rossetti Archive is already among academia's most ambitious efforts to drag textual editing into the age of digital reproduction. Since Rossetti was both a poet and painter, the archive is geared as much to the visual image as to the written word. And -- if that's not enough -- it also attempts to reproduce the Pre-Raphaelites' passionate attachment to the aesthetics of the printed page. It's an attachment that McGann shares. Few scholarly editors have placed such importance on the rich connotations of page layout and typography embedded within every print document. His most recent work, Black Riders: The Visible Language of Modernism (Princeton, 1993), ruminates on William Morris's use of the Caslon typeface the way Empsonians enthuse about ambiguity. In this sense, McGann's critical perspective takes its cues from Yeats's ecstatic vision of the "total book" -- a fusion of writing, design, typography, and illustration where all the elements work as an ensemble to create the larger meaning of the book itself.

Yet McGann's alliance of bookish aesthetics and digital programming is a rather uneasy one. His gamble is that a deft, artful use of binary code will bring to life the sensuous experience of reading literature, and even revive the long-neglected "visible language" of the printed page. McGann believes that the electronic medium will amplify the older, ink-and-paper message, not consume it. This is a leap of faith that others may not be willing to make.

THE ROSSETTI PROJECT UNFOLDS in the corporate-style confines of UVA's Institute for Advanced Technology in the Humanities (IATH), a high-tech, multidisciplinary center founded three years ago with a substantial grant of equipment from IBM, as well as extensive support from UVA's library system. IATH's offices in the bowels of the university's graduate library reflect the austere sensibility of its initial sponsor: The large, relatively drab space is divided up into twenty or thirty semiprivate cubicles, many sporting high-powered workstations. IATH's director, John Unsworth, has the deportment of a computer-systems consultant at a Fortune 500 company. He's well-groomed and personable, wearing a crisp white shirt and tie -- the sort of outfit you would have found at IBM before the dressing-down revolution tore through its Armonk, New York, headquarters this past winter. It's almost surprising to discover that he's also an associate professor of English and the co-editor of Postmodern Culture, one of the first academic journals to publish exclusively via computer disks and e-mail. Next to Unsworth's polished appearance, McGann comes across as something of a maverick, wearing a blue-jean shirt with an alarming burgundy tie for good measure. When we head out of the office for lunch, he dons an improbable wide-brimmed hat. The general impression is that of a rancher dressed up for the annual cotillion.

Unsworth and McGann explain the mechanics of the Rossetti Archive while seated at a conference table in the IATH offices, with an RS-6000 lurking a few feet away. After about an hour of conversation, we surrender to the machine's siren song, and slide our chairs over to the workstation, where the two men show off the software. It's hardly a run-of-the-mill environment for a pair of English professors; I keep expecting Unsworth to jump out of his chair and start sketching flow charts and Pascal subroutines on the chalkboard.

But the real wild card in this mix is Dante Gabriel Rossetti himself. Literary critics dabbling in the new medium of electronic hypertext tend to lean heavily on the experimental metafictions of the Sixties and Seventies -- from Barth, Cortázar, Calvino. Fashionable theories of the "readerly text" sustain much of this work, and it's rare indeed to find an essay on the labyrinthine, multiple narratives of digital writing that doesn't contain an obligatory quote from Borges's "Library of Babel." When academic hypertextualists look beyond the recent past, their gaze usually settles on the eclectic self-reference of Don Quixote or Tristram Shandy. Next to the giddy free-for-all of the postmoderns and their antecedents, the florid stylizations of Rossetti seem decidedly anachronistic. It's not really an exaggeration to say that the Pre-Raphaelite Brotherhood (and offshoots like William Morris's Arts and Crafts movement) exerts more influence over the field of interior design than over contemporary literary studies.

Why base a radical, pathbreaking experiment in digital scholarship on such an unsung character? The answer lies partly in McGann's fondness for Rossetti: His project is an act of aesthetic advocacy. But it also lies in the idiosyncrasies of Rossetti's work, idiosyncrasies that have foiled Rossetti's would-be editors for decades. An exceptionally "nervous writer" -- as McGann puts it -- Rossetti tinkered with his words obsessively, so much so that it's almost impossible to establish a single, definitive version for a work like The House of Life, a sonnet cycle that exists in dozens of states: printed books, copy text, final proofs, notebook writings. The work grows, over the course of its development, from 16 to 103 sonnets; the 1870 and the 1894 editions include eleven songs in addition to the poetry, while the 1881 edition ignores the songs altogether. Rossetti's writerly neuroses are a part of literary lore: He buried a batch of poems with his wife after her sudden death in 1862 (from an overdose of laudanum), only to exhume the body seven years later to recover the manuscript.

Rossetti also immersed himself in what we now call, somewhat inelegantly, multimedia. Like other Pre-Raphaelites, Rossetti explored the relationship between word and image; illustrations accompany many of his poems, including the celebrated "Blessed Damozel," where the influence of medieval religion on the poem is greatly amplified by the sacramental iconography in the accompanying painting. Rossetti's visual sensibility also extends to the design of his books. "Rossetti's textual work was executed with a profound consciousness of the expressive character of documentary materials," McGann writes in a recent essay.

A tortuous manuscript history, a lifelong experimentation with mixed media, a typographer's sensitivity to the "visible language" of the page -- Rossetti's oeuvre pushes the traditional capabilities of print editing to their breaking point. When McGann first began mulling over the idea of editing Rossetti, he hadn't yet heard of hypertext or multimedia, and the technical hurdles seemed insurmountable. "You couldn't do it," McGann says. "Nobody had edited Rossetti, because it couldn't be done, because it couldn't be done in the way you'd need to do it." If this impasse frustrated the editor in McGann, it stimulated the theorist. In the years before his high-tech conversion, McGann envisioned Rossetti as a kind of parable for textual studies, a way of illuminating the blind spots or distortions built into every print-based act of scholarly editing. Rossetti's work provided a way of thinking about what critical editing couldn't do, or couldn't do within the paradigm of the book as we have come to know it.

"You can't have art without resistance in the materials." Over the years, William Morris's oft-quoted line became something of a mantra for McGann -- understandably so, given the extent to which his work concentrated on the necessary resistances encountered in scholarly editing, resistances that derive from the nature of the book medium itself. Print-based critical editions are bound by this significant convention: The data and the analysis of that data are built out of the same material (ink and paper). Of course, in the predigital era this seemed more of a practical necessity, and less of a convention. But McGann argues that a critical edition should ideally exist on a higher conceptual level than the original documents, just as, in the sciences, the periodic table uses words and diagrams to represent physical elements. In traditional critical editions, the analytic tools belong to the same conceptual order as their object of study, words on paper representing other words on paper, which is somewhat like building a periodic table out of titanium and zinc.

As readers, of course, we're likely to ignore the problem. We're trained to think of a book as a unified, self-contained whole, a snapshot, frozen in time. Editors, on the other hand, confront a more complex organism when they take on a project like, say, Wordsworth's Prelude, with its numerous drafts, revised endlessly over half a century. An editor must represent something larger -- more layered, more transformational -- than a single coherent work. Seen from an editor's perspective, Rossetti's The House of Life is not a run-of-the-mill sonnet cycle; it's a set of relationships between various drafts of the work, evolving in rich and complicated ways. Submitting this sort of multilayered, time-sensitive information to the dictates of the book form is like attempting to represent a three-dimensional space within a two-dimensional medium. Technically, it's possible to pull it off (think of the traditional scholarly concordance), but something is inevitably lost in the translation.

Already strained by the complexities of manuscript revisions, print-based critical editions run into still more difficulties when they have to account for other art forms using only the textual resources of the book medium. McGann offers the example of Robert Burns, the eighteenth-century Scottish poet and balladeer, whose melodies are as essential as his language to any critical edition of his work. You can reprint the sheet music, of course, but that doesn't do much for future scholars who can't sight-read musical notation. Matters are somewhat simpler with the visual arts, since a painting can be reproduced with some accuracy on a page. In practice, however, editors have traditionally oscillated between visual and textual fidelity: Some editors compile facsimile editions that pay homage to the original "visible language" of the manuscript page and its accompanying illustrations; others engineer elaborate, text-based concordances that neglect the graphic components altogether. With an artist like Rossetti, where the textual history is fiendishly complex and the visuals are indispensable, this sort of either/or doesn't satisfy.

McGann first outlined the difficulties of editing Rossetti in The Textual Condition (Princeton, 1991). At the time, Rossetti's quirks were problems without solutions, conundrums for theoretical contemplation more than anything else. But it wasn't long before McGann stumbled on a computerized hypertext system during a visit to California. The new medium of digital hypertext, he realized, could render the opposition between visual fidelity and textual detail obsolete. The idea of compiling a digital version of Rossetti's works took another year to gestate, but by 1993 IATH was up and running, with the Rossetti Archive sharing center stage at the institute with Edward Ayers's multimedia Civil War project, The Valley of the Shadow. High technology would do for Rossetti what a century of scholarly editors had failed to accomplish. "By freeing us from the limits of paper-based editing," McGann announced, "electronic textuality makes the marriage of facsimile and critical editing a practical goal." The Rossetti Archive would catalogue reams of textual variants, exhumed manuscripts, and marginalia, building an architecture of data that future scholars would be able to explore at will. The Archive would include page-layout information and annotated descriptions of the paintings: a keyword search on the word "star" would zip you directly to the seven stars suspended delicately in the Blessed Damozel's hair in Rossetti's painting.

AT THE HEART OF the Rossetti Archive is the notion of a "markup" language. In all the hoopla over the scholarly revolution promised by digital technology, markup languages have been somewhat overshadowed by ringing calls for digitizing existing print books, led by electronic text advocacy groups like Project Gutenberg. Translating a book into binary code -- and leaving it at that -- is certainly a worthwhile project, but it's a far cry from a digital cure-all. If you feed a computer the entire text of Dickens's complete works, you'll be able to do all sorts of fancy tricks with the data: calculate the average length of the Dickensian sentence, track down each reference to the Marshalsea debtors' jail, inventory all the paragraphs that contain both "Chancery" and "fog." But more-nuanced questions generate less-impressive results: How many chapters end with a suggestive hint of a passing resemblance, half-glimpsed and then forgotten? How many times does Dickens abandon the narrative thread to thunder against the factory system? Ask a question of this nature and you'll receive nothing but a befuddled silence from your PC.

This silence is a result of the computer not understanding the text stored on its hard drive. All the computer recognizes is raw data -- text strings, punctuation, formal divisions between chapters or separate works. Human readers, on the other hand, experience the semantic depth of information, meaning, significance. When we read Dickens's extended lampoon of the "Noble Refrigerator" in Little Dorrit, we make sense of the passage on a number of levels simultaneously: through the literal, denotative content (the upper-crust patriarch pontificating at the dinner table) but also through the rhetorical technique of an extended metaphor, a deft use of satire, a characteristic Dickensian tirade against the withering aristocracy.
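The raw-data tricks are easy to picture in code. What follows is a minimal sketch in Python, not anything drawn from the Archive or from Project Gutenberg: the sample text consists of invented stand-in sentences, and the function names `average_sentence_length` and `paragraphs_with_all` are hypothetical. It shows the kind of question a computer can answer from text strings alone.

```python
import re

# Invented stand-in sentences (an assumption for illustration);
# a real run would load a full digitized novel instead.
SAMPLE = (
    "Fog everywhere. Fog up the river, where it flows among green meadows.\n"
    "\n"
    "The Court of Chancery sat in the very heart of the fog.\n"
    "\n"
    "He walked to the Marshalsea gate and waited."
)

def average_sentence_length(text):
    """Mean words per sentence, splitting naively on '.', '!' and '?'."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    return sum(len(s.split()) for s in sentences) / len(sentences)

def paragraphs_with_all(text, words):
    """Blank-line-separated paragraphs that contain every given word."""
    paragraphs = [p.strip() for p in re.split(r"\n\s*\n", text) if p.strip()]
    return [
        p for p in paragraphs
        if all(re.search(r"\b" + re.escape(w) + r"\b", p, re.IGNORECASE)
               for w in words)
    ]

if __name__ == "__main__":
    print(average_sentence_length(SAMPLE))
    print(paragraphs_with_all(SAMPLE, ["Chancery", "fog"]))
```

Nothing here knows what a satire or a tirade is. The program matches characters, which is why the more-nuanced questions go unanswered.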

A markup language is a kind of bridge between the reader's information and the computer's data: an inventory of elements of a given text that extend beyond the words themselves. A library card catalogue exemplifies one kind of markup language. Each card has a slot for a title, an author, a date of publication, and so on. A more advanced markup, like the one developed for the Rossetti Archive, takes things a step further. Rather than simply classifying the publication info about a given poem, the Rossetti markup -- called a Document Type Definition (DTD) -- represents the content of the work as well. It "marks up" specific textual features -- like patterns of imagery, verse forms, and tropes -- with easily located "tags." A library catalogue can build you a functional bibliography in a matter of seconds, but a content-based markup language gives you powerful analytic tools with which to study texts themselves. The more extensive the markup language, the more powerful the tools. With a DTD, you're able to track down all the times Rossetti employed death imagery in the last line of a stanza, excluding all poems written before his wife died in 1862.
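A query of that last kind can be sketched in a few lines of Python, under loud assumptions: the tag and attribute names below (`work`, `year`, `imagery`) are invented for illustration and are not the actual Rossetti DTD, the three sample poems are placeholders, and the sample is written as well-formed XML so that Python's standard `xml.etree.ElementTree` parser can read it.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified markup: tag and attribute names are
# illustrative assumptions, not the real Rossetti Archive DTD.
ARCHIVE = """
<archive>
  <work title="Poem A" year="1850">
    <lg><l n="1">first line</l><l n="2" imagery="death">last line</l></lg>
  </work>
  <work title="Poem B" year="1869">
    <lg><l n="1" imagery="death">first line</l><l n="2">last line</l></lg>
    <lg><l n="1">first line</l><l n="2" imagery="death">last line</l></lg>
  </work>
  <work title="Poem C" year="1871">
    <lg><l n="1">first line</l><l n="2" imagery="light">last line</l></lg>
  </work>
</archive>
"""

def death_in_last_lines(xml_text, earliest_year=1862):
    """Return (title, stanza number) for each stanza whose final line is
    tagged with death imagery, skipping works dated before earliest_year."""
    root = ET.fromstring(xml_text)
    hits = []
    for work in root.iter("work"):
        if int(work.get("year")) < earliest_year:
            continue  # written before the cutoff: excluded from the search
        for stanza_no, stanza in enumerate(work.iter("lg"), start=1):
            lines = stanza.findall("l")
            if lines and lines[-1].get("imagery") == "death":
                hits.append((work.get("title"), stanza_no))
    return hits

if __name__ == "__main__":
    print(death_in_last_lines(ARCHIVE))
```

The search succeeds only because someone has already tagged the imagery by hand. The computer still understands nothing; it simply follows the labels a human reader left behind.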

Creating a DTD may be an intellectually stimulating activity, but most academics will be bewildered by their first glance underneath the hood of the Rossetti Archive. The markup language itself doesn't score particularly high on the user-friendly scale, and the old-timers clinging resolutely to their Remingtons will shudder to find their favorite poems transformed into a long litany of arcane codes. Consider this excerpt:

<page n="3" image="fiz44-69_3">
<msAdds type="prtrdir">
<trans>This to be used as introductory and printed in italics</trans>
<desc>Marginal directions to the printer, written at top</desc>
<scribe>May Morris</scribe>
<note>May Morris transcript with DGR's corrections and additions; size: 22.2x17.3cm</note>
<work type="sonnet" title="Introductory Sonnet" workCode="1-1880">
<workHeader><del>The Sonnet</del>
<note>The title in the MS is originally "The Sonnet," but this is here canceled and the sonnet was not printed with a specific title by DGR; the title "Introductory Sonnet" was added later when WMR collected DGR's work and it has become traditional. </note>
<lg type="octave">
<l n="1">A Sonnet is a moment's monument,-
<l n="2"> Memorial from the Soul's eternity
<l n="2a" indent="2"><del>thy</del>

For literary scholars long accustomed to their discipline's shift from painstaking philology to criticism, compiling a markup language might seem like the sort of dreary, mechanical labor best relegated to research assistants. But McGann and Unsworth insist that designing the markup language is more than just a technical exercise. Defining a poem in a language intelligible to a computer, they say, coerces you into seeing it more clearly. Theoretical assumptions about the nature of the poetic work are put to a practical test when transcribed into markup form. As McGann puts it, you're forced to describe a poem to "the dumbest of instruments" -- a machine that knows nothing about rhyme schemes or Romanticism, sonnets or synecdoche. Interacting with such abject ignorance demands that you understand the data thoroughly, since you can't rely on the computer's intuitive aesthetic sensibility to compensate for your mistakes. "Theory and practice are always implicated in each other," McGann says. "But in the markup process, the practice aspect of theory is always in your face. It's very liberating, really, very mind-clearing." Unsworth, who is up for tenure next year, expresses concern that other academics don't understand the intellectual challenges involved in creating markup protocols. "It seems to me that there's this hierarchy between the technical and the theoretical, or art and craft. There's a certain type of activity that is an intellectual exercise -- building a markup standard, for instance -- that is regarded nonetheless as technical work." Of course, this hierarchy may be short-lived. It's not hard to imagine future generations of professors proposing daring new markup languages to win their colleagues' approval.

THE FIRST QUESTION POSED by the markup process seems easy enough at first glance: Just what is a literary work? Different markups presuppose different definitions. A text-driven markup eliminates the visible language of graphic design from its working definition, while a markup that homes in on a single, definitive work neglects the evolutionary complexity of the manuscript history. McGann has tackled this issue by dividing the data into three major classes of objects: Rossetti Archive Documents, Pictures, and Works (RADs, RAPs, and RAWs). Each type of object is marked up differently. A Document is anything that can be considered original source material: manuscript pages, notebooks, copy text. Its markup places a heavy emphasis on the physical characteristics of the page, including elements as specific as the watermark of the paper and the type of binding used. Pictures, as the name suggests, catalogue the physical and semantic attributes of Rossetti's visual work: his illustrations for Tennyson's 1857 Poems, for instance, or the Arthurian murals he painted in the Oxford Union debating chamber in the late Fifties. The RAP markup tags the name of the artist's model, visual references to other literary works, and info about exhibitions where the painting appeared. Finally, a Work corresponds to the more conventional notion of a literary text; its markup emphasizes the aesthetic properties of the language, and includes pointers to all the various renditions of the piece scattered across the manuscript history. McGann describes the Work as "the term signifying that abstract entity, usually identified by a conventionally accepted title, that subsumes a number of individual documents bearing the same name or title, whether textual or pictorial or both." The RAW markup includes tags that will be more familiar to traditional literary scholars: historical citations; genre classifications; meter and rhyme schemes; production histories.
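The three-way division can be pictured as a small data model. The Python sketch below is an assumption-laden illustration, not the Archive's actual schema: the class and field names are hypothetical, chosen only to echo the properties named above, and the sample records are placeholders.

```python
from dataclasses import dataclass, field
from typing import List

# Hypothetical field names throughout; the real RAD, RAP, and RAW
# markups are far more detailed than this sketch.

@dataclass
class Document:  # RAD: original source material, physically described
    doc_id: str
    kind: str            # e.g. "manuscript page", "notebook", "copy text"
    watermark: str = ""
    binding: str = ""

@dataclass
class Picture:  # RAP: attributes of a visual work
    pic_id: str
    title: str
    model: str = ""
    exhibitions: List[str] = field(default_factory=list)

@dataclass
class Work:  # RAW: the abstract entity that subsumes documents and pictures
    title: str
    genre: str
    rhyme_scheme: str = ""
    document_ids: List[str] = field(default_factory=list)
    picture_ids: List[str] = field(default_factory=list)

def witnesses(work, documents):
    """Resolve a Work's pointers to the Documents that carry its text."""
    by_id = {d.doc_id: d for d in documents}
    return [by_id[i] for i in work.document_ids if i in by_id]

if __name__ == "__main__":
    # Placeholder records, not real Archive entries.
    docs = [Document("ms-1", "manuscript page", watermark="unknown"),
            Document("proof-1", "final proof")]
    sonnet = Work("Example Sonnet", "sonnet", document_ids=["ms-1", "proof-1"])
    print([d.kind for d in witnesses(sonnet, docs)])
```

The design choice worth noticing is that a Work holds no text of its own. It is a bundle of pointers, which is how the model keeps a poem's many states in view instead of collapsing them into one definitive version.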

As in any act of translation, defining a markup language inevitably runs the risk of interpretative bias. All literary scholars are likely to agree on the relevance of marking up titles or dates of publication, but other potential categories are necessarily more ambiguous. A doctrinaire Freudian might catalogue all the tell-tale references to the family romance, while a Russian formalist might attempt to tag each use of metonymy. Choices made in defining the categories will profoundly shape the way future scholars interact with the Rossetti database, as well as other archives built on the Rossetti standards. McGann claims to have dealt with this problem by designing a markup language that is largely catholic in its tastes. McGann contrasts his own approach with the rival Text Encoding Initiative (TEI), a more text-based language designed by a wide consortium of scholars. "The Rossetti system marks up the usual things that TEI marks up," he says, "but it goes beyond TEI in trying to mark up image materials, the typographical and physical characteristics of the documents. You never know when something will be significant. That's why everything has to be marked up."

But McGann's "everything" is somewhat overstated. Even if figurative language could be catalogued precisely, no markup can account for every element in a poetic work. It's a practical impossibility. The Rossetti DTD, for instance, devotes an entire category to paper stock but doesn't include a built-in tag for Homeric allusions (the closest equivalent is a generic slot for "mythic citation" in the RAW markup). What's more, the Rossetti model does place a great deal of emphasis on the physical attributes of the documents it surveys, an emphasis that other critics might consider exaggerated. This is where McGann's own interpretive perspective -- his bibliographic "materialism" -- becomes a matter of real importance. As a markup language becomes increasingly powerful, the sorts of choices made in assembling it become more significant. For some scholars, querying the database may come to substitute for reading itself. And in that case, there's always a danger that those aspects of a text that are left out of the markup language may vanish altogether.

McGann and I talk about these issues after lunch, sitting in one of his three offices on the UVA campus. (He also has a cubicle in the IATH quarters, as well as space in the English Department.) It's a cramped, windowless room several floors above IATH in the graduate library. The space is littered with pages and pages of hard-to-decipher Rossetti DTDs. When I ask him about the emphasis placed on page design in the Rossetti DTD, he nods knowingly, as if he's heard the question before. "Most people don't really know how to read a book," he says. "Confront them with bibliographic information and it's just dead. And it's presented to them in a dead way. But it's very alive if you know how to do it." Part of McGann's argument is that today's print-based critical editions are hard-wired to exclude bibliographic information, pushing elements like paper stock and typography to the outskirts of critical significance, relegating them to the occasional facsimile edition or rare-book room in a university library. The Rossetti Archive makes it possible for researchers to interact with these long-neglected visual elements, using the high-powered querying tools of the modern computer. It may not convert future scholars into McGannesque materialists, but at least the graphic context of the page will be one potential variable in the equation. And that, for McGann, would be a happy outcome.

IT'S HARD TO TELL what Rossetti and his Pre-Raphaelite allies would have made of these high-tech advances. They'd doubtless have been heartened by the renewed attention to the tactile and visual qualities of the page, and by the fusion of word and image enabled by modern hypermedia. But it's also likely that the glassy-eyed stupor of life in front of the computer monitor would have appalled Rossetti and his ilk. The Pre-Raphaelite retreat to the mystical Arthurian past was, after all, something of an escape from the force field of nineteenth-century industrial technology, and the tactile, sensuous experience of reading old-fashioned books was a central element of the Pre-Raphaelite sensibility. What happens to William Morris's all-important "resistance in the materials" when the materials are constructed out of the quicksilver medium of binary code? The sheer associative velocity of reading electronic hypertext, hopping recklessly from document to document, would probably have horrified bibliophiles like Rossetti and Morris, grounded, as they were, in the slower, more sedentary pleasures of traditional page turning. As essayist and technoskeptic Sven Birkerts notes, "The instructive nature of any historical artifact comes from confronting the otherness of another time." In the rush to rewire history for the multimedia age, Birkerts sees a "kind of time-lapse photography, where we project onto any older experience the codes and conventions of the present. We almost need cognitive 'speed bumps,' which would at certain intervals remind people that the context in which things were created was very different from our own time."

Whatever the Pre-Raphaelites would have made of it, the Rossetti Archive does provide a likely model for the future of scholarly editing -- largely because of the basic portability of a Rossetti-style markup language. Once defined, a standard can migrate effortlessly from author to author, even across disciplinary borders, creating a series of linked archives -- what McGann calls, somewhat rhapsodically, the "archive of archives." IATH has plans for a major Blake project that will set up shop this fall, and a number of graduate students at UVA have dedicated themselves to expanding the scope of the Rossetti DTD. (Peter Stallybrass has begun putting together a similarly inspired Shakespeare archive at the University of Pennsylvania.) On my way out of his office, McGann introduces me to Elizabeth Crocker, a grad student working on a DTD designed specifically for cartoons. Her specialty is Krazy Kat, filtered through a cultural studies sensibility. Last year an excerpt from her work appeared, oddly enough, in The New York Times's Week in Review section: a few exemplary frames of Krazy Kat, with Crocker's commentary running alongside the graphics. Where the Rossetti DTD marks rhetorical units or line counts, Crocker's system reflects the visible language of the funny pages, with a few lit-crit categories thrown in for good measure: "thought balloons" are indexed; so are "genderfucks."

The advent of digital text is a much contested subject these days. Everywhere you look, on-line romantics are making dramatic claims for the way hypertextual language will reinvent the traditional relationship between author and reader; meanwhile, analog humanists morosely contemplate the death of reading. In this sort of high-octane climate, it's all the more surprising to stumble across a place like Unsworth and McGann's institute, where a strong commitment to digital technology has produced, of all things, a rebirth of scholarly editing -- precisely the kind of thorough, labor-intensive scouring of words and images that Theory, with its immediate cerebral gratifications, was supposed to have killed off years ago. Sure, Crocker has her theoretical jargon and her interest in low-brow cartoon culture, but it's the markup language she seems fascinated by, the intellectual challenge of programming a critical edition out of that strange, elusive medium of zeros and ones. When I ask about her work apart from the DTD standard, she waves her hand dismissively. "Oh, I've got loads of cultural analysis sitting around here. But what I really want to do," she says, turning to look at me intently, "is program."

Steven Johnson writes regularly about high-tech issues for the Guardian. He is also the editor of Feed, an on-line magazine of culture and technology.


