
ENG287 The Digital Text

Prof. Adam Hammond

The limitations of plain text

Implicit vs. explicit markup

Implicit Markup
↣ The genre and structure of a text are given by typographical cues. Titles, footnotes, quotations, marginal glosses, etc. are marked by type size, use of bold, italics, different typefaces, white space, etc.
↣ Implicit markup is governed by conventions. You need to already be familiar with typographic conventions for indicating genre and structure to be able to make sense of them.
↣ Since these conventions are fluid and ambiguous, they are not suitable for machine processing.

Implicit vs. Explicit Markup

Explicit Markup
↣ An explicit markup language describes features of texts so that these features can be understood by machines as well as by humans.
↣ It facilitates and promotes a structural view of documents.

Implicit vs. Explicit Markup

GOWER
To sing a song that old was sung
From ashes ancient Gower is come,
Assuming man’s infirmities
To glad your ear and please your eyes.

The First Four Lines of Shakespeare’s Pericles, Prince of Tyre, Implicit Markup

Gower
To sing a song that old was sung
From ashes ancient Gower is come,
Assuming man’s infirmities
To glad your ear and please your eyes.

The First Four Lines of Shakespeare’s Pericles, Prince of Tyre, Explicit Markup

Gower
To sing a song that old was sung
From ashes ancient Gower is come,
Assuming man’s infirmities
To glad your ear and please your eyes.

The First Four Lines of Shakespeare’s Pericles, Prince of Tyre, Explicit Markup

speech
speaker’s name
verse stanzas
individual lines of verse
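The four content objects named above nest inside one another: lines sit inside a stanza, and the stanza and the speaker's name sit inside a speech. Here is a minimal sketch of that hierarchy, assuming the TEI drama elements `sp` (speech), `speaker`, `lg` (line group) and `l` (verse line). The tagging below is an illustration, not a copy of the slide, and the helper name `lines_by_speaker` is hypothetical.

```python
import xml.etree.ElementTree as ET

# Assumed TEI-style tagging of the passage (not copied from the slide).
PERICLES = """
<sp>
  <speaker>Gower</speaker>
  <lg>
    <l>To sing a song that old was sung</l>
    <l>From ashes ancient Gower is come,</l>
    <l>Assuming man's infirmities</l>
    <l>To glad your ear and please your eyes.</l>
  </lg>
</sp>
"""

def lines_by_speaker(xml_text):
    """Map each speaker's name to the verse lines spoken in their speeches."""
    root = ET.fromstring(xml_text)
    # The root may itself be a speech, or may wrap several speeches.
    sp_elements = [root] if root.tag == "sp" else root.iter("sp")
    result = {}
    for sp in sp_elements:
        name = sp.findtext("speaker", default="").strip()
        verse = [line.text.strip() for line in sp.iter("l") if line.text]
        result.setdefault(name, []).extend(verse)
    return result

speeches = lines_by_speaker(PERICLES)
for name, verse in speeches.items():
    print(name, "speaks", len(verse), "lines")
```

Because the structure is explicit, a question like "how many lines does each character speak?" becomes a short query rather than a guess based on typography.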

Ordered Hierarchy of Content Objects (OHCO) Present in Explicit Markup

Gower
To sing a song that old was sung
From ashes ancient Gower is come,
Assuming man’s infirmities
To glad your ear and please your eyes.

The First Four Lines of Shakespeare’s Pericles, Prince of Tyre, Imaginary XML Markup

Gower
To sing a song that old was sung
From ashes ancient Gower is come,
Assuming man’s infirmities
To glad your ear and please your eyes.

The First Four Lines of Shakespeare’s Pericles, Prince of Tyre, TEI Explicit Markup

Gower

Yesterday, my roommate Ruby stormed in and declared, I’ve had it with you. I’m moving out.

Apparently he was frustrated by the fact that I hadn’t done dishes in a week, and that I had lost his Arrested Development DVDs in a pile of filth 43.668264, -79.399889.

C’est la vie, I said. Some people don’t like cleaning. It is your misfortune to live with one.

[. . .]

The Sample Text Tagged in TEI

“Tags” Are Often Invisible from the Surface

TEI Tags Can Become Display-Oriented HTML By a Process of Conversion…

A TEI Element with an Attribute and Attribute Value
[Diagram: the element content “C’est la vie,” with labels for the start tag, attribute, attribute value, element content, and end tag; further labels: Data, Metadata]

An HTML Element with an Attribute and Attribute Value
[Diagram: the element content “Click here,” with labels for the start tag, attribute, attribute value, element content, and end tag]
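The parts labeled in these two diagrams can be pulled apart by a parser. The sketch below is only an illustration: the element and attribute names (`foreign` with `lang`, `a` with `href`), the example.org address, and the helper `describe` are assumptions, since the tags shown on the slides are not reproduced here. TEI itself writes the language attribute as `xml:lang`; the plain `lang` is a simplification.

```python
import xml.etree.ElementTree as ET

# Hypothetical tagging; the slides' actual elements and attributes are assumptions.
TEI_SNIPPET = """<foreign lang="fr">C'est la vie</foreign>"""
HTML_SNIPPET = """<a href="https://example.org/">Click here</a>"""

def describe(snippet):
    """Return (element name, attributes, element content) for a one-element snippet."""
    element = ET.fromstring(snippet)
    return element.tag, dict(element.attrib), element.text

for snippet in (TEI_SNIPPET, HTML_SNIPPET):
    name, attributes, content = describe(snippet)
    print("element:", name)
    print("  attributes and values:", attributes)
    print("  element content:", content)
```

The element name and its attributes say something about the words, while the element content is the words themselves.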

XML Nesting: Properly Nested Elements in a TEI Document (“Well-Formed”)

Russian Nesting Dolls, a.k.a. Matryoshki

XML Nesting: Improperly Nested Elements (Not “Well-Formed”)

Machines are unforgiving, literal readers. If you break their rules, even slightly, they understand nothing.
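How unforgiving a parser is can be seen with Python's standard XML library. This is a minimal sketch: the two snippets and the helper name `is_well_formed` are made up for illustration, reusing the TEI-style `lg` and `l` element names as an assumption.

```python
import xml.etree.ElementTree as ET

def is_well_formed(xml_text):
    """Return True if the parser accepts the text, False if it rejects it."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False

# Properly nested: <l> closes before <lg> does, like dolls inside dolls.
nested = "<lg><l>To sing a song that old was sung</l></lg>"
# Improperly nested: <lg> closes while <l> is still open.
crossed = "<lg><l>To sing a song that old was sung</lg></l>"

print(is_well_formed(nested))
print(is_well_formed(crossed))
```

The second snippet differs from the first only in the order of its two end tags, yet the parser refuses the whole thing rather than recovering what it can.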

XML Nesting: Improperly Nested Elements

Gower
To sing a song that old was sung
From ashes ancient Gower is come,
Assuming man’s infirmities
To glad your ear and please your eyes.

If You Had Access to Every Shakespearean Play Marked Up Like This, What Would You Be Able to Do With It?

[. . .]

Yesterday, my roommate Ruby stormed in and declared, I’ve had it with you. I’m moving out.

Apparently he was frustrated by the fact that I hadn’t done dishes in a week, and that I had lost his Arrested Development DVDs in a pile of filth 43.668264, -79.399889.

C’est la vie, I said. Some people don’t like cleaning. It is your misfortune to live with one.

[. . .]

If Every Text Ever Written Was Marked Up In This Way, What Could You Do With It?

Now for our class project, which is all about “narrative levels.”

We are participating in the first “Shared Task in the Digital Humanities”: Systematic Analysis of Narrative Texts through Annotation (SANTA)
sharedtasksinthedh.github.io

The goal of this project is to come up with a way of TAGGING frame narratives, and then to use this tagging to teach machines how to recognize frame narratives.

Why frame narratives? Because they should be reasonably easy to tag, and just about everything else to do with narrative is incredibly hard.

Alas, though: even frame narratives are pretty challenging when you think about them.

FRAME NARRATIVE:
Any time a character in a story begins to tell a story of his or her own, creating a narrative within a narrative, or a tale within a tale, you have a “frame narrative.”

NARRATIVE:
The representation of a STORY (an event or series of events). In written fiction, narratives are told by NARRATORS.

GENETTE’S DOODLE:

CLASSIC EXAMPLE:

HAMLET
[Box diagram: HAMLET (A) contains THE MOUSE-TRAP (B)]
TWO NARRATIVES, TWO NARRATIVE LEVELS (A+B)

THOUSAND AND ONE NIGHTS
[Box diagram: level A contains three narratives at level B, with one narrative at level C]
FIVE NARRATIVES, THREE NARRATIVE LEVELS (A+B+C)

SHAKY TOTS (1st folio)
[Box diagram: CHRISTOPHER SLY (A) contains TAMING OF THE SHREW (B)]

HANKY TOTS
[Box diagram: “I” (A), DOUGLAS (B), GOVERNESS (C)]

James’s TOTS is a great example of why explicitly tagging frame narratives is hard, because this seems to me an equally justified way of representing it:

HANKY TOTS
[Box diagram: “I” (A), GOVERNESS (B)]

And there are other ways, too.

For instance, in Chapter 11 of TOTS the governess tells Mrs. Grose a long story about what happened the previous night (Miles’s escape). This is the only way we learn about what happened that night: not directly narrated, but through the governess’s retelling to Mrs. Grose. Does that count as a narrative frame?

HANKY TOTS?
[Box diagram: A, B, C?]

You could make an argument that this makes sense, that the retelling constitutes “a narrative within a narrative.” But you could also argue the opposite.

So it’s like most literary problems. There’s no absolute answer.

For ASSIGNMENT #2, you’re going to have to make these kinds of decisions and advance these kinds of arguments. You are going to be assigned one of 16 short texts. You are going to decide if it is a frame narrative and draw a box diagram. Then you are going to use XML-style markup to indicate where the frames occur.

We are going to use one and only one XML element: <nframe>

The element is going to take two attributes:
↣ level
↣ narr

The level attribute states what narrative level the frame is on (A, B, C, D, etc.). Note that you can have multiple narratives at each level, like the three narratives at level B in my simplified representation of the Thousand and One Nights.

The narr attribute states who the narrator of that frame is. We’ll just number them (1, 2, 3, 4, etc.). It’s mostly just a way of keeping track of whether the narrator at one level is the same narrator as at another level.

THOUSAND AND ONE NIGHTS

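[Box diagram: level A contains three narratives at level B, with one narrative at level C; each of the five boxes is an nframe]
OHCO STYLE REPRESENTATION

HANKY TOTS
[Box diagram: “I” (A), DOUGLAS (B), GOVERNESS (C); each of the three boxes is an nframe]
OHCO STYLE REPRESENTATION

HANKY TOTS
[Box diagram: A, B; each of the two boxes is an nframe]
OHCO STYLE REPRESENTATION

HANKY TOTS?
[Box diagram: A, B, C?; each of the three boxes is an nframe]
OHCO STYLE REPRESENTATION

The literary part of this assignment will be tricky.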

In drawing your box diagrams, you’re going to have to ask yourselves hard questions, like “Is this a separate narrative?” and “Who is speaking here?”

The markup shouldn’t be hard. All you’re doing is formalizing your box diagrams. You just need to keep track of narrative levels and speakers.

I do want to add one little wrinkle, though: “closure.”

For me, the most interesting thing about the frames in TOTS (both of them) is that some of them aren’t “closed.” I think it’s important to represent this somehow. And I think BREAKING THE RULES of XML is the best (the only?) way to do so.

As I mentioned, XML requires elements to be perfectly nested within one another, with corresponding start and end tags.
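For the ordinary, fully nested case, the nframe scheme with its level and narr attributes can be sketched as follows. Everything inside the markup is an assumption made for illustration: the placeholder story text, the choice of which B-level tale contains the C-level tale, and the narrator numbers. The helper name `list_frames` is hypothetical too.

```python
import xml.etree.ElementTree as ET

# Simplified Thousand and One Nights: one A frame, three B frames, one C frame.
# Placeholder text and narrator numbers are invented for this sketch.
NIGHTS = """
<nframe level="A" narr="1">frame story
  <nframe level="B" narr="2">first tale</nframe>
  <nframe level="B" narr="2">second tale
    <nframe level="C" narr="3">tale within the second tale</nframe>
  </nframe>
  <nframe level="B" narr="2">third tale</nframe>
</nframe>
"""

def list_frames(element, depth=0):
    """Walk nested nframe elements, yielding (depth, level, narr) for each."""
    if element.tag == "nframe":
        yield depth, element.get("level"), element.get("narr")
        depth += 1
    for child in element:
        yield from list_frames(child, depth)

frames = list(list_frames(ET.fromstring(NIGHTS)))
for depth, level, narr in frames:
    print("  " * depth + f"level {level}, narrator {narr}")

# A frame nested n elements deep should carry the n-th letter as its level.
consistent = all(level == "ABCDEFGH"[depth] for depth, level, _ in frames)
print("levels match nesting depth:", consistent)
```

The final check is one way of catching a slip in the tagging: in properly nested markup, a frame's level letter should agree with how deeply it is embedded.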

XML Nesting: Improperly Nested Elements

It seems to me that writers like James are specifically trying to break this kind of perfectly nested structure when they leave their frames open. Just as a web browser gets confused when it confronts HTML nesting errors, James wants us to read “ERROR” when we come to the end of TOTS and find that the opening frame(s) aren’t “closed.”

XML actually provides a brilliant way of representing what James does. It’s like he forgets to close the first nframe. There is an error in the coding: a missing end tag, a missing </nframe>.

So why not just represent it like this?

Since we’re coming up with our own rules for marking up narrative frames, we can do crazy sh*t like this. Of course, so that we know you haven’t just made an error in your coding, you need to explain what you’ve done in your accompanying writeup. AND ONLY DO THIS FOR NARRATIVES THAT ARE LEFT OPEN LIKE TOTS!
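A standard XML parser would simply reject a file with a deliberately missing end tag, so finding open frames takes a more forgiving reader. The sketch below is one possible approach, not part of the assignment's official tooling: the function name `unclosed_levels`, the regular expression, and the placeholder story text are all assumptions. It also assumes that each end tag closes the most recently opened frame.

```python
import re

# Matches an nframe start tag (capturing its level) or an nframe end tag.
TAG = re.compile(r'<nframe\b[^>]*\blevel="([A-Z])"[^>]*>|</nframe>')

def unclosed_levels(text):
    """Return the levels of nframe elements that are opened but never closed."""
    open_levels = []
    for match in TAG.finditer(text):
        if match.group(0) == "</nframe>":
            if not open_levels:
                raise ValueError("end tag without a matching start tag")
            # Assumption: an end tag closes the most recently opened frame.
            open_levels.pop()
        else:
            open_levels.append(match.group(1))
    return open_levels

# Hypothetical annotation: the B frame is closed, the A frame never is.
OPEN_ENDED = (
    '<nframe level="A" narr="1">outer narrator sets the scene '
    '<nframe level="B" narr="2">inner tale, told to the end</nframe>'
)
CLOSED = '<nframe level="A" narr="1">a story that ends where it began</nframe>'

print(unclosed_levels(OPEN_ENDED))
print(unclosed_levels(CLOSED))
```

A report like this only shows which frames are open. Whether an open frame is intentional is still something the accompanying writeup has to explain.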

For all normal cases, follow the usual nesting rules.

Note that sometimes a story OPENS directly into an embedded B-level frame, and you don’t know it until the end, when it suddenly becomes clear that, all along, this was a story within a story. In that case, I would argue that nothing is actually “left open”; there’s just a delayed recognition. So I’d mark it up with the usual rules.

There was a hippopotamus and he ate so many hamburgers that he died and it was really gruesome. The moral of this story is that you should eat healthy things like slugs and mice.

Another special case is the “mise-en-abyme” story, which is infinitely recursive. You can represent these by showing the paradoxical situation in which an A-level narrative is embedded within an A-level narrative.

It was a dark and stormy night. The band of robbers huddled together around the fire. When he had finished eating, the first bandit said, "Let me tell you a story. It was a dark and stormy night and a band of robbers huddled together around the fire. When he had finished eating, the first bandit said: 'Let me tell you a story. It was a dark and stormy night and . . .'"

MISE-EN-ABYME
[Box diagram: A within A within A]

Okay, here’s how this assignment will actually work.

1. Claim one of the 16 sample texts by putting your name into this Google doc: goo.gl/Usuaow
2. Locate the corresponding .txt file on Blackboard under Assignments → Assignment #2. Download it, save it, open it, read it, interpret it, draw a box diagram for it. (There may be no framing in the story! Beware of placebos!)
3. Carefully read the Annotation Guidelines attached to the Assignment #2 prompt.
4. Open the .txt file in Sublime Text and add your annotations (tagging). Save it and add your name to the file name.
5. Complete the written portion of the assignment and submit it along with your saved annotation file electronically through Blackboard.

A video will explain all these steps. The VIDEO is the official version of the instructions, not that simplified list. Refer to it for the official procedure. There is also a pretty comprehensive written list on the Blackboard assignment page.

DISCUSSION QUESTIONS!

1. Imagine a world where every literary text is totally, perfectly marked up with every conceivable TEI tag. Consider the Shakespeare and the “my roommate” texts, but go nuts. What kinds of cool new things could you find out? Would it be worth all the trouble needed to develop it?

2. Jerome McGann, editor of the TEI-based Rossetti Archive, argues that TEI “treats the humanities corpus […] as informational structures” and thus “violates some of the most basic reading practices of the humanities community.” He notes,

Poems, for example, are inherently non-hierarchical structures that promote attention to varying and overlapping sets of textual designs. […] But the computerized structures being imagined for studying these complex forms approach them as if they were expository, as if their ‘information’ were indexable, as if the works were not made from zeugmas and puns, metaphors and intertexts.

What do you think? Does the “hierarchical,” “information-centric” form of TEI violate something fundamental in the literary works it seeks to encode?