
Arezzo, 19-21 January 2006

International seminar

digital philology and medieval texts

Roberto Rosselli Del Turco

Department of Language Sciences

University of Turin

Digitizing Germanic literary texts:

problems and proposals


Presentation Outline

• Introduction

• Character encoding

• Metrical markup

• Conclusion

The Digital Vercelli Book Project:

http://islp.di.unipi.it/bifrost/vbd/


Introduction

• Digital editions require “digital objects”

• Image digitization and processing rely on reliable and mature techniques and tools

• Text encoding can be a very time-consuming and difficult process

• Literary texts belonging to the Old Germanic tradition present specific problems

• Problems range from character encoding (transcription level) to meter encoding (edition level)


Character Encoding

What does “text encoding” mean?

What are characters for a computer?

What does “character encoding” mean?

“Code” really means “number”: A = 65 (dec.) or 41 (hex.) or 01000001 (bin.)

The first encoding standards: ASCII (7-bit, plus 8-bit extensions), EBCDIC

The ISO ASCII-based standards: ISO 8859-1 etc.

more characters, but interchange problems
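A minimal Python sketch of the two points above: a character's code is just a number (shown here in decimal, hex and 8-bit binary), and a single byte means different characters under different ISO 8859 code pages, which is the source of the interchange problems. The function name `show_code` is illustrative only; the codecs are those shipped with Python's standard library.

```python
def show_code(ch: str) -> str:
    """Return the code of a character in decimal, hex and 8-bit binary."""
    n = ord(ch)  # Unicode code point; for ASCII characters it equals the ASCII code
    return f"{ch!r}: dec {n}, hex {n:X}, bin {n:08b}"

print(show_code("A"))  # 'A': dec 65, hex 41, bin 01000001

# One byte, two ISO 8859 code pages, two different characters.
raw = bytes([0xFE])
print(raw.decode("iso8859_1"))  # þ (thorn) in ISO 8859-1 (Latin-1)
print(raw.decode("iso8859_7"))  # a Greek letter in ISO 8859-7
```

A file written as Latin-1 and read as ISO 8859-7 therefore silently turns every thorn into a Greek letter.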


Old English Characters

Ancient writing systems present very specific problems.

For instance, scribes writing in Old English modified the Latin alphabet to reflect OE phonological features:

modified letters (æ œ ð)

new letters (þ ƿ)

unused letters (g, v), replaced by ʒ and f

Significant variation related to different times, places (scriptoria), scribal habits and scripts


Problems in OE character visualization

• ASCII and ISO 8859-* miss a good number of important characters

• From an HTML page of the DOE corpus:

• The corresponding source code: <img src="T04290_files/etail-uppercase.gif" align="top" border="0">fne swa he cwæde: Micel is gefea
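A minimal sketch, using only Python's standard codecs, of the gap described above: it lists which of the Old English letters from the previous slide ISO 8859-1 cannot encode, and shows the usual HTML alternative to an inline image, a numeric character reference (`&#NNN;`). The helper `latin1_gaps` is hypothetical.

```python
def latin1_gaps(text: str) -> list[str]:
    """Return the characters of `text` that ISO 8859-1 cannot encode."""
    gaps = []
    for ch in text:
        try:
            ch.encode("iso8859_1")
        except UnicodeEncodeError:
            gaps.append(ch)
    return gaps

sample = "æ œ ð þ ƿ ʒ"
print(latin1_gaps(sample))  # ['œ', 'ƿ', 'ʒ']

# Instead of a GIF, a page can carry the character itself as a numeric reference.
print(sample.encode("ascii", errors="xmlcharrefreplace").decode("ascii"))
```

The `xmlcharrefreplace` error handler writes every character the target encoding lacks as `&#` + decimal code point + `;`, which any HTML or XML consumer turns back into the character.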


The Unicode Standard

The Unicode site: http://www.unicode.org/

A “universal character encoding standard used for representation of text for computer processing”

Fully compatible and synchronized with the corresponding versions of International Standard ISO/IEC 10646

Latest major revision 4.1, 5.0 in beta

97720 different characters, room for many more (about 1 million)

Universal, efficient, unambiguous

Distinction between characters and glyphs
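A small Python check of two of the claims above: the “about 1 million” figure is the size of the Unicode code space (17 planes of 2^16 code points), and each character, including the Old English letters discussed earlier, has exactly one code point and one formal name. It relies only on the standard `sys` and `unicodedata` modules.

```python
import sys
import unicodedata

# Code space: 17 planes of 2**16 code points, i.e. U+0000..U+10FFFF.
CODE_SPACE = 17 * 2**16
assert CODE_SPACE == sys.maxunicode + 1 == 1_114_112

# Each character has exactly one code point and one formal name.
for ch in "æœðþƿʒ":
    print(f"U+{ord(ch):04X}  {ch}  {unicodedata.name(ch)}")
```

The names printed (for example `LATIN LETTER WYNN` for ƿ) are the same ones used in the `<charName>` convention of the TEI examples.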


Characters in an Old English manuscript

• Considerable variation in shape for the same character:

a s y M

• Size variation:

a i e

• Special characters (abbreviations, punctuation):


Encoding OE Characters

Why encode “non-standard” characters

– To allow for paleographical analysis

– To track scribal habits

– To obtain a high-quality text-only facsimile

What to encode

Not every letter variation is meaningful

How to encode

Unicode + XML markup + MUFI-compliant font


Entities

Entities are “empty boxes” (think of constants in programming languages)

Entities must be declared at the beginning of the XML document (or, more often, in a separate file):

<!ENTITY lows "&#xF127;">
<!ENTITY longs "ſ">

They allow for interchange with legacy operating systems and platforms

They simplify the handling of “special characters” (and more)
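A minimal sketch of what happens to declared entities at parse time, using Python's standard `xml.etree.ElementTree`. The document is hypothetical: its internal DTD subset declares `lows` (mapped, as an assumption taken from the TEI P5 example, to the Private Use code point U+F127) and `longs` (U+017F, ſ).

```python
import xml.etree.ElementTree as ET

# Hypothetical document declaring the two entities in its internal DTD subset.
DOC = """<!DOCTYPE p [
  <!ENTITY lows "&#xF127;">
  <!ENTITY longs "&#x17F;">
]>
<p>&longs;wa he cwæde</p>"""

root = ET.fromstring(DOC)  # the parser expands &longs; to U+017F (ſ)
print(root.text)

# Re-serializing keeps only the character, not the entity name.
print(ET.tostring(root, encoding="unicode"))
```

After parsing, nothing records that `ſ` came from `&longs;`: the convenience of entities exists only in the source file.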


TEI P4 and Unicode

• How to use entities:

&longs; = “s”: not very useful

N.B.: once the document is parsed, entity names are “lost” forever!

&longs; = “ſ”: visualization

<c type='longs'>s</c>: visualization + search

• ... but what about “missing” characters?
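A minimal sketch of why the `<c type='longs'>s</c>` option listed above supports both visualization and search: the element content stays a plain “s” for searching, while a rendering step substitutes the long s. The line of text and the `DISPLAY` table are hypothetical, and the functions are illustrative helpers built on Python's standard `xml.etree.ElementTree`.

```python
import xml.etree.ElementTree as ET

# Hypothetical line: the long s is kept as a plain "s" inside <c type="longs">.
LINE = '<l><c type="longs">s</c>wa he cwæde</l>'

# Assumed rendering table: <c> type -> character to display.
DISPLAY = {"longs": "\u017f"}  # ſ LATIN SMALL LETTER LONG S

def for_search(elem):
    """Plain text with all markup ignored, so an ordinary search finds "swa"."""
    return "".join(elem.itertext())

def for_display(elem):
    """Text with each known <c> rendered through DISPLAY."""
    out = [elem.text or ""]
    for child in elem:
        if child.tag == "c" and child.get("type") in DISPLAY:
            out.append(DISPLAY[child.get("type")])
        else:
            out.append(for_display(child))
        out.append(child.tail or "")
    return "".join(out)

line = ET.fromstring(LINE)
print(for_search(line))   # swa he cwæde
print(for_display(line))  # ſwa he cwæde
```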


TEI P5 and Unicode

Use the <g> element in the text: ... <g ref="#lows"/> ...

together with the <charDesc> one:

<charDesc>
  <char xml:id="lows">
    <charName>LATIN SMALL LETTER S LOW UNDER THE LINE</charName>
    <charProp>
      <localName>entity</localName>
      <value>lows</value>
    </charProp>
    <mapping type="standardized">s</mapping>
    <mapping type="PUA">U+F127</mapping>
  </char>
</charDesc>
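A minimal sketch of how software might resolve `<g ref="#lows"/>` through the `<charDesc>` above: the `standardized` mapping gives searchable text, the `PUA` mapping gives the code point a MUFI-compliant font would draw. The mini-document and both helper functions are hypothetical, and the TEI namespace is omitted for brevity.

```python
import xml.etree.ElementTree as ET

XML_ID = "{http://www.w3.org/XML/1998/namespace}id"  # how ElementTree names xml:id

# Hypothetical mini-document: a <charDesc> plus a line of text using <g>.
DOC = """<doc>
<charDesc>
  <char xml:id="lows">
    <mapping type="standardized">s</mapping>
    <mapping type="PUA">U+F127</mapping>
  </char>
</charDesc>
<l>wæ<g ref="#lows"/></l>
</doc>"""

def char_table(root):
    """Map each <char> xml:id to a {mapping type: character} dict."""
    table = {}
    for char in root.iter("char"):
        maps = {}
        for m in char.iter("mapping"):
            value = m.text.strip()
            if value.startswith("U+"):  # "U+F127" -> the actual code point
                value = chr(int(value[2:], 16))
            maps[m.get("type")] = value
        table[char.get(XML_ID)] = maps
    return table

def render(elem, table, mode):
    """Flatten elem's text, replacing each <g ref="#id"/> by its `mode` mapping."""
    out = [elem.text or ""]
    for child in elem:
        if child.tag == "g":
            out.append(table[child.get("ref").lstrip("#")][mode])
        else:
            out.append(render(child, table, mode))
        out.append(child.tail or "")
    return "".join(out)

root = ET.fromstring(DOC)
table = char_table(root)
line = root.find("l")
print(render(line, table, "standardized"))  # wæs
```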


TEI P5 and Unicode

Another example:

<charDesc>
  <glyph xml:id="r1">
    <glyphName>LATIN SMALL LETTER R WITH ONE FUNNY STROKE</glyphName>
    <charProp>
      <localName>entity</localName>
      <value>r1</value>
    </charProp>
    <graphic url="r1img.png"/>
  </glyph>
</charDesc>


Metrical Markup

Old Germanic meter features:

• non-isosyllabic

• syllabic quantity not particularly relevant

• long verse composed of two half-lines

• half-lines bound by alliteration

• stress pattern

No specific solutions in the TEI Guidelines

Several prosodic theories (from Sievers to Hoover)

Problems with stylistic features

Risk of complex, overlapping markup


General Structure of Old Germanic Meter

A markup proposal:

<lg><l>

<hl>Hwæt! Ic swefna cyst</hl><hl>secgan wylle</hl>

</l></lg>

<lg> (line group): only needed where stanzas occur (Deor)

<hl> (half-line): syntactic sugar for <seg type="halfline">

<hlA> and <hlB>: not needed
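A minimal sketch of the “syntactic sugar” claim: a mechanical rewrite turns every `<hl>` into the equivalent `<seg type="halfline">`, so the shorter element loses no information. The function `desugar` is hypothetical and uses Python's standard `xml.etree.ElementTree`; it is applied to the markup proposal above.

```python
import xml.etree.ElementTree as ET

def desugar(xml_text: str) -> str:
    """Rewrite every <hl> as the equivalent <seg type="halfline">."""
    root = ET.fromstring(xml_text)
    for hl in list(root.iter("hl")):  # materialize before mutating tags
        hl.tag = "seg"
        hl.set("type", "halfline")
    return ET.tostring(root, encoding="unicode")

print(desugar("<lg><l><hl>Hwæt! Ic swefna cyst</hl><hl>secgan wylle</hl></l></lg>"))
```

Other attributes on `<hl>` (such as `n="3a"`) are carried over unchanged, so the reverse mapping is equally simple.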


Meter encoding v. 1

A simple method to encode meter using the attributes of <met> elements inside the <hl> element:

<hl>
<met name="Sievers" code="D1" scan="//\x"/>
<met name="Russom" code="x/Sx" scan="x|/\x"/>
<met name="Hoover" code="nAn" scan="xx /\x"/>
...
HWÆT! WE GARDENA
</hl>

Doesn't allow for alternative scansions within the same system

Doesn't take syllables into account (nor disagreement over syllable counts and stress patterns)
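A minimal sketch of reading the v. 1 encoding with Python's standard `xml.etree.ElementTree`: each `<met>` yields one (system, code, scan) triple, while the verse itself is just trailing text. The `...` of the slide is omitted, and the helper `scansions` is hypothetical.

```python
import xml.etree.ElementTree as ET

# The v. 1 half-line; a raw string keeps the "\x" in the scan strings literal.
HL = r"""<hl><met name="Sievers" code="D1" scan="//\x"/><met name="Russom" code="x/Sx" scan="x|/\x"/><met name="Hoover" code="nAn" scan="xx /\x"/>HWÆT! WE GARDENA</hl>"""

def scansions(hl):
    """Return (system, code, scan) for each <met> child of a half-line."""
    return [(m.get("name"), m.get("code"), m.get("scan")) for m in hl.findall("met")]

hl = ET.fromstring(HL)
for row in scansions(hl):
    print(row)

# The verse is only the tail of the last <met>: no scan symbol points to a syllable.
print("".join(hl.itertext()))
```

This is the limitation listed above: the scansions can be listed and compared, but none of them can be mapped back onto the words.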


Meter encoding v. 2

A more complete (and complex) method:

<hl n="3a"><met system="Sievers" resp="Schwab" totalSyllables="5" scansion="D-1" Anacrusis="0" Extrametrical="0" Lift="1,2,4" halfLift="3" dip="5" allitGlyph="w" allitSound="/w/" allitPosition="1,2" />

<met system="Sievers" resp="Fulk" totalSyllables="4" scansion="D-1" Anacrusis="0" Extrametrical="0" Lift="1,2" halfLift="3" dip="4" allitGlyph="w" allitSound="/w/" allitPosition="1,2" />

weorc wuldorfaeder</hl>

The scansion is not associated with the actual text ...


Meter encoding v. 2

... in fact you could take it out of the text:

<hl n="3a" id="CH.3a">weorc wuldorfaeder</hl>

...

<met target="CH.3a" system="Sievers" resp="Schwab" totalSyllables="5" scansion="D-1" Anacrusis="0" Extrametrical="0" Lift="1,2,4" halfLift="3" dip="5" allitGlyph="w" allitSound="/w/" allitPosition="1,2" />

<met target="CH.3a" system="Sievers" resp="Fulk" totalSyllables="4" scansion="D-1" Anacrusis="0" Extrametrical="0" Lift="1,2" halfLift="3" dip="4" allitGlyph="w" allitSound="/w/" allitPosition="1,2" />
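A minimal sketch of one benefit of moving the `<met>` elements out of the text: readings that share a `target` can be compared directly. The wrapper `<poem>` element and the function `disagreements` are hypothetical, and the example is abridged (anacrusis, extrametrical and alliteration attributes omitted).

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# Abridged stand-off example: two Sievers-style readings of Cædmon's Hymn 3a.
DOC = """<poem>
<hl n="3a" id="CH.3a">weorc wuldorfaeder</hl>
<met target="CH.3a" system="Sievers" resp="Schwab" totalSyllables="5" scansion="D-1" Lift="1,2,4" halfLift="3" dip="5"/>
<met target="CH.3a" system="Sievers" resp="Fulk" totalSyllables="4" scansion="D-1" Lift="1,2" halfLift="3" dip="4"/>
</poem>"""

def disagreements(root):
    """For each target half-line, list the attributes on which its readings differ."""
    readings_by_target = defaultdict(list)
    for met in root.iter("met"):
        readings_by_target[met.get("target")].append(met.attrib)
    result = {}
    for target, readings in readings_by_target.items():
        keys = set().union(*readings) - {"resp"}  # the scholar is not a metrical feature
        result[target] = sorted(k for k in keys if len({r.get(k) for r in readings}) > 1)
    return result

print(disagreements(ET.fromstring(DOC)))
```

Both scholars agree on the type (D-1); they differ on syllable count and on which positions are lifts and dips.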


Meter encoding v. 2

To establish a direct connection between scansion and text, you have to mark up syllables

You could add this to the simple model:

<hl>
<met name="Russom" scan="/x|/xx" sylls="1a.1.1 1a.1.2 1a.1.3 1a.1.4 1a.1.5"/>
<met name="Bliss" scan="/|\xx" sylls="1a.1.1 1a.1.3 1a.1.4 1a.1.5"/>
<syl id="1a.1.1">þe</syl><syl id="1a.1.2">od</syl><syl id="1a.1.3">cyn</syl><syl id="1a.1.4">in</syl><syl id="1a.1.5">ga</syl>
</hl>
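A minimal sketch of the connection this model makes possible: each metrical symbol is paired with the syllable its `sylls` entry points to. The function `align` is hypothetical, and it assumes that `|` in a `scan` value marks a boundary rather than a metrical position, which is what makes the symbol and syllable counts match in this example.

```python
import xml.etree.ElementTree as ET

# The syllable-linked half-line; a raw string keeps "\x" literal.
DOC = r"""<hl><met name="Russom" scan="/x|/xx" sylls="1a.1.1 1a.1.2 1a.1.3 1a.1.4 1a.1.5"/>
<met name="Bliss" scan="/|\xx" sylls="1a.1.1 1a.1.3 1a.1.4 1a.1.5"/>
<syl id="1a.1.1">þe</syl><syl id="1a.1.2">od</syl><syl id="1a.1.3">cyn</syl><syl id="1a.1.4">in</syl><syl id="1a.1.5">ga</syl>
</hl>"""

def align(hl):
    """Pair each metrical symbol of every <met> with the syllable it points to."""
    syllables = {s.get("id"): s.text for s in hl.iter("syl")}
    result = {}
    for met in hl.iter("met"):
        symbols = met.get("scan").replace("|", "")  # assumption: "|" is not a position
        ids = met.get("sylls").split()
        if len(symbols) != len(ids):
            raise ValueError(f"{met.get('name')}: {len(symbols)} symbols, {len(ids)} syllables")
        result[met.get("name")] = [(syllables[i], s) for i, s in zip(ids, symbols)]
    return result

for system, pairs in align(ET.fromstring(DOC)).items():
    print(system, pairs)
```

The length check catches a scansion whose symbols no longer match its syllable list, which the v. 1 model could not detect.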


Meter encoding v. 3

The most complete (and complex!) method:

<fvLib id="PS" type="Prosodic Stress">
<ignored id="x"/> <!-- ignored in scansion -->
<dip id="SO"/>
<dipResolution id="SOR"/> <!-- second half of resolved lift -->
<halfLiftLongPosition id="S1LP"/> <!-- = V+CC -->
<halfLiftLongNature id="S1LN"/> <!-- = long vowel -->
<halfLiftShort id="S1S"/>
<liftLongPosition id="S2LP"/> <!-- lift long by position -->
...
</fvLib>


Meter encoding v. 3

The feature structure looks complex, but it needs to be designed only once:

<hl n="3a" id="CH.3a"><syll id="ch3a.1">weorc</syll> <syll id="ch3a.2">wul</syll><syll id="ch3a.3">dor</syll><syll id="ch3a.4">fae</syll><syll id="ch3a.5">der</syll></hl>

...

<linkGrp type="metrical prosody" domains="PS AT AP AG T1" targFunc="?">
<link id="L1" targets="ch3a.1 S2LP A1 APW AGW"/>
<link id="L2" targets="ch3a.2 S2LP A1 APW AGW"/>
...
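A minimal sketch of how the `<link>` elements might be resolved: it assumes, from the pattern on the slide, that the first token of `targets` is a syllable id and the rest are feature-value ids from the libraries named in `domains`. Only two entries of the PS library are included; ids from libraries not shown (A1, APW, AGW) are left unresolved. The wrapper `<doc>` and the function `resolve_links` are hypothetical.

```python
import xml.etree.ElementTree as ET

# Abridged, hypothetical document combining the fvLib, the syllables and the links.
DOC = """<doc>
<fvLib id="PS" type="Prosodic Stress">
  <dip id="SO"/>
  <liftLongPosition id="S2LP"/>
</fvLib>
<hl n="3a" id="CH.3a"><syll id="ch3a.1">weorc</syll><syll id="ch3a.2">wul</syll></hl>
<linkGrp type="metrical prosody" domains="PS AT AP AG T1">
  <link id="L1" targets="ch3a.1 S2LP A1 APW AGW"/>
  <link id="L2" targets="ch3a.2 S2LP A1 APW AGW"/>
</linkGrp>
</doc>"""

def resolve_links(root):
    """Return (syllable text, feature names) per <link>; unknown ids are kept as-is."""
    sylls = {s.get("id"): s.text for s in root.iter("syll")}
    features = {f.get("id"): f.tag for lib in root.iter("fvLib") for f in lib}
    out = []
    for link in root.iter("link"):
        syl_id, *values = link.get("targets").split()  # assumption: syllable id first
        out.append((sylls[syl_id], [features.get(v, v) for v in values]))
    return out

for row in resolve_links(ET.fromstring(DOC)):
    print(row)
```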


Stylistic features: the kenning

• Main element:

<kenning>

Using the <kenning> element without further markup is the simplest way to mark up kenningar in a text

Examples:

<kenning>swanrād</kenning><kenning>beadolēoma</kenning>



Sub-elements

<bw> base word

To single out the base word in a kenning

<det> determinant

To single out the determinant

<refer> referent

Explicit markup of the object or person the kenning refers to



Attributes

type specifies the type of kenning

level specifies the level, i.e. whether the kenning is hosted by or hosts another kenning, and its position in the hierarchy

class specifies a general semantic class which the kenning belongs to

func specifies the stylistic function of the kenning



Examples:

<kenning><det>beado</det><bw>lēoma</bw><refer>sweord</refer>

</kenning>

<kenning level="1"> <det> <kenning level="2">

<det>heofon</det><bw>engla</bw></kenning> </det> <bw>cyning</bw></kenning>
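A minimal sketch of what the sub-elements and the `level` attribute make queryable: one row per kenning, with nested kennings reported separately and outer before inner. The wrapper `<text>` element and both helper functions are hypothetical; they use Python's standard `xml.etree.ElementTree`.

```python
import xml.etree.ElementTree as ET

# The two examples above, wrapped in a hypothetical <text> root.
DOC = """<text>
<kenning><det>beado</det><bw>lēoma</bw><refer>sweord</refer></kenning>
<kenning level="1"><det><kenning level="2"><det>heofon</det><bw>engla</bw></kenning></det><bw>cyning</bw></kenning>
</text>"""

def flat(el):
    """Text content of an element with whitespace stripped, or None if absent."""
    return None if el is None else "".join(t.strip() for t in el.itertext())

def analyse(root):
    """One (level, determinant, base word, referent) row per <kenning>."""
    return [
        (k.get("level"), flat(k.find("det")), flat(k.find("bw")), flat(k.find("refer")))
        for k in root.iter("kenning")  # document order: outer kenning first
    ]

for row in analyse(ET.fromstring(DOC)):
    print(row)
```

The explicit `is None` test matters: an ElementTree element with no children is falsy, so `if el:` would wrongly skip `<det>beado</det>`.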


A Work in Progress...

• Coming soon on the Digital Medievalist site:

http://www.digitalmedievalist.org/

• Collaborative edition on the wiki

• Metrical-markup list for discussion

Feel free to ask and/or suggest!


Conclusion

The Digital Vercelli Book team:

Federica Goria, Raffaele Cioffi, Emilia Di Maio, Roberto Rosselli Del Turco

The Metrical Markup team:

Dorothy Carr Porter, Daniel Paul O'Donnell, Roberto Rosselli Del Turco
