Arezzo, 19-21 January 2006
International seminar
Digital Philology and Medieval Texts
Roberto Rosselli Del Turco
Dipartimento di Scienze del Linguaggio
Università di Torino
The digitization of literary texts of the Germanic area:
problems and proposals
2
Presentation Outline
• Introduction
• Character encoding
• Metrical markup
• Conclusion
The Digital Vercelli Book Project:
http://islp.di.unipi.it/bifrost/vbd/
3
Introduction
• Digital editions require “digital objects”
• Image digitizing and processing rely on reliable and mature techniques/tools
• Text encoding can be a very time-consuming and difficult process
• Literary texts belonging to the Old Germanic tradition present specific problems
• Problems range from character encoding (transcription level) to meter encoding (edition level)
4
Character Encoding
What does “text encoding” mean?
What are characters for a computer?
What does “character encoding” mean?
“code” really means “number”
A = 65 (dec.) or 41 (hex.) or 1000001 (bin.)
The first encoding standards: ASCII (7 bit, plus 8-bit extensions), EBCDIC
The ISO ASCII-based standards: ISO 8859-1 etc.
more characters but interchange problems
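The “code really means number” point can be tried directly. The following is a minimal sketch, not part of the original slides; the helper name `fits` is hypothetical and only checks whether a string can be stored in a given legacy encoding.

```python
# A character is stored as a number (its code point).
code = ord("A")                 # 65 in decimal
as_hex = format(code, "X")      # hexadecimal form
as_bin = format(code, "07b")    # 7-bit binary form, as in ASCII


def fits(text, encoding):
    """Return True if every character of text exists in the encoding."""
    try:
        text.encode(encoding)
        return True
    except UnicodeEncodeError:
        return False


# ISO 8859-1 adds some Old English letters to ASCII, but not all of them.
latin1_has_ash_eth_thorn = fits("æðþ", "iso-8859-1")
latin1_has_wynn = fits("ƿ", "iso-8859-1")
ascii_has_ash = fits("æ", "ascii")
```

The last three values illustrate the “more characters but interchange problems” remark: æ, ð and þ fit in ISO 8859-1, wynn does not, and none of them fits in plain ASCII.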
5
Old English Characters
Ancient writing systems present very specific problems
For instance, scribes writing in Old English modified the Latin alphabet to reflect OE phonological features:
modified letters (æ œ ð)
new letters (þ ƿ)
unused letters (g v), replaced by ʒ f
Significant variations related to different times, places (scriptoria), scribal habits, scripts
6
Problems in OE character visualization
• ASCII and ISO 8859-* lack a good number of important characters
• From an HTML page of the DOE corpus:
• The corresponding source code:
<img src="T04290_files/etail-uppercase.gif" align="top" border="0">fne swa he cwæde: Micel is gefea
7
The Unicode Standard
The Unicode site: http://www.unicode.org/
A “universal character encoding standard used for representation of text for computer processing”
Fully compatible and synchronized with the corresponding versions of International Standard ISO/IEC 10646
Latest major revision 4.1, 5.0 in beta
97,720 different characters, room for many more (about 1 million)
Universal, efficient, unambiguous
Character/glyph distinction
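As an illustration of “universal, unambiguous”, Python's standard `unicodedata` module can report the code point and official name of each Old English letter. This is a small sketch added for clarity; the choice of sample letters is an assumption, not taken from the slides.

```python
import sys
import unicodedata

# Sample Old English letters: ash, eth, thorn, wynn, long s.
SAMPLE_LETTERS = "æðþƿſ"

# Map each letter to (code point in U+XXXX form, official Unicode name).
catalogue = {
    ch: (f"U+{ord(ch):04X}", unicodedata.name(ch))
    for ch in SAMPLE_LETTERS
}

# Total number of code points Unicode can address ("about 1 million").
code_space = sys.maxunicode + 1

for ch, (code_point, name) in catalogue.items():
    print(ch, code_point, name)
```

Each letter has exactly one code point and one name, whatever font (glyph) is later used to display it.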
9
Characters in an Old English manuscript
• Considerable variation of shapes for the same character:
a s y M
• Size variation:
a i e
• Special characters (abbreviations, punctuation):
10
Encoding OE Characters
Why encode “non standard” characters?
– To allow for paleographical analysis
– To track scribal habits
– To obtain a high quality text-only facsimile
What to encode
Not every letter variation is meaningful
How to encode
Unicode + XML markup + MUFI compliant font
11
Entities
Entities are “empty boxes” (think about constants in programming languages)
Entities must be declared at the beginning of the XML document (or, more often, in a separate file):
<!ENTITY lows "&#xF127;"> <!-- low s letter -->
<!ENTITY longs "ſ"> <!-- long, f shaped s letter -->
They allow for interchange with legacy operating systems and platforms
They simplify the handling of “special characters” (and more)
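How an entity behaves as a “constant” can be seen by parsing a tiny document with Python's standard `xml.etree.ElementTree`. This is a sketch under stated assumptions: the `<text>` element and the sample word are invented for illustration, and the entity is declared in the internal subset rather than in a separate file.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample: the entity is declared once, then used in the text.
DOC = """<!DOCTYPE text [
  <!ENTITY longs "&#x17F;">
]>
<text>wi&longs;dom</text>"""

root = ET.fromstring(DOC)

# The parser replaces the entity reference with its declared value.
expanded = root.text

# After parsing, the entity name itself is no longer available.
name_survives = "longs" in expanded
```

After parsing, `expanded` contains the long s character itself and no trace of the name `longs`, which is why entity names cannot be recovered from processed text.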
12
TEI P4 and Unicode
• How to use entities:
&longs; -> “s”: not very useful
N.B.: entity names are “lost” forever!
&longs; -> “ſ”: visualization
<c type='longs'>s</c>: visualization + search
• ... but what about “missing” characters?
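The “visualization + search” advantage of the `<c type='longs'>s</c>` form can be sketched as follows. The `<l>` wrapper, the sample word, the `GLYPHS` table and the `render` function are all assumptions made for this illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical table mapping <c type="..."> values to display characters.
GLYPHS = {"longs": "\u017f"}


def render(elem, diplomatic):
    """Return the text of elem, with special glyphs if diplomatic is True."""
    if elem.tag == "c" and diplomatic and elem.get("type") in GLYPHS:
        return GLYPHS[elem.get("type")]
    parts = [elem.text or ""]
    for child in elem:
        parts.append(render(child, diplomatic))
        parts.append(child.tail or "")
    return "".join(parts)


line = ET.fromstring('<l>wi<c type="longs">s</c>dom</l>')
searchable = render(line, diplomatic=False)   # plain "s" for searching
display = render(line, diplomatic=True)       # long s for visualization
```

The same encoded line yields both a normalized string for searching and a diplomatic string for display, which a bare entity cannot do once it has been expanded.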
13
TEI P5 and Unicode
Use the <g> element in the text:
... <g ref="#lows"/> ...
together with the <charDesc> one:
<charDesc>
  <char id="lows">
    <charName>LATIN SMALL LETTER S LOW UNDER THE LINE</charName>
    <charProp>
      <localName>entity</localName>
      <value>lows</value>
    </charProp>
    <mapping type="standardized">s</mapping>
    <mapping type="PUA">U+F127</mapping>
  </char>
</charDesc>
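A processor can use the `<mapping>` elements to decide what to output for each `<g>`. The following is a minimal sketch, assuming a hypothetical `<doc>` wrapper and `<p>` sample text; the function names are invented and this is not a TEI tool.

```python
import xml.etree.ElementTree as ET

SAMPLE = """<doc>
  <charDesc>
    <char id="lows">
      <charName>LATIN SMALL LETTER S LOW UNDER THE LINE</charName>
      <mapping type="standardized">s</mapping>
      <mapping type="PUA">U+F127</mapping>
    </char>
  </charDesc>
  <p>wi<g ref="#lows"/>dom</p>
</doc>"""


def load_mappings(root):
    """Build {char id: {mapping type: value}} from the <char> elements."""
    return {
        char.get("id"): {m.get("type"): m.text for m in char.iter("mapping")}
        for char in root.iter("char")
    }


def to_char(value):
    """Turn a 'U+XXXX' notation into the character; keep other values."""
    if value.upper().startswith("U+"):
        return chr(int(value[2:], 16))
    return value


def render(elem, table, mapping_type):
    """Replace each <g ref="#id"/> with the chosen mapping."""
    parts = [elem.text or ""]
    for child in elem:
        if child.tag == "g":
            key = child.get("ref").lstrip("#")
            parts.append(to_char(table[key][mapping_type]))
        parts.append(child.tail or "")
    return "".join(parts)


root = ET.fromstring(SAMPLE)
table = load_mappings(root)
paragraph = root.find("p")
standardized = render(paragraph, table, "standardized")
pua = render(paragraph, table, "PUA")
```

The `standardized` output is suitable for search and for platforms without a MUFI compliant font; the `PUA` output needs such a font to display correctly.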
14
TEI P5 and Unicode
Another example:
<charDesc>
  <glyph id="r1">
    <glyphName>LATIN SMALL LETTER R WITH ONE FUNNY STROKE</glyphName>
    <charProp>
      <localName>entity</localName>
      <value>r1</value>
    </charProp>
    <graphic url="r1img.png"/>
  </glyph>
</charDesc>
15
Metrical Markup
• Old Germanic meter features:
– non isosyllabic
– syllabic quantity not particularly relevant
– long verse composed of two half-lines
– half-lines bound by alliteration
– stress pattern
• No specific solutions in the TEI guidelines
• Several prosodic theories (Sievers to Hoover)
• Problems with stylistic features
• Risk of complex, overlapping markup
16
General Structure of Old Germanic Meter
A markup proposal:
<lg>
  <l>
    <hl>Hwæt! Ic swefna cyst</hl>
    <hl>secgan wylle</hl>
  </l>
</lg>
<lg> (line group) only needed where stanzas occur (Deor)
<hl> (half line) syntactic sugar for <seg type="halfline">
<hlA> and <hlB> not needed
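To show that the proposed structure is easy to process, here is a small sketch that rebuilds each long line from its two half-lines. The function name `long_lines` and the caesura marker are assumptions made for the example.

```python
import xml.etree.ElementTree as ET

SAMPLE = """<lg>
  <l>
    <hl>Hwæt! Ic swefna cyst</hl>
    <hl>secgan wylle</hl>
  </l>
</lg>"""


def long_lines(root, caesura=" || "):
    """Join the half-lines of every <l> with a visible caesura marker."""
    return [
        caesura.join(hl.text.strip() for hl in line.findall("hl"))
        for line in root.iter("l")
    ]


lines = long_lines(ET.fromstring(SAMPLE))
for text in lines:
    print(text)
```

Because the first and second half-line are distinguished by position alone, no separate <hlA> and <hlB> elements are required.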
17
Meter encoding v. 1
A simple method to encode meter using <met> elements inside the <hl> element, one per scansion system:
<hl>
  <met name="Sievers" code="D1" scan="//\x"/>
  <met name="Russom" code="x/Sx" scan="x|/\x"/>
  <met name="Hoover" code="nAn" scan="xx /\x"/>
  ...
  HWÆT! WE GARDENA
</hl>
Doesn't allow for alternative scansions using the same system
Doesn't take into account syllables (and disagreement in syllable counts/stress pattern)
18
Meter encoding v. 2
A more complete (and complex) method:
<hl n="3a"><met system="Sievers" resp="Schwab" totalSyllables="5" scansion="D-1" Anacrusis="0" Extrametrical="0" Lift="1,2,4" halfLift="3" dip="5" allitGlyph="w" allitSound="/w/" allitPosition="1,2" />
<met system="Sievers" resp="Fulk" totalSyllables="4" scansion="D-1" Anacrusis="0" Extrametrical="0" Lift="1,2" halfLift="3" dip="4" allitGlyph="w" allitSound="/w/" allitPosition="1,2" />
weorc wuldorfaeder</hl>
Scansion not associated with the actual text ...
19
Meter encoding v. 2
... in fact you could take it out of the text:
<hl n="3a" id="CH.3a">weorc wuldorfaeder</hl>
...
<met target="CH.3a" system="Sievers" resp="Schwab" totalSyllables="5" scansion="D-1" Anacrusis="0" Extrametrical="0" Lift="1,2,4" halfLift="3" dip="5" allitGlyph="w" allitSound="/w/" allitPosition="1,2" />
<met target="CH.3a" system="Sievers" resp="Fulk" totalSyllables="4" scansion="D-1" Anacrusis="0" Extrametrical="0" Lift="1,2" halfLift="3" dip="4" allitGlyph="w" allitSound="/w/" allitPosition="1,2" />
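Stand-off scansions like these can be reattached to the text by following the target attribute. The sketch below assumes a hypothetical `<doc>` wrapper and keeps only a few of the attributes for brevity; the function name is invented.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

SAMPLE = """<doc>
  <hl n="3a" id="CH.3a">weorc wuldorfaeder</hl>
  <met target="CH.3a" system="Sievers" resp="Schwab"
       totalSyllables="5" scansion="D-1"/>
  <met target="CH.3a" system="Sievers" resp="Fulk"
       totalSyllables="4" scansion="D-1"/>
</doc>"""


def scansions_by_halfline(root):
    """Group the stand-off <met> readings under the text they point to."""
    texts = {hl.get("id"): hl.text for hl in root.iter("hl")}
    grouped = defaultdict(list)
    for met in root.iter("met"):
        reading = (
            met.get("resp"),
            met.get("scansion"),
            int(met.get("totalSyllables")),
        )
        grouped[met.get("target")].append(reading)
    return {texts[target]: readings for target, readings in grouped.items()}


readings = scansions_by_halfline(ET.fromstring(SAMPLE))
```

Both scholars' readings of the same half-line are kept side by side, including their disagreement on the syllable count.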
20
Meter encoding v. 2
To establish a direct connection between scansion and text you have to mark syllables
You could add this to the simple model:
<hl>
  <met name="Russom" scan="/x|/xx" sylls="1a.1.1 1a.1.2 1a.1.3 1a.1.4 1a.1.5"/>
  <met name="Bliss" scan="/|\xx" sylls="1a.1.1 1a.1.3 1a.1.4 1a.1.5"/>
  <syl id="1a.1.1">þe</syl><syl id="1a.1.2">od</syl><syl id="1a.1.3">cyn</syl><syl id="1a.1.4">in</syl><syl id="1a.1.5">ga</syl>
</hl>
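Once syllables carry identifiers, each scansion symbol can be paired with the syllable it describes. This is a sketch only: it assumes that “|” marks a foot boundary and is not itself a metrical position, which the slides do not state explicitly.

```python
import xml.etree.ElementTree as ET

# Raw string so that the backslash in the scan attribute is kept as is.
SAMPLE = r"""<hl>
  <met name="Russom" scan="/x|/xx" sylls="1a.1.1 1a.1.2 1a.1.3 1a.1.4 1a.1.5"/>
  <met name="Bliss" scan="/|\xx" sylls="1a.1.1 1a.1.3 1a.1.4 1a.1.5"/>
  <syl id="1a.1.1">þe</syl><syl id="1a.1.2">od</syl><syl id="1a.1.3">cyn</syl><syl id="1a.1.4">in</syl><syl id="1a.1.5">ga</syl>
</hl>"""


def align(root):
    """Pair each scansion symbol with the text of the syllable it marks."""
    texts = {syl.get("id"): syl.text for syl in root.iter("syl")}
    result = {}
    for met in root.iter("met"):
        # Assumption: "|" is a foot boundary, not a metrical position.
        symbols = [c for c in met.get("scan") if c != "|"]
        ids = met.get("sylls").split()
        result[met.get("name")] = [
            (texts[i], s) for i, s in zip(ids, symbols)
        ]
    return result


alignment = align(ET.fromstring(SAMPLE))
```

The two systems can then be compared syllable by syllable, including the point where they disagree on how many syllables count.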
21
Meter encoding v. 3
The most complete (and complex!) method:
<fvLib id="PS" type="Prosodic Stress">
  <ignored id="x"/> <!-- ignored in scansion -->
  <dip id="SO"/>
  <dipResolution id="SOR"/> <!-- second half of resolved lift -->
  <halfLiftLongPosition id="S1LP"/> <!-- = V+CC -->
  <halfLiftLongNature id="S1LN"/> <!-- = long vowel -->
  <halfLiftShort id="S1S"/>
  <liftLongPosition id="S2LP"/> <!-- lift long by position -->
  ...
</fvLib>
22
Meter encoding v. 3
The Feature Structure looks complex, but need only be designed once:
<hl n="3a" id="CH.3a">
  <syll id="ch3a.1">weorc</syll> <syll id="ch3a.2">wul</syll><syll id="ch3a.3">dor</syll><syll id="ch3a.4">fae</syll><syll id="ch3a.5">der</syll>
</hl>
....
<linkGrp type="metrical prosody" domains="PS AT AP AG T1" targFunc="?">
  <!--...-->
  <link id="L1" targets="ch3a.1 S2LP A1 APW AGW"/>
  <link id="L2" targets="ch3a.2 S2LP A1 APW AGW"/>
  ...
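How such links might be resolved can be sketched in a few lines. The assumptions here are significant: the first item in targets is taken to be the syllable and the remaining items are treated as opaque feature identifiers, since the feature libraries other than PS are not shown in the slides. The `<doc>` wrapper and function name are invented.

```python
import xml.etree.ElementTree as ET

SAMPLE = """<doc>
  <hl n="3a" id="CH.3a"><syll id="ch3a.1">weorc</syll> <syll id="ch3a.2">wul</syll></hl>
  <linkGrp type="metrical prosody">
    <link id="L1" targets="ch3a.1 S2LP A1 APW AGW"/>
    <link id="L2" targets="ch3a.2 S2LP A1 APW AGW"/>
  </linkGrp>
</doc>"""


def features_by_syllable(root):
    """Return {syllable id: (syllable text, [feature ids])}."""
    texts = {syll.get("id"): syll.text for syll in root.iter("syll")}
    result = {}
    for link in root.iter("link"):
        # Assumption: the first target is the syllable, the rest are features.
        syllable_id, *features = link.get("targets").split()
        result[syllable_id] = (texts[syllable_id], features)
    return result


features = features_by_syllable(ET.fromstring(SAMPLE))
```

The text stays free of metrical attributes, and all analysis lives in the links.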
23
Stylistic features: the kenning
• Main element:
<kenning>
Using the <kenning> element without further markup is the simplest way to mark up kenningar in a text
Examples:
<kenning>swanrād</kenning>
<kenning>beadolēoma</kenning>
24
Stylistic features: the kenning
Sub-elements
<bw> base word
To single out the base word in a kenning
<det> determinant
To single out the determinant
<refer> referent
Explicit markup of the object or person the kenning refers to
25
Stylistic features: the kenning
Attributes
type specifies the type of kenning
level specifies the level, i.e. whether the kenning is hosted by or hosting another kenning, and its position in the hierarchy
class specifies a general semantic class which the kenning belongs to
func specifies the stylistic function of the kenning
26
Stylistic features: the kenning
Examples:
<kenning>
  <det>beado</det><bw>lēoma</bw>
  <refer>sweord</refer>
</kenning>

<kenning level="1">
  <det>
    <kenning level="2">
      <det>heofon</det><bw>engla</bw>
    </kenning>
  </det>
  <bw>cyning</bw>
</kenning>
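Nested kennings encoded this way can be listed level by level. The following is a minimal sketch; the helper names `text_of` and `describe` are invented for the example.

```python
import xml.etree.ElementTree as ET

SAMPLE = """<kenning level="1">
  <det>
    <kenning level="2">
      <det>heofon</det><bw>engla</bw>
    </kenning>
  </det>
  <bw>cyning</bw>
</kenning>"""


def text_of(elem):
    """Concatenate all text inside elem, ignoring layout whitespace."""
    return "".join(part.strip() for part in elem.itertext())


def describe(kenning):
    """Summarize one <kenning>: its level, determinant and base word."""
    return {
        "level": kenning.get("level"),
        "det": text_of(kenning.find("det")),
        "bw": text_of(kenning.find("bw")),
    }


root = ET.fromstring(SAMPLE)
# iter() visits the outer kenning first, then the one nested inside it.
kennings = [describe(k) for k in root.iter("kenning")]
```

The determinant of the level 1 kenning is itself the whole level 2 kenning, which is exactly what the level attribute is meant to record.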
27
A Work in Progress...
• Coming soon on the Digital Medievalist site:
http://www.digitalmedievalist.org/
– Collaborative edition on the wiki
– Metrical-markup list for discussion ([email protected])
• Feel free to ask and/or suggest!