Horizontal, vertical and other formats (2024)




We must, for practical as well as definitional reasons, restrict our attention to corpora considered as collections of texts or textual samples of language. Texts are linear; syntactic structures, on the other hand, are often represented in two-dimensional terms, especially as tree structures, or (in greater detail) as tree structures, the nodes of which are sets of attributes and values. As far as syntactic annotation is concerned, we are interested only in how these two- or multi-dimensioned structures are represented in relation to the linearity of texts.

Two linear formats are in common use for storing, inputting and outputting text data: horizontal and vertical. It is possible to represent a syntactically annotated text in either of these formats, without changing the nature of the annotation. The conversion of a horizontal to a vertical format or vice versa is a relatively trivial operation if undertaken automatically. However, from the user's point of view, the difference between the two formats is certainly not trivial, as it may make the difference between an intelligible and an unintelligible presentation. We will use examples from some corpora to illustrate this.

The first example is from the Associated Press Corpus with Lancaster skeleton parsing annotation. The sentence in (2) can be represented in a horizontal format, as in table 1.

(2) The door, which was equipped with neither bell nor knocker, was blistered and distained.

[N The_AT door_NN1 ,_, [Fr [N which_DDQ N] [V was_VBDZ
equipped_VVN [P with_IW [N neither_LE [ bell_NN1 nor_CC
knocker_NN1 ] N] P] V] Fr] N] ,_,
[V was_VBDZ [ blistered_VVN and_CC distained_VVN ] V] ._.
Table 1: Horizontal format

The labelled bracketed analysis can be represented in a vertical format, as in table 2. The words of the original sentence are in the first column, the part-of-speech tags in the second, and the brackets and labels constituting the syntactic annotation appear in the third column.

The        AT    [N
door       NN1
,          ,
which      DDQ   [Fr[N]
was        VBDZ  [V
equipped   VVN
with       IW    [P
neither    LE    [N
bell       NN1   [
nor        CC
knocker    NN1   ]N]P]V]Fr]N]
,          ,
was        VBDZ  [V
blistered  VVN
and        CC
distained  VVN   V]
.          .
Table 2: Vertical format
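
The conversion from the horizontal to the vertical format can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the procedure used for any of the corpora discussed here: the function name horizontal_to_vertical is hypothetical, and the code assumes that every bracket and every word_TAG pair in the horizontal text is separated by white space. Where one word both opens and closes constituents, the sketch puts a space between the opening and the closing brackets.

```python
# Sentence (2) in horizontal format, with every token separated by white space
# (an assumption of this sketch).
HORIZONTAL = (
    "[N The_AT door_NN1 ,_, [Fr [N which_DDQ N] [V was_VBDZ "
    "equipped_VVN [P with_IW [N neither_LE [ bell_NN1 nor_CC "
    "knocker_NN1 ] N] P] V] Fr] N] ,_, "
    "[V was_VBDZ [ blistered_VVN and_CC distained_VVN ] V] ._."
)


def horizontal_to_vertical(text):
    """Return one (word, tag, brackets) row per word of a labelled bracketing."""
    rows = []      # each entry: [word, tag, opening brackets, closing brackets]
    pending = ""   # opening brackets read since the previous word
    for token in text.split():
        if token.startswith("["):
            pending += token                 # attaches to the next word
        elif token.endswith("]"):
            if not rows:
                raise ValueError("closing bracket before any word")
            rows[-1][3] += token             # attaches to the previous word
        else:
            word, _, tag = token.rpartition("_")
            rows.append([word, tag, pending, ""])
            pending = ""
    # Join opening and closing brackets, separated by a space if both occur.
    return [(w, t, " ".join(p for p in (o, c) if p)) for w, t, o, c in rows]


rows = horizontal_to_vertical(HORIZONTAL)
for word, tag, brackets in rows:
    print(f"{word:<11}{tag:<6}{brackets}")
```

Going the other way is equally mechanical: the opening brackets of each row are written before the word_TAG pair and the closing brackets after it.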

Table 3 is an example in horizontal format from the IBM Paris Treebank (Langé 1994).

[N Ce_DDEMMS guide_NCOMS N] [V [P leur_PPCA6MP P] permet_VINIP3
[P de_PREPD [Vi se_PPRE6MP familiariser_VPRN [P avec_PREP
[N les_DARDFP opérations_NCOFP [P de_PREPD [N réseau_NCOMS
[A local_AJQMS A]N]P] [A effectuées_VTRPSFP [P par_PREP
[N les_DARDMP utilisateurs_NCOMP N]P]A]N]P]Vi]P]V] ._.
Table 3: Horizontal format: IBM Paris Treebank
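
Because the horizontal format is a linearisation of a tree, the tree can be recovered from it with a simple stack. The following Python sketch illustrates this on the sentence of table 3; it is an assumption-laden illustration rather than software associated with the IBM Paris Treebank. The names TOKEN, parse_bracketing and leaves are hypothetical, and the sketch assumes that an opening bracket is separated by white space from the word that follows it, while closing brackets may be run together as in A]N]P].

```python
import re

# An opening bracket with optional label, a closing bracket with optional
# label, or a word_TAG pair.
TOKEN = re.compile(r"\[[A-Za-z]*|[A-Za-z]*\]|[^\s\[\]]+")

SENTENCE = (
    "[N Ce_DDEMMS guide_NCOMS N] [V [P leur_PPCA6MP P] permet_VINIP3 "
    "[P de_PREPD [Vi se_PPRE6MP familiariser_VPRN [P avec_PREP "
    "[N les_DARDFP opérations_NCOFP [P de_PREPD [N réseau_NCOMS "
    "[A local_AJQMS A]N]P] [A effectuées_VTRPSFP [P par_PREP "
    "[N les_DARDMP utilisateurs_NCOMP N]P]A]N]P]Vi]P]V] ._."
)


def parse_bracketing(text):
    """Build a nested (label, children) tree; leaves are (word, tag) pairs."""
    root = ("ROOT", [])
    stack = [root]                      # constituents that are still open
    for token in TOKEN.findall(text):
        if token.startswith("["):
            node = (token[1:], [])
            stack[-1][1].append(node)
            stack.append(node)
        elif token.endswith("]"):
            label = token[:-1]
            if len(stack) == 1:
                raise ValueError(f"unmatched closing bracket {token!r}")
            if label != stack[-1][0]:
                raise ValueError(f"{token!r} closes [{stack[-1][0]}")
            stack.pop()
        else:
            word, _, tag = token.rpartition("_")
            stack[-1][1].append((word, tag))
    if len(stack) != 1:
        raise ValueError(f"unclosed constituent [{stack[-1][0]}")
    return root


def leaves(node):
    """Return the words under a node, in text order."""
    label, children = node
    if isinstance(children, str):       # a (word, tag) leaf
        return [label]
    return [word for child in children for word in leaves(child)]


tree = parse_bracketing(SENTENCE)
print(" ".join(leaves(tree)))
```

The check that each closing label matches the most recently opened constituent also serves as a simple well-formedness test for hand-edited bracketings.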

The horizontal format is more compact, and is easier to read so long as the amount of syntactic information interspersed with the words is not too dense. The vertical format is more convenient and more readable if there is too much syntactic information to be conveniently shown in the horizontal format. Moreover, the vertical format lends itself to a number of parallel fields of information, so that (for example) the actual orthographic text (as a sequence of word forms and punctuation marks) can be separated out from the sequence of morphosyntactic tags, and both of these separated from the representation of a phrase structure tree. Other fields may contain corpus location references, and deep syntactic information (such as ellipsis) can be held in a field separate from the surface syntactic information. Table 4 is an example from the SUSANNE corpus (Sampson 1995), which gives an impression of the various aligned information types that can be given. The columns (i.e. fields) contain the following information:

Field 1: text references
Field 2: part-of-speech tags
Field 3: the text words
Field 4: base forms (lemmatised forms of Field 3; e.g. said is lemmatised as `say')
Field 5: syntactic annotation (brackets and labels)

A01:0010a YB <minbrk> [Oh.Oh]
A01:0010b AT The the [O[S[Nns:s.
A01:0010c NP1s Fulton Fulton [Nns.
A01:0010d NNL1cb County county .Nns]
A01:0010e JJ Grand grand .
A01:0010f NN1c Jury jury .Nns:s]
A01:0010g VVDv said say [Vd.Vd]
A01:0010h NPD1 Friday Friday [Nns:t.Nns:t]
A01:0010i AT1 an an [Fn:o[Ns:s.
A01:0010j NN1n investigation investigation .
A01:0020a IO of of [Po.
A01:0020b NP1t Atlanta Atlanta [Ns[G[Nns.Nns]
A01:0020c GG +<apos>s - .G]
A01:0020d JJ recent recent .
A01:0020e JJ primary primary .
A01:0020f NN1n election election .Ns]Po]Ns:s]
A01:0020g VVDv produced produce [Vd.Vd]
A01:0020h YIL <ldquo> - .
A01:0020i ATn +no no [Ns:o.
A01:0020j NN1u evidence evidence .
A01:0020k YIR +<rdquo> - .
A01:0020m CST that that [Fn.
A01:0030a DDy any any [Np:s.
A01:0030b NN2 irregularities irregularity .Np:s]
A01:0030c VVDv took take [Vd.Vd]
A01:0030d NNL1c place place [Ns:o.Ns:o]Fn]Ns:o]Fn:o]S]
A01:0030e YF +. - .O]
Table 4: Vertical format: SUSANNE
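
The benefit of parallel fields is that each can be read off independently. The Python sketch below splits a few lines of table 4 into the five fields listed above. It is only an illustration: the names Token and parse_line are hypothetical, the fields are assumed to be separated by white space, and the full stop in the fifth field is assumed to mark the position of the word, so that brackets before it open constituents and brackets after it close them.

```python
from typing import NamedTuple

# The first six five-field lines of table 4.
SAMPLE = """\
A01:0010b AT The the [O[S[Nns:s.
A01:0010c NP1s Fulton Fulton [Nns.
A01:0010d NNL1cb County county .Nns]
A01:0010e JJ Grand grand .
A01:0010f NN1c Jury jury .Nns:s]
A01:0010g VVDv said say [Vd.Vd]
"""


class Token(NamedTuple):
    ref: str      # Field 1: text reference
    tag: str      # Field 2: part-of-speech tag
    word: str     # Field 3: text word
    lemma: str    # Field 4: base form
    opens: str    # Field 5, part before the full stop
    closes: str   # Field 5, part after the full stop


def parse_line(line):
    """Split one vertical-format line into its fields."""
    ref, tag, word, lemma, parse = line.split()
    opens, _, closes = parse.partition(".")
    return Token(ref, tag, word, lemma, opens, closes)


tokens = [parse_line(line) for line in SAMPLE.splitlines()]

# Each field can now be extracted on its own.
text = " ".join(t.word for t in tokens)
lemmas = [t.lemma for t in tokens]

# Number of constituents still open after the last sample line.
open_constituents = sum(t.opens.count("[") - t.closes.count("]") for t in tokens)

print(text)
print(lemmas)
print(open_constituents)
```

Keeping the word, tag and lemma in separate fields means that the orthographic text can be recovered without stripping annotation out of it, which is not the case in the horizontal format.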

The field that indicates the structure of the sentence can be made more graphically explicit by the use of indentation. The example from TOSCA in table 5 illustrates this. On the first level is Utterance, the second level NP, VP and PP, and so on. (This indented format is in fact an intermediate structure, the final output being represented as a tree on the screen.)

-:TXTU()
 UTT:S(act,indic,inter,motr,pres,unm)
  INTOP:AUX(do,indic,pres){Does}
  SU:NP()
   NPHD:PN(pers,sing){he}
  V:VP(act,do,indic,motr)
   MVB:LV(indic,infin,motr){realise}
  OD:CL(act,indic,intens,pres,unm,zsub)
   SU:NP()
    NPHD:PN(pers,sing){he}
   V:VP(act,indic,intens,pres)
    MVB:LV(indic,intens,pres){is}
   CS:AJP(prd)
    AJHD:ADJ(prd){wrong}
 PUNC:PM(qm){?}
Table 5: Indented format: TOSCA
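
An indented format of this kind can be turned back into a tree by comparing the indentation of each line with that of the lines before it. The Python sketch below does this for the object clause of table 5 ("he is wrong"). It is a hedged illustration, not part of the TOSCA software: the names NODE, parse_indented and words are hypothetical, each line is assumed to have the shape function:category(features){word} with the word present only on terminal nodes, and the indentation in the sample (one space per level) is an assumption about the layout.

```python
import re

# function:category(features) with an optional {word} on terminal nodes.
NODE = re.compile(
    r"(?P<function>[^:]*):(?P<category>\w+)"
    r"\((?P<features>[^)]*)\)(?:\{(?P<word>[^}]*)\})?"
)

# The object clause of table 5; one space of indentation per level is assumed.
SAMPLE = "\n".join([
    "OD:CL(act,indic,intens,pres,unm,zsub)",
    " SU:NP()",
    "  NPHD:PN(pers,sing){he}",
    " V:VP(act,indic,intens,pres)",
    "  MVB:LV(indic,intens,pres){is}",
    " CS:AJP(prd)",
    "  AJHD:ADJ(prd){wrong}",
])


def parse_indented(text):
    """Return the list of top-level nodes described by an indented listing."""
    roots = []
    stack = []                          # (depth, node) for each open ancestor
    for line in text.splitlines():
        if not line.strip():
            continue
        body = line.lstrip(" ")
        depth = len(line) - len(body)   # number of leading spaces
        match = NODE.fullmatch(body.rstrip())
        if match is None:
            raise ValueError(f"unrecognised line: {line!r}")
        node = {
            "function": match["function"],
            "category": match["category"],
            "features": [f for f in match["features"].split(",") if f],
            "word": match["word"],      # None for non-terminal nodes
            "children": [],
        }
        # Close every node that is not an ancestor of the current line.
        while stack and stack[-1][0] >= depth:
            stack.pop()
        (stack[-1][1]["children"] if stack else roots).append(node)
        stack.append((depth, node))
    return roots


def words(node):
    """Return the words dominated by a node, in text order."""
    if node["word"] is not None:
        return [node["word"]]
    return [w for child in node["children"] for w in words(child)]


clause = parse_indented(SAMPLE)[0]
print(clause["function"], clause["category"], words(clause))
```

The same nested structure could then be drawn as a tree on the screen or written out again as a labelled bracketing.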

