Interlinear gloss

In linguistics and pedagogy, an interlinear gloss is a gloss (series of brief explanations, such as definitions or pronunciations) placed between lines, such as between a line of original text and its translation into another language. When glossed, each line of the original text acquires one or more corresponding lines of transcription known as an interlinear text or interlinear glossed text (IGT) – an interlinear for short. Such glosses help the reader follow the relationship between the source text and its translation, and the structure of the original language. In its simplest form, an interlinear gloss is a literal, word-for-word translation of the source text.

History

Interlinear glosses have been used for a variety of purposes over a long period of time. One common use has been to annotate bilingual textbooks for language education. This sort of interlinearization helps make the meaning of a source text explicit without attempting to formally model the structural characteristics of the source language.

Such annotations have occasionally been expressed not through interlinear layout, but through enumeration of words in the object language and the metalanguage. One such example is Wilhelm von Humboldt's annotation of Classical Nahuatl:

    1    2      3       4     5    6     7        8    9
    ni-  c-     chihui  -lia  in   no-   piltzin  ce   calli
    1    3      2       4     5    6     7        8    9
    ich  mache  es      für   der  mein  Sohn     ein  Haus

This "inline" style allows examples to be included within the flow of text, and allows the gloss words to be written in an order that approximates the syntax of the metalanguage. (In the gloss here, mache es is reordered from the corresponding source order to approximate German syntax more naturally.) Even so, this approach requires readers to "re-align" the correspondences between source and target forms.

Later 19th- and 20th-century approaches took to glossing vertically, aligning the same sort of word-by-word content so that the metalanguage terms were placed directly below the source-language terms. In this style, the example might be rendered thus (here with an English gloss):

    ni-  c-  chihui  -lia  in      no-  piltzin  ce  calli
    I    it  make    for   to-the  my   son      a   house
    "I made my son a house."

Here word ordering is determined by the syntax of the object language.

Finally, modern linguists have adopted the practice of using abbreviated grammatical category labels. A 2008 publication which repeats this example labels it as follows:

    ni-c-chihui-lia             in   no-piltzin     ce   calli
    1SG.SUBJ-3SG.OBJ-mach-APPL  DET  1SG.POSS-Sohn  ein  Haus

This approach is denser and also requires effort to read, but it relies less on the grammatical structure of the metalanguage to express the semantics of the glossed forms.

In computing, special text markers are provided in the Specials Unicode block to indicate the start and end of interlinear glosses.
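The Unicode markers mentioned above are the three interlinear annotation characters U+FFF9, U+FFFA and U+FFFB. The following is a minimal Python sketch of how they might delimit a glossed word in a string; the helper names `annotate` and `strip_annotations` are hypothetical and chosen only for illustration.

```python
import unicodedata

# Interlinear annotation characters from the Unicode "Specials" block.
ANCHOR = "\uFFF9"      # INTERLINEAR ANNOTATION ANCHOR: starts the annotated text
SEPARATOR = "\uFFFA"   # INTERLINEAR ANNOTATION SEPARATOR: starts the annotation
TERMINATOR = "\uFFFB"  # INTERLINEAR ANNOTATION TERMINATOR: ends the annotation


def annotate(text: str, gloss: str) -> str:
    """Wrap `text` and its `gloss` in the interlinear annotation markers."""
    return f"{ANCHOR}{text}{SEPARATOR}{gloss}{TERMINATOR}"


def strip_annotations(s: str) -> str:
    """Drop the glosses and markers, keeping only the annotated base text.

    Simplifying assumption: annotations are not nested.
    """
    out, in_gloss = [], False
    for ch in s:
        if ch == SEPARATOR:
            in_gloss = True
        elif ch == TERMINATOR:
            in_gloss = False
        elif ch != ANCHOR and not in_gloss:
            out.append(ch)
    return "".join(out)


marked = "ce " + annotate("calli", "house")
print([unicodedata.name(c) for c in (ANCHOR, SEPARATOR, TERMINATOR)])
print(strip_annotations(marked))  # ce calli
```

These characters are intended mainly for internal use by software that processes annotated text, so most documents express glosses through layout or markup instead.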

Structure

Though there is no formal specification for the IGT format, the Leipzig Glossing Rules are a set of guidelines that aim to standardize the format as much as possible. An interlinear text for linguistics will commonly consist of several lines, ordered from top to bottom so that the source text (in one or more transcriptions) comes first, followed by a gloss and finally a free translation. As an example, the following Taiwanese Minnan clause has been transcribed with five lines of text:

    (1) goá   iáu-boē    koat-tēng    tang-sî    boeh   tńg-khì
    (2) goa1  iau1-boe3  koat2-teng3  tang7-si5  boeh2  tng1-khi3.
    (3) goa2  iau2-boe7  koat4-teng7  tang1-si5  boeh4  tng2-khi3.
    (4) I     not-yet    decide       when       want   return.
    (5) "I have not yet decided when I shall return."

Word-by-word alignment. According to the Leipzig Glossing Rules, it is standard to left-align the words in the object language with the corresponding words in the metalanguage; this alignment can be seen between lines (1-3) and line (4).

Morpheme-by-morpheme correspondence. At the sub-word level, segmentable morphemes are separated by hyphens, both in the example and in the gloss. There should be the same number of hyphens in the example and in the gloss, as shown in the following example:

    Gila  abur-u-n      ferma  hamišaluǧ  güǧüna  amuqʼ-da-č
    now   they-OBL-GEN  farm   forever    behind  stay-FUT-NEG
    'Now their farm will not stay behind forever.'

Grammatical category labels. In amuqʼ-da-č, the stem (amuqʼ) is translated into the corresponding English lexeme (stay), while (da) and (č) are inflectional affixes representing future tense and negation. These affixes are glossed as FUT and NEG; a list of standard abbreviations for grammatical categories that are widely used in linguistics can be found in the Leipzig Glossing Rules.

One-to-many correspondences. When a single object-language element corresponds to several metalanguage elements, they are separated by periods. For example:

    çık-mak
    come.out-INF
    'to come out'

Non-overt elements. If the morpheme-by-morpheme gloss (middle line) contains an element that does not correspond to an overt element in the example, a standard strategy is to include an overt "ø" in the object-language text, separated by a hyphen as an overt element would be:

    puer-ø
    boy-NOM
    'boy'

Reduplication. Reduplication is treated similarly to affixation, but with a tilde (instead of the standard hyphen) connecting the copied element to the stem:

    bi~bili
    IPFV~buy
    'is buying'
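The alignment and hyphen-count conventions above lend themselves to a mechanical check. The following Python sketch is one possible way to do it, not part of the Leipzig Glossing Rules themselves; the function names `check_alignment` and `render` are hypothetical. It assumes whitespace-separated words and counts only hyphens, ignoring other boundary markers.

```python
def check_alignment(source: str, gloss: str) -> list[str]:
    """Return a list of problems; an empty list means the two lines align."""
    src_words, gl_words = source.split(), gloss.split()
    if len(src_words) != len(gl_words):
        return [f"word count differs: {len(src_words)} vs {len(gl_words)}"]
    problems = []
    for s, g in zip(src_words, gl_words):
        # Each word and its gloss should contain the same number of hyphens.
        if s.count("-") != g.count("-"):
            problems.append(f"hyphen count differs: {s!r} vs {g!r}")
    return problems


def render(source: str, gloss: str) -> str:
    """Left-align each source word over its gloss, padding with spaces.

    Widths are measured in code points, so combining characters or wide
    glyphs may still look misaligned in some fonts.
    """
    pairs = list(zip(source.split(), gloss.split()))
    widths = [max(len(s), len(g)) for s, g in pairs]
    top = "  ".join(s.ljust(w) for (s, _), w in zip(pairs, widths))
    bottom = "  ".join(g.ljust(w) for (_, g), w in zip(pairs, widths))
    return top.rstrip() + "\n" + bottom.rstrip()


src = "Gila abur-u-n ferma hamišaluǧ güǧüna amuqʼ-da-č"
gl = "now they-OBL-GEN farm forever behind stay-FUT-NEG"
print(check_alignment(src, gl))  # []
print(render(src, gl))
```

A gloss such as boy-NOM set against a bare puer would be reported as a hyphen mismatch, which is exactly the situation the overt "ø" convention is meant to avoid.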

Punctuation

In interlinear morphological glosses, various forms of punctuation separate the glosses. Typically, the words are aligned with their glosses; within words, a hyphen is used when a boundary is marked in both the text and its gloss, and a period when a boundary appears in only one. That is, there should be the same number of words separated by spaces in the text and its gloss, as well as the same number of hyphenated morphemes within a word and its gloss. This is the basic system, and it can be applied universally. For example:

    oda-dan    hız-lı      çık-tı-m
    room-ABL   speed-COM   go.out-PFV-1sg
    room-from  speed-with  go_out-perfective-I
    'I left the room quickly.'

An underscore may be used instead of a period, as in go_out-PFV, when a single word in the source language happens to correspond to a phrase in the glossing language, though a period would still be used for other situations, such as Greek oikíais house.FEM.PL.DAT 'to the houses'.

However, sometimes finer distinctions may be made. For example, clitics may be separated with a double hyphen (or, for ease of typing, an equals sign) rather than a hyphen. A French example:

    je⹀te⹀aime
    I⹀you⹀love
    'I love you.'

Affixes which cause discontinuity (infixes, circumfixes, transfixes, etc.) may be set off by angle brackets, and reduplication with tildes, rather than with hyphens:

    sulat           write
    su~sulat        contemplative mood~write
    s⟨um⟩ulat       ⟨agent trigger.past⟩write
    s⟨um⟩u~sulat    ⟨agent trigger⟩contemplative~write

(See affix for other examples.)

Morphemes which cannot be easily separated out, such as umlaut, may be marked with a backslash rather than a period:

    unser-n     Väter-n
    our-DAT.PL  father\PL-DAT.PL
    'to our fathers' (the singular of Väter 'fathers' is Vater)

A few other conventions which are sometimes seen are illustrated in the Leipzig Glossing Rules.
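The boundary markers described in this section can be used to segment a glossed word programmatically. The sketch below is an illustrative assumption rather than a standard tool: the names `segment` and `same_boundaries` are hypothetical, and only the hyphen, equals sign, double hyphen and tilde are treated as morpheme boundaries. Periods, underscores, backslashes and angle brackets are left inside the morpheme they belong to.

```python
import re

# Morpheme boundary markers: hyphen, "=" or "⹀" for clitics, "~" for reduplication.
BOUNDARY = re.compile(r"([-=~⹀])")


def segment(word: str) -> list[tuple[str, str]]:
    """Split a word into (preceding_boundary, morpheme) pairs.

    The first morpheme has an empty boundary string.
    """
    parts = BOUNDARY.split(word)  # morphemes at even indexes, markers at odd
    pairs = [("", parts[0])]
    for i in range(1, len(parts), 2):
        pairs.append((parts[i], parts[i + 1]))
    return pairs


def same_boundaries(source: str, gloss: str) -> bool:
    """True if a word and its gloss use the same sequence of boundary markers."""
    return [b for b, _ in segment(source)] == [b for b, _ in segment(gloss)]


print(segment("çık-tı-m"))   # [('', 'çık'), ('-', 'tı'), ('-', 'm')]
print(same_boundaries("çık-tı-m", "go.out-PFV-1sg"))  # True
print(same_boundaries("su~sulat", "IPFV-write"))      # False
```

Comparing the marker sequences, rather than only counting hyphens, also catches a gloss that uses a hyphen where the text marks reduplication or a clitic boundary.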

Interlinear gloss resources

Efforts have been undertaken to digitize IGT for hundreds of the world's languages.

Online Database of Interlinear Text

The Online Database of Interlinear Text (ODIN) is a database of over 200,000 instances of interlinear glosses for more than 1,500 languages, extracted from scholarly linguistic research. The database was constructed in two phases: automatic construction followed by manual correction. The automatic construction phase was completed in three steps, the second of which used a sequence-labelling method to detect interlinear gloss instances. In the manual correction phase, the database creators corrected the boundaries of the instances discovered in that second step. The creators then verified the language names and language codes in a second and third pass over the data, respectively.

Automatic processing of interlinear gloss instances

Natural Language Processing models leveraging interlinear gloss resources, such as the Online Database of Interlinear Text, have been developed.

Automatic glossing

Natural Language Processing systems have been developed to automatically produce interlinear glosses. Consider the following example:

    mi-s     ħumukuli  elu-ab-ok'ek'-asi          anu
    you-GEN  camel     we.OBL-ERG.1.PL-steal-PRT  be.NEG
    'We didn't steal your camel.'

Given the morpheme-segmented line (first line above) and the free translation line (third line above), the task is to produce the middle glossed line, comprising stem translations (e.g., mi:you) and the grammatical category labels corresponding to affixes (e.g., a:ERG.1.PL). Sequence prediction models from Natural Language Processing have been used to perform this task. Two factors contribute to the difficulty of this task.

For some constructed languages, such as Ithkuil and Lojban, automated tools exist that should (in theory) always produce accurate glosses, because these languages are regular and logical by design. Here are examples of glosses of Ithkuil and Lojban respectively:

    A'zvaţcaxüẓpöňḑeššaščëirktöňçogjahnói nnţ
    S1-“dog”-‘what is inferred to be X’₁-‘huge’₁-‘as a planned result of human action’₁-‘some or other’₁-DDF-‘as powder or dust’₁-‘eaten as afternoon snack’₁-‘trustworthiness of source unknown, and info not verifiable’₁-‘conjecture/theory/hypothesis that is testable/verifiable’₁-COU-POT
    "It can only mean one thing..." There's only one explanation; can't prove this and my mental state is somewhat foggy, but it would definitely have been an ill formed fusion of that pair of different man-made huge creatures that seem to be dogs in the form of dust served as an afternoon snack way over there by you. Oh and don't quote me on that.

    mi lumci le creka le grasu le rirxe
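To make the glossing task concrete, the following Python sketch implements a deliberately naive baseline: it memorizes the most frequent gloss for each morpheme seen in training data and looks it up at prediction time. This is not one of the sequence prediction models mentioned above, it ignores the free translation line, and the function names `train` and `gloss_line` are hypothetical.

```python
from collections import Counter, defaultdict


def train(pairs):
    """Count how often each source morpheme is paired with each gloss label.

    `pairs` is an iterable of (segmented_source_line, gloss_line) strings
    that are assumed to be aligned word by word and hyphen by hyphen.
    """
    counts = defaultdict(Counter)
    for source, gloss in pairs:
        for s_word, g_word in zip(source.split(), gloss.split()):
            for s, g in zip(s_word.split("-"), g_word.split("-")):
                counts[s][g] += 1
    return counts


def gloss_line(counts, source, unknown="?"):
    """Gloss a segmented line with each morpheme's most frequent label."""
    words = []
    for word in source.split():
        labels = [
            counts[m].most_common(1)[0][0] if m in counts else unknown
            for m in word.split("-")
        ]
        words.append("-".join(labels))
    return " ".join(words)


# Training data: the single example from the text above.
data = [("mi-s ħumukuli elu-ab-ok'ek'-asi anu",
         "you-GEN camel we.OBL-ERG.1.PL-steal-PRT be.NEG")]
model = train(data)
print(gloss_line(model, "mi-s ħumukuli"))  # you-GEN camel
```

A lookup like this cannot gloss unseen morphemes or choose between competing glosses in context, which is part of why learned sequence models are used instead.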

Automatic discovery of morphological structure from glosses

Researchers have used interlinear glosses to obtain the morphological paradigms of the object language (i.e., the language being glossed). To automatically create morphological paradigms from interlinear glosses, researchers have created a table for every stem in the gloss, with a (possibly empty) slot for every grammatical category (e.g., ERG) in the gloss. For instance, consider the glossed sentence below:

    Vecher-om    ya        pobeja-la           v   magazin
    evening-INS  1.SG.NOM  run-PFV.PST.SG.FEM  in  store.ACC
    'In the evening I ran to the store.'

There would be a paradigm for the stem pobeja with slots for PFV.PST.SG.FEM and PFV.PST.SG.MASC. The slot for PFV.PST.SG.FEM would be filled (since it was observed in the interlinear gloss data), but the slot for PFV.PST.SG.MASC would be empty (assuming that no other interlinear gloss instance contains pobeja inflected for the PFV.PST.SG.MASC grammatical category). A statistical machine learning model for morphological inflection can be used to fill in the missing entries.
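The paradigm table described above can be sketched as a nested dictionary keyed by stem and grammatical category. The following Python example is a minimal illustration under simplifying assumptions, not the researchers' actual method: the name `build_paradigms` is hypothetical, each inflected word is assumed to have the shape stem-suffix, and the set of category slots is supplied by the caller.

```python
from collections import defaultdict


def build_paradigms(examples, slots):
    """Map each stem to {category slot: observed surface form or None}.

    Assumption: a word is `stem-suffix`, glossed `lexeme-CATEGORY`, and only
    words whose category is listed in `slots` enter a paradigm.
    """
    paradigms = defaultdict(lambda: dict.fromkeys(slots))
    for source, gloss in examples:
        for s_word, g_word in zip(source.split(), gloss.split()):
            if "-" not in s_word or "-" not in g_word:
                continue  # uninflected or unsegmented word
            stem = s_word.split("-", 1)[0]
            category = g_word.split("-", 1)[1]
            if category in slots:
                paradigms[stem][category] = s_word.replace("-", "")
    return dict(paradigms)


examples = [("Vecher-om ya pobeja-la v magazin",
             "evening-INS 1.SG.NOM run-PFV.PST.SG.FEM in store.ACC")]
slots = ["PFV.PST.SG.FEM", "PFV.PST.SG.MASC"]
print(build_paradigms(examples, slots))
# {'pobeja': {'PFV.PST.SG.FEM': 'pobejala', 'PFV.PST.SG.MASC': None}}
```

The `None` entries are the empty slots that an inflection model would then be asked to fill.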

This article is derived from Wikipedia and licensed under CC BY-SA 4.0.
