
UNL 2005 Specifications

7 June 2005
Copyright © UNL Center of UNDL Foundation


Chapter 1  UNL Expression

UNL expresses information or knowledge in the form of a semantic network with hyper-nodes. Unlike natural languages, UNL expressions are unambiguous. In the UNL semantic network, nodes represent concepts and arcs represent relations between concepts. Concepts can be annotated. Such a semantic network is called a "UNL expression".

The unit of UNL expression is a UNL document. A UNL document is considered as a hyper-node composed of a semantic network among sentences or paragraphs. A paragraph and a sentence are hyper-nodes too. A hyper-node of a paragraph is composed of a semantic network among sentences or other paragraphs. A hyper-node of a sentence is composed of a semantic network among Universal Words (UWs).

The semantic network of the hyper-node of a UNL document consists of two parts: a sequence of hyper-nodes of paragraphs or sentences of the document, and a set of semantic relations among hyper-nodes of the paragraphs or sentences. A sequence of hyper-nodes can be considered as a directed (ordered) graph linked with the relation "nxt", which expresses the physical sequence of paragraphs or sentences. Likewise, the semantic network of a hyper-node of a paragraph also consists of two parts: a sequence of hyper-nodes of sentences or other paragraphs included in the paragraph, and a set of semantic relations among hyper-nodes of the paragraphs or sentences. The semantic network of a hyper-node of a sentence consists of a set of semantic relations between UWs.

Every hyper-node of a paragraph or sentence, and every UW, can be referred to from any other hyper-node of a paragraph or sentence, or from any other UW.

As a language for representing information and knowledge described in natural languages, UNL has all the components corresponding to those of a natural language. It is composed of words expressing concepts, called "Universal Words" (UWs), that are inter-linked with other UWs to form the UNL expressions of sentences. These links, called "relations", specify the role of each word in a sentence. The subjective meanings intended by the author are expressed through "attributes".

This chapter describes the form of UNL expressions and the formats of UNL documents and UNL expressions. It consists of the following three sections:

  1. Form of UNL Expression
  2. Format of UNL Document
  3. Format of UNL Expression

The meta-symbols used in the descriptive formats have the following meanings:

 " and " indicate a predefined delimiter
 < and > indicate a non-terminal symbol
 { and } indicate a range
 [ and ] indicate an omissible part
 ... indicates one or more repetitions of the preceding part
 ::= indicates the left part can be replaced by the right part
 | indicates possible choices

1.1  Form of UNL Expression

A UNL expression is a semantic network made up of a set of binary relations. Each binary relation is composed of a relation and the two UWs that hold the relation. A binary relation of UNL is expressed in the following way:

<relation>  (  <uw1>,  <uw2>  )

In <relation>, one of the relations defined in the UNL Specifications is described. In <uw1> and <uw2>, the two UWs that have the relation given by <relation> are described. A semantic network of UNL expression is a directed graph composed of binary relations with direction. The three elements of a binary relation have the following interrelationship:

<uw1> -- <relation> -> <uw2>

This binary relation is interpreted as follows:

the UW given in <uw2> plays the role indicated by the relation given in <relation> held by the UW given in <uw1>; whereas the UW given in <uw1> holds the relation given in <relation> with the UW given in <uw2>.
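
As an illustration only (not part of the specification), a binary relation can be held in a program as one labeled, directed edge. The sketch below assumes a Python representation; the class name BinaryRelation and its field names are chosen for this example and are not defined by UNL.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BinaryRelation:
    """One labeled, directed edge: <uw1> -- <relation> -> <uw2>."""

    relation: str  # relation label, e.g. "agt"
    uw1: str       # the UW that holds the relation
    uw2: str       # the UW that plays the role named by the relation

    def __str__(self) -> str:
        # Written form used in this chapter: <relation>(<uw1>, <uw2>)
        return f"{self.relation}({self.uw1}, {self.uw2})"


edge = BinaryRelation("agt", "bark(agt>dog)", "dog(icl>canine)")
print(edge)
```

A set of such edges over shared UWs gives the directed graph described above.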

Hyper-nodes are allowed in the semantic network of a UNL expression. That is, each node in a graph, <uw1> and <uw2> of a binary relation, can be a hyper-node containing a semantic network. Such a hyper-node made up of a semantic network of UNL expression is called a "scope". A scope can be connected with other UWs or scopes because a scope is considered as a UW. A binary relation in a scope is distinguished from others by assigning an ID to the <relation>.

The general format of binary relations of UNL expression allowing scopes is the following:

Table 1.1

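<relation> [ : <scope-ID> ] ( <node1> , <node2> )

Where <scope-ID> identifies the scope that the binary relation belongs to, and <node1> and <node2> are each a UW or a scope.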

1.2  Format of UNL Document

UNL expressions are described in UNL documents in the following format. A UNL document is a text file that includes the original sentences, UNL expressions, sentences in target languages, and tags of UNL document.

A UNL document is enclosed by the tags "[D:<dinf>]" and "[/D]". Within these tags, each paragraph is enclosed by a pair of tags "[P:<p_num>]" and "[/P]", and each sentence by a pair of tags "[S:<s_num>]" and "[/S]". Inside a sentence, the text of the original sentence is enclosed by "{org:<l_tag>}" and "{/org}", and its UNL expression by "{unl:<uinf>}" and "{/unl}". Sentences in target languages can also be stored in the UNL document. Each target sentence is enclosed by a pair of language tags "{<l_tag>}" and "{/<l_tag>}" and follows the UNL expression of the sentence.

Descriptive format of a UNL document is the following:

Table 1.2

<UNL Document>

::= "[D:" <dinf> "]"
{ "[P:" <paragraph number> "]"
{ "[S:" <sentence number> "]"
  <sentence>
  "[/S]"
 [ "[RS]"
  <reference description>
  "[/RS]"
  "[DS]"
  <d_structure description>
  "[/DS]"
 ]
}
...
  "[/P]"
 [ "[RS]"
  <reference description>
  "[/RS]"
  "[DS]"
  <d_structure description>
  "[/DS]"
 ]
}
...
"[/D]"

<dinf>

::= <document name> "," <author name> [ "," <document ID> "," <date> "," <email address> ]

<document name>

::= "dn=" <character string>

<author name>

::= "on=" <character string>

<document ID>

::= "did=" <character string>

<date>

::= "dt=" <character string>

<email address>

::= "mid=" <character string>

<sentence>

::= "{org:" <l-tag> [ "=" <code> ] "}"
<source sentence>
"{/org}"
"{unl" [ ":" <uinf> ] "}"
<UNL expression of a sentence>
"{/unl}"
{ "{" <l-tag> [ "=" <code> ] [ ":" <sinf> ] "}"
<target sentence>
"{/" <l-tag> "}" }
...
/* all the information necessary for a sentence */

<l-tag>

::= "ab" | "cn" | "de" | "el" | "es" | "fr" | "id" | "hd" | "it" | "jp" | "lv" | "mg" | "pg" | "ru" | "sh" | "th"  /* language codes : language tags */

<code>

::= <character code name>

<character code name>

::= <character string>

<source sentence>

::= <character string>

<target sentence>

::= <character string>

<uinf>

::= <system name> "," <post-editor name> "," <reliability> [ "," <date> "," <email address> ]

<sinf>

::= <system name> "," <post-editor name> "," <reliability> [ "," <date> "," <email address> ]

<system name>

::= "sn=" <character string>

<post-editor name>

::= "pn=" <character string>

<reliability>

::= "rel=" <a number>

<paragraph number>

::= <a number>, must be unique within a UNL document and must be given in sequence

<sentence number>

::= <a number>, must be unique within a UNL document and must be given in sequence
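
The <dinf>, <uinf> and <sinf> fields share one shape: comma-separated "key=value" items. The following is a minimal Python sketch for reading such a field. It assumes that values contain no commas, which the grammar above does not state, and the function name parse_info and all sample values are hypothetical.

```python
from typing import Dict


def parse_info(info: str) -> Dict[str, str]:
    """Split 'key=value,key=value' into a dict, keeping the field order.

    Assumption: values never contain a comma.
    """
    fields: Dict[str, str] = {}
    for part in info.split(","):
        key, _, value = part.partition("=")
        fields[key.strip()] = value.strip()
    return fields


# Hypothetical <dinf> content using the five keys defined above.
dinf = parse_info(
    "dn=Sample Document,on=Example Author,did=D001,dt=2005-06-07,mid=author@example.org"
)
print(dinf["dn"], "|", dinf["on"])
```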

The tags used in a UNL document are the following:

Table 1.3

[D:<dinf>]              indicates the beginning of a document and the necessary information about the document
[/D]                    indicates the end of a document
[P:<p_num>]             indicates the beginning of a paragraph
[/P]                    indicates the end of a paragraph
[S:<s_num>]             indicates the beginning of a sentence and the sentence number
[/S]                    indicates the end of a sentence
[RS]                    indicates the beginning of reference description
[/RS]                   indicates the end of reference description
[DS]                    indicates the beginning of document structure description
[/DS]                   indicates the end of document structure description
{org:<l_tag>=<code>}    indicates the beginning of an original/source sentence, language and character code; "=<code>" can be omitted
{/org}                  indicates the end of an original sentence
{unl:<uinf>}            indicates the beginning of the UNL expressions of a sentence and necessary information; ":<uinf>" can be omitted
{/unl}                  indicates the end of the UNL expressions of a sentence
{<l_tag>}               indicates the beginning of a target sentence of the language indicated by <l_tag>
{/<l_tag>}              indicates the end of a target sentence of the language indicated by <l_tag>
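
The tag structure above can be read with a few regular expressions. The following Python sketch pulls the sentence number, language tag, source text and UNL lines out of each [S:...] block. It is a simplified reader under stated assumptions: it ignores [RS]/[DS] blocks and target sentences, the function name extract_sentences is hypothetical, and the sample document content is made up for illustration (the tag "el" is taken from the <l-tag> list).

```python
import re

SENTENCE_RE = re.compile(r"\[S:(\d+)\](.*?)\[/S\]", re.DOTALL)
ORG_RE = re.compile(r"\{org:([^}=]+)(?:=[^}]*)?\}(.*?)\{/org\}", re.DOTALL)
UNL_RE = re.compile(r"\{unl(?::[^}]*)?\}(.*?)\{/unl\}", re.DOTALL)


def extract_sentences(document: str):
    """Yield (sentence number, language tag, source text, UNL lines)."""
    for number, body in SENTENCE_RE.findall(document):
        org = ORG_RE.search(body)
        unl = UNL_RE.search(body)
        if org is None or unl is None:
            continue  # skip blocks this simple sketch cannot read
        unl_lines = [ln.strip() for ln in unl.group(1).splitlines() if ln.strip()]
        yield int(number), org.group(1), org.group(2).strip(), unl_lines


sample = """[D:dn=Sample,on=Example Author]
[P:1]
[S:1]
{org:el}
I can hear a dog barking outside.
{/org}
{unl}
agt(hear(icl>perceive(agt>person,obj>thing)).@entry, I)
obj(hear(icl>perceive(agt>person,obj>thing)).@entry, :01)
agt:01(bark(agt>dog).@entry, dog(icl>canine))
plc:01(bark(agt>dog).@entry, outside(icl>place))
{/unl}
[/S]
[/P]
[/D]
"""

sentences = list(extract_sentences(sample))
for number, l_tag, source, unl_lines in sentences:
    print(number, l_tag, source, len(unl_lines))
```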

Descriptive format of <reference description> is as follows:

Table 1.4

<reference description>

::= { <referent node> "," <referee node> }
 ...

<referent node>

::= <uw node1> | <sentence node> | <paragraph node> 

<referee node>

::= <uw node2> | <sentence node> | <paragraph node>

<uw node1>

::= { <UW>":"<UW-ID> | ":"<Scope-ID> } [":"<sentence node>]
<sentence node> can be omitted; in this case the UW or the scope must exist in the current sentence. This description can only be used following an [S] and [/S] description.

<uw node2>

::= { <UW>":"<UW-ID> | ":"<Scope-ID> } ":"<sentence node>
<sentence node> cannot be omitted.

<sentence node>

::= ":S:"<sentence number>

<paragraph node>

::= ":P:"<paragraph number>

Descriptive format of <d_structure description> is as follows:

Table 1.5
<d_structure description>

::= { <relation> "(" <sentence node>|<paragraph node> ","  <sentence node>|<paragraph node> ")" }
 ...

For <UNL expression of a sentence>, see the next section.

1.3  Format of UNL Expression

A UNL expression of a sentence is identified with the following tags: {unl} and {/unl}.

Any component, such as a word, phrase and, of course, a sentence of a natural language can be represented with UNL expressions. A UNL expression therefore consists of a UW or a (set of) binary relation(s). In UNL documents, a UNL expression for a sentence is enclosed by the tags {unl} and {/unl} inside [S] and [/S]. If a UNL expression consists of a UW, this UW should be enclosed further by the tags [W] and [/W]. If necessary, the whole sentence can also be expressed as a scope. In this case, the Scope-ID of the scope should be enclosed by [W] and [/W].

There are two forms for expressing UNL expressions: the table form and the list form. The table form is made up of a set of binary relations, and each binary relation is expressed by connecting the two related UWs directly. The list form is divided into two parts: a list of UWs with their corresponding IDs, and a list of binary relations described by those IDs. The table form of a UNL expression is more readable than the list form, but the list form is more compact than the table form. The two forms can be converted into each other.

1.3.1  The Table Form of UNL Expression

Table 1.6

A UNL expression consists of a set of binary relations

{unl}
<binary relation>
...
{/unl}

A UNL expression consists of a UW

{unl}
[W]
<UW><attribute list>
[/W]
{/unl}

  A UNL expression consists of a scope

{unl}
[W]
":" <Scope-ID><attribute list>
[/W]
<binary relation>
...
{/unl}

Each tag and binary relation should end with a return code: "0x0a".

Syntax of Binary Relation

Descriptive format of a binary relation of the table form is the following:

Table 1.7

<binary relation>

::= <relation> [":"<Scope-ID>] "("
{{ <UW1> [":" <UW-ID1>]} | { ":" <Scope-ID1> }}[<attribute list>] ","
{{ <UW2> [":" <UW-ID2>]} | { ":" <Scope-ID2> }}[<attribute list>] ")"

or

::= <relation> [":"<Scope-ID>] "("
{{ <UW1> [":" <UW-ID1>]} | { ":" <Scope-ID1> }}[<attribute list>] ","
<referee node> ")"

or

::= <relation> [":"<Scope-ID>] "("
<referee node> ","
{{ <UW2> [":" <UW-ID2>]} | { ":" <Scope-ID2> }}[<attribute list>] ")"

<relation>

::= a relation label, defined in "Chapter 2 Relations"

<UW>

::= a UW, see "Chapter 4 Universal Words"

<attribute list>

::= { "." <attribute> } ...

<attribute>

::= an attribute, see "Chapter 3 Attributes"

<UW-ID>

::= two alphanumeric characters of '0' - '9' and 'A' - 'Z'

<Scope-ID>

::= two digits of "00" - "99".  "00" must be used for the main sentence and can be omitted.

<referee node>

see Table 1.4

Scope-ID

A UNL expression can include more than one scope. Scope-IDs are for identifying each concept specified by scopes in a UNL expression. A scope is a group of binary relations that can be referred to as a UW by indicating its Scope-ID in the format of ":<Scope-ID>". A node described in this way in the UNL expression network that refers to a scope is called a "Scope Node".

UW-ID

UW-IDs are for identifying each concept specified by UWs in a UNL expression. If a UW appears in a UNL expression more than once and means different concepts (things or events), a unique UW-ID must be given to each concept of the UWs.

The following shows an example of UNL expressions of the sentence "I can hear a dog barking outside":

{unl}
agt(hear(icl>perceive(agt>person,obj>thing)).@entry,    I)
obj(hear(icl>perceive(agt>person,obj>thing)).@entry,    :01)
agt:01(bark(agt>dog).@entry,    dog(icl>canine))
plc:01(bark(agt>dog).@entry,    outside(icl>place))
{/unl}

In the above UNL expression, "agt", "obj" and "plc" are relation labels, and "I", "bark(agt>dog)", "dog(icl>canine)", "hear(icl>perceive(agt>person,obj>thing))" and "outside(icl>place)" are UWs. "a dog barking outside" is expressed by a scope, and "01" is given as the Scope-ID to the scope. ":01", which appears in the position of a UW, is the scope node that refers to the scope. Binary relations indicated by the Scope-ID define the contents of the scope.
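
The table-form lines in the example can be split mechanically into relation label, Scope-ID and the two nodes. The Python sketch below shows one way to do this. It is not a full parser for Table 1.7: it assumes relation labels are lowercase letters, that any comma inside a UW is nested in parentheses, and that there is no <referee node>. The names Relation, split_top_level and parse_relation are hypothetical.

```python
import re
from typing import List, NamedTuple, Tuple


class Relation(NamedTuple):
    label: str
    scope_id: str  # "00" (main sentence) when the Scope-ID is omitted
    node1: str
    node2: str


# <relation> [":" <Scope-ID>] "(" ... ")"
HEAD_RE = re.compile(r"^([a-z]+)(?::(\d{2}))?\((.*)\)$")


def split_top_level(text: str) -> Tuple[str, str]:
    """Split at the first comma that is not nested inside parentheses."""
    depth = 0
    for i, ch in enumerate(text):
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
        elif ch == "," and depth == 0:
            return text[:i].strip(), text[i + 1:].strip()
    raise ValueError(f"no top-level comma in {text!r}")


def parse_relation(line: str) -> Relation:
    match = HEAD_RE.match(line.strip())
    if match is None:
        raise ValueError(f"not a binary relation: {line!r}")
    label, scope_id, inner = match.groups()
    node1, node2 = split_top_level(inner)
    return Relation(label, scope_id or "00", node1, node2)


lines: List[str] = [
    "agt(hear(icl>perceive(agt>person,obj>thing)).@entry,    I)",
    "obj(hear(icl>perceive(agt>person,obj>thing)).@entry,    :01)",
    "agt:01(bark(agt>dog).@entry,    dog(icl>canine))",
    "plc:01(bark(agt>dog).@entry,    outside(icl>place))",
]
relations = [parse_relation(line) for line in lines]
scope_01 = [r for r in relations if r.scope_id == "01"]
for r in relations:
    print(r)
```

Filtering on scope_id, as in scope_01, collects the binary relations that define the contents of scope 01.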

1.3.2  The List Form of UNL Expression

Note: reference description corresponding to <referee node> in <binary relation> as shown in table 1.7 has not yet been reflected in the following specifications.

The list form of a UNL expression consists of a set of UWs and a set of encoded binary relations (expressed by UW-IDs) of a sentence. In case a whole sentence is treated as a scope, the Scope-ID of the scope for the sentence can be included in the UW list between [W] and [/W].

Table 1.8

{unl}
[W]
{<UW> | {":"<Scope-ID>}}[<attribute list>]":"<UW-ID>
...
[/W]
[R]
<binary relation by UW-IDs>
...
[/R]
{/unl}  

The tags used above have the following meanings.

Table 1.9

[W]     indicates the beginning of UW list
[/W]    indicates the end of UW list
[R]     indicates the beginning of encoded binary relations
[/R]    indicates the end of encoded binary relations

Each tag, encoded binary relation and UW should end with a return code: "0x0a".

UW List

UWs of a UNL expression must be listed between [W] and [/W], with a unique UW-ID for each distinct concept. That is, occurrences of the same UW that express different concepts (instances) must be given different UW-IDs. A scope must be defined again in the UW list.

Syntax of an Encoded Binary Relation

Table 1.10

<binary relation by UW-IDs>

::= <UW-ID1><relation>[":"<Scope-ID>]<UW-ID2>

<UW-ID>

::= two alphanumeric characters of "0" - "9" and "A" - "Z"

<Scope-ID>

::= two digits of "00" - "99"

For instance, the following shows an example of the list form of a UNL expression of the sentence "I can hear a dog barking outside".

{unl}
[W]
I:01
hear(icl>perceive(agt>person,obj>thing)).@entry:02
dog(icl>canine):03
bark(agt>dog).@entry:04
outside(icl>place):05
:01:06
[/W]
[R]
02agt01
02obj06
04agt:0103
04plc:0105
[/R]
{/unl}  

In the above example, between [W] and [/W], the UWs 'I', 'hear(icl>perceive(agt>person,obj>thing))', 'dog(icl>canine)', 'bark(agt>dog)' and 'outside(icl>place)' and the scope node ":01" are given the UW-IDs 01 to 06 respectively.

Between [R] and [/R], binary relations are described using the UW-IDs defined in the UW list. For example, "02obj06" in the second line shows that the concept identified by UW-ID 06 is the 'obj' of the concept identified by UW-ID 02. UW-ID 06 refers to the concept of scope 01, and UW-ID 02 refers to the concept of 'hear(icl>perceive(agt>person,obj>thing))'.

Binary relations "04agt:0103" and "04plc:0105" express the UNL expression of scope 01. This is indicated by the Scope-ID "01" described following the relations 'agt' and 'plc'.
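
Since the two forms can be converted into each other, the list form can be decoded back into table-form lines. The Python sketch below shows one possible decoding, assuming that every UW-ID is exactly two characters and that each UW list entry ends with ":" followed by its UW-ID. The function name decode_list_form is hypothetical, and the first relation is written with agt to match the table-form example in 1.3.1.

```python
from typing import Dict, List


def decode_list_form(uw_lines: List[str], relation_lines: List[str]) -> List[str]:
    """Rebuild table-form binary relations from a UW list and encoded relations."""
    # Map UW-ID -> node text; the UW-ID is whatever follows the last colon.
    nodes: Dict[str, str] = {}
    for line in uw_lines:
        node, _, uw_id = line.strip().rpartition(":")
        nodes[uw_id] = node

    table: List[str] = []
    for line in relation_lines:
        code = line.strip()
        # <UW-ID1><relation>[":"<Scope-ID>]<UW-ID2>: two characters at each end.
        id1, middle, id2 = code[:2], code[2:-2], code[-2:]
        table.append(f"{middle}({nodes[id1]}, {nodes[id2]})")
    return table


uw_list = [
    "I:01",
    "hear(icl>perceive(agt>person,obj>thing)).@entry:02",
    "dog(icl>canine):03",
    "bark(agt>dog).@entry:04",
    "outside(icl>place):05",
    ":01:06",
]
encoded = ["02agt01", "02obj06", "04agt:0103", "04plc:0105"]

table_form = decode_list_form(uw_list, encoded)
for row in table_form:
    print(row)
```

The scope node ":01" is carried through unchanged as the node text for UW-ID 06, so "02obj06" decodes to a relation whose second node is ":01".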

