Universal Word

Specifications

Version 2.0

 

 

UNL Center, UNDL Foundation

Edit. 19 February 2002

 

 

Introduction

 

A UW (Universal Word) represents simple or compound concepts. There are two classes of UWs:

· simple, unit concepts called "UWs" (Universal Words), and

· compound structures of binary relations grouped together and called "Compound UWs". These are indicated with Compound UW-IDs, as described below.

 

1  UWs

 

1.1  Syntax of UW

 

A UW is made up of a character string (an English-language word) followed by a list of constraints. The meaning and function of each of these parts is described in the next section, on Interpretation.

The following expressions provide a more formal statement of the syntax of UWs.

 

<UW> ::= <Head Word> [<Constraint List>]

<Head Word> ::= <character>…

<Constraint List> ::= "(" <Constraint> [ "," <Constraint> ]… ")"

<Constraint> ::= <Relation Label> { ">" | "<" } <UW> [<Constraint List>] |
                 <Relation Label> { ">" | "<" } <UW> [<Constraint List>]
                 [ { ">" | "<" } <UW> [<Constraint List>] ] …

<Relation Label> ::= "agt" | "and" | "aoj" | "obj" | "icl" | ...

<character> ::= "A" | ... | "Z" | "a" | ... | "z" | 0 | 1 | 2 | ... | 9 | "_" | " " | "#" | "!" | "$" | "%" | "=" | "^" | "~" | "|" | "@" | "+" | "-" | "<" | ">" | "?"
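The grammar above can be exercised with a small recursive-descent parser. The following is only a sketch under stated assumptions: the names `UW`, `Constraint` and `parse_uw` are illustrative and not part of the UNL system, and head words are assumed not to contain "(", ")", ",", "<" or ">", even though the character list formally permits "<" and ">".

```python
from dataclasses import dataclass, field

# Characters that end a head word or relation label in this sketch (an assumption).
STOP = set("(),<>")


@dataclass
class Constraint:
    label: str   # relation label, e.g. "icl"
    chain: list  # [(direction, UW), ...], direction is ">" or "<"


@dataclass
class UW:
    head: str
    constraints: list = field(default_factory=list)


def _word(text, pos):
    """Read characters up to the next structural character."""
    start = pos
    while pos < len(text) and text[pos] not in STOP:
        pos += 1
    word = text[start:pos].strip()
    if not word:
        raise ValueError(f"expected a word at position {start}")
    return word, pos


def _constraint(text, pos):
    """<Relation Label> followed by one or more { ">" | "<" } <UW> steps."""
    label, pos = _word(text, pos)
    chain = []
    while pos < len(text) and text[pos] in ("<", ">"):
        direction = text[pos]
        target, pos = _uw(text, pos + 1)
        chain.append((direction, target))
    if not chain:
        raise ValueError(f"expected '>' or '<' after {label!r}")
    return Constraint(label, chain), pos


def _uw(text, pos):
    """<Head Word> followed by an optional parenthesized Constraint List."""
    head, pos = _word(text, pos)
    constraints = []
    if pos < len(text) and text[pos] == "(":
        pos += 1
        while True:
            constraint, pos = _constraint(text, pos)
            constraints.append(constraint)
            if pos < len(text) and text[pos] == ",":
                pos += 1
                continue
            break
        if pos >= len(text) or text[pos] != ")":
            raise ValueError(f"expected ')' at position {pos}")
        pos += 1
    return UW(head, constraints), pos


def parse_uw(text):
    """Parse a complete UW string and reject trailing input."""
    uw, pos = _uw(text, 0)
    if pos != len(text):
        raise ValueError(f"unexpected {text[pos]!r} at position {pos}")
    return uw
```

For instance, `parse_uw("drink(icl>do,obj>liquid)")` yields a `UW` whose head is `drink` and whose constraints carry the labels `icl` and `obj`; an unbalanced string raises `ValueError`.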

 

 

1.2  Interpretation

 

Head Word

 

The Head Word is an English word, compound word, phrase or sentence that is interpreted as a label for a set of concepts: the set of all the concepts that the expression may correspond to in English. A Basic UW (with no restrictions or Constraint List) denotes this set. Each Restricted UW denotes a subset of this set that is defined by its Constraint List. Extra UWs denote new sets of concepts that do not have English-language labels.

Thus, the Head Word serves to organize concepts and make it easier to remember which is which.

 

Constraints or Restrictions

 

The Constraint List restricts the interpretation of a UW to a subset or to a specific concept included within the Basic UW, thus the term "Restricted UWs".

The Basic UW "drink", without a Constraint List, includes the concepts of "putting liquids in the mouth", "liquids that are put in the mouth", "liquids with alcohol", "absorb" and others.

The Restricted UW "drink(icl>do,obj>liquid)" denotes the subset of these concepts that includes "putting liquids in the mouth", which in turn corresponds to verbs such as "drink", "gulp", "chug" and "slurp" in English.

The restrictions of a Restricted UW, listed in its Constraint List, are Constraints. The Constraints that use the Relation Labels defined above can be seen as an abbreviated notation for full binary relations: drink(icl>do,obj>liquid) is the same as obj(drink(icl>do),liquid), which means something like "cases of drinking where the 'obj' is a liquid".

Every constraint in the Constraint List should use one of the Relation Labels listed in Appendix 2, and the constraints should be sorted in alphabetical order.

The relation label "icl" can be omitted when it is repeated to restrict the upper concept. For instance, a UW like "xxx(icl>change(icl>occur))" can be simply defined as "xxx(icl>change>occur)".
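The two notational rules above (alphabetical ordering of constraints and the abbreviated "icl" chain) can be sketched as small string helpers. This is an illustration only: the function names are hypothetical, sorting by the full constraint text is an assumption about what "alphabetical order" means, and `expand_icl_chain` handles only a plain chain without nested Constraint Lists.

```python
def split_top_level(body):
    """Split a Constraint List body on commas that are not inside parentheses."""
    parts, depth, start = [], 0, 0
    for i, ch in enumerate(body):
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
        elif ch == "," and depth == 0:
            parts.append(body[start:i])
            start = i + 1
    parts.append(body[start:])
    return parts


def sort_constraints(uw):
    """Return the UW with its top-level constraints in alphabetical order."""
    open_at = uw.find("(")
    if open_at == -1 or not uw.endswith(")"):
        return uw  # a Basic UW has no Constraint List to sort
    head, body = uw[:open_at], uw[open_at + 1:-1]
    parts = sorted(part.strip() for part in split_top_level(body))
    return f"{head}({','.join(parts)})"


def expand_icl_chain(constraint):
    """Rewrite 'icl>a>b' into the full form 'icl>a(icl>b)'."""
    label, *targets = constraint.split(">")
    if label != "icl" or len(targets) < 2:
        return constraint
    expanded = targets[-1]
    for target in reversed(targets[:-1]):
        expanded = f"{target}(icl>{expanded})"
    return f"icl>{expanded}"
```

With these helpers, `sort_constraints("drink(obj>liquid,icl>do)")` gives `drink(icl>do,obj>liquid)`, and `expand_icl_chain("icl>change>occur")` gives `icl>change(icl>occur)`.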

 

1.3  Types of UW

 

UWs, therefore, are character strings (words or expressions) that can be given specifications, attributes and Instance-IDs.  Their function in the UNL system is to represent simple concepts. The three types of UWs, in order of practical importance, are:

 

· Basic UWs, which are bare Head Words with no Constraint List, for example:

 

go

take

house

state

 

· Restricted UWs, which are Head Words with a Constraint List, for example:

 

state(icl>express)

state(icl>country)

state(icl>abstract thing)

state(icl>government)

 

· Extra UWs, which are a special type of Restricted UW, for example:

 

ikebana(icl>flower arrangement)

samba(icl>dance)

soufflé(icl>food)

 

 

Basic UWs

 

Basic UWs are character strings that correspond to an English word. A basic UW denotes all the concepts that may correspond to those in English. They are used to structure the knowledge base and as a fallback method for establishing correspondences between different language words when more specific correspondences cannot be found.

 

Restricted UWs

 

Restricted UWs are by far the most important. Each Restricted UW represents a more specific concept, or subset of concepts. The Constraint List restricts the range of the concept that a Basic UW represents.

The Basic UW "drink", with no Constraint List, includes the concepts of "putting liquids in the mouth", "liquids that are put in the mouth", "liquids with alcohol", "absorb" and others.

The Restricted UW "drink(icl>do(obj>liquid))" denotes the subset of these concepts that includes "putting liquids in the mouth", which in turn corresponds to verbs such as "drink", "gulp", "chug" and "slurp" in English.

 

Consider again the examples of Restricted UWs given above:

 

state(icl>express) is a more specific concept (arbitrarily associated with the English word "state") that denotes an action in which humans express something.

state(icl>country) is a more specific sense of "state" that denotes a nation or country.

state(icl>abstract thing) is a more specific sense of "state" that denotes a kind of condition that persons or things are in. This UW is defined as a more general concept that can be referred to when defining other synonymous UWs, such as "situation" or "condition".

state(icl>government) is a more specific sense of "state" that denotes a kind of government.

 

The information in parentheses is the Constraint List and it describes some conceptual restrictions; this is why they are called Restricted UWs. Informally, the restrictions mean "restrict your attention to this particular sense of the word".  Thus, the focus is clearly the idea and not the specific English word.

It often turns out that in a given language there is a wide variety of different words for these concepts and not, coincidentally, all the same word, as in English.

It should be noted that by organizing these senses around the English words, the task of making a new UW/Specific Language dictionary is simplified. A bilingual English/Specific Language dictionary can be used, and proceeding from there, the number of different concepts necessary for each English word can be specified.

This, of course, does not mean that English words are translated; the English dictionary is simply used as a reminder of the concepts that will be dealt with so that the work can be organized more efficiently.

 

Extra UWs

 

Extra UWs denote concepts that are not found in English and therefore have to be introduced as extra categories. Foreign-language words are used as Head Words using English (Alphabetical) characters. Consider again the examples given above:

 

ikebana(icl>flower arrangement) is "a kind of flower arrangement" for the meaning of "something you do with flowers",

samba(icl>dance) is "a kind of dance", and

soufflé(icl>food) is "a kind of food".

 

To the extent that these concepts exist for English speakers, they are expressed with foreign-language loanwords and do not always appear in English dictionaries. So they simply have to be added to be able to use these specific concepts in the UNL system. The Constraint List or restrictions give the idea of what kind of concept is associated with these Extra UWs and the constraints provide the binary relations between this concept and other, more general, concepts already present (action, dance, food, etc.).

 

2  Compound UWs

 

Compound UWs are a set of binary relations that are grouped together to express a complex concept. A sentence itself is considered as a compound UW. This makes it possible to deal with situations like:

Women who wear big hats in movie theaters should be asked to leave.

Without Compound UWs, it would be impossible to build up complex ideas like "women who wear big hats in movie theaters" and then relate them to other concepts.

 

Compound UWs denote complex concepts that are to be interpreted as unit concepts, understood as a whole so that one can talk about their parts all at the same time. Consider again the example given above.

[Women who wear big hats in movie theaters] should be asked [to leave].

The part of the sentence within square brackets is what should be asked. Only when they are grouped together and considered as a whole unit can the correct interpretation be obtained.

Just as such complex units can be related to other concepts with conceptual relations, attributes can be attached to them to express negation, speaker attitudes, etc., which are usually interpreted as modifying the main predicate within the Compound UW.

 

2.1  The way to define a Compound UW

 

A Compound UW is defined by placing a Compound UW-ID immediately after the Relation Label in all of the binary relations that are to be grouped together. Thus, in the example below, ":01" indicates all of the elements that are to be grouped together to define Compound UW number 01.

 

agt:01(wear(icl>do(obj>thing)), woman(icl>person).@pl)

obj:01(wear(icl>do(obj>thing)), hat(icl>clothes))

aoj:01(big(aoj>thing), hat(icl>clothes))

plc:01(wear(icl>do(obj>thing)), theater(icl>facilities))

mod:01(theater(icl>facilities), movie(icl>entertainment))

agt:01(leave(icl>do).@entry, woman(icl>person).@pl)

 

After this group has been defined, the Compound UW-ID ("01" in the example above) can be used wherever the Compound UW needs to be cited. The way to cite a Compound UW is explained in the next section.

A Compound UW is considered as a sentence or sub-sentence, so in the definition of a Compound UW one entry node marked by @entry is necessary.
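As a rough illustration of how such a definition could be processed, the sketch below groups binary relations by their Compound UW-ID and collects the nodes marked with @entry. The function names and the regular expression are assumptions made for this example, not part of the UNL system; each relation is assumed to be written on one line with balanced parentheses.

```python
import re
from collections import defaultdict

# <Relation Label>[":" <Compound UW-ID>] "(" <node> "," <node> ")"
RELATION = re.compile(r"(?P<label>[a-z]+)(?::(?P<scope>\d{2}))?\((?P<args>.*)\)")


def split_args(args):
    """Split the two nodes of a binary relation at the top-level comma."""
    depth = 0
    for i, ch in enumerate(args):
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
        elif ch == "," and depth == 0:
            return args[:i].strip(), args[i + 1:].strip()
    raise ValueError(f"no top-level comma in {args!r}")


def group_by_scope(lines):
    """Map each Compound UW-ID (None if absent) to its (label, node1, node2) triples."""
    groups = defaultdict(list)
    for line in lines:
        match = RELATION.fullmatch(line.strip())
        if match is None:
            raise ValueError(f"not a binary relation: {line!r}")
        left, right = split_args(match["args"])
        groups[match["scope"]].append((match["label"], left, right))
    return dict(groups)


def entry_nodes(relations):
    """Return the distinct nodes that carry the @entry attribute."""
    return {
        node
        for _, left, right in relations
        for node in (left, right)
        if ".@entry" in node
    }
```

A group that defines a Compound UW can then be checked by confirming that `entry_nodes(...)` is not empty for it.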

 

2.2  The way to cite a Compound UW

 

Once defined, a Compound UW can be cited or referred to by simply using the Compound UW-ID as a UW.  The method is to write the Compound UW-ID after a colon ":".  The reference to a Compound UW is also called a Scope-Node.  The Scope-Node has the following syntax:

 

<Scope-Node> ::= ":" <Compound UW-ID> [ <Attribute List> ]

<Compound UW-ID> ::= two digits of a number 00 – 99

<Attribute List> ::= { "." <Attribute Label> } …

<Attribute Label> ::= "@entry" | "@may" | "@past" | ...

 

To complete the example above, it could be continued with:

 

obj(ask(icl>do(obj>thing)).@should, :01)

gol(ask(icl>do(obj>thing)).@should, woman.@pl)

 

Again, ":01" is interpreted as the whole set of binary relations defined above. It means that ":01" should be understood as comprising all of these binary relations.  Compound UWs can be cited within other Compound UWs.
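The Scope-Node syntax is regular enough to be recognized with a single pattern. The sketch below is a hypothetical helper, assuming that every Attribute Label consists of "@" followed by letters, digits or underscores (the specification only lists "@entry", "@may", "@past" and so on).

```python
import re

# ":" <Compound UW-ID> followed by zero or more "." <Attribute Label>
SCOPE_NODE = re.compile(r":(?P<id>\d{2})(?P<attrs>(?:\.@[A-Za-z0-9_]+)*)")


def parse_scope_node(text):
    """Return (compound_uw_id, [attribute labels]) or None if text is not a Scope-Node."""
    match = SCOPE_NODE.fullmatch(text)
    if match is None:
        return None
    attributes = [attr for attr in match["attrs"].split(".") if attr]
    return match["id"], attributes
```

For example, `parse_scope_node(":01")` returns `("01", [])`, while a one-digit reference such as `":1"` is rejected because the Compound UW-ID must have two digits.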

 

3  How to set UWs for a word

 

The following steps are recommended when registering UWs for a word.

1)    Imagine the set of UWs that the word requires.

2)    Check whether each imagined UW is already registered in the UW dictionary (using the UW gate).

3)    If an imagined UW is not found in the UW dictionary, register it in the UW dictionary and also position the new UW in the UW system (hierarchy), using the KB gate.
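The procedure above can be pictured with in-memory stand-ins. In this sketch a Python set plays the role of the UW dictionary and a dict plays the role of the UW hierarchy; both are purely hypothetical substitutes for the UW gate and KB gate, whose actual interfaces are not described here, and the parent concept of each candidate is assumed to be supplied by the person registering it.

```python
def register_uws(candidates, uw_dictionary, hierarchy):
    """Register candidate UWs that are not yet known.

    candidates    -- iterable of (uw, parent_concept) pairs (step 1)
    uw_dictionary -- set of already registered UWs (stand-in for the UW dictionary)
    hierarchy     -- dict mapping a parent concept to its child UWs (stand-in for the KB)
    """
    newly_registered = []
    for uw, parent in candidates:
        if uw in uw_dictionary:       # step 2: already registered, nothing to do
            continue
        uw_dictionary.add(uw)         # step 3: register the new UW ...
        hierarchy.setdefault(parent, []).append(uw)  # ... and position it in the hierarchy
        newly_registered.append(uw)
    return newly_registered
```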

 

4  How to make new UWs for a word

 

Basically, a UW is made from an English word, which may be a compound word or a phrase. Such a UW represents all the meanings that the original English word covers. This kind of UW is called a "Basic UW" in the UNL System. To restrict the meanings of such a UW, restrictions are attached to it.

Restrictions of a UW are also expressed with relations and English words. UWs that appear with restrictions are called "Restricted UWs" in the UNL System. The process of making narrower UWs to express each meaning of an English word is as follows.

 


 

1.          First, decide to which category of the following four categories each concept (meaning) belongs.

-            Nominal concept

-            Verbal concept

-            Adjectival concept

-            Adverbial concept

 

a)         For a nominal concept, attach "(icl>thing)" to the Basic UW (the English word).

For example "swallow" can be divided into the following two meanings (UWs).

 

                   swallow(icl>thing)

                   swallow(icl>do)

 

b)         For a verbal concept, attach "(icl>do)" or "(icl>occur)" to the Basic UW.

For example "change" can be divided into the following three meanings (UWs), according to a) and b).

 

                   change(icl>do)

                   change(icl>occur)

                   change(icl>thing)

 

Here, "(icl>do)" expresses an event that is caused by something or someone, whereas "(icl>occur)" expresses an event that happens of its own accord.

 

c)         For an adjectival concept, attach "(aoj>thing)" or "(mod<thing)" to the Basic UW.

For example "positive" can be divided into the following two meanings (UWs).

                   positive(aoj>thing)

                   positive(mod<thing)

Here, "(aoj>thing)" expresses a predicative concept, whereas "(mod<thing)" expresses a restrictive concept.

 

d)         For an adverbial concept, attach "(icl>how)" to the Basic UW.

For example "weekly" can be divided into the following two meanings (UWs), according to c) and d).

                   weekly(icl>how)

                   weekly(mod<thing)
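The first step can be summarized as a lookup from category to category label. The sketch below is illustrative only: the table and function names are hypothetical, and the adjectival label is written "mod<thing" to match the worked examples positive(mod<thing) and weekly(mod<thing). The function returns candidate UWs; deciding which candidates correspond to real meanings of the word remains a manual step.

```python
# Category labels attached in the first step, one list per category.
CATEGORY_LABELS = {
    "nominal": ["icl>thing"],
    "verbal": ["icl>do", "icl>occur"],
    "adjectival": ["aoj>thing", "mod<thing"],
    "adverbial": ["icl>how"],
}


def first_step_uws(head_word, categories):
    """List candidate UWs for a word, given the categories its meanings belong to."""
    return [
        f"{head_word}({label})"
        for category in categories
        for label in CATEGORY_LABELS[category]
    ]
```

For "change", which has verbal and nominal meanings, `first_step_uws("change", ["verbal", "nominal"])` produces the three UWs listed under b).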

 

2.          Second, if a UW is still ambiguous after one of the above category labels has been attached, use the UW Hierarchy of the UNL KB or Case Relations (the relationships with other UWs).

 

a)         For a nominal concept, choose a subordinate category (concept) from the UW Hierarchy instead of "thing".

For example, the nominal concepts of "swallow" can be defined by attaching the following subordinate category labels:

 

                   swallow(icl>bird)        expresses the bird, as in "One swallow does not make a summer"

                   swallow(icl>action)      expresses the action of swallowing, as in "at [in] one swallow"

                   swallow(icl>quantity)    expresses the quantity, as in "take a swallow of water"

 

 

The ambiguity of the verbal concepts of "swallow" is resolved by attaching the case relations that each concept requires, since a verb often has several meanings depending on the sort of object it takes. For example, "swallow" is used with different meanings in "to swallow a glass of beer" and in "The waves swallowed (up) the boat". This kind of ambiguity can also be resolved by attaching further subordinate category labels, but it is much easier and clearer to use case relations.

 

b)         For a verbal concept, attach the possible case relations that it takes, such as "obj>thing", "obj>person", "gol>thing", etc. Such attachments of case relations are given as pairs of UNL relations and target UWs. As for the target UWs, the UW category label "thing" or further subordinate category labels in the UW Hierarchy must be used.

For example the verbal concepts of "spring" can be expressed as follows:

 

                   spring(icl>do(obj>wood))                              expresses to bend or divide something

                   spring(icl>do(obj>mine))                                expresses to blast something

                   spring(icl>do(obj>person,src>prison))        expresses to escape (from) prison

                   spring(icl>do(gol>place))                               expresses to jump up as in to spring up

                   spring(icl>do(gol>thing))                               expresses to jump on as in to spring on

                   spring(icl>occur(obj>liquid))                         expresses to gush out as in to spring out
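The pattern shared by the "spring" examples can be sketched as a small builder. The function name is hypothetical, and writing the case relations in alphabetical order of their relation labels is an assumption carried over from the ordering rule for Constraint Lists.

```python
def verbal_uw(head_word, event_type, case_relations):
    """Build a verbal UW such as spring(icl>do(obj>person,src>prison)).

    event_type     -- "do" or "occur"
    case_relations -- dict mapping a relation label to its target UW
    """
    if event_type not in ("do", "occur"):
        raise ValueError("a verbal concept is restricted by icl>do or icl>occur")
    if not case_relations:
        return f"{head_word}(icl>{event_type})"
    inner = ",".join(
        f"{label}>{target}" for label, target in sorted(case_relations.items())
    )
    return f"{head_word}(icl>{event_type}({inner}))"
```

Calling `verbal_uw("spring", "do", {"src": "prison", "obj": "person"})` reproduces the third example above.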

 

c)         For an adverbial concept, attach a subordinate category label that can be modified by the adverbial concept and that exists in the UW Hierarchy under the verbal concepts "do" or "occur", using the relation label "man", as in "man<look", "man<set", etc.

 

 

                   ahead(man<look)    expresses the direction, as in "to look ahead"

                   ahead(man<set)     expresses the meaning of earlier, as in "Set the clock ahead one hour"

                   ahead(man<be)      expresses the meaning of "be superior to", as in "We are 5 points ahead."

 

 


Appendix 1:  Syntax Definition Notation

 

Symbol      Definition

::=         to indicate the left is defined as the right

|           to indicate two disjunctive elements: "or"

[ ]         to indicate an optional element

{ }         to indicate an alternative element

…           to indicate repetition of the previous element, 0 or more times

" "         to enclose a string of literal characters

< >         to indicate a variable name

 

Appendix 2:  List of Relation Labels

 

© Copyright UNL Centre / UNDL Foundation. All rights reserved.
