Meiying Zhu, Hiroshi Uchida
UNL Center, UNDL Foundation
What are Universal Words (UWs)? What are the
features and structure of a UW? How can UWs be defined and how are they related
to the UNL Knowledge Base? These are the main questions discussed in this
paper.
Universal Words (UWs) are the words of the UNL; together they constitute the UNL vocabulary. UWs are labels for concepts, the syntactic-semantic units that are combined to form UNL expressions. Every UW denotes a concept. A set of UWs, linked by relations and modified by attributes, expresses the meaning of a sentence. A UNL expression is a hyper-graph of a semantic network: the UWs are the nodes of the hyper-graph, that is, the arguments of the directed binary relations that constitute the UNL expression.
A UW is a character string made up of two parts: a headword and a constraint (list). The constraint (list) is attached to the headword only when the headword is ambiguous. The headword can be an English word, complex word or phrase (except for Extra UWs, which take a non-English word as their headword; they denote concepts not found in English and therefore have to be introduced as extra UWs). A headword should be interpreted as a label for a set of concepts: the set of all the concepts it conveys in its original language. Thus, the headword indicates an entire range of possible references. The constraint (list) delimits a concept within this range so that the UW indicates it clearly and unambiguously; it restricts the interpretation of the headword to a subset of the range or to a specific concept within it.
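To make the two-part structure concrete, the following Python sketch splits a UW string into its headword and its optional constraint list. It is an illustration only: the function name `split_uw` is hypothetical, and it assumes, consistent with the examples in this paper, that a headword never contains a parenthesis, so the first '(' opens the constraint list and the final ')' closes it.

```python
from typing import Optional, Tuple


def split_uw(uw: str) -> Tuple[str, Optional[str]]:
    """Split a UW into (headword, constraint list), or (headword, None).

    Assumes a headword never contains '(' so the first '(' opens the
    constraint list and the final ')' closes it.
    """
    uw = uw.strip()
    start = uw.find("(")
    if start == -1:
        # No constraint: the headword is used on its own (unambiguous case).
        return uw, None
    if not uw.endswith(")"):
        raise ValueError(f"constraint list is not closed in {uw!r}")
    return uw[:start].strip(), uw[start + 1:-1]


for example in ["April", "tale(icl>information)",
                "explain(icl>express(agt>thing,gol>person,obj>thing))"]:
    print(split_uw(example))
```

Only the outermost parentheses are stripped, so a nested constraint such as the one in the 'explain' UW is returned intact for further processing.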
An English word is used as a UW on its own if it is not ambiguous, as shown in case 1. An ambiguous English word needs a constraint that selects the target concept, as in case 2. A non-English word is used for a concept that does not exist in English, as in case 3. A collection of UWs is used to convey a broader concept corresponding to various concepts in English, as in case 4. Finally, a compound concept is expressed by an English phrase, as in case 5.
Every concept existing in any language can be expressed as a UW. Like natural language words, UWs represent concepts, and concepts are generally said to be culture-dependent: different cultures lead to different ways of perceiving and categorizing the world. Knowledge is considered to be organized in a way specific to each culture, and it is conveyed by a specific set of concepts. UWs are therefore intended to be as comprehensive as all the individual concepts depicted by different cultures. They should represent not only common concepts but also culture-dependent information and every relevant variation among similar concepts.
Every UW should be defined in the UNL Knowledge Base. A UW by itself does not convey its entire meaning; it is interpreted by referring to all its possible relations with other UWs. These relations are defined in the UNL KB, and the links they create there are what make a UW meaningful.
The UNL Knowledge Base (KB) is a semantic network comprising every directed binary relation between UWs. All the binary relations of the UNL KB have the format 'relation(UW1, UW2)=c', where 'c' is the degree of certainty, with the value 0 (impossible) or 1 (certain). Such a binary relation means that UW1 takes UW2 as its 'relation' with certainty c or, equivalently, that UW2 plays the role of 'relation' for UW1 with certainty c.
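A minimal sketch of how entries in the 'relation(UW1, UW2)=c' format could be held in memory: each entry is parsed into a (relation, UW1, UW2) key mapped to its certainty, 0 or 1. The names `parse_kb_entry`, `build_kb` and `certainty`, and the two sample entries, are hypothetical; commas nested inside a UW's own constraint list are ignored when separating UW1 from UW2.

```python
from typing import Dict, List, Optional, Tuple

Triple = Tuple[str, str, str]  # (relation, UW1, UW2)


def split_top_level(text: str) -> List[str]:
    """Split on commas that are not nested inside parentheses."""
    parts: List[str] = []
    current: List[str] = []
    depth = 0
    for ch in text:
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
        if ch == "," and depth == 0:
            parts.append("".join(current).strip())
            current = []
        else:
            current.append(ch)
    parts.append("".join(current).strip())
    return parts


def parse_kb_entry(entry: str) -> Tuple[Triple, int]:
    """Parse 'relation(UW1, UW2)=c' into ((relation, UW1, UW2), c)."""
    body, _, value = entry.rpartition("=")  # c is after the last '='
    body = body.strip()
    certainty_value = int(value)
    if certainty_value not in (0, 1):
        raise ValueError("c must be 0 (impossible) or 1 (certain)")
    start = body.index("(")
    args = split_top_level(body[start + 1:-1])
    if len(args) != 2 or not body.endswith(")"):
        raise ValueError(f"expected exactly two UWs in {entry!r}")
    return (body[:start].strip(), args[0], args[1]), certainty_value


def build_kb(entries: List[str]) -> Dict[Triple, int]:
    return dict(parse_kb_entry(e) for e in entries)


def certainty(kb: Dict[Triple, int], relation: str,
              uw1: str, uw2: str) -> Optional[int]:
    """Return c for relation(uw1, uw2), or None if the KB does not define it."""
    return kb.get((relation, uw1, uw2))


kb = build_kb(["agt(do(agt>thing), thing)=1", "obj(do(agt>thing), thing)=0"])
print(certainty(kb, "agt", "do(agt>thing)", "thing"))  # 1
print(certainty(kb, "obj", "do(agt>thing)", "thing"))  # 0
```

Returning None for an undefined triple keeps "not stated in the KB" distinct from "stated as impossible" (c = 0).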
In the previous section, we explained that every UW must be defined in the UNL KB and must also be linked to other related UWs. But how should a UW be linked to other UWs in the UNL KB, and why? This section seeks to answer these questions. The structure and mechanisms of the UNL KB are central to the answer.
In the UNL KB, all UWs are linked to each other through the 'icl', 'iof' and 'equ' relations. These relations form a hierarchy of UWs, the UW system: 'icl' links the UW of a sub-concept to the UW of its concept, 'iof' links the UW of an instance to the UW of its class, and 'equ' links the UW of an acronym to the UW of the original word. In the UW system, lower UWs inherit the properties of upper UWs, and upper UWs can replace lower UWs to convey a more general sense in the specific context of the lower UWs. Both the inheritance and the replacement mechanisms are carried out through the relations 'icl', 'iof' and 'equ'.
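The hierarchy can be pictured as a set of upward links. The sketch below, using UW labels taken from the Master Definition table in this paper, follows the 'icl', 'iof' and 'equ' links breadth-first to list every upper UW of a given UW, ending at 'Universal Word'. The dictionary layout and the function `ancestors` are assumptions made for illustration, not part of any UNL tool.

```python
from collections import deque
from typing import Dict, List, Tuple

HIERARCHY_RELATIONS = {"icl", "iof", "equ"}

# Lower UW -> list of (relation, upper UW). This chain follows the
# Master Definitions in the table of this paper.
UPPER: Dict[str, List[Tuple[str, str]]] = {
    "uw": [("equ", "Universal Word")],
    "nominal concept": [("icl", "uw")],
    "thing": [("icl", "nominal concept")],
    "abstract thing": [("icl", "thing")],
    "activity(icl>abstract thing)": [("icl", "abstract thing")],
    "broadcasting(icl>activity)": [("icl", "activity(icl>abstract thing)")],
}


def ancestors(uw: str, upper: Dict[str, List[Tuple[str, str]]]) -> List[str]:
    """All upper UWs reachable through icl/iof/equ links, fewest links first."""
    found: List[str] = []
    seen = {uw}
    queue = deque([uw])
    while queue:
        current = queue.popleft()
        for relation, parent in upper.get(current, []):
            if relation in HIERARCHY_RELATIONS and parent not in seen:
                seen.add(parent)
                found.append(parent)
                queue.append(parent)
    return found


print(ancestors("broadcasting(icl>activity)", UPPER))
```

A list of uppers per UW is used because some Master Definitions in the table, such as that of 'tale', name more than one upper UW.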
In the UNL KB, every possible relation that a UW can hold, such as 'agt' or 'obj', must be defined for that UW. If all these relations were defined directly for every UW, their number would be enormous. The property inheritance mechanism of the UW system reduces this number; to make full use of it, possible relations should be defined among the most general UWs.
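The saving can be illustrated with a small sketch in which relations are stored only for a general UW and are looked up for a lower UW by climbing the hierarchy. The two KB entries mirror the 'agt>thing' and '^obj>thing' constraints in the Master Definition of 'do' (reading '^' as certainty 0); placing 'person' under 'thing', and the function names, are assumptions made for this example.

```python
from collections import deque
from typing import Dict, List, Optional, Tuple

# Hypothetical fragment of the UW system (lower UW -> upper UWs).
# 'dance' is under 'do' as in its Master Definition; 'person' under
# 'thing' is an assumption made for this example.
UPPER: Dict[str, List[str]] = {
    "dance(agt>person)": ["do(agt>thing)"],
    "person": ["thing"],
}

# Relations defined once, on the general UW 'do(agt>thing)':
# 'agt>thing' (possible, c = 1) and '^obj>thing' (negated, c = 0).
KB: Dict[Tuple[str, str, str], int] = {
    ("agt", "do(agt>thing)", "thing"): 1,
    ("obj", "do(agt>thing)", "thing"): 0,
}


def self_and_ancestors(uw: str, upper: Dict[str, List[str]]) -> List[str]:
    """The UW itself followed by its upper UWs, fewest links first."""
    chain: List[str] = []
    seen = {uw}
    queue = deque([uw])
    while queue:
        current = queue.popleft()
        chain.append(current)
        for parent in upper.get(current, []):
            if parent not in seen:
                seen.add(parent)
                queue.append(parent)
    return chain


def inherited_certainty(relation: str, uw1: str, uw2: str,
                        kb: Dict[Tuple[str, str, str], int],
                        upper: Dict[str, List[str]]) -> Optional[int]:
    """Certainty of relation(uw1, uw2), inherited from upper UWs if needed.

    uw1's own entries are checked before those of its upper UWs.
    """
    for a in self_and_ancestors(uw1, upper):
        for b in self_and_ancestors(uw2, upper):
            value = kb.get((relation, a, b))
            if value is not None:
                return value
    return None  # not defined anywhere up the hierarchy


print(inherited_certainty("agt", "dance(agt>person)", "person", KB, UPPER))  # 1
print(inherited_certainty("obj", "dance(agt>person)", "person", KB, UPPER))  # 0
```

Two entries on 'do' thus answer questions about 'dance' and 'person' without any relation being stored for those UWs directly.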
Replacing lower UWs with upper UWs can introduce ambiguities if the upper UWs are not close in meaning to the lower UWs. To avoid this, the upper UWs must be the closest of all the UWs that are more general than the lower UWs. In other words, every UW must be positioned under its closest upper UWs.
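One way to read the closest-upper-UW rule operationally: given several UWs that are all more general than a new UW, keep only those that have no other candidate below them. The hypothetical `closest_uppers` function below does this over a fragment of the hierarchy from the table in this paper; for a new UW such as 'broadcasting', it selects 'activity(icl>abstract thing)' rather than 'thing' or 'abstract thing'.

```python
from typing import Dict, List, Set

# Lower UW -> upper UWs, following the table of Master Definitions.
UPPER: Dict[str, List[str]] = {
    "uw": ["Universal Word"],
    "nominal concept": ["uw"],
    "thing": ["nominal concept"],
    "abstract thing": ["thing"],
    "activity(icl>abstract thing)": ["abstract thing"],
}


def ancestors(uw: str, upper: Dict[str, List[str]]) -> Set[str]:
    """All UWs above uw in the hierarchy."""
    result: Set[str] = set()
    stack = list(upper.get(uw, []))
    while stack:
        parent = stack.pop()
        if parent not in result:
            result.add(parent)
            stack.extend(upper.get(parent, []))
    return result


def closest_uppers(candidates: List[str], upper: Dict[str, List[str]]) -> List[str]:
    """Drop every candidate that is more general than another candidate."""
    return [c for c in candidates
            if not any(c in ancestors(other, upper)
                       for other in candidates if other != c)]


print(closest_uppers(["thing", "activity(icl>abstract thing)", "abstract thing"], UPPER))
# ['activity(icl>abstract thing)']
```

Candidates in unrelated branches are all kept, which matches Master Definitions that place a UW under more than one upper UW.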
The uppermost UW in the UW system is 'Universal Word', and all UWs are linked to each other under it through the relations 'icl', 'iof' and 'equ'. The hierarchy of the UW system is constructed with the property inheritance and replacement mechanisms in mind.
Figure 1
As
discussed above, every UW must be defined in the UNL KB. In doing so, it is
very important to ensure consistency between a UW and its definition in the UNL
KB. Master Definitions are introduced for this purpose.
Master Definitions are the definitions of the concepts denoted by UWs. A Master Definition is a set of directed binary relations between related UWs, and it conveys semantic information about UWs through this set of relations. In other words, a Master Definition is composed of a UW and a relation list: the UW is a label to be defined and linked to other UWs in the UNL Knowledge Base, and the relation list is the set of relations to other UWs that defines the concept behind that label. Both the concept and the UW (the label for the concept) are defined through a Master Definition. A Master Definition can be considered equivalent to a UNL expression whose entry node is the UW to be defined. The difference is that a UNL expression denotes an instance of a compound concept, whereas a Master Definition includes information about upper concepts. Figure 2 shows the frame in which a Master Definition is described.
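Because a Master Definition is a set of directed binary relations, its relation list can be unfolded into KB-style entries. The sketch below does this under stated assumptions: brace characters are simply removed so that the full constraint list is kept, 'rel>X' is read as rel(defined UW, X) and 'rel<X' as rel(X, defined UW), '^' yields certainty 0, the target is kept exactly as written (for example 'time>abstract thing'), and the defined UW is represented by its bare headword for brevity. The function name `md_relations` is hypothetical.

```python
import re
from typing import List, Tuple

# One constraint: optional '^', a relation name, '>' or '<', then a target UW.
CONSTRAINT = re.compile(r"(\^?)\s*([a-z]+)\s*([<>])\s*(.+)")


def split_top_level(text: str) -> List[str]:
    """Split on commas that are not nested inside parentheses."""
    parts: List[str] = []
    current: List[str] = []
    depth = 0
    for ch in text:
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
        if ch == "," and depth == 0:
            parts.append("".join(current).strip())
            current = []
        else:
            current.append(ch)
    parts.append("".join(current).strip())
    return parts


def md_relations(md: str) -> List[Tuple[str, str, str, int]]:
    """Turn a Master Definition into (relation, UW1, UW2, c) tuples."""
    md = md.replace("{", "").replace("}", "")  # keep the full constraint list
    start = md.find("(")
    if start == -1:
        return []  # e.g. 'Universal Word' has no relation list
    headword = md[:start].strip()
    relations = []
    for item in split_top_level(md[start + 1:-1]):
        match = CONSTRAINT.fullmatch(item)
        if match is None:
            raise ValueError(f"unrecognised constraint {item!r}")
        negated, relation, direction, target = match.groups()
        c = 0 if negated else 1
        if direction == ">":  # the defined UW takes target as its relation
            relations.append((relation, headword, target, c))
        else:  # the defined UW plays the role of relation for target
            relations.append((relation, target, headword, c))
    return relations


for row in md_relations("do(agt>thing{,^gol>thing,icl>do,^obj>thing,^ptn>thing,^src>thing})"):
    print(row)
```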
A Master Definition has three functions: to define a concept, to define the label (UW) for the concept, and to build up the UNL KB.
A concept is defined by its relations with other UWs, and such relations can be inherited from upper UWs through the UW system of the UNL KB. Master Definitions should therefore make full use of the property inheritance and replacement mechanisms of the UW system. A Master Definition should guarantee that the UW for the concept to be defined is linked to UWs in the UNL KB and placed under its closest upper UWs. To define a concept is to link the UW for the concept to related UWs in the UNL KB. If necessary, UWs specifically related to the UW should also be linked, and additional relations added. All UWs used in Master Definitions must be pre-defined.
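The rule that every UW used in a Master Definition must be pre-defined can be enforced at the moment a new UW is added. In this hypothetical sketch, `define_uw` rejects a definition that links to an unknown UW and otherwise records its links; the flat set-based KB is a simplification for illustration, not the actual UNL KB format.

```python
from typing import List, Set, Tuple


def define_uw(defined: Set[str], relations: Set[Tuple[str, str, str]],
              new_uw: str, links: List[Tuple[str, str]]) -> None:
    """Add new_uw to the KB, linked as relation(new_uw, target) for each link.

    Refuses the definition if any target UW has not been defined yet.
    """
    missing = sorted({target for _, target in links if target not in defined})
    if missing:
        raise ValueError(f"{new_uw!r} uses undefined UWs: {missing}")
    for relation, target in links:
        relations.add((relation, new_uw, target))
    defined.add(new_uw)


defined = {"Universal Word", "uw", "nominal concept", "thing",
           "abstract thing", "activity(icl>abstract thing)"}
relations: Set[Tuple[str, str, str]] = set()
define_uw(defined, relations, "broadcasting(icl>activity)",
          [("icl", "activity(icl>abstract thing)")])
print(sorted(relations))
```

Checking before adding anything means a rejected definition leaves the KB unchanged.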
The label (UW) for a concept is made up of part of its Master Definition: it contains the headword and, when the headword is ambiguous, part of the constraint list of the Master Definition.
Syntax of a Master Definition

The syntax of a Master Definition (MD) is as follows:

    <MD>                 ::= <Headword><MD Constraint List>
    <Headword>           ::= <character>...
    <MD Constraint List> ::= ( [^] <MD Constraint> [, [^] <MD Constraint> ]... )
    <MD Constraint>      ::= <relation> { > | < } { <UW> | <Constraint UW> }
    <Constraint UW>      ::= <headword><no-rel constraint>
    <no-rel constraint>  ::= { > <headword> }...
    <relation>           ::= agt | ... | via | equ | icl | iof
    <character>          ::= A | ... | Z | a | ... | z | 0 | 1 | ... | 9 | _ | | # | ! | $ | % | = | ~ | | | @ | + | - | < | > | ? | ΄ | .

Where,

    < >   indicates a non-terminal symbol
          indicates a terminal symbol
    ::=   indicates "... is defined as ..."
    |     indicates disjunction (or) between elements
    [ ]   indicates an optional element
    { }   indicates a range of alternative elements
    ...   indicates repetition one or more times
    ^     indicates negation (not)
UWs are made from Master Definitions by excluding the unnecessary parts of the constraint list, which are enclosed in braces { and }. The fundamental principle of a UW is that a constraint (list) is necessary only when the headword is ambiguous, and such a constraint (list) should distinguish the UW from other UWs made from the same headword. As long as it fulfils this role, the constraint of a UW should be as simple, clear and easy to understand as possible. Different kinds of concepts will have different types of Master Definitions. How can UWs for various concepts be defined? Detailed information will be included in a UNL reference book entitled 'Universal Word and UNL Knowledge Base: How to develop UWs', to be published in 2003. The following table shows some examples of UWs and their Master Definitions.
Master Definition | UW
'Universal Word' | 'Universal Word'
'uw{(equ>Universal Word)}' | 'uw'
'nominal concept{(icl>uw)}' | 'nominal concept'
'thing{(and>thing,aoj>thing,cao>thing,cnt>thing,fmt>thing,frm>thing,icl>nominal concept,mod<thing,nam>thing,or>thing,per>thing,plc>thing,pof>thing,pos>volitional thing,pur>uw,qua>quantity,scn>thing,tim>time>abstract thing,to>thing)}' | 'thing'
'abstract thing{(icl>thing)}' | 'abstract thing'
'activity(icl>abstract thing)' | 'activity(icl>abstract thing)'
'broadcasting(icl>activity{>abstract thing})' | 'broadcasting(icl>activity)'
'tale(icl>information{,icl>literature>art})' | 'tale(icl>information)'
'above(icl>direction{>abstract thing,icl>directional place>place})' | 'above(icl>direction)'
'month(icl>date{>time,icl>period>time,pof>year>date})' | 'month(icl>date)'
'April{(icl>month>date)}' | 'April'
'do(agt>thing{,^gol>thing,icl>do,^obj>thing,^ptn>thing,^src>thing})' | 'do(agt>thing)'
'dance({icl>do(}agt>person{)})' | 'dance(agt>person)'
'bark(agt>dog{>canine,icl>sound(agt>thing)})' | 'bark(agt>dog)'
'explain(icl>express(agt>thing,gol>person,obj>thing))' | 'explain(icl>express(agt>thing,gol>person,obj>thing))'
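Since a UW is obtained from its Master Definition by deleting the parts enclosed in braces, the derivation can be sketched with a single regular expression. This assumes braces are never nested, which holds for every row of the table above; `uw_from_md` is a hypothetical name. Braces may open and close inside a parenthesised list, as in the 'dance' row, and the deletion still yields the UW shown in the table.

```python
import re

# A brace-enclosed segment that contains no further braces.
OPTIONAL_PART = re.compile(r"\{[^{}]*\}")


def uw_from_md(md: str) -> str:
    """Derive a UW from its Master Definition by deleting every {...} part."""
    return OPTIONAL_PART.sub("", md).strip()


for md in ["uw{(equ>Universal Word)}",
           "broadcasting(icl>activity{>abstract thing})",
           "dance({icl>do(}agt>person{)})"]:
    print(uw_from_md(md))
```

Master Definitions without braces, such as that of 'explain', pass through unchanged, so the same function covers every row.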
This paper has described how UWs and their relations with other UWs should be defined and how such UWs construct the UNL KB. Further information is available at URL: www.undl.org/index_unlc.html.
In addition, a UNL reference book, 'Universal Word and UNL Knowledge Base: How to develop UWs', will be available in 2003. This book explains how to define UWs for various concepts. It will not only spread knowledge about the UNL but also reduce the number of wrong, inappropriate or redundant UWs for concepts. The methodology introduced in this paper is intended to serve as a means of representing the knowledge and concepts of various languages for all people in the world.