Universal Word and UNL Knowledge Base

 

Meiying Zhu, Hiroshi Uchida

UNL Center, UNDL Foundation

 

1.    INTRODUCTION

 

What are Universal Words (UWs)? What are the features and structure of a UW? How can UWs be defined and how are they related to the UNL Knowledge Base? These are the main questions discussed in this paper.

 

2.    WHAT ARE UWs?

 

The Role of UWs

Universal Words (UWs) are words of the UNL. They constitute the UNL vocabulary. They are the labels for concepts, syntactic-semantic units that are combined to form UNL expressions. Every UW denotes a concept. A combination of a set of UWs - linked by relations and modified by attributes - expresses the meaning of a sentence. A UNL expression is a hyper-graph of a semantic network. The UWs are nodes in the UNL hyper-graphs, or arguments of the binary-directed relations that constitute UNL expressions.

             

The Structure of UWs

A UW is a character string made up of two parts: a headword and a constraint (list). The constraint (list) is attached to the headword only when the headword is ambiguous. The headword can be a word, a complex word or a phrase in English. The exception is Extra UWs, which take a non-English word as their headword; they denote concepts not found in English. A headword should be interpreted as a label for a set of concepts: the set is made up of all the concepts the word conveys in its original language. Thus, the headword indicates the entire range of concepts it can refer to. The constraint (list) delimits a concept within that range so that the UW indicates it clearly and unambiguously. It restricts the interpretation of a headword to a subset of the range, or to one specific concept within it.
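The two-part structure described above can be illustrated with a minimal sketch. The function name `split_uw` is hypothetical and not part of any UNL tool. It relies on the assumption that a headword never contains a parenthesis, which is consistent with the character set given later in the syntax of a Master Definition. The sample strings are taken from the examples table in Section 4.

```python
from typing import Optional, Tuple


def split_uw(uw: str) -> Tuple[str, Optional[str]]:
    """Split a UW string into (headword, constraint list).

    The constraint list is returned without its outer parentheses,
    or as None when the UW consists of a headword only.
    """
    start = uw.find("(")
    if start == -1 or not uw.endswith(")"):
        # Unambiguous headword: no constraint list is attached.
        return uw, None
    return uw[:start], uw[start + 1:-1]


print(split_uw("April"))
print(split_uw("tale(icl>information)"))
print(split_uw("explain(icl>express(agt>thing,gol>person,obj>thing))"))
```

Splitting at the first opening parenthesis keeps nested constraints, such as the one in the last example, intact as a single constraint list.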

 

The Comprehensibility of UWs

UWs are based on English words. This raises three main issues. How can UWs express a concept that is not found in English? How can they express a concept that differs from its English counterpart? And how can they express differences between similar concepts? Table 1 shows how UWs are formed in each of these cases.

An English word is used as a UW if it is not ambiguous, as shown in case 1. An ambiguous English word needs to be constrained for the target concept, as in case 2. A non-English word is used for a concept that does not exist in English, as in case 3.  A collection of UWs is used to convey a broader concept corresponding to various concepts in English, as in case 4. And a compound concept is expressed by an English phrase, as in case 5.

Table 1

 

No UW exists in isolation. UWs are inter-linked with each other in the UNL KB, as explained in the following section. The concept of a UW is defined by its relations with other UWs, and a UW must be interpreted according to these relations. A non-English UW is therefore interpretable through its relations, and similar but different concepts can be interpreted in terms of each other through a more general concept that is common to both.

 

The Features of a UW

Every concept existing in any language can be expressed as a UW. Like natural language words, UWs represent concepts. These concepts are generally said to be culture-dependent: different cultures lead to different ways of perceiving and categorizing the world. Knowledge is considered to be organized in a way specific to each culture, and it is conveyed by a specific set of concepts. UWs are intended to cover all the individual concepts distinguished by different cultures. UWs should therefore not represent common concepts only. They should also include culture-dependent information and every relevant variation among similar concepts.

Every UW should be defined in the UNL Knowledge Base. A UW by itself does not convey its entire meaning. A UW is interpreted by referring to all its possible relations with other UWs. These relations are defined in the UNL KB, and it is the links they create that render a UW meaningful.

 

3.    WHAT IS THE UNL KNOWLEDGE BASE?

             

The UNL Knowledge Base (KB) is a semantic network comprising every directed binary relation between UWs. All binary relations in the UNL KB have the format 'relation(UW1, UW2)=c', where 'c' is the degree of certainty, with the value 0 (impossible) or 1 (certain). Such a binary relation means “UW1 takes UW2 as the relation with certainty value c”, or equivalently “UW2 plays the role of the relation for UW1 with certainty value c”.
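As an illustration only, one entry of this form could be modelled as a small record. The class name `KBRelation` and its field names are hypothetical, not part of the UNL specification. The sample entry is an assumption derived from the Master Definition of 'April' given in Section 4.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class KBRelation:
    """One directed binary relation of the form relation(UW1, UW2)=c."""

    relation: str   # relation label, e.g. "icl", "agt", "obj"
    uw1: str        # the UW that takes uw2 as the relation
    uw2: str        # the UW that plays the role of the relation
    certainty: int  # 0 (impossible) or 1 (certain)

    def __post_init__(self) -> None:
        # The text allows only the two certainty values 0 and 1.
        if self.certainty not in (0, 1):
            raise ValueError("certainty must be 0 or 1")

    def __str__(self) -> str:
        return f"{self.relation}({self.uw1}, {self.uw2})={self.certainty}"


# Illustrative entry: 'April' is a sub-concept of 'month(icl>date)'.
entry = KBRelation("icl", "April", "month(icl>date)", 1)
print(entry)
```

Keeping the two UWs as separate fields, rather than parsing them out of one string, avoids ambiguity when a UW itself contains commas or parentheses.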

In the previous section, we explained that every UW must be defined in the UNL KB and must also be linked to other related UWs. But how should a UW be linked to other UWs in the UNL KB, and why? This section seeks to answer these questions. The structure and mechanisms of the UNL KB are important for this purpose.

 

The UW System

              In the UNL KB, all UWs are linked to each other through 'icl', 'iof', and 'equ' relations. These relations form a hierarchy of UWs, the UW system. 'icl' links a UW of a sub-concept to a UW of a concept; 'iof' links a UW of an instance to a UW of a class; 'equ' links a UW of an acronym to the UW of an original word. In the UW system, lower UWs inherit the properties of upper UWs; and upper UWs can replace lower UWs to convey a more general sense in the specific context of the lower UWs. All these inheritance and replacement mechanisms are carried out through the relations 'icl', 'iof' and 'equ'.

In the UNL KB, every possible relation that a UW can hold, such as 'agt', 'obj', etc., must be defined for each UW. If all these relations were defined directly between every pair of UWs, their number would be enormous. Utilizing the property inheritance mechanism of the UW system reduces this number. To make full use of property inheritance, possible relations should be defined among the most general UWs.
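The saving obtained through property inheritance can be sketched as follows. This is a minimal illustration, not the actual UNL KB implementation: the dictionaries `UPPER` and `DIRECT` and the function `inherited_relations` are hypothetical names, and the hierarchy is a small fragment assumed from the examples table in Section 4. Relations are stored once on 'thing' and are found for every lower UW by walking up the hierarchy.

```python
from typing import Dict, List, Set, Tuple

# Upper UWs of each UW (links such as 'icl'); a UW may have several.
UPPER: Dict[str, List[str]] = {
    "broadcasting(icl>activity)": ["activity(icl>abstract thing)"],
    "activity(icl>abstract thing)": ["abstract thing"],
    "abstract thing": ["thing"],
    "thing": [],
}

# Relations defined directly on a UW, as (relation, target UW) pairs.
DIRECT: Dict[str, Set[Tuple[str, str]]] = {
    "thing": {("plc", "thing"), ("pof", "thing")},
}


def inherited_relations(uw: str) -> Set[Tuple[str, str]]:
    """Collect the relations of a UW and of all its upper UWs."""
    seen: Set[str] = set()
    result: Set[Tuple[str, str]] = set()
    stack = [uw]
    while stack:
        current = stack.pop()
        if current in seen:
            continue  # guard against visiting a shared upper UW twice
        seen.add(current)
        result |= DIRECT.get(current, set())
        stack.extend(UPPER.get(current, []))
    return result


print(sorted(inherited_relations("broadcasting(icl>activity)")))
```

In this fragment, two relations stored on 'thing' serve all four UWs, instead of being repeated for each of them.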

Replacing lower UWs with upper UWs can introduce ambiguities if the upper UWs are not close in meaning to the lower UWs. To avoid this, the upper UWs of a UW must be the closest in meaning among all the UWs that are more general than it. In other words, every UW must be positioned under its closest upper UWs.

 

The uppermost UW in the UW system is the 'Universal Word', and all UWs are linked to each other under it through the relations 'icl', 'iof', and 'equ'. The hierarchy of the UW system is constructed by taking the property inheritance and replacement mechanisms into consideration.

Figure 1

 

4.    HOW TO DEFINE A UW IN THE UNL KB?

 

As discussed above, every UW must be defined in the UNL KB. In doing so, it is very important to ensure consistency between a UW and its definition in the UNL KB. Master Definitions are introduced for this purpose.

 

Master Definitions

Master Definitions are introduced to define UWs; they are definitions of the concepts that UWs denote. A Master Definition is a set of directed binary relations between related UWs, and it conveys semantic information on UWs through this set. In other words, a Master Definition is composed of a UW and a relation list. The UW is the label to be defined and linked to other UWs in the UNL Knowledge Base; the relation list is the set of relations to other UWs that defines the concept of that label. Both the concept and the UW (the label for the concept) are defined through a Master Definition. A Master Definition can be considered equivalent to a UNL expression with the UW to be defined as the entry node. The difference is that a UNL expression denotes an instance of a compound concept, whereas a Master Definition includes information about upper concepts. Figure 2 shows the frame in which a Master Definition is described.

Figure 2

 

Functions of a Master Definition

A Master Definition has three functions: to define a concept, to define the label (UW) for the concept and to build up the UNL KB.

A concept is defined by its relations with other UWs. Such relations can be inherited from upper UWs through the UW system of the UNL KB. Master Definitions should fully utilize the property inheritance and replacement mechanisms of the UW system. They should guarantee that every UW (for the concept to be defined) is linked to the UWs in the UNL KB and is positioned under its closest upper UWs. To define a concept is to link the UW for the concept to related UWs in the UNL KB. If necessary, UWs specifically related to the UW should be linked, and additional relations should be added. All UWs used in Master Definitions must be pre-defined.

A label (UW) for a concept is formed from part of a Master Definition. It contains the headword and, when the headword is ambiguous, part of the constraint list of the Master Definition.

A Master Definition builds up the UNL KB. Every relation in a Master Definition makes a link between the UW to be defined and an existing UW in the UNL KB. As a Master Definition links a UW into the UNL KB, the necessary information about the UW can be inherited from its upper UWs. A UW linked in this way can in turn be selected as the upper UW of other UWs, and its semantic information will be inherited by those lower UWs.

 

Syntax of a Master Definition

The syntax of Master Definition (MD) is as follows:

 

<MD> ::= <Headword><MD Constraint List>

<Headword> ::= <character>...

<MD Constraint List> ::= “(” [“^”] <MD Constraint> [“,” [“^”] <MD Constraint> ]... “)”

<MD Constraint> ::= <relation> { “>” | “<” } { <UW> | <Constraint UW>}

<Constraint UW> ::= <headword><no-rel constraint>

<no-rel constraint> ::= { “>” <headword> }...

<relation> ::= “agt” | ... | “via” | “equ” | “icl” | “iof”

<character> ::= “A” | ... | “Z” | “a” | ... | “z” | “0” | “1” | ... | “9” | “_” | “ ” | “#” | “!” | “$” | “%” | “=” | “~” | “|” | “@” | “+” | “-” | “<” | “>” | “?” | “΄” | “.”

 

              Where,

                            < >         indicates a non-terminal symbol

                            “ ”          indicates a terminal symbol

                            ::=          indicates ... is defined as ...

                            |            indicates disjunction (“or”) between elements

                            [ ]           indicates an optional element

                            { }           indicates a range of alternative elements

                            ...           indicates repetition one or more times

                            ^            indicates negation(“not”)

 

UWs are derived from Master Definitions by excluding the unnecessary parts of the constraint list, which are enclosed in braces “{” and “}”. The fundamental principle of a UW is that a constraint (list) is necessary only when the headword is ambiguous. The constraint (list) should distinguish the UW from other UWs that share the same headword. As long as it fulfils this role, the constraint of a UW should be as simple, clear and easy to understand as possible. Different kinds of concepts will have different types of Master Definitions. How can UWs for various concepts be defined? Detailed information will be included in a UNL Reference Book entitled “Universal Word and UNL Knowledge Base –How to develop UWs–” to be published in 2003. The following table shows some examples of UWs and Master Definitions.

 

Master Definition | UW

'Universal Word' | 'Universal Word'

'uw{(equ>Universal Word)}' | 'uw'

'nominal concept{(icl>uw)}' | 'nominal concept'

'thing{(and>thing,aoj>thing,cao>thing,cnt>thing,fmt>thing,frm>thing,icl>nominal concept,mod<thing,nam>thing,or>thing,per>thing,plc>thing,pof>thing,pos>volitional thing,pur>uw,qua>quantity,scn>thing,tim>time>abstract thing,to>thing)}' | 'thing'

'abstract thing{(icl>thing)}' | 'abstract thing'

'activity(icl>abstract thing)' | 'activity(icl>abstract thing)'

'broadcasting(icl>activity{>abstract thing})' | 'broadcasting(icl>activity)'

'tale(icl>information{,icl>literature>art})' | 'tale(icl>information)'

'above(icl>direction{>abstract thing,icl>directional place>place})' | 'above(icl>direction)'

'month(icl>date{>time,icl>period>time,pof>year>date})' | 'month(icl>date)'

'April{(icl>month>date)}' | 'April'

'do(agt>thing{,^gol>thing,icl>do,^obj>thing,^ptn>thing,^src>thing})' | 'do(agt>thing)'

'dance({icl>do(}agt>person{)})' | 'dance(agt>person)'

'bark(agt>dog{>canine,icl>sound(agt>thing)})' | 'bark(agt>dog)'

'explain(icl>express(agt>thing,gol>person,obj>thing))' | 'explain(icl>express(agt>thing,gol>person,obj>thing))'
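The brace convention used in the examples above can be sketched in a few lines. This is a minimal illustration that assumes brace-enclosed parts are never nested, which holds for every example in the table. The function name `uw_from_master_definition` is hypothetical.

```python
import re


def uw_from_master_definition(md: str) -> str:
    """Derive a UW by deleting every brace-enclosed part of a Master Definition.

    Assumes that braces are not nested inside one another.
    """
    return re.sub(r"\{[^{}]*\}", "", md)


examples = [
    "uw{(equ>Universal Word)}",
    "broadcasting(icl>activity{>abstract thing})",
    "dance({icl>do(}agt>person{)})",
    "explain(icl>express(agt>thing,gol>person,obj>thing))",
]
for md in examples:
    print(md, "->", uw_from_master_definition(md))
```

A Master Definition without braces, such as the one for 'explain', is returned unchanged, which matches the table: in that case the UW and its Master Definition are identical.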

 

 

5.    CONCLUSION

 

This paper describes how UWs and relations with other UWs should be defined and how such UWs construct the UNL KB. Further information is available at URL: www.undl.org/index_unlc.html.

In addition, a UNL reference book, “Universal Word and UNL Knowledge Base –How to develop UWs–”, will be available in 2003. This book explains how to define UWs for various concepts. This UNL reference book will not only spread knowledge about the UNL but will also reduce the number of wrong, inappropriate or redundant UWs for concepts. The methodology introduced in this paper should be used as a means to represent knowledge and concepts of various languages for all people in the world.