
A new UNL Explorer has been developed by the UNL Center and is available at www.undl.org/unlexp/ and www.unl.undl.org/unlexp/.



The UNL Explorer is a UNL-based Multilingual Intelligent Information and Knowledge Management System.



The UNL Explorer provides an integrated environment in which users can search for and edit knowledge and information based on UNL.



Using the UNL Explorer, users can search the UNL Encyclopedia by content, keyword, or key concept, or by navigating the UW System of the UNL Ontology.



The UNL Explorer has the following characteristics:


Semantic Co-occurrence Relation Search

The UNL Ontology can be used to verify whether a relation between two UWs holds, which UWs can have a relation with a given UW, and which relations are possible between two UWs.




UNL-based Semantic Network Search

Content search is carried out based on UNL Expressions.




Intensional Definition-based Inference

Every UW is given an intensional definition that specifies all of its essential properties. This definition is used in inference about all the concepts related to it.




Multilingual Information Processability

The UNL Explorer allows users to search, in their native languages, for information written in any language, and to provide or edit information in their native languages.




UNL Graphical Editability

A graphical editor of UNL Expressions works together with the UNL Explorer.


Outline



The UNL System consists of three major components: language resources, software for processing language resources, and supporting tools for maintaining and operating the language processing software or for developing language resources. Language resources are divided into a language-dependent part and a language-independent part. Knowledge about concepts and about the relations between concepts, which is universal to every language, is treated as language independent and is stored in a common database, the UNL Ontology (UNLKB). Language-dependent resources, such as word dictionaries and analysis and generation rules, are stored together with the language processing software in each Language Server (LS). Language Servers are connected to and operate through the Internet. Supporting tools for producing language resources such as UNL Expressions are basically intended for use on a local computer. UNL Expressions can be verified either through the Internet or on a local computer. These tools operate by consulting the Language Servers through the Internet. Supporting tools for developing and maintaining the UNL Ontology, which holds the language-independent linguistic knowledge, and the UW Dictionary, which holds the links between UWs and the words of every language, are stored on a server managed by the UNL Center and can be accessed through the Internet from anywhere.

The UNL System, with all its components, provides the UNL functions: it makes it possible to produce UNL Expressions (UNL Documents) from natural languages and gives people access to UNL Documents.

Mechanism of Conversion between Languages and UNL Expressions



Figure 1 shows how conversion between a natural language and a UNL Document is carried out in the UNL System. Solid arrows show data flow, and broken arrows show access.

The EnConverter and the DeConverter are the core software of the UNL System. The EnConverter converts natural language sentences into UNL Expressions. The Universal Parser (UP) is a specialized version of the EnConverter: it generates UNL Expressions from annotated sentences by referring to the UW Dictionary, without using grammatical features. All UNL Expressions are verified by the UNL Verifier and then stored in the UNL Document format. The DeConverter converts UNL Expressions into natural language sentences. Both the EnConverter and the DeConverter perform their functions based on a set of grammar rules and a word dictionary of the target language. Consulting the UNL Ontology and/or a co-occurrence dictionary in the EnConverter or DeConverter is optional.


Figure 1. Conversion mechanism of the UNL System


Figure 2 shows the structure of the UNL System and how it is connected with supporting tools and UNL-based applications. Highlighted parts are the components of the UNL System.


Figure 2. Structure of the UNL System and applications


The components shown in Figures 1 and 2 are described below.

EnConverter


The EnConverter is a language-independent parser that provides a framework for performing morphological, syntactic, and semantic analysis synchronously. It would be impossible to resolve all morphological ambiguities if syntactic and semantic analysis were not performed at the same time, and it would be impossible to resolve every syntactic ambiguity without semantic analysis.

The EnConverter works based on a word dictionary and a set of enconversion rules (grammar rules of enconversion). It analyzes sentences according to the enconversion rules. It can deal with various natural languages by using respective word dictionaries and sets of enconversion rules.

The EnConverter works as follows. The input string of a sentence is scanned from left to right. During scanning, all morphemes that match the string from its current leftmost position are retrieved from the word dictionary and become candidate morphemes. These candidates are sorted by priority. Word selection is done by applying the enconversion rules to the candidate morphemes. Syntactic and semantic analysis is carried out by applying the rules to the words already selected, building up a syntactic tree and a semantic network for the input sentence. This process continues until all words of the sentence have been read and a complete semantic network of the input sentence has been built. The output of the whole process is this semantic network, expressed in the UNL format.
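The candidate-retrieval step described above can be sketched in Python as follows. This is a minimal illustration, not the EnConverter's implementation: the `Entry` class, the `candidate_morphemes` function, the sample UWs, and the ordering rule (longer headwords first, then higher frequency) are all assumptions; the actual priority scheme is defined in the EnConverter specifications.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Entry:
    """A simplified word dictionary entry (hypothetical structure)."""
    headword: str
    uw: str
    fre: int  # frequency; assumed here to serve as the retrieval priority


def candidate_morphemes(text: str, pos: int, dictionary: List[Entry]) -> List[Entry]:
    """Return every entry whose headword matches `text` starting at `pos`.

    Candidates are sorted longest headword first, then by higher frequency.
    This ordering is an assumption made for illustration only.
    """
    matches = [e for e in dictionary if e.headword and text.startswith(e.headword, pos)]
    return sorted(matches, key=lambda e: (-len(e.headword), -e.fre))


# Hypothetical dictionary fragment.
dictionary = [
    Entry("boo", "boo(icl>sound)", 3),
    Entry("book", "book(icl>document)", 1),
    Entry("bookshelf", "bookshelf(icl>furniture)", 1),
]
print([e.headword for e in candidate_morphemes("a bookshelf", 2, dictionary)])
# ['bookshelf', 'book', 'boo']
```

All three headwords match at position 2, so the later rule-driven word selection would start with the longest candidate and fall back to the shorter ones if needed.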

The EnConverter also has the function to consult the UNL Ontology. The UNL Ontology helps to select appropriate UWs for ambiguous words and appropriate relations between UWs.

Figure 3 shows the structure of the EnConverter. “A” indicates the analysis windows and “C” the condition windows of the EnConverter. The EnConverter operates on the node-list through the analysis windows; the condition windows are used to check conditions when a rule is applied. In the initial stage, only one node of the input sentence exists in the node-list. At the end of enconversion, a syntactic tree together with a semantic network has been built, and the root node remains in the node-list.


Figure 3. Structure of EnConverter

For details, see the specifications of the EnConverter at http://www.undl.org/unlsys/ds.html

DeConverter


The DeConverter is a language-independent generator that provides a framework for performing syntactic and morphological generation synchronously. It can convert UNL Expressions into a variety of natural languages by using the word dictionary and the set of deconversion rules (grammar rules of deconversion) of each language. A word dictionary contains the words that correspond to the UWs in the input UNL Expressions, together with grammatical attributes (features) that describe how the words behave. Deconversion rules describe how to construct a sentence using the information in the input UNL Expressions and in the word dictionary. The DeConverter converts UNL Expressions into sentences of a target language by following the descriptions in the deconversion rules.

Co-occurrence-relation-based word selection for natural collocation can also be carried out synchronously; this requires a co-occurrence dictionary of the target language. The UNL Ontology is also helpful when a language has no corresponding word for a particular UW. In this case, the DeConverter consults the UNL Ontology to find a more general (upper) UW that does have a corresponding word in the word dictionary, and uses that word to generate the target sentence instead.
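A minimal sketch of the upper-UW fallback, assuming the UNL Ontology hierarchy is reduced to a hypothetical mapping from each UW to a single more general UW (`upper`) and the word dictionary to a mapping from UW to word (`word_dict`). The function name, the sample UWs, and the French word are illustrative only; the real DeConverter works on full dictionary entries and rules.

```python
from typing import Dict, Optional


def find_word(uw: str, word_dict: Dict[str, str], upper: Dict[str, str]) -> Optional[str]:
    """Return a target-language word for `uw`.

    If `uw` has no entry in `word_dict`, climb to its more general (upper) UW
    via `upper` and try again, until a word is found or the top is reached.
    """
    seen = set()  # guards against accidental cycles in the hierarchy
    current: Optional[str] = uw
    while current is not None and current not in seen:
        if current in word_dict:
            return word_dict[current]
        seen.add(current)
        current = upper.get(current)
    return None


# Hypothetical fragment of a UW hierarchy and a target-language dictionary.
upper = {
    "sparrow(icl>bird)": "bird(icl>animal)",
    "bird(icl>animal)": "animal(icl>living thing)",
}
word_dict = {"bird(icl>animal)": "oiseau"}
print(find_word("sparrow(icl>bird)", word_dict, upper))  # oiseau
```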

The DeConverter works as follows. It first transforms the input UNL Expression, a set of binary relations, into a directed graph structure with hyper-nodes called a node-net. The root node of a node-net is called the entry node and represents the head (e.g. the main verb) of a sentence. A UNL Expression is deconverted by applying deconversion rules to the nodes of the node-net. Starting from the entry node, the DeConverter finds an appropriate word for each node and generates a word sequence (a list of words in grammatical order) in the target language. In this process, the syntactic structure is determined by applying syntactic rules, and morphemes are likewise generated by applying morphological rules. Deconversion ends when words have been found for all nodes and the word sequence of the target sentence is complete.
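The first step, turning binary relations into a node-net and starting from the entry node, can be illustrated with the sketch below. It is a simplification under stated assumptions: relations are given as hypothetical `(relation, head, dependent)` triples, hyper-nodes are not modeled, and the entry node is taken to be the only node without an incoming edge, whereas UNL Expressions mark it explicitly with the @entry attribute. The depth-first visit order only shows that every node is reachable from the entry node; it is not the DeConverter's rule-driven generation order.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

Relation = Tuple[str, str, str]  # (relation label, head UW, dependent UW)
Edges = Dict[str, List[Tuple[str, str]]]


def build_node_net(relations: Iterable[Relation]) -> Tuple[str, Edges]:
    """Turn binary relations into a directed graph; return (entry node, edges).

    Assumption: the entry node is the only node with no incoming edge.
    """
    children: Edges = defaultdict(list)
    nodes, has_parent = set(), set()
    for rel, head, dep in relations:
        children[head].append((rel, dep))
        nodes.update((head, dep))
        has_parent.add(dep)
    roots = [n for n in nodes if n not in has_parent]
    if len(roots) != 1:
        raise ValueError(f"expected exactly one entry node, found {len(roots)}")
    return roots[0], dict(children)


def visit_order(entry: str, children: Edges) -> List[str]:
    """Depth-first order starting from the entry node, each node visited once."""
    order: List[str] = []
    seen = set()
    stack = [entry]
    while stack:
        node = stack.pop()
        if node in seen:
            continue
        seen.add(node)
        order.append(node)
        # Push dependents in reverse so they are visited in relation order.
        for _, dep in reversed(children.get(node, [])):
            stack.append(dep)
    return order


relations = [("agt", "buy", "I"), ("obj", "buy", "book"), ("mod", "book", "this")]
entry, net = build_node_net(relations)
print(entry, visit_order(entry, net))  # buy ['buy', 'I', 'book', 'this']
```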

Figure 4 shows the structure of the DeConverter. “G” indicates the generation windows and “C” the condition windows of the DeConverter. The DeConverter operates on the node-list through the generation windows; the condition windows are used to check conditions when a rule is applied. In the initial stage, in contrast to the EnConverter, the entry node of a UNL Expression is in the node-list. At the end of deconversion, the node-list contains all the morphemes, each as a node, that have been generated from the node-net and that constitute the target sentence.



Figure 4. Structure of DeConverter

For details, see the specifications of the DeConverter at http://www.undl.org/unlsys/ds.html

Dictionary Builder


The Dictionary Builder (DicBld) is a tool that converts the text data of dictionary entries into dictionary files in IBAM (Index Based Access Method) format. IBAM was designed for fast data lookup. In the UNL System, all dictionary data, including word dictionaries, the UNL Ontology, co-occurrence dictionaries, and KCIC, are stored in IBAM format. The input data format of DicBld is explained under Word Dictionary below. Figure 5 shows the structure of DicBld.


Figure 5. Structure of DicBld


Word Dictionary


A word dictionary is prepared for each language and stores all the information about the words of that language. An entry of a word dictionary basically contains three parts: a headword, a UW, and a set of grammatical attributes (features). A headword is a word or morpheme of a natural language; a sentence is composed of a sequence of such words or morphemes. In the enconversion process, a headword is therefore used as a trigger to obtain the appropriate UW when creating a UNL Expression from an input sentence, and in the deconversion process headwords form the target sentence generated from a UNL Expression. The UW of an entry expresses the meaning of its headword. It appears in the UNL Expression produced by enconversion, and in deconversion it is used as a trigger to obtain the appropriate word or morpheme for generating the target sentence. Grammatical attributes define how a word or morpheme behaves in a sentence and are used by both the enconversion and the deconversion rules.

The format of a word dictionary entry is as follows. Each entry must end with a semicolon; the <FLG,FRE,PRI> part can be omitted.

[HW]{ID} "UW" (ATTR,ATTR,…) <FLG,FRE,PRI>;

HW     headword of a language
ID     identifier, can be empty
UW     Universal Word, can be empty if not necessary
ATTR   grammar code
FLG    language flag, one character in ASCII code
FRE    frequency to be used in EnCo (EnConverter)
PRI    priority to be used in DeCo (DeConverter)

Examples of English dictionary entries:

[a]{} "" (ART,IART) <E,1,1>;
[book]{} "book(icl>document)" (BA,C,N,PLS) <E,1,1>;
[buy]{} "buy(icl>purchase(agt>thing,obj>thing))" (3SGS,AGT.S,BA,INGING,IRG,OBJ.DO,V,VDO,VDON) <E,1,1>;
[bought]{} "buy(icl>purchase(agt>thing,obj>thing))" (AGT.S,ED,EN,IRG,OBJ.DO,V,VDO,VDON) <E,1,1>;
[I]{} "I(icl>person)" (1SG,HPRON,PRON,SUBJ) <E,1,1>;
[me]{} "I(icl>person)" (1SG,HPRON,OBJ,PRON) <E,1,1>;
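The entry format above can be checked with a short Python parser. This is a sketch, not DicBld: it assumes plain ASCII double quotes around the UW (as in the examples), no escaped characters inside fields, and one entry per line; the names `ENTRY_RE`, `DictEntry`, and `parse_entry` are hypothetical.

```python
import re
from dataclasses import dataclass
from typing import Optional, Tuple

# One regular expression per entry, following [HW]{ID} "UW" (ATTR,...) <FLG,FRE,PRI>;
ENTRY_RE = re.compile(
    r'^\[(?P<hw>[^\]]*)\]'                                  # [HW]
    r'\{(?P<id>[^}]*)\}\s*'                                 # {ID}, may be empty
    r'"(?P<uw>[^"]*)"\s*'                                   # "UW", may be empty
    r'\((?P<attrs>[^)]*)\)\s*'                              # (ATTR,ATTR,...)
    r'(?:<(?P<flg>[^,>]*),(?P<fre>\d+),(?P<pri>\d+)>\s*)?'  # optional <FLG,FRE,PRI>
    r';\s*$'                                                # entry ends with a semicolon
)


@dataclass
class DictEntry:
    headword: str
    ident: str
    uw: str
    attrs: Tuple[str, ...]
    flg: Optional[str] = None
    fre: Optional[int] = None
    pri: Optional[int] = None


def parse_entry(line: str) -> DictEntry:
    m = ENTRY_RE.match(line.strip())
    if m is None:
        raise ValueError(f"not a valid dictionary entry: {line!r}")
    return DictEntry(
        headword=m["hw"],
        ident=m["id"],
        uw=m["uw"],
        attrs=tuple(a for a in m["attrs"].split(",") if a),
        flg=m["flg"],  # None when <FLG,FRE,PRI> is omitted
        fre=int(m["fre"]) if m["fre"] is not None else None,
        pri=int(m["pri"]) if m["pri"] is not None else None,
    )


e = parse_entry('[bought]{} "buy(icl>purchase(agt>thing,obj>thing))" '
                '(AGT.S,ED,EN,IRG,OBJ.DO,V,VDO,VDON) <E,1,1>;')
print(e.headword, e.uw, e.attrs, e.flg, e.fre, e.pri)
```

Because the UW is matched between quotes, the commas and parentheses inside `buy(icl>purchase(agt>thing,obj>thing))` do not interfere with splitting the attribute list.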


Grammar Rule


The DeConverter works based on a set of deconversion rules, and the EnConverter on a set of enconversion rules. Both sets of rules must be prepared for each language, and each set controls the process of enconversion or deconversion. A rule describes under what conditions which operation is performed, using grammatical features defined by the rules, given in a word dictionary, or both. In this way, a set of enconversion or deconversion rules can be prepared for any desired language, allowing the EnConverter or DeConverter to handle that language.

Rule application is controlled by priorities and allows backtracking. When the EnConverter or DeConverter encounters an ungrammatical or illogical situation, the rules can force the process to backtrack. Backtracking returns to the previous state, where the next candidate word or morpheme is selected or the next-priority rule is applied, and the conversion process then proceeds.
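Priority-ordered selection with backtracking can be sketched as a depth-first search. The function `select_with_backtracking`, the `needs_verb` check, and the `/N` and `/V` labels are hypothetical stand-ins: each position offers candidates already sorted by priority, and a rejected partial choice sends the search back to the previous position to try its next candidate. This mirrors the behavior described above, not the actual rule engine.

```python
from typing import Callable, List, Optional, Sequence


def select_with_backtracking(
    candidates: Sequence[Sequence[str]],
    is_acceptable: Callable[[List[str]], bool],
) -> Optional[List[str]]:
    """Pick one candidate per position, trying candidates in priority order
    (index 0 first). When is_acceptable rejects a partial choice, return to
    the previous state and try the next candidate there."""
    chosen: List[str] = []

    def search(pos: int) -> bool:
        if pos == len(candidates):
            return True
        for cand in candidates[pos]:      # next-priority candidate
            chosen.append(cand)
            if is_acceptable(chosen) and search(pos + 1):
                return True
            chosen.pop()                  # backtrack to the previous state
        return False

    return list(chosen) if search(0) else None


# Hypothetical check: a complete four-word sentence must contain a verb.
def needs_verb(partial: List[str]) -> bool:
    return len(partial) < 4 or any(w.endswith("/V") for w in partial)


sentence = [["I"], ["saw/N", "saw/V"], ["a"], ["book/N"]]
print(select_with_backtracking(sentence, needs_verb))
```

The noun reading of "saw" is tried first because it has higher priority; the failure is only detected at the last word, so the search unwinds two levels before trying the verb reading.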

The operation performed in enconversion or deconversion is determined by the type of a rule in combination with its sort. There are three sorts of rules, as follows.

Rewriting rule
<TYPE> (<PRE>)… {<LNODE>} {<RNODE>} (<SUF>)… P<PRI>;

Left-insertion rule
<TYPE> (<PRE>)… "<LNODE>" (<MID>)… {<RNODE>} (<SUF>)… P<PRI>;

Right-insertion rule
<TYPE> (<PRE>)… {<LNODE>} (<MID>)… "<RNODE>" (<SUF>)… P<PRI>;

<TYPE>                 type of rule
<LNODE>, <RNODE>       ::= <COND>:<ACTION>:<RELATION>:<ROLE>
<PRE>, <MID>, <SUF>    ::= <COND>, can be omitted
<COND>                 conditions for applying the rule
<ACTION>               actions (changes of conditions) after the rule is applied
<RELATION>             semantic relation between <LNODE> and <RNODE>
<ROLE>                 co-occurrence relation between <LNODE> and <RNODE>
<PRI>                  priority of the rule
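The three sorts of rules differ only in which node is enclosed in braces and which in double quotes, so they can be told apart mechanically. The sketch below assumes the simplified layout shown above (conditions inside parentheses, no braces or quotes inside node fields, `P<PRI>;` at the end); the sample rule strings are hypothetical and have not been checked against the full EnConverter and DeConverter specifications, whose condition and action syntax is richer.

```python
import re
from typing import Dict, List, Tuple

# A node is either {...} or "..."; parentheses hold <PRE>, <MID>, <SUF>.
NODE_RE = re.compile(r'\{([^}]*)\}|"([^"]*)"')
PRI_RE = re.compile(r'P(\d+)\s*;\s*$')

SORTS = {
    ("{", "{"): "rewriting",
    ('"', "{"): "left-insertion",
    ("{", '"'): "right-insertion",
}
FIELDS = ("cond", "action", "relation", "role")


def parse_node(body: str) -> Dict[str, str]:
    """Split <COND>:<ACTION>:<RELATION>:<ROLE> into named fields."""
    parts = body.split(":")
    if len(parts) != 4:
        raise ValueError(f"node must have four ':'-separated fields: {body!r}")
    return dict(zip(FIELDS, parts))


def classify_rule(rule: str) -> dict:
    nodes: List[Tuple[str, str]] = []
    for m in NODE_RE.finditer(rule):
        delim = m.group(0)[0]  # "{" or '"'
        body = m.group(1) if m.group(1) is not None else m.group(2)
        nodes.append((delim, body))
    if len(nodes) != 2:
        raise ValueError("expected exactly two nodes (<LNODE> and <RNODE>)")
    sort = SORTS.get((nodes[0][0], nodes[1][0]))
    if sort is None:
        raise ValueError("at most one node may be quoted")
    pri = PRI_RE.search(rule)
    if pri is None:
        raise ValueError("missing P<PRI>; at the end of the rule")
    return {
        "sort": sort,
        "left": parse_node(nodes[0][1]),
        "right": parse_node(nodes[1][1]),
        "priority": int(pri.group(1)),
    }


print(classify_rule('R {N:::} {V:+VP:agt:} P10;')["sort"])  # rewriting
```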

For details, see the specifications of the DeConverter and EnConverter at http://www.undl.org/unlsys/ds.html

UNL Document


For information on UNL Documents, see "UNL Document" under "UNL Expression" in the UNL section.

UNL KCIC


UNLKCIC is information on the Key Concepts in Context (KCIC) of UNL Expressions. Such information is collected for every binary relation in the UNL Expressions and is used when searching UNL Documents for information. Through UNLKCIC, every UW of the UNL Ontology is linked with the UNL Documents in which it appears. Consequently, all UWs included in a UNL Document are linked with the corresponding UNL Documents through the UNL Ontology. For this cross-document linking of UWs to work, every UW must be registered in the UNL Ontology.
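The linking of UWs to documents through KCIC can be pictured as an inverted index keyed by UW. In this minimal sketch, each document is a hypothetical list of `(relation, UW1, UW2)` triples, the binary relation in which a UW occurs is kept as its context, and the registration requirement is enforced by rejecting UWs missing from a `registered_uws` set. The function name and data are assumptions; the real UNLKCIC is stored in IBAM format, which this sketch does not model.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Set, Tuple

Relation = Tuple[str, str, str]  # (relation label, UW1, UW2)


def build_kcic_index(
    documents: Dict[str, Iterable[Relation]],
    registered_uws: Set[str],
) -> Dict[str, Dict[str, List[Relation]]]:
    """Map each UW to the documents it occurs in, keeping the binary
    relations (its context) in which it occurs."""
    index: Dict[str, Dict[str, List[Relation]]] = defaultdict(lambda: defaultdict(list))
    for doc_id, relations in documents.items():
        for rel in relations:
            _, uw1, uw2 = rel
            for uw in {uw1, uw2}:  # a set, so rel(x, x) is recorded once
                if uw not in registered_uws:
                    raise ValueError(f"UW not registered in the ontology: {uw}")
                index[uw][doc_id].append(rel)
    return {uw: dict(by_doc) for uw, by_doc in index.items()}


docs = {
    "d1": [("agt", "buy", "I"), ("obj", "buy", "book")],
    "d2": [("mod", "book", "old")],
}
idx = build_kcic_index(docs, {"buy", "I", "book", "old"})
print(sorted(idx["book"]))  # ['d1', 'd2']
```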

UW Dictionary


UWs and the corresponding words of natural languages are stored in the UW Dictionary. The UW Dictionary can be used as a multilingual dictionary by both people and computers, since all synonyms and equivalent words of different natural languages are linked together through UWs. For example, the UW Gate and the UNL Explorer use the UW Dictionary to display and search the UNL Ontology in natural languages.
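Using the UW Dictionary as a multilingual dictionary amounts to two lookups: from a word to the UWs it is linked with, and from each UW to its equivalents in the target language. The dictionary fragment, the UW spellings, and the function names below are hypothetical; grouping the result by UW keeps apart the different meanings of an ambiguous word such as English "book".

```python
from typing import Dict, List

# Hypothetical UW Dictionary fragment: UW -> language code -> equivalent words.
UW_DICT: Dict[str, Dict[str, List[str]]] = {
    "book(icl>document)": {"en": ["book", "volume"], "fr": ["livre"], "es": ["libro"]},
    "book(icl>reserve(agt>thing,obj>thing))": {"en": ["book", "reserve"], "fr": ["réserver"]},
}


def uws_for(word: str, lang: str, uw_dict: Dict[str, Dict[str, List[str]]] = UW_DICT) -> List[str]:
    """All UWs to which `word` in language `lang` is linked."""
    return [uw for uw, words in uw_dict.items() if word in words.get(lang, [])]


def translate(word: str, src: str, tgt: str,
              uw_dict: Dict[str, Dict[str, List[str]]] = UW_DICT) -> Dict[str, List[str]]:
    """Target-language equivalents of `word`, grouped by the linking UW."""
    return {uw: uw_dict[uw].get(tgt, []) for uw in uws_for(word, src, uw_dict)}


print(translate("book", "en", "fr"))
```

Synonyms fall out of the same structure: "volume" and "book" share the UW `book(icl>document)`, so both translate to "libro" in Spanish.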

UW Gate


The UW Gate gives people access to the UNL Ontology and the UW Dictionary through the Internet. Using the UW Gate, people can search for desired UWs, relations between UWs, equivalent words in desired natural languages, and so on. Authorized persons can also define new UWs or register new equivalent words of natural languages. New UWs are placed at appropriate positions in the UW System by following the guidance of the UW Gate, so that the functions of the UNL Ontology work correctly. New equivalent words of natural languages are registered by linking them to existing UWs. The UW Gate can also be used as an online multilingual dictionary.

The main characteristic of the UW Gate is semantic co-occurrence relation search. Three kinds of queries can be made against the UNL Ontology: whether a relation between two UWs holds, which UWs can have a relation with a given UW, and which relations are possible between two UWs. Every search is carried out by inference using the property inheritance mechanism of the UW System.
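The three kinds of queries and their reliance on property inheritance can be sketched as follows. The hierarchy (`UPPER`), the relation definitions (`DEFINED`), and the function names are hypothetical and heavily simplified, with plain UW names without restrictions and one upper UW per UW: a relation defined between two general UWs is treated as holding for every pair of UWs below them.

```python
from typing import Dict, List, Optional, Set, Tuple

# Hypothetical, simplified UW System: each UW has at most one upper UW.
UPPER: Dict[str, str] = {
    "person": "thing", "document": "thing",
    "I": "person", "book": "document",
    "buy": "do",
}
# Hypothetical relation definitions stated at a general level.
DEFINED: Set[Tuple[str, str, str]] = {("agt", "do", "person"), ("obj", "do", "thing")}


def lineage(uw: str) -> List[str]:
    """The UW itself followed by all of its upper UWs."""
    chain: List[str] = []
    current: Optional[str] = uw
    while current is not None and current not in chain:
        chain.append(current)
        current = UPPER.get(current)
    return chain


def holds(rel: str, uw1: str, uw2: str) -> bool:
    """Query 1: is rel(uw1, uw2) true? Definitions on upper UWs are inherited."""
    return any((rel, a, b) in DEFINED for a in lineage(uw1) for b in lineage(uw2))


def partners(rel: str, uw1: str) -> List[str]:
    """Query 2: which known UWs can fill the second slot of rel(uw1, ?)."""
    known = set(UPPER) | set(UPPER.values())
    return sorted(uw for uw in known if holds(rel, uw1, uw))


def possible_relations(uw1: str, uw2: str) -> List[str]:
    """Query 3: which relations are possible between uw1 and uw2."""
    return sorted({r for r, _, _ in DEFINED if holds(r, uw1, uw2)})


print(holds("agt", "buy", "I"), partners("agt", "buy"), possible_relations("buy", "book"))
```

Nothing is stated about `buy` or `I` directly; every answer comes from definitions on `do`, `person`, and `thing` inherited down the hierarchy.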

The UW Gate can be used at http://www.undl.org/uwgate/.

Universal Parser


The Universal Parser generates UNL Expressions from sentences using only language-independent annotations, without any language-dependent grammatical information. Input sentences of the Universal Parser must be annotated with UNL Annotation. The Universal Parser analyzes the annotated input sentences using the Universal Parser Rules and a UW Dictionary.

The Universal Parser Rules describe operations for creating UNL Expressions using only the information in the tags inserted into input sentences. The UW Dictionary provides the UWs linked with the words of the input sentences. The Universal Parser analyzes the input sentences according to the Universal Parser Rules and generates UNL Expressions using the UWs linked with the words of the input sentences.

This mechanism allows the Universal Parser to deal with any language; in this sense, the Universal Parser is “universal”.

To use the Universal Parser in a truly universal way, either the UW Dictionary must include every form of each word, or, if the UW Dictionary contains only base forms, all inflected forms in the input sentences must first be changed into base forms. Alternatively, by simply extending the Universal Parser Rules with a set of morphological analysis rules for a language, a morphologically customized annotation-based parser for that language can easily be made.


Figure 6. Structure of Universal Parser

For details, see http://www.undl.org/unlsys/uparser/UP.htm (1.0, 2003). The UP can be used at www.undl.org/up/.

For more information about the UP, please contact the UNL Center.

UNL Verifier


The UNL Verifier checks whether a UNL Expression is syntactically, lexically, and semantically correct. The syntax check follows the UNL Specifications. The lexical check verifies that all UWs of the UNL Expression are defined in the UNL Ontology. The semantic check consults the UNL Ontology to confirm that each binary relation of the UNL Expression is defined as possible. Figure 7 shows the flowchart of the UNL Verifier and how the dictionaries are used.


Figure 7. Flowchart of UNL Verifier
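The three checks performed by the UNL Verifier can be sketched for expressions written as one binary relation per line, in the form `rel(uw1,uw2)`. This is a simplified illustration: the syntax check only tests that form with balanced parentheses, not the full UNL Specifications, and attributes such as @entry or scope identifiers are not handled. The ontology contents, the `is_possible` test, and all function names are hypothetical stand-ins for consulting the UNL Ontology.

```python
import re
from typing import Callable, Iterable, List, Optional, Set, Tuple

REL_RE = re.compile(r"^([a-z]+)\((.*)\)$")  # rel(<arguments>)


def split_args(body: str) -> Optional[Tuple[str, str]]:
    """Split 'uw1,uw2' at its single top-level comma; UWs may contain
    nested parentheses and commas of their own."""
    depth, cuts = 0, []
    for i, ch in enumerate(body):
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
            if depth < 0:
                return None
        elif ch == "," and depth == 0:
            cuts.append(i)
    if depth != 0 or len(cuts) != 1:
        return None
    left, right = body[:cuts[0]].strip(), body[cuts[0] + 1:].strip()
    return (left, right) if left and right else None


def verify(
    lines: Iterable[str],
    ontology_uws: Set[str],
    is_possible: Callable[[str, str, str], bool],
) -> List[str]:
    """Return a list of error messages; an empty list means all checks passed."""
    errors: List[str] = []
    for line in lines:
        text = line.strip()
        m = REL_RE.match(text)
        args = split_args(m.group(2)) if m else None
        if m is None or args is None:                 # syntax check
            errors.append(f"syntax: {text}")
            continue
        rel, (uw1, uw2) = m.group(1), args
        unknown = [uw for uw in (uw1, uw2) if uw not in ontology_uws]
        if unknown:                                   # lexical check
            errors.extend(f"lexical: {uw}" for uw in unknown)
            continue
        if not is_possible(rel, uw1, uw2):            # semantic check
            errors.append(f"semantic: {text}")
    return errors


BUY = "buy(icl>purchase(agt>thing,obj>thing))"
ONTOLOGY = {BUY, "I(icl>person)", "book(icl>document)"}
ALLOWED = {("agt", BUY, "I(icl>person)"), ("obj", BUY, "book(icl>document)")}


def possible(rel: str, uw1: str, uw2: str) -> bool:
    return (rel, uw1, uw2) in ALLOWED


print(verify([f"agt({BUY},I(icl>person))", f"agt({BUY},book(icl>document))"], ONTOLOGY, possible))
```

The checks run in the order of the flowchart: a line that fails the syntax check is not looked up in the ontology, and a relation with an unknown UW is not tested semantically.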


Language Server


UNL Language Servers (LSs) are located on the Internet and carry out the conversion between natural languages and UNL Expressions. A Language Server contains an EnConverter and a DeConverter for a language. A Language Server starts a conversion when it receives a request from an application, including web applications, and returns the result when the conversion is complete. Figure 8 shows an example of how the Language Servers of the UNL System work through the Internet.


Figure 8. How Language Servers work through the Internet