UNL Development Set
The UDS is open to members of the UNL Society who have signed the UDS agreement. Further information on the UNL Society is available here. For more information or questions, please contact us at unlcenter@undl.org.
The UDS (UNL Development Set) is a set of UNL System tools that developers use to build conversion modules between languages and UNL. It contains the DeConverter, the EnConverter, the Word Dictionary Builder, and the specifications and manuals for these tools. More information on these tools is available at the UNL System.

Who can use the UDS?

To use the UDS, it is necessary to sign the following agreements:
"AGREEMENT TO ENTER THE UNL SOCIETY"
"UNL DEVELOPMENT SET LICENSE OF AGREEMENT"

For more information on how to enter the UNL Society, see "how to enter" under "UNL Society" at: http://www.undl.org/

To
develop a language deconversion module with the DeConverter provided by the UNL Center, a developer needs to build a word dictionary and a set of deconversion rules for the language. For each UW that appears in the UNL Expressions given to the DeConverter as input, the word dictionary provides the corresponding word of the language, together with the grammatical attributes (features) of that headword. The deconversion rules describe the operations by which UNL Expressions are deconverted into sentences of the language. Detailed information on the DeConverter, the deconversion rules and the word dictionary is given in the specifications of the DeConverter and the manual of the Word Dictionary Builder. All tools, specifications and manuals of the UDS can be downloaded from the download page.

In
the following explanation, "d.txt" is a list of example entries of an English word dictionary, "elgexam.txt" is a set of example English deconversion rules that includes the rules needed to deconvert "example.unl", and "example.unl" is an example of a UNL Expression.

To start developing a deconversion module, simply follow the steps below.

STEP 1: To
prepare text data for the word dictionary entries of the language
The description format for word dictionary entries is given in the manual of the Word Dictionary Builder.

STEP 2: To convert the text data of the word dictionary entries into IBAM-formatted files
"DicBldL.exe" is used to convert word dictionary data for one-byte code languages, and "DicBldC.exe" is used for two-byte code languages. Usage of the Dictionary Builder tools is shown in the manual. "d.dic" and "d.pix" are examples of IBAM-formatted files made from "d.txt" using "DicBldL.exe".

STEP 3: To
write deconversion rules
Information on how to write deconversion rules is given in the specifications of the DeConverter. "elgexam.txt" is an example set of English deconversion rules for deconverting "example.unl".

STEP 4: To
deconvert
The "DeCoL" version is used to deconvert UNL into one-byte code languages, and the "DeCoC" version is used to deconvert UNL into two-byte code languages. "example_decoe.txt" shows the results of deconverting "example.unl". Usage of the DeConverter is described in its specifications.

STEP 5: To check the results and, if they are not correct, revise the dictionary entries or rules
The DeConverter can output detailed traces of the deconversion process, and problems can be detected by checking these traces. The information included in the traces is explained in the specifications.

To
develop a language enconversion module with the EnConverter provided by the UNL Center, a developer needs to build a word dictionary and a set of enconversion rules for the language. The word dictionary provides the corresponding UWs for the words in input sentences of the language, together with the grammatical attributes (features) of those words.
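The two directions in which a word dictionary is consulted can be sketched as follows. This is a minimal in-memory illustration under stated assumptions, not the Word Dictionary Builder's text format or its IBAM files (both are defined in the manual); the class and method names, the sample UWs and the attribute labels "N", "V" and "PL" are hypothetical. In this sketch the same entries serve both sides, since this document uses "d.txt" in both the deconversion and the enconversion walkthroughs: a word from an input sentence is looked up to find candidate UWs, and a UW from a UNL Expression is looked up to find candidate headwords.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet, Iterable, List


@dataclass(frozen=True)
class Entry:
    """One hypothetical dictionary entry: a headword, its UW and its features."""
    headword: str
    uw: str
    attributes: FrozenSet[str]


class WordDictionary:
    """Hypothetical in-memory index of entries, keyed by headword and by UW."""

    def __init__(self, entries: Iterable[Entry]) -> None:
        self._by_word: Dict[str, List[Entry]] = {}
        self._by_uw: Dict[str, List[Entry]] = {}
        for entry in entries:
            self._by_word.setdefault(entry.headword, []).append(entry)
            self._by_uw.setdefault(entry.uw, []).append(entry)

    def candidates_for_word(self, word: str) -> List[Entry]:
        """Enconversion side: word in an input sentence -> candidate entries."""
        return list(self._by_word.get(word, []))

    def candidates_for_uw(self, uw: str) -> List[Entry]:
        """Deconversion side: UW in a UNL Expression -> candidate entries."""
        return list(self._by_uw.get(uw, []))


# Illustrative entries only; real UWs and attributes come from your own dictionary.
SAMPLE = [
    Entry("book", "book(icl>publication)", frozenset({"N"})),
    Entry("book", "book(icl>do)", frozenset({"V"})),
    Entry("books", "book(icl>publication)", frozenset({"N", "PL"})),
]

if __name__ == "__main__":
    dictionary = WordDictionary(SAMPLE)
    # A word with several entries yields several candidate UWs;
    # this sketch returns them all and does not choose between them.
    print([e.uw for e in dictionary.candidates_for_word("book")])
    print(sorted(e.headword for e in dictionary.candidates_for_uw("book(icl>publication)")))
```

Keeping one index per direction simply reflects that the same dictionary content is read from opposite sides during enconversion and deconversion; how the actual UDS tools index their IBAM files is not described in this document.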
The enconversion rules describe the operations by which sentences of the language are enconverted into UNL Expressions. Detailed information on the EnConverter, the enconversion rules and the word dictionary is given in the specifications of the EnConverter and the manual of the Word Dictionary Builder. All tools, specifications and manuals of the UDS can be downloaded from the download page.

To start developing an enconversion module, simply follow the steps below.

STEP 1: To
prepare text data of word dictionary entries for the words in the input sentences
Corresponding UWs must be given to meaningful words, because the EnConverter uses these UWs to create UNL Expressions. The description format for word dictionary entries is given in the manual of the Word Dictionary Builder. "eng.txt" is an example file of English input sentences, and "d.txt" is an example English word dictionary that includes entries for the words in "eng.txt".

STEP 2: To convert the text data of the word dictionary entries into IBAM-formatted files
"DicBldL.exe" is used to convert word dictionary data for one-byte code languages, and "DicBldC.exe" is used for two-byte code languages. Usage of the Dictionary Builder tools is shown in the manual. "d.dic" and "d.pix" are examples of IBAM-formatted files made from "d.txt" using "DicBldL.exe".

STEP 3: To write enconversion rules
Information
on how to write enconversion rules is given in the specifications of the EnConverter. "elaexam.txt" is an example set of English enconversion rules for enconverting "eng.txt".

STEP 4: To enconvert
The "EnCoL" version is used to enconvert sentences of one-byte code languages, and the "EnCoC" version is used to enconvert sentences of two-byte code languages. "eng.unl" contains the results of enconverting "eng.txt". Usage of the EnConverter is described in its specifications.

STEP 5: To check the results and, if they are not correct, revise the dictionary entries or rules
The EnConverter can output detailed traces of the enconversion process, and problems can be detected by checking these traces. The information included in the traces is explained in the specifications.

There are two versions of the
DeConverter, EnConverter and Dictionary Builders: the C-Version and the L-Version. The C-Versions are developed for two-byte code languages such as Chinese (GB code), Korean (KIS code) and Thai. The L-Versions are developed for ASCII and other one-byte code languages such as Arabic, Latin languages and Hindi.
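Choosing between the two versions can be summarised as a small lookup. This is a hypothetical helper, not part of the UDS: it only encodes the tool names and example languages mentioned in this document ("DicBldL.exe"/"DicBldC.exe", "DeCoL"/"DeCoC", "EnCoL"/"EnCoC"), and for any language not listed it refuses to guess, because the right choice depends on the character code actually used for that language.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class UdsToolSet:
    """Names of the three UDS tools for one version (hypothetical grouping)."""
    version: str
    dictionary_builder: str
    deconverter: str
    enconverter: str


L_VERSION = UdsToolSet("L", "DicBldL.exe", "DeCoL", "EnCoL")  # one-byte code languages
C_VERSION = UdsToolSet("C", "DicBldC.exe", "DeCoC", "EnCoC")  # two-byte code languages

# Only the example languages named in this document; extend for your own language.
KNOWN_LANGUAGES = {
    "chinese": C_VERSION,  # GB code
    "korean": C_VERSION,   # KIS code
    "thai": C_VERSION,
    "english": L_VERSION,  # ASCII
    "arabic": L_VERSION,
    "hindi": L_VERSION,
}


def tools_for(language: str) -> UdsToolSet:
    """Return the tool set for a listed language, or raise ValueError if unlisted."""
    try:
        return KNOWN_LANGUAGES[language.strip().lower()]
    except KeyError:
        raise ValueError(
            f"{language!r} is not listed; choose the one-byte (L) or two-byte (C) "
            "version according to the character code you use"
        ) from None


if __name__ == "__main__":
    print(tools_for("Korean").deconverter)
    print(tools_for("English").dictionary_builder)
```

Run as a script, this prints "DeCoC" and then "DicBldL.exe", matching the English example files above, which were built with "DicBldL.exe".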