XML-based corpus building for Cebuano language using Wolff dictionary (Record no. 47049)

000 -LEADER
fixed length control field 02047nam a2200169Ia 4500
005 - DATE AND TIME OF LATEST TRANSACTION
control field 20200308073504.0
008 - FIXED-LENGTH DATA ELEMENTS--GENERAL INFORMATION
fixed length control field 190927s9999 xx 000 0 und d
082 ## - DEWEY DECIMAL CLASSIFICATION NUMBER
Classification number T Al863 2012
100 ## - MAIN ENTRY--PERSONAL NAME
Preferred name for the person Alvarez, Dennis C.
245 #0 - TITLE STATEMENT
Title XML-based corpus building for Cebuano language using Wolff dictionary
260 ## - PUBLICATION, DISTRIBUTION, ETC. (IMPRINT)
Place of Publication Cebu City
Name of Publisher CIT-U
Date of Publication 2012
520 ## - SUMMARY, ETC.
Summary, etc Dennis C. Alvarez; Cebu Institute of Technology-University; March 2012; XML-based Corpus Building for Cebuano Language using Wolff Dictionary

Adviser: Dr. Larmie T. Santos-Feliscuzo

XML-based Corpus Building for Cebuano Language using Wolff Dictionary is a research study in the field of Natural Language Processing (NLP). It presents a system that extracts the entries of the Cebuano dictionary from its PDF format, assigns them the proper tags, and converts them into an XML file. The generated XML file helps in managing the data entries of the dictionary.

The system is implemented in Java using NetBeans IDE 6.9.1, with the help of libraries like Apache PDFBox and POI Jakarta. The Apache PDFBox library is used to extract the data from the PDF file and convert it into a text file. The POI Jakarta library is used to write an Excel file for the randomization process, which identifies the six letters to be processed in the testing phase. File handling is used to convert the tagged entries into their equivalent XML file. The XML file then contains the dictionary entries with proper tags, and the data are managed within it. It can be used by other NLP projects such as machine translation, text-to-speech, spell checking, and others.

For further enhancement, the data extracted from the PDF file should match the source file exactly. Also, the Non-deterministic Finite Automaton (NFA), the algorithm used in building the XML tagger, should be improved so that it can tag entries that have a unique structure in the dictionary.
526 ## - STUDY PROGRAM INFORMATION NOTE
-- 000-099
942 ## - ADDED ENTRY ELEMENTS
Item type RESERVED BOOKS
Source of classification or shelving scheme
Holdings
Withdrawn status
Lost status
Damaged status
Not for loan
Permanent Location COLLEGE LIBRARY
Current Location COLLEGE LIBRARY
Date acquired 2019-09-27
Full call number T Al863 2012
Barcode T1692
Date last seen 2019-09-27
Price effective from 2019-09-27
Item type RESERVED BOOKS