
XML-based corpus building for Cebuano language using Wolff dictionary (Record no. 47049)

000 -LEADER
fixed length control field	02047nam a2200169Ia 4500
005 - DATE AND TIME OF LATEST TRANSACTION
control field	20200308073504.0
008 - FIXED-LENGTH DATA ELEMENTS--GENERAL INFORMATION
fixed length control field	190927s9999 xx 000 0 und d
082 ## - DEWEY DECIMAL CLASSIFICATION NUMBER
Classification number	T Al863 2012
100 ## - MAIN ENTRY--PERSONAL NAME
Preferred name for the person	Alvarez, Dennis C.
245 #0 - TITLE STATEMENT
Title	XML-based corpus building for Cebuano language using Wolff dictionary
260 ## - PUBLICATION, DISTRIBUTION, ETC. (IMPRINT)
Place of Publication	Cebu City
260 ## - PUBLICATION, DISTRIBUTION, ETC. (IMPRINT)
Name of Publisher	CIT-U
260 ## - PUBLICATION, DISTRIBUTION, ETC. (IMPRINT)
Date of Publication	2012
520 ## - SUMMARY, ETC.
Summary, etc	Dennis C. Alvarez; Cebu Institute of Technology-University; March 2012; XML-based Corpus Building for Cebuano Language using Wolff Dictionary. Adviser: Dr. Larmie T. Santos-Feliscuzo. XML-based Corpus Building for Cebuano Language using Wolff Dictionary is a research study in the field of Natural Language Processing (NLP). It is a system in which the entries of the Cebuano dictionary, available in PDF format, are extracted, given proper tags, and then converted into an XML file. The generated XML file helps in managing the data entries of the dictionary. The system is implemented in Java using NetBeans IDE 6.9.1, with the help of libraries such as Apache PDFBox and POI Jakarta. The Apache PDFBox library is used to extract the data from the PDF file and convert it into a text file. The POI Jakarta library is used to write an Excel file for the randomization process, which identifies the six letters to be processed in the testing phase. File handling is used to convert the properly tagged entries into their equivalent XML file. The XML file now contains the dictionary entries with proper tags, and the data are managed within the XML file. It can now be used by other NLP projects such as machine translation, text-to-speech, spell checking, and others. For further enhancement, it would be best if the data extracted from the PDF file matched the original file exactly. Also, the Non-deterministic Finite Automaton (NFA), the algorithm used in building the XML tagger, should be improved so that it can tag entries that have a unique structure in the dictionary.
526 ## - STUDY PROGRAM INFORMATION NOTE
--	000-099
942 ## - ADDED ENTRY ELEMENTS
Item type	RESERVED BOOKS
Source of classification or shelving scheme

Holdings
Withdrawn status	Lost status	Damaged status	Not for loan	Permanent Location	Current Location	Date acquired	Full call number	Barcode	Date last seen	Price effective from	Item type
				COLLEGE LIBRARY	COLLEGE LIBRARY	2019-09-27	T Al863 2012	T1692	2019-09-27	2019-09-27	RESERVED BOOKS