000 | 02047nam a2200169Ia 4500 | ||
---|---|---|---|
999 |
_c47049 _d47049 |
||
005 | 20200308073504.0 | ||
008 | 190927s9999 xx 000 0 und d | ||
082 | _aT Al863 2012 | ||
100 | _aAlvarez, Dennis C. | ||
245 | 0 | _aXML-based corpus building for Cebuano language using Wolff dictionary | |
260 | _a3 | ||
260 | _b602 | ||
260 | _c2012 | ||
520 | _aDennis C. Alvarez; Cebu Institute of Technology-University; March 2012; XML-based Corpus Building for Cebuano Language using Wolff Dictionary. Adviser: Dr. Larmie T. Santos-Feliscuzo. XML-based Corpus Building for Cebuano Language using Wolff Dictionary is a research study in Natural Language Processing (NLP). It is a system in which the entries of the Cebuano dictionary, which is in PDF format, are extracted, given proper tags, and then converted into an XML file. The generated XML file helps in managing the data entries of the dictionary. The system is implemented in Java using NetBeans IDE 6.9.1, with the help of libraries such as Apache PDFBox and POI Jakarta. The Apache PDFBox library is used to extract the data from the PDF file and convert it into a text file. The POI Jakarta library is used to write an Excel file for the randomization process that identifies which six letters are going to be processed in the testing phase. The file handling process is used to convert the properly tagged entries into their equivalent XML file. The XML file now contains the dictionary entries with proper tags, and the data are managed within the XML file. It can now be used by other NLP projects such as machine translation, text-to-speech, spell checking, and others. For further enhancement, it would be best if the data extracted from the PDF file matched the file exactly. Also, the Non-deterministic Finite Automaton (NFA), the algorithm used in making the XML tagger, will be improved in order to tag those entries that have a unique structure in the dictionary. | ||
526 | _a000-099 | ||
942 |
_cRB _2ddc |