Probabilistic speech recognition for Tagalog lecture video / Chuchi B. Montenegro.

By: Montenegro, Chuchi B [author]
Description: 79 leaves
Content type: text
Media type: unmediated
Carrier type: volume
Subject(s): Speech processing systems | Human engineering | Automatic speech recognition
DDC classification: 006.454
Dissertation note: Thesis (Master in Computer Science) -- Cebu Institute of Technology - University, March 2009.
Summary: This project develops a system that allows Tagalog text to be searched for within Tagalog speech. It applies a pattern-matching technique within a text-dependent speech recognition framework to return the most likely time at which the searched word occurs. Audio files are recorded in a controlled environment, sampled at 16 kHz in 16-bit mono format, and windowed at 256 samples per frame. The signal is transformed into the frequency domain using a windowed Fast Fourier Transform (FFT). An end-point detection algorithm classifies the signal in the sample audio files as voiced or unvoiced. The FFT analyzes each of the voiced signals and converts the audio data into the frequency domain. Each voiced-signal result is a graph of the amplitudes of its frequency components, describing the sound heard for that particular signal. Probabilistic Speech Recognition for Tagalog Lecture Videos (PSRTLV) encompasses a database of such graphs (called a codebook) that identify the different types of sounds the human voice can make. A sound is identified by matching it to its closest entry in the codebook using Euclidean distance. Experimental results yield an average recognition rate of 30%.
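Illustrative sketch (not taken from the thesis): the pipeline outlined in the summary, 256-sample frames at 16 kHz, a windowed FFT, a voiced/unvoiced decision, and a nearest-codebook lookup by Euclidean distance, might look roughly like the Python below. The record does not say which window function or voicing criterion was used, so the Hamming window and the mean-energy threshold are assumptions, and all function names and the synthetic test tones are hypothetical.

```python
import cmath
import math

FRAME_SIZE = 256       # samples per frame, as stated in the summary
SAMPLE_RATE = 16_000   # 16 kHz, as stated in the summary


def fft(x):
    """Recursive radix-2 FFT; len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return [complex(x[0])]
    even = fft(x[0::2])
    odd = fft(x[1::2])
    half = n // 2
    out = [0j] * n
    for k in range(half):
        t = cmath.exp(-2j * math.pi * k / n) * odd[k]
        out[k] = even[k] + t
        out[k + half] = even[k] - t
    return out


def magnitude_spectrum(frame):
    """Apply a Hamming window (assumed) and return amplitudes of bins 0..n/2."""
    n = len(frame)
    windowed = [
        s * (0.54 - 0.46 * math.cos(2 * math.pi * i / (n - 1)))
        for i, s in enumerate(frame)
    ]
    return [abs(c) for c in fft(windowed)[: n // 2 + 1]]


def is_voiced(frame, energy_threshold):
    """Crude end-point test (assumed): mean squared amplitude above a threshold."""
    return sum(s * s for s in frame) / len(frame) > energy_threshold


def nearest_entry(spectrum, codebook):
    """Return the label of the codebook spectrum closest in Euclidean distance."""
    return min(codebook, key=lambda label: math.dist(spectrum, codebook[label]))


def tone(freq_hz):
    """Synthetic one-frame sine wave standing in for recorded speech."""
    return [
        math.sin(2 * math.pi * freq_hz * i / SAMPLE_RATE)
        for i in range(FRAME_SIZE)
    ]


# Hypothetical two-entry codebook built from synthetic tones.
codebook = {
    "low": magnitude_spectrum(tone(500)),
    "high": magnitude_spectrum(tone(3000)),
}

query = tone(520)
if is_voiced(query, energy_threshold=0.01):
    print(nearest_entry(magnitude_spectrum(query), codebook))
```

A real codebook would hold spectra derived from recorded Tagalog speech rather than pure tones; the tones only make the lookup easy to check by hand.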
Item type: THESIS / DISSERTATION
Current location: GRADUATE LIBRARY
Home library: GRADUATE LIBRARY
Call number: T M7646 2009
Status: Not for loan
Barcode: CL-T1507
Total holds: 0
