
Probabilistic speech recognition for Tagalog lecture video / Chuchi B. Montenegro.

By: Montenegro, Chuchi B. [author]

Description: 79 leaves
Content type: text
Media type: unmediated
Carrier type: volume
Subject(s): Speech processing systems | Human engineering | Automatic speech recognition
DDC classification: 006.454
Dissertation note: Thesis (Master in Computer Science) -- Cebu Institute of Technology - University, March 2009.
Summary: This project develops a system for searching Tagalog text in Tagalog speech recordings. It applies a pattern-matching technique within a text-dependent speech recognition framework to return the most likely time at which the searched word occurs. Audio files are recorded in a controlled environment as 16 kHz, 16-bit mono and divided into frames of 256 samples. An end-point detection algorithm classifies each signal in the sample audio files as voiced or unvoiced. A windowed Fast Fourier Transform (FFT) then converts each voiced signal into the frequency domain, yielding a graph of the amplitudes of its frequency components that describes the sound heard in that signal. Probabilistic Speech Recognition for Tagalog Lecture Videos (PSRTLV) maintains a database of such graphs, called a codebook, that identify the different types of sounds the human voice can make. A sound is identified by matching it to its closest codebook entry by Euclidean distance. Experimental results yield an average recognition rate of 30%.
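Illustrative note: the pipeline named in the summary (256-sample frames, end-point detection, windowed FFT, nearest codebook entry by Euclidean distance) can be sketched roughly as below. This is a minimal sketch, not the thesis implementation. The Hamming window, the energy-threshold voicing rule, and the names `magnitude_spectrum`, `is_voiced`, `nearest_entry`, and `tone` are assumptions made for illustration; only the frame size, the sample rate, the use of an FFT, and the Euclidean distance come from the record.

```python
import cmath
import math

FRAME_SIZE = 256      # samples per frame, as stated in the summary
SAMPLE_RATE = 16000   # 16 kHz, as stated in the summary


def fft(x):
    """Recursive radix-2 FFT; len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return [complex(x[0])]
    even = fft(x[0::2])
    odd = fft(x[1::2])
    out = [0j] * n
    for k in range(n // 2):
        t = cmath.exp(-2j * math.pi * k / n) * odd[k]
        out[k] = even[k] + t
        out[k + n // 2] = even[k] - t
    return out


def magnitude_spectrum(frame):
    """Apply a Hamming window (an assumption) and return the magnitudes
    of the non-negative frequency bins."""
    n = len(frame)
    windowed = [s * (0.54 - 0.46 * math.cos(2 * math.pi * i / (n - 1)))
                for i, s in enumerate(frame)]
    return [abs(c) for c in fft(windowed)[: n // 2 + 1]]


def is_voiced(frame, energy_threshold):
    """Assumed end-point rule: mean squared amplitude above a threshold."""
    return sum(s * s for s in frame) / len(frame) > energy_threshold


def nearest_entry(spectrum, codebook):
    """Return the label of the codebook entry closest in Euclidean distance."""
    return min(codebook, key=lambda label: math.dist(spectrum, codebook[label]))


def tone(freq_hz, amplitude=1.0):
    """Hypothetical test signal: one frame of a pure sine tone."""
    return [amplitude * math.sin(2 * math.pi * freq_hz * i / SAMPLE_RATE)
            for i in range(FRAME_SIZE)]


# Toy codebook with two synthetic "sounds"; a real codebook would hold
# spectra derived from recorded speech.
codebook = {
    "low": magnitude_spectrum(tone(500)),
    "high": magnitude_spectrum(tone(3000)),
}

frame = tone(520)
if is_voiced(frame, energy_threshold=0.01):
    print(nearest_entry(magnitude_spectrum(frame), codebook))
```

In this toy setup each frame is matched independently; returning the most likely time of a searched word would additionally require tracking the position of each matched frame in the recording, which the record does not describe in detail.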


Holdings ( 1 )

Item type	Current location	Home library	Call number	Status	Date due	Barcode	Item holds
THESIS / DISSERTATION	GRADUATE LIBRARY	GRADUATE LIBRARY	T M7646 2009	Not for loan		CL-T1507

Total holds: 0

