The advancement of Information and Communication Technology has affected all
parts of human life. It has changed the way we work, travel and think about
the real world. People with speech impairments normally use gestures to express
their feelings to others, which is a very difficult task. Technology has
developed very fast and now presents almost every action in digital form,
whether as images or as audio. To improve their lives, an application needs to
be developed that gives them the opportunity to express their feelings and
ideas and a chance to become familiar with new technologies.
A text-to-speech converter is a concept that helps people communicate better
with the rest of the world and among themselves, using an electronic device as
the medium. This application is mainly useful for people who are visually
impaired and people with speech impairments. It also brings them closer to
other people in the real world and helps them learn new technologies by
expressing their feelings effectively.
Optical character recognition (OCR), Text-to-Speech synthesis (TTS).
People with visual or speech impairments are usually deprived of normal
communication with other people in society. They often find it difficult to
interact with others through gestures, as only a few gestures are recognized
by most people. Text-to-speech conversion addresses this problem for visually
impaired and speech-impaired people.
The main objective of this application is to convert the
given text into a corresponding spoken waveform. Character recognition, text
processing and speech generation are the main components of this system.
A text image or text-to-speech converter converts a printed or handwritten
text image, or plain text, into speech. It is an artificial production
of human speech from the text that is input to an electronic machine. The
process of converting text to speech is called speech synthesis [2].
The text image or text-to-speech converter application takes a printed or
handwritten text image, or plain text, as input [1]. For image input, it
follows several steps: database creation, character recognition and
text-to-speech conversion. For text input, it simply performs text-to-speech
conversion.
This application takes a text image through the mobile phone's camera, or
text through a text box. The image is passed to the optical character
recognition phase to be converted into text, and the output of the OCR is the
input to the text-to-speech engine. The speech engine produces the output in
the form of audio, and the audio is recorded automatically at the same time.
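The two input paths described above can be sketched as a small dispatch function. This is only an illustrative sketch: `convert_to_speech`, `fake_ocr` and `fake_tts` are hypothetical names, and the two stand-in engines simply pass bytes through so that the example runs without a camera or a real speech engine.

```python
from typing import Callable, Optional


def convert_to_speech(
    ocr: Callable[[bytes], str],
    tts: Callable[[str], bytes],
    image: Optional[bytes] = None,
    text: Optional[str] = None,
) -> bytes:
    """Route an image through OCR first; typed text goes straight to TTS."""
    if image is not None:
        # Image path: recognize the characters, then speak the result.
        text = ocr(image)
    if not text or not text.strip():
        raise ValueError("no text to convert")
    return tts(text.strip())


# Stand-in engines (assumptions, not real OCR or TTS).
def fake_ocr(image: bytes) -> str:
    return image.decode("ascii")


def fake_tts(text: str) -> bytes:
    return b"AUDIO:" + text.encode("ascii")


typed = convert_to_speech(fake_ocr, fake_tts, text="hello")
scanned = convert_to_speech(fake_ocr, fake_tts, image=b" scanned page ")
```

Passing the engines in as parameters keeps the routing logic independent of whichever OCR and speech engines the platform provides.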
The application takes as input an image, or text entered through the text
box, and there is also a facility to retrieve previously stored speeches.
Optical character recognition (OCR):
This phase takes a printed or handwritten text image as input and, through
steps such as scanning, pre-processing, segmentation, and feature extraction
and selection, produces a text output. The output of optical character
recognition is the input to the speech engine.
This phase is generally implemented by the text-to-speech engine, predefined
software provided by Android, which consists of phases such as text analysis,
linguistic analysis and finally speech generation [9, 10].
The application stores previously generated speeches in memory. This is a
great advantage: if an input has already been converted to speech, the user
does not have to repeat the whole process. The speech file can simply be
retrieved from memory and reused [8].
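One way to picture this reuse of stored speech is a cache keyed by the input text. The sketch below is an assumption about how such storage could work, not the application's actual code: `SpeechCache` is a hypothetical class, the key is a hash of the whitespace- and case-normalized text (which assumes letter case does not change pronunciation), and the synthesizer is passed in as a plain function.

```python
import hashlib
from typing import Callable, Dict


class SpeechCache:
    """Keep generated audio so repeated inputs skip synthesis."""

    def __init__(self, synthesize: Callable[[str], bytes]) -> None:
        self._synthesize = synthesize
        self._store: Dict[str, bytes] = {}
        self.synth_calls = 0  # counts real conversions, for illustration

    @staticmethod
    def _key(text: str) -> str:
        # Assumption: case and extra spaces do not affect the spoken result.
        normalized = " ".join(text.lower().split())
        return hashlib.sha256(normalized.encode("utf-8")).hexdigest()

    def speak(self, text: str) -> bytes:
        key = self._key(text)
        if key not in self._store:
            # First time this text is seen: convert and remember it.
            self.synth_calls += 1
            self._store[key] = self._synthesize(text)
        return self._store[key]


# Stand-in synthesizer (an assumption, not a real speech engine).
cache = SpeechCache(lambda t: t.upper().encode("utf-8"))
first = cache.speak("Hello world")
second = cache.speak("hello   WORLD")  # served from the cache
```

In a real application the dictionary would be replaced by files on the device, but the lookup-before-synthesis pattern stays the same.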
Optical Character Recognition System (Paper
Speech Synthesis (text to speech)
OPTICAL CHARACTER RECOGNITION (OCR):
OCR is the mechanical or electronic translation of images of typewritten or
printed text into text [3].
BLOCK DIAGRAM: optical character recognition
The system takes a text image or text as input and sends it to the scanner.
Text digitization is the process of converting the image into a proper
digital image.
3.3. The scanned image typically has a resolution of 300-1000 dots per inch
for better accuracy of text extraction, and is preferably saved in TIF, JPG
or GIF format.
Pre-processing consists of a number of preliminary steps that make the raw
data usable by the recognizer. First, the scanned image is converted to a
gray scale image and then binarized.
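The gray scale conversion and binarization step can be illustrated with a minimal sketch. The function names and the fixed threshold of 128 are assumptions made for this example; the luminance weights are the standard ITU-R BT.601 values, and a practical system would usually choose the threshold adaptively.

```python
from typing import List, Tuple

RGB = Tuple[int, int, int]


def to_grayscale(image: List[List[RGB]]) -> List[List[int]]:
    """Weighted sum of R, G and B (ITU-R BT.601 luminance weights)."""
    return [
        [round(0.299 * r + 0.587 * g + 0.114 * b) for r, g, b in row]
        for row in image
    ]


def binarize(gray: List[List[int]], threshold: int = 128) -> List[List[int]]:
    """Mark dark pixels as ink (1) and light pixels as background (0)."""
    return [[1 if value < threshold else 0 for value in row] for row in gray]


# A tiny 2x2 "scan": white and light gray paper, black and dark gray ink.
scan = [
    [(255, 255, 255), (0, 0, 0)],
    [(200, 200, 200), (30, 30, 30)],
]
gray = to_grayscale(scan)
binary = binarize(gray)
```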
Skew detection and correction must be applied to the digitized image to make
the text lines horizontal. The noise-free image is then passed to the
segmentation step [4].
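The paper does not say which skew detection method is used. One common choice, shown here purely as an assumed example, is the projection-profile method: try several candidate angles and keep the one at which the ink pixels pile up into the fewest, densest rows. For simplicity the sketch approximates a small rotation by a vertical shear, and all function names are hypothetical.

```python
import math
from collections import Counter
from typing import Iterable, List, Tuple

Point = Tuple[int, int]  # (x, y) position of an ink pixel


def profile_score(points: List[Point], angle_deg: float) -> int:
    """Sum of squared row counts after undoing a skew of angle_deg."""
    slope = math.tan(math.radians(angle_deg))
    rows = Counter(round(y - x * slope) for x, y in points)
    return sum(count * count for count in rows.values())


def estimate_skew(points: List[Point], candidates: Iterable[float]) -> float:
    """Return the candidate angle that gives the sharpest row profile."""
    return max(candidates, key=lambda angle: profile_score(points, angle))


def deskew(points: List[Point], angle_deg: float) -> List[Point]:
    """Shear the pixels back so that text lines become horizontal."""
    slope = math.tan(math.radians(angle_deg))
    return [(x, round(y - x * slope)) for x, y in points]


# Two synthetic text lines, both tilted by 5 degrees.
tilt = math.tan(math.radians(5))
ink = [(x, 10 + round(x * tilt)) for x in range(200)]
ink += [(x, 40 + round(x * tilt)) for x in range(200)]

angle = estimate_skew(ink, range(-10, 11))
straightened_rows = {y for _, y in deskew(ink, angle)}
```

The squared row counts reward profiles with a few tall peaks, which is what horizontal text lines produce.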
Here the image
3.7. Feature extraction and classification:
Each character is divided into geometric elements such as lines, arcs and
circles, and the combination of these elements is compared with the stored
combinations for known characters [5].
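The comparison against stored combinations can be sketched as a nearest-match lookup. Everything in this example is an assumption made for illustration: the feature vector is just the count of lines, arcs and circles, the stored table holds a handful of made-up entries, and the closest entry by Manhattan distance is taken as the match so that slightly noisy counts still classify.

```python
from typing import Dict, Tuple

Features = Tuple[int, int, int]  # (lines, arcs, circles)

# Hypothetical stored combinations for a few known characters.
KNOWN_CHARACTERS: Dict[str, Features] = {
    "L": (2, 0, 0),
    "A": (3, 0, 0),
    "O": (0, 0, 1),
    "D": (1, 1, 0),
    "B": (1, 2, 0),
}


def distance(a: Features, b: Features) -> int:
    """Manhattan distance between two element-count vectors."""
    return sum(abs(x - y) for x, y in zip(a, b))


def classify(features: Features) -> str:
    """Return the known character whose stored combination is closest."""
    return min(
        KNOWN_CHARACTERS,
        key=lambda ch: distance(KNOWN_CHARACTERS[ch], features),
    )


exact = classify((1, 2, 0))  # one line and two arcs
noisy = classify((1, 3, 0))  # one spurious extra arc detected
```

Element counts alone cannot separate many real characters, so a practical recognizer would also store how the elements are arranged.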
Speech synthesis is the automatic generation of speech waveforms from input
text data [4].
Synthesized speech is produced from prerecorded speech that is stored in a
database. The TTS synthesizer is composed of two phases, as shown in the
figure: a front end and a back end [6, 7].
The two phases are:
the front end, which converts the input text to phonemes;
the back end, which converts the phonemes to waveforms that can be output as
sound.
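The division of work between the front end and the back end can be shown with a toy example. This is a sketch under heavy assumptions: the two-word lexicon and the phoneme labels are made up for the example, short sine tones stand in for the prerecorded speech units, and the back end simply joins the stored units end to end.

```python
import math
from typing import Dict, List

SAMPLE_RATE = 8000   # samples per second (assumed)
UNIT_SAMPLES = 400   # length of each stored unit, 50 ms at 8 kHz

# Hypothetical lexicon: word -> phoneme labels.
LEXICON: Dict[str, List[str]] = {
    "hi": ["HH", "AY"],
    "you": ["Y", "UW"],
}


def make_unit(freq_hz: float) -> List[float]:
    """Stand-in for a recorded speech unit: a short sine tone."""
    return [
        math.sin(2 * math.pi * freq_hz * i / SAMPLE_RATE)
        for i in range(UNIT_SAMPLES)
    ]


# Hypothetical unit database: phoneme label -> stored waveform.
UNIT_DB: Dict[str, List[float]] = {
    "HH": make_unit(200),
    "AY": make_unit(300),
    "Y": make_unit(400),
    "UW": make_unit(500),
}


def front_end(text: str) -> List[str]:
    """Normalize the text and look up the phonemes of each word."""
    phonemes: List[str] = []
    for word in text.lower().split():
        # Words missing from the toy lexicon raise KeyError.
        phonemes.extend(LEXICON[word.strip(".,!?")])
    return phonemes


def back_end(phonemes: List[str]) -> List[float]:
    """Join the stored unit waveforms into one output waveform."""
    waveform: List[float] = []
    for phoneme in phonemes:
        waveform.extend(UNIT_DB[phoneme])
    return waveform


phonemes = front_end("Hi, you!")
waveform = back_end(phonemes)
```

A production synthesizer would also smooth the joins between units and handle words missing from the lexicon, both of which this sketch leaves out.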
4. EXPERIMENTAL OUTPUTS:
In this way, we have completed the design part of the project along with the
requirement specification. The modules of the project have been designed and
studied thoroughly in order to fulfill the requirements of the project.
This partial report was completed through hard work and with the full support
and guidance of our guide, and a project plan was made to ensure proper
planning of the project.
1) and Simon King, Subjective Evaluation of Join Cost and Smoothing Methods for
Unit Selection Speech Synthesis, IEEE
Audio Process, Vol. 14, (2006) pp. 1763-1771.
2) T. Dutoit, An Introduction to Text-to-Speech Synthesis, Kluwer
Publishers, Dordrecht, ISBN 0-7923-4498-7, (1997).
3) Expressing Degree of Activation in Synthetic
Trans. Speech Audio Process, Vol. 14, (2006) pp. 1128-1136.
4) Balyan, S.S. Agrwal and Amita Dev, Speech Synthesis:
ISSN 2278-0181 Vol. 2 (2013) p. 57-75.
5) Mak, Man-Wai; Wan-Chi Siu, Application of a fast real time recurrent
learning algorithm to text-to-phoneme
Neural Networks, 1995. Proceedings., IEEE International Conference on, vol. 5,
Nov/Dec 1995.
6) Conkie, Thomas Okken, Yeon-Jun Kim, Giuseppe Di
Building Text-To-Speech Voices in the Cloud, in Proc.
Research, Park Avenue, Florham Park, NJ, USA).
7) Hunnicutt, Sharon, and Dennis Klatt, Text To Speech,
System (Cambridge: Cambridge University Press,
8) Hunt A.J. and Black A.W., Unit selection in a concatenative
system for a large speech database,
of IEEE Int. Conf. Acoust., Speech, and
Processing, 1996, pp. 373-376.
9) et al., Text Processing for Text-to-Speech Systems in
Languages, Proc. in 36th ISCA Workshop on Speech
(Bonn, Germany, August 2007) pp. 22-24.
10) and Dr. E. Chandra, Speech processing - An Overview,
Int. J. of Engg.
Sci. and Tech., Vol. 4, (2012) p. 2853-2860.