2009 International Conference on Computer Engineering and Technology
A Nios II Based English Speech Training System for Hearing-impaired Children
Ningfeng Huang
School of Electronic Science and Engineering
Southeast University
Nanjing, China
ningfengh@

Haining Wu
School of Automation
Southeast University
Nanjing, China
wuhaining821@

Yinchen Song
School of Biological Science and Medical Engineering
Southeast University
Nanjing, China
songyinchen@

Abstract
We propose a novel language training system for the speech rehabilitation of hearing-impaired children, based on the DE1 development board, which contains an on-chip system called ucos. The system is embedded with Nios II processors, which make it stable, portable and cheap. Considering the limited storage and computing capacity of an embedded system, and because we recognize the child’s voice by single phonemes, we use LPCC as the feature vector to judge the similarity between the speaker’s voice and the example’s.

1. Introduction
Hearing-impaired children have difficulty hearing, so people may assume that they cannot speak a word and will become so-called “deaf and dumb” people. However, most of them actually meet the requirements for learning to speak. Firstly, most hearing-impaired children have healthy speech organs. Secondly, with hearing aids that amplify and modulate sounds, 95% of hearing-impaired children can compensate for their poor sense of sound, so they can understand spoken language and learn to speak. Thirdly, the senses of vision, touch and motion contribute to learning spoken language [1].
To help hearing-impaired children learn to speak, many researchers have developed various speech training aids. Most of them are based on a computer or on an MCU+DSP combination [2-5]. This paper describes a Nios II based English speech training system designed for hearing-impaired children. The system is developed on the Altera DE1 development board. The DE1 board provides many features that enable various multimedia projects. Cyclone II FPGAs are well suited for use as an embedded processor or microcontroller when combined with Altera's 32-bit Nios II embedded processor intellectual property (IP) cores. Users can add many other functions to the FPGA with additional IP cores available from Altera Corporation and Altera's partners. Compared with computer-based or MCU+DSP-based systems, a system based on Nios II is much more compact, stable and cheap.
Because hearing-impaired children have difficulty pronouncing and producing intelligible speech, we design the system with several speech training modules, namely breathing training, vowel training and consonant training. The modules follow the requirements of speech training for hearing-impaired children and the connection between the acoustic and physiological characteristics of speech.
This paper is organized as follows. Section 2 gives a brief introduction to the construction of the system. Section 3 describes the main algorithm we use to recognize speech. Section 4 discusses how we realize it using the Altera DE1 Multimedia Development Board, and the experimental results are given in Section 5. Section 6 concludes the paper.

This work was supported by the Altera University Program.
978-0-7695-3521-0/09 $25.00 © 2009 IEEE
DOI 10.1109/ICCET.2009.159
2. Construction of a System
The system is designed to help hearing-impaired children rehabilitate their speech ability. Figure 1 shows the structure of the system, which is developed on the Altera DE1 board. Using a VGA monitor and a mouse as the interfaces between users and system, the system gives users introductions and sample pronunciations of syllables by audio and acquires the users’ speech signal through a microphone. The speech signal is then processed by speech signal processing and recognition techniques. The system compares the user’s speech with the sample speech and finally gives the user a score by quantifying the matching rate.
Figure 1. Construction of the system
This training system consists of breathing training, vowel training and consonant training [6].
Figure 2. Structure of the training system
The breathing training aims to estimate how long the user can pronounce a syllable at his highest volume and to help him modulate his breathing. The progress bar displays how long the user has been pronouncing. We estimate that children can commonly persist for 10 seconds.
In the vowel training and consonant training parts, the user can hear the sample speech and observe how the speech organs move during the complete process of pronouncing a certain syllable, with step-by-step written instructions in both Chinese and English. The system responds immediately after the user records his speech. The user can then read the score from the progress bar and replay both his own speech and the sample speech to compare them directly.
3. Speech Recognition
Over a short period of time, voice can be considered stationary. Because embedded systems have a low-frequency CPU and limited storage resources, we need to reduce the amount of calculation to ensure real-time operation, so we have adopted time-domain short-time voice processing as follows [7-8]:
Figure 3. Speech recognition process
Data are sampled at 8 kHz. Each frame is 240 samples long (30 ms), with a frame shift of 80 samples (10 ms). The samples comprise 10 groups of girls’ voice recordings and 10 groups of boys’ voice recordings from children aged 8 to 9.
Preprocessing includes framing, pre-emphasis and endpoint detection. Framing multiplies the original voice signal by a sliding window, with adjacent frames overlapping. We selected the rectangular window as the window function.
Rectangular window formula:
$$ w(n) = 1, \quad 0 \le n \le N-1 \qquad (1) $$
(N is the length of a frame)
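
With a rectangular window, framing reduces to slicing the signal into overlapping segments. The following Python sketch illustrates this with the frame length of 240 samples and the frame shift of 80 samples given for the 8 kHz data. The function name `split_frames` and the choice to drop a trailing partial frame are assumptions made for illustration, not details of the implemented system.

```python
SAMPLE_RATE_HZ = 8000
FRAME_LEN = 240   # 30 ms at 8 kHz
FRAME_SHIFT = 80  # 10 ms at 8 kHz


def split_frames(signal, frame_len=FRAME_LEN, frame_shift=FRAME_SHIFT):
    """Slice a signal into overlapping frames (rectangular window, w(n) = 1).

    A trailing segment shorter than frame_len is dropped (assumed behaviour).
    """
    frames = []
    start = 0
    while start + frame_len <= len(signal):
        frames.append(signal[start:start + frame_len])
        start += frame_shift
    return frames
```

Because consecutive frames start 80 samples apart, each frame shares 160 samples with its predecessor.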
The purpose of pre-emphasis is to boost the high-frequency part of the voice, which is the weaker part of the signal, and to filter out the baseline drift caused by DC changes and the 50 Hz interference. We use an FIR filter to realize it.
FIR formula:
$$ data(n) = S(n) - 0.93\,S(n-1) \qquad (2) $$
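
Formula (2) is a first-order difference filter and can be sketched in a few lines of Python. Taking the sample before the start of the signal as zero is an assumption of this sketch, and the function name is illustrative.

```python
def pre_emphasis(samples, alpha=0.93):
    """Apply data(n) = S(n) - alpha * S(n-1), with S(-1) assumed to be 0."""
    out = []
    prev = 0.0
    for s in samples:
        out.append(s - alpha * prev)
        prev = s
    return out
```

On a constant (DC) input the output settles at 0.07 times the input level, which shows how strongly the filter attenuates slow baseline drift.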
The system detects endpoints by measuring the signal’s short-term average amplitude and short-term zero-crossing rate.
Short-term average amplitude formula:
$$ M_n = \sum_{n=0}^{N-1} |S_w(n)| \qquad (3) $$
Short-term zero-crossing rate formula:
$$ Z_n = \frac{1}{2} \sum_{n=0}^{N-1} \left| \mathrm{sgn}[S_w(n)] - \mathrm{sgn}[S_w(n-1)] \right| \qquad (4) $$
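
The two endpoint-detection measures of formulas (3) and (4) can be sketched per frame in Python as below. Two conventions are assumptions of this sketch: sgn(0) is taken as +1, and the n = 0 term of the zero-crossing sum is skipped because it would need a sample from before the frame. The function names are illustrative.

```python
def sgn(x):
    """Sign function, with sgn(0) taken as +1 (assumed convention)."""
    return 1 if x >= 0 else -1


def short_term_amplitude(frame):
    """Formula (3): sum of absolute sample values over one frame."""
    return sum(abs(s) for s in frame)


def zero_crossing_rate(frame):
    """Formula (4): half the summed sign differences within one frame.

    The n = 0 term is skipped since S_w(-1) lies outside the frame.
    """
    total = sum(abs(sgn(frame[n]) - sgn(frame[n - 1]))
                for n in range(1, len(frame)))
    return total / 2
```

Each sign change contributes 2 to the sum, so the factor 1/2 makes the result equal to the number of zero crossings in the frame.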
We use autocorrelation to estimate the cycle length of the signal. The formula is (N is the length of a frame):
$$ R_w(l) = \sum_{n=0}^{N-1-l} S_w(n)\,S_w(n+l), \quad l \in [1, N] \qquad (5) $$
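
Formula (5) can be sketched in Python as follows. The helper `estimate_period`, which picks the lag with the largest autocorrelation inside a caller-supplied search range, is a hypothetical illustration of how a cycle length might be read off R_w(l). The search range and the test tone are assumptions, not values from this work.

```python
import math


def autocorrelation(frame, lag):
    """Formula (5): R_w(l) = sum over n = 0 .. N-1-l of S_w(n) * S_w(n+l)."""
    n_samples = len(frame)
    return sum(frame[n] * frame[n + lag] for n in range(n_samples - lag))


def estimate_period(frame, min_lag, max_lag):
    """Return the lag in [min_lag, max_lag] with the largest autocorrelation."""
    return max(range(min_lag, max_lag + 1),
               key=lambda lag: autocorrelation(frame, lag))


# Example: a 200 Hz tone sampled at 8 kHz repeats every 40 samples.
frame = [math.sin(2 * math.pi * n / 40) for n in range(240)]
period = estimate_period(frame, 20, 60)
```

Restricting the search range keeps the number of multiply-accumulate operations small, which matters on a low-frequency embedded CPU.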
Figure 5. The five LPCC vectors of each monophthong of [iə]
Then we use the Durbin algorithm to calculate the linear prediction-based cepstral coefficients (LPCC) with the recurrence formulas given in reference [9].
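
The sketch below shows one common textbook form of the Durbin recursion and of the LPC-to-cepstrum recurrence, written in Python for illustration. It may differ in detail from the recurrence formulas of reference [9]. The predictor convention s(n) ≈ Σ a_k s(n−k), the function names and the requirement that r[0] be positive are assumptions of this sketch.

```python
def levinson_durbin(r, order):
    """Predictor coefficients a_1..a_order from autocorrelations r[0..order].

    Assumes r[0] > 0 (a silent frame would cause a division by zero).
    """
    a = [0.0] * (order + 1)
    err = r[0]
    for i in range(1, order + 1):
        acc = r[i] - sum(a[j] * r[i - j] for j in range(1, i))
        k = acc / err                      # reflection coefficient
        prev = a[:]
        a[i] = k
        for j in range(1, i):
            a[j] = prev[j] - k * prev[i - j]
        err *= 1.0 - k * k                 # updated prediction error
    return a[1:]


def lpc_to_lpcc(a, n_ceps):
    """Cepstral coefficients c_1..c_n_ceps from predictor coefficients a."""
    p = len(a)
    c = [0.0] * (n_ceps + 1)
    for n in range(1, n_ceps + 1):
        acc = a[n - 1] if n <= p else 0.0
        for k in range(max(1, n - p), n):
            acc += (k / n) * c[k] * a[n - k - 1]
        c[n] = acc
    return c[1:]
```

For a 16-element LPCC vector, `lpc_to_lpcc(a, 16)` would be called on the coefficients returned by `levinson_durbin`; the predictor order itself is not stated in the text and would have to be chosen.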
We match the example voice’s LPCC vector against the speaker’s LPCC vector; each vector contains 16 LPCCs. Here a_i is the example’s LPCC and b_i is the speaker’s. After obtaining the error value, we convert it into a rate that expresses the speaker’s similarity as a percentage, as formulas (6) and (7) show.
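
The matching step can be sketched in Python as follows, using the squared error of formula (6) and the rate of formula (7) together with the limits given for error < 1 and error > 10. Reading the logarithm as base 10 is an assumption; it is the base for which formula (7) gives exactly 100 at error = 1 and 0 at error = 10. The function names are illustrative.

```python
import math


def lpcc_error(example, speaker):
    """Formula (6): sum of squared differences between two LPCC vectors."""
    if len(example) != len(speaker):
        raise ValueError("LPCC vectors must have the same length")
    return sum((a - b) ** 2 for a, b in zip(example, speaker))


def similarity_rate(error):
    """Formula (7), clamped to 100 for error < 1 and to 0 for error > 10.

    The base-10 logarithm is an assumption of this sketch.
    """
    if error < 1:
        return 100.0
    if error > 10:
        return 0.0
    return 100.0 - 100.0 * math.log10(error)
```

The logarithm compresses large errors, so the score falls quickly just above error = 1 and more slowly as the error approaches 10.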
$$ error = \sum_{i=1}^{16} (a_i - b_i)^2 \qquad (6) $$
$$ Rate = 100 - 100 \times \log(error) \qquad (7) $$
In addition, if error > 10, Rate = 0%, and if error < 1, Rate = 100%.
For diphthongs such as “oi” and “ei”, we analyzed the LPCC graphs simulated in MATLAB and noticed that most of them mainly contain two strands, each corresponding to the LPCC graph of one of the component monophthongs. We therefore take the first five vectors of each monophthong’s LPCC, calculate the error and convert it to a rate as above to obtain Rate1 and Rate2, and then set Rate = (Rate1 + Rate2)/2. Take [iə] as an example: Figure 4 shows all the LPCC vectors of [iə], and Figure 5 shows the five vectors of each of its monophthongs.
Figure 4. All LPCC vectors of [iə]
4. Hardware Implementation
This implementation uses an FPGA to build the system, based on SOPC Builder. Taking the Nios II processor as the core, we link many interfaces and user logic to the Avalon bus. Because the design reuses many existing IP cores, many of which are offered by Altera Corporation to support the DE1 board, the development process is greatly accelerated [10].
In this system, we use MathWorks’ Simulink software and Altera Corporation’s DSP Builder software to design the user logic for short-time voice signal processing, which is realized in hardware. This block is linked to the Nios II system through the Avalon MM bus. In the Nios II system, we can then perform real-time voice processing in C, and this design has proved to be highly effective. It reduces the burden on the processing core and the operating system, which makes the system more stable.
The major function of this module is the real-time calculation of the voice signal’s short-time average magnitude M_n and short-time zero-crossing rate Z_n, which are defined in Section 3. These two parameters are very important for judging the endpoints of the voice and for further analysis of the voice.
The module is composed of an Avalon MM write bus, a pre-emphasis filter, a short-time average magnitude calculation block, a short-time zero-crossing rate calculation block, signal combination blocks and an Avalon MM read bus.
5. Experiment
After realizing the speech processing on Nios, we use the wavread function in MATLAB to convert the sample signals to text files, so that the C program can read them and calculate the LPCC vectors, which are then plotted in MATLAB.
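
The conversion step was done with MATLAB's wavread. As a stand-in for illustration only, the Python sketch below performs a comparable conversion with the standard `wave` module, writing one sample per line. The 16-bit mono PCM input, the scaling to the range [-1, 1), the text layout and the function name are all assumptions of this sketch rather than details of the original workflow.

```python
import struct
import wave


def wav_to_text(wav_path, txt_path):
    """Write one normalized sample per line; assumes 16-bit mono PCM input."""
    with wave.open(wav_path, "rb") as wav:
        if wav.getsampwidth() != 2 or wav.getnchannels() != 1:
            raise ValueError("expected 16-bit mono PCM")
        n_frames = wav.getnframes()
        raw = wav.readframes(n_frames)
    # Little-endian signed 16-bit samples, as stored in PCM WAV files.
    samples = struct.unpack("<%dh" % n_frames, raw)
    with open(txt_path, "w") as out:
        for s in samples:
            out.write("%.6f\n" % (s / 32768.0))
    return n_frames
```

A plain text file with one value per line is easy to parse from a C program with a simple read loop.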
In the resulting figure, the blue lines are the speakers’ LPCC vector curves for the pronunciation of “i:”, and the red line is their average, which serves as the standard curve. The differences between them are small and reasonable.
Figure 6. The entire design in DSP Builder
Figure 7. FIR filter symbol
Figure 8. Sub-frame accumulation
After simulation, we can see that the block works satisfactorily in detecting the endpoints of the pronunciation. From top to bottom, the first trace is the waveform of two words. The second is the signal after the FIR filter, where the DC part is significantly reduced and the high-frequency part gains significantly. The third and fourth traces are the average amplitude and the zero-crossing rate. In the transition between the two words, the amplitude is almost zero while the zero-crossing rate remains high.
Figure 10. Simulation result
6. Conclusion
This paper proposed a novel English speech training system for the speech rehabilitation of hearing-impaired children. The system, developed on the Altera DE1 development board and embedded with Nios II processors, is more stable, portable and cheap than computer-based or MCU+DSP-based systems.
The training system consists of three sections: breathing training, vowel training and consonant training. It aims to help users modulate their breathing while talking and produce intelligible speech.
The system adopts the LPCC algorithm to determine the accuracy of pronunciation, implemented in C on the embedded Nios II processors. The pre-processing part is realized with DSP Builder and works as a block linked to the Avalon MM bus.
The authors invited students from Nanchang Road Elementary School (Nanjing) to record their voices and made the recordings into speech samples, given that the users of this system are hearing-impaired children. The authors also tested the system many times to make sure of its stability. Although the system is currently designed for hearing-impaired children learning English speech, it is expected that the developed system could be a useful tool for speech training of hearing-impaired people regardless of their age and the language they are going to speak.
Figure 9. Simulation result
7. Acknowledgment
We gratefully acknowledge the advice, comments and assistance we received from Professor Tang Yongming of the School of Electronic Science and Engineering of Southeast University. We also deeply appreciate Ms. Liu Ruixue of Nanchang Road Elementary School (Nanjing) for providing the children’s speech samples.
8. References
[1] Y.A. Alsaka, S. Doll and S. Davis, “Portable speech recognition for the speech and hearing impaired,” in Proc. IEEE Southeastcon '97, 'Engineering the New Century', 1997, pp. 151-153
[2] M.L. Hsiao, P.T. Li, P.Y. Lin, S.T. Tang, T.C. Lee and S.T. Young, “A computer based software for hearing impaired children’s speech training and learning between teacher and parents in Taiwan,” in 2001 Proc. of the 23rd Ann. EMBS Int. Conf., Istanbul, pp. 1457-1459
[3] D. Weerasinghe and D. Dias, “[...] speech training of hearing impaired children,” in 2001 IEEE Pacific Rim Conference on Communications, Computers and Signal Processing (PACRIM 2001), vol. 1, pp. 51-54
[4] Y.Y. Shi, J. Liu and R.S. Liu, “Single-chip speech recognition system based on 8051 microcontroller core,” IEEE Transactions on Consumer Electronics, Vol. 47, No. 1, 2001, pp. 149-153
[5] S. Phadke, R. Limaye, S. Verma and K. Subramanian, “On design and implementation of an embedded automatic speech recognition,” in Proc. of the 17th Int. Conf. on VLSI Design (VLSID’04)
[6] I. Kirschning and M.T. Toledo, “Vowel & diphthong tutors for language therapy,” in Proc. of the 6th Mexican Int. Conf. on Computer Science (ENC’05), pp. 26-30
[7] Y. Yazama, Y. Mitsukura and N. Akamatsu, “Vowel recognition method by using features included in amplitude for mobile device,” in Proc. of the 2004 IEEE Int. Workshop on Robot and Human Interactive Communication, Kurashiki, pp. 613-618
[8] W.M. Campbell, K.T. Assaleh and C.C. Brown, “Low-complexity small-vocabulary speech recognition for portable devices,” in 5th Int. Symposium on Signal Processing and its Applications, ISSPA ’99, Brisbane, pp. 619-622
[9] G.D. Wu and Z.W. Zhu, “Chip design of LPC-cepstrum for speech recognition,” in 6th IEEE/ACIS Int. Conf. on Computer and Information Science (ICIS 2007)
[10] DE1 Development and Education Board User Manual, Version 1.1, Altera Co., San Jose, CA, 2006, pp. 1-45