
2009 International Conference on Computer Engineering and Technology

A Nios II Based English Speech Training System for Hearing-Impaired Children

Ningfeng Huang, School of Electronic Science and Engineering, Southeast University, Nanjing, China (ningfengh@)
Haining Wu, School of Automation, Southeast University, Nanjing, China (wuhaining821@)
Yinchen Song, School of Biological Science and Medical Engineering, Southeast University, Nanjing, China (songyinchen@)


Abstract

We propose a novel English speech training system for hearing-impaired children's speech rehabilitation, based on the DE1 development board and running the uC/OS operating system on chip. The system is built around the Nios II processor, which makes it stable, portable and cheap. Considering the limited storage and computing ability of the embedded system, and since we recognize the child's voice by single phonemes, we use LPCC feature vectors to judge the similarity between the speaker's voice and the sample's.


1. Introduction

Hearing-impaired children have difficulty hearing, so people may assume that they cannot speak at all and will become so-called "deaf and dumb". However, most of them actually meet the requirements for learning to speak. First, most hearing-impaired children have healthy speech organs. Second, using hearing aids to amplify and modulate sounds, 95% of hearing-impaired children can compensate for their poor sense of hearing, so that they can understand spoken language and learn to speak. Third, the senses of vision, touch and motion contribute to learning spoken language [1].

To help hearing-impaired children learn to speak, many researchers have developed various speech training aids, most of them based on a computer or an MCU+DSP combination [2-5]. This paper describes a Nios II based English speech training system designed for hearing-impaired children. The system is developed on the Altera DE1 development board, which provides many features that enable users to develop a wide range of multimedia projects. Cyclone II FPGAs are well suited to serve as an embedded processor or microcontroller when combined with Altera's 32-bit Nios II embedded processor intellectual property (IP) core, and users can add many other functions to the FPGA with additional IP cores available from Altera Corporation and its partners. Compared with computer or MCU+DSP based systems, a Nios II based system is much more compact, stable and cheaper.

Since hearing-impaired children have difficulty pronouncing and producing intelligible speech, we designed the system with several speech training modules, such as breathing training, vowel training and consonant training, according to the features of speech training for hearing-impaired children and the connection between the acoustic and physiological characteristics of speech.

This paper is organized as follows. Section 2 gives a brief introduction to the construction of the system. Section 3 describes the main algorithm we use to recognize speech. Section 4 discusses how we realize it on the Altera DE1 Multimedia Development Board. The experimental results are given in Section 5, and Section 6 concludes the paper.


This work was supported by the Altera University Program.




2. Construction of the System

The system is designed to help hearing-impaired children rehabilitate their speech ability. Figure 1 shows the structure of the system, which is developed on the Altera DE1 board. Using a VGA monitor and a mouse as the interface between user and system, the system gives users audio instructions and sample syllable pronunciations, and acquires the user's speech signal through a microphone. The speech signal is processed with speech signal processing and recognition technologies; the system compares the user's speech with the sample speech and finally gives the user a score by quantifying the matching rate.
Figure 1. Construction of the system

This training system consists of breathing training, vowel training and consonant training [6].

Figure 2. Structure of the training system

The breathing training is aimed at estimating how long the user can pronounce a syllable at his or her highest volume and helping the user modulate his or her breathing. A progress bar displays how long the user has been pronouncing; we estimate that children can commonly persist for about 10 seconds.

In the vowel training and consonant training parts, the user can listen to the sample speech and observe how the speech organs perform during the complete process of pronouncing a given syllable, with step-by-step written instructions in both Chinese and English. The system responds immediately after the user records his or her speech. The user can then read the score from the progress bar and replay both his or her own speech and the sample speech, making a direct comparison.

3. Speech Recognition

Over a short period of time, speech can be considered stationary. Given the embedded system's low-frequency CPU and limited storage resources, we need to reduce the amount of computation to ensure real-time operation, so we have adopted the following short-time time-domain voice processing [7-8]:

Figure 3. Speech recognition process
The data are sampled at 8 kHz; the frame length is 240 samples (30 ms) and the frame shift is 80 samples (10 ms). The samples comprise 10 groups of girls' voices and 10 groups of boys' voices, recorded from children between 8 and 9 years old.

Preprocessing includes framing, pre-emphasis and endpoint detection. Framing multiplies the original voice signal by a moving window, with overlap between adjacent frames; we selected the rectangular window function.
Rectangular window formula:

$$w(n) = 1, \quad 0 \le n \le N-1 \qquad (1)$$

(N is the length of a frame)
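As a minimal C sketch (ours, not the authors' implementation), framing at the stated 8 kHz parameters can be written as follows; with the rectangular window of Eq. (1), windowing reduces to a plain copy. The function name and buffer layout are our own illustration:

    #include <stddef.h>
    #include <string.h>

    #define FRAME_LEN   240   /* 30 ms at 8 kHz */
    #define FRAME_SHIFT  80   /* 10 ms at 8 kHz */

    /* Extract the k-th overlapping frame. With the rectangular window of
     * Eq. (1) (w(n) = 1), "windowing" is simply a copy of N samples.
     * Returns 0 when the frame would run past the end of the signal. */
    static int get_frame(const float *signal, size_t total_samples,
                         size_t k, float frame[FRAME_LEN])
    {
        size_t start = k * FRAME_SHIFT;
        if (start + FRAME_LEN > total_samples)
            return 0;
        memcpy(frame, signal + start, FRAME_LEN * sizeof(float));
        return 1;
    }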
The purpose of pre-emphasis is to boost the high-frequency part of the voice, which is the weaker part of the signal, and to filter out the baseline drift caused by DC changes as well as 50 Hz power-line interference. We use an FIR filter to realize it:

$$\mathrm{data}(n) = S(n) - 0.93\,S(n-1) \qquad (2)$$
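A minimal C sketch of the pre-emphasis filter of Eq. (2); the function name and float sample format are illustrative assumptions, not the paper's code:

    #include <stddef.h>

    /* Pre-emphasis FIR filter, Eq. (2): data(n) = S(n) - 0.93*S(n-1).
     * Boosts high frequencies and suppresses DC / baseline drift.
     * `in` and `out` must not alias; n is the frame length. */
    static void pre_emphasis(const float *in, float *out, size_t n)
    {
        float prev = 0.0f;              /* S(-1) taken as 0 at frame start */
        for (size_t i = 0; i < n; i++) {
            out[i] = in[i] - 0.93f * prev;
            prev = in[i];
        }
    }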
The system detects endpoints by measuring each frame's short-time average amplitude and short-time zero-crossing rate.

Short-time average amplitude formula:

$$M_n = \sum_{n=0}^{N-1} \left| S_w(n) \right| \qquad (3)$$

Short-time zero-crossing rate formula:

$$Z_n = \frac{1}{2} \sum_{n=1}^{N-1} \left| \operatorname{sgn}[S_w(n)] - \operatorname{sgn}[S_w(n-1)] \right| \qquad (4)$$
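The two endpoint features reduce to a magnitude sum and a sign-change count per frame; a C sketch under our own naming (not the paper's code):

    #include <math.h>
    #include <stddef.h>

    /* Short-time average amplitude, Eq. (3): sum of |S_w(n)| over a frame. */
    static float short_time_amplitude(const float *sw, size_t n)
    {
        float m = 0.0f;
        for (size_t i = 0; i < n; i++)
            m += fabsf(sw[i]);
        return m;
    }

    /* Short-time zero-crossing rate, Eq. (4): the (1/2) * sum of
     * |sgn(S_w(n)) - sgn(S_w(n-1))| is exactly a count of sign changes. */
    static int short_time_zcr(const float *sw, size_t n)
    {
        int z = 0;
        for (size_t i = 1; i < n; i++)
            if ((sw[i] >= 0.0f) != (sw[i - 1] >= 0.0f))
                z++;
        return z;
    }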
We use the autocorrelation function to estimate the signal's cycle length (N is the length of a frame):

$$R_w(l) = \sum_{n=0}^{N-1-l} S_w(n)\, S_w(n+l), \qquad l \in [1, N] \qquad (5)$$
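Eq. (5) can be computed directly; a short C sketch, with names of our own choosing (the summation limits above are our reconstruction of the garbled original):

    #include <stddef.h>

    /* Autocorrelation of one frame, Eq. (5): r[l] = sum_n S_w(n)*S_w(n+l).
     * r[0] (the frame energy) is also needed by the Durbin recursion. */
    static void autocorrelate(const float *sw, size_t n,
                              double *r, size_t max_lag)
    {
        for (size_t l = 0; l <= max_lag; l++) {
            double sum = 0.0;
            for (size_t i = 0; i + l < n; i++)
                sum += (double)sw[i] * (double)sw[i + l];
            r[l] = sum;
        }
    }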


Then we use the Durbin algorithm to calculate the linear prediction-based cepstral coefficients (LPCC) by the recurrence formulas shown in reference [9].
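The paper defers the actual recurrences to reference [9]. Assuming the standard Levinson-Durbin recursion and the usual LPC-to-cepstrum recurrence (an assumption on our part; the LPC order of 12 below is also hypothetical, since the paper only fixes 16 LPCCs), a C sketch looks like:

    #include <stddef.h>

    #define LPC_ORDER 12   /* hypothetical order; the paper fixes only 16 LPCCs */
    #define NUM_LPCC  16

    /* Levinson-Durbin recursion: autocorrelation r[0..p] -> LPC a[1..p]. */
    static void levinson_durbin(const double *r, double a[LPC_ORDER + 1], int p)
    {
        double tmp[LPC_ORDER + 1];
        double e = r[0];
        a[0] = 1.0;
        for (int i = 1; i <= p; i++) {
            double k = r[i];
            for (int j = 1; j < i; j++)
                k -= a[j] * r[i - j];
            k /= e;
            tmp[i] = k;
            for (int j = 1; j < i; j++)
                tmp[j] = a[j] - k * a[i - j];
            for (int j = 1; j <= i; j++)
                a[j] = tmp[j];
            e *= 1.0 - k * k;   /* residual prediction error */
        }
    }

    /* LPC-to-cepstrum recurrence: LPCC c[1..q] from LPC a[1..p]. */
    static void lpc_to_lpcc(const double a[LPC_ORDER + 1],
                            double c[NUM_LPCC + 1], int p, int q)
    {
        for (int n = 1; n <= q; n++) {
            c[n] = (n <= p) ? a[n] : 0.0;
            for (int k = 1; k < n; k++)
                if (n - k <= p)
                    c[n] += ((double)k / (double)n) * c[k] * a[n - k];
        }
    }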

We match the sample voice's LPCC vectors against the speaker's; each vector contains 16 LPCCs, where a_i is the sample's coefficient and b_i is the speaker's. After obtaining the error value, we convert the error into a rate in order to show the speaker's percentage similarity, as the following formulas show:

$$\mathrm{error} = \sum_{i=1}^{16} (a_i - b_i)^2 \qquad (6)$$

$$\mathrm{Rate} = 100 - 100 \times \log(\mathrm{error}) \qquad (7)$$

Besides, if error > 10 then Rate = 0%, and if error < 1 then Rate = 100%.

Considering the specific condition of diphthongs such as "oi" and "ei", we analyzed the LPCC graphs simulated in MATLAB and noticed that most of them mainly contain two strands, each standing for the corresponding monophthong's LPCC graph. So we take the first five vectors of each monophthong's LPCC, calculate the error and qualify the rate as above, calling the results Rate1 and Rate2; the final score is Rate = (Rate1 + Rate2)/2.

Take [iə] for example: Figure 4 shows all the LPCC vectors of [iə], and Figure 5 shows each monophthong's five vectors.

Figure 4. All LPCC vectors of [iə]

Figure 5. Each monophthong's five vectors of [iə]
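A C sketch of the scoring rule of Eqs. (6)-(7), assuming a base-10 logarithm (our assumption, but one that makes the clamps at error = 1 and error = 10 line up exactly with 100% and 0%):

    #include <math.h>

    #define NUM_LPCC 16

    /* Similarity score, Eqs. (6)-(7): squared-error distance between the
     * sample's LPCC vector a[] and the speaker's b[], mapped to 0..100%. */
    static double lpcc_rate(const double a[NUM_LPCC], const double b[NUM_LPCC])
    {
        double error = 0.0;
        for (int i = 0; i < NUM_LPCC; i++) {
            double d = a[i] - b[i];
            error += d * d;                       /* Eq. (6) */
        }
        if (error > 10.0) return 0.0;             /* clamps from the paper */
        if (error < 1.0)  return 100.0;
        return 100.0 - 100.0 * log10(error);      /* Eq. (7), base-10 log */
    }

For a diphthong, this function would be applied to each monophthong's five vectors and the two resulting rates averaged, as described above.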
4. Hardware Implementation

This implementation uses an FPGA to create the system, based on SOPC Builder. Taking the Nios II processor as the core, we link the interfaces and user logic to the Avalon bus. Because we reuse many existing IP cores in the design, many of them offered by Altera Corporation to support the DE1 board, the development process is largely accelerated [10].
In this system, we use MathWorks' Simulink and Altera's DSP Builder software to design the short-time voice signal processing user logic, which is realized in hardware. This block is linked to the Nios II system through the Avalon MM bus. On the Nios II side, real-time voice processing is controlled in C, and this design has proved to be highly effective: it reduces the burden on the processor core and the operating system, making the whole system more stable.

The major function of this model is the real-time calculation of the voice signal's short-time average magnitude M_n and short-time zero-crossing rate Z_n, defined in Section 3. These two parameters are very important for judging the endpoints of the voice and for further analysis.

The model is composed of an Avalon MM write bus, the pre-emphasis filter, the short-time average magnitude calculation block, the short-time zero-crossing rate calculation block, signal combination blocks and an Avalon MM read bus.
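The paper does not show the Nios II side of this interface; a hypothetical C sketch using the Nios II HAL's memory-mapped I/O macros is given below. The base address and register map are invented for illustration only and would come from the actual SOPC Builder system generation:

    #include <io.h>   /* Nios II HAL memory-mapped I/O macros (IORD/IOWR) */

    /* Hypothetical base address and register map for the DSP Builder block. */
    #define VOICE_BLOCK_BASE 0x00081000
    #define REG_AVG_MAG      0   /* short-time average magnitude M_n */
    #define REG_ZCR          1   /* short-time zero-crossing rate Z_n */

    /* Read the per-frame features computed in hardware over the Avalon MM bus. */
    static void read_frame_features(unsigned int *mag, unsigned int *zcr)
    {
        *mag = IORD(VOICE_BLOCK_BASE, REG_AVG_MAG);
        *zcr = IORD(VOICE_BLOCK_BASE, REG_ZCR);
    }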




5. Experiment

After implementing the speech processing on the Nios II, we use MATLAB's wavread function to convert the sample signals into text files, so that the C program can read them and calculate the LPCC vectors, which are then plotted in MATLAB.

Figure 9. Simulation result

In Figure 9, the blue lines are the speakers' LPCC vector curves for the "i:" pronunciation, and the red line is their average, standing for the standard curve. The differences among them are small and reasonable.

Figure 6. The entire design in DSP Builder

Figure 7. FIR filter symbol

Figure 8. Sub-frame accumulation

Figure 10. Simulation result

After simulation, we can see that the block works satisfactorily in detecting the endpoints of the pronunciation. In Figure 10, from top to bottom, the first trace is the waveform of two words; the second is the signal filtered by the FIR filter, in which the DC component is significantly reduced while the high-frequency gain is significant; the third and fourth are the waveforms of the short-time average amplitude and zero-crossing rate. In the transition between the two words, the amplitude is almost zero while the zero-crossing rate remains high.

6. Conclusion

This paper proposed a novel English speech training system for hearing-impaired children's speech rehabilitation. Developed on the Altera DE1 development board around the embedded Nios II processor, the system is by comparison much more stable, portable and cheaper.

The training system consists of three sections: breathing training, vowel training and consonant training. It aims to help users modulate their breathing while talking and to let them produce intelligible speech.

The system adopts the LPCC algorithm to determine the accuracy of pronunciation, implemented in C on the embedded Nios II processor. The pre-processing part is realized with DSP Builder and works as a block linked to the Avalon MM bus.
Given that the users of this system are hearing-impaired children, the authors invited students from Nanchang Road Elementary School (Nanjing) to record their voices and made the recordings into speech samples, and tested the system many times to make sure of its stability. Although the system is currently designed for hearing-impaired children learning English speech, we expect that the developed system can be a useful tool for speech training for hearing-impaired people of any age and any target language.

7. Acknowledgment

We wish to gratefully acknowledge the advice, comments and assistance we received from Professor Tang Yongming of the School of Electronic Science and Engineering, Southeast University. We also deeply appreciate Ms. Liu Ruixue of Nanchang Road Elementary School (Nanjing) for providing the children's speech samples.

8. References

[1] Y.A. Alsaka, S. Doll and S. Davis, "Portable speech recognition for the speech and hearing impaired," in Proc. IEEE Southeastcon '97, 1997, pp. 151-153.

[2] M.L. Hsiao, P.T. Li, P.Y. Lin, S.T. Tang, T.C. Lee and S.T. Young, "A computer based software for hearing impaired children's speech training and learning between teacher and parents in Taiwan," in Proc. of the 23rd Ann. EMBS Int. Conf., Istanbul, 2001, pp. 1457-1459.

[3] D. Weerasinghe and D. Dias, "Speech training of hearing impaired children," in Communications, Computers and Signal Processing, 2001 IEEE Pacific Rim Conference on (PACRIM 2001), vol. 1, pp. 51-54.

[4] Y.Y. Shi, J. Liu and R.S. Liu, "Single-chip speech recognition system based on 8051 microcontroller core," IEEE Transactions on Consumer Electronics, vol. 47, no. 1, 2001, pp. 149-153.

[5] S. Phadke, R. Limaye, S. Verma and K. Subramanian, "On design and implementation of an embedded automatic speech recognition," in Proc. of the 17th Int. Conf. on VLSI Design (VLSID '04).

[6] I. Kirschning and M.T. Toledo, "Vowel & diphthong tutors for language therapy," in Proc. of the 6th Mexican Int. Conf. on Computer Science (ENC '05), pp. 26-30.

[7] Y. Yazama, Y. Mitsukura and N. Akamatsu, "Vowel recognition method by using features included in amplitude for mobile device," in Proc. of the 2004 IEEE Int. Workshop on Robot and Human Interactive Communication, Kurashiki, pp. 613-618.

[8] W.M. Campbell, K.T. Assaleh and C.C. Brown, "Low-complexity small-vocabulary speech recognition for portable devices," in 5th Int. Symposium on Signal Processing and its Applications (ISSPA '99), Brisbane, pp. 619-622.

[9] G.D. Wu and Z.W. Zhu, "Chip design of LPC-cepstrum for speech recognition," in 6th IEEE/ACIS Int. Conf. on Computer and Information Science (ICIS 2007).

[10] DE1 Development and Education Board User Manual, Version 1.1, Altera Co., San Jose, CA, 2006, pp. 1-45.

2014考研数学-对党的认识


常用标点符号-暑期师德培训体会


吕梁高专-2013年四川高考


吴樾图片-大写8


华夏银行北京分行-山东干部管理学院


林徽因传-村官工作总结


句容市教育局-裁判员代表宣誓词


主婚人婚礼致辞-读后感300字