Introduction
Speech communication in urban environments is often affected by acoustic noises that can mask the speech signal and impair its intelligibility. This is especially challenging for people with Autism Spectrum Disorder (ASD), who are known to have auditory hypersensitivity and difficulties processing speech in noisy environments. Therefore, there is a need for developing methods that can enhance the intelligibility of noisy speech for ASD individuals.
Method
The authors propose a personalized method (pGTFF0) that uses harmonic features estimated from speech frames as the center frequencies of Gammatone auditory filterbanks. A gain factor is then applied to the filtered samples, emphasizing the fundamental frequency (F0) and its harmonics, which contain most of the intelligibility information. The method aims to emulate external noise filtering tailored to individuals with ASD.
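The filtering idea can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the 4th-order Gammatone with Glasberg-Moore bandwidths, the number of harmonics, the gain value, and the way the bands are summed are all assumptions made for illustration, and the F0 is taken as already known.

```python
import math

def erb_hz(fc_hz):
    """Equivalent rectangular bandwidth (Glasberg-Moore approximation), in Hz."""
    return 24.7 * (4.37 * fc_hz / 1000.0 + 1.0)

def gammatone_ir(fc_hz, fs_hz, duration_s=0.05, order=4):
    """Truncated Gammatone impulse response, scaled to unit gain at fc_hz."""
    b = 1.019 * erb_hz(fc_hz)            # bandwidth parameter (common choice)
    w = 2.0 * math.pi * fc_hz / fs_hz    # center frequency in radians per sample
    ir = []
    for k in range(int(duration_s * fs_hz)):
        t = k / fs_hz
        ir.append(t ** (order - 1) * math.exp(-2.0 * math.pi * b * t) * math.cos(w * k))
    # Normalize by the magnitude of the frequency response at fc_hz.
    re = sum(h * math.cos(w * k) for k, h in enumerate(ir))
    im = sum(h * math.sin(w * k) for k, h in enumerate(ir))
    scale = math.hypot(re, im)
    return [h / scale for h in ir]

def fir_filter(x, ir):
    """Causal FIR filtering by direct convolution (output has len(x) samples)."""
    y = []
    for i in range(len(x)):
        taps = min(i + 1, len(ir))
        y.append(sum(ir[k] * x[i - k] for k in range(taps)))
    return y

def emphasize_harmonics(frame, f0_hz, fs_hz, n_harmonics=3, gain=2.0):
    """Sum of Gammatone bands centered on F0 and its harmonics, each scaled by gain.

    n_harmonics, gain, and the summation rule are hypothetical choices.
    """
    out = [0.0] * len(frame)
    for h in range(1, n_harmonics + 1):
        fc = h * f0_hz
        if fc >= fs_hz / 2.0:            # stay below the Nyquist frequency
            break
        band = fir_filter(frame, gammatone_ir(fc, fs_hz))
        out = [o + gain * v for o, v in zip(out, band)]
    return out
```

In this sketch, energy near F0 and its multiples is passed and amplified, while components that fall between the harmonics are attenuated by the filter skirts. The direct convolution is slow and is only meant for short frames.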
Evaluation
The proposed method is compared to three competing approaches: a baseline method that does not apply any filtering, a method that uses fixed Gammatone filterbanks (GTFF0), and a method that uses adaptive Gammatone filterbanks (aGTFF0). The evaluation is done using four acoustic noises (car, babble, factory, and street) at different signal-to-noise ratios (SNRs). The performance is measured using two objective metrics: ESTOI and PESQ, which reflect the intelligibility and quality of the enhanced speech, respectively. Additionally, a perceptual listening test is conducted with 10 ASD volunteers and 10 neurotypical (NT) volunteers, who are asked to identify words from the enhanced speech signals.
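Evaluating at a given SNR requires scaling the noise before adding it to the clean speech. The Python sketch below shows the standard power-based scaling; the function name and the example SNR value are assumptions, since the summary does not list the SNR values or the mixing procedure used in the paper.

```python
import math

def mix_at_snr(speech, noise, snr_db):
    """Add noise to speech so that the speech-to-noise power ratio equals snr_db."""
    if len(noise) < len(speech):
        raise ValueError("noise must be at least as long as speech")
    noise = noise[:len(speech)]
    p_speech = sum(s * s for s in speech) / len(speech)
    p_noise = sum(v * v for v in noise) / len(noise)
    if p_noise == 0.0:
        raise ValueError("noise has zero power")
    # Choose scale so that p_speech / (scale**2 * p_noise) == 10**(snr_db / 10).
    scale = math.sqrt(p_speech / (p_noise * 10.0 ** (snr_db / 10.0)))
    return [s + scale * v for s, v in zip(speech, noise)]

# Hypothetical usage: a tone standing in for speech, another tone standing in for noise.
speech = [math.sin(0.1 * i) for i in range(1000)]
noise = [math.cos(0.37 * i) for i in range(1200)]
noisy = mix_at_snr(speech, noise, snr_db=5.0)
```

Lower SNR values produce a larger noise scale, which is why intelligibility scores are usually reported per noise type and per SNR.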
Results
The results show that the proposed method (pGTFF0) outperforms the competing approaches in terms of intelligibility and quality improvement, according to both objective and subjective measures. The results also show that ASD volunteers attain lower intelligibility rates than NT volunteers, indicating that ASD individuals are more affected by acoustic noises. The authors suggest that the proposed method can be used as a preprocessing step for speech enhancement systems or hearing aids designed for ASD individuals.
Conclusion
The paper introduces a personalized method (pGTFF0) that improves the intelligibility of noisy speech for listeners with ASD. The method uses harmonic features as the center frequencies of Gammatone auditory filterbanks and applies a gain factor to the filtered samples. It is evaluated using four acoustic noises at different SNRs and compared to three competing approaches. The experimental results show that the proposed method outperforms the competing approaches in terms of intelligibility and quality improvement, and that ASD individuals are more sensitive to acoustic noises than NT individuals. The paper suggests that the proposed method can be useful for speech enhancement systems or hearing aids for ASD individuals.
FAQ
How does ASD affect speech processing?
ASD individuals may have auditory hypersensitivity, which means they are more sensitive to loud or unpleasant sounds, and may experience discomfort or pain in noisy environments. ASD individuals may also have difficulties processing speech in noise, which means they may have trouble understanding what others are saying or expressing themselves clearly when there is background noise.
How does the proposed method (pGTFF0) work?
The proposed method uses harmonic features estimated from speech frames as center frequencies of Gammatone auditory filterbanks. A gain factor is then applied to the output of the filtered samples, emphasizing the fundamental frequency (F0) and its harmonics, which contain most of the intelligibility information.
What are Gammatone auditory filterbanks and why are they used in the proposed method (pGTFF0)?
Gammatone auditory filterbanks are a set of bandpass filters that mimic the frequency selectivity of the human auditory system. They are used in the proposed method (pGTFF0) to filter the noisy speech signal and emphasize the F0 and its harmonics, which contain most of the intelligibility information. The proposed method uses harmonic features estimated from speech frames as center frequencies of the Gammatone filterbanks, which allows for a personalized filtering tailored for each individual.
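For reference, the standard textbook form of a Gammatone filter's impulse response is given below, together with a commonly used bandwidth rule. The symbols (amplitude a, order n, phase \phi, bandwidth parameter b) and the typical choices n = 4 and b = 1.019 ERB are general conventions, not values confirmed by this summary of the paper.

```latex
g(t) = a \, t^{\,n-1} e^{-2\pi b t} \cos\!\left(2\pi f_c t + \phi\right), \qquad t \ge 0

\mathrm{ERB}(f_c) = 24.7 \left( \frac{4.37\, f_c}{1000} + 1 \right) \ \text{Hz}, \qquad b = 1.019 \,\mathrm{ERB}(f_c)
```

In a conventional filterbank the center frequencies f_c are fixed in advance. In pGTFF0, as described above, they are instead set from the harmonic features estimated from each speech frame.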
How are the F0 and its harmonics estimated from noisy speech frames?
The F0 and its harmonics are estimated from noisy speech frames using the Hilbert-Huang Transform (HHT) with amplitude modulation (HHT-Amp). HHT is a nonlinear and adaptive signal decomposition technique that can extract the intrinsic mode functions (IMFs) of a signal. HHT-Amp is a modification of HHT that can estimate the amplitude and phase of the IMFs. The F0 and its harmonics are obtained from the IMFs that have the highest energy and the lowest frequency.
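To make the idea of frame-wise F0 estimation concrete, the Python sketch below uses plain autocorrelation peak picking. This is a deliberately simpler, swapped-in technique and not HHT-Amp; the function names and the search range of 60 to 400 Hz are assumptions, and the approach is far less robust to noise than the method described above.

```python
import math

def estimate_f0_autocorr(frame, fs_hz, fmin_hz=60.0, fmax_hz=400.0):
    """Estimate F0 as fs / lag, where lag maximizes the frame's autocorrelation.

    Illustrative stand-in only; the paper is described as using HHT-Amp.
    """
    n = len(frame)
    mean = sum(frame) / n
    x = [v - mean for v in frame]              # remove the DC offset
    lag_min = max(1, int(fs_hz / fmax_hz))     # shortest period considered
    lag_max = min(int(fs_hz / fmin_hz), n - 1) # longest period considered
    if lag_max < lag_min:
        raise ValueError("frame too short for the requested F0 range")
    best_lag, best_val = lag_min, -math.inf
    for lag in range(lag_min, lag_max + 1):
        val = sum(x[i] * x[i + lag] for i in range(n - lag))
        if val > best_val:
            best_lag, best_val = lag, val
    return fs_hz / best_lag

def harmonic_frequencies(f0_hz, n_harmonics):
    """F0 and its integer multiples, usable as filter center frequencies."""
    return [k * f0_hz for k in range(1, n_harmonics + 1)]
```

The resolution of this estimate is limited to integer lags, so at a sampling rate of 8 kHz the candidate F0 values near 200 Hz are spaced roughly 5 Hz apart.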
What are the benefits of the proposed method (pGTFF0)?
The proposed method can improve the intelligibility and quality of noisy speech for ASD individuals, and it can be personalized for each individual based on their auditory sensitivity and preference.
What are the limitations of the proposed method (pGTFF0)?
The proposed method requires a reliable estimate of the harmonic features from noisy speech frames, which can be challenging at low SNRs. Moreover, it does not consider the spectral and temporal characteristics of the noise, which may affect the performance of the filtering process.
What future work do the authors propose?
The authors plan to extend the proposed method to other types of noises and speech signals, such as non-stationary noises and dysarthric speech. They also plan to conduct more perceptual listening tests with larger and more diverse groups of ASD and NT volunteers, and to investigate the effects of the proposed method on the cognitive load and emotional state of the listeners.
Source: