How Machine Learning Can Help Analyze the Speech of Boys with Autism and Down Syndrome


Introduction

In this blog post, I will summarize a recent research paper by O.V. Makhnytkina, O.V. Frolova, and E.E. Lyakso from ITMO University and Saint Petersburg State University. The paper is titled “Machine Learning Methods for Analyzing Morphological and Lexical Characteristics of Speech of Boys with Autism Spectrum Disorders and Down Syndrome” and was published in 2024 in the journal “Linguistics”.

The paper proposes a way to distinguish the speech of boys with typical development (TD), autism spectrum disorder (ASD), and Down syndrome (DS) by comparing the morphological and lexical characteristics of their speech. It also reports the results of applying machine learning methods to classify the boys' dialogues into these three groups: TD, ASD, and DS.

Data Collection and Processing

The authors interviewed 69 boys, aged 7 to 10 years, from three groups: 20 TD boys, 14 with DS, and 35 with ASD. The interviews were conducted by a psychologist using a semi-structured protocol that included questions about personal information, hobbies, preferences, emotions, and social interactions. The interviews were recorded and transcribed.

The authors used pymorphy2, a morphological analyzer for Russian, to extract linguistic features from the dialogues. They analyzed only the boys' responses, not the psychologist's questions, and extracted 45 linguistic features for each response. Grouped loosely by type, these are:

Size and structure: the number of words, sentences, unique words, and punctuation marks; average word length; mean length of utterance; and the subordination and coordination indices.
Parts of speech: the part-of-speech distribution and the counts of nouns, verbs, adjectives, adverbs, pronouns, numerals, conjunctions, prepositions, particles, and interjections.
Lexical richness: lexical diversity, type-token ratio, hapax legomena ratio, Brunet's index, Honoré's statistic, Yule's characteristic, Simpson's index, and entropy.
Readability: the Flesch-Kincaid score, Flesch reading ease, Gunning fog index, SMOG index, Coleman-Liau index, automated readability index, Linsear Write formula, Dale-Chall score, and average grade level.
Errors and fluency: the number of spelling, grammatical, semantic, and pragmatic errors; repetitions; self-corrections; fillers; pauses; and speech rate.
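To make a few of the lexical-richness measures concrete, here is a minimal Python sketch that computes them from a list of word tokens. It uses common textbook definitions (for example, Brunet's W = N^(V^-0.165) and Yule's K = 10^4 (Σ i² V_i − N) / N², where N is the number of tokens, V the number of distinct words, and V_i the number of words occurring exactly i times). The paper's exact formula variants, tokenization, and normalization are not described in this summary, so the function name `lexical_richness` and the chosen conventions (such as hapax ratio = V1/N rather than V1/V) are assumptions. Part-of-speech counts would come from a morphological analyzer such as pymorphy2 (for example `pymorphy2.MorphAnalyzer().parse(word)[0].tag.POS`); they are left out so the sketch has no external dependencies.

```python
import math
from collections import Counter


def lexical_richness(tokens):
    """Compute several lexical-richness measures from a list of word tokens.

    Formulas follow common textbook definitions; they are an assumption,
    not necessarily the variants used in the paper.
    """
    if not tokens:
        raise ValueError("need at least one token")
    freqs = Counter(tokens)
    n = len(tokens)  # N: number of tokens
    v = len(freqs)  # V: number of distinct words (types)
    v1 = sum(1 for c in freqs.values() if c == 1)  # V1: words used exactly once
    # freq_spectrum[i] = number of types that occur exactly i times (V_i)
    freq_spectrum = Counter(freqs.values())
    probs = [c / n for c in freqs.values()]
    return {
        "ttr": v / n,
        "hapax_ratio": v1 / n,
        "brunet_w": n ** (v ** -0.165),
        # Honoré's R is undefined when every word is used once (V1 == V)
        "honore_r": 100 * math.log(n) / (1 - v1 / v) if v1 < v else math.inf,
        "yule_k": 1e4 * (sum(i * i * vi for i, vi in freq_spectrum.items()) - n) / (n * n),
        # Simpson's D: probability that two tokens drawn without replacement are the same word
        "simpson_d": (sum(c * (c - 1) for c in freqs.values()) / (n * (n - 1))) if n > 1 else 0.0,
        # Shannon entropy of the word distribution, in bits
        "entropy": -sum(p * math.log2(p) for p in probs),
    }
```

For example, the four tokens of "мама мыла раму мама" give a type-token ratio of 0.75 and an entropy of 1.5 bits.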

Statistical Analysis and Feature Selection

The authors used the Mann-Whitney U test to compare each linguistic feature between every pair of groups. They found significant differences in 31 features between TD and ASD, 31 between TD and DS, and 15 between ASD and DS, and treated these features as the most informative ones for the classification task.
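The pairwise screening can be sketched with SciPy's `mannwhitneyu`. This assumes a feature matrix `X` (one row per observation) and a label array `y`; the 0.05 threshold and the absence of a multiple-comparison correction are assumptions, since the summary does not state them, and `pairwise_significant_features` is a hypothetical helper name.

```python
from itertools import combinations

import numpy as np
from scipy.stats import mannwhitneyu


def pairwise_significant_features(X, y, alpha=0.05):
    """For every pair of groups, list the feature indices whose
    two-sided Mann-Whitney U test gives p < alpha (assumed threshold)."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    result = {}
    for g1, g2 in combinations(sorted(set(y.tolist())), 2):
        a, b = X[y == g1], X[y == g2]
        significant = []
        for j in range(X.shape[1]):
            p = mannwhitneyu(a[:, j], b[:, j], alternative="two-sided").pvalue
            if p < alpha:
                significant.append(j)
        result[(g1, g2)] = significant
    return result
```

The union of the returned index lists, for example `sorted(set().union(*result.values()))`, gives one candidate feature set for the classifiers.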

Classification Models and Results

The authors built classification models with three machine learning methods: gradient boosting, random forest, and AdaBoost. They evaluated the models with 10-fold cross-validation, using accuracy, precision, recall, and F1-score as the metrics.
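The evaluation protocol can be sketched with scikit-learn. This assumes a feature matrix `X`, string labels `y` (such as "TD", "ASD", "DS"), default hyperparameters, stratified and shuffled folds, and macro-averaged precision, recall, and F1. The summary does not say which library, settings, or averaging the authors used, and `compare_models` is a hypothetical helper.

```python
import numpy as np
from sklearn.ensemble import (AdaBoostClassifier, GradientBoostingClassifier,
                              RandomForestClassifier)
from sklearn.model_selection import StratifiedKFold, cross_validate

# Macro averaging is an assumption; the averaging scheme is not stated.
SCORING = ["accuracy", "precision_macro", "recall_macro", "f1_macro"]


def compare_models(X, y, n_splits=10, seed=0):
    """Return the mean cross-validated score of each metric for each model."""
    models = {
        "gradient boosting": GradientBoostingClassifier(random_state=seed),
        "random forest": RandomForestClassifier(random_state=seed),
        "AdaBoost": AdaBoostClassifier(random_state=seed),
    }
    # Stratified folds keep the TD/ASD/DS proportions similar in every fold.
    cv = StratifiedKFold(n_splits=n_splits, shuffle=True, random_state=seed)
    summary = {}
    for name, model in models.items():
        scores = cross_validate(model, X, y, cv=cv, scoring=SCORING)
        summary[name] = {m: float(np.mean(scores[f"test_{m}"])) for m in SCORING}
    return summary
```

With groups of 20, 14, and 35 children, every class has at least 10 members, so 10 stratified folds are feasible.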

The results showed that the gradient boosting model achieved the best performance, with an accuracy of 88%, a precision of 89%, a recall of 88%, and an F1-score of 88%. The random forest model achieved an accuracy of 86%, a precision of 87%, a recall of 86%, and an F1-score of 86%. The AdaBoost model achieved an accuracy of 84%, a precision of 85%, a recall of 84%, and an F1-score of 84%.

The authors also analyzed the models' confusion matrices and found that most misclassifications were between ASD and DS, which suggests that the speech of these two groups is more similar to each other than to the speech of TD boys.
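One way to find the most frequent confusion is to build a confusion matrix from out-of-fold predictions (for example, from scikit-learn's `cross_val_predict`) and add it to its transpose, so that ASD→DS and DS→ASD errors are counted together. A sketch, with the hypothetical helper `most_confused_pair`:

```python
import numpy as np
from sklearn.metrics import confusion_matrix


def most_confused_pair(y_true, y_pred, labels):
    """Return the pair of classes with the most errors in either direction,
    that error count, and the raw confusion matrix (rows = true labels)."""
    cm = confusion_matrix(y_true, y_pred, labels=labels)
    sym = cm + cm.T  # count A->B and B->A errors together
    np.fill_diagonal(sym, 0)  # ignore correct predictions
    i, j = np.unravel_index(np.argmax(sym), sym.shape)
    return (labels[i], labels[j]), int(sym[i, j]), cm
```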

Discussion and Conclusion

The authors discussed the implications of their findings for diagnosing and supporting children with ASD and DS. They suggested that the linguistic features of these boys' speech could serve as indicators of their cognitive and social development and of their communication skills. They also suggested that machine learning methods could be used to screen and monitor the speech of boys with ASD and DS and to provide feedback and guidance for speech therapy.

The authors concluded that their approach can identify differences in the speech of boys with TD, ASD, and DS based on morphological and lexical characteristics, and can classify the boys' dialogues into these three groups with high accuracy (up to 88% with gradient boosting). They acknowledged several limitations: the small sample size, the all-male sample, the focus on a single language, and the lack of contextual information. For future research, they proposed increasing the sample size, including girls, studying other languages, and incorporating prosodic and pragmatic features.

FAQ

What are the gradient boosting, random forest, and AdaBoost methods and how do they work?

Gradient boosting, random forest, and AdaBoost are ensemble methods that combine many decision trees to perform classification or regression. They work as follows:

Gradient boosting: It builds a sequence of shallow decision trees, where each new tree is fitted to the negative gradient of the loss function evaluated at the current ensemble's predictions (for squared-error loss, simply the residuals). The final output is the sum of all the trees' predictions, each scaled by a learning rate.
Random forest: It builds many independent decision trees, each trained on a bootstrap sample of the data (drawn with replacement) and considering only a random subset of the features at each split. For classification, it combines the trees by majority vote or by averaging their predicted class probabilities; for regression, it averages their predictions.
AdaBoost: It builds a sequence of weak learners, typically very shallow decision trees, each trained on a reweighted version of the data. After each round, it increases the weights of the data points that the current learner misclassified and decreases the weights of those it classified correctly. The final output is a weighted majority vote, in which more accurate learners get larger votes.
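The two boosting ideas above can be sketched with NumPy and scikit-learn. The first part is least-squares gradient boosting for regression, where the negative gradient is simply the residual; the second is a single reweighting step of classic two-class (discrete) AdaBoost. Both are simplified illustrations with hypothetical function names, not the implementations used in the paper or in scikit-learn (whose `AdaBoostClassifier` uses the multi-class SAMME variant).

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor


def fit_gradient_boosting(X, y, n_trees=50, learning_rate=0.1, max_depth=2):
    """Least-squares gradient boosting: each tree fits the current residuals."""
    y = np.asarray(y, dtype=float)
    base = float(y.mean())  # start from a constant prediction
    pred = np.full(len(y), base)
    trees = []
    for _ in range(n_trees):
        residual = y - pred  # negative gradient of 0.5 * (y - pred) ** 2
        tree = DecisionTreeRegressor(max_depth=max_depth, random_state=0).fit(X, residual)
        pred += learning_rate * tree.predict(X)  # shrunken step toward the residuals
        trees.append(tree)
    return base, learning_rate, trees


def predict_gradient_boosting(model, X):
    base, learning_rate, trees = model
    return base + learning_rate * sum(tree.predict(X) for tree in trees)


def adaboost_reweight(weights, misclassified):
    """One round of discrete two-class AdaBoost sample reweighting."""
    w = np.asarray(weights, dtype=float)
    miss = np.asarray(misclassified, dtype=bool)
    err = w[miss].sum() / w.sum()  # weighted error; assumes 0 < err < 1
    alpha = 0.5 * np.log((1 - err) / err)  # vote weight of this learner
    w = w * np.exp(np.where(miss, alpha, -alpha))  # up-weight mistakes, down-weight hits
    return w / w.sum(), alpha
```

After one AdaBoost step, the misclassified points always carry exactly half of the total weight, which forces the next learner to focus on them. In scikit-learn, the random-forest behavior described above corresponds to `RandomForestClassifier` with its default `bootstrap=True` and `max_features="sqrt"`.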

What is the difference between morphological and lexical characteristics of speech?

Morphological characteristics of speech are related to the structure and form of words, such as their part-of-speech, number, case, gender, tense, aspect, mood, voice, etc. Lexical characteristics of speech are related to the meaning and usage of words, such as their frequency, diversity, complexity, readability, etc.

What are the prosodic and pragmatic features of speech and why are they important?

Prosodic features of speech relate to its rhythm, melody, and stress, such as intonation, pitch, volume, speed, and pauses. Pragmatic features relate to the use and function of speech in context, such as politeness, relevance, coherence, and inference.

Prosodic and pragmatic features of speech are important because:

They can convey the attitude, emotion, intention, and personality of the speaker, which are essential for social interaction and communication.
They can enhance the meaning, clarity, and effectiveness of the speech, which are important for cognition and learning.
They can indicate the presence, severity, and type of speech disorders, which are useful for diagnosis and intervention.

What are the advantages and disadvantages of using linguistic features for speech analysis?

The advantages of using linguistic features for speech analysis are:

They can capture the syntactic, semantic, and pragmatic aspects of speech, which are important for communication and cognition.
Many of them can be computed automatically from transcripts with little manual annotation, although some, such as counts of semantic or pragmatic errors, typically still require human judgment.
They can be used for various tasks, such as classification, clustering, summarization, sentiment analysis, etc.

The disadvantages of using linguistic features for speech analysis are:

They may not reflect the underlying cognitive and psychological processes of the speaker, which may be influenced by factors such as age, gender, education, culture, emotion, etc.
They may not account for the contextual and situational factors of the speech, such as the topic, the purpose, the audience, the feedback, etc.
They may not capture the prosodic and nonverbal features of the speech, such as the intonation, the stress, the pitch, the volume, the speed, the pauses, the gestures, the facial expressions, the eye contact, etc.

What are the advantages and disadvantages of using machine learning methods for speech analysis?

The advantages of using machine learning methods for speech analysis are:

They can handle large and complex data sets, which are common in speech analysis.
They can learn patterns from data and cope with the variability that is typical of speech.
They can perform various tasks, such as classification, clustering, and regression, which are useful for speech analysis.

The disadvantages of using machine learning methods for speech analysis are:

They may require a lot of data and computational resources, which may not be available or affordable.
They may be difficult to interpret and explain, which can limit understanding of and trust in their results.
They may be sensitive to the quality and representation of the data, which can affect the performance and validity of their results.

Source:

https://nguhist.elpub.ru/jour/article/view/1855