Publication:
Machine Learning Techniques for Speech Emotion Classification

dc.contributor.author: Melo Locumber, Noe
dc.contributor.author: Fabian, Junior
dc.date.accessioned: 2025-08-11T16:44:16Z
dc.date.issued: 2021
dc.description.abstract: In this paper we propose and evaluate different models for speech emotion classification using audio signal processing, machine learning and deep learning techniques. For this purpose, we collected a total of 5252 audio samples from two databases (RAVDESS and TESS), covering 8 emotional classes (neutral, calm, happy, sad, angry, fearful, disgust and surprised). We divided our experiments into 3 main stages. In the first stage, we used feature engineering to extract relevant features from the time, spectral and cepstral domains. Features such as ZCR, energy, spectral centroid, chroma and MFCC were used to train an SVM classifier; the best model obtained an accuracy of 91.1%. In the second stage, we considered only 40 MFCC coefficients for training several deep neural networks (CNN, LSTM and MLP); the best model obtained an accuracy of 89.5% with an MLP architecture. Finally, in the third stage we trained an end-to-end CNN (SampleCNN) at the sample level. This last approach does not require feature engineering and operates directly on the audio signal. In this stage, we achieved a precision of 81.7%. The experiments show that the results are competitive and that some experiments surpass the accuracy reported in related work. © 2021, Springer Nature Switzerland AG.
dc.identifier.doi: 10.1007/978-3-030-76228-5_6
dc.identifier.scopus: 2-s2.0-85111128406
dc.identifier.uri: https://cris.esan.edu.pe/handle/20.500.12640/764
dc.identifier.uuid: fe157abf-c620-4559-9110-c69ee4684bd2
dc.language.iso: en
dc.publisher: Springer Science and Business Media Deutschland GmbH
dc.relation.ispartof: Communications in Computer and Information Science
dc.rights: http://purl.org/coar/access_right/c_14cb
dc.subject: Audio processing
dc.subject: Machine learning
dc.subject: Speech emotion classification
dc.title: Machine Learning Techniques for Speech Emotion Classification
dc.type: http://purl.org/coar/resource_type/c_2f33
dspace.entity.type: Publication
oaire.citation.endPage: 89
oaire.citation.startPage: 77
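
The abstract names ZCR and energy among the time-domain features used in the first stage. This record does not include the authors' code, so the following is only a minimal sketch of how those two features might be computed per frame and averaged into a fixed-length vector. The function names, the frame length of 400 samples, the hop of 200 samples, and the synthetic 440 Hz test tone are all assumptions made for illustration, not values taken from the paper.

```python
import math


def frame_signal(signal, frame_len, hop):
    """Split a list of samples into overlapping frames of equal length."""
    return [signal[i:i + frame_len]
            for i in range(0, len(signal) - frame_len + 1, hop)]


def zero_crossing_rate(frame):
    """Fraction of adjacent sample pairs whose signs differ."""
    crossings = sum(1 for a, b in zip(frame, frame[1:])
                    if (a >= 0) != (b >= 0))
    return crossings / (len(frame) - 1)


def short_time_energy(frame):
    """Mean squared amplitude of one frame."""
    return sum(x * x for x in frame) / len(frame)


def extract_features(signal, frame_len=400, hop=200):
    """Return [mean ZCR, mean energy] over all frames.

    frame_len and hop are illustrative defaults (hypothetical),
    not parameters reported in the abstract.
    """
    frames = frame_signal(signal, frame_len, hop)
    zcr = [zero_crossing_rate(f) for f in frames]
    energy = [short_time_energy(f) for f in frames]
    return [sum(zcr) / len(zcr), sum(energy) / len(energy)]


# Synthetic stand-in for an audio clip: one second of a 440 Hz sine
# sampled at 16 kHz (assumed values, used only to exercise the code).
sample_rate = 16000
tone = [math.sin(2 * math.pi * 440 * n / sample_rate)
        for n in range(sample_rate)]
features = extract_features(tone)
```

In a pipeline like the one the abstract describes, a vector of this kind would be extended with spectral and cepstral features (spectral centroid, chroma, MFCC) and passed to a classifier such as an SVM; those steps are not shown here.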
