Publication: Machine Learning Techniques for Speech Emotion Classification
Abstract
In this paper we propose and evaluate different models for speech emotion classification using audio signal processing, machine learning and deep learning techniques. For this purpose, we collected a total of 5252 audio samples from two databases (RAVDESS and TESS), covering 8 emotional classes (neutral, calm, happy, sad, angry, fearful, disgust and surprised). We divided our experiments into 3 main stages. In the first stage, we used feature engineering to extract relevant features from the time, spectral and cepstral domains. Features such as ZCR, energy, spectral centroid, chroma and MFCC were used to train an SVM classifier. The best model obtained an accuracy of 91.1%. In the second stage, we considered only 40 MFCC coefficients and used them to train several deep neural networks (CNN, LSTM and MLP); the best model obtained an accuracy of 89.5% with an MLP architecture. Finally, in the third stage we trained an end-to-end CNN (SampleCNN) at the sample level. This last approach does not require feature engineering and operates directly on the audio signal. In this stage, we achieved a precision of 81.7%. The experiments show that the results are competitive, and some experiments surpass related work in accuracy. © 2021, Springer Nature Switzerland AG.
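To make the first stage more concrete, the following is a minimal sketch of two of the time-domain features named in the abstract, zero-crossing rate (ZCR) and short-time energy, computed per frame and averaged into a fixed-length vector of the kind a classifier such as an SVM could consume. It is not the authors' implementation: the function names, the 400-sample frame length, the 200-sample hop and the synthetic 440 Hz tone (standing in for a RAVDESS or TESS recording) are all assumptions made for illustration.

```python
import math


def frame_signal(signal, frame_length, hop_length):
    """Split a sequence of samples into overlapping frames (no padding)."""
    frames = []
    for start in range(0, len(signal) - frame_length + 1, hop_length):
        frames.append(signal[start:start + frame_length])
    return frames


def zero_crossing_rate(frame):
    """Fraction of adjacent sample pairs whose signs differ."""
    crossings = sum(
        1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0)
    )
    return crossings / (len(frame) - 1)


def short_time_energy(frame):
    """Mean squared amplitude of the frame."""
    return sum(x * x for x in frame) / len(frame)


def time_domain_features(signal, frame_length=400, hop_length=200):
    """Return [mean ZCR, mean energy] averaged over all frames."""
    frames = frame_signal(signal, frame_length, hop_length)
    zcr = [zero_crossing_rate(f) for f in frames]
    energy = [short_time_energy(f) for f in frames]
    return [sum(zcr) / len(zcr), sum(energy) / len(energy)]


# Assumption: a synthetic 1-second, 440 Hz sine at 16 kHz replaces real audio.
sample_rate = 16000
tone = [
    math.sin(2 * math.pi * 440 * n / sample_rate) for n in range(sample_rate)
]
features = time_domain_features(tone)
```

In a full pipeline, vectors like `features` would be extended with spectral and cepstral descriptors (spectral centroid, chroma, MFCC) and computed for every labeled recording before training the classifier.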

