Publication: Machine Learning Techniques for Speech Emotion Classification
| dc.contributor.author | Melo Locumber, Noe | |
| dc.contributor.author | Fabian, Junior | |
| dc.date.accessioned | 2025-08-11T16:44:16Z | |
| dc.date.issued | 2021 | |
| dc.description.abstract | In this paper we propose and evaluate different models for speech emotion classification using audio signal processing, machine learning and deep learning techniques. For this purpose, we collected a total of 5252 audio samples from two databases (RAVDESS and TESS), covering 8 emotional classes (neutral, calm, happy, sad, angry, fearful, disgust and surprised). We divided our experiments into 3 main stages. In the first stage, we used feature engineering to extract relevant features from the time, spectral and cepstral domains. Features such as ZCR, energy, spectral centroid, chroma and MFCC were used to train an SVM classifier; the best model obtained an accuracy of 91.1%. In the second stage, we considered only 40 MFCC coefficients and trained several deep neural networks (CNN, LSTM and MLP); the best model obtained an accuracy of 89.5% with an MLP architecture. Finally, in the third stage we trained an end-to-end CNN (SampleCNN) at the sample level. This last approach does not require feature engineering and operates directly on the audio signal. In this stage, we achieved a precision of 81.7%. The experiments show that the results are competitive, and some experiments surpass the accuracy reported in related work. © 2021, Springer Nature Switzerland AG. | |
| dc.identifier.doi | 10.1007/978-3-030-76228-5_6 | |
| dc.identifier.scopus | 2-s2.0-85111128406 | |
| dc.identifier.uri | https://cris.esan.edu.pe/handle/20.500.12640/764 | |
| dc.identifier.uuid | fe157abf-c620-4559-9110-c69ee4684bd2 | |
| dc.language.iso | en | |
| dc.publisher | Springer Science and Business Media Deutschland GmbH | |
| dc.relation.ispartof | Communications in Computer and Information Science | |
| dc.rights | http://purl.org/coar/access_right/c_14cb | |
| dc.subject | Audio processing | |
| dc.subject | Machine learning | |
| dc.subject | Speech emotion classification | |
| dc.title | Machine Learning Techniques for Speech Emotion Classification | |
| dc.type | http://purl.org/coar/resource_type/c_2f33 | |
| dspace.entity.type | Publication | |
| oaire.citation.endPage | 89 | |
| oaire.citation.startPage | 77 |
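
The first stage described in the abstract relies on hand-crafted time-domain and spectral features. As a rough illustration only, the sketch below computes three of the named features (ZCR, energy and spectral centroid) for a single frame using the Python standard library. The function names, the frame length and the synthetic 1 kHz test tone are assumptions made for this example; they are not taken from the paper, which may define or normalise these features differently and does not state its tooling in this record.

```python
import cmath
import math


def zero_crossing_rate(frame):
    """Fraction of adjacent sample pairs whose signs differ."""
    crossings = sum(
        1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0)
    )
    return crossings / (len(frame) - 1)


def short_time_energy(frame):
    """Mean squared amplitude of the frame."""
    return sum(x * x for x in frame) / len(frame)


def spectral_centroid(frame, sample_rate):
    """Magnitude-weighted mean frequency in Hz, via a naive O(n^2) DFT."""
    n = len(frame)
    magnitudes = []
    for k in range(n // 2 + 1):  # non-negative frequency bins only
        coeff = sum(
            x * cmath.exp(-2j * math.pi * k * t / n)
            for t, x in enumerate(frame)
        )
        magnitudes.append(abs(coeff))
    total = sum(magnitudes)
    if total == 0:
        return 0.0  # silent frame: centroid is undefined, return 0 by convention
    return sum(k * sample_rate / n * m for k, m in enumerate(magnitudes)) / total


# Hypothetical test signal: a 1 kHz sine sampled at 8 kHz (not from RAVDESS/TESS).
SAMPLE_RATE = 8000
frame = [math.sin(2 * math.pi * 1000 * t / SAMPLE_RATE) for t in range(256)]

# One feature vector per frame; a classifier such as an SVM would be trained
# on vectors like this, typically aggregated over all frames of a recording.
features = [
    zero_crossing_rate(frame),
    short_time_energy(frame),
    spectral_centroid(frame, SAMPLE_RATE),
]
```

For real recordings, an FFT routine from a numerical library would replace the naive DFT, and the per-frame values would usually be summarised (for example by mean and standard deviation) before training.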