Research at St Andrews

Generalisation and robustness investigation for facial and speech emotion recognition using bio-inspired spiking neural networks

Research output: Contribution to journal › Article › peer-review


Esma Mansouri Benssassi, Juan Ye


Emotion recognition through facial expression and non-verbal speech represents an important area in affective computing. Both have been extensively studied, from classical feature extraction techniques to more recent deep learning approaches. However, most of these approaches face two major challenges: (1) robustness: in the face of degradation such as noise, can a model still make correct predictions? and (2) cross-dataset generalisation: when a model is trained on one dataset, can it be used to make inferences on another dataset? To directly address these challenges, we first propose the application of a Spiking Neural Network (SNN) in predicting emotional states based on facial expression and speech data, then investigate and compare their accuracy when facing data degradation or unseen new input. We evaluate our approach on third-party, publicly available datasets and compare it to state-of-the-art techniques. Our approach demonstrates robustness to noise: it achieves an accuracy of 56.2% for facial expression recognition (FER), compared to 22.64% and 14.10% for CNN and SVM respectively, when input images are degraded with a noise intensity of 0.5, and the highest accuracy of 74.3% for speech emotion recognition (SER), compared to 21.95% for CNN and 14.75% for SVM, when audio white noise is applied. For generalisation, our approach achieves consistently high accuracy of 89% for FER and 70% for SER in cross-dataset evaluation, which suggests that it can learn more effective feature representations that lead to good generalisation of facial features and vocal characteristics across subjects.


Original language: English
Number of pages: 14
Journal: Soft Computing
Volume: First Online
Early online date: 16 Jan 2021
Publication status: E-pub ahead of print - 16 Jan 2021

    Research areas

  • Spiking neural network, Facial emotion recognition, Speech emotion recognition, Unsupervised learning

Related by author

  1. ContrasGAN: unsupervised domain adaptation in Human Activity Recognition via adversarial and contrastive learning

    Rosales Sanabria, A., Zambonelli, F., Dobson, S. A. & Ye, J., 6 Nov 2021, (E-pub ahead of print) In: Pervasive and Mobile Computing. In Press, p. 1-34 34 p., 101477.

    Research output: Contribution to journal › Article › peer-review

  2. Collaborative activity recognition with heterogeneous activity sets and privacy preferences

    Civitarese, G., Ye, J., Zampatti, M. & Bettini, C., 4 Nov 2021, (E-pub ahead of print) In: Journal of Ambient Intelligence and Smart Environments. Pre-press, p. 1-20 20 p.

    Research output: Contribution to journal › Article › peer-review

  3. Investigating multisensory integration in emotion recognition through bio-inspired computational models

    Mansouri Benssassi, E. & Ye, J., 19 Aug 2021, (E-pub ahead of print) In: IEEE Transactions on Affective Computing. Early Access, 13 p.

    Research output: Contribution to journal › Article › peer-review

  4. Continual learning in sensor-based human activity recognition: an empirical benchmark analysis

    Jha, S., Schiemer, M., Zambonelli, F. & Ye, J., 16 Apr 2021, (E-pub ahead of print) In: Information Sciences. In Press, p. 1-35 35 p.

    Research output: Contribution to journal › Article › peer-review

  5. Continual activity recognition with generative adversarial networks

    Ye, J., Nakwijit, P., Schiemer, M., Jha, S. & Zambonelli, F., 27 Mar 2021, In: ACM Transactions on Internet of Things. 2, 2, p. 1-25 25 p., 9.

    Research output: Contribution to journal › Article › peer-review

ID: 271472115