Emotion recognition abilities across stimulus modalities in schizophrenia and the role of visual attention
Introduction
Impaired emotion recognition is well documented in schizophrenia (Edwards et al., 2002, Hoekert et al., 2007, Pinkham et al., 2007, Kohler et al., 2010) and has been linked to functional outcomes such as poorer community functioning, social skill, and social behavior (Couture et al., 2006, Meyer and Kurtz, 2009, Fett et al., 2011). Although research on emotion recognition in schizophrenia has primarily focused on visual processing of emotional information (e.g. facial emotion recognition), a number of studies have investigated emotional prosody recognition, or the ability to accurately identify the emotional content of spoken words. Effect size estimates indicate patients show greater impairment in affective prosody recognition (d = 1.24) than facial affect recognition (d = .91) (Hoekert et al., 2007, Kohler et al., 2010), suggesting that modality of stimulus presentation may impact emotion recognition abilities.
To date, only a few studies have directly compared emotion recognition abilities across modalities. Early studies provided mixed results regarding advantages of auditory versus visual presentation (Borod et al., 1989, Whittaker et al., 1994, Haskins et al., 1995, Edwards et al., 2001); however, these studies used different tasks for each modality, which introduces methodological confounds. To address this limitation, Fiszdon and Bell (2009) compared performance on an audio-only version of the Bell Lysaker Emotion Recognition Task (BLERT) to a multichannel version that included both audio and visual cues. Although both patients and controls were more accurate in the multichannel condition, patients benefitted less from the addition of visual information than controls, which is consistent with evidence of multisensory integration abnormalities in individuals with schizophrenia (de Gelder et al., 2002, de Jong et al., 2009). In a similar paradigm, Thaler et al. (2013) presented individuals with audio-only, visual-only, and audiovisual versions of the BLERT. Again, patients were more accurate in the audiovisual condition relative to the audio-only condition, but they performed comparably in the visual-only and audiovisual conditions. Thus, improved performance in patients appeared to be due to the presence of visual information, with no additional benefit from concurrently presented audio and visual information. This pattern of performance is also consistent with work investigating multisensory integration in schizophrenia, and particularly with a hypothesis that emerges from this work: namely, that visual inputs may dominate auditory inputs in patients (de Gelder et al., 2005).
Impaired multisensory integration and an over-reliance on the visual modality may in part explain why patients do not show the expected level of benefit for the audiovisual modality. An important consideration, however, relates to how visually presented emotional information is processed by patients. Several studies report restricted visual scanpaths of faces in schizophrenia, and although these studies have used only static visual stimuli lacking an audio component, they have generally found reduced attention to core facial features such as the eyes and mouth in patients versus healthy controls (Loughland et al., 2002a, Loughland et al., 2002b). Thus, if patients prioritize visual information regardless of whether audio information is available and fail to attend to the most relevant portions of the face, this may explain why they perform similarly between visual and audiovisual modalities and why they continue to perform more poorly than controls. The use of eye tracking during visual and audiovisual emotion recognition conditions could provide important information on the contribution of visual attention to performance across modalities.
The current study used dynamic audiovisual stimuli to create a multimodal emotion recognition task presented to participants under three conditions: audio only, video only, and combined audio and video. Eye tracking was used to investigate differences in visual scanpaths during both video conditions.
Consistent with previous work, we hypothesized that patients with schizophrenia would be less accurate at emotion recognition than matched healthy controls in all three conditions. Additionally, consistent with Fiszdon and Bell (2009), we anticipated that both groups would show increased accuracy for emotion recognition in the combined condition, but that this increase in accuracy would be smaller in patients relative to controls, suggesting impaired utilization of combined audio and visual cues. In line with Thaler et al. (2013), we also predicted that patients would fail to show an improvement in accuracy between the visual-only and combined conditions.
Consistent with previous findings regarding visual scanning of facial stimuli, we predicted that healthy controls would spend more time fixating salient features of the face (i.e. eyes and mouth) than patients with schizophrenia in both the visual-only and combined conditions. Extrapolating from work showing that emotional prosodic information modulates and orients attention toward the source of a sound (Brosch et al., 2008, Brosch et al., 2009), we predicted that controls would spend more time viewing the mouth in the combined condition relative to the visual-only condition, reflecting efforts to utilize the auditory content of the stimulus. In contrast, we anticipated that patients' visual attention to the mouth would be similar between the visual-only and combined conditions, demonstrating a failure to modulate viewing patterns based on modality. Similar viewing patterns across conditions in patients could therefore offer a potential mechanistic explanation for previous findings that they do not benefit from multimodal stimulus presentations.
Participants
Thirty-one patients (15 female) who met DSM-IV criteria for schizophrenia (n = 8) or schizoaffective disorder (n = 23) and 30 (15 female) non-clinical control individuals participated. Patients were recruited from Metrocare Services (a non-profit mental health services provider in Dallas County, Texas), from community advertisements, and from previous participation in our lab. Control participants were recruited from ads posted on Craigslist and from previous lab studies. All participants provided written
Results
Groups did not differ on ethnicity, χ2 = .91, p = .82, gender, χ2 = .02, p = .90, age, t(59) = .10, p = .92, years of education completed, t(59) = 1.14, p = .26, or premorbid IQ as estimated by the WRAT-3 reading subscale, t(59) = .61, p = .55 (Weickert et al., 2000). See Table 1 for participant demographic information.
Consistent with prediction, a significant main effect of group, F(1, 59) = 13.03, p = .001, ηp2 = .18, indicated that controls were more accurate on the task as a whole. The main effect of condition was also
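As a quick consistency check on the reported statistics (not part of the original analysis), partial eta squared for a single effect can be recovered from the F statistic and its degrees of freedom via ηp² = (F·df_effect) / (F·df_effect + df_error). A minimal Python sketch, plugging in the reported F(1, 59) = 13.03:

```python
def partial_eta_squared(f_value: float, df_effect: int, df_error: int) -> float:
    """Recover partial eta squared from an F statistic and its degrees of freedom."""
    return (f_value * df_effect) / (f_value * df_effect + df_error)

# Reported group effect: F(1, 59) = 13.03 should yield eta_p^2 of roughly .18.
eta_p2 = partial_eta_squared(13.03, df_effect=1, df_error=59)
print(round(eta_p2, 2))  # 0.18
```

The computed value (≈.181) agrees with the reported ηp2 = .18, confirming the statistics are internally consistent.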
Discussion
This study assessed emotion recognition abilities in schizophrenia across audio, visual and combined audiovisual modalities, and examined their association with patterns of visual attention. As anticipated, patients showed reduced recognition accuracy; however, this was specific to the audio only and visual only conditions. Contrary to prediction, the combined audiovisual condition improved emotion recognition accuracy more in patients than controls, with the two groups performing comparably in
Role of funding source
N/A.
Contributors
Author 1 (C. Simpson) conceptualized the study, oversaw and completed all statistical analyses, wrote the first draft of the manuscript, and contributed substantially to all subsequent drafts of the manuscript. Author 2 (A. Pinkham) aided in the study design, supervised the project, assisted with statistical analysis, and edited all versions of the manuscript. Author 3 (S. Kelsven) assisted with data collection and preparation and edited drafts of the manuscript. Author 4 (N. Sasson) aided in
Conflicts of interest
All authors report no conflicts of interest.
Acknowledgments
We thank Dr. Diana Robins for kindly providing the DAVE stimuli for our use. We would also like to thank Tom Campbell and Chris Dollaghan for generously sharing their lab space and equipment, and we gratefully acknowledge all of the individuals who participated in the present study.
References (32)
- Borod et al. (1989). A preliminary comparison of flat affect schizophrenics and brain-damaged patients on measures of affective processing. J. Commun. Disord.
- Brosch et al. (2008). Behold the voice of wrath: cross-modal modulation of visual attention by anger prosody. Cognition.
- Attentional-shaping as a means to improve emotion perception deficits in schizophrenia. Schizophr. Res. (2008).
- de Gelder et al. (2005). Multisensory integration of emotional faces and voices in schizophrenics. Schizophr. Res.
- de Jong et al. (2009). Audiovisual emotion recognition in schizophrenia: reduced integration of facial and vocal affect. Schizophr. Res.
- Edwards et al. (2001). Facial affect and affective prosody recognition in first-episode schizophrenia. Schizophr. Res.
- Edwards et al. (2002). Emotion recognition via facial expression and affective prosody in schizophrenia: a methodological review. Clin. Psychol. Rev.
- Fett et al. (2011). The relationship between neurocognition and social cognition with functional outcomes in schizophrenia: a meta-analysis. Neurosci. Biobehav. Rev.
- Fiszdon and Bell (2009). Effects of presentation modality and valence on affect recognition performance in schizophrenia and healthy controls. Psychiatry Res.
- Haskins et al. (1995). Affect processing in chronically psychotic patients: development of a reliable assessment tool. Schizophr. Res.
- Hoekert et al. (2007). Impaired recognition and expression of emotional prosody in schizophrenia: review and meta-analysis. Schizophr. Res.
- Emotion–cognition interactions in schizophrenia: implicit and explicit effects of facial expression. Neuropsychologia.
- Loughland et al. (2002). Schizophrenia and affective disorder show different visual scanning behavior for faces: a trait versus state-based distinction? Biol. Psychiatry.
- Loughland et al. (2002). Visual scanpaths to positive and negative facial emotions in an outpatient schizophrenia sample. Schizophr. Res.
- Elementary neurocognitive function, facial affect recognition and social-skills in schizophrenia. Schizophr. Res.
- Superior temporal activation in response to dynamic audio–visual emotional cues. Brain Cogn.