Emotion recognition abilities across stimulus modalities in schizophrenia and the role of visual attention

https://doi.org/10.1016/j.schres.2013.09.026

Abstract

Emotion can be expressed by both the voice and the face, and previous work suggests that presentation modality may impact emotion recognition performance in individuals with schizophrenia. We investigated the effect of stimulus modality on emotion recognition accuracy and the potential role of visual attention to faces in emotion recognition abilities. Thirty-one patients who met DSM-IV criteria for schizophrenia (n = 8) or schizoaffective disorder (n = 23) and 30 non-clinical control individuals participated. Both groups identified emotional expressions in three different conditions: audio only, visual only, and combined audiovisual. In the visual only and combined conditions, time spent visually fixating salient features of the face was recorded. Patients were significantly less accurate than controls in emotion recognition during both the audio only and visual only conditions but did not differ from controls in the combined condition. Analysis of visual scanning behaviors demonstrated that patients attended less to the mouth than healthy individuals in the visual condition but did not differ in visual attention to salient facial features in the combined condition, which may in part explain the absence of a deficit for patients in this condition. Collectively, these findings demonstrate that patients benefit from multimodal stimulus presentations of emotion and support hypotheses that visual attention to salient facial features may serve as a mechanism for accurate emotion identification.

Introduction

Impaired emotion recognition is well documented in schizophrenia (Edwards et al., 2002, Hoekert et al., 2007, Pinkham et al., 2007, Kohler et al., 2010) and has been linked to functional outcomes such as poorer community functioning, social skill, and social behavior (Couture et al., 2006, Meyer and Kurtz, 2009, Fett et al., 2011). Although research on emotion recognition in schizophrenia has primarily focused on visual processing of emotional information (e.g. facial emotion recognition), a number of studies have investigated emotional prosody recognition, or the ability to accurately identify the emotional content of spoken words. Effect size estimates indicate patients show greater impairment in affective prosody recognition (d = 1.24) than facial affect recognition (d = .91) (Hoekert et al., 2007, Kohler et al., 2010), suggesting that modality of stimulus presentation may impact emotion recognition abilities.

To date, only a few studies have directly compared emotion recognition abilities across modalities. Early studies provided mixed results for advantages between auditory and visual presentations (Borod et al., 1989, Whittaker et al., 1994, Haskins et al., 1995, Edwards et al., 2001); however, these studies used different tasks for each modality, which introduces methodological confounds. To address this limitation, Fiszdon and Bell (2009) compared performance on an audio only version of the Bell Lysaker Emotion Recognition Task (BLERT) to a multichannel version which included both audio and visual cues. Whereas both patients and controls were more accurate in the multichannel condition, patients benefitted less from the addition of visual information than controls, which is consistent with evidence of multisensory integration abnormalities in individuals with schizophrenia (de Gelder et al., 2002, de Jong et al., 2009). In a similar paradigm, Thaler et al. (2013) presented individuals with audio only, visual only, and audiovisual versions of the BLERT. Again, patients were more accurate for the audiovisual condition relative to the audio only condition, but patients performed comparably between the visual only and audiovisual conditions. Thus, improved performance in patients appeared to be due to the presence of visual information with no additional benefit for concurrently presented audio and visual information. This pattern of performance is also consistent with work investigating multisensory integration in schizophrenia and particularly with a hypothesis that emerges from this work, namely that visual inputs may show dominance over auditory inputs for patients (de Gelder et al., 2005).

Impaired multisensory integration and an over-reliance on the visual modality may in part explain why patients do not show the expected level of benefit for the audiovisual modality. An important consideration, however, relates to how visually presented emotional information is processed by patients. Several studies report restricted visual scanpaths of faces in schizophrenia, and although these studies have used only static visual stimuli lacking an audio component, they have generally found reduced attention to core facial features such as the eyes and mouth in patients versus healthy controls (Loughland et al., 2002a, Loughland et al., 2002b). Thus, if patients prioritize visual information regardless of whether audio information is available and fail to attend to the most relevant portions of the face, this may explain why they perform similarly between visual and audiovisual modalities and why they continue to perform more poorly than controls. The use of eye tracking during visual and audiovisual emotion recognition conditions could provide important information on the contribution of visual attention to performance across modalities.

The current study used dynamic audio-visual stimuli to create a multimodal emotion recognition task presented to participants under three conditions: audio only, video only, and combined audio and video. Eye tracking was utilized to investigate differences in visual scanpaths during both video conditions.

Consistent with previous work, we hypothesized that patients with schizophrenia would be less accurate at emotion recognition than matched healthy controls in all three conditions. Additionally, consistent with Fiszdon and Bell (2009), we anticipated that both groups would show increased emotion recognition accuracy in the combined condition, but that this increase would be smaller in patients relative to controls, suggesting impaired utilization of combined audio and visual cues. In line with Thaler et al. (2013), we also predicted that patients would fail to show an improvement in accuracy between the visual only and combined conditions.

Consistent with previous findings regarding visual scanning of facial stimuli, we predicted that healthy controls would spend more time fixating salient features of the face (i.e. eyes and mouth) than patients with schizophrenia in both the visual only and combined conditions. Extrapolating from work showing that emotional prosodic information modulates and orients attention toward the source of the sound (Brosch et al., 2008, Brosch et al., 2009), we predicted controls would spend more time viewing the mouth in the combined condition relative to the visual only condition, which would reflect efforts to utilize the auditory content of the stimulus. In contrast, we anticipated that visual attention to the mouth by patients would be similar between the visual and combined conditions, demonstrating a failure to modulate viewing patterns based on modality. Similar viewing patterns between conditions by patients could therefore offer a potential mechanistic explanation for previous findings that they do not benefit from multimodal stimulus presentations.

Section snippets

Participants

Thirty-one patients (15 female) who met DSM-IV criteria for schizophrenia (n = 8) or schizoaffective disorder (n = 23) and 30 (15 female) non-clinical control individuals participated. Patients were recruited from Metrocare Services, a non-profit mental health services provider in Dallas County, Texas, from community advertisements, and from previous participation in our lab. Control participants were recruited from ads posted on Craigslist and from previous lab studies. All participants provided written informed consent.

Results

Groups did not differ on ethnicity, χ2 = .91, p = .82, gender, χ2 = .02, p = .90, age, t(59) = .10, p = .92, years of education completed, t(59) = 1.14, p = .26, or premorbid IQ as estimated by the WRAT-3 reading subscale, t(59) = .61, p = .55 (Weickert et al., 2000). See Table 1 for participant demographic information.

Consistent with prediction, a significant main effect of group, F(1, 59) = 13.03, p = .001, ηp2 = .18, indicated controls were more accurate on the task as a whole. The main effect of condition was also

Discussion

This study assessed emotion recognition abilities in schizophrenia across audio, visual and combined audiovisual modalities, and examined their association with patterns of visual attention. As anticipated, patients showed reduced recognition accuracy; however, this was specific to the audio only and visual only conditions. Contrary to prediction, the combined audiovisual condition improved emotion recognition accuracy more in patients than controls, with the two groups performing comparably in

Role of funding source

N/A.

Contributors

Author 1 (C. Simpson) conceptualized the study, oversaw and completed all statistical analyses, wrote the first draft of the manuscript, and contributed substantially to all subsequent drafts of the manuscript. Author 2 (A. Pinkham) aided in the study design, supervised the project, assisted with statistical analysis, and edited all versions of the manuscript. Author 3 (S. Kelsven) assisted with data collection and preparation and edited drafts of the manuscript. Author 4 (N. Sasson) aided in

Conflicts of interest

All authors report no conflicts of interests.

Acknowledgments

We thank Dr. Diana Robins for kindly providing the DAVE stimuli for our use. We would also like to thank Tom Campbell and Chris Dollaghan for generously sharing their lab space and equipment, and we gratefully acknowledge all of the individuals who participated in the present study.

