Interrater reliability of schizoaffective disorder compared with schizophrenia, bipolar disorder, and unipolar depression – A systematic review and meta-analysis

https://doi.org/10.1016/j.schres.2016.07.012

Abstract

Schizoaffective disorder is a common diagnosis in clinical practice, but its nosological status has been subject to debate ever since it was conceptualized. Although sufficient diagnostic reliability is essential, schizoaffective disorder has been reported to have low interrater reliability, and evidence based on systematic review and meta-analysis methods is lacking. Using a highly sensitive literature search in Medline, Embase, and PsycInfo, we identified studies measuring the interrater reliability of schizoaffective disorder in comparison to schizophrenia, bipolar disorder, and unipolar depression. Out of 4126 records screened, we included 25 studies reporting on 7912 patients diagnosed by different raters. The interrater reliability of schizoaffective disorder was moderate (meta-analytic estimate of Cohen's kappa 0.57 [95% CI: 0.41–0.73]) and substantially lower than that of its main differential diagnoses (differences in kappa of 0.19 to 0.22). Although there was considerable heterogeneity, analyses revealed that the interrater reliability of schizoaffective disorder was consistently lower in the overwhelming majority of studies. The results remained robust in subgroup and sensitivity analyses (e.g., by diagnostic manual used), in meta-regressions (e.g., by publication year), and in analyses of publication bias. Clinically, the results highlight the particular importance of diagnostic re-evaluation in patients diagnosed with schizoaffective disorder. They also quantify a widely held clinical impression of lower interrater reliability and agree with an earlier meta-analysis reporting low test-retest reliability.

Introduction

Schizoaffective disorder is a prevalent diagnosis in both clinical and epidemiological samples. For example, in an Australian epidemiological survey, 16.1% of all patients who screened positive for psychosis eventually received a diagnosis of schizoaffective disorder (Morgan et al., 2012), and a European population-based study estimated its prevalence to be 1.1% (Scully et al., 2004). Also, in a study of Medicaid claims, the number of patients with a diagnosis of schizoaffective disorder was almost half (42%) the number of patients with a schizophrenia diagnosis (Olfson et al., 2009).

Despite its prevalence, the diagnosis of schizoaffective disorder has been critically debated for decades. Some authors recommend abandoning the diagnosis entirely (Lake and Hurwitz, 2007, Maier, 2006, Malhi et al., 2008), whereas others emphasize its usefulness (Marneros, 2007). The recent revision of the Diagnostic and Statistical Manual of Mental Disorders (DSM-5) retained schizoaffective disorder as a diagnostic entity (American Psychiatric Association, 2013), and ICD-11, scheduled to appear in 2017, will also provide diagnostic criteria for schizoaffective disorder (Gaebel, 2012).

Diagnostic manuals aim to improve the reliability of diagnoses, a key issue for clinical practice and research and a long-standing problem in psychiatry. In an earlier meta-analysis, we showed that the test-retest reliability of schizoaffective disorder is moderate and consistently, significantly, and considerably lower than the test-retest reliability of its main differential diagnoses, schizophrenia, bipolar disorder, and unipolar depression (Santelmann et al., 2015). We are, however, not aware of any systematic and quantitative attempt at summarizing the interrater reliability of schizoaffective disorder. From a clinical viewpoint, interrater reliability is particularly consequential because it measures the degree to which two clinicians assign the same diagnosis to the same patient.
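For reference, the agreement statistic underlying the meta-analytic estimates reported here is Cohen's kappa (Cohen, 1960), which corrects the raw agreement between two raters for agreement expected by chance. The standard definition (not specific to this study) is

\[ \kappa = \frac{p_o - p_e}{1 - p_e}, \]

where \(p_o\) is the observed proportion of patients assigned the same diagnosis by both raters and \(p_e\) is the proportion of agreement expected by chance, calculated from each rater's marginal diagnosis frequencies.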

As a consequence, we conducted a systematic review and meta-analysis of studies investigating the interrater reliability of schizoaffective disorder relative to other functional psychoses. We hypothesized lower interrater reliability in schizoaffective disorder than in schizophrenia, bipolar disorder, and unipolar depression.


Methods

This is a systematic review and meta-analysis of diagnostic interrater reliability studies comparing schizoaffective disorder with schizophrenia, bipolar disorder, and unipolar depression. The analysis is part of a research project on the diagnostic reliability of schizoaffective disorder (registered on PROSPERO: CRD42013006713; www.crd.york.ac.uk/prospero). Earlier results have been published on the test-retest reliability (Santelmann et al., 2015) and on the diagnostic shift seen in patients

Results

Out of 4126 articles screened, 345 were assessed for eligibility at full-text level, and 23 articles on 25 studies were included in the analysis (see PRISMA flowchart in Fig. 1). The studies were published between 1974 and 2012 and, in total, reported on 7912 patients (range: 24–3493; median: 100). Table 1 provides a breakdown of the key characteristics of the 25 studies included.
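To make the type of analysis concrete, the following is a minimal Python sketch, offered purely as an illustration and not as the authors' actual analysis code, of how a study-level Cohen's kappa can be computed from two raters' diagnoses and how study-level estimates might be pooled with a DerSimonian-Laird random-effects model. All diagnoses and variance values in the example are hypothetical.

    # Illustrative sketch: study-level Cohen's kappa and a DerSimonian-Laird pooled estimate.
    # Hypothetical data only; this is not the analysis code used in the study.
    import numpy as np

    def cohens_kappa(rater1, rater2):
        """Cohen's kappa for two raters assigning categorical diagnoses to the same patients."""
        categories = sorted(set(rater1) | set(rater2))
        idx = {c: i for i, c in enumerate(categories)}
        n = len(rater1)
        table = np.zeros((len(categories), len(categories)))
        for a, b in zip(rater1, rater2):
            table[idx[a], idx[b]] += 1          # cross-tabulation of the two raters' diagnoses
        p_obs = np.trace(table) / n             # observed agreement
        p_exp = (table.sum(axis=1) @ table.sum(axis=0)) / n ** 2  # chance agreement from marginals
        return (p_obs - p_exp) / (1 - p_exp)

    def dersimonian_laird(kappas, variances):
        """Random-effects pooled estimate and 95% CI from study-level kappas and their variances."""
        k, v = np.asarray(kappas, float), np.asarray(variances, float)
        w = 1 / v                                # fixed-effect weights
        fixed = np.sum(w * k) / w.sum()
        q = np.sum(w * (k - fixed) ** 2)         # Cochran's Q
        c = w.sum() - np.sum(w ** 2) / w.sum()
        tau2 = max(0.0, (q - (len(k) - 1)) / c)  # between-study variance estimate
        w_star = 1 / (v + tau2)
        pooled = np.sum(w_star * k) / w_star.sum()
        se = np.sqrt(1 / w_star.sum())
        return pooled, (pooled - 1.96 * se, pooled + 1.96 * se)

    # Hypothetical example: two raters diagnosing eight patients
    # (SA = schizoaffective disorder, SZ = schizophrenia, BD = bipolar disorder).
    rater_a = ["SA", "SZ", "BD", "SA", "SZ", "SZ", "BD", "SA"]
    rater_b = ["SA", "SZ", "BD", "SZ", "SZ", "SA", "BD", "SA"]
    print(cohens_kappa(rater_a, rater_b))
    print(dersimonian_laird([0.55, 0.62, 0.48], [0.010, 0.020, 0.015]))

A diagnosis-specific kappa (e.g., schizoaffective disorder versus all other diagnoses) can be obtained in the same way by dichotomizing both raters' labels before applying the formula; this is one common way of arriving at per-diagnosis estimates such as those summarized above.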

Discussion

We consider three results of this study as particularly important: First, with an estimated kappa of 0.57 the interrater reliability of schizoaffective disorder is only moderate according to the interpretation of Landis and Koch (1977). Second, in direct comparisons, the interrater reliability of schizoaffective disorder turned out to be substantially lower than that of schizophrenia, bipolar disorder, and unipolar depression. Third, while heterogeneity measures showed high levels of

Role of funding source

There were no funding body agreements for this research.

Contributors

C. Baethge had the idea for this research and supervised it throughout. H. Santelmann carried out the literature search and screened all relevant abstracts. H. Santelmann and J. Bußhoff extracted data from relevant full-text articles independently of each other. J. Franklin advised on the statistical analysis, which was carried out by H. Santelmann. C. Baethge and H. Santelmann drafted the paper, which was revised by J. Franklin and J. Bußhoff.

Conflict of interest

The authors report no conflicts of interest with respect to this work.

References (56)

  • N.C. Andreasen et al. The Comprehensive Assessment of Symptoms and History (CASH). An instrument for assessing diagnosis and psychopathology. Arch. Gen. Psychiatry (1992)
  • C. Baethge. Long-term treatment of schizoaffective disorder: review and recommendations. Pharmacopsychiatry (2003)
  • C. Baethge et al. Substantial agreement of referee recommendations at a general medical journal – a peer review evaluation at Deutsches Arzteblatt International. PLoS One (2013)
  • W.L. Baker et al. Understanding heterogeneity in meta-analysis: the role of meta-regression. Int. J. Clin. Pract. (2009)
  • I.F. Brockington et al. Definitions of depression: concordance and prediction of outcome. Am. J. Psychiatry (1982)
  • T. Bronisch et al. Diagnostic reliability and validity of the PSE CATEGO-system. Arch. Psychiatr. Nervenkr. (1982)
  • A.G. Cardno et al. A twin study of schizoaffective-mania, schizoaffective-depression, and other psychotic syndromes. Am. J. Med. Genet. B Neuropsychiatr. Genet. (2012)
  • E. Cheniaux et al. The diagnoses of schizophrenia, schizoaffective disorder, bipolar disorder and unipolar depression: interrater reliability and congruence between DSM-IV and ICD-10. Psychopathology (2009)
  • J. Cohen. A coefficient of agreement for nominal scales. Educ. Psychol. Meas. (1960)
  • S. Duval et al. Trim and fill: a simple funnel-plot-based method of testing and adjusting for publication bias in meta-analysis. Biometrics (2000)
  • M. Egger et al. Bias in meta-analysis detected by a simple, graphical test. BMJ (1997)
  • O. Esan. DSM-5 schizoaffective disorder: will clinical utility be enhanced? Soc. Psychiatry Psychiatr. Epidemiol. (2015)
  • M. Flaum et al. DSM-IV field trial for schizophrenia and other psychotic disorders
  • J.L. Fleiss. Measuring nominal scale agreement among many raters. Psychol. Bull. (1971)
  • H.J. Freyberger et al. Testing ICD-10: results of a multicentric field trial in German speaking countries. Nervenarzt (1990)
  • P. Fusar-Poli et al. Diagnostic stability of ICD/DSM first episode psychosis diagnoses: meta-analysis. Schizophr. Bull. (2016)
  • W. Gaebel. Status of psychotic disorders in ICD-11. Schizophr. Bull. (2012)
  • K.L. Gwet. Computing inter-rater reliability and its variance in the presence of high agreement. Br. J. Math. Stat. Psychol. (2008)