The first thing Spitzer did to reform the DSM was to assemble a team of fifteen psychiatrists to help him write the new manual. This team was called the DSM Taskforce, and Spitzer was its outright leader. So in the mid-1970s the Taskforce set about writing a kind of New Testament for psychiatry: a book that aspired to improve the uniformity and reliability of psychiatric diagnosis in the wake of all its previous failings. If this all sounds very intrepid, well, that’s pretty much what it was. Spitzer’s Taskforce promised a new deal for psychiatry, and there was a lot of pressure on them to deliver.
[...]
Finally, to help improve diagnostic reliability further, Spitzer’s team created criteria for each disorder that a patient had to meet in order to warrant the diagnosis. So while, for example, there are multiple symptoms associated with depression, it was somehow decided that a patient would need to have at least five of them for a period of at least two weeks to qualify for the diagnosis of depression. The only problem was: on what grounds did Spitzer’s team decide that if you have five symptoms for two weeks you suffer from a depressive disorder? Why didn’t they choose six symptoms for three weeks, or three symptoms for five weeks? What was the science that justified putting the line where Spitzer’s team chose to draw it? In an interview in 2010, the psychiatrist Daniel Carlat asked Spitzer this very question:
Carlat: How did you decide on five criteria as being your minimum threshold for depression?
Spitzer: It was just consensus. We would ask clinicians and researchers, ‘How many symptoms do you think patients ought to have before you would give them the diagnosis of depression?’, and we came up with the arbitrary number of five.
Carlat: But why did you choose five and not four? Or why didn’t you choose six?
Spitzer: Because four just seemed like not enough. And six seemed like too much [Spitzer smiles mischievously].
Carlat: But weren’t there any studies done to establish the threshold?
Spitzer: We did reviews of the literature, and in some cases we received funding from NIMH to do field trials … [However] when you do field trials in depression and other disorders, there is no sharp dividing line where you can confidently say, ‘This is the perfect number of symptoms needed to make a diagnosis’ … It would be nice if we had a biological gold standard, but that doesn’t exist, because we don’t understand the neurobiology of depression.
[...]
Once we had settled in our chairs, the first question I had for Spitzer concerned one of the other major changes he introduced into the DSM. What I didn’t mention in the last chapter is that while he created a new checklist system and sharpened the definitions for each disorder, he also introduced over 80 new disorders, effectively expanding the DSM from 182 disorders (DSM-II) to 265 (DSM-III). ‘So what’, I asked Spitzer, ‘was the rationale for this huge expansion?’
‘The disorders we included weren’t really new to the field’, answered Spitzer confidently. ‘They were mainly diagnoses that clinicians used in practice but which weren’t recognised by the DSM or the ICD. There were many examples: borderline personality disorder was one, and so was post-traumatic stress disorder. There were no categories for these disorders prior to DSM-III. So by including them we gave them professional recognition.’
‘So presumably’, I asked, ‘these disorders had been discovered in a biological sense? That’s why they were included, right?’
‘No – not at all’, Spitzer said matter-of-factly. ‘There are only a handful of mental disorders in the DSM known to have a clear biological cause. These are known as the organic disorders [things like epilepsy, Alzheimer’s and Huntington’s disease]. These are few and far between.’
‘So, let me get this clear’, I pressed, ‘there are no discovered biological causes for many of the remaining mental disorders in the DSM?’
‘It’s not for many, it’s for any! No biological markers have been identified.’
[...]
This verdict comes from one of the leading lights on Spitzer’s Taskforce, Dr Theodore Millon. Here’s what he said about the DSM’s construction:
There was very little systematic research, and much of the research that existed was really a hodgepodge – scattered, inconsistent, and ambiguous. I think the majority of us recognized that the amount of good, solid science upon which we were making our decisions was pretty modest.4
Once I’d read this quote to Spitzer, I asked him whether he agreed with Millon’s statement. After a short and somewhat uncomfortable silence, Spitzer responded in a way I didn’t expect:
‘Well, it’s true that for many of the disorders that were added, there wasn’t a tremendous amount of research, and certainly there wasn’t research on the particular way that we defined these disorders. In the case of Millon’s quote, I think he is mainly referring to the personality disorders … But again, it is certainly true that the amount of research validating data on most psychiatric disorders is very limited indeed.’
Trying not to look shocked, I continued: ‘So you’re saying that there was little research not only supporting your inclusion of new disorders, but also supporting how these disorders should be defined?’
‘There are very few disorders whose definition was a result of specific research data’, responded Spitzer. ‘For borderline personality disorder there was some research that looked at different ways of defining the disorder. And we chose the definition that seemed to be the most valid. But for the other categories rarely could you say that there was research literature supporting the definition’s validity.
[...]
Spitzer’s admission so surprised me that I decided to check it with other members of his Taskforce. So on a rainy English Monday I called Professor Donald Klein in his New York office to ask whether he agreed with Spitzer’s account of events. Klein had been a leader on the Taskforce, and so was at the heart of everything that went on.
‘Sure, we had very little in the way of data’, Klein confirmed through a crackling phone line, ‘so we were forced to rely on clinical consensus, which, admittedly, is a very poor way to do things. But it was better than anything else we had.’
‘So without data to guide you’, I nudged carefully, ‘how was this consensus reached?’
‘We thrashed it out, basically. We had a three-hour argument. There would be about twelve people sitting down at the table, usually there was a chairperson and there was somebody taking notes. And at the end of each meeting there would be a distribution of the minutes. And at the next meeting some would agree with the inclusion, and the others would continue arguing. If people were still divided, the matter would be eventually decided by a vote.’
‘A vote, really?’ I asked, trying to conceal that I hardly felt reassured.
‘Sure, that is how it went.’
[...]
Garfinkel then gave me a concrete example of how far down the scale of intellectual respectability she felt those meetings could sometimes fall. ‘On one occasion I was sitting in on a Taskforce meeting, and there was a discussion about whether a particular behaviour should be classed as a symptom of a particular disorder. And as the conversation went on, to my great astonishment one Taskforce member suddenly piped up, “Oh no, no, we can’t include that behaviour as a symptom, because I do that!” And so it was decided that that behaviour would not be included because, presumably, if someone on the Taskforce does it, it must be perfectly normal.’