Together for the Charter: Author Misreads Expert Re Crowds?

Below is just one of the controversies I came across while searching for experiments replicating Galton's experiment; so far the search has turned up one reported Dutch example.
SEE: http://bbccharter.blogspot.co.uk/2012/07/extract-dutch-experiment.html

Essentially, the post below concludes that Galton's experiment was as described. However, it makes no reference to any replication of the experiment that would confirm the theory.

http://www.overcomingbias.com/2007/10/author-misreads.html

By Robin Hanson · October 5, 2007 6:00 am

James Surowiecki starts his book The Wisdom of Crowds telling how Francis Galton in 1907 used a crowd to guess an ox’s weight:

Galton borrowed the tickets from the organizers and ran a series of statistical tests on them. Galton arranged the guesses (which totaled 787 in all, after he had to discard thirteen because they were illegible) in order from highest to lowest and graphed them to see if they would form a bell curve. Then, among other things, he added all the contestants’ estimates, and calculated the mean of the group’s guesses. That number represented, you could say, the collective wisdom of the Plymouth crowd. If the crowd were a single person, that was how much it would have guessed the ox weighed. … The crowd had guessed that the ox, after it had been slaughtered and dressed, would weigh 1,197 pounds. After it had been slaughtered and dressed, the ox weighed 1,198 pounds.

David Levy and Sandra Peart say Surowiecki got it all wrong. Galton did not [AT FIRST] even bother to calculate a mean, as he saw his data was clearly not normally distributed. He used the median (of 1207), which was much further off than the mean, but by modern standards clearly the better estimator.
It was Karl Pearson in 1924 who calculated the mean.
(Line crossed out, and [clarifier] added later.)

This description of what Galton did with the guesses misrepresents what Galton actually did. Galton was clear that the distribution of guesses was not normal, writing that "The abnormality of the distribution of the estimates now becomes manifest, …" (Galton 1907b, p. 451). Surowieki has replaced Galton’s statement with the claim that Galton "graphed them [the guesses] to see if they would form a bell curve" – allowing the remaining possibility that the guesses might be normal. Galton’s principled opposition to the mean as the voice of the people, which Pearson supplemented by the use of the mean, is now described as Galton’s use of the mean. Finally, the reported estimate of the vox populi has been changed from 1207 to 1197.

Several authors, "Sunstein (2006, p. 24), Solomon (2006), Caplan (2007, p. 8)", copied Surowiecki’s error [VERSION], and several recent papers have argued about how close prediction market prices are to mean beliefs. The original Galton paper can be found reprinted in Levy and Peart’s book Vanity of the Philosopher and online.

Added: In the comments, Surowiecki says Levy and Peart are very wrong: Galton did too mention the mean, when responding a few weeks later to a letter that mentioned the mean. He cited this letter in his book footnotes. Hopefully we can get Levy and Peart to respond.

Added: Here are Surowiecki’s comments and Levy and Peart’s responses in full:

James Surowiecki’s first comment:

"Galton did not even bother to calculate a mean, as he saw his data was clearly not normally distributed. He used the median (of 1207), which was much further off than the mean, but by modern standards clearly the better estimator. It was Karl Pearson in 1924 who calculated the mean."

Robin, before repeating falsehoods, you might want to go back to the original sources, or, in this case, to the footnotes to my book. Galton did, in fact, calculate the mean, long before Karl Pearson did. Galton’s calculation appeared in Nature, Vol. 75, No. 1952 (3/28/07), in a response to letters regarding his original article. One of the correspondents had gone ahead and calculated a mean from the data that Galton had provided in his original piece, and had come up with the number 1196. Galton writes, "he makes it [the mean] 1196 lb. . . . whereas it should have been 1197 lb."

I find it astounding in its own right that Levy and Peart wrote an entire article about Galton (and, to a lesser extent, about my use of him) and never went back and checked the original sources. (They actually wonder in the paper, "However the new estimate of location came to be part of Surowieki’s account," as if the answer isn’t listed right there in the footnotes.) What makes it even more astounding, though, is that they’ve written an entire paper about the diffusion of errors by experts who "pass along false information (wittingly or unwittingly)" while passing along false information themselves.

It also seems bizarre that Levy and Peart caution, "The expectation of being careful seems to substitute for actually being careful," and yet they were somehow unable to figure out how to spell "Surowiecki" correctly. The article is a parody of itself.

I’m happy to enter into a discussion of whether the median or the mean should be used in aggregating the wisdom of crowds. But whether Galton himself thought the mean or the median was better was and is irrelevant to the argument of my book. I was interested in the story of the ox-weighing competition because it captures, in a single example, just how powerful group judgments can be. Galton did calculate the mean. It was 1197 lbs., and it was 1 lb. away from the actual weight of the ox. The only "falsehoods" being perpetrated here are the ones Levy and Peart are putting out there, and the ones that you uncritically reprinted.

James Surowiecki’s second comment:

Here are the links for the letter from Galton, where he reports the mean:

http://galton.org/cgi-bin/searchImages/galton/search/essays/pages/galton-1907-ballot-box_1.htm
http://galton.org/cgi-bin/searchImages/galton/search/essays/pages/galton-1907-ballot-box_2.htm

There’s no reason for debate here. Levy and Peart say "Pearson’s retelling of the ox judging tale apparently served as a starting point for the 2004 popular account of the modern economics of information aggregation, James Surowieki’s Wisdom of Crowds." It wasn’t the starting point. The starting point was Galton’s own experiment, and his own reporting of the mean in "The Ballot Box." Robin writes: "Galton did not even bother to calculate a mean." He did calculate it, and he did report it. This fact shouldn’t be listed as an "addendum" to the original post. The original post should be rewritten completely — perhaps along the lines of "Surowiecki and Galton disagree about which estimate is a better representation of group judgment" rather than "Author Misreads Expert" — or else scrapped.

David Levy and Sandra Peart respond (by email):

Surowiecki is correct that Galton reports the mean in his letter to Nature of March 28, 1907. He reports it there in response to a query. And that letter to Nature is in the references to the Wisdom of Crowds which (ironically, in a note about carefulness and checking) we did not check. Pearson required both the mean and the standard deviation to compute the calibrating normal. So, he needed to do the recomputations. Our next version will clarify this with thanks to Surowiecki, who has rightly made the point that Galton reported the mean.

We can now focus on the larger point: that the account which reports Galton’s mean (but not his defense of the median) leads to a conflation of what Galton defended with what we may wish him to defend, the mean. When people quote Galton through Surowiecki, they tell Surowiecki’s tale, not Galton’s. Though Galton reported the mean in response to a question, he did not defend the use of the mean or use it in his report of the ox tale either before or afterwards.

Here are the results and the conclusion in the original Vox Populi article.

the middlemost estimate expresses the vox populi, every other estimate being condemned as too low or too high by a majority of the voters (for fuller explanation see "One Vote, One Value," Nature, February 28, p. 414). Now the middlemost estimate is 1207 lb., and the weight of the dressed ox proved to be 1198 lb.; so the vox populi was in this case 9 lb., or 1 per cent, of the whole weight too high. …. (p. 450)

This result is, I think, more creditable to the trust-worthiness of a democratic judgment than might have been expected. (P. 451).
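As a quick check of the arithmetic in the passages above, here is a minimal sketch that compares the two estimates discussed in this post with the reported dressed weight. The three figures come from the text; the function name and layout are my own illustration.

```python
# Figures quoted in the post; everything else here is illustrative.
ACTUAL_WEIGHT = 1198   # dressed weight of the ox, lb
MEDIAN_GUESS = 1207    # Galton's "middlemost estimate", lb
MEAN_GUESS = 1197      # mean reported in the later "Ballot Box" letter, lb


def signed_error(estimate, actual):
    """Return (error in lb, error as a percentage of the actual weight)."""
    diff = estimate - actual
    return diff, 100 * diff / actual


median_lb, median_pct = signed_error(MEDIAN_GUESS, ACTUAL_WEIGHT)
mean_lb, mean_pct = signed_error(MEAN_GUESS, ACTUAL_WEIGHT)

print(f"median: {median_lb:+d} lb ({median_pct:+.2f}%)")
print(f"mean:   {mean_lb:+d} lb ({mean_pct:+.2f}%)")
```

The median comes out 9 lb too high, about 0.75 per cent, which the quoted passage gives as "1 per cent". The mean comes out 1 lb too low.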

This conclusion is reproduced in the later Memories and is quoted by Surowiecki (p. xiii). Here is the conflation of what Galton did with what Surowiecki evidently thinks he should have done.

The crowd had guessed that the ox, after it had been slaughtered and dressed, would weigh 1,197 pounds. After it had been slaughtered and dressed, the ox weighed 1,198 pounds. In other words, the crowd’s judgment was essentially perfect. Perhaps breeding did not mean so much after all. Galton wrote later: "The result seems more creditable to the trustworthiness of a democratic judgment than might have been expected." That was, to say the least, an understatement.

Here’s the "Ballot Box" where Galton defends the median on 1) the basis of democratic theory and 2) as a way to bound the influence of the estimate. After the defense he reports the sample mean.

Mr. Hooker, in Nature of March 21, seems not to have quite appreciated my principal contention in the letters "One Vote, One Value" and "Vox Populi" of February 28 and March 7 respectively. It was to show that the verdict given by the ballot-box must be the Median estimate, because every other estimate is condemned in advance by a majority of the voters. This being the case, I examined the votes in a particular instance according to the most appropriate method for dealing with medians, quantiles, &c. I had no intention of trespassing into the technical and much-discussed question of the relative merits of the Median and of the several kinds of Mean, and beg to be excused from not doing so now except in two particulars. First, that it may not be sufficiently realised that the suppression of any one value in a series can only make the difference of one half-place to the median, whereas if the series be small it may make a great difference to the mean; consequently, I think my proposal that juries should openly adopt the median when estimating damages, and councils when estimating money grants, has independent merits of its own, besides being in strict accordance with the true theory of the ballot-box. Secondly, Mr. Hooker’s approximate calculation from my scanty list of figures, of what the mean would be of all the figures, proves to be singularly correct; he makes it 1196 lb. … whereas it should have been 1197 lb.
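Galton's first particular, that suppressing one value barely moves the median but can move the mean a great deal in a small series, can be tried out with a minimal sketch. The guesses below are hypothetical numbers chosen for illustration, not Galton's data, and the helper name is my own.

```python
from statistics import mean, median

# Hypothetical small series of weight guesses in lb (NOT Galton's data),
# with one wild guess at the top end.
guesses = [1150, 1180, 1200, 1210, 1230, 1250, 1900]


def drop_one_shifts(values):
    """Largest shift in the median and in the mean when any single value is suppressed."""
    base_median, base_mean = median(values), mean(values)
    median_shifts, mean_shifts = [], []
    for i in range(len(values)):
        rest = values[:i] + values[i + 1:]
        median_shifts.append(abs(median(rest) - base_median))
        mean_shifts.append(abs(mean(rest) - base_mean))
    return max(median_shifts), max(mean_shifts)


max_median_shift, max_mean_shift = drop_one_shifts(guesses)
print(f"largest median shift: {max_median_shift:.1f} lb")
print(f"largest mean shift:   {max_mean_shift:.1f} lb")
```

With these made-up numbers, dropping any one guess moves the median by at most 10 lb, while dropping the wild guess of 1900 moves the mean by almost 100 lb.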

Did Galton change his mind? Here’s the 1908 account in the Memories, 280-1, in which the vox populi is clearly the median. The same concern with outliers is found. The mean is nowhere in sight.

A little more than a year ago, I happened to be at Plymouth, and was interested in a Cattle exhibition, where a visitor could purchase a stamped and numbered ticket for sixpence, which qualified him to become a candidate in a weight-judging competition. An ox was selected, and each of about eight hundred candidates wrote his name and address on his ticket, together with his estimate of what the beast would weigh when killed and "dressed" by the butcher. The most successful of them gained prizes. The result of these estimates was analogous, under reservation, to the votes given by a democracy, and it seemed likely to be instructive to learn how votes were distributed on this occasion, and the value of the result. So I procured a loan of the cards after the ceremony was past, and worked them out in a memoir published in Nature [176-7]. It appeared that in this the vox populi was correct to within 1 per cent. of the real value; it was 1207 pounds instead of 1198 pounds, and the individual estimates were distributed in such a way that it was an equal chance whether one of them selected at random fell within or without the limits of -3.7 per cent, or +2.4 per cent of the middlemost value of the whole.

The result seems more creditable to the trustworthiness of a democratic judgment than might have been expected. But the proportion of the voters who were practised in judging weights undoubtedly surpassed that of the voters in ordinary elections who are versed in politics.

I endeavoured in the memoirs just mentioned, to show the appropriateness of utilising the Median vote in Councils and in Juries, whenever they have to consider money questions. Each juryman has his own view of what the sum should be. I will suppose each of them to be written down. The best interpretation of their collective view is to my mind certainly not the average, because the wider the deviation of an individual member from the average of the rest, the more largely would it effect the result. In short, unwisdom is given greater weight than wisdom. In all cases in which one vote is supposed to have one value, the median value must be the truest representative of the whole, because any other value would be negatived if put to the vote. If it were more than the median, more than half of the voters would think it too much; if less, too little.
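The majority-rule argument in this passage can be sketched in a few lines. This is only an illustration under my own assumptions: the jurymen's figures are invented, and "negatived" is read as a strict majority finding the proposed sum either too much or too little.

```python
from statistics import median

# Hypothetical jurymen's views of what the sum should be (illustrative only).
estimates = [100, 120, 150, 200, 900]


def is_negatived(proposal, votes):
    """True if a strict majority finds the proposal too much, or a strict majority finds it too little."""
    too_much = sum(1 for v in votes if v < proposal)    # voters who wanted less
    too_little = sum(1 for v in votes if v > proposal)  # voters who wanted more
    half = len(votes) / 2
    return too_much > half or too_little > half


med = median(estimates)
avg = sum(estimates) / len(estimates)

print(f"median {med}: negatived = {is_negatived(med, estimates)}")
print(f"average {avg}: negatived = {is_negatived(avg, estimates)}")
```

With these figures the median (150) survives the vote, while the average (294.0), pulled up by the single estimate of 900, is judged too much by four of the five jurymen.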

We were in error not to check all of Surowiecki’s citations. The result he reported is something which Galton computed. On this important issue he is right, we were wrong. But our larger point remains: that Galton defends the use of the median and attacks the use of the mean for the basis of democratic judgment in his first and his last words on the subject. Indeed, in the letter in which he reports the mean, he defends the use of the median for juries and councils when they are making decisions involving money.

James Surowiecki’s third comment:

I appreciate Levy and Peart admitting their mistake. But they seem not to recognize that their mistake undermines the critique that’s at the center of their paper. Their paper, they write, is about the misconstruing of Galton’s experiment. "A key question," they write, "is whether the tale was changed deliberately (falsified) or whether, not knowing the truth, the retold (and different) tale was passed on unwittingly." But the account of Galton’s experiment was not changed deliberately and was not falsified. It was recounted accurately. Levy and Peart want to use my retelling of the Galton story as evidence of how "experts pass along false information (wittingly or unwittingly) [and] become part of a process by which errors are diffused." But there’s no false information here, and no diffusion of errors, which rather demolishes their thesis. If they really want to write a paper about how "experts" pass along false information, they’d be better off using themselves as Exhibit A, and tell the story of how they managed to publish such incredibly shoddy work and have prominent economists uncritically link to it.

James Surowiecki’s fourth comment:

David Levy responds:

The paper has been taken down at Adam Smith Lives for rethinking. We offer our apologies to James Surowiecki.

http://adamsmithlives.blogs.com/thoughts/2007/10/experts-and-i-1.html

One paragraph which will go into the next version is this:

One of Galton’s defenses for the sample median as the vox populi was that it bounds the influence of any individual voter. Replication and checking of the work of experts may be a way to bound the influence of experts. It is important for the reader to know that in an earlier version we denied the existence of Galton’s mean. This emphasizes the importance of replication and competition precisely to bound the influence of such error.

Here’s what we are prepared to defend:

The majority-rule context of Galton’s publications is lost when the sample median, upon which Galton put such stress, is no longer reported.

and responds more:

Dear James,

I have had time to reflect and now I would like to offer a more detailed personal apology than what we’ve jointly posted before. When I failed to find Galton’s mean, in spite of your sufficient directions, I should have asked you directly for help. From these two failures of mine, and because Sandy trusted my work, we were led to the wrong conclusion that your account of Galton’s mean was false instead of the right conclusion that your account was simply different than our accounts of Galton’s median. If the accounts are merely different then we have many ways of asking which of the two estimators one might prefer. We began that helpful exercise. We did not stop there. When we said that your account was false, and asked a rhetorical question of how this came to be, we called into question my own intentions. We also wrongly called into question the care which scholars took in citing your work.

For all this, again, I offer a personal apology.


Sunday 1 July 2012
