By Tim Redmond
Wow, this thing got a lot of attention -- I think it shows how much fascination the world has with our lame, incompetant and famous governor. Check out the comments and you'll notice something else: The minute Matier and Ross on sfgate picked this up, the right-wing nuts started weighing in, which makes you wonder (or not wonder) who exactly reads the San Francisco Chronicle.
At any rate, Supervisor David Chiu has done the math and concludes that it's highly unlikely this was a mistake:
Assuming it was real, I calculated the probability that this is pure chance. Assuming it's a 1/26 chance for each particular letter, the probability that this is random is one out of 8,031,810,176.
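For readers who want to check the arithmetic, here is a minimal sketch of the uniform-letter calculation quoted above. It assumes, as the quote does, that each of the seven line-initial letters is an independent 1-in-26 draw; the variable names are illustrative only.

```python
# Uniform-letter model: each line-initial letter is an independent 1/26 draw.
ALPHABET_SIZE = 26
PHRASE_LENGTH = 7  # seven line-initial letters

# Number of equally likely seven-letter sequences under this model.
uniform_outcomes = ALPHABET_SIZE ** PHRASE_LENGTH
uniform_probability = 1 / uniform_outcomes

print(f"1 in {uniform_outcomes:,}")  # 1 in 8,031,810,176
```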


Comments (17)
that's not really the right calculation on the odds, however. one would have to take into account the proportion of words that start with each of those letters, which as anyone with a dictionary could tell you is not evenly distributed across all letters.
still rare, but the calculation isn't that simple.
Posted by frouglas | October 28, 2009 12:15 PM
Anyone got a better calculation? There must be some math nerds out there ....
Posted by Tim Redmond | October 28, 2009 12:35 PM
spellcheck, dude. if you're a reporter, and you're going to use the word incompetent in the first sentence in your article, please spell it correctly. not so good for credibility.
Posted by outside observer | October 28, 2009 01:18 PM
using the 2of12 list from the 12dicts file found at http://wordlist.sourceforge.net/, i calculated the probability of a word starting with the following letters as follows:
f = 4.40%
u = 3.59%
c = 9.30%
k = 0.66%
y = 0.29%
o = 2.66%
u = 3.59%
for an overall probability of about 2.70E-12, or approximately 1 in 370,855,495,993. so a much lower probability than that calculated by Supervisor Chiu.
this estimate could probably be improved by narrowing the word list to the 10,000 most common words, or maybe even 1,000 (how big is arnold's vocab?), but i couldn't find any lists of that sort out there that could be easily imported into excel.
Posted by frouglas | October 28, 2009 01:21 PM
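The dictionary-based estimate in the comment above can be reproduced roughly as follows. This is a sketch that uses only the rounded percentages quoted in the comment and assumes the seven line starts are independent; with these rounded inputs the product comes out near 2.7E-12, or about 1 in 372 billion, close to the 1-in-370-billion figure quoted.

```python
import math

# First-letter shares as quoted in the comment (percent of words in the
# 2of12 list that start with each letter). These are the rounded figures.
first_letter_pct = {
    "f": 4.40,
    "u": 3.59,
    "c": 9.30,
    "k": 0.66,
    "y": 0.29,
    "o": 2.66,
}

acrostic = "fuckyou"

# Multiply the per-letter probabilities, treating the seven lines as independent.
dictionary_probability = math.prod(first_letter_pct[ch] / 100 for ch in acrostic)

print(f"{dictionary_probability:.2e}")
print(f"about 1 in {1 / dictionary_probability:,.0f}")
```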
A quick look at your calculations makes me think you only calculated the probability of 7 words starting with the magic letters. I think it's actually more complicated than this though. There are two other factors that need to be considered.
One is that you're not just looking at the raw probability that 7 given words will start with f u c k y o u. You have to account for the fact that there are more than 7 words in the letter. It's been a long time since I took statistics, but I'm pretty sure off the top of my head that having more than 7 words in the letter lowers the probability from what those above have indicated.
Now before you go getting bent out of shape that I'm somehow sticking up for the Governator there is the 2nd factor. That is that it isn't just a random pick of whether 7 of the N words in the letter start with the magic letters. The letter is effectively a grid, so above and beyond the simple statistical probability of the words starting so they spell out the magic phrase you have to take into account the probability that random chance would have caused the 7 words starting with the magic letters to occur in the right places in the overall pattern so that they all fall into the 1st column of the grid. Again it's been a long time since I took statistics but I can tell you off the top of my head that whatever factor 1 above reduced the probability by, factor 2 will more than make up for it in terms of how astronomically small the chances of this being random are.
Maybe the Bay Guardian should call up someone in the math department over at Berkeley or Stanford. I'm sure they can help them come up with the correct probability after accounting for the factors I mention...
Posted by Dr. Beer | October 28, 2009 01:50 PM
Am I understanding that correctly, that about twelve times as many words start with U as with Y? That's unexpected.
And also, nearly 10% of our words start with C? That's good enough for me.
Posted by mattymatt | October 28, 2009 01:52 PM
in response to Dr. Beer: yes, there are a lot of factors that haven't been accounted for here, but your first issue is not one of them. we are only concerned with the seven words that fall on the far left of each line. the letters that begin the remaining words in the text are unimportant. the calculation above estimates the probability that, if you picked words totally randomly, the words that went over the line break and started a new line would begin with the necessary letters.
now, there are some issues that haven't been addressed, which are beyond the scope of my abilities to address. for example, one would have to account for the fact that in a truly random string of words, longer words are more likely to be the first on a given line, as they carry over from the previous line (which i think you've sort of identified in your second point). also, these probabilities apply to all words, with no recognition of the different types of words or grammatical structure. but i think the analysis above improves on the 1/26 probability for each letter calculation originally cited.
mattymatt: yeah, there are some surprising percentages there, but that's what the word list that i have showed. are they exactly right? certainly not. are they likely indicative? i would argue yes. again, as i said before, this could be improved with a smaller list of more frequently used words (rather than a general "all-words" list), but i couldn't find one.
Posted by frouglas | October 28, 2009 02:07 PM
It looks like Arnold has the biggest swinging dick in the State. Good for him. Ammiano should learn that he can't be the Queen for a Day everywhere he goes.
Posted by magdelyn | October 28, 2009 05:46 PM
I think you also have to take into consideration the probability that such a coincidence might occur given the many thousands of public letters that go back and forth.
Coincidences aren't uncommon.
Posted by rei | October 28, 2009 06:07 PM
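The point in the comment above, that many letters mean many chances for a coincidence, can be put into numbers with a short sketch. The per-letter probability is the uniform 1-in-26 figure from the post; the count of 10,000 documents is purely an assumption standing in for "many thousands", and `chance_of_at_least_one` is a hypothetical helper name.

```python
import math

def chance_of_at_least_one(p, n):
    """Probability that an event of probability p occurs at least once in n
    independent tries: 1 - (1 - p)**n, computed in a numerically stable way."""
    return -math.expm1(n * math.log1p(-p))

per_letter_probability = 1 / 26**7  # uniform figure quoted in the post
num_documents = 10_000              # ASSUMPTION: stand-in for "many thousands"

at_least_one = chance_of_at_least_one(per_letter_probability, num_documents)
print(f"{at_least_one:.3e}")
```

Under these assumptions the chance rises roughly in proportion to the number of documents, but it stays on the order of one in a million.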
Ha, face it. It was intentional. The real question is whether or not he was the one that actually typed it in, or if a staffer just got cute.
Posted by bman | October 28, 2009 06:46 PM
I grabbed a huge hunk of text (30k words), wrote a script to count the frequency of first letters, and ended up with a probability of 1 in about 600 billion. That ignores the fact that the space is exactly in the right place. That message was not coincidental.
spike
Posted by spike jones | October 29, 2009 08:10 AM
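The script mentioned in the comment above was not posted, so the following is only a guess at the general approach: count how often each letter begins a word in a body of running text, then multiply the shares for the seven letters. The sample text is a made-up placeholder (not the 30k-word corpus and not the veto letter), and the function names are hypothetical.

```python
import re
from collections import Counter

def first_letter_frequencies(text):
    """Return the share of words in `text` that begin with each letter."""
    # Words start with a letter and may contain apostrophes (e.g. "don't").
    words = re.findall(r"[a-z][a-z']*", text.lower())
    counts = Counter(word[0] for word in words)
    total = sum(counts.values())
    return {letter: n / total for letter, n in counts.items()}

def acrostic_probability(freqs, phrase):
    """Multiply first-letter shares, treating each line start as independent."""
    probability = 1.0
    for letter in phrase:
        # A letter never seen in the sample contributes a share of 0.
        probability *= freqs.get(letter, 0.0)
    return probability

# ASSUMPTION: tiny placeholder text, far too short to give meaningful shares.
sample_text = (
    "for you, for us, and for our kids: "
    "keep calm, carry on, offer understanding"
)

freqs = first_letter_frequencies(sample_text)
print(acrostic_probability(freqs, "fuckyou"))
```

With a large, representative text in place of the placeholder, the same two functions would give a usage-based estimate of the kind the comment describes. Like the other estimates in this thread, it ignores where the line breaks fall.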
Let's see if I get this straight: Ammiano has a typical San Francisco politician tantrum and Arnie gets a little revenge and it's bad. After all the hemming and hawing over the since-forgotten "you lie" guy?
Civility is for the other side; when your own moonbats shoot their mouths off they are speaking truth to power, when the other side does it, it's a disgrace?
If it was a good bill Arnie should have signed it and lifted himself above the San Francisco children, I suppose this letter has brought Arnie down to the level of the average SF screaming progressive.
Posted by glen matlock | October 29, 2009 11:59 AM
Glen, don't you think there's a difference between a verbal comment aimed at a Republican who crashed a Democratic Party event and a formal, written veto message from the governor of California?
Posted by Tim Redmond | October 29, 2009 02:09 PM
Tim, the difference is that if the whole situation was reversed you would have another excuse to make up from another angle.
You and the rest of the "progressives" have no standards but "what works at this moment."
If the story was reversed you would be saying how clever the progressive gov was for insulting the conservative Assemblyman and how the conservative needed to get a sense of humor.
Your pretend outrage is a joke, you have no standards but, "how can I a progressive shouter take advantage of this."
Posted by glen matlock | October 29, 2009 04:36 PM
While a good first-order approximation, the frequency of words in the dictionary that start with the particular letters isn't a good estimate. A better way, as spike suggested, would be to use the frequency of words that start with those letters in actual text that would be similar in word usage to Arnold's letter.
@mattymatt: a lot of words start with the prefix un- which really boosts the number of words that start with u in the dictionary.
Posted by MikeC | October 29, 2009 06:02 PM
My calculation, according to my interpretation of The String Theory and Lord Buckley's line "..Mr Bear, in the eyes of the lord we're both beasts when it comes right down to it..." ['God's Own Drunk'] is,
who the fuck knows, who the fuck cares, who the fuck gives a damn, big fucking deal; at least it makes for a break from the repetitious, irrelevant, mendacious, miasma of mealy-mouthed bullshit that masquerades as political discourse.
Bottom line, Tom fired off a fairly lame but appropriate zinger to the Gropenator's calculated provocative intrusion, and some troll in Arnie's army attempted a contrived response. The only real surprise here is that Newsom the Nebbish didn't introduce Arnie the Axeman. I guess Aaron still has functions to perform.
Y'all can kiss my hetero, and sagging, heinie.
Posted by Patrick Monk | October 29, 2009 06:50 PM
Frouglas's probability is better than Chiu's but still incorrect. What should be taken into account is the frequency of the words appearing in the language. For instance, yes, only 0.29% of the words start with the letter Y, but the words "Yet" or "You" or "Yesterday" are quite common. So the odds of a Y appearing at the start of a phrase are much higher than 0.29%.
I think that's what Dr Beer was aiming at. A better way to approach this would be, for each of the words, to compute the frequency of that word in the language, instead of what is in essence the number of pages for each letter in the dictionary divided by the total number of pages.
I made the calculation here: http://cedichou.blogspot.com, but the end result is that Frouglas and Chiu seriously OVERestimate the chance of this happening. I find 1 in 132,000 billion billion billion.
Posted by cedichou | October 29, 2009 07:07 PM
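The distinction drawn in the comment above, between counting each dictionary entry once and counting words by how often they are actually used, can be illustrated with a toy example. This is not the calculation behind the linked post; the text snippet is invented and `letter_share` is a hypothetical helper.

```python
def letter_share(words, letter):
    """Fraction of the given words that start with `letter`."""
    words = list(words)
    return sum(1 for w in words if w.startswith(letter)) / len(words)

# ASSUMPTION: a made-up snippet of running text, not the governor's letter.
tokens = "you said you would call yet you never called and you know it".split()

by_usage = letter_share(tokens, "y")            # every occurrence counts
by_dictionary = letter_share(set(tokens), "y")  # each distinct word counts once

print(f"usage-weighted share of y-words:      {by_usage:.3f}")
print(f"dictionary-weighted share of y-words: {by_dictionary:.3f}")
```

In this toy text the usage-weighted share of words starting with Y is higher than the dictionary-style share, because "you" is repeated. That is the effect the comment describes for common words such as "You" and "Yet".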