Monday, March 10, 2008

Myths

I saw these myths passed off as true in the news media, so of course, they must be true!

Myth: Toxins, sedentary lifestyles, environmental hazards, poor nutritional choices and violent crime are causing serious health problems for Americans.
Truth: Much to my surprise, heart disease, stroke, lung diseases and overall cancer rates are in decline and have been for a long time. The decline in the death rate is even accelerating - meaning it is dropping faster than ever (also see the "Related Stories" on that link). Even more interesting is that the rate at which new cancers are diagnosed has fallen in spite of better diagnostics. Life expectancy is at an all-time high and, on average, has increased by roughly 2 years in just the past ten years!

Myth: Windows Vista is an upgrade to Windows XP.
Truth: Many consumers think XP is an upgrade to Vista. I agree that XP is an upgrade to Vista.

Myth: Home lighting uses 20% of household electricity.
Truth: 8.8%. For typical homes, the figure is anywhere from 5% to 10%. Switching from incandescent lights to compact fluorescent, assuming all existing lights are incandescent and can be switched, would reduce that 8.8% portion of household electricity by up to 75%. Thirty years ago, lighting was a larger percentage - but since then most homes have added electricity hogs like HDTVs, VCRs, fancy appliances and several computers, thereby decreasing the share used by lighting.
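A quick sketch of what that 75% means for the whole electricity bill. The inputs are the post's own figures, and the result is a best case that assumes every bulb in the house is switched:

```python
# Figures from the post: lighting's share of household electricity and the
# "up to 75%" cut from switching every incandescent bulb to a CFL.
LIGHTING_SHARE = 0.088
CFL_REDUCTION = 0.75

def whole_house_savings(lighting_share: float, reduction: float) -> float:
    """Fraction of TOTAL household electricity saved when only lighting use is cut."""
    return lighting_share * reduction

savings = whole_house_savings(LIGHTING_SHARE, CFL_REDUCTION)
print(f"Best case for the 8.8% figure: {savings:.1%} of the whole bill")

# The post's 5% to 10% range for typical homes.
for share in (0.05, 0.10):
    print(f"Lighting at {share:.0%}: at most {whole_house_savings(share, CFL_REDUCTION):.1%} saved")
```

In other words, the best-case saving is about 6.6% of total household electricity, not 75%.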

Myth: Switching to compact fluorescent lights will save consumers money.
Truth: For lights that are frequently used, and assuming compact fluorescent bulbs do indeed last longer, this is likely true (about one-third of mine have had actual lifetimes shorter than incandescent bulbs). This is not true when little used lights are replaced with compact fluorescent bulbs - such as lighting in closets, some hallways, automatic on/off motion sensing exterior lights, lighting under stairs, in utility rooms and in many bathrooms. The U.S. government passed a law that will ban the sale of most incandescent lights starting in 2012 - which means you will have to install these lights in places where they do not make sense. Philips, a lighting manufacturer, was a huge supporter of this law because they will make far more money per lighting unit sold.
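To see why usage matters, here is a rough simple-payback sketch. Every price, wattage and electricity rate in it is a hypothetical placeholder rather than a figure from this post, and it ignores bulb replacements and early CFL failures, so treat the output as illustrative only:

```python
# Hypothetical inputs (NOT from the post); adjust to local prices.
INCANDESCENT_WATTS = 60
CFL_WATTS = 14
INCANDESCENT_PRICE = 0.50   # dollars per bulb
CFL_PRICE = 3.00            # dollars per bulb
RATE_PER_KWH = 0.11         # dollars per kilowatt-hour

def payback_years(hours_per_year: float) -> float:
    """Years of use before the CFL's extra purchase price is recovered in energy savings."""
    kwh_saved = (INCANDESCENT_WATTS - CFL_WATTS) / 1000 * hours_per_year
    dollars_saved_per_year = kwh_saved * RATE_PER_KWH
    return (CFL_PRICE - INCANDESCENT_PRICE) / dollars_saved_per_year

for label, hours in [("kitchen, 3 h/day", 3 * 365), ("closet, 10 h/year", 10)]:
    print(f"{label}: payback in {payback_years(hours):.1f} years")
```

With these made-up inputs, the heavily used fixture pays for itself in about half a year, while the closet light needs roughly 49 years.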

Myth: "Red light running" cameras are an effective tool for reducing crashes caused by cars running red lights.
Truth: Red light cameras increase crashes and result in higher auto insurance premiums charged in the areas where they are installed. Like septic systems versus sewers (see below), red light cameras are sold to the public as "improving safety". The real reason, surprisingly, is that they increase revenue not only for the city (RoboRevenue) but also for insurance companies, who have been pushing for red light cameras. The data now show that red light cameras lead to an increase in overall crashes - and the insurers use that as an excuse to increase auto insurance premiums. The best way to reduce intersection crashes is to re-design the intersection; simple methods like adding one second to a "yellow light" phase can dramatically reduce red-light crashes.

Myth: Businesses hate government regulation.
Truth: Many businesses love government regulation and use regulation to their advantage while publicly complaining about regulations. (They only dislike regulations that help their competitors or harm their own business.) Two examples in the news. First, lighting manufacturers publicly complain that eliminating incandescent lights will raise the cost of lighting to consumers. Privately, they love this since they will sell higher priced products. Second, an executive at General Motors complained new government mileage requirements will raise the average car price by up to $7,000. Privately, they love this new opportunity to increase profits. Numerous industries employ lobbyists at the State and Federal level to pass regulations that - they hope - will benefit themselves while harming potential competition.

Across numerous businesses and industries, from the accounting scandals of Kenneth Lay and Andy Fastow's Enron crimes to the collapse of MCI WorldCom and on and on to today's banking crimes, overpaid CEOs have, by their own actions, practically pleaded for adult supervision. Effective businesses know how to turn externalities to their advantage, including regulation.

Myth: A drought in Australia, caused by global warming, is a significant cause of global rice shortages. It must be true because the NY Times wrote that.
Truth: Rice production in Australia normally totals 1.3 million tons. Total global rice production is 421 million tons. Therefore, Australia's contribution to the global total is normally about 0.3%, and less during the drought year. Even though rice production is increasing by 1.8% this year, global demand is increasing faster - to 424 million tons. The main problem is that yields grew by about 2.5 to 3.5% per year in the 1980s, but that growth rate fell to 0.74% per year in the 1990s, while demand is increasing faster than production. Why the NY Times led with the anecdotal story out of Australia is mystifying, since it is just one of many components of the problem, and climate change has both good and bad effects on rice growing.
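A minimal check of the arithmetic, using the post's tonnage figures. The 3% rate is simply the midpoint of the 2.5 to 3.5% range, chosen for illustration; compounding shows how much the drop in yield growth matters over a decade:

```python
AUSTRALIA_TONS = 1.3e6   # normal Australian rice production (from the post)
GLOBAL_TONS = 421e6      # total global rice production (from the post)

share = AUSTRALIA_TONS / GLOBAL_TONS
print(f"Australia's normal share of world rice: {share:.2%}")

def decade_growth(annual_rate: float, years: int = 10) -> float:
    """Total growth factor after compounding annual_rate for the given number of years."""
    return (1 + annual_rate) ** years

# 3% is an assumed midpoint of the 1980s range; 0.74% is the 1990s figure.
for label, rate in [("1980s, 3% per year", 0.03), ("1990s, 0.74% per year", 0.0074)]:
    print(f"{label}: yields up {decade_growth(rate) - 1:.1%} over ten years")
```

Roughly 34% growth per decade versus under 8% is the kind of gap that matters far more than Australia's fraction of a percent.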

Myth: 95% of dieters who lose weight will gain it all back.
Truth: That claim comes from one study in the early 1950s involving just 100 subjects on one particular diet (reference) and may be insufficient for a precise statement about the likelihood of keeping weight off. Other studies have found that many people who lose weight do gain much of it back.

Myth? Obesity is a global epidemic.
Truth: The obesity epidemic mantra has been promoted, very effectively, by the International Association for the Study of Obesity (IASO). Their Taskforce defined the Body Mass Index (BMI) cutoffs, which were then adopted by the World Health Organization (WHO) and are now the basis for defining who is overweight or obese. If your BMI is 25 or higher, you are overweight. The new definition, adopted in the 1990s, instantly redefined a lot of people as overweight. What is not disclosed is that the IASO is a very effective lobbying group funded almost entirely by the drug industry and "diet food" manufacturers. The sponsors, all of whom profit from the new "war on obesity" by selling prescription diet drugs and diet food regimens, are also listed in the IASO's most recent annual report. Muddling the issue further are recent studies, such as the one by Katherine Flegal of the CDC, that have found lower mortality amongst those who are somewhat overweight versus those who are not. More here. For another example of how the profit motive has led to incorrect public health policy, read this.
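For readers unfamiliar with the calculation, BMI is weight in kilograms divided by height in meters squared. The sketch below makes the cutoff a parameter; the defaults of 25 (overweight) and 30 (obese) are the WHO adult cutoffs, and the 1.78 m person is hypothetical. It shows how a few kilograms, or a one-point change in where the line is drawn, moves someone from one category to another:

```python
def bmi(weight_kg: float, height_m: float) -> float:
    """Body Mass Index: weight in kilograms divided by the square of height in meters."""
    return weight_kg / height_m ** 2

def classify(bmi_value: float, overweight_at: float = 25.0, obese_at: float = 30.0) -> str:
    """Label a BMI using adjustable cutoffs (defaults are the WHO adult cutoffs)."""
    if bmi_value >= obese_at:
        return "obese"
    if bmi_value >= overweight_at:
        return "overweight"
    return "not overweight"

# Hypothetical person who is 1.78 m tall.
for weight in (75, 80, 85):
    value = bmi(weight, 1.78)
    print(f"{weight} kg -> BMI {value:.1f} ({classify(value)})")
```

At 80 kg this person has a BMI of about 25.2: "overweight" with a cutoff of 25, but not with a cutoff of 26.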

Myth: Southern California wildfires in 2007 were caused by global warming.
Truth: Some politicians and activists said wildland fires in Southern California in late summer of 2007 were caused by human-induced climate change. Months later, we now know that the actual causes of the fires were arson and power lines that blew down or arced during high winds, especially where new home developments (and power lines) were added to high fire danger areas. These fires were caused by human activity but not by human-induced global warming.

Myth: The United States faces a severe shortage of scientists and engineers.
Truth: Several recent reports including one from the Urban Institute have found that the U.S. produces up to three times more graduates in Science, Technology, Engineering and Math (STEM) than it has annual job openings. According to a report in Science Careers (from the American Association for the Advancement of Science) alleged shortages are an enduring myth. Universities are exempt from the limit on H-1B visa hiring through a loophole negotiated as a way to get universities to support industry's desire to keep tech salaries low. Since 2001, real salaries in the IT sector have declined by 8% (after inflation), hardly indicating a terrible shortage. The median salary in IT also dropped in the past year in non-inflation adjusted dollars.

Myth: Automobiles are "the largest producer of greenhouse gases in the United States".
Truth: Nature itself is the largest producer of "greenhouse" gases (says NASA) but if we restrict this to human-produced greenhouse gases, according to the EPA, 39% of human caused emissions in the U.S. are due to electricity generation, 27% are due to industrial operations (most of which is electricity generation) and 20% are due to the automobile category. Transportation, including trains, ships, airlines and heavy truck transport is about 30% of the total but autos, light trucks and SUVs are 66% of the transport sector, hence 20%. Raising efficiency of the auto category, assuming autos run on gas, from 24 to 35 mpg, as the government recently required, may cut that 20% contribution to 14% (assuming all other emissions remain constant).
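A sketch reproducing the arithmetic above. It assumes emissions scale directly with fuel burned, that miles driven do not change, and that every other source stays constant. It reports the new auto figure both against the old total and as a share of the new, smaller total:

```python
TRANSPORT_SHARE = 0.30       # transportation's share of U.S. human-caused emissions (from the post)
LIGHT_VEHICLE_SHARE = 0.66   # autos, light trucks and SUVs as a share of transportation (from the post)

auto_share = TRANSPORT_SHARE * LIGHT_VEHICLE_SHARE
print(f"Autos' share of the total: {auto_share:.1%}")

def share_after_mpg_change(share: float, old_mpg: float, new_mpg: float) -> tuple[float, float]:
    """Return (new auto emissions as a fraction of the OLD total, new share of the NEW total).

    Assumes emissions scale with fuel burned (miles driven unchanged) and all other
    sources stay constant.
    """
    new_auto = share * old_mpg / new_mpg
    new_total = (1 - share) + new_auto
    return new_auto, new_auto / new_total

of_old_total, of_new_total = share_after_mpg_change(0.20, 24, 35)
print(f"24 -> 35 mpg: {of_old_total:.1%} of the old total, {of_new_total:.1%} of the new total")
```

Measured against today's total, autos drop from 20% to about 13.7%; as a share of the smaller future total, they would be about 14.6%.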

Myth: Prescription medications are effective at treating many health problems.
Truth: Recent reports have been published regarding the ineffectiveness of statins (and the high likelihood of serious side effects) and that many anti-depressant drugs may be mostly worthless. (A large review of studies indicates that Prozac and similar drugs are almost entirely worthless.)
'One dirty little secret of modern medicine is that many drugs work only in a minority of people. "There's a tendency to assume drugs work really well, but people would be surprised by the actual magnitude of the benefits," says Dr. Steven Woloshin, associate professor of medicine at Dartmouth Medical School.'
How did this happen? Publication bias is part of the problem - half the studies show a benefit and get published. The other half show no benefit, or show risks, but get buried and are never published. A few years ago, the Wall Street Journal quoted an executive vice president at a pharmaceutical firm saying that about 70% of prescribed drugs essentially do not work. I asked people in the pharma industry if that were true and they responded with (roughly) "That sounds about right". I do not know if this is good or bad - but it is unfortunate that most patients are not aware that they may see no benefits from prescribed medications - and may see actual harm, including harm that results in hospitalization. We spend many tens of billions of dollars every year on medications that frequently do not work - in part due to huge advertising campaigns by the pharma companies exaggerating the benefits of their drugs, such as occurred with Celebrex and Vioxx. Ask your doctor "What is the Number Needed to Treat (NNT) and the Number Needed to Harm (NNH)?" for your prescribed medications. An NNT of 30 means that for every 30 people treated, one will benefit.
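The NNT and NNH are the reciprocal of the absolute difference in outcome rates between treated and untreated groups. A minimal sketch with a hypothetical trial (not data from any real drug) shows how a headline-friendly relative figure can sit alongside a large NNT:

```python
def number_needed(untreated_rate: float, treated_rate: float) -> float:
    """People who must be treated for ONE extra person to see the outcome difference.

    Rates are fractions of each group with the outcome. Use bad outcomes avoided for
    NNT, or side effects caused for NNH; either way it is 1 / absolute difference.
    """
    difference = abs(untreated_rate - treated_rate)
    if difference == 0:
        raise ValueError("no difference between the groups, so the number is undefined")
    return 1 / difference

# Hypothetical trial: heart attacks fall from 5% to 4%; a side effect rises from 1% to 3%.
nnt = number_needed(0.05, 0.04)
nnh = number_needed(0.01, 0.03)
relative_reduction = (0.05 - 0.04) / 0.05
print(f"Relative risk reduction: {relative_reduction:.0%}")  # the headline number
print(f"NNT: {nnt:.0f}  NNH: {nnh:.0f}")
```

Here "20% fewer heart attacks" corresponds to treating about 100 people to prevent one heart attack, while roughly one in 50 gets the side effect.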

Myth: Household septic tanks and drain fields must be replaced with "sanitary sewer systems" to ensure water quality.
Truth: The primary purpose of sewer systems is typically to enable a greater density of home developments versus septic drain fields. The water quality issue is mostly a "red herring" raised to create political support, since most people will be in favor of good water quality but not necessarily in favor of increased population and its effects: traffic jams, longer lines at stores, increased pollution and overcrowded school classrooms.

When homes have septic systems, the minimum lot size must have space for two septic drain fields - the original and a possible future replacement. Each drain field size is determined by the soil characteristics and the size of the home (how many people will be living there). Space is set aside for a second drain field in case the field fails. Sewer lines are brought into older neighborhoods to replace drain fields so the density of home developments can be increased. This is buried in my County's Master Plan but they publicly sold the project based on the need to protect environmental water quality.

We have watched developers tear down old homes on large lots and replace them with as many as 5 new homes on tiny lots - beginning as soon as one day after the sewer projects were completed. Many of the new homes are 3,000 sq ft and larger on 4,500 to 5,000 sq ft lots, replacing 1,000 sq ft homes on 35,000 sq ft lots. Government employees and politicians like greater home density because it increases property tax revenue collections.

A quick Google search found that sewer systems are installed around the country specifically for increasing tax revenues. Huge luxury homes are rarely an improvement to the environment.

Myth: Hybrid metro buses are great for the environment - saving gas and reducing pollution.
Truth: Perhaps someday but not yet. Hybrid buses emit the same level of particulate emissions as conventional diesel buses. Fuel consumption is supposed to be about 10% less than the diesel buses they replace, with hopes this will eventually reach up to a 50% reduction (diesel buses typically get 3 to 6 mpg, with variations depending on hills, stops and passenger loads). Yet the largest hybrid bus system in the world has found that hybrids get worse fuel mileage than diesel buses. (They do offer a benefit for bus systems and passengers when the buses must run in poorly ventilated tunnels, since the bus can then run as an electric bus - but so could older dual mode buses that obtained electricity from overhead trolley lines.)

Hybrid buses cost about $500,000 each, or about 67% more than diesel buses. Even with the hoped-for fuel savings, hybrid buses and their operation are significantly more expensive without yet delivering significant benefits. Hydrogen powered buses are far worse, costing about 30 times more to operate per mile than diesel buses. Transporting passengers by helicopter would be less expensive. Yet California, ever so mindful of costs (not!), plans to expand the use of hydrogen fuel buses even though the new buses cost six times more than diesels.
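A back-of-the-envelope payback sketch. The $500,000 price, the 67% premium and the 10% fuel savings come from this post; the annual mileage and fuel price are hypothetical placeholders (the 4 mpg falls inside the post's 3 to 6 mpg range), and maintenance and battery costs are ignored:

```python
HYBRID_PRICE = 500_000    # dollars per hybrid bus (from the post)
HYBRID_PREMIUM = 0.67     # hybrids cost about 67% more than diesels (from the post)

diesel_price = HYBRID_PRICE / (1 + HYBRID_PREMIUM)
extra_cost = HYBRID_PRICE - diesel_price

# Hypothetical operating assumptions, NOT from the post:
MILES_PER_YEAR = 40_000
DIESEL_MPG = 4.0          # within the post's 3 to 6 mpg range
FUEL_PRICE = 3.00         # dollars per gallon
FUEL_SAVINGS = 0.10       # the hoped-for 10% reduction (from the post)

annual_fuel_cost = MILES_PER_YEAR / DIESEL_MPG * FUEL_PRICE
annual_savings = annual_fuel_cost * FUEL_SAVINGS
payback = extra_cost / annual_savings

print(f"Implied diesel price: ${diesel_price:,.0f}")
print(f"Extra cost per hybrid: ${extra_cost:,.0f}")
print(f"Fuel saved per year: ${annual_savings:,.0f}")
print(f"Years to recover the premium from fuel alone: {payback:.0f}")
```

With those assumed inputs, the premium takes on the order of 67 years of fuel savings to recover, far longer than any bus stays in service.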

Myth: A Russian transport ship reached the North Pole without the aid of an ice breaker - the first time in history a ship did this without the aid of an ice breaker!
Truth: This is my all-time favorite media myth. About 2 years ago, the NY Times ran a feature story about a cargo ship named the Akademik Fedorov reaching the North Pole without an ice breaker due to melting Arctic ice. The NY Times used this to create a speculative news story about how melting ice would open the Arctic to global commerce by ordinary cargo ships. But the entire story was based on a false premise: the Fedorov, like about 70 Russian cargo transport ships, is also an icebreaker. In fact, the Fedorov can break through ice packs better than the best U.S. ice breaker. I understand the NY Times has legions of fact checkers, but alas, they have not yet learned to use Google, where the information was readily available. I wrote to the NY Times corrections email address with a link to the manufacturer's specifications. The NY Times did not acknowledge my note and made no correction to their false report. Presumably, they did not wish to have facts get in the way of a great narrative. "Fact Free Reporting" is now considered hip in journalism circles.


This post was written and updated December 24, 2007 through April 28th, 2008.

Friday, November 16, 2007

Part 5: JibJab does the news

A must see:



This is an appropriate follow-on to Parts 1 through 4, below, about how the media possess math skills at about the middle school level.

Friday, October 19, 2007

Last Blog Post

So at last, I have written my last blog post. Enjoy.


A few weeks ago I posted an item from the Wall Street Journal about an epidemiologist who published a paper in 2005 concluding that virtually all medical science research findings are ultimately proven wrong with suggestions this may apply to other fields of science too.

At that time I referred to the finding as astonishing. However, I later found (and posted in a comment) an old paper published in Science that found only 1% of published papers ever received six or more future citations in other research. About 45% of the published papers were never cited again and about 45% were cited once. The implication was that most published research is worthless and rarely looked at again let alone replicated.

The history of bad science and pre-conceptions leading to disastrous public policy is lengthy. One example makes this clear - Nazism was built on top of the very bad science of eugenics. Bad science was adopted by Hitler to blame the Jews (and others) and whip the citizens into nationalistic furor and conformity. Nazism may have developed without the faulty science of eugenics but eugenics created a seemingly credible basis for their insanity. Skepticism of eugenics and Nazism was not tolerated within Germany.

Right now I am reading Gary Taubes' book "Good Calories, Bad Calories", which covers in meticulous and exhausting detail the history of nutritional dietary recommendations. The book has about 60 pages of references listed in the bibliography. The author has been a science writer for Science. He reviewed more than 100 years of published peer reviewed literature. Where possible he interviewed hundreds of clinicians and researchers. His process was to read current recommendations, and then read their referenced sources. And then read the referenced sources for those, and so on, all the way back to the 19th century to trace the evolution of nutritional guidelines.

His book is about the evidence and the flawed process that led to today's questionable nutritional guidelines, which may be contributing to increased rates of disease. Contrary to the propaganda drilled into our heads, high fat diets may reduce cardiovascular risk and low fat diets may increase other risks. There turns out to be little evidence supporting the claimed benefits of low fat diets - the mortality of those on low fat diets is the same as those on high fat diets. How we reached this surprising state and what it means for our health is the story he tells.

This review continues - it's lengthy but well worth reading ...



I've reposted below some items from the old blog that were original works and which were frequently visited by people searching the Internet for relevant terms.

The readership here was blown up by the iPowerWeb fiasco and there is no indication that it has returned or will return in the future. I'm down to about 25% of the readership prior to the August meltdown that took the web site off line for nearly a month.

I chose in late September to end the blog but decided to wait until the end of the month since this web site started in Nov 1995. Makes for an even 12 years.

I'm off to other things - and I still maintain this blog for a school (Stop by and visit!).

If I post here again it will be the rare essay, like those below. For example, it would be nice to add an essay to the collection, below, on the problems of the Delphi method used for achieving consensus amongst stakeholders and the use and abuse of expert panels. But I no longer wish to spend much time on this.

Comments have been turned off because I do not want to moderate them or deal with junk comments in the future nor will I necessarily respond to emails on the essays. I'm outta here.

Thanks for 12 great years!

Part 1: The Math-Free Zone of Journalism

I am late to noticing the lack of quantitative analysis and simple math skills amongst those in the journalism business.

Since I lack a Ph.D. in the subject matter (or any Ph.D. as I only have a Masters), I am automatically not an expert and no one will care much for what I think. So instead I quote from the experts - who work in the journalism field - and who have been documenting a lack of basic math skills in the math-free-zone of journalism for years and consider the problem to be a crisis (about which nothing is being done).

Source: American Society of Newspaper Editors:
Columnist James Kilpatrick has collected such a thick file of mathematical errors in newspapers that he’s convinced many journalists cannot handle even grade-school math. He declares: "People who write for a living should never be left alone with mathematics. They are almost bound to mess up."
The article says the average score among a group of journalists taking a test of simple middle school math questions, like those on this Math Test for Journalists, was just 68% - essentially a failing score. This may be the actual test on which they earned an average score of 68%, and it tests 6th grade math skills. Appalling.

Source: Dr. Kathleen Woodruff Wickham, author of Math Tools for Journalists:
Journalists are notoriously bad with numbers. Either from lack of training or an innate phobia of figures, journalists routinely avoid tackling math problems and often make mistakes when they do delve into numbers.

Notes from a conference held in June of 2005 say that journalists only need "6th grade math" but apparently cannot even do that (quoting conference speaker Steve Doig, Chair, Cronkite School of Journalism, Arizona State University).

Comments from journalists themselves:
According to the ACEJMC, to be accredited, a journalism program's curriculum "...should provide up-to-date instruction in the skills and in the theories, history, functions, procedures, law, ethics and effects of journalism and mass communications...." AND "Competence in language use and visual literacy should be stressed throughout the curriculum...."

And that's as specific as it gets in terms of curriculum standards or expectations. There is not one word anywhere in the ACEJMC document that would suggest journalists need any quantitative skills. Not surprising that, and I only know of two programs -- Arizona State and Hawaii -- that require their graduates to have a course in statistics, for example. It seems to me that if a GA reporter can't compute percent of change or percent of proportion, for example, he/she literally cannot competently cover ANY aspect of government.
As of the 2005-2006 academic year, the Accrediting Council for journalism and mass communications schools adopted its first ever math recommendation that "graduates should ... be able to" "apply basic numerical and statistical concepts". This means basic arithmetic, percentages, averages and that is about it.
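Percent change and percent of a whole are exactly the kind of "basic numerical" skills the standard refers to. A minimal sketch with hypothetical budget figures, for illustration only:

```python
def percent_change(old: float, new: float) -> float:
    """Change from old to new, as a percentage of old."""
    return (new - old) / old * 100

def percent_of(part: float, whole: float) -> float:
    """Part as a percentage of the whole (a 'percent of proportion')."""
    return part / whole * 100

# Hypothetical city budget figures (illustration only).
print(f"Budget grew {percent_change(80_000_000, 92_000_000):.1f}%")
print(f"Police share of the budget: {percent_of(23_000_000, 92_000_000):.1f}%")
```

A budget that grows from $80 million to $92 million has grown 15%, and $23 million of $92 million is 25% of it.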

My previous observation that journalists live in a math-free-zone was dead on. Most journalism programs do not have requirements for math beyond the minimal high school math classes required for college admission, as noted by the just updated accrediting standards. I checked a few university journalism programs and several had no math requirements even in the "general education" or "core requirements" category. Based on the sample tests linked above, journalists need only master middle school grade level skills - we are not discussing algebra yet. This is just arithmetic, not even math. Many journalists lack sufficient numeracy skill to even understand why this is a problem.

And some journalists are simply math morons: "Seven of the eight warmest years on record have occurred since 2001, and the 10 warmest have all occurred since 1997." Wow. 7 of the 8 warmest years on record occurred during the past SIX YEARS. (This was written in 2007 and presumably accounted for the years 2002, 2003, 2004, 2005 and 2006 - which is just 5 years, or 6 years if you include the not yet completed 2007. Either way, even the simplest of math is wrong.)



Ran across another item that explains why reporters poorly or rarely question experts, a problem noted by Professor Edward Wasserman at Washington and Lee University. They are taught that if the source has been peer reviewed, then it is probably correct, which means reporters do not understand what peer review is about nor understand its problems (see chapter 2 of Dr. Bill Hersh's textbook, Information Retrieval or this mea culpa from Dr. Lawrence Altman in the NY Times). Read this column from Nature to learn about some of the problems or issues raised by this item from the University of Michigan Press.

Journalism students are apparently incorrectly taught that peer reviewers check and validate the underlying claims, assumptions, data and conclusions, which is typically not true of peer review (and in some science fields the data has been kept secret). The assumption that peer-reviewed means verified and tested is likely a common misconception held by the general public too. Most article submissions that do not pass peer review end up published in a different journal anyway. (There is peer reviewed research that suggests most published studies are subsequently shown to be wrong - on the order of 98% of them. A study published in Science long ago found that only 1% of studies are cited by other studies six or more times. Nearly half are never cited and most of the remainder are cited only once, which means the studies were likely wrong, incomplete, or worthless.)

Peer review does not mean the work has been audited, verified and tested and is therefore accurate or reliable. A professor at Drexel University explains this better than I.

Junkfood Science has added a column on "math phobia" and how the public and the media are misled by science research (and mostly misled by media morons).

I am slowly working on adding a lot more to this topic, which will appear as a Part 2 and a Part 3. I will address why journalists avoid math, why they are taught to avoid using math in stories (even if it leads to "fake but accurate" news, as I will demonstrate) and more. Sadly, a good rule of thumb is that if the news report involves science, health care, the environment, or topics where numbers and statistics - including accounting and finance - are extremely important to understanding the story, then the news report is often wrong. (One exception is that some business reporters do have training in business, economics or finance ... although I can think of examples where those who obviously had no such training got things rather confused.)

Almost comically, this is well known in the media but they will never report on this issue in their own publications.

Part 2: Why is Journalism a Math Free Zone?

Part 1 established that your local news room is a math-free-zone and the problem is extremely well known in the field of journalism. Ironically, they never run news reports about their own math incompetency. Sounds like censorship to me.

Part 2 looks at how this situation came about, plus how reporters cope with numbers, if they have to cope with numbers.

Q: Why has journalism developed into a math-free-zone?
A: Journalism did not develop into a math free zone - it has always been a math free zone. Journalists self-selected into the field, in part, because they like to write and did not like math. There are numerous journalist quotes online about failing math, failing the SAT math section, skipping math class, etc. It seems peculiar to me that they boast about this, but they do:
"Our distaste for figuring led us to avoid the study and practice that gives more mathematically oriented folks such ease with numerical procedures. And maybe we just lack the talent. Plenty of journalists received verbal SAT scores that were 30 or 40 percentile points above their quantitative scores."
And
"In the grand scheme of things, most journalists rank numbers somewhere below cockroaches. If the truth be told, a good number of us chose journalism as a college major because it allowed us to avoid math courses."
(Source: No Train, No Gain, a web site resource for news trainers. More examples of math problems here.)

Reporters take pride in knowing so little math:
"Sometimes that's because sources are deliberately misleading, and sometimes it's because we relish our mathematical ineptitude so much that we encourage stories that are inaccurate and unfair."
Q: Is math considered an important qualification for getting hired as a journalist?
A: No, not at all. Math skill has no bearing on most job opportunities. Until a year ago, math (and by math I mean simple middle school math) did not appear on the accrediting standards for colleges and schools of journalism. Math was not seen as important to getting hired:
"If Simon [a reporter] had been an illiterate -- someone who lacks the ability to read and write -- he never would have been allowed in a newsroom. But as an innumerate, it didn't matter that he wasn't good at math. If you don't know the difference between a noun and a verb, you could never get a job as a reporter or editor. But newsrooms are full of people who don't know how to calculate a percentage."
Q: So how do journalists handle numbers?
A: By avoiding numbers altogether! Even avoiding stories that require too many numbers. But their main trick is to simply write entire stories without numbers by replacing numerical values with comparisons and metaphors.

Most are trained to remove numbers from stories because numbers are said to "bog down" an otherwise interesting story. It is common for a reporter to wrap a story with colorful adjectives to suggest trends and comparisons to past events, glued together with colorful quotes from equally colorful "experts". By eliminating numbers and using colorful adjectives, the reporter avoids bogging down the story with actual details that, well, might even make the story moot.

This is what journalists are trained to do - avoid numbers, eliminate them whenever possible. Simplify, simplify, simplify. Give only what "the reader needs to know".

Example 1:

  • Use Only What You Need
    • "In all, conservation groups sought to halt 421 proposed national forest timber sales containing 2.5 billion board feet of timber. That's more than half the 4.8 billion board feet the U.S. Forest Service plans to sell on the 13 national forests in Oregon and Washington this year"
    "The point of this passage was to indicate the scope of the environmentalist attack on logging. So what are the critical figures? To some degree, that's a judgment call. But you can make a good argument that all readers really needed to know [emphasis added] was that the conservationists wanted to stop more than 400 sales containing more than half of all the timber the Forest Service planned to sell in Oregon and Washington. Adding all the other figures just detracts from the central point without contributing anything terribly meaningful."

On the surface, the recommendation sounds acceptable. So let's rewrite the story using those guidelines:
"In all, conservation groups sought to stop more than 400 timber sales containing more than half the timber the Forest Service planned to sell in Oregon and Washington."
Sounds good. What could be wrong? The revised version, while sort of accurate, has lost any sense of the scale of the timber involved. Change all of the original "billion" references to "thousand" references. The revised sentence is still accurate, but does it capture the scope of the controversy? No. But that's okay, the reporter knows what I "really needed to know". The Associated Press, in Orwellian doublespeak, proclaims that it stands for your "right to know", although that apparently means only your right to know what the AP thinks you need to know.
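The trimmed sentence survives the billion-to-thousand swap because a share is scale-free. A small sketch using the figures from the original passage:

```python
def share(part: float, whole: float) -> float:
    """Fraction of the whole represented by the part."""
    return part / whole

original = share(2.5e9, 4.8e9)   # board feet, as in the original story
rescaled = share(2.5e3, 4.8e3)   # the same sentence with "billion" changed to "thousand"

print(f"Share in the original: {original:.1%}")
print(f"Share after rescaling: {rescaled:.1%}")
print(f"Board feet at stake: {2.5e9:,.0f} versus {2.5e3:,.0f}")
```

Both shares print as 52.1%, while the volume at stake differs by a factor of a million; only the raw numbers carry the scope.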

Example 2:
Let's look at another example of scope. A frequent news headline reads something like "Researchers announce new drug therapy that is 50% more effective". What the math-free, number-free report leaves out is: "more effective" than what?

As is typical of medical research, the old drug may have improved patient outcomes by 4% and the new drug by 6%. Did the "50%" claim capture the idea that we are discussing a large percentage increase in a small number?

    This type of reporting loses another part of the story. In a typical medical study you might have results like these:
    • 30% took a placebo (received no treatment) and got better
    • 34% of those taking an older medication got better
    • 36% took the new medication and got better
    Are you excited by those numbers? Seen this way, it is clear that hardly anyone gets better as a result of either the old or the new drug. Could you tell that from the cheery "50% more effective" headline? The new drug beats the placebo by 6 percentage points and the old drug beats it by 4, and 6 is 50% more than 4; that is the entire basis for the headline. Virtually all news reports will present the above results as "the drug is 50% more effective", which is misleading at best. (Junkfood Science has more examples of bad reporting on relative risks in health care study results. We are slowly learning that many purported treatments, typically prescription drugs, either do not work or barely work at all. In fact the majority of drugs do not work at all for most patients.)
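    To make the arithmetic concrete, here is a minimal Python sketch using the hypothetical trial numbers from the list above. The function name is made up for illustration, and "number needed to treat" is a standard epidemiology measure added here, not something taken from the reports being criticized.

```python
# Hypothetical trial results from the example above: share of each group that got better.
placebo = 0.30
old_drug = 0.34
new_drug = 0.36


def benefit_over_placebo(rate: float, placebo_rate: float) -> float:
    """Absolute benefit: percentage points gained over the placebo group."""
    return rate - placebo_rate


old_benefit = benefit_over_placebo(old_drug, placebo)  # about 0.04
new_benefit = benefit_over_placebo(new_drug, placebo)  # about 0.06

# The headline figure: how much bigger the new benefit is, relative to the old one.
relative_gain = (new_benefit - old_benefit) / old_benefit

# Number needed to treat: patients given the new drug per extra recovery vs. placebo.
nnt_new = 1 / new_benefit

print(f"Old drug helps {old_benefit:.0%} beyond placebo")
print(f"New drug helps {new_benefit:.0%} beyond placebo")
print(f"Headline: {relative_gain:.0%} more effective")
print(f"Patients who got better without any drug: {placebo:.0%}")
print(f"About {nnt_new:.0f} patients treated per extra recovery")
```

    The same data supports both "50% more effective" and "helps about 1 extra patient in 17"; only the second tells you how much the drug matters.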

    Example 3:
    Here is another example of science fiction news reporting from my morning newspaper:
    "From 1985 to 1994, 62 percent of the female polar bears studied dug dens in snow on sea ice. From 1998 to 2004, just 37 percent made dens on ice."
    Those are the only numbers provided in the story. But percent of what? 10 bears? 100 bears? 1000 bears? The first period (1985 to 1994) covers 10 years, but the second covers only 7. What happened to the years 1995 to 1997? If a full 10-year period is shown, what are the results? Someone removed a lot of numbers from the story to avoid bogging it down with data - and turned the story into meaningless gibberish.

    (I looked into this briefly and found it involved 89 bears across the whole study period and was presented at a conference in Alaska; the study itself is under review and has not yet been submitted for publication or peer reviewed. Treating it as a binomial confidence interval problem and assuming part of the missing information, the 95% confidence interval would be about plus or minus 10 percentage points. Stated another way, somewhere between 52% and 72% of the overall polar bear population dug dens in snow on sea ice in the first period, and somewhere between 27% and 47% in the second period. This would be statistically significant - but barely (or maybe "bearly"?) - which means that the finding is "probably true". Would it still be statistically significant if the 1995 to 1997 period were included? Why is it missing? Was the data "cherry-picked"? Why has no reporter asked that question? We know the answer - it involves both math and skeptically questioning an expert. All the media outlets have since picked up the story and run it - often after removing even more numbers - all you need to know is that "more polar bears are denning on land".)
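    For readers who want to check the "plus or minus 10" figure, here is a minimal sketch of the normal-approximation (Wald) interval for a proportion. It assumes 89 bears in each period, which is the most generous reading of the missing information; if the 89 bears were split between the two periods, each interval would be wider and the gap between them could disappear.

```python
import math


def wald_interval(p_hat: float, n: int, z: float = 1.96) -> tuple[float, float]:
    """Normal-approximation (Wald) 95% confidence interval for a proportion."""
    half_width = z * math.sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - half_width, p_hat + half_width


# ASSUMPTION: 89 bears in each period. The news story gave no sample sizes,
# and how the 89 bears divide between the two periods is unknown.
N_ASSUMED = 89

early = wald_interval(0.62, N_ASSUMED)  # 1985-1994
late = wald_interval(0.37, N_ASSUMED)   # 1998-2004

print(f"1985-1994: {early[0]:.0%} to {early[1]:.0%}")
print(f"1998-2004: {late[0]:.0%} to {late[1]:.0%}")
# Non-overlapping intervals are a rough, conservative sign of a real difference.
print("Intervals overlap" if late[1] >= early[0] else "Intervals do not overlap")
```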

    Example 4:
    In January 2008, the NY Times published an article filled with the usual "he said, she said" quotes and the startling claim that returning American veterans were murdering people across the United States due to the stress of the war zones. Sadly, the NY Times ignored the actual numbers, which contradicted its story, in order to reach its conclusion. Presumably the reporter did not wish to "bog down" a good story with actual numbers.

    Q: How are numbers handled on TV news?
    A: Numbers on TV? You're kidding, right? TV newscasters are taught to avoid all numbers unless they can display a graphic, typically a cute chart (a stack of dollar bills, columns on a building, a line of puppies, etc.).

    The assumption made by the news industry is that readers know less math than the reporters.

    Back when news had to fit a dwindling sheet of paper (newsprint), dumbing down stories to make them fit may have made sense. With rapidly shrinking page counts in most papers, they are dumbing down the stories even more. Their online editions ought to provide more depth, ought to provide the data, and ought to link to the supporting data and key sources. But they cannot do that because the reporters often lack the analytical skills necessary to do much beyond simple averages and percentages.

    Part 3 looks at the impact of the math-free zone on science and other reporting that depends heavily on statistical analysis. We will see that the press release guidelines of organizations such as the Royal Society (perhaps the oldest scientific organization in the world), NASA and the European Space Agency call for removing as much data as possible from press releases, with the admonition to assume "the reader knows nothing". Worse, everyone knows this and uses reporters' math phobia as a tool to shape news reports in their favor.

    Part 3: How does math innumeracy affect the news?

    Part 1 established the poor math skills common in the media industry.

    Part 2 provided background on how this lack of skill came about and examples of mangled numerical reporting.

    Part 3 explains how numerical ineptness leads to bad news reporting, and how organizations take advantage of naive reporters by dumbing down and manipulating the news to serve their own purposes.

    Q: Journalists never seem to trust what a politician says - why do they trust doctors, scientists or other highly skilled "experts"?
    A: As noted by Professor Edward Wasserman in part 1, they view "experts" in "reverential" terms and lack the skill to confidently ask skeptical questions of "experts". The No Train, No Gain folks wrote:
    "We're used to the way politicians and bureaucrats hedge their words to make themselves look good. Because we're on guard and savvy to the politics of the language, we usually protect readers from the most self-serving versions of the truth.

    But we're much less critical when it comes to quantified information."
    The developer of math guidelines for the Poynter organization is quoted as saying:
    "reporters and editors are suckers for numbers. To them, a number looks solid, factual, more trustworthy than a fallible human source. And being numerically incompetent, they can't find the flaws in statistics and calculations. They can't tell the difference between a meaningless number and a significant one. The result is stories that are misleading and confusing at best and, at worst, flat out wrong."
    Here is a simple and very common example: Around election time, we are inundated with polls. One week the poll says X has 51% to 49% for Y. A week later, the poll shows Y has 51% and X has 49%. The headline blares "Y TAKES THE LEAD!" The reporter flips through the Rolodex and grabs quotes from the usual suspects: "This is the turning point in our campaign," said Jay Walrus, Mr. Y's campaign manager. "We expected to see this transition occur as voters became more familiar with the story about Mr. X torturing little puppies when he was younger. Last year."

    Of course, the poll probably had a "margin of error" of +/- 4% (the confidence level is never specified). Since both candidates' estimates fall within the margin of error, the race remains a dead heat. We cannot pick either one as the leader. This is elementary statistics. Yet poll results will be published weekly in the press during election season, and nearly always the story will be reported wrong.
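    Here is a minimal sketch of the poll arithmetic. It assumes a simple random sample of 600 respondents, a hypothetical size chosen because it yields roughly the +/- 4% quoted above, and the function names are made up for illustration. In a two-way race the lead equals 2p - 1, so the margin of error on the lead is about twice the margin on either candidate's share.

```python
import math


def margin_of_error(n: int, p: float = 0.5, z: float = 1.96) -> float:
    """Approximate 95% margin of error for one candidate's share in a simple random sample."""
    return z * math.sqrt(p * (1 - p) / n)


def lead_is_clear(share_a: float, share_b: float, n: int) -> bool:
    """Two-way race in one poll: the lead's margin of error is roughly twice the share's."""
    return abs(share_a - share_b) > 2 * margin_of_error(n)


# ASSUMPTION: 600 respondents, which gives roughly the +/- 4% quoted in the text.
n = 600
print(f"Margin of error: +/- {margin_of_error(n):.1%}")
print("Y takes the lead!" if lead_is_clear(0.51, 0.49, n) else "Still a dead heat")
```

    A 2-point lead would need to exceed about 8 points before this poll could call a leader.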

    This common error is the same type that led to the incorrect news reports about 2006 being the hottest year on record. (I confirmed that by contacting the National Climatic Data Center official in charge of compiling the data, who agreed that the confidence intervals of the estimates meant that we could not distinguish the "hottest" year from among many different years, and that the NCDC press release was misleading (intentionally, it turns out). A scientist at NASA says about a dozen years all qualify as "hottest" and they are spread throughout the 20th century. But do not expect a news reporter to understand this nuance.)

    Another common problem for reporters is the inability to see that very technical published reports with a lot of mathematical symbols are often little more than exercises in finding spurious correlations. With our modern computers and databases, a surprising number of research papers are written by data mining, looking for correlations in large mountains of data (a related problem is the often bogus use of meta-analysis methods to data mine other research reports). Today our computing horsepower makes it easy to find correlations and associations galore - yet most are likely to be meaningless - but they do make for great headlines. Data mining is so much easier than doing real research, which in some fields may require getting dirty out in the field. Hence, data mining correlations and meta-analysis have become the preferred quick-and-not-so-dirty way of adding another paper to one's curriculum vitae.
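    Why does data mining turn up so many correlations? A minimal sketch, assuming every tested association is pure noise and the tests are independent (real data-mined variables are usually correlated, so this is a simplification). Under that assumption each test looks "significant" at the usual 5% level with probability 0.05, so enough tests almost guarantee some flukes. Function names are hypothetical.

```python
import random


def chance_of_false_positive(alpha: float, tests: int) -> float:
    """Probability that at least one of `tests` independent noise-only tests looks 'significant'."""
    return 1 - (1 - alpha) ** tests


def simulate_data_mining(tests: int, alpha: float = 0.05, seed: int = 1) -> int:
    """Count spurious 'discoveries' when every tested association is pure noise.

    When there is no real effect, a p-value is uniformly distributed on [0, 1],
    so each test comes out 'significant' with probability alpha.
    """
    rng = random.Random(seed)
    return sum(1 for _ in range(tests) if rng.random() < alpha)


print(f"20 tests:  {chance_of_false_positive(0.05, 20):.0%} chance of at least one fluke")
print(f"100 tests: {chance_of_false_positive(0.05, 100):.1%} chance of at least one fluke")
print(f"Spurious 'findings' in 1,000 noise-only tests: {simulate_data_mining(1000)}")
```

    Roughly 50 of the 1,000 noise-only tests will come out "significant", and each one is a potential headline.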

    When reporters do not understand the simplest of math concepts, they may as well be writing fictional news reports. Or maybe science fiction news reports ...

    Q: How does the lack of math understanding impact science reporting?
    A: Worse than we realize, because there is another side to the story - those who manufacture the press releases know that reporters are not too bright when it comes to numbers. When major research is published, the sponsoring organization issues a press release summarizing the major claims but omitting the statistics, uncertainty and other qualifiers that accompany the actual report. Journalists then write the story based on the baby-talk press release, add a few "he said, she said" quotes - but rarely read the actual journal article. As with the polar bear story in Part 2, the numbers do not even need to add up and the reporter will not notice.

    The Royal Society - the U.K.'s national academy of science - issued guidelines for what scientists should plan to release to the media, recommending that scientists dumb everything down so that journalists can understand their work. The use of actual data is discouraged.
    The media abstract should be no more than 100 words and aim to outline, to a lay audience, your research and any relevant findings. If possible try to highlight why the research is important, i.e. does your research discover something new? Does it change perceptions or previous understanding? If possible, try to link your research with to examples or analogies as this enables journalists to understand and relate to your work. Please avoid using excessive jargon or statistics, unless absolutely necessary.
    (Source: Journal of the Royal Society, Interface - "J. R. Soc. Interface is a new international journal publishing articles from the interface between the physical sciences, including mathematics, and the life sciences.")

    Examples of NASA and European Space Agency guidelines for dumbing down the story appear below.

    While the scientists may have specified the details, assumptions, caveats, cautions and uncertainty about their work in the journal article, their employer leaves those out of the press release. And the scientists often fail to assert themselves in the PR loop to ensure that their own employers issue accurate press releases.

    A related problem is that reporters typically translate probability numbers into English prose (remember - avoid numbers!). A 60% probability becomes "significant" to a reporter, or possibly even "strongly significant", even though it is neither. Or it becomes "likely", when it's not really likely at all.

    A research finding that a result is statistically "significant" or "highly significant" means that the result is "probably true" (unlikely to be a fluke of chance), whereas in everyday English "significant" or "highly significant" means "important". Reporters often translate a science press release's "statistically significant" claim into "important breakthrough", even though the statistical usage is fundamentally different from the everyday one.
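    A minimal sketch of why "significant" is not "important", reusing the hypothetical 34% vs 36% drug results from Part 2. The study sizes are invented for illustration; the test is a standard pooled two-proportion z-test for equal-sized groups. The same 2-point difference is a non-result in a small study and "highly significant" in a huge one, yet it is no more important either way.

```python
import math


def two_proportion_z(p1: float, p2: float, n_per_group: int) -> float:
    """Pooled two-proportion z statistic for two groups of equal size."""
    pooled = (p1 + p2) / 2  # equal group sizes, so the pooled rate is the simple average
    se = math.sqrt(pooled * (1 - pooled) * (2 / n_per_group))
    return (p2 - p1) / se


# Same 2-point difference (34% vs 36%), two hypothetical study sizes.
for n in (100, 20_000):
    z = two_proportion_z(0.34, 0.36, n)
    verdict = "statistically significant" if abs(z) > 1.96 else "not significant"
    print(f"n={n:>6} per group: z = {z:.2f} ({verdict}); difference is still 2 points")
```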

    A paper was circulating about ways of conveying uncertainty and risk (to reporters and others) in the context of the Intergovernmental Panel on Climate Change (that paper led to these guidelines). The first link notes that the imprecise language used to dumb down probability estimates leads the media and the public to inappropriate conclusions about probabilities and event magnitudes.

    The second link (see point #14) has a table translating probability estimates into reporter-speak: above 66% is "likely", 33% to 66% is "about as likely as not", and so on. Stated another way, "likely" can mean as little as a 2 in 3 chance that our forecast is correct and as much as a 1 in 3 chance that we are wrong. Some in the climate science community have expressed reservations that the public view of climate catastrophe does not match the science and, per the above links, are working to address those shortfalls. Unfortunately, that is a challenging task when real data must be sanitized into baby-talk for the media, often by PR flaks who measure success by whether their press release gets noticed, and who therefore focus on the hype rather than the most likely outcome.
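    A minimal sketch of the translation table, using only the bands quoted above plus "unlikely" for anything under 33%. The real guidance table has additional, finer categories that are not reproduced here, and the function name is hypothetical. Note that the 60% example from earlier lands in "about as likely as not", not "likely".

```python
def ipcc_style_label(probability: float) -> str:
    """Map a probability to the coarse wording bands described in the text.

    Simplified: only 'likely' (> 66%), 'about as likely as not' (33% to 66%)
    and 'unlikely' (< 33%) are modelled; the actual guidance has more categories.
    """
    if probability > 0.66:
        return "likely"
    if probability >= 0.33:
        return "about as likely as not"
    return "unlikely"


for p in (0.60, 0.67, 0.20):
    print(f"{p:.0%} -> {ipcc_style_label(p)!r} (wrong {1 - p:.0%} of the time)")
```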

    Q: Are all science press releases written this way?
    A: I guess. Here is the European Space Agency's recommendation for writing science press releases:
    "6. Simplify: A fundamental rule of written science communication is to make texts as simple as possible. Nowadays people simply do not have time for lengthy explanations.
    7. Explain: It is always necessary to match the writing to the level of the target group, but never more so than in science writing. Remember that 'the reader knows nothing'."
    Here is what a NASA representative has to say about this:
    "Many observers of the science press have noted an increasing tendency for both press releases and printed stories about science topics to exaggerate the uniqueness and impact of new research. The writer of a press release does this to increase the probability that the media will cover the story, and the media reporter will go along with this hyperbole or perhaps expand it further in order to get the story approved for publication by editors or other gatekeepers."
    Q: Who writes dumbed down science press releases on behalf of the scientists?
    A: PR staff are often former journalists, people trained as journalists, and people sort of trained as journalists, including discredited journalism majors like ex-NASA PR flak George Deutsch. They are trained in "Schools of Journalism and Mass Communications", the latter being a catch-all category for broadcast news and public relations functions. Former journalists know how the system works and are often the best at creating "effective" press releases that get noticed and republished by the media. Science by press release is used specifically to influence the media into creating fake news reports, often about fake results. And a surprising amount of what goes into a press release is pure "blustery crap".

    Q: Is the problem just science and tech reporting?

    A: No. Numbers appear everywhere: health care, medicine, social security, the environment, safety topics, government spending, tax policy, economics and on and on. Every large organization knows that most reporters are math phobic and easily manipulated through press releases. It's not just corporations or government or researchers. Non-profits do it. Environmental activists do it. An enormous number of news stories began as a carefully constructed press release designed, typically, to appeal to emotions and devoid of actual data.

    Some disguise their true motives by creating fake grassroots organizations to do their lobbying for them. In the PR biz, these are known as "astroturf" organizations doing "astroturf" marketing and lobbying, defined as "instant manufacturing of public support for a point of view in which either uninformed activists are recruited or means of deception are used to recruit them". Although possibly founded by well-intentioned individuals, these groups are funded by corporate interests, special-interest lobbying groups and the PR firms they hire. Even some well-known "authoritative" science web sites run by "real scientists" were originally started by astroturf PR firms. Many corporate interests lobby for laws that will benefit their own interests and often harm their competitors. To avoid the obvious charge of self-interest, they often choose not to lobby for such laws directly. Instead, they work through Astroturf organizations to shape public opinion and, through them, lobby politicians to create a regulatory environment favorable to their interests. (Contrary to public perception, many corporations love government regulation and effectively manage the regulatory process for their own ends.)

    In the spring of 2008, the Alliance for Climate Change launched a "grassroots" public relations campaign to push for government laws and regulations concerning carbon dioxide (especially). One of the primary beneficiaries of mandated carbon laws will be those who run "carbon-exchanges" and those who fund them and seek to profit from new carbon laws; this is part of the reason the ACC intends to spend a minimum of $300 million (with more likely) to run its public relations astroturf marketing program. The folks behind ACC include quite a cross section of former politicians, current political appointees and the financial firm Goldman Sachs, all of whom stand to profit enormously once carbon-trading is required by law.

    Q: How else does this influence how press releases are written?
    A: PR writers avoid numbers and substitute emotion, crisis, or calamity. Try to wrap it into being "for the children" or somehow involving "cute animals" that are endangered. Add hyperbole, spice with "breakthroughs!", stir with Astroturf activists.

    Issue the press release well in advance of the availability of the data. (I suppose, then, that this would not be a surprise at all.) This is known as "science by press release" (some of which makes false claims, says Wesley Smith, lawyer and bioethicist). "Science by press release" garners media exposure without offering anyone an opportunity to audit, verify and authenticate the work. Once it is on the newswire and distributed around the country in newspapers and in television and radio newscasts, even if completely wrong, it's now a fact as far as most of the public is concerned. Even if a retraction were issued (say, fixing the broken polar bear story described in Part 2), chances are that very few outlets would print the fix. It may be all wrong, but since it was "Seen on TV" it is now a fact.

    Numbers-free and data-free news reports based on overhyped press releases are opinion writing, not news reporting.

    Science by press release is very common in health care and industry, and more recently it is appearing in the general sciences as well. Journalists turn off their thinking caps and regurgitate marketing hype without questioning any aspect of the message. This is sadly what passes for journalism nowadays - journalism has become marketing by a different name.

    Q: Why don't scientists step up and point out when the media is making gross mistakes in science reporting?
    Answer.

    Q: This Astroturf marketing is interesting. What other steps do they take?
    A: Astroturf marketing can be used as a follow-on to the press release to add more spin to the press release hype. Astroturf marketers can control the "Letters to the Editor" page of your local paper. Astroturf and other special interest groups routinely use phone calls and emails to encourage their followers to simultaneously write "Letters to the Editor" of the local paper. First, this emphasizes the local, "grassroots" nature of the effort. Second, the "Letters" editor typically receives more letters than can be used, and selects which letters to publish partly by how many letters arrive on a topic. In the past few years, newspapers decided that readers wanted more viewpoints rather than fewer - and today they typically limit Letters to the Editor to 200 words or less. In effect, quantity of viewpoints is now more important than quality of viewpoints. This makes it easier than ever for Astroturf marketing to letter-bomb the local paper in conjunction with overhyped press releases.

    This "letter bombing" makes it easier to create the semblance of a "consensus" and the impression that naysayers are misguided outcasts.

    The local news folks know full well they are being manipulated by PR flaks and letter-writing campaigns. Everyone seems to engage in "Wink, wink, nudge, nudge" - it's just a big con game. But how many readers or TV viewers were aware of how their news is carefully manufactured? I did not know all this.

    Q: What does the public think of all this?
    A: There is an increasing awareness that the media exaggerates and overstates research findings and the confidence in those results, and typically leaves out critical details such as study limitations. Read the comments on this NY Times item.

    Q: What qualifications are required to be an environmental science journalist for CBS News?
    A: None related to science or math. But you should have "great energy" (translation: young and good looking) and "You are wicked smart, funny, irreverent and hip, oozing enthusiasm and creative energy" (translation: young and good looking) and you'll need to travel a LOT (translation: young and good looking and probably unmarried too!). There is no requirement that candidates will have studied science, math, the environment or engineering - what they want is a good "story teller", preferably with a pre-determined point of view.

    Fake but Accurate News
    Math and statistical interpretation underlie nearly every subject that matters today. But reporters not only lack basic math skills, they are taught to avoid the data. Hard news becomes soft emotional news: lead with the anecdote and then write a bunch of "he said, she said" quotes to create text devoid of actual data that might bog down a good story. Even in a world of bloggers, hardly anyone actually verifies press articles that involve numbers (readers could very well be dumber than reporters when it comes to math).

    The result is "fake but accurate" news reports shaping public opinion, leading to public policy set by politicians who generally know less math than the average person on the street.

    The term "fake but accurate" news was coined after CBS News ran a poorly sourced story based on photocopies of documents supposedly typed around 1970 that looked as if they had been produced in Microsoft Word. CBS defended itself with contortions, saying that while the documents might be fake, the conclusion derived from them was accurate. It was a sort of Alice-in-Wonderland, verdict-before-the-evidence approach to journalism.

    Ultimately, public policy is settled by who shouts the loudest or creates the best emotional story about a child losing a puppy. Public policy is set by politicians who far too often appeal to emotions and consensus building. Journalists pride themselves on writing advocacy pieces to shape public policy (whether their thesis is true or not is unimportant). It is very hard to provide a rational argument for or against a policy once the debate has become devoid of mathematical understanding: emotion usually wins in the Court of Public Opinion.

    Q: Will media math skill deficiencies ever get fixed?
    A: I guess not.
    "There has been a movement, so far unsuccessful, here at my university to require students to pass a math test similar to the spelling and grammar test that they must pass to receive a degree from the journalism school."
    (Source: Univ of North Carolina School of Journalism and Mass Communications.)

    While some people seem aware of the problem, ultimately their real business is selling eyeballs to advertisers. Diverse topics and accuracy are not as big a deal as they would like you to believe. All they need do is be "good enough" and have enough stories about Paris Hilton and Britney Spears to keep selling eyeballs to advertisers. TV news is equally bad, often featuring saturation coverage of "lost cute white chicks" - it's good for the ratings, sells more eyeballs, and it does not involve math.

    Even the National Geographic Society has degraded itself. Its National Geographic Channel routinely passes off recreated events without identifying them as recreations - having the effect of compressing time and overstating the importance or urgency of findings. In some cases, they leave out facts, making their claims lopsided or even wrong - but that's okay because the NGC TV channel's goal is emotional content that sells eyeballs to advertisers. The National Geographic today is about entertainment and no longer about stretching your mind.

    Q: Knowing all this, why would anyone read the newspaper?
    A: Because they are bored? News today is mostly infotainment. It is not about "news". The paper is also handy for starting up the wood stove, lining bird and pet rodent cages or as packing material in shipments. If I actually read the paper anymore, I'm certain I would find several articles every day in which the numbers have been removed, mangled or misinterpreted. (I do occasionally read it and I find mangled numbers each time I try to read it.)

    My perspective on this is strongly influenced by some recent lousy reporting in my local paper; however, when I set out to learn more 4 weeks ago, I had no idea I would find that most of the industry is aware of the problems, talks a lot about them, accomplishes little and carries on without change "since we've always done it that way". Of course, my local paper is one of the two newspapers that President Truman declared in 1948 to be the worst in the country. Hey, we've always done it that way!

    We would be better off reading the press releases ourselves, reading the original source material ourselves and learning to think for ourselves rather than being fed baby-talk, evidence-free reports that tell us "what the reader needs to know".

    Which leads to the question: do we even need any of today's soft news reporters? There are a few reporters who do know math and who do know science. But not many. And it may not matter any more: newspaper readership has been dropping for a very long time now, and may even be dropping more rapidly. My local paper has lost about 6% to 7% of its subscribers in the past couple of years. And the L.A. Times announced a huge online/offline reorganization, calling itself "Web-stupid". None of their proposed measures addresses the issues I've discussed here - they still cannot or will not do the math. Their solution is to put bad reporting online, faster and more often. Whatever. Even when the story does not involve any math they get the headline completely wrong. Geez.

    I'll have some additional comments in Part 4 (a lot shorter than this part!).

    Afterword: And how should schools improve science knowledge? Would you believe by showing students science fiction TV shows instead of real science? Sigh.

    Mister Snitch has a lot of interesting comments on the past and future of the news and the old-fashioned command-and-control, top-down view of monopoly newspapers. He advocates a Wiki model for news - the community is the news. I like it.

    Part 4: How Journalists are trained to censor data

    This is how journalists are trained to censor hard data from news stories:
    # The most effective writing comes from selection, not compression, of facts. It's also true with numbers. Choose only the numbers that have meaning to your readers.

    # Consider charting numbers instead of writing them. Removing them from the text not only improves your story; it often makes a bigger impression on readers.

    # Pepper your story with just the right number in just the right place rather than cramming them all together. Use an anecdote, quote or observation to separate paragraphs with lots of numbers.

    # Recast as many numbers as possible in simple terms that remove their abstraction. Ratios, rates, pictorial images and rounding can help simplify numbers.

    Mathematics is detail-oriented; most writing is not. In writing, we are taught to simplify our sentences (I've written 7 books, and I learned a lot from the editors who effectively simplified my text). The Elements of Style is perhaps the best book ever written on good writing (and it's short!).

    But there seems to be a problem in the bridging of the world of math and the world of writing.

    Unlike text, where less text may make the same point more clearly, in math, you cannot leave out the details.

    Math, like text, must also be "read". But most readers have not learned how to read math. Instead, their eyes glaze over and they jump ahead to the text description. When reading math you must slow down, digest and think. Reading math is harder than reading text.

    That reading math is hard is no excuse to censor math from the record.

    Journalists are applying rules that make sense for text and verbal concepts to all math and data. Removing the details of the math does not always make a better story - it often makes the story incomplete, meaningless or plain wrong. There are many examples (including some I've posted in previous parts of this series) where reducing math content leads to less understanding, not more.

    The goal should be accurate understanding - not "easier to read". Journalists have assumed that "easier to read", meaning "less data", leads to better understanding, when it often leads instead to misunderstanding. Worse, reporters who have difficulty doing percentage calculations are the ones selecting which numbers are important. This is not a good strategy for accurate reporting.

    And let's not get into accurate science concepts where even the correction has an error.



    Journalists also seem trained to write idiotic statements without thinking. Take for example, this item:
    "The San Diego region is poised for an economic boost next year as homeowners who lost houses in last week's dramatic wildfires there set about rebuilding, analysts said on Tuesday."
    If disasters create economic booms, then why not enable more flooded cities (Katrina) and forest fires, or for that matter, terrorist acts in the U.S.? Perhaps the U.S. military should use our "creaky infrastructure" for target practice? Heck, it would boost the economy!

    What the reporter fails to grasp is that economic data measure current spending and do not account for the losses caused by the preceding disaster. Incredibly, I have read this same nonsense claim from dim-witted journalists after every major disaster in the United States!

    For example, suppose you fall into a hole. Climbing out of the hole only gets you to ground level - the trend is positive and looks great, but you are no higher than when you started. A disaster is a fake economic boom because economic statistics fail to account for the destruction of wealth during the disaster.
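    The hole analogy can be put in numbers. A minimal sketch with made-up figures in arbitrary units: spending statistics count the rebuilding as new activity, while the stock of wealth merely returns to where it started. (Economists know this error as the "broken window fallacy", described by Frédéric Bastiat.)

```python
# HYPOTHETICAL figures, in arbitrary units, for a town hit by a wildfire.
wealth_before = 1_000      # value of homes and property before the fire
destroyed = 200            # property lost in the fire
rebuilding_spending = 200  # money spent replacing exactly what was lost

wealth_after_fire = wealth_before - destroyed
wealth_after_rebuild = wealth_after_fire + rebuilding_spending

# Spending statistics record the rebuilding as new economic activity ...
print(f"Reported 'boost' in spending: {rebuilding_spending}")
# ... but the stock of wealth has only climbed back out of the hole.
print(f"Net change in wealth: {wealth_after_rebuild - wealth_before}")
```

    And the 200 spent on rebuilding is money that could otherwise have bought something new, so the town is, if anything, worse off than the spending figure suggests.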

    I am aghast that the reporter was unable to question the "experts" on this - but alas, the young reporter undoubtedly treated the quoted economists and professors with deferential reverence.

    Update: Here is a contemporary item in 2007 regarding a reporter leaving out the actual numbers in an article about a presidential candidate.

    Afterword: The journalist whose article inspired me to write this entire series (the numbers that did not add up) has written an article with tips for reporters to do better on health care reporting - such as, don't start with the anecdote, and "understand the magnitude of impact". The latter would be, I guess, why her advocacy article on putting a defibrillator in each local health club began with an anecdote and never mentioned that such a program might save just 3 lives per century.

    This series has been a very sad commentary on the state of journalism today. As the number of news readers drops each year, the solution is neither to put bad reporting on the web faster nor to dumb down the stories even further with more anecdotes and "he said, she said" quotes. Adding multimedia to bad reporting is no improvement either.

    The real solution will require hard work - maybe developing some genuine math and statistical skills - and writing reports based on where the data leads, not on where the pre-determined "narrative" takes us.

    Update: Professional journalists are fond of telling us that, unlike bloggers, professional journalism does extensive research, fact checking and balanced reporting. Unfortunately, put some of these professionals on stage and their utter ineptness becomes the story, rather than the subject. Far more details of this journalistic disaster are here, including the journalist's short online follow-up. (I suspect this was a case where the audience was smarter than the reporter doing the on-stage interview, and she served only to embarrass herself with mindless questions and self-promotion. That may work on CNN or Fox but not in front of a tech audience.)


    Introduction to Statistics (without math)

    A non-mathematical introduction to basic statistical concepts
    This short essay introduces basic statistical concepts without using much math. This is not a cookbook for working out statistics problems. This is a guide to understanding statistical concepts when you read the news, press releases, or scientific reports.

    Warning: The following text requires the use of brain cells, contains no anecdotes and no quotes from "experts". Reader discretion advised.

    Normal Distribution
    If a population under study has a "normal distribution", then when we "sample" it (other terms: survey, poll) by measuring only a subset of the total population, we expect our sample measurements to follow the normal distribution as well.

    In other words, if we randomly sample, say, 100 people out of 1,200, we should get a representative cross section of the total population. We can use this sample to estimate, say, how the whole group of 1,200 might have voted.

    If we graph our samples, we will get a result that should approximate the normal distribution. In the normal distribution, most results fall in the broad middle, with fewer results at either end.

    The P(x) curve here illustrates an expected normal distribution. Other terms for the normal distribution are the "Bell curve" or Gaussian distribution.

    (Not all distributions are normal. There are other statistical methods for dealing with those situations that are not described here. Because of this, when conducting a survey or other research, a scatter plot or histogram of the data is typically drawn just to observe the raw data. If it does not resemble a normal distribution, then other statistical procedures should be used.)

    Example
    Suppose there are 1,500 students in a school. We conduct a survey and ask 200 randomly selected students some questions to learn something about the students. There is a small chance that we could randomly select a biased sample, say, many more boys than girls. Because of this, the sample size affects how confidently we can draw conclusions. With a sample of any size, it is possible that we drew an unrepresentative set, although with larger samples (up to a point) we can be more confident that our sample really is representative.

    Estimating the Sample Average

    Almost everyone knows that to calculate the average of a set of values, you merely sum all the values and divide by the number of values. That is simple enough.

    By averaging the samples from the overall population, however, you do not calculate the actual population mean. Instead, you produce an estimate of the population mean. Depending on your sample size, your estimated average may be off (is almost certainly off) by some amount from the actual population mean.

    You could see this by running the survey several times, each time randomly selecting 200 students. Each survey would produce a slightly different estimate of the population mean. Yet if you sampled each and every member of the population - all 1,500 students - you'd arrive at the actual population mean, and it would probably not be the same as any of your sample estimates.
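    A quick simulation makes this concrete. It is a sketch, not real survey data: the population of 1,500 scores is randomly generated (mean about 80, standard deviation about 7), and `survey` is a hypothetical helper. Each run of 200 randomly selected students gives a slightly different estimate, each usually a little off from the actual population mean.

```python
import random
import statistics

# Hypothetical population: 1,500 invented test scores (mean ~80, SD ~7).
rng = random.Random(42)  # fixed seed so the run is repeatable
population = [rng.gauss(80, 7) for _ in range(1500)]
true_mean = statistics.mean(population)

def survey(pop, n, rng):
    """Randomly select n members (without replacement) and return their average."""
    return statistics.mean(rng.sample(pop, n))

# Run the same 200-student survey five times.
estimates = [survey(population, 200, rng) for _ in range(5)]

print(f"actual population mean: {true_mean:.2f}")
for i, est in enumerate(estimates, 1):
    print(f"survey {i}: estimate {est:.2f} (off by {est - true_mean:+.2f})")
```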

    How far might our sample average be off from the actual mean? We will come back to that in a moment.

    Standard Deviation
    A reasonable question is "How spread out are our samples?" Some sample values will be less than the average, some close to the average, and some above the average. So how much do our sample values deviate from the average? That concept is captured in a calculation known as the standard deviation. We might have a set of, say, student test scores on a 100-point test. The average score might be 80, with a standard deviation of, say, 7.

    What does that mean? Without explaining why, we expect that in a normal distribution, about two-thirds (roughly 68%) of our scores will fall within + or - one standard deviation unit (in this case one unit = 7, so 80 +/- 7). About 95% of all the scores should fall within + or - two standard deviation units (80 +/- 14), and 99.7% of all values should fall within +/- 3 standard deviations.
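    Python's standard library can confirm these percentages. A minimal sketch using `statistics.NormalDist` (Python 3.8+) with the example's average of 80 and standard deviation of 7:

```python
from statistics import NormalDist

# Model the example's test scores: average 80, standard deviation 7.
scores = NormalDist(mu=80, sigma=7)

shares = {}
for k in (1, 2, 3):
    low, high = 80 - k * 7, 80 + k * 7
    # Fraction of a normal distribution lying between low and high.
    shares[k] = scores.cdf(high) - scores.cdf(low)
    print(f"within +/- {k} SD ({low} to {high}): {shares[k]:.1%}")
```

    It reports roughly 68%, 95% and 99.7%, matching the rule of thumb.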

    There are two kinds of standard deviations: (1) sample standard deviation and (2) population standard deviation. If you did a survey of a subset you'll want to use (1). If you actually surveyed everyone, you'll use (2).
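    In Python's `statistics` module, the two kinds correspond to `stdev` (sample) and `pstdev` (population). The scores below are made up for illustration.

```python
import statistics

# A small hypothetical set of test scores (invented for illustration).
scores = [72, 75, 78, 80, 80, 83, 85, 87]

# (1) Sample standard deviation: divides by n - 1.
#     Use when the scores are a subset (sample) of a larger population.
sample_sd = statistics.stdev(scores)

# (2) Population standard deviation: divides by n.
#     Use when the scores cover every member of the population.
population_sd = statistics.pstdev(scores)

print(f"sample SD:     {sample_sd:.2f}")
print(f"population SD: {population_sd:.2f}")
```

    The sample version (about 5.01 here) comes out slightly larger than the population version (about 4.69) because dividing by n - 1 instead of n compensates for a sample's tendency to understate the spread of the whole population.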

    Confidence Intervals
    From above, we know that our sample average (average of all subset samples) is not exactly the population mean (if we could sample every member of the population). Our estimated average will probably fall within some range around the exact population mean.

    How wide is the interval around the real mean? That depends on how confident we want to be that our estimate is good. Let's say we know that the values in the overall population lie between 0 and 100. We can say, with 100% confidence, that the population mean will therefore lie somewhere between 0 and 100! Thus, to achieve 100% confidence (in a worst-case scenario), our confidence interval would be as wide as all possible values.

    If we are willing to give up some confidence that our estimate is correct, we can narrow the confidence interval.

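    Without explaining the math, we can say, with 95% confidence, that the actual population mean (if we sampled every individual) lies within about + or - two standard deviations of our estimated average. (Strictly speaking, an interval for the mean should use the standard error, which is the standard deviation divided by the square root of the sample size; using the standard deviation itself, as the examples below do, keeps the arithmetic simple but gives a much wider and more cautious interval.)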

    Let's go back to the school example. Suppose all 1,500 students took a test and we sampled 200 to learn that we have an estimated average test score of 80, with a standard deviation of 7. We can say that we are 95% confident that if we calculated a mean of all 1,500 test scores, the actual mean will lie within the range of (about) 80-14 to 80+14 or a range of 66 to 94.

    What if we wanted to be only 80% confident that the real mean is within our interval? Since we are willing to be less confident, we can narrow the size of our interval to 80 - 9 to 80 + 9 or 71 to 89.
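    As a rough check on this arithmetic, here is a minimal Python sketch (Python 3.8+ for `statistics.NormalDist`). The helper names `z_for` and `interval` are made up for illustration. The text rounds the 95% multiplier to 2 and uses about 1.28 for 80%; the sketch computes both exactly. For comparison, it also computes the textbook interval for a population mean, which multiplies the standard error (the standard deviation divided by the square root of the sample size) rather than the standard deviation itself. With 200 students that interval is much narrower, roughly 79 to 81 at 95%.

```python
from math import sqrt
from statistics import NormalDist

def z_for(confidence):
    """Two-sided critical value: about 1.96 for 95%, about 1.28 for 80%."""
    return NormalDist().inv_cdf(0.5 + confidence / 2)

def interval(center, spread, confidence):
    """Return (low, high) = center +/- z * spread."""
    half_width = z_for(confidence) * spread
    return center - half_width, center + half_width

mean, sd, n = 80, 7, 200  # the school example from the text

for conf in (0.95, 0.80):
    # Simplified rule used in the text: spread = standard deviation.
    lo, hi = interval(mean, sd, conf)
    # Textbook interval for the mean: spread = standard error = sd / sqrt(n).
    se_lo, se_hi = interval(mean, sd / sqrt(n), conf)
    print(f"{conf:.0%}: text rule {lo:.1f} to {hi:.1f}; "
          f"standard-error rule {se_lo:.1f} to {se_hi:.1f}")
```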

    Another way to look at the confidence interval is as follows:
    * At the 99% confidence level, there is a 1 in 100 chance we are wrong
    * At the 95% confidence level, there is a 1 in 20 chance that our estimated average is wrong (outside the interval range)
    * At the 90% confidence level, there is a 1 in 10 chance that our estimated average is wrong (outside the interval)
    * At the 80% confidence level, there is a 1 in 5 chance we are wrong
    * At the 66% confidence level, there is a 1 in 3 chance we are wrong
    * At the 51% confidence level, there is a 1 in 2 chance that we are wrong (equal odds)

    What Confidence Level Should Be Used?
    By statistical convention or long-term standards, the most commonly used confidence level is 95%. In some situations you might choose less (90%, although rarely) or more: 99% (fairly common in health care) and sometimes even 99.9%. In some situations where the result is not too important - like, say, a market research survey - someone may have reason to choose an 80% confidence level. However, levels below 90% are considered pretty worthless for real results.

    WHAT TO LOOK FOR
    Look for the confidence interval in the report. If you do not see it mentioned, either ignore the report or look for a more detailed report.

    If the confidence level is less than 95%, try to understand why. A sleight of hand to make a finding sound important by claiming statistical significance is to lower the confidence level to 90% or even 80%. This may or may not be mentioned. What was not significant at the 95% or 90% level might be significant at the 80% level - but that's a worthless result, since there is a 1 in 5 chance we are completely wrong.

    Survey results are often presented as "Candidate X polls 45% +/- 4.5 percentage points". Without knowing the confidence level, this is not useful information. Was this at the 95% confidence level? 80%?

    What is the sample size? In some fields, like health care, numerous studies are completed with very small sample sizes ranging from n=1 to 30. Be very careful about interpreting these studies. Typically they are used to justify funding for further studies - their results are usually of no value in making future health care decisions.

    (There are also two kinds of confidence intervals - one-sided and two-sided - which will not really be dealt with in this introduction.)

    Hypothesis Testing

    When many people hear the word "hypothesis" they may think "science", but the concept of a hypothesis is not restricted to science. A marketing manager might form a hypothesis: Our new ad campaign resulted in an increase in sales - or not. The marketing manager may want to know whether a change in sales was merely random or due to the ad campaign.

    Hypothesis testing is used to test what we think we know. A hypothesis is either true or not true. A hypothesis is never probably true or probably false - a hypothesis is only rejected or accepted.

    A hypothesis contains two parts:
    (1) The null hypothesis - this is true unless we have convincing evidence it is not true. Example - the increase in sales was not due to the ad campaign.
    (2) The research hypothesis or alternative hypothesis - this will be accepted if we have convincing evidence. Example - the increase in sales was due to the ad campaign.

    We can accept the null hypothesis, or reject the null hypothesis and accept the research hypothesis. (We do not reject the research hypothesis - we only have the chance to accept it.) The goal of an experiment is to reject the null hypothesis.

    Hypothesis Example

    We've made a change to our classes at the school and we hope this leads to an increase in test scores.

    Null hypothesis - the change made no difference, the mean equals 80.

    Research hypothesis - the change made a difference, the mean is not equal to 80.

    We conduct a new survey of a sample of students and we calculate a new average of 85. Do we accept or reject the null hypothesis?

    Let's assume that our standard deviation is still 7. Since the new estimate lies within the 95% confidence interval we accept the null hypothesis. There is not a statistically meaningful difference.

    Suppose instead our new average was 97. Do we accept or reject the null hypothesis? Since the new estimate lies outside the 95% confidence interval, we reject the null hypothesis and accept the research hypothesis.

    The reality: a lot of people would assume that the new average of 85 was meaningful, but it could be entirely due to chance, depending on which students we happened to sample. The statistical tests help us to understand the impact of chance on our results and to make statistically valid conclusions.

    The p-value
    In hypothesis testing, the confidence level (e.g. 95%) is usually referred to by its complement, 100% - 95% = 5%, known as the significance level. We compare the p-value calculated from our data against that level and end up saying "we accept the null hypothesis at the p > 0.05 level" (or p > 5%) or similar. Accepting the null hypothesis is the same as saying that the results were "not statistically significant".

    The p-values are, in turn, translated into English statements as follows:

    p > 5%      Not significant
    p < 5%      Significant at the 5% level
    p < 1%      Highly significant at the 1% level
    p < 0.1%    Very highly significant at the 0.1% level


    IMPORTANT POINT
    The statistical words "significant", "highly significant" and "very highly significant" do NOT mean that the conclusion is important, profound, will change the world, or whatever hype-sounding adjective you want to use. The words tell us only how confident we can be in rejecting the null hypothesis. That is all. This is a critically important point - in news stories you will frequently see a statistically "significant" finding translated into "important breakthrough" or "significant finding" or similar, when it is neither.
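    The p-value wording table translates directly into code. A minimal sketch; `describe_p` is a hypothetical helper, and since the table leaves the exact boundary values open, this version treats a p of exactly 0.05, 0.01 or 0.001 as falling in the less significant category.

```python
def describe_p(p):
    """Translate a p-value (a fraction, e.g. 0.03 for 3%) into the table's wording."""
    if p < 0.001:
        return "very highly significant at the 0.1% level"
    if p < 0.01:
        return "highly significant at the 1% level"
    if p < 0.05:
        return "significant at the 5% level"
    return "not significant"

for p in (0.20, 0.03, 0.004, 0.0002):
    print(f"p = {p}: {describe_p(p)}")
```

    Note that nothing in this function says anything about whether a finding matters; it only restates how small the p-value is.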

    WHAT TO LOOK FOR
    Occasionally, you may spot the use of a 90% - or worse, 80% - confidence level in order to claim significance for a research finding. For a scientific finding, you should normally expect at least 95% confidence - and in health care, at least 95% and often 99%.

    When you read a report that says "the researchers are 80% confident" in the result, you should understand that, by normal science standards, no one else has any confidence in the result, because there is a 1 in 5 chance the researchers are wrong. Watch those confidence levels!

    Other Statistical Tests
    Another common test is to compare two populations to each other. For example, suppose we have two factories that produce robots. Factory A has a defect rate of 2.3% and Factory B has a defect rate of 2.8%. Are these differences statistically significant?
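    One standard way to answer this is a two-proportion z-test. The text gives only the defect rates, so this sketch assumes, purely for illustration, that 2,000 robots were inspected at each factory (46 and 56 defects). `two_proportion_test` is a hypothetical helper built on `statistics.NormalDist` (Python 3.8+).

```python
from math import sqrt
from statistics import NormalDist

def two_proportion_test(defects_a, n_a, defects_b, n_b):
    """Two-sided two-proportion z-test; returns (z, p_value)."""
    p_a, p_b = defects_a / n_a, defects_b / n_b
    # Pooled defect rate: the best estimate if the null hypothesis
    # (both factories share one defect rate) is true.
    pooled = (defects_a + defects_b) / (n_a + n_b)
    se = sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value

# Hypothetical counts (an assumption; the text gives only the rates):
# 46 / 2000 = 2.3% for Factory A, 56 / 2000 = 2.8% for Factory B.
z, p = two_proportion_test(46, 2000, 56, 2000)
print(f"z = {z:.2f}, p = {p:.2f}")
```

    With these assumed counts the p-value is about 0.32, so the difference would not be statistically significant at the 5% level. Larger samples shrink the standard error, so the same gap could become significant with enough robots inspected.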

    Other statistical techniques are used to make sense of sequences of data. A very common type of simple data analysis is regression. Suppose we have sales over ten years - what sales might we expect next year and the year after? We can draw a trend line through the years on a graph and see where the trend seems to be going. Using statistical methods, we can develop an equation that enables us to predict a future value, along with a confidence interval that gives us an idea of how reliable that prediction is. How closely the line fits the data is often measured with a statistic called the "R-squared" value. (You may see references to it from time to time.) An R^2 value close to zero means our regression - and hence prediction - is worthless. An R^2 value close to 1 is really good. There are also other measures sometimes used.

    Many regression problems are simple, such as the sales problem.

    Some though are more complex. Suppose back at the school, we decide to make many changes to our school to improve test scores. We change the number of class periods, we offer a new 7 am class option, we hire three extra tutors, and we create new entrance requirements for students wishing to take pre-calculus classes, plus we change text books in the English class. Now, we look at our test results over a few years. Do we see a correlation between our changes and the test results?

    We may wish to write a formula to help us predict test scores - something like:

    Test scores = b0 + b1*NumPeriods + b2*7AMOption + b3*MoreTutors + b4*Requirements + b5*Textbooks

    (where b0 through b5 are weights estimated from the data)

    This becomes a multiple regression problem rather than a simple regression problem. Statistical methods help us to identify how much weight to give to individual components of this equation (maybe the textbook is not relevant to the test scores?)

    The above regression methods are designed to work with data that shows linear relationships (e.g. a trend line that sort of looks like it follows a line on a graph). When the data is non-linear, then other methods must be used. Non-linear data would be, for example, a trend that goes up and down over time.

    When reviewing studies showing a trend, keep a close eye on the use of linear regression. A convenient sleight of hand is to select the beginning and ending points on a non-linear dataset - cherry-picking a stretch of data that just happens to show an up or down trend - and then perform linear regression on it. If the rest of the data before and after is not disclosed, someone is probably trying to be deceptive. This kind of deception is quite common, too. (More on this in a moment.)

    In another example, suppose we would like to know if there is a relationship between the Federal budget and the Dow 30 Industrials stock market tally. We are asking, "Is there a correlation between the budget and the stock market?"

    To answer that question we use statistical methods to evaluate the correlation between the two data series, arriving at a correlation coefficient ranging from -1 to 1. A value near zero means little or no correlation. A value closer to 1 means a good correlation between the two series. A value closer to -1 means they are inversely correlated (e.g. as the budget goes up, the Dow goes down).

    When you read about correlations you should wonder about the value of the correlation coefficient. A low value is essentially worthless, but activists may tell you the data is correlated (which it is, sort of, but not very well.)

    Some correlations are nonsensical. I once read in my economics text that someone had correlated the price of butter in India with the Dow 30 over a significant time period - a totally meaningless correlation. Unfortunately, a great many papers identify statistically significant - but meaningless - correlations. Watch out for these. Gary Taubes's new book "Good Calories, Bad Calories" describes many meaningless correlations being used to establish government nutrition policies (e.g. the Food Pyramid) that may now be incorrect due to their lack of meaningful evidence.

    Correlations can be drawn between many items but it does not mean there is a connection between them unless you can identify a mechanism that links them together. For example, there is a nearby semiconductor manufacturing plant and slightly more people in the neighborhood have toenail fungus than in other neighborhoods. Statistics can identify the correlation - but that does not mean the semiconductor plant is causing toenail fungus unless you can identify a possible direct source of causation. It may just be an interesting correlation between data series and no more.

    The increase in bloggers correlates very well with estimated annual global temperature, for example, but blogging is probably not a real cause, other than its frequent connection to "hot air" (that is a joke!).

    Some data does not follow nice trend lines. For example, sales of downhill ski equipment are much stronger during the winter, so sales records of ski equipment go up and down every year. Data like this, which basically repeats on a periodic (seasonal) basis, is a seasonal time series. Special techniques are used for analyzing time-series data.

    There are many types of data that do follow trend lines - as in linear trend lines. Some data, if drawn on a graph, is wavy in nature, or rises or falls steeply. This requires non-linear trend analysis methods.

    Sometimes it is not clear whether there is a wave - or even multiple wave-like patterns - in the data. For example, those who try to look at trends in Atlantic hurricanes have about 130 years of data to look at (good data after WW II and poor data prior). Hurricanes are thought to have various cycles ranging over long periods of time. If we look at only a sample of a long period, then depending on where we select the beginning and end points of our data, we may see an increase or a decrease. For this reason, observing trends in many types of data requires careful consideration and recognition of possible sources of misinterpretation.

    Be very careful when you see a linear trend or linear regression used to make future forecasts when the data being reviewed may not be linear in nature.

    Consider another problem. We have 4 factories producing cereals. We sample 100 boxes of cereal from each factory for quality and taste. We'd like to check for meaningful differences. To do this, we would use a technique known as "Analysis of Variance".
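    The core of a one-way analysis of variance is the F statistic: the variation between the factory averages divided by the variation within each factory's boxes. This sketch computes it by hand with made-up taste scores for five boxes per factory (not the 100 in the text); `one_way_anova_f` is a hypothetical helper. Turning F into a p-value needs the F distribution, which the standard library lacks; SciPy's `scipy.stats.f_oneway` does both steps.

```python
import statistics

def one_way_anova_f(groups):
    """Return the F statistic for a one-way analysis of variance.

    F = (between-group mean square) / (within-group mean square). A large F
    suggests the group averages differ by more than the scatter inside each
    group would explain.
    """
    all_values = [v for g in groups for v in g]
    grand_mean = statistics.mean(all_values)
    k, n = len(groups), len(all_values)

    ss_between = sum(len(g) * (statistics.mean(g) - grand_mean) ** 2 for g in groups)
    ss_within = sum((v - statistics.mean(g)) ** 2 for g in groups for v in g)

    ms_between = ss_between / (k - 1)  # k - 1 degrees of freedom
    ms_within = ss_within / (n - k)    # n - k degrees of freedom
    return ms_between / ms_within

# Hypothetical taste scores for five boxes from each of 4 factories (invented).
factories = [
    [7.1, 6.8, 7.4, 7.0, 6.9],
    [7.2, 7.0, 6.7, 7.3, 7.1],
    [6.2, 6.5, 6.0, 6.4, 6.3],
    [7.0, 7.2, 6.9, 7.1, 6.8],
]
print(f"F = {one_way_anova_f(factories):.1f}")
```

    Here F is about 17, driven by the third factory's lower scores; a value near 1 is what chance alone would typically produce.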

    Finally, there are a variety of advanced statistical methods, plus special methods used in specific fields. For example, in health care, a statistic called the "odds ratio" is often used. Another is the "Number Needed to Treat" or NNT. For example, for a specific drug, you may need to treat 20 people for each patient who shows a positive outcome. That would be an NNT of 20, and you should interpret that as meaning the drug shows value for about 5% of the patients who take it and no value to the others. (You might be surprised to learn that most drugs do not work for most patients who take them.)
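    Both of these health care statistics are simple ratios. A minimal sketch with made-up trial numbers (15% of untreated and 10% of treated patients have a bad outcome); the helper names are hypothetical. NNT is 1 divided by the absolute risk reduction, and the odds ratio compares the odds of the bad outcome in the two groups.

```python
def number_needed_to_treat(control_rate, treated_rate):
    """NNT = 1 / absolute risk reduction (rates = fraction with the bad outcome)."""
    absolute_risk_reduction = control_rate - treated_rate
    if absolute_risk_reduction <= 0:
        raise ValueError("no benefit in this data, so NNT is not defined")
    return 1 / absolute_risk_reduction

def odds_ratio(control_rate, treated_rate):
    """Odds of the bad outcome when treated divided by the odds when untreated."""
    treated_odds = treated_rate / (1 - treated_rate)
    control_odds = control_rate / (1 - control_rate)
    return treated_odds / control_odds

# Hypothetical trial results (invented numbers).
control, treated = 0.15, 0.10
nnt = number_needed_to_treat(control, treated)
print(f"NNT = {nnt:.0f}")
print(f"odds ratio = {odds_ratio(control, treated):.2f}")
```

    The absolute risk reduction here is 5 percentage points, so the NNT is 20, the same situation as the example in the text; the odds ratio is about 0.63.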

    Useful Resources
    I did not use this in writing the above, but it may be of interest to others who would like some of the mathematical background which I intentionally omitted.

    HyperStat Online Statistics Book (free!)

    Where does the IPCC terminology "Very highly confident" come from?

    Where does the terminology "Very highly confident" come from?

    Note: This article originally appeared on the WordPress blog that was killed by iPowerweb. It is one of a few articles that I have restored from backups, since it was a popular landing spot for Internet searches.

    On February 2, the Intergovernmental Panel on Climate Change (IPCC) issued a 21-page Summary for Policymakers (SPM) written by Fourth Assessment Report science section leads and government political or staff officials from 113 countries.

    In the SPM, they issue pronouncements with probabilities and degrees of confidence expressed as "likely", "very likely" and "virtually certain". In a footnote, they relate these adjectives to confidence levels of 67%, 90% and 99%, as detailed in the IPCC's policy document for its team members, UncertaintyGuidanceNote.pdf.

    Where does this terminology come from?
    Statistics uses similar terminology, but with different meanings, to describe results that are statistically "significant" (95%), "highly significant" (99%), and "very highly significant" (99.9%).

    Many of us were confused by the similar - but not the same - terminology used in the SPM and posted questions at RealClimate asking for an explanation of what the SPM terminology meant since the meaning is not provided by the IPCC documents. No answer was provided by RealClimate.

    Where does this terminology come from and what does it mean?
    Per a footnote in the SPM, as well as this paper by Dr. Stephen Schneider, Professor of Biology at Stanford University, most of these estimates came from subjective "expert judgment". Dr. Schneider's paper appears to have been written for the IPCC and used as the basis of the terminology used by the IPCC. While some estimates may be data-derived, the SPM does not say which are based on data and which are based on subjective analysis. (Read Dr. Schneider's paper for yourself to understand the recommended best practices.)

    To learn more about the consensus process, expert judgment, the Delphi method, and the use of qualitative and quantitative analysis, read on.

    The Delphi Method
    Based on IPCC documents and the "consensus" terminology, this comes from use of the Delphi Method, developed by the RAND Corporation for the U.S. Department of Defense, about half a century ago to make predictions about the future or other events for which there is insufficient data to make a statistical forecast.

    In the Delphi Method, a moderator or facilitator exchanges questions anonymously amongst the participants (in this context, typically a panel of "experts"), summarizes the answers, and sends the summary back to the participants. The process repeats as participants may change their perspectives on each iteration, perhaps because they learned something from the other responses. Over time, the method may lead to the anonymous members of the group finding "consensus" on some questions while not achieving consensus on other topics. The facilitator makes a judgment as to when the review process should be halted as no further progress is being made.

    The IPCC operated in a similar manner to identify "consensus" (although per Dr. Schneider's paper, see page 47, not necessarily anonymously, which could introduce group dynamics and raise questions about the independence of the views expressed). The result is a set of probabilities and confidence levels that are determined subjectively through qualitative analysis, rather than analytically (quantitative analysis). There is no problem using these methods, but per Dr. Schneider's recommendation, their use must be disclosed in the IPCC reports. These methods were not disclosed in the SPM. A qualitative analysis is likely to be viewed and interpreted differently than a quantitative analysis.

    Per Dr. Schneider's paper (see page 36 concerning the inability to produce objective data based on observations):
    It is certainly true that "science" itself strives for objective empirical information to test theory and models. But at the same time "science for policy" must be recognized as a different enterprise than "science" itself, since science for policy (e.g., Ravetz, 1986) involves being responsive to policymakers' needs for expert judgment at a particular time, given the information currently available, even if those judgments involve a considerable degree of subjectivity.

    Reporting Qualitative Analysis Estimates
    Because a panel of subjective judgment is likely to produce a range of probabilities and confidence levels, the formal report should include the full range of values and a traceable record as to why these subjective values were chosen (per Dr. Schneider's paper and also several papers on application of the Delphi Method.) The SPM, however, omits the range of values provided and the record as to how these estimates were offered; presumably this will appear in later reports. (Update: The range of values and how the estimates were created was never released by the IPCC. The IPCC also refused to release "reviewer comments" made by those who reviewed the IPCC draft reports. The review comments were eventually released to the public only after a Freedom of Information Act request was successfully made in the United States. The reviewer comments showed that relatively few participants took an active part in the review process and that some sections had significant dissent from the final report but that the dissenting views were ignored without even providing a reference to published research or other citations as to why the dissenting scientific views should be ignored.)

    The subjective estimates are then combined to calculate a "best guess" (Dr. Schneider's wording) as a mean, median, mode, etc.
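    The combining step is ordinary descriptive statistics. This sketch uses invented probability estimates from a hypothetical ten-member panel and reports the mean, median and mode, plus the full range of values that, per Dr. Schneider's paper, a formal report should include alongside any single "best guess".

```python
import statistics

# Hypothetical probability estimates (in %) from a ten-member expert panel;
# the numbers are invented for illustration.
panel = [60, 70, 70, 75, 80, 80, 80, 85, 90, 95]

best_guess = {
    "mean": statistics.mean(panel),
    "median": statistics.median(panel),
    "mode": statistics.mode(panel),
}
full_range = (min(panel), max(panel))

for name, value in best_guess.items():
    print(f"{name:>6}: {value}")
print(f"full range of panel estimates: {full_range[0]}% to {full_range[1]}%")
```

    The three "best guesses" (78.5, 80 and 80) look similar, but only the range (60% to 95%) shows how far apart the panel actually was.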

    Dr. Schneider writes:
    "It is important to note that by providing only a truncated estimate of the full range of outcomes (e.g., not specifying outliers that include "surprises", and thus making the range of outcomes described smaller), one is not conveying to potential users a representation of the full range of uncertainty associated with the estimate. This has important implications regarding the extent to which the report accurately conveys uncertainties. Some authors are likely to feel uncomfortable with the full range of uncertainty, because the likelihood of a "surprise" or events at the tails of the distribution may be extremely remote or essentially impossible to gauge experimentally, and the range implied could be extremely large."
    Related to Dr. Schneider's comment, the SPM generally avoids commenting on areas that are uncertain - meaning roughly equal chances, in the 34% to 66% range - which can bias the presentation of the results. When only likely or unlikely values are shown, the reader may not grasp the full span of possibilities, including the unknowns in the middle.

    The end result of the Delphi Method is to combine the subjective estimates from the panel of experts (hence "expert judgment"). While this is similar to a poll or survey of the opinions or beliefs of a group of people, if done correctly, the experts should have provided a traceable account justifying their expert opinions. Otherwise, it is just a poll of what people believe - it's not science at all.

    Summary
    • The probability and confidence levels in the Summary for Policymakers are not (in general) statistically derived but are from subjective analysis made by experts in the field. The probability and confidence levels are literally "gut feel" guesses.
    • The Delphi Method is a popular method of organizing a feedback process amongst participants with the goal of identifying "consensus" around some of the issues at hand.
    • The Delphi Method has achieved success in some endeavors, and has had well-known failures. For example, expert panels were used to estimate the risk of Space Shuttles disintegrating, nuclear plant safety, and nuclear waste storage safety, and their estimates proved to be disastrously wrong. Some environmental groups attacked those original estimates in part because they were based on "fallible expert judgment".
    • Where consensus is not reached, it is important for panel reports to reflect the full range of perspectives. Without th