Friday, October 19, 2007

Where does the IPCC terminology "Very highly confident" come from?

Note: This article originally appeared on the WordPress blog that was killed by iPowerweb. It is one of a few articles that I have restored from backups, since it was a popular landing spot for Internet searches.

On February 2, the Intergovernmental Panel on Climate Change (IPCC) issued a 21-page Summary for Policymakers (SPM) written by the Fourth Assessment Report science section leads and government political or staff officials from 113 countries.

In the SPM, they issue pronouncements with probabilities and degrees of confidence expressed as "likely", "very likely" and "virtually certain". In a footnote, they relate these adjectives to confidence levels of 67%, 90% and 99%, as detailed in UncertaintyGuidanceNote.pdf, the IPCC's guidance document for team members.
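To make the footnote's mapping concrete, here is a minimal sketch in Python. It uses the thresholds as quoted in this post (67%, 90%, 99%); the function name `spm_term` and the choice to treat each threshold as "at or above" are my own assumptions, not anything the IPCC publishes as code.

```python
# Thresholds as quoted in this post; the table and function names are hypothetical.
SPM_TERMS = [
    (0.99, "virtually certain"),
    (0.90, "very likely"),
    (0.67, "likely"),
]


def spm_term(probability):
    """Return the SPM adjective for a probability, or None if it falls below "likely"."""
    if not 0.0 <= probability <= 1.0:
        raise ValueError("probability must be between 0 and 1")
    # Check the strictest threshold first so the strongest matching term wins.
    for threshold, term in SPM_TERMS:
        if probability >= threshold:
            return term
    return None


print(spm_term(0.95))  # very likely
```

Note that a value such as 0.5 returns None here: the three adjectives discussed in this post say nothing about the middle of the range.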

Where does this terminology come from?
In statistics, similar terminology is used, but with different meanings, to describe results as statistically "significant" (95%), "highly significant" (99%), and "very highly significant" (99.9%).

Many of us were confused by the similar - but not the same - terminology used in the SPM and posted questions at RealClimate asking for an explanation of what the SPM terminology meant since the meaning is not provided by the IPCC documents. No answer was provided by RealClimate.

Where does this terminology come from and what does it mean?
Per a footnote in the SPM, as well as this paper by Dr. Steven Schneider, Professor of Biology at Stanford University, most of these estimates came from subjective "expert judgment". Dr. Schneider's paper appears to have been written for the IPCC and used as the basis of the terminology used by the IPCC. While some estimates may be data-derived, the SPM does not say which are based on data and which are based on subjective analysis. (Read Dr. Schneider's paper for yourself to understand the recommended best practices.)

To learn more about the consensus process, expert judgment, the use of the Delphi method, the use of qualitative and quantitative analysis, read on.

The Delphi Method
Based on IPCC documents and the "consensus" terminology, this terminology comes from use of the Delphi Method, which the RAND Corporation developed for the U.S. Department of Defense about half a century ago to make predictions about the future or other events for which there is insufficient data to make a statistical forecast.

In the Delphi Method, a moderator or facilitator exchanges questions anonymously amongst the participants (in this context, typically a panel of "experts"), summarizes the answers, and sends the summary back to the participants. The process repeats as participants may change their perspectives on each iteration, perhaps because they learned something from the other responses. Over time, the method may lead to the anonymous members of the group finding "consensus" on some questions while not achieving consensus on other topics. The facilitator makes a judgment as to when the review process should be halted as no further progress is being made.
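The feedback loop described above can be sketched as a toy simulation. This is only an illustration under strong assumptions: real participants revise their views through judgment, not by a formula. Here I assume each participant moves a fixed fraction (`pull`) of the way toward the median that the facilitator reports, and that the facilitator halts once the spread of answers stops shrinking by more than `tolerance`. All names and parameters are hypothetical.

```python
from statistics import median


def delphi_rounds(estimates, pull=0.5, tolerance=0.01, max_rounds=10):
    """Toy Delphi loop (an assumption-laden sketch, not the real method).

    Each round the facilitator summarizes the answers as a median, and every
    participant moves a fraction `pull` of the way toward that summary.
    The loop halts when the spread (max - min) shrinks by no more than
    `tolerance` in a round, or after `max_rounds` rounds.
    Returns the list of estimates for every round, starting with the input.
    """
    current = list(estimates)
    history = [current]
    for _ in range(max_rounds):
        summary = median(current)  # facilitator's summary sent back to the panel
        revised = [e + pull * (summary - e) for e in current]
        old_spread = max(current) - min(current)
        new_spread = max(revised) - min(revised)
        history.append(revised)
        current = revised
        if old_spread - new_spread <= tolerance:
            break  # facilitator judges that no further progress is being made
    return history


rounds = delphi_rounds([0.60, 0.80, 0.90, 0.95])
for number, answers in enumerate(rounds):
    print(number, [round(a, 3) for a in answers])
```

Even in this idealized version, the "consensus" that emerges is only a narrowing of opinions around the panel's own median. Nothing in the loop checks those opinions against data.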

The IPCC operated in a similar manner to identify "consensus" (although per Dr. Schneider's paper, see page 47, not necessarily anonymously, which could result in group dynamics and questions about the independence of the views expressed). The result is a set of probabilities and confidence levels that are determined subjectively through qualitative analysis, rather than analytically (quantitative analysis). There is no problem using these methods, but they must be disclosed in the IPCC reports, per Dr. Schneider's recommendation. These methods were not disclosed in the SPM. A qualitative analysis is likely to be viewed and interpreted differently than a quantitative analysis.

Per Dr. Schneider's paper (see page 36 concerning the inability to produce objective data based on observations):
It is certainly true that "science" itself strives for objective empirical information to test theory and models. But at the same time "science for policy" must be recognized as a different enterprise than "science" itself, since science for policy (e.g., Ravetz, 1986) involves being responsive to policymakers' needs for expert judgment at a particular time, given the information currently available, even if those judgments involve a considerable degree of subjectivity.

Reporting Qualitative Analysis Estimates
Because a panel relying on subjective judgment is likely to produce a range of probabilities and confidence levels, the formal report should include the full range of values and a traceable record as to why these subjective values were chosen (per Dr. Schneider's paper and also several papers on application of the Delphi Method). The SPM, however, omits the range of values provided and the record of how these estimates were reached; presumably this will appear in later reports. (Update: The range of values and how the estimates were created were never released by the IPCC. The IPCC also refused to release "reviewer comments" made by those who reviewed the IPCC draft reports. The review comments were eventually released to the public only after a Freedom of Information Act request was successfully made in the United States. The reviewer comments showed that relatively few participants took an active part in the review process and that some sections had significant dissent from the final report, but that the dissenting views were ignored without even providing a reference to published research or other citations as to why the dissenting scientific views should be ignored.)

The subjective estimates are then combined to calculate a "best guess" (Dr. Schneider's wording) as a mean, median, mode, etc.
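As a small illustration of what combining the estimates might look like, the sketch below summarizes a made-up set of panel answers using Python's standard `statistics` module. The numbers are invented for the example and the function name `summarize_estimates` is hypothetical; the point is simply that a "best guess" can be reported alongside the full range rather than instead of it.

```python
from statistics import mean, median, multimode


def summarize_estimates(estimates):
    """Summarize a panel's subjective probabilities (illustrative only).

    Reports several candidate "best guess" values together with the
    lowest and highest answers, so the full range is not hidden.
    """
    return {
        "mean": mean(estimates),
        "median": median(estimates),
        "modes": multimode(estimates),  # may contain more than one value
        "low": min(estimates),
        "high": max(estimates),
    }


# Invented example answers from four hypothetical panel members.
summary = summarize_estimates([0.60, 0.90, 0.90, 0.95])
for name, value in summary.items():
    print(name, value)
```

In this example the median and mode are both 0.9, yet one panel member answered 0.6. Reporting only the best guess would hide that disagreement.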

Dr. Schneider writes:
"It is important to note that by providing only a truncated estimate of the full range of outcomes (e.g., not specifying outliers that include 'surprises', and thus making the range of outcomes described smaller), one is not conveying to potential users a representation of the full range of uncertainty associated with the estimate. This has important implications regarding the extent to which the report accurately conveys uncertainties. Some authors are likely to feel uncomfortable with the full range of uncertainty, because the likelihood of a 'surprise' or events at the tails of the distribution may be extremely remote or essentially impossible to gauge experimentally, and the range implied could be extremely large."

Related to Dr. Schneider's comment, the SPM generally avoids commenting on areas that are uncertain (meaning roughly equal chances, in the 34% to 66% range), which can bias the presentation of the results. When only likely or unlikely values are shown, the reader may not grasp the full span of possibilities, including the unknowns in the middle.

The end result of the Delphi Method is to combine the subjective estimates from the panel of experts (hence "expert judgment"). While this is similar to a poll or survey of the opinions or beliefs of a group of people, if done correctly, the experts should have provided a traceable account that justifies their expert opinion. Otherwise it is just a poll of what people believe; it's not science at all.

Summary
  • The probability and confidence levels in the Summary for Policymakers are not (in general) statistically derived but are from subjective analysis made by experts in the field. The probability and confidence levels are literally "gut feel" guesses.
  • The Delphi Method is a popular method of organizing a feedback process amongst participants with the goal of identifying "consensus" around some of the issues at hand.
  • The Delphi Method has achieved success in some endeavors but has also had well-known failures. For example, expert panels were used to estimate the risk of Space Shuttles disintegrating, nuclear plant safety, and nuclear waste storage safety, and their estimates proved to be disastrously wrong. Some environmental groups attacked those original estimates, in part because they were based on "fallible expert judgment".
  • Where consensus is not reached, it is important for panel reports to reflect the full range of perspectives. Without the full range, an incomplete view of the consensus or non-consensus is presented to the world. (Of interest, just five IPCC reviewers made comments on all chapters of the report and just 32 made comments on three or more chapters.)
  • None of these comments question the rise in CO2, changes in earth's temperature, effects of CO2 on climate, or northern hemisphere warming trends. This was written in an attempt to understand the unusual terminology used in the SPM and to share the apparent explanation for that terminology. It's based on expert judgment, created from a Delphi Method-like panel. Hopefully later IPCC reports will provide clarity on how the estimates were reached and the justifications for the qualitative selection of probabilities and confidence levels. The important point is that the confidence levels are personal judgments (a.k.a. gut level impressions) and not scientifically derived.
Update: A day later, Dr. Pielke posted related comments that echo Dr. Schneider's comments about the importance of the full range of perspectives appearing in a report based on expert judgment panels.