8.2.3 Figures of Merit
There have been many attempts to obtain a “figure of merit” for climate models.
Usually such quantification is attempted only for well-observed atmospheric
variables, and the measures range from simple root mean square (r.m.s.) errors
between a model variable and an observation to more complex multivariate calculations.
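As a concrete reference point, the simplest of these measures, the r.m.s. error between a model field and an observed field, can be sketched as below. This is a minimal sketch that assumes both fields are already on a common grid and flattened into paired values. The optional area weighting (here proportional to the cosine of latitude) and the helper names `rms_error` and `cos_lat_weights` are illustrative assumptions, not a prescribed method.

```python
import math
from typing import List, Optional, Sequence


def rms_error(model: Sequence[float], obs: Sequence[float],
              weights: Optional[Sequence[float]] = None) -> float:
    """Root mean square error between paired model and observed values.

    With weights (e.g. grid-cell areas), each squared difference is weighted
    and the weighted sum is divided by the total weight.
    """
    if len(model) != len(obs) or not obs:
        raise ValueError("model and obs must be non-empty and the same length")
    if weights is None:
        weights = [1.0] * len(obs)
    if len(weights) != len(obs):
        raise ValueError("weights must have the same length as the data")
    total_weight = sum(weights)
    weighted_sq = sum(w * (m - o) ** 2 for m, o, w in zip(model, obs, weights))
    return math.sqrt(weighted_sq / total_weight)


def cos_lat_weights(latitudes_deg: Sequence[float]) -> List[float]:
    """Hypothetical helper: area weights proportional to cos(latitude)."""
    return [math.cos(math.radians(lat)) for lat in latitudes_deg]


# Usage: three grid points at different latitudes (illustrative numbers only).
model_t = [288.0, 280.5, 262.0]
obs_t = [287.0, 281.5, 265.0]
print(rms_error(model_t, obs_t, cos_lat_weights([0.0, 45.0, 75.0])))
```

A single number of this kind says nothing about whether the error comes from a bias, a wrong amplitude of variability or a misplaced pattern, which is what motivates the more structured scores discussed next.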
Among the most promising attempts at generating skill scores deemed more suitable
for climate models are: the normalised mean square error approach of Williamson
(1995), which builds in part on Murphy (1988); and the categorisation
of models in terms of a combination of the error in the time mean and the error
in temporal variability, along the lines suggested by Wigley and Santer (1990)
(see Chapter 5, Section 5.3.1.1 of the SAR for an example). Other, less widely
used, non-dimensional measures have also been devised (e.g., Watterson, 1996).
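The idea of a normalised mean square error, and of separating errors in the mean from errors in variability, can be sketched as follows. Here the mean square error is normalised by the observed variance and split using the standard decomposition associated with Murphy (1988) into a bias term, an amplitude (variance) term and a correlation (pattern) term. We do not claim this reproduces Williamson's (1995) exact formulation, and the function name `nmse_decomposition` is hypothetical. Applied to a time series at one location, the bias term and the ratio of standard deviations loosely correspond to the "error in the time mean" and "error in temporal variability"; any thresholds used to sort models into categories would be a further assumption.

```python
import math
from typing import Dict, Sequence


def _mean(x: Sequence[float]) -> float:
    return sum(x) / len(x)


def _std(x: Sequence[float]) -> float:
    # Population standard deviation (divide by N), consistent with the MSE below.
    m = _mean(x)
    return math.sqrt(sum((v - m) ** 2 for v in x) / len(x))


def nmse_decomposition(model: Sequence[float], obs: Sequence[float]) -> Dict[str, float]:
    """Mean square error normalised by the observed variance, split into terms.

    nmse = bias_term + amplitude_term + pattern_term, where r is the
    correlation between model and obs and
      bias_term      = ((mean_model - mean_obs) / std_obs) ** 2
      amplitude_term = (std_model / std_obs - r) ** 2
      pattern_term   = 1 - r ** 2
    """
    n = len(obs)
    if n == 0 or len(model) != n:
        raise ValueError("model and obs must be non-empty and the same length")
    mean_m, mean_o = _mean(model), _mean(obs)
    std_m, std_o = _std(model), _std(obs)
    if std_m == 0 or std_o == 0:
        raise ValueError("correlation is undefined for a constant series")
    cov = sum((a - mean_m) * (b - mean_o) for a, b in zip(model, obs)) / n
    r = cov / (std_m * std_o)
    mse = sum((a - b) ** 2 for a, b in zip(model, obs)) / n
    return {
        "nmse": mse / std_o ** 2,
        "bias_term": ((mean_m - mean_o) / std_o) ** 2,   # error in the mean
        "amplitude_term": (std_m / std_o - r) ** 2,      # variability error
        "pattern_term": 1.0 - r ** 2,                    # pattern/phase error
        "correlation": r,
        "std_ratio": std_m / std_o,
    }
```

For instance, a model series that is the observed series shifted by a constant has zero amplitude and pattern terms, so its whole normalised error appears in the bias term.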
Although a number of skill scoring methods have been devised and used for the
seasonal prediction problem (e.g., the linear error in probability space (LEPS)
score; Potts et al., 1996), these have not found general application to climate models.
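The core idea behind LEPS is to measure forecast error after mapping both the forecast and the verifying observation to their cumulative probabilities within the observed climatological distribution. The sketch below computes only that basic quantity, using an empirical cumulative distribution function; the full LEPS score of Potts et al. (1996) rescales it and adds further terms that are not reproduced here. The function names are hypothetical.

```python
import bisect
from typing import Sequence


def empirical_cdf(sorted_climatology: Sequence[float], x: float) -> float:
    """Fraction of (pre-sorted) climatological values less than or equal to x."""
    return bisect.bisect_right(sorted_climatology, x) / len(sorted_climatology)


def probability_space_error(forecast: float, observed: float,
                            climatology: Sequence[float]) -> float:
    """|F(forecast) - F(observed)| with F the empirical climatological CDF.

    0 means forecast and observation fall at the same climatological quantile;
    values near 1 mean they lie at opposite extremes of the climatology.
    """
    if not climatology:
        raise ValueError("climatology must be non-empty")
    ordered = sorted(climatology)
    return abs(empirical_cdf(ordered, forecast) - empirical_cdf(ordered, observed))
```

Because the error is measured in quantiles rather than physical units, a given physical error is penalised more where climatological values are densely packed (typically near the median) than in the sparsely populated tails.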
Attempts to derive measures of the goodness of fit between model results and
data containing large uncertainties have been partially successful in the oceanographic
community for a limited number of variables (Frankignoul et al., 1989; Braconnot
and Frankignoul, 1993). Fuzzy logic techniques have been trialled by the palaeoclimatology
community (Guiot et al., 1999). It is important to remember that the types of
error measurement discussed here are restricted to relatively few variables.
A fully comprehensive, multidimensional “figure of merit” for climate models
has proved elusive.
Since the SAR, Taylor (2000) has devised a very useful diagrammatic form (termed
a “Taylor diagram”; see Section 8.5.1.2 for a
description) for conveying information about the pattern similarity between
a model and observations. This same type of diagram can be used to illustrate
the relative accuracy of a number of model variables or of different observational
data sets (see Section 8.5.1). An additional advantage
of the “Taylor diagram” is that it places no restriction on
the time or space domain considered.
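The quantities a Taylor diagram summarises can be sketched as follows. Each model (or alternative data set) is placed by its standard deviation and its correlation R with the reference field; the centred r.m.s. difference E′ is then its distance from the reference point, because the three statistics satisfy E′² = σ_f² + σ_r² − 2σ_fσ_rR. The function names, the use of divide-by-N moments, and the choice to normalise by the reference standard deviation (so that several variables can share one diagram, with the reference at (1, 0)) are illustrative assumptions; plotting is omitted.

```python
import math
from typing import Dict, Sequence, Tuple


def taylor_statistics(test: Sequence[float], ref: Sequence[float]) -> Dict[str, float]:
    """Statistics summarised by a Taylor diagram for a test field vs a reference.

    Means are removed first, so the r.m.s. difference is the centred
    (pattern) difference; moments use division by N.
    """
    n = len(ref)
    if n == 0 or len(test) != n:
        raise ValueError("test and ref must be non-empty and the same length")
    mean_t, mean_r = sum(test) / n, sum(ref) / n
    dt = [t - mean_t for t in test]
    dr = [r - mean_r for r in ref]
    sd_t = math.sqrt(sum(d * d for d in dt) / n)
    sd_r = math.sqrt(sum(d * d for d in dr) / n)
    if sd_t == 0 or sd_r == 0:
        raise ValueError("both fields need non-zero variance")
    corr = sum(a * b for a, b in zip(dt, dr)) / (n * sd_t * sd_r)
    centred_rmsd = math.sqrt(sum((a - b) ** 2 for a, b in zip(dt, dr)) / n)
    return {
        "sd_test": sd_t,
        "sd_ref": sd_r,
        "correlation": corr,
        "centred_rmsd": centred_rmsd,
        # Dividing by sd_ref lets several variables share one diagram.
        "normalised_sd": sd_t / sd_r,
        "normalised_rmsd": centred_rmsd / sd_r,
    }


def diagram_position(stats: Dict[str, float]) -> Tuple[float, float]:
    """Cartesian (x, y) of the test point: radius = normalised sd, angle = acos(R)."""
    # Clamp to [-1, 1] to guard against rounding just outside acos's domain.
    angle = math.acos(max(-1.0, min(1.0, stats["correlation"])))
    radius = stats["normalised_sd"]
    return radius * math.cos(angle), radius * math.sin(angle)
```

Since the statistics are computed from whatever paired values are supplied, the same code applies to a spatial pattern, a time series, or a space-time field, which is one way of seeing why the diagram places no restriction on the domain.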
While at times we use a figure of merit to intercompare models for some selected
variables, we usually apply more subjective assessments in our overall evaluations;
we do not believe it is objectively possible to state which model is “best
overall” for climate projection, since models differ amongst themselves
(and from available observations) in many different ways. Even if a model is
assessed as performing credibly when simulating the present climate, we cannot
be sure that the response of such a model to a perturbation remains credible.
Hence we also rely on evaluating model performance in simulating individual
processes (see Chapter 7) and past climates (see Section 8.5.5).
