
WESTERN REGION TECHNICAL ATTACHMENT
NO. 02-01
JANUARY 8, 2002
A PROPOSED GRIDDED
QPF VERIFICATION SCHEME
FOR THE GFE IN THE WESTERN REGION
Linda Cheng, Weather Forecast Office, Salt Lake City, UT
Introduction
The proposed implementation of the Interactive Forecast Preparation System (IFPS)
at National Weather Service forecast offices (NWSFOs) nationwide will allow
forecasters to issue gridded forecast products at relatively high resolutions.
The greater spatial detail provided by these gridded forecasts requires that
verification be performed over the gridded data set. Current verification procedures
typically focus on selected surface stations, but this type of verification
is incapable of providing a detailed account of the spatial accuracy of a forecast.
A method to verify quantitative precipitation forecasts (QPFs) derived from
the Graphical Forecast Editor (GFE) component of the IFPS is described in this
Technical Attachment.
Methodology
One of the purposes of verification is to provide forecasters with feedback
regarding the skill and accuracy of their forecasts so that improvements can
be made in the future. A challenge of forecasting precipitation, especially
in Western Region, is that the amount and distribution of precipitation depends
heavily on orographic influences. Even within a forecast office's county warning
area (CWA), significant local variations in precipitation totals are often observed.
Because of this variability, verification statistics derived over a large area
are not very meaningful to forecasters. In order to provide detailed and meaningful
verification for regions of complex terrain, statistics must be calculated for
different regions with similar precipitation climatologies.
The proposed verification scheme allows each office to divide its CWA into
a maximum of six climatologically distinct regions. Since precipitation
is related to elevation, each office can define three elevation ranges which
cover, for example, the areas containing the valleys, the mountain slopes or
plateaus, and the mountain crests. Furthermore, each CWA can be divided into
two planar regions, e.g., north and south, east and west, etc.
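The division into six regions can be sketched in a few lines of Python. This is only an illustration, not the PERL code used by the scheme: the function names are hypothetical, the elevation bands are those used in the Salt Lake City test case, and the treatment of band boundaries (lower bound inclusive, upper bound exclusive) is an assumption.

```python
# Elevation bands in feet, taken from the Salt Lake City test case.
# Assumption: a point belongs to a band when low <= elevation < high.
ELEVATION_BANDS_FT = [(0, 5000), (5000, 7000), (7000, 15000)]


def region_index(elevation_ft, in_first_area):
    """Return a region number 0-5, or None if no elevation band matches.

    Regions 0-2 are the three bands in the first planar area (for example,
    north); regions 3-5 are the same bands in the second planar area.
    """
    for band, (low, high) in enumerate(ELEVATION_BANDS_FT):
        if low <= elevation_ft < high:
            return band if in_first_area else band + 3
    return None


def split_by_region(elevations_ft, first_area_mask):
    """Map each region number to the list of grid-point indices it contains."""
    regions = {}
    for i, (elev, in_first) in enumerate(zip(elevations_ft, first_area_mask)):
        key = region_index(elev, in_first)
        if key is not None:
            regions.setdefault(key, []).append(i)
    return regions
```

Each statistic would then be computed separately over the grid points listed for each region.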
Verification statistics generated for these regions must be easily interpreted
to be useful in the operational sense. It is important for forecasters to know
whether their QPFs were over- or underestimates and whether the precipitation
was forecast over the correct location. Consideration must also be given to
forecasts that have the correct intensity but incorrect location and vice versa.
This must be done using a suite of skill scores, since no one score can give
an accurate assessment of the entire situation. The statistics generated in
the proposed verification scheme are described below.
Mean and Max
Mean and maximum values determined from both the forecast and observed precipitation
are useful for determining errors in intensity. These values are independent
of the locations of the forecast and observed precipitation relative to each
other, so forecasts are not punished for slight errors in location within
each verification zone. Larger displacements, however, would be punished by
this measurement, since verification is performed by region. For example, if
the heaviest precipitation was forecast for the mountain crest but observed
on the slopes, then the forecast would be punished more severely.
One important point to note is the way the mean is determined in this scheme.
A mean is typically calculated using all of the available points. However, because
precipitation is not spatially continuous, many grid points may have values
of zero. Therefore, the size of the precipitation area influences the mean.
For example, a small area of heavy precipitation might have the same mean as
a large area of light precipitation. Thus, a mean averaged over an entire region
is not useful. However, a mean calculated using only those grid points with
nonzero values of precipitation gives a more useful measure of the average precipitation
that was forecast or observed.
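A minimal Python sketch of this calculation follows. The function names are hypothetical, and the precipitation values are invented solely to show how two very different fields can share the same all-points mean.

```python
def nonzero_mean(values):
    """Mean over grid points with precipitation greater than zero."""
    wet = [v for v in values if v > 0]
    return sum(wet) / len(wet) if wet else 0.0


def grid_max(values):
    """Largest value in the region, or 0.0 for an empty region."""
    return max(values, default=0.0)


# Invented values (inches) for one four-point region.
small_heavy = [1.0, 0.0, 0.0, 0.0]      # one point of heavy precipitation
large_light = [0.25, 0.25, 0.25, 0.25]  # light precipitation everywhere

# Averaged over all points, the two fields are indistinguishable.
all_points_heavy = sum(small_heavy) / len(small_heavy)
all_points_light = sum(large_light) / len(large_light)

# Averaged over nonzero points only, the difference in intensity appears.
wet_mean_heavy = nonzero_mean(small_heavy)
wet_mean_light = nonzero_mean(large_light)
```

Here both all-points means are 0.25 in., while the nonzero means are 1.0 in. and 0.25 in.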
Mean Error, Mean Absolute Error, and Root-Mean-Squared Error
The mean error (ME), mean absolute error (MAE), and root-mean-squared error
(RMSE) are standard statistical measurements widely used for verification (Wilks
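1995). These scores are defined as:

$$\mathrm{ME} = \frac{1}{N}\sum_{i=1}^{N}\left(f_i - o_i\right)$$

$$\mathrm{MAE} = \frac{1}{N}\sum_{i=1}^{N}\left|f_i - o_i\right|$$

$$\mathrm{RMSE} = \sqrt{\frac{1}{N}\sum_{i=1}^{N}\left(f_i - o_i\right)^2}$$

where $f_i$ and $o_i$ are the forecast and observed values at each grid point, respectively, and $N$ is the total number of grid points.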
All three of these scores are different ways of measuring the error between
the forecast and observed precipitation totals at each grid point. Because
these scores are averaged over all grid points, they are location-dependent,
meaning they severely punish precipitation forecast in the wrong location.
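The three scores can be sketched in Python as shown below, assuming the standard definitions in which the error at each grid point is the forecast minus the observed value. The function name and the sample values are hypothetical.

```python
import math


def error_scores(forecast, observed):
    """Return (ME, MAE, RMSE) for two equal-length lists of grid values."""
    if len(forecast) != len(observed) or not forecast:
        raise ValueError("grids must be non-empty and the same length")
    diffs = [f - o for f, o in zip(forecast, observed)]
    n = len(diffs)
    me = sum(diffs) / n                              # sign shows over/underforecast
    mae = sum(abs(d) for d in diffs) / n             # average error magnitude
    rmse = math.sqrt(sum(d * d for d in diffs) / n)  # weights large errors more
    return me, mae, rmse


# Invented case: the correct amount forecast one grid point away.
forecast = [1.0, 0.0, 0.0, 0.0]
observed = [0.0, 1.0, 0.0, 0.0]
me, mae, rmse = error_scores(forecast, observed)
```

In this invented case the forecast and observed fields have identical mean and maximum values and the ME is zero, yet the MAE and RMSE are nonzero because the precipitation was placed at the wrong grid point.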
Bias and Threat Score
Bias and threat scores are important for determining the spatial accuracy of
a forecast (Wilks 1995). These statistics are generated for different thresholds
of precipitation. For example, statistics calculated for a threshold of 0.01
in. show the spatial accuracy of the entire area of measurable precipitation,
and the higher thresholds show how well the locations of precipitation maxima
were forecast.
The bias measures how well the size of the area of forecast precipitation matches
that of the observed. Two types of bias scores are calculated by the verification
scheme. One takes the difference between the number of grid points forecast
to have precipitation and the number of grid points having observed precipitation,
while the other takes the ratio of forecast grid points to observed.
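Both bias scores can be sketched as follows. The function name is hypothetical, the sign convention for the difference (forecast count minus observed count) is an assumption, and a grid point is counted when its value meets or exceeds the threshold.

```python
def areal_bias(forecast, observed, threshold):
    """Return (difference, ratio) of forecast to observed point counts.

    The ratio is None when no observed point reaches the threshold,
    because the ratio is undefined in that case.
    """
    f_count = sum(1 for f in forecast if f >= threshold)
    o_count = sum(1 for o in observed if o >= threshold)
    difference = f_count - o_count
    ratio = f_count / o_count if o_count else None
    return difference, ratio


# Invented values (inches) for a four-point region.
forecast = [0.02, 0.30, 0.00, 0.15]
observed = [0.00, 0.50, 0.00, 0.00]

bias_measurable = areal_bias(forecast, observed, 0.01)
bias_quarter_inch = areal_bias(forecast, observed, 0.25)
```

With these invented values, the area of measurable precipitation is overforecast (three forecast points against one observed), while the area at or above 0.25 in. matches.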
The threat score measures the overlap of forecast and observed areas of precipitation
and is given by:

$$\mathrm{TS} = \frac{H}{F + O - H}$$

where H is the number of "hits," or the number of grid points where
both observed and forecast precipitation met or exceeded the threshold amount,
F is the number of grid points that were forecast to meet or exceed the threshold
amount, and O is the number of grid points that had observed totals meeting
or exceeding the threshold amount. Threat scores range between 0 and 1, with
0 meaning there were no "hits," and 1 meaning that all of the forecast
and observed points were "hits."
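A minimal Python sketch of the threat score follows, assuming the standard form in which the hits are divided by the union of the forecast and observed areas, H / (F + O - H). The function name and the sample values are hypothetical.

```python
def threat_score(forecast, observed, threshold):
    """Return H / (F + O - H), or None if no point reaches the threshold."""
    hits = f_count = o_count = 0
    for f, o in zip(forecast, observed):
        f_yes = f >= threshold
        o_yes = o >= threshold
        f_count += f_yes           # F: forecast points at or above threshold
        o_count += o_yes           # O: observed points at or above threshold
        hits += f_yes and o_yes    # H: points where both conditions hold
    union = f_count + o_count - hits
    return hits / union if union else None


# Invented values (inches): the forecast area is shifted by one grid point.
forecast = [0.2, 0.2, 0.0, 0.0]
observed = [0.0, 0.2, 0.2, 0.0]
ts = threat_score(forecast, observed, 0.1)
```

In this invented case H = 1, F = 2, and O = 2, so the score is 1/3. Returning None when neither field reaches the threshold is a design choice of this sketch, since the score is undefined there.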
GFE Procedures
Both the forecast and observed precipitation can be generated and output to netCDF files using the GFE. Gridded precipitation analyses from the National Centers for Environmental Prediction (NCEP) are currently available for viewing on the GFE from the D2D directories. The variable, tp (total precipitation), has units of kg m^-2. The data can easily be converted to inches and saved into an IFPS database through Smart Initialization (Forecast Systems Laboratory 2001). The forecast and/or observed precipitation for each GFE grid can be summed into the correct time length for verification. For example, if QPF is issued as 6-h grids and the observed precipitation is only available as 24-h totals, the four 6-h QPF grids can be summed up into a 24-h grid using a simple GFE Smart Tool.
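The arithmetic behind the unit conversion and the summation can be sketched in plain Python. This is not Smart Initialization or Smart Tool code and does not use any GFE interface; the function names are hypothetical. The conversion relies on the fact that 1 kg m^-2 of liquid water corresponds to a depth of 1 mm, and 1 in. equals 25.4 mm.

```python
MM_PER_INCH = 25.4


def kg_m2_to_inches(tp_values):
    """Convert total precipitation from kg m^-2 (equal to mm) to inches."""
    return [v / MM_PER_INCH for v in tp_values]


def sum_grids(grids):
    """Sum equal-length grids point by point, e.g. four 6-h grids into 24 h."""
    if not grids or any(len(g) != len(grids[0]) for g in grids):
        raise ValueError("grids must be non-empty and the same length")
    return [sum(point) for point in zip(*grids)]


# Invented 6-h QPF grids (inches) for a two-point region.
six_hour_qpf = [[0.1, 0.0], [0.2, 0.0], [0.0, 0.3], [0.1, 0.1]]
qpf_24h = sum_grids(six_hour_qpf)

# Invented analysis values in kg m^-2.
observed_inches = kg_m2_to_inches([25.4, 0.0, 12.7])
```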
Each office can define its own planar verification areas by using GFE "edit
areas" (Forecast Systems Laboratory 2001). For example, an office might
choose to aggregate all of the northern forecast zones for one verification
region and all of the southern zones for another. These regions are saved as
named edit areas which can be called up each time the verification is performed.
Forecast and observed precipitation are then output to separate netCDF files
using the ifpnetCDF program (Forecast Systems Laboratory 2001) with the -m switch,
or mask, on the command line set to one of the verification areas. The -g switch
must also be set so that topographical information is included in the file.
A PERL script is then run which reads the files and calculates the statistics
for each elevation zone in each planar area.
Example Output
An example of the statistics generated by the verification code is provided below. As a test case, the verification was performed over the NWSFO Salt Lake City CWA for the unmodified MesoEta forecast imported into the GFE at 5-km resolution. The forecast was valid for the 24-h period ending at 1200 UTC 23 November 2001. The observed precipitation data were obtained from the 24-h gauge-only Stage IV precipitation analysis produced by the River Forecast Centers (National Centers for Environmental Prediction 2001). Figures 1 and 2 show the forecast and observed precipitation for this period over the western United States. The Salt Lake CWA was divided into the northern and southern areas shown in Fig. 3. Elevation ranges used in the verification are 0-5000 ft, 5000-7000 ft, and 7000-15000 ft.
The statistics for the northern part of the Salt Lake CWA are shown in Tables
1, 2, and 3, and those for the southern part of the CWA are shown in Tables
4, 5, and 6.
In addition to the scores mentioned previously, the number of hits, false alarms, and misses, and the total number of observed and forecast grid points exceeding each threshold, are also included in the tables. These additional numbers are helpful in showing the size of the area of forecast or observed precipitation.
From the statistics generated for the test case, one can deduce from the mean
and max values that the intensity of precipitation was underforecast at all
six verification regions. However, the extent of the underforecasting appears
to be greater in the north than the south. The values for the ME show that in
the north, precipitation was underforecast at each grid point on average, and
the values of MAE and RMSE show that the extent of the errors increases with
elevation. In the south, however, ME values show that each grid point was, on
average, overforecast, even though the mean and max values show that the intensity
of the total area of precipitation was underforecast. This apparent contradiction
could be due to large errors in the placement or size of the precipitation areas.
As in the north, the extent of errors at each grid point also increases with
elevation.
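The apparent contradiction can be reproduced with a small invented example, which is not taken from the test case: a forecast of light precipitation over a large area verified against heavy precipitation over a small area. The function names are hypothetical, and the ME is assumed to be the average of forecast minus observed.

```python
def mean_error(forecast, observed):
    """Average of forecast minus observed over all grid points."""
    return sum(f - o for f, o in zip(forecast, observed)) / len(forecast)


def nonzero_mean(values):
    """Mean over grid points with precipitation greater than zero."""
    wet = [v for v in values if v > 0]
    return sum(wet) / len(wet) if wet else 0.0


# Invented ten-point region (inches).
forecast = [0.2] * 10               # light precipitation everywhere
observed = [0.5, 0.5] + [0.0] * 8   # heavy precipitation at two points

me = mean_error(forecast, observed)        # positive: overforecast per point
forecast_mean = nonzero_mean(forecast)     # about 0.2 in.
observed_mean = nonzero_mean(observed)     # 0.5 in.
forecast_max, observed_max = max(forecast), max(observed)
```

The forecast mean and max (about 0.2 in.) are both lower than the observed values (0.5 in.), which indicates that intensity was underforecast, yet the ME is positive (about 0.1 in.) because the forecast area was five times too large.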
The areal bias statistics show a general decrease with increasing threshold
values at all elevation ranges in the north. This indicates that the total size
of the precipitation areas was overforecast, but the areal coverage of the
higher values of precipitation was underforecast. In the south, however, the
largest bias ratio was at the 0.10-in. threshold.
Threat scores were generally high in the north for all thresholds except values
above 1.00 in. Threat scores were lower in the south. In the north, the best
forecasts were for the mid-elevation range, whereas in the south, the forecast
was better at the highest-elevation zones.
Discussion
A feasible way of verifying precipitation
for gridded GFE forecasts was presented above. However, the success of the verification
scheme requires a better gridded precipitation analysis than those currently
available. Furthermore, resolution is important in any gridded verification
scheme. Western Region forecast offices will likely begin running the GFE operationally
at approximately 5-km resolution. Obviously, higher horizontal resolution would
allow for better representation of smaller-scale precipitation features such
as convection in the verification scheme. Future increases in computing power
at the local offices should allow for the needed improvements to grid resolution.
The GFE is still undergoing development, so slight changes in the verification
process may be necessary in the future. In any case, gridded verification by
climatological regimes should prove useful for forecasters when the GFE becomes
operational.
Acknowledgments
Thanks to Andy Edman, Steve Vasiloff, and Jason Burks for their comments, and Kirby Cook for his review of this TA and help with the PERL code.
References
Forecast Systems Laboratory, Enhanced
Forecaster Tools Branch, cited 2001: GFESuite Information. [Available on-line
from
http://www-md.fsl.noaa.gov/eft/rpp/doc/onlinehelp_RPP14/GFESuite.html].
National Centers for Environmental
Prediction, Environmental Modeling Center, Mesoscale Modeling Branch, cited
2001: National Stage II Analyses ("Stage IV"). [Available on-line
from http://www.emc.ncep.noaa.gov/mmb/stage2/].
Wilks, D. S., 1995: Statistical Methods in the Atmospheric Sciences. Academic
Press, 467 pp.