As part of my running program, I’ve been diving into and attempting to decipher the bewildering array of metrics that are tossed around. Terms such as VDOT, CTL, ATL, TSS, Suffer Score, and the like have been preoccupying much of my interest over the past several days. Many of these metrics attempt to compress a number of different measures of performance into a single number which an athlete can use to evaluate themselves. While on my morning jog, I was ruminating on the similarities between these metrics and cyber security metrics.
My natural inclination is to look at all single-number metrics with a healthy degree of skepticism. Trying to distill the range of possible outcomes in a complex and chaotic environment such as a typical organization’s technology stack into a single number is fraught with peril (cue Castle Anthrax references). Yet such reductions are persistent, including efforts such as the Index of Cyber Security, the Bitsight Security Ratings, and countless others. There’s something very appealing about boiling the ocean of distributions, scenarios, and uncertainties down to a single number that answers… well, what exactly?
What was once a knee-jerk reaction to such efforts has mellowed over the years. Why, now I even tend to look at them with a certain affection, much like precocious, albeit occasionally unintentionally destructive, puppies. Much like running metrics, using a number to benchmark relative performance against oneself (e.g. last season I was at a risk score of 55 and this season I’m at a 65, therefore I’m getting better) has a utility that I try not to discount in a rush to ranges, confidence intervals, and multiple values. At a certain level of decision making, a single number may be the right amount of additional information to help move the conversation forward.
There are a few aspects that I look for to evaluate a single metric index as helpful vs. harmful, including:
- Is the methodology open? If a measured party is so inclined, being able to reconstruct the metric from the inputs and drill down deeper into the components helps keep the option for a deeper conversation open.
- Is the metric used only to benchmark a single entity over time? If attempting to measure different entities against one another, is the space constrained in a reasonable way so that comparisons are defensible?
- Have cut offs for good/bad values been made? Comparisons against multiple parties at a given point in time or for a single entity over time are much easier to understand than arbitrary cut offs (90+ is good while anything below 90 is bad).
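As a rough illustration of the first two points, here is a minimal Python sketch of an index whose methodology is open: the single number is just a weighted average, and the per-component breakdown is kept alongside it so anyone can drill down. The component names, the 0-100 scale, the weights, and the helper `composite_index` are all assumptions made up for this example (chosen so the totals match the 55 and 65 mentioned earlier); they do not describe how any real rating product works.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Component:
    name: str      # hypothetical component label
    score: float   # assumed 0-100 scale, higher is better
    weight: float  # assumed relative importance


def composite_index(components):
    """Return (index, contributions) for a weighted-average index.

    The contributions dict is the "open methodology" part: each entry is
    the share of the final number that one component is responsible for.
    """
    total_weight = sum(c.weight for c in components)
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive number")
    contributions = {
        c.name: c.score * c.weight / total_weight for c in components
    }
    return sum(contributions.values()), contributions


# Illustrative, made-up inputs for one organization across two seasons.
last_season = [
    Component("patching", 50, 2),
    Component("access_control", 60, 1),
    Component("backups", 60, 1),
]
this_season = [
    Component("patching", 70, 2),
    Component("access_control", 60, 1),
    Component("backups", 60, 1),
]

old_index, old_parts = composite_index(last_season)
new_index, new_parts = composite_index(this_season)

# Benchmark the entity against itself and show which component moved.
changes = {name: new_parts[name] - old_parts[name] for name in new_parts}
print(f"index: {old_index:.0f} -> {new_index:.0f}")
print("change by component:", changes)
```

Note that the sketch deliberately has no good/bad threshold: the only comparison it makes is the same entity against its own earlier value, and the breakdown shows that the whole improvement came from one component.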
If the only opportunity you have to present information to a decision maker is in a single data point, then an index may be a useful tool. I still try to avoid them whenever practical, often presenting aggregate measures simultaneously with supplemental information via hyperlinks or in a less attention-grabbing portion of a page (taking a cue from Ben Shneiderman’s famous “details on demand” maxim). Much like Christmas candy, a little indexing can be enjoyable, but a diet of only gingerbread indices does not a program make! Now you must excuse me, I have to go work off a few more holiday calories…