Baseball team and player performance examined realistically and accurately.
"But the main thing is, does it hold good measure?"
(To understand this discussion, you need to already be familiar with the TOP—what it signifies and whence it comes. If you aren't, please first read the "White Paper" here on baseball analysis theory.)
There are two "quality of pitching measures": the TPP ("Total Pitching Productivity") and the QofP ("Quality of Pitching"). The TPP is, in effect, the TOP ("Total Offensive Productivity") of the composite of all the batters that a given pitcher has faced. That is only an approximate statement, meant to give you an idea of the concept, and we'll spell out the differences in a moment; but on that simplified basis, the TPP as a number signifies the number of runs that would be given up in a normal, full-length season by a pitching staff each member of which was pitching exactly like the man in question. And, likewise over-simplified, the QofP is the TPP translated into an ERA-like figure, based again on average defense and the normal percentage of total runs that are "earned" (which is a relatively stable percentage, about 90% in round numbers).
Now let's look at the small differences. When we calculate real team runs scored and allowed, we use the full TOP formula elaborated elsewhere. But when we evaluate individuals, we artificially set some of their real-world stats to zero, because those stats are things that are managerially directed (or allowed) and that often actually lower scoring. For batters, we set sacrifice bunts, stolen bases, and caught stealing to zero; for pitchers (for whom we call the stat the TPP, Total Pitching Productivity), we leave those alone but set Intentional Bases On Balls to zero, for the same reason.
(Also note that most intentional walks are situational: that is, the manager orders them because of the game situation, with base-runner positions and outs suggesting to him an attempt to set up a double play, rather than because of who is at the plate. A very few batters get inappropriate numbers of IBBs owing to their power numbers, but in the large scheme those are not enough to say that subtracting IBBs for a given pitcher biases the results by not counting the dangerous batters a pitcher faces. For what it's worth, putting the double play "in order", a bizarre announcer phrase, apparently has roughly zero effect on net DPs turned.)
The QofP figure is just the TPP calculation, but with IBBs left in, so we get a number representing what the pitcher actually did, even though it is therefore not quite as good a measure of his actual performance as the TPP. But we generate the QofP so that folk used to ERAs can have something to look at that looks like numbers they're familiar with. The QofP still tells an interesting story that the ERA does not, and that crucial difference is elaborated a little farther down this page, after the Table below.
Well, that's not quite true. There's a certain amount of fudge-factor jugglery involved in converting the total-run values to earned-run-like values, and slightly different factors apply to starters and to relievers. We really very much prefer the TPP number for pitcher evaluation, but the QofP satisfies those who like ERA figures (a diminishing number these days as analysis takes hold of the public consciousness).
To help put these data, including the TPPs and QofPs you'll be seeing, in perspective, here are the numbers for the two Leagues and MLB as a whole during the sixteen-year period 1977-1992 that pre-dates the SillyBall. The "ERA" value at the rightmost end is the actual ERA value.
(The very slight differences—under 1%—between the QofPs and the ERAs come from just using an average rate for estimating earned runs from total runs yielded.)
This caveat has to do with data significance. In almost all cases, the ERA will correspond in a very broad way to the QofP; you won't find many 2.03/7.78 type pairings. But it is absolutely critical that you understand this: the ERA will not equal the QofP except by chance, nor should it. They are "measures" of the same thing, but signify very differently. The QofP tells just what its name says—the quality level of the man's pitching. Given a sufficient length of time pitching at that same quality level, the man will inevitably come to have a closely matching ERA; but that "sufficient length" may well be more than a single season—especially for relievers, far more. Again: the QofP is the actual quality of the pitching; the ERA is the result, with a lot of luck mixed into it.
The specific form of luck involved is usually called "sequencing". As a simple example, consider a pitcher who, in a given inning, gives up a single and a home run. If he does that in one order, he has two runs on his ERA; if he does it in the reverse order, he has one run. The same idea extends to whole games: seven hits in seven innings can produce five runs or zero runs (or many other results), depending on when and in what order they are given up. And so on. That sequencing is almost all random chance—sheer luck—is demonstrated by the fact that after some number of innings, the QofP and the actual ERA come into very close alignment, with just a fluctuating trivial plus/minus from other statistical noise. (The number varies with interpretation of "very close", but is probably not over 100 innings, and might be as little as 50 IP.)
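The single-and-home-run example can be checked with a toy inning counter. This is only a sketch for illustration, not part of the TPP or QofP calculation: the function name `runs_in_inning` is made up here, and it assumes, unrealistically, that a single moves every runner up exactly one base.

```python
def runs_in_inning(events):
    """Count runs scored for a sequence of plate-appearance outcomes.

    Assumption (for illustration only): a single advances every runner
    exactly one base; a home run scores the batter and all runners.
    """
    bases = [False, False, False]  # occupancy of first, second, third
    runs = 0
    outs = 0
    for event in events:
        if outs == 3:
            break  # inning is over; later events do not happen
        if event == "out":
            outs += 1
        elif event == "single":
            if bases[2]:
                runs += 1  # runner on third scores
            bases = [True, bases[0], bases[1]]
        elif event == "home_run":
            runs += 1 + sum(bases)  # batter plus everyone on base
            bases = [False, False, False]
        else:
            raise ValueError(f"unknown event: {event!r}")
    return runs


single_first = ["single", "home_run", "out", "out", "out"]
homer_first = ["home_run", "single", "out", "out", "out"]
print(runs_in_inning(single_first))  # 2
print(runs_in_inning(homer_first))   # 1
```

The same two events charge the pitcher with two runs in one order and one run in the other, which is all that "sequencing" means here.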
If you have read the background material on baseball-analysis theory, you will remember that this is all probabilistic work. What the QofP measures is the man's demonstrated norm of pitching behavior; what the ERA measures is the chance-influenced actual results of applying that behavior in games. Half heads and half tails is the norm of coin-tossing behavior; what we get when we actually toss a coin a number of times may in fact be quite different—but as that number of tosses increases, that difference will invariably become progressively smaller. So with ERAs and QofPs.
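The coin-toss point can be put in numbers. The sketch below computes, exactly rather than by simulation, the average distance between the observed fraction of heads and the norm of one half for a fair coin; the function name `mean_abs_deviation` is ours, chosen for this illustration.

```python
from math import comb


def mean_abs_deviation(tosses):
    """Exact expected |fraction of heads - 0.5| for a fair coin."""
    if tosses <= 0:
        raise ValueError("tosses must be positive")
    total_outcomes = 2 ** tosses
    weighted = sum(
        comb(tosses, heads) * abs(heads / tosses - 0.5)
        for heads in range(tosses + 1)
    )
    return weighted / total_outcomes


for n in (10, 100, 1000):
    print(f"{n:>5} tosses: average miss {mean_abs_deviation(n):.4f}")
```

The average miss keeps shrinking as the number of tosses grows, roughly in proportion to one over the square root of that number, which is the same shape as the gap between ERA and QofP.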
If you want a crude rule of thumb, the expected average error in ERA versus QofP should be, in runs, about 10 divided by the square root of the BFP (batters faced) value. So if we have, let's say, a dozen pitchers who have each so far this year faced only 25 men, we would expect that the average difference between their QofP and ERA values would be a full 2 runs! (Like a QofP of 3.00 and an ERA of 5.00.) And in several cases it would be more, occasionally much more. By the time our set of pitchers have faced 100 men each, the expected average error drops to 1 run—still a lot. If they are all starters, they might end up facing 900 men each over the season; then, the expected average error would be down to around 0.33 run. It should now be clear why it often takes several seasons for the ERA to really show what it is purported to show—that which the QofP shows at once, the quality of the pitching performance.
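For anyone who wants to try other BFP values, the rule of thumb above is a one-liner. The function name `expected_era_gap` is just a label for this sketch, and the constant 10 is the crude figure quoted above, not a fitted value.

```python
import math


def expected_era_gap(batters_faced):
    """Crude rule of thumb: expected average |ERA - QofP|, in runs."""
    if batters_faced <= 0:
        raise ValueError("batters_faced must be positive")
    return 10.0 / math.sqrt(batters_faced)


for bfp in (25, 100, 900):
    print(f"{bfp:>4} BFP: about {expected_era_gap(bfp):.2f} runs")
# 25 BFP gives 2.00, 100 BFP gives 1.00, 900 BFP gives 0.33
```

Because the gap falls only with the square root, cutting it in half takes four times as many batters faced.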
Another factor, which we regret that we didn't realize in early years, is that a poor manager can artificially expand the QofP/ERA differential (as he artificially raises the pitcher's ERA). Consider: all probabilistic analysis, including our baseball work, relies on the data being independent and of equal value: any one coin toss is as likely to produce a head as is any other. Given enough data samples, the random peaks and valleys will average each other out. For batters, that is essentially true (actually, there are complicating factors which are beyond the scope of this discussion, but it's mostly true). For pitchers, it may not be so: a pitcher, unlike a batter, is performing in a continuous-effort mode, and a starter's work in the 8th inning is by no means necessarily the same as in the 2nd inning.
A manager whose only idea of pitching use is "run 'em out there until they're in trouble" is a terrible manager but, unfortunately, not a rare phenomenon. All too many pitchers are badly hurt by managers who insist on going to the well too often. A man who pitches splendidly through 6 and then decently but manifestly with effort through the 7th should be pulled then and there; but a manager who sends him out to get shelled in the 8th ("well, he looked pretty good up until then"; film at 11) is going to generate a data sample that is decidedly not representative of the man's work.
His overall hits per batter faced, for example, may look good; but because many of those hits were clustered in the few innings when he was tired and shouldn't have been out there, his actual ERA will be notably higher than what we calculate as his QofP. That is, he does pitch well on average, but his ERA is pumped up by his having been used when he wasn't at his average. (And in fact, the victims of such managers, if handled in a sane manner, would be significantly better performers overall, because mostly only bad performance would be subtracted by getting them out sooner.)
In a civilized world, managers who work like that would be taken out back of the stadium, given a final smoke, and shot; but if we really did that, there would be very few still left to manage.
The key point here, and it is the power and importance of the TPP and QofP, is that it is they, not the ERA, that tell you accurately how well the man is truly pitching. If his ERA is better, or worse, that's just short-term luck. So, with the understanding that there is some minor fuzziness owing to not using a fully exact stat set, the QofP is the value that tells you how well or poorly your chosen subject is really pitching.
All content copyright © 2002 - 2017 by The Owlcroft Company.
This page was last modified on Sunday, 9 August 2015, at 8:51 pm Pacific Time.