Quality of Pitching

“But the main thing is, does it hold good measure?”

– The Poetic and Dramatic Works of Robert Browning, v. IV
Robert Browning

Measuring True Pitching Performance

The TPP as a “Reverse TOP”

(To understand this discussion, you need to already be familiar with the TOP—what it signifies and whence it comes. If you aren’t, please first read the page here on baseball analysis theory.)

We generate two pitching-performance measures: the TPP (“Total Pitching Productivity”) and the QofP (“Quality of Pitching”). The TPP is, in effect, the TOP (“Total Offensive Productivity”) of the composite of all the batters that a given pitcher or team has faced. The TPP as a number signifies the number of runs that would be given up in a normal, full-length season by a pitching staff each member of which was pitching exactly like the man in question. The QofP is, more or less, the TPP translated into an ERA-like figure, as discussed below. The TPP is the solid and recommended performance metric; the QofP is at best an approximate number, depending on a given man’s ratio of earned to total runs, which can vary considerably from league averages owing to pure luck.

That “luck” involves what is usually called sequencing. If a pitcher gives up a walk and a home run in some inning, the order (the sequence) in which they occur will dramatically affect his stats: a walk followed by a home run scores 2 runs, while a home run followed by a walk scores only 1. Likewise, if a pitcher gives up a bunch of hits, walks, and thus runs in an inning in which an error occurs, when in the inning that error occurred can drastically affect which of the runs scored will be counted as “earned”, even though the quality of his pitching in that inning is not affected. An error behind a pitcher is not a license to stop pitching well.
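The walk-and-homer case can be sketched in a few lines of Python. This is only a toy illustration, not part of our actual calculations: the function name `runs_from_sequence` and the event codes `"BB"` and `"HR"` are made up for the example, and the model assumes that nothing else happens in the inning (so a runner left on base never scores).

```python
def runs_from_sequence(events):
    """Count runs from a sequence of walks ("BB") and home runs ("HR").

    Toy model (an assumption for illustration): only walks and home runs
    occur, and any runner still on base at the end is stranded.
    """
    runners = 0  # men on base; with walks only, they always fill from first
    runs = 0
    for event in events:
        if event == "BB":
            if runners == 3:
                runs += 1  # bases-loaded walk forces in a run
            else:
                runners += 1
        elif event == "HR":
            runs += runners + 1  # everyone on base scores, plus the batter
            runners = 0
        else:
            raise ValueError(f"unknown event: {event!r}")
    return runs


walk_then_homer = runs_from_sequence(["BB", "HR"])
homer_then_walk = runs_from_sequence(["HR", "BB"])
print(walk_then_homer, homer_then_walk)  # prints: 2 1
```

The pitcher allowed exactly the same two events in both cases; only the order differs, and the run total differs with it.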

The Quality-of-Pitching Measure: the QofP

The QofP figure is just the TPP calculation divided by 162 (to get a per-game figure, typically listed by stat services as “RA9”, Runs Allowed per 9 Innings Pitched), then multiplied by the league-average earned-to-total runs ratio. But, as we noted just above, pure luck can have a significant effect on a given man’s earned-to-total runs ratio; and especially for relievers, even just one or two bad-luck outings of that sort can materially affect that ratio. It is thus important to understand that the QofP is measuring something quite different from the ERA: it is measuring pitcher performance with “luck” (sequencing) neutralized. (Not all luck, which is impossible, but sequencing luck, which can be a big part of stats, especially, as we keep saying, for relievers.)
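As a rough sketch of that arithmetic (and only a sketch: the function name `qofp_from_tpp` is invented for the example, and the TPP of 700 and the ratio of 0.90 are hypothetical inputs, not league figures), the conversion might look like this in Python:

```python
GAMES_PER_SEASON = 162  # the full-length season used in the description above


def qofp_from_tpp(tpp: float, earned_to_total_ratio: float) -> float:
    """Turn a season-scale TPP (runs) into an ERA-like QofP figure.

    earned_to_total_ratio is the league-average share of all runs that
    are earned; the caller must supply it (it is not computed here).
    """
    if not 0.0 < earned_to_total_ratio <= 1.0:
        raise ValueError("earned_to_total_ratio must be in (0, 1]")
    runs_per_game = tpp / GAMES_PER_SEASON  # the RA9-like per-game figure
    return runs_per_game * earned_to_total_ratio


# Hypothetical inputs, for illustration only.
example_qofp = qofp_from_tpp(tpp=700.0, earned_to_total_ratio=0.90)
print(round(example_qofp, 2))  # prints: 3.89
```

Note that the ratio is a league-wide input, not the individual pitcher's own; that is exactly why an individual's ERA can drift away from his QofP.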

Nonetheless, we generate the QofP so that folk used to ERAs can have something to look at that looks like numbers they’re familiar with. The QofP tells an interesting story that the ERA does not; even so, the TPP is the better yardstick, because it does not have that league-average earned-to-total adjustment factor rolled into it. So remember: exactly because they are telling different stories, the QofP and the ERA need not, and often will not, align.

To help put these data, including the TPPs and QofPs you’ll be seeing, in perspective, here are the numbers for the two Leagues and MLB as a whole during the sixteen-year period 1977-1992, which starts with the advent of the Rawlings baseball and ends just before the first of the several SillyBall juicings. The “ERA” value at the rightmost end is the actual ERA value.

1977-1992	BA	SA	HA	PF	BBP	OBP	TBP	K/W	TPP	QofP	ERA
M.L. AVG.	.259	.388	.236	1.496	.078	.319	.352	1.826	694	3.90	3.87
N.L. AVG.	.256	.379	.233	1.479	.075	.313	.345	1.960	665	3.73	3.69
A.L. AVG.	.262	.397	.238	1.513	.081	.324	.360	1.702	724	4.07	4.05

As you see, the QofP and the ERA compare closely. But…that is over data representing 16 years of a full league, which is a lot of “evening-out” time. When you look at the many individual-team and, especially, individual-pitcher lists in these pages, you will see bigger differences.

ERA/QofP Correspondence

Here we elaborate on the matters briefly touched on farther up this page.

This caveat has to do with data significance. In almost all cases, the ERA will correspond in a very broad way to the QofP; you won’t find many 2.03/7.78 type pairings. But it is absolutely critical that you understand this: the ERA will not exactly equal the QofP except by chance, nor should it. They are “measures” of the same thing, but signify very differently. The QofP tells just what its name says—the quality level of the man’s pitching. Given a sufficient length of time pitching at that same quality level, the man will almost always come to have a fairly closely matching ERA; but that “sufficient length” may well be more than a single season—especially for relievers, far more. Again: the QofP is the actual quality of the pitching; the ERA is the result, with a fair bit of luck mixed into it.

If you have read the background material on baseball-analysis theory, you will remember that this is all probabilistic work. What the QofP measures is the man’s demonstrated norm of pitching behavior; what the ERA measures is the chance-influenced actual results of applying that behavior in games. Half heads and half tails is the norm of coin-tossing behavior; what we get when we actually toss a coin a number of times may in fact be quite different, but as the number of tosses increases, the gap between the observed fraction of heads and one half tends to become progressively smaller. So with ERAs and QofPs.
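The coin-toss analogy is easy to try for yourself. The short Python simulation below is a sketch for illustration only: the function name `heads_fraction_gap`, the seed, and the toss counts are arbitrary choices of ours. It reports how far the observed fraction of heads lands from one half at several sample sizes. Any single run can wobble, so the gap need not shrink at every step, but at large toss counts it is reliably small.

```python
import random


def heads_fraction_gap(num_tosses: int, rng: random.Random) -> float:
    """Toss a fair coin num_tosses times; return |fraction of heads - 0.5|."""
    heads = sum(rng.random() < 0.5 for _ in range(num_tosses))
    return abs(heads / num_tosses - 0.5)


rng = random.Random(2024)  # fixed seed (arbitrary) so the run is repeatable
gaps = {n: heads_fraction_gap(n, rng) for n in (10, 100, 10_000, 100_000)}
for n, gap in gaps.items():
    print(f"{n:>7} tosses: |heads fraction - 0.5| = {gap:.4f}")
```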

If you want a crude rule of thumb, the expected average error in ERA versus QofP should be, in runs, about 10 divided by the square root of the BFP (batters faced) value. So if we have, let’s say, a dozen pitchers who have each so far this year faced only 25 men, we would expect that the average difference between their QofP and ERA values would be a full 2 runs! (Like a QofP of 3.00 and an ERA of 5.00.) And in several cases it would be more, occasionally much more. By the time our set of pitchers have faced 100 men each, the expected average error drops to 1 run—still a lot. If they are all starters, they might end up facing 900 men each over the season; then, the expected average error would be down to around 0.33 run. It should now be clear why it often takes several seasons for the ERA to really show what it is purported to show—that which the QofP shows at once, the quality of the pitching performance.
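The rule of thumb is simple enough to put in a few lines of Python. The function name `expected_era_qofp_gap` is ours, invented for this example; the formula is just the crude 10-over-root-BFP approximation described above, not a precise statistical result.

```python
import math


def expected_era_qofp_gap(batters_faced: int) -> float:
    """Crude expected average ERA-versus-QofP error, in runs: 10 / sqrt(BFP)."""
    if batters_faced <= 0:
        raise ValueError("batters_faced must be positive")
    return 10.0 / math.sqrt(batters_faced)


for bfp in (25, 100, 900):
    print(f"BFP {bfp:>3}: about {expected_era_qofp_gap(bfp):.2f} runs")
# prints:
# BFP  25: about 2.00 runs
# BFP 100: about 1.00 runs
# BFP 900: about 0.33 runs
```

Because the error shrinks only with the square root of batters faced, cutting it in half takes four times as many batters.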

Another factor, which we regret that we didn’t realize in early years, is that a poor manager can artificially expand the QofP/ERA differential (as he artificially raises the pitcher’s ERA). Consider: all probabilistic analysis, including our baseball work, relies on the data being independent and of equal value: any one coin toss is as likely to produce a head as is any other. Given enough data samples, the random peaks and valleys will average each other out. For batters, that is essentially true (actually, there are complicating factors which are beyond the scope of this discussion, but it’s mostly true). For pitchers, it may not be so: a pitcher, unlike a batter, is performing in a continuous-effort mode, and a starter’s work in the 8th inning is by no means necessarily the same as in the 2nd inning. A manager whose only idea of pitching use is “run ’em out there until they’re in trouble” is a terrible manager, but—unfortunately—not a rare phenomenon. All too many pitchers are badly hurt by managers who insist on going to the well too often.

A man who pitches splendidly through 6 and then decently but manifestly with effort through the 7th should be pulled then and there; but a manager who sends him out to get shelled in the 8th (“well, he looked pretty good up until then”—film at 11) is going to generate a data sample that is decidedly not representative of the man’s work. His overall hits per batter faced, for example, may look good—but, because many of those hits were clustered in the few innings when he was tired and shouldn’t have been out there—his actual ERA will be notably higher than what we calculate as his QofP. That is, he does pitch well on average, but his ERA is pumped up by his having been used when he wasn’t at his average. (And in fact, the victims of such managers, if handled in a sane manner, would be significantly better performers overall, because mostly only bad performance would be subtracted by getting them out sooner.) In a civilized world, managers who work like that would be taken out back of the stadium, given a final smoke, and shot; but if we really did that, there would be very few still left to manage.

The key point here, and it is the power and importance of the TPP and QofP, is that it is they, not the ERA, that tell you accurately how well the man is truly pitching. If his ERA is better, or worse, that’s just short-term luck. So, with the understanding that there is some minor fuzziness owing to not using a fully exact stat set, the TPP and QofP are the values that tell you how well or poorly your chosen subject is really pitching.

Where Next?

Having come this far on the site, you may be wondering where to look next. We recommend that before you cut to the daily data pages, you look over our discussion of fielding and its importance or lack thereof—and how to measure it—so that the results we present will have meaning for you.

The logical next page, then, is the one on Fielding and Defense in Baseball.

This page was last modified on Tuesday, 17 September 2024, at 4:42 pm Pacific Time.

The Owlcroft Baseball-Analysis Site
