The Graph farther below shows the results of applying the full Owlcroft formula for runs scored to almost two-thirds of a century—65 full years, 3,324 team-seasons—of actual major-league data: all teams—both offense and pitching—in all years from 1955 through 2019 inclusive. As to why those year limits: prior to the early end, 1955, there were scoring-rule differences that make data from those prior years incommensurable with stats from 1955 and on. (Mostly it was that Sacrifice Flies were not recognized as a separate category; the difference is small, but we deal in tenths and hundredths of a percent in reckoning error rates.) And 2019 is when this graph was made. (One of these days, we will update to the current season, but 65 years is a lot of seasons.)
The team data used in that formula include: Plate Appearances (BFP for pitchers, same thing); Left On Base; Hits; Doubles; Triples; Home Runs; Bases On Balls; Intentional Bases On Balls; Stolen Bases; Hit Batsmen; Sacrifice Bunts; Sacrifice Flies; Catcher’s Interferences; and times Reached On Error. All of those can be found by looking at the Baseball Reference site.
(Many stat services—though not Baseball-Reference—treat CI, catcher’s interference, as a pariah stat, presumably out of simple laziness: it is an official stat, on a par with at-bats, and is required of the Official Scorer for every game; indeed, without it, he cannot “prove up” his results. Granted, it is usually zero for a given team in a given season, but sometimes it’s not and it should be in every stat-line listing out there. But it’s not. Many published PA—plate appearances—stats are wrong, because they got them by adding AB + BB + HBP + SH + SF, omitting CI.)
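To make the point about CI concrete, here is a minimal sketch of the plate-appearance tally with and without catcher's interference. The function name and the sample stat line are hypothetical, invented purely for illustration; they are not real team data.

```python
def plate_appearances(ab, bb, hbp, sh, sf, ci=0):
    """Total plate appearances, counting catcher's interference (CI)."""
    return ab + bb + hbp + sh + sf + ci


# Hypothetical team-season line (illustrative numbers, not real data).
line = {"ab": 5500, "bb": 550, "hbp": 60, "sh": 30, "sf": 45, "ci": 2}

with_ci = plate_appearances(**line)
without_ci = plate_appearances(**{**line, "ci": 0})

# The difference is exactly the CI count that the short formula drops.
print(with_ci, without_ci, with_ci - without_ci)
```

The gap is tiny, but it is a real gap: any rate stat built on the short tally inherits it.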
The methodology of the TOP (projected-runs) calculation is explained elsewhere on this site.
As to the statistical measures of “error” (variation between projected and actual):
“Expected” error sizes are calculated using standard statistical probability formulae. The expected average error is 79.79% of one Standard Deviation. The Standard Deviation for any one team-season is, in turn, the square root of npq, where n is the number of data samples, p is the probability of a success, and q is the probability of a failure (by definition, then, q = 1 - p). For this tabulation, a “success” is a run scored and a “data sample” is a batter at the plate. Thus, the probability of a success (a batter eventually scoring) is just the team’s seasonal runs scored divided by the team’s total of batter plate appearances. The “expected” average error per team-season is thus:
err = 0.7979 x SquareRoot(PA x [R/PA] x (1 - R/PA))
As a “sanity check”, let’s look at 2019 data. For all MLB teams averaged, the per-team plate appearances were 6217 (all-MLB 186518 divided by 30 teams) and the per-team runs were 782 (all-MLB 23467 divided by 30 teams). Thus, p is simply 782 divided by 6217, or 0.12578414; correspondingly, then, q—which is 1-p—is 0.87421586, and npq is 683.6368009. The square root of that is 26.146449107, which is that hypothetical perfectly average team’s Standard Deviation for Runs scored. The expected average deviation (“error”) per team is then 0.7979 x 26.146449107, or 20.862251742 runs. As a percentage of runs scored, that is 2.668%.
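For readers who would like to rerun that arithmetic, here is a small sketch of the calculation. It assumes that the 79.79% figure is the rounded value of the square root of 2/pi, which is the ratio of mean absolute deviation to Standard Deviation for a normal distribution; the function and constant names are ours, for illustration only.

```python
import math

# Assumption: 0.7979 in the text is sqrt(2/pi), rounded to four places.
MEAN_ABS_FACTOR = math.sqrt(2 / math.pi)


def expected_error(pa, runs):
    """Expected average run error for one team-season (binomial model)."""
    p = runs / pa                 # probability that a batter eventually scores
    q = 1 - p                     # probability that he does not
    sd = math.sqrt(pa * p * q)    # Standard Deviation = sqrt(npq)
    return MEAN_ABS_FACTOR * sd


# Per-team 2019 averages quoted in the text.
pa, runs = 6217, 782
err = expected_error(pa, runs)
print(f"expected error: {err:.2f} runs, or {100 * err / runs:.3f}% of runs")
```

Because the unrounded factor is used, the result can differ from the hand calculation in the later decimal places, but not in any way that matters.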
The actual “error” rate for 2019 for the formula was 2.321%, only a hair off the formula’s long-run average (2.30%) and somewhat under the “expected” rate computed above (2.668%). That is close enough that we can accept that method of reckoning what size of “error” is rightly to be expected, which further means that we are about as close as one can get to correctness: the formula error is right around the statistical “noise” level, and we cannot get non-trivially better.
The average per-team-per-season error rate for the formula is 2.30% (more precisely, 2.300035409999%). That is the average size of error; if we allow over and under errors to balance out (so, for example, a +2.1% and a -2.3% would net to -0.2%), we get a mere -0.006% per-team-season average, which is much less than one run. In effect, the average true error (not size of error) is zero…as it should be for any formula claiming accuracy.
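The distinction between size of error and true (signed) error can be sketched in a few lines. The sign convention (projected minus actual, as a percentage of actual) is our assumption, and the three team-seasons below are made up for illustration; they are not drawn from the 1955 through 2019 data.

```python
def error_rates(projected, actual):
    """Return (mean size of error, mean signed error), both in percent."""
    # Assumed convention: positive means the projection was too high.
    pct = [100 * (p - a) / a for p, a in zip(projected, actual)]
    mean_abs = sum(abs(e) for e in pct) / len(pct)
    mean_signed = sum(pct) / len(pct)
    return mean_abs, mean_signed


# Hypothetical team-seasons: one over by 2%, one under by 2%, one exact.
actual = [700, 800, 750]
projected = [714, 784, 750]

mean_abs, mean_signed = error_rates(projected, actual)
print(f"size of error: {mean_abs:.3f}%   signed error: {mean_signed:.3f}%")
```

Overs and unders cancel in the signed figure but not in the size figure, which is why a sound formula should show a signed average near zero alongside a size average near the statistical noise level.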
Since “a picture is worth a thousand words”, here is a graph of the results: the red line is exact accuracy, and, as you can see, the results are a truly beautiful approximation to that red line.
One thing that is quite important here is that accuracy remains excellent at the extremes, not just around the mid-range area where most of the data bunches up. Not a few other such formulae have good average accuracy numbers, but have a definite tendency to concentrate their errors at either the high or the low end of run-scoring (most often, the high end), indicating that they are not actually tracking well the real mechanisms of run-scoring.
Another important thing is that the errors in the Owlcroft formula are essentially symmetrical: they do not, as so many other formulae’s results do, slew toward over- or under-estimating, which is another marker of whether or not a given formula is tracking the real mechanisms of run-scoring. (Visually, the dots above the red “equals” line are closely symmetrical to those below it.)
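One simple way to test any run formula for the two faults just described is to sort team-seasons by actual runs, split them into thirds, and look at the average signed error in each third. This is only an illustrative sketch, not the procedure used to build the graph; the function name, the tercile split, and the six data points are all hypothetical.

```python
def signed_error_by_tercile(projected, actual):
    """Mean signed percent error for low, middle, and high run totals."""
    pairs = sorted(zip(actual, projected))   # order by actual runs scored
    n = len(pairs)                           # needs at least three entries
    cuts = [0, n // 3, 2 * n // 3, n]
    result = {}
    for label, lo, hi in zip(["low", "middle", "high"], cuts, cuts[1:]):
        errs = [100 * (p - a) / a for a, p in pairs[lo:hi]]
        result[label] = sum(errs) / len(errs)
    return result


# Hypothetical data: errors of +1% and -1% alternate at every run level.
actual = [600, 650, 700, 750, 800, 850]
projected = [606, 643.5, 707, 742.5, 808, 841.5]

for label, err in signed_error_by_tercile(projected, actual).items():
    print(f"{label:>6}: {err:+.3f}%")
```

A formula that tracks the real mechanisms of run-scoring should show signed averages near zero in all three groups; one that drifts at the high end would show a clearly nonzero figure in the last group.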
All this, we feel strongly, validates the Owlcroft Runs-Scored formula as being about as good as it can get, and thoroughly satisfactory for real-world use.
The companion games-won formula simply projects expected games won from Runs scored and Opponents’ Runs allowed. The data basis is the same 1955 - 2019 period: 1,662 team-seasons. The average error is less than 2 games a team-season (1.853 wins). As with the TOP formula, the results display narrow variation, symmetrical distribution, and accuracy even at the extremes.
All content copyright © 2002 - 2024 by The Owlcroft Company.
This page was last modified on Saturday, 7 September 2024, at 12:19 am Pacific Time.