
The Owlcroft Baseball-Analysis Site

Baseball team and player performance examined realistically and accurately.


Proving the TOP: Graphical

“Upon the whole you have proved to be
Much as you said you were.”

– He Never Expected Much,
Thomas Hardy

This page displays the full results in graphical form; those wanting to see the individual data points laid out in tabular form can do so on this alternative proof-results page.

Thousands of Team-Seasons of Baseball Data

The graph farther below shows the results of applying the full Owlcroft formula for runs scored to almost two-thirds of a century (65 full years) of actual major-league data: all teams, both offense and pitching, in all years from 1955 through 2019 inclusive. That is 3,324 team-seasons of data: 1,662 team-seasons, each counted once for its batting and once for its pitching. As to the early limit: before 1955, scoring-rule differences make the data incommensurable with stats from 1955 on. (Mostly it was that Sacrifice Flies were not recognized as a separate category; the difference is small, but we deal in tenths and hundredths of a percent in reckoning error rates.)

The team data used in that formula include: Plate Appearances (BFP for pitchers, same thing); Left On Base; Hits; Doubles; Triples; Home Runs; Bases On Balls; Intentional Bases On Balls; Stolen Bases; Hit Batsmen; Sacrifice Bunts; Sacrifice Flies; Catcher’s Interferences; and times Reached On Error. All of those can be found by looking at the Baseball Reference site.

(Many stat services—though not Baseball-Reference—treat CI, catcher’s interference, as a pariah stat, presumably out of simple laziness: it is an official stat, on a par with at-bats, and is required of the Official Scorer for every game; indeed, without it, he cannot “prove up” his results. Granted, it is usually zero for a given team in a given season, but sometimes it’s not and it should be in every stat-line listing out there. But it’s not. Many published PA—plate appearances—stats are wrong, because they got them by adding AB + BB + HBP + SH + SF, omitting CI.)
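To make the point about CI concrete, here is a minimal sketch of the plate-appearance sum with and without catcher's interference. The function name and the team line are made up purely for illustration; they are not drawn from any real team or stat service.

```python
def plate_appearances(ab, bb, hbp, sh, sf, ci=0):
    """Plate appearances as the sum of the official components.

    bb is total bases on balls (intentional ones included);
    ci is catcher's interference, the term that is often left out.
    """
    return ab + bb + hbp + sh + sf + ci


# Hypothetical team-season line (invented numbers, illustration only).
line = dict(ab=5500, bb=520, hbp=60, sh=30, sf=45)

without_ci = plate_appearances(**line)        # the common, incomplete sum
with_ci = plate_appearances(**line, ci=3)     # the full sum, CI included

print(without_ci, with_ci, with_ci - without_ci)
```

The difference is only a handful of plate appearances, but it is a systematic undercount whenever CI is nonzero.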


The Calculations

The methodology of the TOP (projected-runs) calculation is explained elsewhere on this site.

As to the statistical measures of “error” (variation between projected and actual):

“Expected” error sizes are calculated using standard statistical probability formulae. The expected average error is 79.79% of one Standard Deviation (for a normal distribution, the average absolute deviation is the square root of 2/π, about 0.7979, times the Standard Deviation). The Standard Deviation for any one team-season is, in turn, the square root of npq, where n is the number of data samples, p is the probability of a success, and q is the probability of a failure (by definition, then, q = 1 - p). For this tabulation, a “success” is a run scored and a “data sample” is a batter at the plate. Thus, the probability of a success (a batter eventually scoring) is just the team’s seasonal runs scored divided by the team’s total of batter plate appearances. The “expected” average error per team-season is thus:

err = 0.7979 x SquareRoot(PA x [R/PA] x (1 - R/PA))

As a “sanity check”, let’s look at 2019 data. For all MLB teams averaged, the per-team plate appearances were 6217 (all-MLB 186518 divided by 30 teams) and the per-team runs were 782 (all-MLB 23467 divided by 30 teams). Thus, p is simply 782 divided by 6217, or 0.12578414; correspondingly, q (which is 1 - p) is 0.87421586, and npq is 683.6368009. The square root of that is 26.146449107, which is that hypothetical perfectly average team’s Standard Deviation for Runs scored. The expected average deviation (“error”) per team is then 0.7979 x 26.146449107, or 20.862251742 runs. As a percentage of runs scored, that is 2.668%. The actual “error” rate of the formula for 2019 was 2.321%, a bit smaller than the “expected” rate but close enough that we can accept this method of reckoning what size of “error” is rightly to be expected. That in turn means we are about as close as one can get to correctness: the formula error is right around the statistical “noise” level, so we cannot get non-trivially better.
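The 2019 sanity check above can be reproduced in a few lines. This is only a sketch of the arithmetic as described on this page; the function and variable names are our own choices for illustration.

```python
import math

# The page's constant: 79.79% of one Standard Deviation
# (approximately sqrt(2/pi), the mean absolute deviation of a normal distribution).
AVG_ERROR_FACTOR = 0.7979


def expected_error(pa, runs):
    """Expected average error, in runs, for one team-season."""
    p = runs / pa                 # chance that a batter eventually scores
    q = 1 - p                     # chance that he does not
    sd = math.sqrt(pa * p * q)    # Standard Deviation = sqrt(npq)
    return AVG_ERROR_FACTOR * sd


# Per-team 2019 averages quoted above.
pa, runs = 6217, 782
err = expected_error(pa, runs)
err_pct = 100 * err / runs

print(round(err, 3), round(err_pct, 3))
```

Run as written, this should give roughly 20.862 runs and 2.668%, matching the figures worked out by hand above.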


Results Summary

In Numbers

The average per-team-per-season error rate for the formula is 2.167% (more precisely, 2.1666512127564%). That is the average size of error; if we allow over and under errors to balance out (so a +2.1% and a -2.3% would net to -0.2%), we get a mere -0.006% per-team-season average, which is much less than one run. In effect, the average true error (not size of error) is zero…as it should be for any formula claiming accuracy. And do recall that this is applying the formula to both team batting stats for runs scored and to team pitching stats for runs allowed.
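The distinction between the average size of error and the average signed error can be sketched as follows. The helper names are hypothetical, and the sign convention (projected minus actual, as a percentage of actual) is an assumption made for this illustration, not something stated on this page.

```python
def pct_error(projected, actual):
    """Signed error as a percentage of actual runs (assumed convention)."""
    return 100.0 * (projected - actual) / actual


def mean_abs_pct_error(pct_errors):
    """Average size of error: signs are discarded before averaging."""
    return sum(abs(e) for e in pct_errors) / len(pct_errors)


def mean_signed_pct_error(pct_errors):
    """Average signed error: overs and unders are allowed to cancel."""
    return sum(pct_errors) / len(pct_errors)


# The two-error example from the text: +2.1% and -2.3%.
pair = [2.1, -2.3]
size = mean_abs_pct_error(pair)       # about 2.2
signed = mean_signed_pct_error(pair)  # about -0.1 (the -0.2 net, spread over two)

print(round(size, 3), round(signed, 3))
```

A formula with no systematic lean shows a signed average near zero even when the average size of error is a couple of percent.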


As a Graph

Since “a picture is worth a thousand words”, here is a graph of the results: the red line is exact accuracy, and, as you can see, the results are a truly beautiful approximation to that red line.

Projected-vs.-Actual Runs Graph

One thing that is quite important here is that accuracy remains excellent at the extremes, not just around the mid-range area where most of the data bunches up. Not a few other such formulae have good average accuracy numbers, but have a definite tendency to concentrate their errors at either the high or the low end of run-scoring (most often, the high end), indicating that they are not actually tracking well the real mechanisms of run-scoring.

Another important thing is that the errors in the Owlcroft formula are essentially symmetrical: they do not, as so many other formulae’s results do, slew toward over- or under-estimating, which is another marker of whether or not a given formula is tracking the real mechanisms of run-scoring. (Visually, the dots above the red “equals” line are closely symmetrical to those below it.)

All this, we feel strongly, validates the Owlcroft Runs-Scored formula as being about as good as it can get, and thoroughly satisfactory for real-world use.
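The two checks described above (accuracy at the extremes, and symmetry of over- and under-estimates) could be run on any projected-versus-actual data set along the following lines. This is a hedged sketch, not the procedure used for this page: the function name, the cut-off values, and the six data points are all invented for illustration.

```python
def error_diagnostics(projected, actual, low_cut, high_cut):
    """Group team-seasons by actual runs and summarise errors per group."""
    groups = {"low": [], "mid": [], "high": []}
    for p, a in zip(projected, actual):
        key = "low" if a < low_cut else "high" if a > high_cut else "mid"
        groups[key].append(100.0 * (p - a) / a)   # signed percent error

    summary = {}
    for key, errs in groups.items():
        if not errs:
            continue                              # skip empty groups
        summary[key] = {
            "n": len(errs),
            "mean_abs_pct": sum(abs(e) for e in errs) / len(errs),
            "share_over": sum(e > 0 for e in errs) / len(errs),
        }
    return summary


# Invented toy data (NOT real team-seasons), two points per group.
actual = [600, 620, 750, 760, 900, 920]
projected = [606, 614, 765, 745, 909, 911]

summary = error_diagnostics(projected, actual, low_cut=650, high_cut=850)
for key, stats in summary.items():
    print(key, stats)
```

A formula that tracks run-scoring well should show similar average error sizes in all three groups and a share of over-estimates near one half in each.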


The Games-Won Formula

This simply projects expected games won from Runs scored and Opponents’ Runs allowed. The data basis is the same 1955 - 2019 period: 1,662 team-seasons. The average error is less than 2 games a team-season (1.853 wins). As with the TOP formula, the results display narrow variation, symmetrical distribution, and accuracy even at the extremes.

Projected-vs.-Actual Wins Graph





All content copyright © 2002 - 2024 by The Owlcroft Company.


This page was last modified on Sunday, 4 February 2024, at 6:24 pm Pacific Time.