Baseball team and player performance examined realistically and accurately.

“Upon the whole you have proved to be

Much as you said you were.”

–*He Never Expected Much*,

Thomas Hardy

This page displays the full results in *graphical* form; those wanting to see the individual data points laid out in *tabular* form can do so on this alternative proof-results page.

The Graph farther below shows the results of applying the full Owlcroft formula for runs scored to almost two-thirds of a century—65 full years, 3,324 team-seasons—of actual major-league data: all teams—both offense and pitching—in all years from 1955 through 2019 inclusive. As to why those year limits: prior to the early end, 1955, there were scoring-rule differences that make data from those prior years incommensurable with stats from 1955 and on. (Mostly it was that Sacrifice Flies were not recognized as a separate category; the difference is small, but we deal in tenths and hundredths of a percent in reckoning error rates.)

The team data used in that formula include: Plate Appearances (BFP for pitchers, same thing); Left On Base; Hits; Doubles; Triples; Home Runs; Bases On Balls; Intentional Bases On Balls; Stolen Bases; Hit Batsmen; Sacrifice Bunts; Sacrifice Flies; Catcher’s Interferences; and times Reached On Error. All of those can be found by looking at the Baseball Reference site.

(Many stat services, though not Baseball-Reference, treat CI, catcher’s interference, as a pariah stat, presumably out of simple laziness. It is an official stat, on a par with at-bats, and is required of the Official Scorer for every game; indeed, without it, he cannot “prove up” his results. Granted, it is usually zero for a given team in a given season, but sometimes it is not, and it *should* be in every stat-line listing out there. But it’s not. Many published PA (plate appearances) stats are wrong because they were obtained by adding AB + BB + HBP + SH + SF, omitting CI.)
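To make the point about CI concrete, here is a minimal Python sketch of the plate-appearance tally with and without catcher’s interference. The function name `plate_appearances` and the stat line are hypothetical, chosen only for illustration; they are not real team data.

```python
def plate_appearances(ab, bb, hbp, sh, sf, ci=0):
    """Plate appearances, counting catcher's interference (CI)."""
    return ab + bb + hbp + sh + sf + ci


# Hypothetical team-season line (illustrative numbers, not real data).
line = {"ab": 5500, "bb": 520, "hbp": 60, "sh": 30, "sf": 40, "ci": 2}

with_ci = plate_appearances(**line)
without_ci = plate_appearances(**{**line, "ci": 0})

# The gap between the two totals is exactly the omitted CI count.
print(with_ci, without_ci, with_ci - without_ci)
```

The shortfall is small, but any figure computed per plate appearance inherits it.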

The methodology of **the TOP (projected-runs) calculation** is explained elsewhere on this site.

As to the statistical measures of “error” (variation between projected and actual):

“Expected” error sizes are calculated using standard statistical probability formulae. The expected average error is 79.79% of one Standard Deviation. The Standard Deviation for any one team-season is, in turn, the square root of **npq**, where **n** is the number of data samples, **p** is the probability of a success, and **q** is the probability of a failure (by definition, then, *q* = 1 - *p*). For this tabulation, a “success” is a run scored and a “data sample” is a batter at the plate. Thus, the probability of a success (a batter eventually scoring) is just the team’s seasonal runs scored divided by the team’s total of batter plate appearances. The “expected” average error per team-season is thus:

err = 0.7979 x SquareRoot(PA x [R/PA] x (1 - R/PA))
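The expression above can be sketched in Python as follows. This is a minimal illustration rather than the site’s own code: the function name `expected_error` and the sample inputs are hypothetical. As a side note, 0.7979 agrees with the square root of 2/π, the ratio of mean absolute deviation to standard deviation for a normal distribution, which is presumably the source of the 79.79% figure.

```python
import math

# Factor quoted in the text; it matches sqrt(2/pi) to four decimal places.
MEAN_ABS_FACTOR = 0.7979


def expected_error(pa, runs):
    """Expected average error, in runs, for one team-season.

    pa   -- plate appearances (n, the number of "data samples")
    runs -- runs scored (the number of "successes")
    """
    p = runs / pa               # probability that a batter eventually scores
    q = 1.0 - p                 # probability that he does not
    sd = math.sqrt(pa * p * q)  # binomial standard deviation, sqrt(npq)
    return MEAN_ABS_FACTOR * sd


# Hypothetical team-season (illustrative numbers only).
err = expected_error(pa=6200, runs=750)
print(f"expected error: {err:.2f} runs ({100 * err / 750:.2f}% of runs)")
```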

As a “sanity check”, let’s look at 2019 data. For all MLB teams averaged, the per-team plate appearances were 6217 (all-MLB 186518 divided by 30 teams) and the per-team runs were 782 (all-MLB 23467 divided by 30 teams). Thus, *p* is simply 782 divided by 6217, or 0.12578414; correspondingly, then, *q*—which is 1-*p*—is 0.87421586, and *npq* is 683.6368009. The square root of that is 26.146449107, which is that hypothetical perfectly average team’s Standard Deviation for Runs scored. The expected average deviation (“error”) per team is then 0.7979 x 26.146449107, or 20.862251742 runs. As a percentage of runs scored, that is 2.668%. The actual “error” rate for 2019 for the formula was 2.321%, actually a bit smaller than the “expected” rate, but sufficiently close that we can accept that method of reckoning what size of “error” is rightly to be expected, which further means that we are about as close as one can get to correctness: the formula error is right around the statistical “noise” level—we cannot get non-trivially better.
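The 2019 arithmetic can be reproduced with a few lines of Python. This is only a sketch of the calculation as described; the variable names are assumptions, and the per-team figures are rounded to whole numbers as in the text.

```python
import math

TEAMS = 30
LEAGUE_PA = 186518    # all-MLB plate appearances, 2019 (from the text)
LEAGUE_RUNS = 23467   # all-MLB runs, 2019 (from the text)

# Per-team averages, rounded to whole numbers as in the text.
pa = round(LEAGUE_PA / TEAMS)      # 6217
runs = round(LEAGUE_RUNS / TEAMS)  # 782

p = runs / pa                      # chance that a batter eventually scores
q = 1 - p
sd = math.sqrt(pa * p * q)         # Standard Deviation, sqrt(npq)
expected = 0.7979 * sd             # expected average error, in runs
expected_pct = 100 * expected / runs

print(f"p = {p:.8f}, q = {q:.8f}")
print(f"SD = {sd:.4f} runs")
print(f"expected error = {expected:.4f} runs = {expected_pct:.3f}% of runs")
```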

The average per-team-per-season error rate for the formula is 2.167% (more precisely, 2.1666512127564%). That is the average *size* of error; if we allow over and under errors to balance out (so a +2.1% and a -2.3% would net to -0.2%), we get a mere -0.006% per-team-season average, which is much less than one run. In effect, the average true error (not *size* of error) is zero…as it should be for any formula claiming accuracy. And do recall that this is applying the formula to both team batting stats for runs scored *and* to team pitching stats for runs allowed.
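The distinction between the average *size* of error and the average signed error can be sketched in Python. The sign convention (projected minus actual, as a percentage of actual), the function name `error_rates`, and the four data points are all assumptions made for illustration; they are not the site’s data.

```python
def error_rates(projected, actual):
    """Return (mean size of error, mean signed error), both in percent."""
    pct = [100.0 * (p - a) / a for p, a in zip(projected, actual)]
    mean_size = sum(abs(e) for e in pct) / len(pct)  # over/under do not cancel
    mean_signed = sum(pct) / len(pct)                # over/under cancel out
    return mean_size, mean_signed


# Hypothetical team-seasons: two overestimates and two underestimates.
actual = [700, 800, 650, 900]
projected = [714, 784, 663, 882]

mean_size, mean_signed = error_rates(projected, actual)
print(f"mean size of error: {mean_size:.3f}%")
print(f"mean signed error:  {mean_signed:.3f}%")
```

A formula with no systematic lean should show a signed average near zero even when the average size of error is not.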

Since “a picture is worth a thousand words”, here is a graph of the results: the red line is exact accuracy, and, as you can see, the results are a truly beautiful approximation to that red line.

One thing that is quite important here is that **accuracy remains excellent at the extremes**, not just around the mid-range area where most of the data bunches up. Not a few other such formulae have good *average* accuracy numbers, but have a definite tendency to concentrate their errors at either the high or the low end of run-scoring (most often, the high end), indicating that they are not actually tracking well the real mechanisms of run-scoring.
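One way to check for errors concentrating at the extremes is to group team-seasons by actual runs and compare the average size of error in each band. The sketch below assumes hypothetical band edges and made-up data; the function name `mean_abs_pct_error_by_band` is likewise an illustration, not part of the Owlcroft method.

```python
def mean_abs_pct_error_by_band(projected, actual, edges):
    """Average |% error| per run-scoring band.

    edges -- ascending cut points; band 0 lies below the first edge,
             band 1 between the first and second, and so on.
    """
    bands = {}
    for p, a in zip(projected, actual):
        # Band index = number of edges at or below the actual run total.
        idx = sum(1 for e in edges if a >= e)
        bands.setdefault(idx, []).append(100.0 * abs(p - a) / a)
    return {idx: sum(v) / len(v) for idx, v in sorted(bands.items())}


# Hypothetical data: low, middle, and high run-scoring team-seasons.
actual = [550, 600, 720, 760, 900, 950]
projected = [561, 588, 734.4, 744.8, 918, 931]
edges = [650, 850]  # assumed cut points between the three bands

by_band = mean_abs_pct_error_by_band(projected, actual, edges)
for idx, err in by_band.items():
    print(f"band {idx}: {err:.2f}% average size of error")
```

Roughly equal figures across the bands would indicate that accuracy holds at the extremes; a markedly larger figure in the top or bottom band would indicate the concentration of error described above.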

Another important thing is that the errors in the Owlcroft formula are essentially symmetrical: they do not, as so many other formulae’s results do, slew toward over- or under-estimating, which is another marker of whether or not a given formula is tracking the real mechanisms of run-scoring. (Visually, the dots above the red “equals” line are closely symmetrical to those below it.)

All this, we feel strongly, validates the Owlcroft Runs-Scored formula as being about as good as it can get, and thoroughly satisfactory for real-world use.

The games-won formula simply projects expected games won from Runs scored and Opponents’ Runs allowed. The data basis is the same 1955 - 2019 period: 1,662 team-seasons. The average error is less than 2 games a team-season (1.853 wins). As with the TOP formula, the results display narrow variation, symmetrical distribution, and accuracy even at the extremes.

All content copyright © 2002 - 2023 by
**The Owlcroft Company**.

This page was last modified on Sunday, 26 November 2023, at 1:57 am Pacific Time.