An Owlcroft Company web site

The Owlcroft Baseball-Analysis Site

Baseball team and player performance examined realistically and accurately.




Proving the TOP: Graphical

"Upon the whole you have proved to be
Much as you said you were."

—Thomas Hardy


This page displays the full results in graphical form; those wanting to see the individual data points laid out in tabular form can do so on this alternative proof-results page.

A Half-Century of Baseball Data

The graph below shows the results of applying the full Owlcroft formula for runs scored to over a half-century (55 full years) of actual major-league data: all teams in all years from 1955 through 2009 inclusive. As to why those year limits: at the early end, there were scoring-rule changes in prior years that make certain raw data suspect or unavailable; at the late end, that's when I did the job.

The team data used in that formula include: at-bats, walks, hit batsmen, sacrifice hits (bunts), sacrifice flies, singles, doubles, triples, home runs, stolen bases, caught stealing, catcher's interference calls, and opponents' errors allowing an otherwise-out man to reach base safely. All but those last two are widely published, and the other two can be found by looking (the Baseball Reference site, for one, has them).

(There is no reason why CI, catcher's interference, should be such a pariah stat: it is an official stat, on a par with at-bats, and is required of the Official Scorer for every game; indeed, without it, he cannot "prove up" his results. Granted, it is usually zero for a given team in a given season, but sometimes it's not and it should be in every stat-line listing out there. But it's not. And almost every PA—plate appearances—stat published is wrong, because CI is omitted.)
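To make the point about CI concrete, here is a minimal sketch of a plate-appearance count that includes it. It assumes the usual accounting of PA as at-bats plus walks, hit batsmen, sacrifice hits, sacrifice flies, and catcher's interference; the function name and the sample numbers are invented for illustration and are not taken from any real team.

```python
def plate_appearances(ab, bb, hbp, sh, sf, ci=0):
    """Plate appearances, counting catcher's interference (CI).

    Assumed accounting: PA = AB + BB + HBP + SH + SF + CI.
    """
    return ab + bb + hbp + sh + sf + ci


# Hypothetical team-season line, invented for illustration only.
pa_with_ci = plate_appearances(ab=5500, bb=550, hbp=50, sh=40, sf=45, ci=3)
pa_without_ci = plate_appearances(ab=5500, bb=550, hbp=50, sh=40, sf=45)

print(pa_with_ci, pa_without_ci)
```

The two totals differ only by the CI count, which is exactly the amount by which a PA figure that omits CI falls short.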

The Calculations

The methodology of the TOP (projected-runs) calculation is explained elsewhere on this site.

As to the statistical measures in the summary table:

The "expected" errors are calculated using the standard statistical probability formulae. The expected average error is 79.79% of one Standard Deviation (for normally distributed errors, the mean absolute error is the square root of 2/pi, about 0.7979, times the Standard Deviation). The Standard Deviation for any one team-season is, in turn, the square root of npq, where n is the number of data samples, p is the probability of a success, and q is the probability of a failure (by definition, then, q = 1 - p). For this tabulation, a "success" is a run scored and a "data sample" is a batter at the plate. Thus, the probability of a success (a batter eventually scoring) is just the team's seasonal runs scored divided by the team's total of batter plate appearances. The "expected" average error per team-season is thus:

err = 0.7979 x SquareRoot(PA x (R/PA) x (1 - R/PA))

We then average the individual expected errors for an overall average expected error figure (the individual expected-error figures per team-season will be rather similar because neither PAs nor TOPs vary all that much, in an absolute sense, from one team or season to another).
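The per-team-season formula and the averaging step can be sketched in a few lines of Python. This is an illustration only, not the code used to produce the figures on this page: the function names are hypothetical, and the three (runs, plate appearances) pairs are invented sample values.

```python
import math

# Factor quoted in the text: expected average error as a share of one
# Standard Deviation. It agrees with sqrt(2/pi) to four decimal places.
MEAN_ABS_FACTOR = 0.7979


def expected_std_dev(runs, pa):
    """Square root of n*p*q, with n = PA, p = R/PA, q = 1 - p."""
    p = runs / pa
    return math.sqrt(pa * p * (1 - p))


def expected_avg_error(runs, pa):
    """Expected average error for one team-season, in runs."""
    return MEAN_ABS_FACTOR * expected_std_dev(runs, pa)


def overall_expected_error(team_seasons):
    """Average of the individual expected errors over all team-seasons."""
    errors = [expected_avg_error(runs, pa) for runs, pa in team_seasons]
    return sum(errors) / len(errors)


# Invented (runs, plate appearances) pairs, for illustration only.
sample = [(700, 6200), (650, 6100), (820, 6350)]
for runs, pa in sample:
    print(runs, pa, round(expected_avg_error(runs, pa), 2))
print("overall:", round(overall_expected_error(sample), 2))
```

With run and PA totals in this typical range, the individual figures land close to one another, which is the similarity the paragraph above describes.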

Mind, this is just a sort of guide to the general level of accuracy we are looking for. It assumes something that is not quite true, which is that for a given team in a given season, the probability of that team's average batter becoming a run is a fixed thing: it is not. But calculating as if it were gives us a good idea of the sort of accuracy we should expect, more or less as a minimum, for any run-projection equation of interest.

Results Summary

A tabular listing of these results appears elsewhere on this site; this is a graphical representation.

For 1378 Team-Seasons Evaluated:
Actual Average Error Size: 16.91 runs/team/season (2.45%)
"Expected" Average Error Size: 19.82 runs/team/season (2.82%)
Actual Standard Error: 21.62 runs
Expected Standard Error: 24.84 runs
Cumulative Error: 0.09 run/team/season (circa 0%)

The "Error Sizes" disregard whether the error is high or low—they measure its size. The "Cumulative Error" allows plus and minus errors to cancel; as we should expect, it is virtually zero, far less than a run a team a season.
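The distinction between these measures can be shown with a short Python sketch. It rests on assumptions not stated on this page: that the "Average Error Size" is the mean of the absolute errors, that the "Standard Error" is the root-mean-square of the errors, and that the "Cumulative Error" is the mean of the signed errors. The function name and the sample figures are hypothetical.

```python
import math


def error_summary(projected, actual):
    """Summarize projection errors (projected minus actual runs)."""
    errors = [p - a for p, a in zip(projected, actual)]
    n = len(errors)
    return {
        # Size only: the sign of each error is discarded.
        "average_error_size": sum(abs(e) for e in errors) / n,
        # Assumed definition: root-mean-square of the errors.
        "standard_error": math.sqrt(sum(e * e for e in errors) / n),
        # Signed: plus and minus errors are allowed to cancel.
        "cumulative_error": sum(errors) / n,
    }


# Invented projected and actual run totals, for illustration only.
projected = [710.0, 650.0, 800.0, 590.0]
actual = [700.0, 665.0, 790.0, 595.0]
summary = error_summary(projected, actual)
print(summary)
```

In this invented sample the errors are +10, -15, +10, and -5, so the cumulative error cancels to zero even though the average error size is 10 runs.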

Of the projections, 49% were over, 2% were exact, and 48% were under (the missing 1% is rounding error). Not every run-predicting formula produces symmetrical results; indeed, most do not, which means they are more prone to error on certain types of data, typically high-scoring teams.
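A symmetry tally of this kind might be computed as in the sketch below. It assumes that a projection counts as "exact" when it matches the actual total after rounding to a whole run; that rule, the function name, and the sample data are assumptions for illustration.

```python
def direction_shares(projected, actual):
    """Whole-number percentages of projections that were over, exact, under."""
    counts = {"over": 0, "exact": 0, "under": 0}
    for p, a in zip(projected, actual):
        diff = round(p) - a  # assumed rule: compare after rounding to a whole run
        if diff > 0:
            counts["over"] += 1
        elif diff < 0:
            counts["under"] += 1
        else:
            counts["exact"] += 1
    total = sum(counts.values())
    # Each share is rounded independently, so the three need not sum to 100.
    return {key: round(100 * value / total) for key, value in counts.items()}


# Invented projected and actual run totals, for illustration only.
projected = [710.2, 650.4, 800.0, 590.6]
actual = [700, 665, 800, 595]
shares = direction_shares(projected, actual)
print(shares)
```

Because each percentage is rounded on its own, the three shares can add up to 99 or 101, which is the kind of rounding gap noted above.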

Full Results Graph

Since "a picture is worth a thousand words", here is a graph of the results: the red line is exact accuracy, and, as you can see, the results are a truly beautiful approximation to that red line.

[Graph: Projected-vs.-Actual Runs]

One thing that is notably important here is that accuracy remains excellent at the extremes, not just around the average where most of the data bunches up. Not a few other such formulae have good average accuracy numbers, but have a definite tendency to concentrate their errors at either the high or the low end of run-scoring (most often, the high end), indicating that they are not actually tracking well the real mechanisms of run-scoring.
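One way to check for that tendency is to group team-seasons into bands by actual runs scored and compare the average error size in each band. The sketch below is a hypothetical illustration: the band edges, the function name, and the data are invented, and this is not presented as the check performed for this page.

```python
def mean_abs_error_by_band(projected, actual, band_edges):
    """Average absolute error within bands of actual runs scored.

    band_edges is an ascending list of run totals separating the bands;
    a team-season falls in the first band whose upper edge it is below.
    """
    bands = [[] for _ in range(len(band_edges) + 1)]
    for p, a in zip(projected, actual):
        index = sum(a >= edge for edge in band_edges)
        bands[index].append(abs(p - a))
    return [sum(b) / len(b) if b else None for b in bands]


# Invented data, for illustration only: low, middle, and high scorers.
actual = [580, 600, 700, 720, 850, 900]
projected = [590, 595, 690, 725, 860, 880]
by_band = mean_abs_error_by_band(projected, actual, band_edges=[650, 800])
print(by_band)
```

A formula that tracks run-scoring well should show similar figures in every band; in this invented sample the top band's average error is double that of the others, the pattern the paragraph above warns about.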

Another important thing is that the errors in the Owlcroft formula are essentially symmetrical: they do not, as so many other formulae's results do, skew toward over- or under-estimating, which is another marker of whether or not a given formula is tracking the real mechanisms of run-scoring.

(For those who might want to see the data in tabular form, we have provided it on a separate page. It is separate owing to its length, and it may take some time to download fully on a slow connection.)




All content copyright © 2002 - 2019 by The Owlcroft Company.


This page was last modified on Sunday, 9 August 2015, at 8:51 pm Pacific Time.