Analysis & This Site

“A new scientific truth does not triumph by convincing its opponents and making them see the light, but rather because its opponents eventually die and a new generation grows up that is familiar with it.”

– Scientific autobiography,
Max Planck

Quick page jumps:

Baseball Analysis: An Overview
Baseball Analysis: Some History
Where Next?

Baseball Analysis: An Overview

An Important Note

When this site was first established, now over two decades ago, few knew about baseball analysis, so that explanations were needed about what it is and how it works. Today, we have the reverse problem: “analytics” is a word that falls trippingly from every tongue—but “analytics” and analysis are very different things.

“Analytics” is a broad umbrella under which lie a great many metrics; but only a few of those are actual analysis. “Analytics” include such things as spin rate on pitches and launch angle on batted balls. Those sorts of things are valuable information, some very much so and others not so much so: but such metrics are essentially coaching tools. They measure mechanical things, and those measures can be quite useful in working with players to increase their performance value. But true analysis is concerned with what we may call the end product: reckoning how individual men’s actual performances contribute to (as appropriate) their team’s run scoring or run preventing; how a team’s stats indicate what their runs scored and allowed are most likely to be; and how team runs scored and runs allowed predict probable team win totals.

That’s it really: just two formulas or equations or whatever one wants to call them: expected wins for a given Runs-Scored/Runs-Allowed pair, and expected runs scored or runs allowed from a set of familiar, commonly available count stats.
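To give a concrete feel for the first of those two formulas, here is a minimal sketch using the widely known "Pythagorean" expectation usually credited to Bill James. It is a stand-in for illustration only: the source does not say that this site's own expected-wins formula takes this form, and the exponent of 2, the 162-game schedule, and the sample team totals are all assumptions.

```python
def pythagorean_win_pct(runs_scored: float, runs_allowed: float,
                        exponent: float = 2.0) -> float:
    """Expected winning percentage from a runs-scored / runs-allowed pair.

    Classic Pythagorean form: RS^x / (RS^x + RA^x). The exponent of 2 is
    the traditional choice and is an assumption here.
    """
    if runs_scored < 0 or runs_allowed < 0:
        raise ValueError("run totals cannot be negative")
    if runs_scored == 0 and runs_allowed == 0:
        raise ValueError("at least one run total must be positive")
    rs = runs_scored ** exponent
    ra = runs_allowed ** exponent
    return rs / (rs + ra)


def expected_wins(runs_scored: float, runs_allowed: float,
                  games: int = 162, exponent: float = 2.0) -> float:
    """Expected wins over a schedule of `games` games (162 is assumed)."""
    return games * pythagorean_win_pct(runs_scored, runs_allowed, exponent)


# Hypothetical team totals, invented for illustration only.
print(f"{expected_wins(800, 700):.1f} expected wins")
```

A team that scores exactly as many runs as it allows comes out at a .500 expectation, which is a quick sanity check on any formula of this kind.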

(Mind, there are also strategic and tactical evaluations that are analytic, such as the break-even success rate for base stealing—about 73%—but those all derive from uses of the runs-scored formula.)
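The break-even idea can be sketched in a few lines. A steal attempt breaks even when the expected runs gained by succeeding exactly balance the expected runs lost by failing, so the break-even rate is loss / (gain + loss). The run-expectancy figures below are illustrative placeholders, not values taken from this site or from any particular season's data.

```python
def break_even_rate(re_before: float, re_success: float,
                    re_failure: float) -> float:
    """Success rate at which an attempt neither gains nor loses expected runs.

    Solves p * gain = (1 - p) * loss for p, where gain and loss are the
    changes in run expectancy after a success and a failure respectively.
    """
    gain = re_success - re_before
    loss = re_before - re_failure
    if gain <= 0:
        raise ValueError("a success must raise run expectancy")
    if loss < 0:
        raise ValueError("a failure must not raise run expectancy")
    return loss / (gain + loss)


# Placeholder run expectancies (assumed, for illustration only):
RE_RUNNER_ON_FIRST_NO_OUTS = 0.86   # state before the attempt
RE_RUNNER_ON_SECOND_NO_OUTS = 1.10  # state after a successful steal
RE_BASES_EMPTY_ONE_OUT = 0.26       # state after being caught

rate = break_even_rate(RE_RUNNER_ON_FIRST_NO_OUTS,
                       RE_RUNNER_ON_SECOND_NO_OUTS,
                       RE_BASES_EMPTY_ONE_OUT)
print(f"break-even success rate: {rate:.1%}")
```

With these placeholder inputs the result lands in the low 70s of percent, in the neighborhood of the figure quoted above; real run-expectancy tables differ by era and by base-out situation, so the exact number will move.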

The basics of true analysis having been largely settled matters for many years now, latecomers to the analysis field, finding the essential ground already well covered, have conjured a myriad of other supposedly analytic measures, to no help and, too often, to some harm to a good understanding of baseball. There is, for just one example, the so-called “WAR” (“Wins Above Replacement”) measure; aside from the nonsensicality of comparing a man’s performance to some imaginary “replacement” player, and of measuring not stat rates but accumulated stats (like judging a race car not by its speed capability but by the miles it has covered), there is the fact that different sources produce different numbers. Really!

“WAR is necessarily an approximation and will never be as precise or accurate as one would like.”
— Baseball-Reference.com WAR Explained.

The whole essence of true analysis is to generate definite real-world numbers—runs scored, games won—not more fodder for tavern arguments.

We believe that the measures used on this site—TOP (Total Offensive Productivity) for batting and TPP (Total Pitching Productivity) for pitching—are the most accurate available (the TOP has, over some 71 years of baseball data, an average runs-scored error rate of a mere 2.36%); but there are others of value (Base Runs, for example). Farther on, we explain and discuss how such measures are derived, what “expected” means, and a lot more. But only those two measures—expected wins from R|OR data and expected runs scored (or allowed, as appropriate) from familiar, commonly available count stats—are what one can rightly call “analysis”. (We wish analysis and analytics could be recognized as distinguishable terms, but that ship seems to have sailed.)

We now return you to your regularly scheduled web site.

What Kind of Baseball Information Have We Here?

This site is not a baseball “statistics” site as such—the internet (and, for that matter, your daily newspaper) can supply you with tons of raw baseball statistics. This site is about analyzing baseball statistics to extract deep and powerful measures showing what batters, pitchers, and teams are really doing toward scoring and giving up runs and—the bottom line—winning baseball games. We know how important those special measures are because this was our business for quite a long time.

There are several extensive introductory articles in the pages of this site, and those are where you will find the detailed explanations of the principles and practices that underlie the baseball measures we present on our main display pages. Farther down this page we will give you links to those explanatory essays, but first we will summarize the highlights.

It has been recognized for nearly three-quarters of a century now that the winning of baseball games, and the underlying constituents of winning—the scoring and yielding of runs—are processes capable of numerical analysis using the statistics of the game as raw materials. To put it simply, we can take just a few familiar team stats—at-bats, hits, walks, and total bases are one such set—and (assuming these are from some reasonable number of games) predict with striking accuracy a team’s runs-scored total for those games. And, of course, what we can do for batters scoring runs we can turn around and do for pitchers yielding runs. Moreover, given team runs-scored and runs-yielded figures, we can predict with similarly striking accuracy how many of its baseball games a team will have won.

(We’re not going to explain here the how and why of those things, because we do that at length on other pages of the site, to which we will refer you farther on.)
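For readers who want something concrete in the meantime: one well-known estimator that uses exactly the four stats named above (at-bats, hits, walks, and total bases) is Bill James's basic Runs Created. The sketch below shows it only as an example of the genre; it is not the TOP formula used on this site, and the team line is hypothetical.

```python
def runs_created(at_bats: int, hits: int, walks: int,
                 total_bases: int) -> float:
    """Basic Runs Created: (H + BB) * TB / (AB + BB).

    The first factor counts times on base, the second measures advancement,
    and the denominator is the number of opportunities.
    """
    opportunities = at_bats + walks
    if opportunities <= 0:
        raise ValueError("need at least one at-bat or walk")
    return (hits + walks) * total_bases / opportunities


# Hypothetical team season line, invented for illustration only.
team = {"at_bats": 5500, "hits": 1400, "walks": 550, "total_bases": 2300}
print(f"estimated runs scored: {runs_created(**team):.0f}")
```

The same function can be turned around for pitching by feeding it the at-bats, hits, walks, and total bases a staff has allowed.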

And we can go a step farther—a very, very important step. Using the same stats we would use to analyze a team, we can analyze a single man—batter or pitcher. The results we get are best thought of as the runs that would be scored or given up, as appropriate, by a lineup or a pitching staff made up of exact clones of that man. With that kind of information, we can put together the performance stats for the individual men on a baseball team and derive projected team-total values—and from them, calculate games-won values.

In other words, given the identities of the men on a baseball team, we can forecast with substantial accuracy how many games that team will win. (But that does not mean that, for example, we can win lots of money at a baseball sports book, because no one can know in advance how much playing time a given manager will give a specific player, nor who will get injured for how long, nor what trades a team may make when.) But we can say quite exactly just how much a given batter or pitcher is (or is not) doing to help his team win baseball games; and those measures are not some arbitrary relative “rankings”: they are absolute numbers that can be combined to give familiar and important real-world values (notably runs and wins) for an entire baseball team.
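The "lineup of clones" idea and the combining of individual lines into a team total can both be sketched briefly. The code below uses Bill James's basic Runs Created as a stand-in estimator (it is not this site's TOP), approximates a batter's outs as at-bats minus hits (ignoring caught stealing, double plays, and sacrifices), and uses invented player lines; all of those are assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Iterable

OUTS_PER_GAME = 27  # nine innings of three outs


@dataclass(frozen=True)
class BattingLine:
    """Count stats for one batter (or one team) over some span of games."""
    at_bats: int
    hits: int
    walks: int
    total_bases: int

    def __add__(self, other: "BattingLine") -> "BattingLine":
        return BattingLine(
            self.at_bats + other.at_bats,
            self.hits + other.hits,
            self.walks + other.walks,
            self.total_bases + other.total_bases,
        )


def runs_created(line: BattingLine) -> float:
    """Basic Runs Created, used here only as a stand-in estimator."""
    opportunities = line.at_bats + line.walks
    if opportunities <= 0:
        raise ValueError("need at least one at-bat or walk")
    return (line.hits + line.walks) * line.total_bases / opportunities


def clone_lineup_runs_per_game(line: BattingLine) -> float:
    """Runs per 27 outs for a lineup made up entirely of this batter."""
    outs = line.at_bats - line.hits  # assumption: ignores CS, DP, sacrifices
    if outs <= 0:
        raise ValueError("need at least one out to scale to a full game")
    return runs_created(line) * OUTS_PER_GAME / outs


def team_line(lines: Iterable[BattingLine]) -> BattingLine:
    """Sum individual lines into one team-total line."""
    total = BattingLine(0, 0, 0, 0)
    for line in lines:
        total = total + line
    return total


# Hypothetical players, invented for illustration only.
slugger = BattingLine(at_bats=550, hits=165, walks=60, total_bases=270)
utility = BattingLine(at_bats=300, hits=72, walks=20, total_bases=100)

print(f"{clone_lineup_runs_per_game(slugger):.2f} runs per game from nine sluggers")
print(f"{runs_created(team_line([slugger, utility])):.1f} runs from the combined line")
```

Summing the count stats first and then applying the estimator once to the team line reflects the point that these are absolute numbers that can be combined, rather than rankings.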

Now as anyone remotely familiar with baseball knows, one of the crucial problems in comparing men’s abilities and performances is that the players throughout major-league baseball are not competing on the proverbial “level playing field.” A given batter playing in, for example, Coors Field would compile very different statistics than he would playing in, again for example, Oracle Park (or whatever they may be calling it this year), even though his actual abilities would be the same.

The 30 quite different baseball parks affect statistics 30 different ways. Nowadays you can find many places that purport to show such park effects on each baseball statistic, often to several decimal places. We here, many years ago, used to calculate, and use, such correction factors to generate “park-neutral” results, but we have stopped doing so. The reasons we stopped are given at some length in our page on park effects, but in essence it has, in our firmly held opinion, become virtually impossible to get even moderately accurate park-effect results, chiefly owing to two things. For one, parks now have results-affecting structural changes made with striking frequency (you might be surprised at how little it takes to have some effect), such that “historical” data (even as recent as last year’s) is too often meaningless; and, even if last year’s data is usable, it is nowadays unlikely that several seasons’ worth of commensurable data exist, meaning we’d be working with an undesirably small data sample. Not to even mention things like retractable roofs that are actually opened and closed (which can have a considerable effect on how the park “plays”) during ongoing ball games, sometimes even in the middle of an inning.

(And while we speak of “changes”, besides those there have been wholly new ballparks coming on line at something close to one a season for a long time now, and that trend may well continue for a while.)

And, for another thing, the vagaries of scheduling have made it hard (even putting aside those changes in the parks) to get a set of normalizable data representing some sort of standard against which to compare a given park. The old technique was home versus away data, but “away” now represents drastically different combinations of parks (something interleague play has further corrupted) from team to team. We prefer to present results that are frankly unadjusted rather than results “adjusted” by methods that yield deceptive precision but horrid accuracy; one can make crude, rule-of-thumb mental adjustments to data from the more extreme parks, and that’s about as good as applying any of those deceptive published factors.

Correcting for park-specific biases is not the only normalization applicable to baseball stats. Another factor is the undoubtable and major “juicing” that the baseball itself has undergone at several points, some quite recent (there is substantial discussion of that at the page linked in this sentence). That frivolous tinkering with the ball means that stats from one era to another, sometimes even one season to the next, are not truly commensurable. When there were only two eras applicable to then-contemporary players (pre-1993 and post-1993), we could and did make adjustments to the raw stats; but that would now (owing to there being several “eras”, many quite short) be a massive job involving some subjectivity in exactly identifying an “era”, so (though we may change our minds later) we for now do not adjust for the ball, either. Just review the linked page and keep in mind those big changes.

Going back for a moment to the issue of predicting runs scored or runs yielded from ordinary stats: we want to emphasize the special significance of such measures for pitchers. The usual measure of pitching performance, the ERA, is actually in many ways a better measure of pitchers than any one conventional offensive stat is for batters; but the fact remains that the ERA nevertheless has several severe defects. One is that it can be influenced, often strongly so, by factors not well within any pitcher’s control. Such influences—in simplest terms, “luck”—will more or less average out over the long run, but, for pitchers much more so than for batters, the “long run”, even for a starter, can be a good bit more than one full season. And for relievers, who often come into a game with one or two men already out and don’t stay in long, the ERA is nearly meaningless. The measure that we calculate, on the other hand, shows the actual quality level at which any given pitcher (or staff) is really performing. In the long run, it and the ERA will eventually come into pretty close agreement, but, as someone once remarked, “in the long run, we are all dead”; in the short to medium run, our figure is a much more important and accurate measure of pitching quality than anything else now available.

How Is All This Baseball Information Presented?

The information pages on this site are quite numerous, but the way they are laid out is (we hope and believe) simple enough. You should have no trouble picking it up from the colored “zones” in the drop-down menu accessible from the little “hamburger” logo in the upper right corner of every page here.

(There is also an extensive explanation of what is on this site given on the site Home Page; if you haven’t yet read it, you should stop here and follow the link back to it.)

Besides the complete player and pitcher listings, we also include a page with lists for all “regular” pitchers and batters—all those meeting a minimum-playing-time criterion (in plate appearances or batters faced, as appropriate, with starters and relievers considered separately), which minimum changes daily through the season. Using those lists, one can see at once how the more significant players and pitchers compare all across major-league baseball. We also segregate out by-position lists for position players. Most of those various lists are presented both alphabetically and in a by-performance sorting.

Perhaps chiefest in importance of all the results pages on this site, we have a Team-Performance page on which the performance of each of the 30 major-league baseball teams as a unit is shown in terms of both actual and projected runs scored, runs allowed, and games won. That page may be the chiefest here because it shows how well each team is currently playing in terms of how many games they appear aimed (by the quality of their play, not their actual record so far) toward winning on the season, a very simple, easily understood datum. (There is a separate explanation page here about that Table, to help make its many columns readily comprehensible.)

How Is This Site Operated?

We try to keep the baseball stats on these pages up to date daily, but please understand that this site is a sideline for us. That also means that we may not be able to respond individually to all e-mail…but be assured that we will read it all! (There is a click-on link atop every page here from which to email us, but we’ll just repeat it here: email us.)

As a rule, updating will happen sometime around 8:00 a.m. Pacific (11:00 a.m. Eastern); but that is not guaranteed. Our software creates and posts the updates automatically, but those procedures can fail if there is some sort of defect in the raw data we begin with, and we may not discover and be able to repair such failures till later in the day, if at all. (Our raw data comes from Baseball-Reference.com, which we recommend as a resource; we further recommend that you sign up for their “Stathead” premium service.) Check the “through games of” notice on each of our data pages to be sure of what you’re seeing. (If the software detects any data problems, it leaves the previous day’s pages in place.)

Even after decades of operation, this entire site is always under development. Just as with the old beat writers’ observation, “You come out to the ballyard every day for twenty years, and every day you see something you never saw before,” so with this site; the only constant is change. And if you miss something you would like to see, or see something you would as soon miss, please…let us know: use that “email us” page-top link. Take some time, explore the site, then send us your thoughts. And, again, thank you both for visiting here and for any feedback.

One other thing: all pages on this web site have been third-party verified as being 100% fully and strictly compliant with the latest official written HTML and CSS standards. You can verify that with the click-ons at the bottom of this (and every) page.

Baseball Analysis: Some History

Since that joyous day in 1846 when baseball was played for the very first time—on the grounds called by the cosmically appropriate name The Elysian Fields—followers of the art, both amateur and professional, have been seeking measures of player and team performance. There is a fine summary of that search in Alan Schwarz’s book The Numbers Game (which we recommend), so we won’t recount it at length here.

(Though it is conventional to say that baseball “began” in 1846 at the Elysian Fields, the truth is that, as Wikipedia tells us, “There were countless baseball clubs and games played during the 1830s, if not earlier…and the first rules were drawn up by the Gotham Club of New York in 1837.” But, as they further add, “[the Elysian Fields has] historic standing for its pivotal role in the early game as it evolved from a pleasant leisure time pursuit to a highly competitive—and commercial—spectator sport.”)

Let it suffice to say that a myriad of baseball performance measures have been proposed, and the count grows daily; and the adherents of each proposed tool have claimed for it various reasons why it is important. Till relatively modern times, all such tools (the good and the bad alike) were relative measures: they compared, in one or another way or ways, one man with another or one team with another. They were excellent conversation starters but little else. It was not till the middle of the twentieth century that a serious and sustained effort was begun to derive some sort of absolute baseball measures: ones that would allow calculation of actual, real-world values of importance (such as games won, the ultimate measure of importance).

In the August 2, 1954, issue of Life magazine, Branch Rickey, possibly the finest mind ever to grace baseball management, set forth a formula (developed with the aid of mathematicians from M.I.T.) that would, in a crude way, predict how many games a team would win based on various commonly available team statistics. In 1964, a Johns Hopkins professor named Earnshaw Cook put out a book titled Percentage Baseball, wherein he derived—by using arcane mathematical methods (stochastic analysis)—criteria for both teams and individual players that were reasonably successful absolute measures. (He later followed up that work using the then-new computer, and wrote the aptly titled Percentage Baseball and the Computer.)

Rather later on, the highly articulate and provocative writing style of Bill James propelled these forms of analysis into the consciousness of the general baseball public, and even the ranks of baseball management. Nowadays, everybody and his cousin is producing both books and proprietary measures with which to fill the pages of those books. While a few of these are good and several are awful (no names—you know who you are, or you should), the point of greatest interest to the public is that almost all of these books and measures, no matter how differently expounded, reach pretty much the same results. (Here is one example, and here’s another.) Elsewhere on this site, we present a list of some of the most interesting baseball books out there now.

That consensus of results will, of course, tell the thoughtful one vital thing: this is now a science. Theoretical physicists still argue, with some heat, over small and abstruse side issues about Einstein’s principles; but those debates are of little consequence to anybody besides the participants, whereas nuclear power plants—and weapons—are an everyday fact of life. So in baseball: the practitioners of analysis still quibble and squabble over the minor arcana, but on the whole no reasonable person (which, regrettably, omits many professionally associated with the sport) doubts any more that analysis is the only basis for a true understanding of the inner workings of the subtle and wonderful game of baseball.

(Mind, the question of exactly what “analysis” is needs discussion, because—as noted atop this page—even in this day of heavy use of “analytics” there are some significant misunderstandings about that.)

Where Next?

Having come this far on the site, you may be wondering where to look next. We recommend that before you cut to the daily data pages, you read some more about how analysis works in general, and how we apply it in particular, so that the stats we present will have meaning for you.

The logical next page, then, is the one on Some Baseball Analysis Theory.

This page was last modified on Saturday, 26 October 2024, at 4:32 pm Pacific Time.

The Owlcroft Baseball-Analysis Site
