"A new scientific truth does not triumph by convincing its opponents and making them see the light,
but rather because its opponents eventually die and a new generation grows up that is familiar with it."

—Max Planck

Baseball Analysis: An Overview

What Kind of Baseball Information Have We Here?

This site is not a baseball "statistics" site as such—the internet (and, for that matter, your daily newspaper) can supply you with tons of raw baseball statistics. This site is about analyzing baseball statistics to extract deep and powerful measures showing what batters, pitchers, and teams are really doing toward scoring and giving up runs and—the bottom line—winning baseball games. We know how important those special measures are because this was our business for quite a long time.

There are several extensive introductory articles in the pages of this site, and those are where you will find the detailed explanations of the principles and practices that underlie the baseball measures we present on our main display pages. Farther down this page we will give you links to those explanatory essays, but first we will just summarize the highlights.

It has been recognized for well over half a century now that the winning of baseball games, and the underlying constituents of winning—the scoring and yielding of runs—are processes capable of numerical analysis using the statistics of the game as raw materials. To put it simply, we can take just a few familiar team stats—at-bats, hits, walks, and total bases are one such set—and (assuming these are from some reasonable number of games) predict with striking accuracy a team's runs-scored total for those games. And, of course, what we can do for batters scoring runs we can turn around and do for pitchers yielding runs. Moreover, given team runs-scored and runs-yielded figures, we can predict with similarly striking accuracy how many of its baseball games a team will have won.

(We're not going to explain here how and why those things are so, because we do that at length on other pages of the site, to which we will refer you farther on.)
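
As a rough illustration only (the pages referred to above give the site's actual methods, which are not reproduced here), the sketch below uses two well-known public formulas as stand-ins: Bill James's basic Runs Created, which happens to take exactly the four team stats named above, and his "Pythagorean" winning percentage with an exponent of 2. The function names and the sample numbers are made up for the example.

```python
def runs_created(at_bats, hits, walks, total_bases):
    """Basic Runs Created (Bill James): (H + BB) * TB / (AB + BB).

    Used here as a stand-in; the site's own run estimator may differ.
    """
    return (hits + walks) * total_bases / (at_bats + walks)


def pythagorean_win_pct(runs_scored, runs_allowed, exponent=2.0):
    """Estimated winning percentage from runs scored and runs allowed."""
    rs = runs_scored ** exponent
    ra = runs_allowed ** exponent
    return rs / (rs + ra)


# Made-up season totals for a hypothetical team.
estimated_runs = runs_created(at_bats=5500, hits=1450, walks=550, total_bases=2300)
estimated_wins = 162 * pythagorean_win_pct(estimated_runs, runs_allowed=700)
print(f"estimated runs scored: {estimated_runs:.0f}")
print(f"estimated wins over 162 games: {estimated_wins:.1f}")
```

The same runs_created function can be fed the hits, walks, and total bases a pitching staff allowed to estimate runs yielded, which is the "turn around" described above.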

And we can go a step further—a very, very important step. Using the same stats we would use to analyze a team, we can analyze a single man, batter or pitcher. The results we get may best be thought of as the runs that would be scored or given up by, as appropriate, a lineup or a pitching staff made up of exact clones of that man. With that kind of information, we can put together the performance stats for the individual men on a baseball team and derive projected team-total values—and from them, calculate games-won values. In other words, given the identities of the men on a baseball team, we can forecast with substantial accuracy how many games that team will win. (But that does not mean that, for example, we can win lots of money at a baseball sports book, because no one can know in advance how much playing time a given manager will give a specific player, nor who will get injured for how long, nor what trades a team may make when.) But we can say quite exactly just how much a given batter or pitcher is (or is not) doing to help his team win baseball games—and those measures are not some arbitrary relative "rankings": they are absolute numbers that can be combined to give familiar and important real-world values (like runs scored) for an entire baseball team.
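
The "lineup of clones" idea can be sketched as follows. This is a simplified illustration, not the site's calculation: it borrows Bill James's basic Runs Created as the run estimator, treats every hitless at-bat as an out (ignoring caught stealing, double plays, and the like), and assumes 27 outs per game. The BattingLine type and all numbers are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class BattingLine:
    at_bats: int
    hits: int
    walks: int
    total_bases: int


def runs_created(line):
    """Basic Runs Created, used here as a stand-in run estimator."""
    return (line.hits + line.walks) * line.total_bases / (line.at_bats + line.walks)


def clone_lineup_runs_per_game(line, outs_per_game=27):
    """Runs per game for a lineup made up of nine copies of one batter."""
    outs = line.at_bats - line.hits  # simplification: every hitless at-bat is an out
    return runs_created(line) / outs * outs_per_game


def combine(lines):
    """Add individual lines into one team line so the same estimator applies."""
    return BattingLine(
        at_bats=sum(l.at_bats for l in lines),
        hits=sum(l.hits for l in lines),
        walks=sum(l.walks for l in lines),
        total_bases=sum(l.total_bases for l in lines),
    )


# Two invented batters.
regular = BattingLine(at_bats=600, hits=180, walks=70, total_bases=300)
reserve = BattingLine(at_bats=300, hits=75, walks=20, total_bases=100)

print(f"clone lineup, runs per game: {clone_lineup_runs_per_game(regular):.2f}")
print(f"combined line, estimated runs: {runs_created(combine([regular, reserve])):.1f}")
```

Summing the raw stats and then applying the estimator to the total is only one way to combine players; it is chosen here because it reuses the team-level formula unchanged.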

Now as anyone remotely familiar with baseball knows, one of the crucial problems in comparing men's abilities and performances is that the players throughout major-league baseball are not competing on the proverbial "level playing field." A given batter playing in, for example, Coors Field would compile very different statistics than he would playing in, again for example, AT&T Park (or whatever they may be calling it this year), even though his actual abilities would be the same.

The 30 quite different baseball parks affect statistics 30 different ways. Nowadays you can find many places that purport to show such park effects on each baseball statistic, often to several decimal places. We here used to calculate, and use, such correction factors to generate "park-neutral" results, but we have stopped doing so. The reasons we stopped are given at some length in our page on park effects, but in essence it has become virtually impossible, in our opinion, to get even moderately accurate results, chiefly owing to two things. For one, parks now have results-affecting structural changes made with striking frequency (you might be surprised at how little it takes to have some effect), such that "historical" data—even as recent as last year's—is too often meaningless; and, even if last year's data is usable, it is nowadays unlikely that several seasons' worth of commensurable data exist, meaning we'd be working with an undesirably small data sample. And that is not even to mention things like retractable roofs that are actually opened and closed during ongoing ball games (which can have a considerable effect on how the park "plays").

(And while we speak of "changes", besides those there have been wholly new ballparks coming on line at something close to one a season for a long time now, and that trend may well continue for a while.)

And for another thing, the vagaries of scheduling have made it hard (even putting aside those changes in the parks) to get a set of normalizable data representing some sort of standard against which to compare a given park. The old technique was home versus away data, but "away" now represents drastically different combinations of parks from team to team, something interleague play has further corrupted (as by use or non-use of the DH Rule). We prefer to present results that are frankly unadjusted rather than results "adjusted" by methods that yield deceptive precision but horrid accuracy; one can make crude, rule-of-thumb mental adjustments to data from the more extreme parks, and that's about as good as applying any of those deceptive published factors.

Park-specific biases are not the only normalization applicable to baseball stats. Another factor is the undoubted and major "juicing" that the baseball itself underwent sometime in 1993: results from 1977 (when the brand of ball was changed) through 1992 are simply not commensurable with results from 1994 on (with 1993 as a bizarre in-between thing of its own). In years past, we used to apply a correction for that change; now, since stats from 1993 and earlier no longer show in anyone's résumé, there is no further need for such historical corrections. But the phenomenon is worth remembering for those interested in historical evaluations (and discussions of the fictitious, factitious "steroids era").

Going back for a moment to the issue of predicting runs scored or runs yielded from ordinary stats: we want to emphasize the special significance of such measures for pitchers. The usual measure of pitching performance, the ERA, is actually in many ways a better measure of pitchers than any one conventional offensive stat is for batters; but the fact remains that the ERA nevertheless has several severe defects. One is that it can be influenced, often strongly so, by factors not well within any pitcher's control. Such influences—in simplest terms, "luck"—will more or less average out over the long run, but, for pitchers much more so than for batters, the "long run," even for a starter, can be a good bit more than one full season. And for relievers, who often come into a game with one or two men already out and don't stay in long, the ERA is nearly meaningless. The measure that we calculate, on the other hand, shows the actual quality level at which any given pitcher (or staff) is really performing. In the long run, it and the ERA will eventually come into pretty close agreement, but, as someone once remarked, "in the long run, we are all dead"; in the short to medium run, our figure is a much more important and accurate measure of pitching quality than anything else now available.
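
To make the contrast concrete, here is a hedged sketch that sets a pitcher's ERA beside a component-based estimate built from the hits, walks, and total bases he allowed. The component figure uses Bill James's basic Runs Created as a stand-in (the site's actual pitching measure is explained on its other pages), it estimates all runs rather than earned runs only, and the sample line is invented.

```python
def era(earned_runs, outs_recorded):
    """Earned run average: earned runs per nine innings (27 outs)."""
    return 27 * earned_runs / outs_recorded


def component_runs_per_nine(at_bats, hits, walks, total_bases, outs_recorded):
    """Runs per nine innings implied by what opposing batters did.

    Stand-in estimator: basic Runs Created applied to stats allowed.
    """
    estimated_runs = (hits + walks) * total_bases / (at_bats + walks)
    return 27 * estimated_runs / outs_recorded


# Invented pitching line: 200 innings (600 outs), 70 earned runs allowed.
print(f"ERA: {era(70, 600):.2f}")
print(f"component runs per nine: {component_runs_per_nine(750, 180, 50, 280, 600):.2f}")
```

A gap between the two numbers is the kind of short-run divergence the paragraph above describes; which of them better reflects the pitcher depends on the quality of the estimator, which this sketch does not claim to settle.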

How Is All This Baseball Information Presented?

The way the information pages on this site are laid out is simple enough. You should have no trouble at all picking it up from the colored Site Directory roll-over menu blocks at the top of every page of this site.

(There is also an extensive explanation of what is on this site given on the site Home Page; if you haven't yet read it, you should stop here and follow the link back to it.)

Besides the complete player and pitcher listings, we also include a page with lists for all "regular" pitchers and batters—all those meeting a minimum-playing-time criterion (in plate appearances or batters faced, as appropriate, with starters and relievers considered separately), which minimum changes daily through the season. Using those lists, one can see at once how the more significant players and pitchers compare all across major-league baseball.
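
A daily-changing minimum of this kind can be sketched as a per-game rate multiplied by the games a team has played so far. The rate below is an assumption for illustration: 3.1 plate appearances per team game is the familiar major-league batting-title qualifier, while the site's own thresholds (including its separate ones for starters and relievers) are not stated here. The player names and totals are hypothetical.

```python
PA_PER_TEAM_GAME = 3.1  # assumed rate, not necessarily the site's


def minimum_plate_appearances(team_games_played, rate=PA_PER_TEAM_GAME):
    """Playing-time floor that grows as the season goes on."""
    return round(team_games_played * rate)


def is_regular(plate_appearances, team_games_played, rate=PA_PER_TEAM_GAME):
    """True if a batter meets the minimum for his team's games to date."""
    return plate_appearances >= minimum_plate_appearances(team_games_played, rate)


# Hypothetical plate-appearance totals after 60 team games.
batters = {"Batter A": 230, "Batter B": 95}
regulars = [name for name, pa in batters.items() if is_regular(pa, team_games_played=60)]
print(regulars)
```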

Perhaps chiefest in importance of all the results pages on this site, we have a Team-Performance page on which the performance of all 30 major-league baseball teams as units is shown in terms of both actual and projected runs scored, runs allowed, and games won. That page may be the chiefest here because it shows how well each team is currently playing in terms of how many games they appear aimed—by the quality of their play, not their actual record so far—toward winning on the season, a very simple, easily understood datum. (There is a separate explanation page here about that Table, to help make its many columns readily comprehensible.)

How Is This Site Operated?

We try to keep the baseball stats on these pages up to date daily, but please understand that this site is a sideline at a pretty busy place of business. That also means that we may not be able to respond individually to all e-mail…but be assured that we will read it all! As a rule, updating will happen around 6:00 a.m. Pacific Time (9:00 a.m. Eastern Time), but that is not guaranteed. Our software creates and posts the updates automatically, but those procedures can fail if, as is often the case, there is a defect in the raw data we begin with, and we may not discover and be able to repair such failures till later in the day, if at all. It is also by no means uncommon for our supposedly professional sources of raw baseball data to fail to post their information on time. Check the "through games of" notice on each of our data pages to be sure of what you're seeing. (If the software detects bad raw data, it leaves the previous day's pages in place.)
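
The "leave the previous day's pages in place" behavior can be sketched as a validate-then-replace step. This is a minimal illustration under stated assumptions, not a description of the site's software: the looks_valid check, the row fields, and the publish_page function are all hypothetical.

```python
import os
import tempfile
from pathlib import Path


def looks_valid(rows):
    """Hypothetical sanity check: some rows exist and hits never exceed at-bats."""
    return bool(rows) and all(
        "at_bats" in r and "hits" in r and 0 <= r["hits"] <= r["at_bats"]
        for r in rows
    )


def publish_page(rows, render, target: Path) -> bool:
    """Replace the target page only if the raw data passes validation.

    Returns False, leaving the existing page untouched, when the data looks bad.
    """
    if not looks_valid(rows):
        return False
    # Write the new page beside the old one, then swap it in with one rename,
    # so readers never see a half-written file.
    fd, tmp_name = tempfile.mkstemp(dir=target.parent, suffix=".tmp")
    with os.fdopen(fd, "w", encoding="utf-8") as tmp:
        tmp.write(render(rows))
    os.replace(tmp_name, target)
    return True
```

Writing to a temporary file in the same directory and renaming it over the old page is a common way to make sure a failed or interrupted update cannot damage what is already published.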

The data and measures were formerly only for the current major-league season. Starting with 2009, we have made available full career results—both season-by-season and cumulative—for each man appearing on a major-league roster during the current season; those pages include the current season's data in the cumulative career numbers, so those change every day, just as all the other results pages do. (But we do not, at this time, have or plan on minor-league stats, though methods exist to convert them with reasonable reliability into major-league equivalencies; perhaps some other year…) Those career pages are not separately listed anywhere: you access them by clicking on a man's name wherever you find it on one of the regular listings—by team, by position or role, whatever.

Even after years of operation, this entire site is always under development. Just as with the old beat writers' observation, "You come out to the ballyard every day for twenty years, and every day you see something you never saw before", so with this site; the only constant is change. And if you miss something you would like to see, or see something you would as soon miss, please…let us know. There is a simple click-on "email me" link atop every page of this site to make it easy for you. Take some time, explore the site, then send us your thoughts. And, again, thank you both for visiting here and for any feedback.

One other thing: all pages on this web site have been third-party verified as being 100% fully and strictly compliant with the latest official written standards (currently version 1.0 "Transitional") for Extensible HyperText Markup Language ("XHTML"). That means that any properly designed web browser should display these pages exactly as they were meant to be seen. If you experience any systemic trouble reading these pages, your web browser is likely to be, as with so many, including—notably including—the most famous names (can you spell "Micro$oft"?), ill-designed.

Baseball Analysis: Some History

Since that joyous day in 1845 when baseball was played for the very first time—on the grounds called by the cosmically appropriate name The Elysian Fields—followers of the art, both amateur and professional, have been seeking measures of player and team performance. There is a fine summary of that search in Alan Schwarz's book The Numbers Game, so we won't recount it at length here.

Let it suffice to say that a myriad of such baseball measures have been proposed, and the count grows daily; and the adherents of each proposed tool have claimed various reasons why it is important. Till relatively modern times, all such tools—the good and the bad alike—were relative measures: they compared, in one or another way or ways, one man with another or one team with another. They were excellent conversation starters but little else. It was not till the middle of the twentieth century that a serious and sustained effort was made to derive some sort of absolute baseball measure—one that would allow calculation of actual, real-world values of importance (such as games won, the ultimate measure of importance).

In the August 2, 1954, issue of Life magazine, Branch Rickey, possibly the finest mind ever to grace baseball management, set forth a formula (developed with the aid of mathematicians from M.I.T.) that would, in a crude way, predict how many games a team would win based on various commonly available team statistics. In 1964, a Johns Hopkins professor named Earnshaw Cook put out a book titled Percentage Baseball, wherein he derived—by using somewhat arcane mathematical methods (stochastic analysis)—criteria for both teams and individual players that were reasonably successful absolute measures.

More recently, the highly articulate and provocative writing style of Bill James propelled these forms of analysis into the consciousness of the general baseball public, and even the ranks of baseball management. Nowadays, largely because of the success of James' books, everybody and his cousin is producing both books and proprietary measures with which to fill the pages of those books. While a few of these are good and several are awful (no names—you know who you are, or you should), the point of greatest interest to the public is that almost all of these books and measures, no matter how differently expounded, reach pretty much the same results. (Here is one example, and here's another.) Elsewhere on this site, we present a list of some of the most interesting baseball books out there now, but you can also visit our Baseball "Library" to see thousands of popular baseball-related books (which you can buy there).

That consensus of results will, of course, tell the thoughtful reader one vital thing: this is now a science. Theoretical physicists still argue, with some heat, over small and abstruse side issues about Einstein's principles; but those debates are of little consequence to anybody besides the participants, whereas nuclear power plants—and weapons—are an everyday fact of life. So in baseball: the practitioners of analysis—which some like to call SABRemetrics (often mistakenly rendered as sabermetrics) after SABR, an excellent nonprofit research organization which has done outstanding work in restoring missing baseball records and developing standards for measuring performance, although analysis of the sort we discuss here was not their primary focus—still quibble and squabble over the minor arcana, but on the whole no reasonable person (which omits many professionally associated with the sport) doubts any more that analysis is the only basis for a true understanding of the inner workings of the subtle and wonderful game of baseball.

Where Next?

Having come this far on the site, you may be wondering where to look next. We recommend that before you cut to the daily data pages, you read some more about how analysis works in general, and how we apply it in particular, so that the stats we present will have meaning for you.

The logical next page, then, is the one on Analysis Basics.



