Analysis & This Site

An Important Note

When this site was first established, many years ago, few knew about baseball analysis, so explanations of what it is and how it works were needed. Today we have the reverse problem: “analytics” is a word that falls trippingly from every tongue, but “analytics” and analysis are very different things.

“Analytics” is a broad umbrella term covering a great many metrics, only a few of which are actual analysis. “Analytics” include such things as spin rate on pitches and launch angle on batted balls. Those sorts of things are valuable information, some highly so and others less so; but such metrics are essentially coaching tools. They measure mechanical things, and those measures can be quite useful in working with players to increase their performance value. True analysis, by contrast, is concerned with what we may call the end product: reckoning how individual men’s actual performances contribute to their team’s scoring of runs or prevention of runs (as appropriate), what runs scored and runs allowed a team’s stats indicate it should be expected to have, and how team runs scored and runs allowed contribute to team wins.

That’s really it: just two formulas (or equations, or whatever one wants to call them). One gives expected wins for a given runs-scored/runs-allowed pair; the other gives expected runs scored or runs allowed from a set of commonly available count stats.
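The first of those two formulas can be illustrated with one widely published form, Bill James’s Pythagorean expectation, W% = R^x / (R^x + OR^x). This is a stand-in for illustration, not necessarily the formula this site uses; the exponent (2 in James’s original form) is a parameter of the sketch, and the season totals in the usage lines are hypothetical.

```python
def expected_win_pct(runs_scored: float, runs_allowed: float, exponent: float = 2.0) -> float:
    """Pythagorean-style expected winning percentage (illustrative stand-in)."""
    if runs_scored < 0 or runs_allowed < 0:
        raise ValueError("run totals must be non-negative")
    if runs_scored == 0 and runs_allowed == 0:
        return 0.5  # no information either way
    rs = runs_scored ** exponent
    ra = runs_allowed ** exponent
    return rs / (rs + ra)


def expected_wins(runs_scored: float, runs_allowed: float, games: int, exponent: float = 2.0) -> float:
    """Scale the expected winning percentage to a number of games."""
    return games * expected_win_pct(runs_scored, runs_allowed, exponent)


if __name__ == "__main__":
    # Hypothetical season totals, for illustration only.
    print(round(expected_wins(800, 700, 162), 1))  # prints 91.8
```

A team that outscores its opponents 800 to 700 over 162 games would, by this formula, be expected to win about 92 of them.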

(Mind, there are also strategic and tactical evaluations that are analytic, such as the break-even success rate for base stealing—about 73%—but those all derive from uses of the runs-scored formula.)
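The break-even idea can be made concrete. This site derives its figure from its runs-scored formula; the sketch below instead uses the common run-expectancy framing, in which a steal attempt breaks even when p × RE(success) + (1 − p) × RE(caught) = RE(no attempt). The function name and the three run-expectancy inputs are hypothetical placeholders, not real data, so the result is not meant to reproduce the 73% figure.

```python
def break_even_steal_rate(re_no_attempt: float, re_success: float, re_caught: float) -> float:
    """Success rate at which a steal attempt neither gains nor costs expected runs.

    Solves p * re_success + (1 - p) * re_caught = re_no_attempt for p.
    Each input is the expected runs for the rest of the inning in that base-out state.
    """
    spread = re_success - re_caught
    if spread <= 0:
        raise ValueError("a successful steal must be worth more than being caught")
    return (re_no_attempt - re_caught) / spread


if __name__ == "__main__":
    # Hypothetical run-expectancy values, for illustration only.
    p = break_even_steal_rate(re_no_attempt=0.85, re_success=1.10, re_caught=0.25)
    print(f"break-even success rate: {p:.1%}")
```

Any base stealer whose success rate exceeds the computed value adds expected runs by running; below it, he costs them.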

The basics of analysis having been essentially settled matters for many years now, latecomers to the field, finding the essential ground already well covered, have conjured a myriad of other supposedly analytic measures that do nothing to help, and too often actually harm, a good understanding of baseball. Take, for just one example, the so-called “WAR” (“Wins Above Replacement”). Aside from the nonsensicality of comparing a man’s performance to some imaginary “replacement” player, and of measuring not stat rates but accumulated stats (like judging a race car not by its best speed but by miles covered), there is the fact that different sources produce different WAR numbers for the same player. Really! The whole essence of true analysis is to generate definite, real-world numbers (runs scored, games won), not more fodder for tavern arguments.

We believe that the measures used on this site, TOP (Total Offensive Productivity) for batting and TPP (Total Pitching Productivity) for pitching, are the most accurate available (across 70 years of baseball data, the TOP has an average runs-scored error rate of 2.30%), though there are others of value (Base Runs, for example). Farther on, we explain and discuss how such measures are derived, what “expected” means, and a lot more. But only two kinds of measure, expected wins from runs-scored/runs-allowed data and expected runs scored (or allowed, as appropriate) from commonly available count stats, can rightly be called “analysis”. (We wish analysis and analytics could be recognized as distinguishable terms, but that ship seems to have sailed.)
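The site does not spell out how its 2.30% “average error rate” is defined. One plausible reading, assumed here purely for illustration, is the mean absolute percentage error between predicted and actual team runs over many team-seasons. The function name and the numbers in the usage lines are hypothetical.

```python
from typing import Sequence


def mean_absolute_pct_error(predicted: Sequence[float], actual: Sequence[float]) -> float:
    """Average of |predicted - actual| / actual, expressed as a percentage."""
    if len(predicted) != len(actual) or not actual:
        raise ValueError("need two non-empty sequences of equal length")
    total = 0.0
    for p, a in zip(predicted, actual):
        if a == 0:
            raise ValueError("actual values must be non-zero")
        total += abs(p - a) / a
    return 100.0 * total / len(actual)


if __name__ == "__main__":
    # Hypothetical predicted vs. actual team runs, for illustration only.
    predicted_runs = [742.0, 688.5, 801.2]
    actual_runs = [755, 671, 790]
    print(f"average error: {mean_absolute_pct_error(predicted_runs, actual_runs):.2f}%")
```

Dividing by the actual total, rather than taking raw run differences, lets high-scoring and low-scoring eras be averaged on an equal footing.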

We now return you to your regularly scheduled web site.

What Kind of Baseball Information Have We Here?

This site is not a baseball “statistics” site as such—the internet (and, for that matter, your daily newspaper) can supply you with tons of raw baseball statistics. This site is about analyzing baseball statistics to extract deep and powerful measures showing what batters, pitchers, and teams are really doing toward scoring and giving up runs and—the bottom line—winning baseball games. We know how important those special measures are because this was our business for quite a long time.

There are several extensive introductory articles in the pages of this site, and those are where you will find the detailed explanations of the principles and practices that underlie the baseball measures we present on our main display pages. Farther down this page we will give you links to those explanatory essays, but first we will just summarize the highlights.

It has been recognized for something approaching three-quarters of a century now that the winning of baseball games, and the underlying constituents of winning—the scoring and yielding of runs—are processes capable of numerical analysis using the statistics of the game as raw materials. To put it simply, we can take just a few familiar team stats—at-bats, hits, walks, and total bases are one such set—and (assuming these are from some reasonable number of games) predict with striking accuracy a team’s runs-scored total for those games. And, of course, what we can do for batters scoring runs we can turn around and do for pitchers yielding runs. Moreover, given team runs-scored and runs-yielded figures, we can predict with similarly striking accuracy how many of its baseball games a team will have won.
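As an example of a runs formula built from exactly the four stats named above, here is Bill James’s basic Runs Created, RC = (H + BB) × TB / (AB + BB). It is shown as a well-known published formula, not as this site’s TOP method; the class name and the team totals in the usage lines are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class BattingLine:
    ab: int  # at-bats
    h: int   # hits
    bb: int  # walks
    tb: int  # total bases


def basic_runs_created(line: BattingLine) -> float:
    """Bill James's basic Runs Created: (H + BB) * TB / (AB + BB)."""
    opportunities = line.ab + line.bb
    if opportunities == 0:
        return 0.0
    return (line.h + line.bb) * line.tb / opportunities


if __name__ == "__main__":
    # Hypothetical team season totals, for illustration only.
    team = BattingLine(ab=5500, h=1450, bb=550, tb=2300)
    print(round(basic_runs_created(team), 1))  # prints 760.3
```

The structure is “getting on base” times “advancing runners”, divided by opportunities; more refined formulas add terms for stolen bases, double plays, and the like.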

And we can go a step further, a very, very important step. Using the same stats we would use to analyze a team, we can analyze a single man, batter or pitcher. The results we get may best be thought of as the runs that would be scored or given up by, as appropriate, a lineup or a pitching staff made up of exact clones of that man. With that kind of information, we can put together the performance stats for the individual men on a baseball team and derive projected team-total values, and from those calculate games-won values. In other words, given the identities of the men on a baseball team, we can forecast with substantial accuracy how many games that team will win. (That does not mean, for example, that we can win lots of money at a sports book, because no one can know in advance how much playing time a given manager will give a specific player, who will get injured and for how long, or what trades a team may make and when.) But we can say quite exactly how much a given batter or pitcher is (or is not) doing to help his team win baseball games. And those measures are not some arbitrary relative “rankings”: they are absolute numbers that can be combined to give familiar and important real-world values (like runs scored) for an entire baseball team.
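One well-known way to express the “lineup of clones” idea is Bill James’s runs created per 27 outs: a batter’s runs created, scaled to one game’s worth of outs. The sketch below uses the basic Runs Created formula and approximates outs as AB − H, ignoring caught stealing, double plays, and sacrifices; both choices, and the function name, are assumptions for illustration, not this site’s TOP method.

```python
def runs_per_27_outs(ab: int, h: int, bb: int, tb: int) -> float:
    """Approximate runs per game for a lineup made up entirely of one batter.

    Uses basic Runs Created, (H + BB) * TB / (AB + BB), then scales it to
    27 outs, estimating outs as AB - H (a deliberate simplification).
    """
    runs_created = (h + bb) * tb / (ab + bb) if ab + bb else 0.0
    outs = ab - h
    if outs <= 0:
        raise ValueError("need at least one out to form a per-game rate")
    return 27 * runs_created / outs


if __name__ == "__main__":
    # Hypothetical single-season batting line, for illustration only.
    print(f"{runs_per_27_outs(ab=550, h=165, bb=70, tb=275):.2f} runs per game")
```

Because the result is a per-game rate rather than a season total, a part-time player and an everyday regular can be compared on the same scale.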

Now, as anyone remotely familiar with baseball knows, one of the crucial problems in comparing men’s abilities and performances is that the players throughout major-league baseball are not competing on the proverbial “level playing field.” A given batter playing in, for example, Coors Field would compile very different statistics than he would playing in, again for example, Oracle Park (or whatever they may be calling it this year), even though his actual abilities would be the same.

The 30 quite different baseball parks affect statistics in 30 different ways. Nowadays you can find many places that purport to show such park effects on each baseball statistic, often to several decimal places. Many years ago we used to calculate, and use, such correction factors to generate “park-neutral” results, but we have stopped doing so. The reasons are given at some length on our page on park effects, but in essence it has become virtually impossible, in our firmly held opinion, to get even moderately accurate park-effect results, chiefly owing to two things. For one, parks now undergo results-affecting structural changes with striking frequency (you might be surprised at how little it takes to have some effect), so that “historical” data, even data as recent as last year’s, is too often meaningless; and even if last year’s data is usable, it is nowadays unlikely that several seasons’ worth of commensurable data exist, so we would be working with an undesirably small sample. That is without even mentioning retractable roofs that are actually opened and closed during games in progress, sometimes even in the middle of an inning, which can considerably change how the park “plays”.

And, for another thing, the vagaries of scheduling have made it hard (even setting aside those changes in the parks) to get a set of normalizable data representing some sort of standard against which to compare a given park. The old technique was to compare home and away data, but “away” now represents drastically different combinations of parks from team to team (for instance, in use or non-use of the DH Rule), something interleague play has further corrupted. We prefer to present results that are frankly unadjusted rather than results “adjusted” by methods that yield deceptive precision but horrid accuracy; one can make crude, rule-of-thumb mental adjustments to data from the more extreme parks, and that is about as good as applying any of those deceptive published factors.

Going back for a moment to the issue of predicting runs scored or runs yielded from ordinary stats: we want to emphasize the special significance of such measures for pitchers. The usual measure of pitching performance, the ERA, is actually in many ways a better measure of pitchers than any one conventional offensive stat is of batters; but the ERA nonetheless has several severe defects. One is that it can be influenced, often strongly so, by factors not well within any pitcher’s control. Such influences (in simplest terms, “luck”) will more or less average out over the long run, but for pitchers, much more so than for batters, the “long run”, even for a starter, can be a good bit more than one full season. Another is that for relievers, who often come into a game with one or two men already out and don’t stay in long, the ERA is nearly meaningless. The measure that we calculate, on the other hand, shows the actual quality level at which any given pitcher (or staff) is really performing. In the long run, it and the ERA will come into pretty close agreement, but, as someone once remarked, “in the long run, we are all dead”; in the short to medium run, our figure is a far more important and accurate measure of pitching quality than anything else now available.

How Is All This Baseball Information Presented?

The information pages on this site are quite numerous, but the way they are laid out is (we hope and believe) simple enough. You should have no trouble at all picking it up from the colored Site Directory roll-over menu blocks at the top of every page of this site.

Besides the complete player and pitcher listings, we also include a page with lists of all “regular” pitchers and batters: all those meeting a minimum-playing-time criterion (in plate appearances or batters faced, as appropriate, with starters and relievers considered separately), a minimum that changes daily through the season. Using those lists, one can see at once how the more significant players and pitchers compare across all of major-league baseball. We also break out separate by-position lists for position players. Most of those lists are presented both alphabetically and sorted by performance.

Perhaps most important of all the results pages on this site is our Team-Performance page, on which the performance of each of the 30 major-league baseball teams as a unit is shown in terms of both actual and projected runs scored, runs allowed, and games won. That page may matter most because it shows how well each team is currently playing in terms of how many games it appears aimed (by the quality of its play, not its actual record so far) toward winning on the season: a very simple, easily understood datum. (There is a separate explanation page here about that table, to help make its many columns readily comprehensible.)

How Is This Site Operated?

We try to keep the baseball stats on these pages up to date daily, but please understand that this site is a sideline at a pretty busy place of business. That also means that we may not be able to respond individually to all e-mail…but be assured that we will read it all! As a rule, updating will happen around 5:30 a.m. Pacific Time (8:30 a.m. Eastern Time), but that is not guaranteed. Our software creates and posts the updates automatically, but those procedures can fail if there is some sort of defect in the raw data we begin with, and we may not discover and be able to repair such failures till later in the day, if at all. (As part of our recent site upgrade, we switched raw-data sourcing to Baseball-Reference.com, a much more reliable source, so such problems should now range from very few to none.) Check the “through games of” notice on each of our data pages to be sure of what you’re seeing. (If the software detects bad raw data, it leaves the previous day’s pages in place.)

Even after years of operation, this entire site is always under development; indeed, it underwent a major overhaul in late 2020. Just as with the old beat writers’ observation, “You come out to the ballyard every day for twenty years, and every day you see something you never saw before”, so with this site: the only constant is change. And if you miss something you would like to see, or see something you would as soon miss, please…let us know. There is a simple click-on “email us” link atop every page of this site to make it easy for you. Take some time, explore the site, then send us your thoughts. And, again, thank you both for visiting here and for any feedback.

One other thing: all pages on this web site have been third-party verified as being 100% fully and strictly compliant with the latest official written HTML and CSS standards.

Baseball Analysis: Some History

Since those joyous days in the mid-1840s when baseball as we know it first took shape, on grounds bearing the cosmically appropriate name The Elysian Fields, followers of the art, both amateur and professional, have been seeking measures of player and team performance. There is a fine summary of that search in Alan Schwarz’s book The Numbers Game, so we won’t recount it at length here.

Let it suffice to say that a myriad of such baseball measures have been proposed, and the count grows daily; and for each proposed tool, its adherents have claimed various reasons why it is important. Till relatively modern times, all such tools, the good and the bad alike, were relative measures: they compared, in one way or another, one man with another or one team with another. They were excellent conversation starters but little else. It was not till the middle of the twentieth century that a serious and sustained effort was made to derive some sort of absolute baseball measure: one that would allow calculation of actual, real-world values of importance (such as games won, the ultimate measure of importance).

In the August 2, 1954, issue of Life magazine, Branch Rickey, possibly the finest mind ever to grace baseball management, set forth a formula (developed with the aid of mathematicians from M.I.T.) that would, in a crude way, predict how many games a team would win based on various commonly available team statistics. In 1964, a Johns Hopkins professor named Earnshaw Cook put out a book titled Percentage Baseball, wherein he derived—by using somewhat arcane mathematical methods (stochastic analysis)—criteria for both teams and individual players that were reasonably successful absolute measures.

Rather later on, the highly articulate and provocative writing style of Bill James propelled these forms of analysis into the consciousness of the general baseball public, and even the ranks of baseball management. Nowadays, everybody and his cousin is producing both books and proprietary measures with which to fill the pages of those books. While a few of these are good and several are awful (no names—you know who you are, or you should), the point of greatest interest to the public is that almost all of these books and measures, no matter how differently expounded, reach pretty much the same results. (Here is one example, and here’s another.) Elsewhere on this site, we present a list of some of the most interesting baseball books out there now.

That consensus of results will, of course, tell the thoughtful reader one vital thing: this is now a science. Theoretical physicists still argue, with some heat, over small and abstruse side issues about Einstein’s principles; but those debates are of little consequence to anybody besides the participants, whereas nuclear power plants (and weapons) are an everyday fact of life. So in baseball: the practitioners of analysis still quibble and squabble over the minor arcana, but on the whole no reasonable person (a description that omits many professionally associated with the sport) any longer doubts that analysis is the only basis for a true understanding of the inner workings of the subtle and wonderful game of baseball. (Mind, the question of exactly what “analysis” is needs discussion, because even in this day of heavy use of “analytics” there are some significant misunderstandings about that.)

Where Next?

Having come this far on the site, you may be wondering where to look next. We recommend that before you cut to the daily data pages, you read some more about how analysis works in general, and how we apply it in particular, so that the stats we present will have meaning for you.

This page was last modified on Sunday, 8 September 2024, at 10:56 pm Pacific Time.

The Owlcroft Baseball-Analysis Site