Nov 10

Which Pitching Metrics Matter?

Baseball analysts, gurus, pundits, and fans love to throw around pitching metrics (by metric I mean any statistic related to pitching) and argue about how much each one matters. By all accounts, the five most popular pitching metrics to date are Wins, ERA, FIP, WHIP, and WAR. Some prefer the simplicity of Wins, ERA, and WHIP, while others prefer the complexity of FIP (fielding independent pitching) and WAR (wins above replacement).

But how do we know which of these metrics matter, or if all of them do, or if none of them do?

I don’t have any way of determining which metric is the most important or which will predict pitching most accurately (if I did, I’d probably be hired by the Mets), but I can compare all these metrics to each other and find their correlations.

Before I throw up a chart that makes people cry, a very quick lesson in one statistic: the coefficient of determination (or R-squared). Each number in the chart represents the R-squared value between the two metrics I’m comparing (wins vs ERA, for example). I converted these values into percentages so they’re easier to interpret. Basically, the higher the percentage, the more closely the two metrics move together (which doesn’t by itself tell us that one causes the other). If the R-squared value is 100%, then one metric is perfectly predicted by the other (as if I compared ERA to ERA). If it’s 0%, then I’m probably comparing ERA to the number of apples in Wal-Mart. The data I used is from qualifying starters in the NL in 2010 (44 of them). So here’s the chart, and then I’ll do a bit of analysis.

        Wins    ERA     FIP     WHIP
ERA     32.8%
FIP     18.1%   53.0%
WHIP    23.4%   68.0%   30.3%
WAR     26.5%   50.2%   88.0%   39.1%
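
For anyone who wants to try this at home, here is a minimal Python sketch of how a chart like the one above could be built. It is not the spreadsheet or code actually used for this post. The function names are my own, and the numbers in `sample` are made-up placeholders, not the 2010 NL data.

```python
from itertools import combinations


def r_squared(xs, ys):
    """Squared Pearson correlation between two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov * cov / (var_x * var_y)


def r_squared_table(metrics):
    """R-squared for every pair of metrics, keyed by (name_a, name_b)."""
    return {
        (a, b): r_squared(metrics[a], metrics[b])
        for a, b in combinations(metrics, 2)
    }


# Placeholder values for five imaginary pitchers (NOT real 2010 data).
sample = {
    "Wins": [16, 10, 13, 8, 11],
    "ERA": [2.90, 4.10, 3.40, 4.60, 3.80],
    "WHIP": [1.10, 1.35, 1.22, 1.45, 1.30],
}

for (a, b), value in r_squared_table(sample).items():
    print(f"{a} vs {b}: {value:.1%}")
```

Note that squaring throws away the sign, so R-squared alone won’t tell you whether two metrics rise together or move in opposite directions.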

You’ll first notice that the highest correlation is between FIP and WAR, 88%, which makes sense because FIP is used to calculate WAR, so they better be highly related! We’re off to a good start.

The second highest value is more telling: ERA and WHIP have an R-squared of 68% (a correlation coefficient of about 0.82). Neither is used to calculate the other, so we actually have something worth talking about here. What this means is that a pitcher’s WHIP is actually a fairly good predictor of his ERA! An R-squared of 68% probably isn’t good enough to make a pin-point prediction, but it’s enough to see a very solid trend.

Another really interesting thing I see is that Wins have low correlations with…everything! Wins aren’t highly correlated with ERA, FIP, WHIP, or WAR! None of them. What does this mean? It suggests that the stat of “Wins” is close to useless as a measure of pitching. Any Cy Young voters voting based on a starter hitting some arbitrary win mark look quite silly in light of this. Either Wins are meaningless, or every other stat I’ve listed is. I’ll side with the other stats.

One last thing I’ve noticed is that WAR doesn’t have an incredibly high correlation with anything aside from FIP, which, again, feeds into its calculation. What this means to me is that an analyst should probably use either ERA/WHIP or FIP/WAR to determine the worth of a pitcher, but probably not a combination of the pairs, or things may become confusing. Notice how WAR and WHIP’s R-squared value is only 39%. Fangraphs may love WHIP and they may love WAR, but this shows that the two really don’t have much in common!

28 comments

  1. Ceetar

    Interesting... but how about team wins instead of just wins? You’d probably see a slightly higher correlation. But yes, pitching can’t win games; at best it keeps you from losing them. (Well, unless you bean/injure enough opponents that they have to forfeit, I guess, but that doesn’t really count.)

    And yet, pitching supposedly wins games. Why not just grab a bunch of average pitchers and slug your way to victory?

    Because there is a goal. Pitchers don’t exist in a vacuum; it’s a duel, and there is no statistic that measures how well you have to pitch on any given day. It’s wholly determined by the other pitcher. The goal is to pitch in a manner that allows the other team to score fewer runs than you score.

    And this is where you get into defining what the Cy Young award should be. Every single person would rather have had (notice the past tense) a pitcher with C.C. Sabathia’s results in 2010 over Felix Hernandez’s.

    1. Prismo

      Re: team wins
      I’d be interested in those results as well, but unfortunately they’re not available en masse. I’d have to look them up for each of the 44 pitchers individually…which I don’t feel like doing.

      However, just for you, I recalculated the correlations using win% instead of wins…and found very similar results.

  2. wannybackstra

    Nice work.

    1. Prismo

      Thank you. Tell Sandy to hire me and I’ll do this stuff for him…and more!

      1. wannybackstra

        I won’t ask what the “more” is that you’d do for him if he gave you the job, Mr. Prismo Lewinsky. ;-)

  3. Mr North Jersey

    Sorry Prismo, you’re not our sabermetrician, you’re our meteorologist.

    Do not pass Go Do not collect $200.00

    :-P

    Now that that’s settled will it rain tomorrow?

    ;-)

    LoL just fu*in wit u.

    1. Prismo

      BRUTAL. ;)

      You might be surprised how much the two actually have in common. I did this sort of stuff in my forecasting job as well.

      1. Mr North Jersey

        You have piqued my curiosity Prismo.

        When you get bored how bout a lil back story on how in your experiences the 2 have similarities?

        I know i would def find that interesting to read.

        1. Prismo

          Well, statistics are everywhere!

          But for a great example, we had a forecast product that we sold, which was basically a daily/weekly summer heat forecast for a number of US cities.

          The way my company had been doing the product was to categorize historical summer temperatures into a few categories (extreme, strong, moderate, low heat…categories like that).

          I don’t remember exactly how this went down, but just by making forecasts and fitting them into those categories, it sometimes didn’t make sense. I’d forecast a crazy hot day somewhere and it would only be “strong heat”…but only because in the past there were a few even hotter temperatures on that day in that city…by complete coincidence.

          I recategorized the heat categories based on standard deviations from the mean (how far from the average temperature) and emailed it all to my boss.

          But he didn’t like it – because the clients like things being compared to historical heat events instead of what I was doing. Statistically, my way was much better, but the clients liked to be able to look up how stocks did and gas traded in similar heating events…and the old way we did it was better for that…I guess.

          I wasn’t totally sold on it.
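
The standard-deviation idea described above could be sketched roughly like this in Python. This is only an illustration, not the actual forecast product: the cutoffs (2, 1, and 0 standard deviations above the mean), the function name, and the temperatures in `history` are all assumptions made up for the example.

```python
from statistics import mean, stdev

# Hypothetical cutoffs in standard deviations above the historical mean.
# The real category boundaries are not given in the comment above.
CUTOFFS = ((2.0, "extreme"), (1.0, "strong"), (0.0, "moderate"))


def heat_category(forecast, history, cutoffs=CUTOFFS):
    """Label a forecast by how many standard deviations it sits above the mean."""
    z = (forecast - mean(history)) / stdev(history)
    for cutoff, label in cutoffs:
        if z >= cutoff:
            return label
    return "low"


# Placeholder historical highs for one city on one date (made-up numbers).
history = [88, 90, 92, 94, 96]

for temp in (100, 96, 93, 90):
    print(temp, heat_category(temp, history))
```

With this approach a single freak day in the record only nudges the mean and spread a little, instead of resetting what counts as "extreme" for that date.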

          1. Mr North Jersey

            Nice Prismo!

          2. rustyjr

            lawn chairs are everywhere also!!

  4. GravediggerHebner

    In the last 4 or 5 years I became aware of WHIP and started to use it both in just my being a baseball fan and also in playing fantasy baseball.

    This knowledge has had a much more profound effect on me in a fantasy baseball context. I’ve had consistently better pitching staffs on my fantasy teams since I began paying attention to it.

    In the context of just being a baseball fan it hasn’t had as obvious an effect but I do know that when I discuss players or trades with friends I certainly use it as a tool in the discussion.

    All that leads me to say that Mike Pelfrey’s WHIP scares the hell out of me. I would never dream of owning him in fantasy baseball and as a Mets fan just watching him pitch is never a relaxing thing with all the baserunners he allows.

    1. Prismo

      Psst, here’s the linear equation for 2010:
      ERA = (4.440)(WHIP)-(2.033)

      An R-squared of ~70% isn’t great, but check this out for Mike Pelfrey.
      Career WHIP = 1.46
      Career ERA = 4.31

      Using the equation, the predicted ERA is 4.45.

      Pretty frickin’ close.
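
The arithmetic above is easy to check with a few lines of Python. The slope, intercept, and Pelfrey’s career numbers come straight from this comment; the function and variable names are just ones chosen for this sketch.

```python
# Coefficients of the 2010 fit quoted above: ERA = 4.440 * WHIP - 2.033
SLOPE = 4.440
INTERCEPT = -2.033


def predict_era(whip):
    """Predicted ERA from WHIP using the linear equation quoted above."""
    return SLOPE * whip + INTERCEPT


career_whip = 1.46
career_era = 4.31

predicted = predict_era(career_whip)
print(f"predicted ERA: {predicted:.2f}")      # 4.45
print(f"miss: {predicted - career_era:+.2f}")  # +0.14
```

Keep in mind the line was fit to 2010 NL starters and is being applied here to a career WHIP, so treat the match as a sanity check rather than a validation.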

      1. Prismo

        And that equation’s only from 44 samples! Imagine if I used a 5-year sample…could become deadly accurate. HIRE ME SANDY!!!

  5. wannybackstra

    I don’t think there could be any question that WHIP would correlate to the number of runs scored. As Sandy would tell you, it’s about probabilities. The more runners on, the more runners that will score.

    What’s scary about Pelfrey’s WHIP is how few Ks he gets to escape those jams. He can’t count on double plays, since he has to rely on the defense to turn them.

    1. GravediggerHebner

      Yes the lack of Ks makes the sheer number of baserunners that much more terrifying. The GB% and potential DP are nice but a lot more has to go right for that to work out than simply striking out the batter.

      This is only 2010, but the NL avg WHIP was 1.347 and Pelf’s was 1.377. Of the starting pitchers with a worse-than-league-average WHIP, only Maholm, Bush & Wolf had a worse K/BB ratio and only Kendrick, Volstad, Bush & Maholm had fewer total Ks.

      The idea of the Mets throwing Pelfrey opening day and against the “aces” of the league frightens me. I don’t hate the guy and think he’s useful but would much rather he be deployed later in a rotation.

      1. Prismo

        Great stuff there Grave. Now you’re starting to worry me about Pelf… :p

        1. Prismo

          Hey, try again taking out 7 starts in a row from June 30 to Aug 4. I think his ERA is under 3 aside from that period, and his WHIP is far below 1.377!

          There’s hope after all! He just has to be more consistent, perhaps.

          1. GravediggerHebner

            Understood. He’s young and still has upside, and as someone who attended his 1st MLB start I’m rooting for him. I’m just not excited about him being forced, by Santana’s absence and the poor contracts handed out by the previous regime, into matchups against starters he perhaps shouldn’t be facing yet.

        2. GravediggerHebner

          If you or anyone care to check it out here’s the NL SP 2010 list by WHIP

          http://www.baseball-reference.com/leagues/NL/2010-standard-pitching.shtml#players_standard_pitching::27

  6. kingman 26

    Prismo, great work!

    I have thought about walks and hits allowed since I was a kid, and long before the term WHIP became cool.

    I am so glad this agrees with the seemingly obvious points about wins—Trachsel could win 15 while sucking with a good offensive team.

    Blyleven or Seaver could go 18–12, but if on the 2006 Mets they’d have been 25-5 with the same personal stats.

    Nice work.

    1. Prismo

      Thanks so much King!

      You’ll notice I prefer not to use “sabermetrics” in my statistics posts, because it’s really just a fad word to make people feel like they’re extra-intelligent.

      Sabermetrics are just statistics which are just data which has been around FOREVER.

      Maybe you’re more of a “sabermetrician” than you think. ;) haha

      1. kingman 26

        Agreed all around!

        I swear, in the 70s I would look at baseball cards and try to figure out how many runners a pitcher allowed each inning!

  7. metsfan4decades

    Wow, Prismo….nice work. I’m especially impressed because I’m in the camp of the mathematically challenged and I actually understood what you’re trying to say here.

    I think any serious baseball fan has to have come to the realization in the past several years that a pitcher’s win total is not the best indicator of how good a pitcher he is.

    1. Prismo

      Thank you *so* much 4D! This really means a lot to me – this stuff means nothing if people can’t understand it. :D

  8. stickguy

    WHIP makes sense as a positive indicator. basically, it is OBP for pitchers (from my perspective).

    for offense, more guys on (OBP) logically means more guys should score.

    For pitchers, fewer guys on (lower WHIP) logically means that fewer runs will be scored against you.

    on average of course, individual circumstances will vary!

    1. GravediggerHebner

      closed course, professional driver, batteries not included, may cause diarrhea.

  9. oleosmirf

    If Pelf gives us exactly the same numbers as he did in 2008 and 2010, he’ll be perfectly fine. Those types of numbers are basically what you are looking for from a #3 pitcher, and I’m sure he will improve on his craft as he gets older.
