Where do I find the Outs above average for each pitcher?
I wish you would stop beating the same single drum on bWAR - I've been reading this article from you about Aaron Nola for several years now. It was illuminating the first time I read it and now, it's just old. If you want to change bWAR (and your case seems persuasive as does Tom Tango's), then there are a few people that work in Philadelphia that you have to convince, and making the same public case of it over and over again based on one player for multiple years probably just makes them retrench. Frankly, I'm more concerned about the way that the BBWAA has gone from completely ignoring both bWAR and fWAR to treating them like they are some sort of sacred text. As an example, many writers voted for Altuve over Judge for 2017 MVP, citing bWAR as the reason. Six months later (when park factors were adjusted), Judge was all of a sudden ahead of Altuve (and has remained so). These things are metrics that we are still just learning about and are still evolving and will continue to evolve over the years.
I think that there's nothing wrong with writing, at the end of the season where there are going to be awards given, to bring up the WAR issue again. If he went down a rabbit hole regarding prior years & comparing old individual cases, that would be very tiring. That said, your overall point is correct. WAR is neither something to embrace as the Bible (I just list my votes for CY in WAR order), nor is it something to be ignored because it's too nerdy (Bring back wins!). I will say that Joe has many times treated WAR as Bible (Mike Trout!). But it's good to see when he digs in to make sure that the WAR makes sense in high profile cases.
I don't have an issue with the Nola comparison, even if used prior. I thought it was a good set up for this discussion. I do agree that BBWAA members going from completely ignoring WAR to treating WAR as the final word is a problem. It's almost as if they're afraid to put additional thought into the matter, and they'll be criticized if they vote for a "non-WAR" candidate. They're voting in some cases out of fear of ridicule.
NERDS!!!!!
jk
Where’s deGrom in this conversation?
On the IL for far too long to be in the CYA conversation.
I agree, too much time missing for deGrom to garner awards consideration, but CYA wasn't the conversation. Wondering where deGrom was at the point he left the active roster for good is a curiosity given his dominance in the half-or-so season we got to see him.
Yep.
Cf., 1971, when Vida Blue led the league in ERA, shutouts and WHIP, and won the Cy Young and MVP, and had a bWAR of 9.0, while Wilbur Wood, who had a great season too (1/10 of a run behind Vida, 30 more IP, 100 fewer Ks and 20 fewer walks), was somehow off the charts: 11.7.
I also think both WARs drastically underrate catchers. By bWAR, the greatest OF is Barry Bonds (1st overall), greatest shortstop is Honus (7th overall), 2B/Rajah is 9th, 1B/Lou is 13th and 3B/Schmidt is 18th. And the greatest catcher of all time? Johnny Bench, who's 51st. I get that catchers don't last as long; but when they play, they're involved in *every single play* on defense and it never really shows up in their defensive WAR. Not sure how you do it, but seems a rook the way it is.
I was thinking about this the other day due to Salvador Perez and why his RAA (runs above average) wasn't as high as other top position players. My hunch is the rarity of producing outs. A catcher who blocks pitches in the dirt is preventing runners from advancing. Obviously throwing out runners trying to steal counts, but when you throw out half of 'em then they stop trying as often. It takes a lot of blocked pitches to add up to a defensive win and I don't even know if runners not attempting to steal is included which if true validates your case that a great catcher is undersold by WAR.
I think the Blue/Wood rabbit hole is another example of the defensive adjustment. The White Sox had a very poor defensive team, which got perhaps a little worse the next year when Dick Allen joined them. While the A's had an overall really good defensive team.
Some companies calculate standard product cost to 5 decimal points and the decimal point is in the wrong place. Sounds like DRS.
On the other hand, watching the Royals new pitchers from start to start, it's obvious that command has ENORMOUS impact on balls in play.
BRef and FanGraphs have two different purposes and data sets. It’s not really great to say one is better than the other. The better question is which one are you using and why? Or design your own WAR, as many teams and analysts do, based on what they value, and keep in mind that as a model there is a margin of error. It’s the framework and concept that’s important. There are times I use BR, sometimes I use FG, sometimes I use Baseball Prospectus or I’ll combine components from each of the systems together.
Glad I clicked through to see this on the substack page because there was a swap of Fangraphs for BR in one section on the email version and I started to confuse myself on which one was using which defensive adjustments.
The thing that always confuses me is: why haven't the WAR stats updated yet? We've known about this issue with B-WAR for many years now, and Joe's post about the Nola season was close to three years ago at this point. But the formula doesn't seem to have changed.
Statcast has been around since 2015 now. But BR still uses DRS, which is (should be?) less reliable. If we had perfect defensive data, the defensive adjustment wouldn't be as big of a problem because we would always be making the "correct" adjustment. So why isn't BR trying to stay at the forefront of defensive statistics? Or are they, and I just haven't realized?
This is mostly a question for Tango or another expert. I'll admit that I haven't followed baseball stats too closely since about 10 years ago. But in a lot of ways the "headline numbers" don't seem to have changed since then. Am I wrong about this? Where should I go to see the most up-to-date measures of baseball value, regardless of particular philosophies?
DRS, which looks at every play, is certainly more reliable than the outdated UZR estimation, which FanGraphs uses for defense for offensive players.
The problem is not necessarily DRS, but the fact that BR's adjustment is too high. I think the adjustments for park and team defense should be cut about in half. It would make BR, which has the better base and is far better offensively (partially because they are using a far superior defensive rating), easily the king.
Fangraphs WAR just flat out sucks. On offense they insist on continuing to use UZR, because I believe it is proprietary to them, over DRS, even though they use DRS on their site already. With pitching, they start with FIP. FIP has its uses as an estimator. It is a fine thing to look at when trying to project a pitcher's future. It is no place to begin their WAR with. WAR is about how a pitcher performed. FIP is about how you would expect a pitcher to perform based on the three true outcomes.
While I agree BR needs to lessen their defensive adjustments on the pitchers - I don't think they break it down to how the defense performs while the pitcher is on the mound, they take the complete defense and assume it played the same. You get into problems when you make big adjustments from estimations. Not quite as big of a problem as FG does basing their entire WAR on an estimation from three things, but still a problem. It is an easy fix and should be addressed. You have to lower that adjustment.
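For anyone who hasn't seen it written out, the FIP the comment above is talking about is usually given as (13*HR + 3*(BB+HBP) - 2*K) / IP plus a league constant. Here is a rough Python sketch of that. The function name, the 3.10 constant (the real one changes every season) and the stat line are all assumptions for illustration, not anyone's actual numbers, and this is not a claim about how FanGraphs implements it.

```python
def fip(hr: int, bb: int, hbp: int, k: int, ip: float, constant: float = 3.10) -> float:
    """Fielding Independent Pitching from the three true outcomes (plus HBP).

    `ip` must be true decimal innings (180 1/3 innings is 180.333, not 180.1).
    `constant` rescales FIP to the league ERA; 3.10 is only a placeholder.
    """
    if ip <= 0:
        raise ValueError("innings pitched must be positive")
    return (13 * hr + 3 * (bb + hbp) - 2 * k) / ip + constant


# Hypothetical stat line, not a real pitcher's season.
example = fip(hr=10, bb=30, hbp=5, k=200, ip=180.0)
print(f"FIP: {example:.2f}")
```

Note that nothing in there looks at hits on balls in play or runs actually allowed, which is the whole complaint about starting a WAR from it.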
I wonder if Wainwright's defensive support isn't somewhat driven by the way he pitches. It's weird to watch- at least by today's standards. He pitches quickly, for one. And it looks like he is throwing junk balls on virtually every pitch. It's like Mike Flanagan after he lost velocity on his fastball (or Mike Boddicker pretty much his entire career). The major leagues used to be full of guys like this- pitchers not throwers. So he gets people to put the ball in play and his defense does the work
I wonder what the advanced stats would say about "Madduxes" - complete games with fewer than 100 pitches. It would have said "BABIP was lower than average so this isn't sustainable" but the way Greg Maddux pitched was obviously sustainable. He just didn't waste pitches
The Orioles philosophy used to be "3 pitches per at bat" as the goal. They preferred ground balls to strikeouts. When you have Brooks Robinson and Mark Belanger on one side of the infield, you understand that, but you do wonder if this is another place where the "strikeout/home run" phenomenon has hurt the game
For example, Jim Palmer is slightly less valuable (career value) than Dwight Gooden and Chuck Finley (high 50s) under Fangraphs while under BR he is up around Kevin Brown and Max Scherzer (high 30s) while Finley is 78 and Gooden is 95. Just does not pass the smell test for me.
I think we are overvaluing strikeouts in an era where everyone strikes out. Palmer would easily have 250+ strikeouts a year pitching against today's hitters. Try doing that in 1974 (Nolan Ryan struck out 367 in 1974- that would be 500 today). Mike Schmidt led the league with 138 strikeouts. 138!!!! And hit 36 home runs. 138 strikeouts is what you get for a 2B who hits .231 and has 16 home runs today.
I know the point is to balance between eras but it feels like the stats are actually warped
Not the point of Joe's post, but the fact that bWAR used RA instead of RE24 means that nonsense related to inherited runners is slipping into the conversation.
I love these baseball stat nerd-gasm screeds interspersed with heart-felt tearjerker pieces about his daughters. Classic JoeBlogs territory.
The flip side of all this is to look at fWAR for Burnes (7.1) compared to Scherzer (5.2). As you noted in your original post, Scherzer has a better ERA and WHIP, and he has also pitched slightly more innings (162 to 152). As you note fWAR uses FIP, where Burnes is king, but looking at the numbers underlying FIP, they have the same K% (35.4%) and Burnes only has a slight lead in BB% (4.9% to 5.3%), which means the HR avoidance is driving most of the value. Burnes certainly gives up fewer FBs than Scherzer (30.2% to 48.3%), which is significant, but Burnes is also benefiting from a notably low 4.8% HR/FB rate, as opposed to Scherzer's league average-ish 11.8%. Unless research has changed recently, my understanding is that pitcher HR/FB rates tend to be mostly out of their control. As dissatisfying as I find Miley having a higher bWAR than Burnes, fWAR concluding that Burnes has been two full wins better than Scherzer feels equally absurd.
Yes, the research changed. Or more accurately, the research you cite was more theoretical. We now know how hard each ball is hit and at what angle, so we don't have to wish-away all those HR. Burnes giving up as few HR as he has is consistent with his performance.
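The fly-ball arithmetic in the Burnes/Scherzer comment above can be put in rough numbers. This is a back-of-envelope sketch in Python, assuming FB% is the share of batted balls that are fly balls and HR/FB is home runs per fly ball, so the product is home runs per batted ball. The function name is made up for illustration; the rates are the ones quoted in that comment.

```python
def hr_per_batted_ball(fb_rate: float, hr_per_fb: float) -> float:
    """Home runs per batted ball = fly balls per batted ball * home runs per fly ball."""
    return fb_rate * hr_per_fb


# Rates as quoted in the comment (FB%, HR/FB).
burnes = hr_per_batted_ball(0.302, 0.048)
scherzer = hr_per_batted_ball(0.483, 0.118)

# Counterfactual: Burnes's fly-ball rate with Scherzer's roughly league-average HR/FB.
burnes_avg_hr_fb = hr_per_batted_ball(0.302, 0.118)

for label, value in [
    ("Burnes (actual)", burnes),
    ("Burnes (11.8% HR/FB)", burnes_avg_hr_fb),
    ("Scherzer (actual)", scherzer),
]:
    print(f"{label}: {value:.2%} HR per batted ball")
```

By this crude measure Burnes comes out around 1.4% to Scherzer's 5.7%. Give Burnes the 11.8% HR/FB and he would still be at about 3.6%, so the lower fly-ball rate explains part of the gap and the HR/FB rate explains the rest.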
Funny you mention Nola and Wainwright in this, as looking at their numbers this year helps to show that FanGraphs WAR must necessarily also have some issues:
Aaron Nola: 7-8, 4.58 ERA in 163 IP
Wainwright: 16-7, 2.88 ERA in 190 IP
Both pitchers are worth 3.9 fWAR, to date. Nola strikes out more batters (about 3 more per 9IP), so he gets extra credit for that, I guess, but his much higher BABiP (.306 vs .254) speaks to how much Wainwright's defense has helped him, so I guess fWAR takes credit away from him and gives it to his fielders.
bWAR has them as 2.1 for Nola and 3.5 for Wainwright, which makes more sense.
Perhaps bWAR is a better metric for evaluating a pitcher's performance looking back, since it starts with the runs they actually allowed, but fWAR is better for predicting future performance, since it focuses more on the things they can control?
Either way, I agree that the defensive adjustments (just like in your column about Jeter last week) are far too aggressive.
Nothing wrong with taking both sites' numbers, adding them, and dividing by two... that gives you Burnes 6.2 wins above and Miley 4.5. Sounds about right imo
That's what I do. The true answer is somewhere between the two.
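If anyone wants to try the split-the-difference approach themselves, here is a tiny Python sketch. It uses the Nola and Wainwright numbers quoted earlier in the thread, since both the bWAR and fWAR figures are given there. The function name is just for illustration, and a straight 50/50 average is this thread's rule of thumb, not an official stat.

```python
def blended_war(bwar: float, fwar: float) -> float:
    """Simple average of the Baseball-Reference and FanGraphs WAR figures."""
    return (bwar + fwar) / 2


# (bWAR, fWAR) as quoted earlier in the thread.
pitchers = {
    "Nola": (2.1, 3.9),
    "Wainwright": (3.5, 3.9),
}

for name, (bwar, fwar) in pitchers.items():
    print(f"{name}: {blended_war(bwar, fwar):.1f}")
```

That puts Nola at 3.0 and Wainwright at 3.7.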