
MLB General Talk 2015 Through LCS


2129 replies to this topic

#41 BSLChrisStoner

BSLChrisStoner

    Owner

  • Administrators
  • 156,156 posts

Posted 22 February 2015 - 09:54 AM

Sports on Earth: Projecting baseball's best rookie pitchers

http://www.sportsone...d-andrew-heaney



#42 DJ MC

DJ MC

    HOF

  • Members
  • 23,680 posts
  • Location: Beautiful Bel Air, MD

Posted 22 February 2015 - 01:31 PM

B.J. Upton will now go by Melvin Upton Jr.

 

Beeeee-Jaaaaaaay...


@DJ_McCann

#43 SBTarheel

SBTarheel

    HOF

  • Members
  • 14,851 posts
  • Location: Eldersburg, Md

Posted 22 February 2015 - 02:59 PM

B.J. Upton will now go by Melvin Upton Jr.

 

Beeeee-Jaaaaaaay...

I wonder where BJ came from..

 

Actually, no I don't. 


  • fishteacher likes this
@beginthebegin71

#44 BSLChrisStoner

BSLChrisStoner

    Owner

  • Administrators
  • 156,156 posts

Posted 22 February 2015 - 09:51 PM

Five Thirty Eight: Rich Data, Poor Data



#45 BSLChrisStoner

BSLChrisStoner

    Owner

  • Administrators
  • 156,156 posts

Posted 22 February 2015 - 10:04 PM

Hardball Times: Are Groundball Pitchers Overrated?

http://www.hardballt...hers-overrated/



#46 RShack

RShack

    Fair-weather ex-diehard

  • Members
  • 22,994 posts

Posted 23 February 2015 - 02:18 AM

Hardball Times: Are Groundball Pitchers Overrated?
http://www.hardballt...hers-overrated/


If the guy understood correlation, it would be a nice article. Not picking on him in particular, but this is something that happens way too frequently when people bring up correlation in baseball discussions. The error is that people often take the measure of correlation (aka R-value) at face value and wrongly interpret it as if it means covariance or (even worse) as a measure of prediction. In the article, the author says that .85 is a "very high" correlation (which it isn't), and says that a .37 correlation from one year to the next shows that throwing IF fly balls is "a repeatable skill" (which it doesn't, unless your bar for "repeatable" is very low).

As best I can tell, when people talk about correlation in the context of baseball performance, what they really want to know (and what they think they're talking about, even though they're not) is covariance. Correlation tells you what the variance is between two things, but it's covariance that tells you the *proportion* of that variance that's in common between the two things being measured. In other words, covariance tells you how much of the measure can be attributed to the sameness you're looking for (vs. other things including chance).

Converting from correlation to covariance is easy: Just take the R-value, square it, and treat the result as a percentage.

So, for example, a correlation of .37 (P's getting IF flyballs from one year to the next in the sample the author used) gives us a covariance of .37-squared --> .1369, treated as a percentage --> 13.7%. That means that 13.7% of throwing IF fly balls from the two years is in common between those 2 years of throwing pitches, which means that 86.3% of it is due to other things, including chance. In other words, more than 86% of the relationship between how many IF fly balls were thrown by that group of P's from one year to the next is *not* about anything that could even possibly be a repeatable skill.

Similarly, the supposedly "very high" correlation of .85 gives a covariance of 72.25% which isn't high at all. Again, I think the author did what many (maybe most?) people do, which is to read correlation as a measure of covariance. I would agree that a covariance of .85 is fairly high (not "very high")... but to get that covariance, you need to have a correlation of better than .92.

Because you're dealing with the square of an R-value less than 1, the effect of squaring gets much more dramatic as the number gets smaller. For R=.70, the covariance is less than half (49%); for R=.50, the covariance is only 25%. Once you get down to an R-value of .30, you're getting a covariance of only 9%.

Bottom line: Don't do what the author did. Whenever you read about the numerical correlation between 2 things, remember to square that correlation value before you decide what it means... because it's really the covariance you're after...
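
For anyone who wants to check the squaring arithmetic above, here is a minimal Python sketch. One caveat: the squared value is what stats texts usually call r-squared (the coefficient of determination). The function names `shared_variance` and `r_needed` are made up for illustration and are not from the article.

```python
import math


def shared_variance(r: float) -> float:
    """Square a correlation coefficient r to get r-squared."""
    return r * r


def r_needed(target_r_squared: float) -> float:
    """Correlation (in absolute value) needed to reach a given r-squared."""
    return math.sqrt(target_r_squared)


# The R-values mentioned in the post above.
for r in (0.85, 0.70, 0.50, 0.37, 0.30):
    print(f"r = {r:.2f} -> r^2 = {shared_variance(r):.4f} ({shared_variance(r):.1%})")

# How strong a correlation has to be before its square reaches .85.
print(f"r needed for r^2 = 0.85: {r_needed(0.85):.3f}")
```

The last line is the "better than .92" figure: the square root of .85 is a little under .922.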


  • BSLChrisStoner and fishteacher like this

 "The only change is that baseball has turned Paige from a second-class citizen to a second-class immortal." - Satchel Paige


#47 SportsGuy

SportsGuy

    HOF

  • Members
  • 91,979 posts
  • Location: Baltimore

Posted 24 February 2015 - 08:04 AM

Just saw that Profar is out for 2015.

Sucks for Texas.

#48 SBTarheel

SBTarheel

    HOF

  • Members
  • 14,851 posts
  • Location: Eldersburg, Md

Posted 24 February 2015 - 08:11 AM

Just saw that Profar is out for 2015.

Sucks for Texas.

He should have had the surgery in September, but they wanted him to "rest"...I don't get it at all. The Angels did the same with Hamilton, and now he'll miss a lot of time as well. 

 

Not to wish injury on anyone, but it'll happen with Tanaka too. Very strange all around. 


@beginthebegin71

#49 You Play to Win the Game

You Play to Win the Game

    HOF

  • Members
  • 60,481 posts
  • Location: Maryland

Posted 24 February 2015 - 09:38 AM

Eutaw St Report doesn't do much for me but they are damn funny...

[Image: 68c8cd5f5d4480a08e389931049499e4.jpg]
  • fishteacher likes this

#50 Matt_P

Matt_P

    HOF

  • Members
  • 4,552 posts

Posted 24 February 2015 - 10:18 AM

Similarly, the supposedly "very high" correlation of .85 gives a covariance of 72.25% which isn't high at all. Again, I think the author did what many (maybe most?) people do, which is to read correlation as a measure of covariance. I would agree that a covariance of .85 is fairly high (not "very high")... but to get that covariance, you need to have a correlation of better than .92.

 

That's extremely high. The only thing I can possibly think of is that you're mistaking what's necessary for statistical significance for what's necessary for a high correlation metric.



#51 BSLChrisStoner

BSLChrisStoner

    Owner

  • Administrators
  • 156,156 posts

Posted 24 February 2015 - 11:11 AM

Grantland: AL Spring Training Primer: The Key Question Facing Each Team in Camp

http://grantland.com...cing-each-team/



#52 BSLChrisStoner

BSLChrisStoner

    Owner

  • Administrators
  • 156,156 posts

Posted 24 February 2015 - 11:29 AM

Sports on Earth: Seasons of Change For MLB?

http://www.sportsone...sons-154-vs-162



#53 BSLChrisStoner

BSLChrisStoner

    Owner

  • Administrators
  • 156,156 posts

Posted 24 February 2015 - 04:30 PM

Beyond the Box Score: Rankings of the best players for 2015: 45-11

http://www.beyondthe...5-11-excellence

 

10: Rendon
http://www.beyondthe...ers-for-2015-10

 

9: Bautista

http://www.beyondthe...yers-for-2015-9

 

8: Cabrera

http://www.beyondthe...-miguel-cabrera



#54 RShack

RShack

    Fair-weather ex-diehard

  • Members
  • 22,994 posts

Posted 25 February 2015 - 04:05 AM

That's extremely high. The only thing I can possibly think of is that you're mistaking what's necessary for statistical significance for what's necessary for a high correlation metric.

 

If it's high in any place other than baseball stats, I never heard of that place... if it's not statistically significant, what can you do with it?  It's certainly not trustworthy.

 

If you can't trust it, why even bring it up?  The only reason I can think of is that maybe lots of people who don't understand stats will trust it anyway. (There might be other reasons, but I can't think of what they are...)


 "The only change is that baseball has turned Paige from a second-class citizen to a second-class immortal." - Satchel Paige


#55 Matt_P

Matt_P

    HOF

  • Members
  • 4,552 posts

Posted 25 February 2015 - 08:43 AM

If it's high in any place other than baseball stats, I never heard of that place... if it's not statistically significant, what can you do with it?  It's certainly not trustworthy.

 

If you can't trust it, why even bring it up?  The only reason I can think of is that maybe lots of people who don't understand stats will trust it anyway. (There might be other reasons, but I can't think of what they are...)

 

Yeah, you clearly made the mistake that I stated above.

 

For future reference, p-values are only significant if they're above .95 (or alternatively below .05).

 

R-values are significant at different levels. Whether or not you should give them credence is based on what you're using them for. For example, a model with an R-value of .1 might be significant if you're trying to predict the result of basketball games or whether the stock market will go up. It's not high, but it's better than nothing, which given the complexity of the stock market is usually what you have. Basically, better to have a 55% chance than a 50% chance. You still don't know much more than the average person, but if you succeed 55% of the time then you'll almost definitely end up rich.

 

For baseball, it's arguable what type of r-value is high. But .85 is unquestionably a high value. I've used .5 or .6 before. Those aren't necessarily high but I'd still argue that they have significance.



#56 BSLChrisStoner

BSLChrisStoner

    Owner

  • Administrators
  • 156,156 posts

Posted 25 February 2015 - 09:24 AM

CBS Sports: Spring training positional battles to watch: NL East

http://www.cbssports...o-watch-nl-east



#57 RShack

RShack

    Fair-weather ex-diehard

  • Members
  • 22,994 posts

Posted 25 February 2015 - 08:11 PM

Yeah, you clearly made the mistake that I stated above.

 

For future reference, p-values are only significant if they're above .95 (or alternatively below .05).

 

R-values are significant at different levels. Whether or not you should give them credence is based on what you're using them for. For example, a model with an R-value of .1 might be significant if you're trying to predict the result of basketball games or whether the stock market will go up. It's not high, but it's better than nothing, which given the complexity of the stock market is usually what you have. Basically, better to have a 55% chance than a 50% chance. You still don't know much more than the average person, but if you succeed 55% of the time then you'll almost definitely end up rich.

 

For baseball, it's arguable what type of r-value is high. But .85 is unquestionably a high value. I've used .5 or .6 before. Those aren't necessarily high but I'd still argue that they have significance.

 

I don't think evaluating based on statistical significance is a mistake or confusion. It's what defines trustworthiness.  An R of .85 is "very high" only in the sense that the range is from 0 to 1.00, so it is high on the scale of possible values... but that's way, way different than saying it's a trustworthy number in the way that most people naturally tend to interpret it.  Same thing with the diff between correlation and covariance.

 

What I see in lots of (maybe most) baseball articles is that the authors throw out R-values as if they mean something, regardless of whether they do or not, and they don't give the reader needed info about how to interpret the number they're reporting.  This has a predictable effect on most readers (who have no real way to understand whether that number actually means anything): they tend to interpret it as if it was a % grade on a school assignment.  

 

Even worse, when an author throws out an R-value of .37 (= covariance of 13.7%) as evidence of "repeatable skill", it makes me think that the author himself doesn't understand what the number means either.  I think this is the downside of automated stat tools:  they let people run stats who don't understand stats.  While it was a real pain in the rear to do that stuff before software could do it for you via cut-and-paste from the internet, at least you had reason to trust that whoever was doing it really cared enough to suffer what was a real PITA to get it done, which generally meant they were actual stat people (or else somebody who was forced to do it for college homework).


 "The only change is that baseball has turned Paige from a second-class citizen to a second-class immortal." - Satchel Paige


#58 fishteacher

fishteacher

    HOF

  • Members
  • 26,880 posts
  • Location: Harrisburg, PA

Posted 25 February 2015 - 09:24 PM

Damn, sad story about Josh Hamilton relapsing into coke.  You have everything in life, and you keep slipping into drugs...I'm so glad I only ever drank or smoked a little weed in my life.  I can't imagine that.  Yes, he's an adult and should be held responsible, but I'm glad to hear he admitted it, seemingly looking for help I think, and hopefully can get his career back on track.  Remember when he was unstoppable in 2010?  Then he got the Golden Sombrero against us in the wild card game in 2012, and then his career seemed to go to hell as soon as he went to Cali.  

 

Hope he can get himself straight....sad.


  • BSLChrisStoner likes this
I'm here to do two things...chew bubblegum and kick ass, and I'm all out of bubblegum. ~ Roddy Piper
@therealjfisher

#59 BSLChrisStoner

BSLChrisStoner

    Owner

  • Administrators
  • 156,156 posts

Posted 25 February 2015 - 09:25 PM

FanGraphs: Pre-Spring Divisional Outlook: AL Central



#60 fishteacher

fishteacher

    HOF

  • Members
  • 26,880 posts
  • Location: Harrisburg, PA

Posted 25 February 2015 - 09:29 PM

I don't think evaluating based on statistical significance is a mistake or confusion. It's what defines trustworthiness.  An R of .85 is "very high" only in the sense that the range is from 0 to 1.00, so it is high on the scale of possible values... but that's way, way different than saying it's a trustworthy number in the way that most people naturally tend to interpret it.  Same thing with the diff between correlation and covariance.

 

What I see in lots of (maybe most) baseball articles is that the authors throw out R-values as if they mean something, regardless of whether they do or not, and they don't give the reader needed info about how to interpret the number they're reporting.  This has a predictable effect on most readers (who have no real way to understand whether that number actually means anything): they tend to interpret it as if it was a % grade on a school assignment.  

 

Even worse, when an author throws out an R-value of .37 (= covariance of 13.7%) as evidence of "repeatable skill", it makes me think that the author himself doesn't understand what the number means either.  I think this is the downside of automated stat tools:  they let people run stats who don't understand stats.  While it was a real pain in the rear to do that stuff before software could do it for you via cut-and-paste from the internet, at least you had reason to trust that whoever was doing it really cared enough to suffer what was a real PITA to get it done, which generally meant they were actual stat people (or else somebody who was forced to do it for college homework).

All the r-value in a correlational study tells you is the strength of the correlation.  An r-value of .37 is NOT all that strong.  The r-squared value of .137 tells you that 13.7% of the variance in the y-statistic (response variable) can be explained by the value of the x-variable.  Basically, the relationship is complete crap.  I think Shack has an idea what was going on there...maybe I missed something, but just using my stats knowledge (and I teach it).
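
A rough Python sketch of that "variance explained" reading, assuming a plain least-squares line fit. The numbers are invented for illustration (they are not from the article), and the helper names `pearson_r` and `explained_fraction` are hypothetical. It computes r directly, then separately computes 1 minus residual variation over total variation, so you can see that the two agree.

```python
def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5


def explained_fraction(xs, ys):
    """Share of the variation in ys accounted for by a least-squares line on xs."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # Variation left over after the line has done its best.
    ss_res = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    # Total variation in the response variable.
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return 1 - ss_res / ss_tot


# Made-up year-to-year rates for eight hypothetical pitchers.
year1 = [8.0, 10.5, 12.0, 9.5, 14.0, 11.0, 7.5, 13.0]
year2 = [11.0, 9.0, 12.5, 8.0, 11.5, 13.0, 10.0, 10.5]

r = pearson_r(year1, year2)
print(f"r = {r:.3f}, r^2 = {r * r:.3f}, "
      f"explained = {explained_fraction(year1, year2):.3f}")
```

For a one-variable line fit the last two printed numbers match, which is why squaring r gives the share of variation explained.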


I'm here to do two things...chew bubblegum and kick ass, and I'm all out of bubblegum. ~ Roddy Piper
@therealjfisher



