2006-11-13

When a Bad Economist Watches a Hockey Game

Interesting article in the National Post (found via Kukla's, as usual).

"Despite increasing public unease over violence in hockey, a statistical analysis of NHL data by university professors shows that on-ice fighting is a good strategy for team success."
Well, isn't that interesting. And contradictory to current consensus.

At first sniff, it doesn't pass the BS test. While it's possible that instigating a fight may 'spark' one's team to a better performance, how do you distinguish which team is 'sparked?' If the fighting majors are coincidental, shouldn't both teams receive the same benefit? I just had to take a closer look. The paper is available here.

To summarize, Ordinary Least Squares is used to determine which variables contribute positively and negatively to team points and team goals against. Here are their equations:

PTS = a0 + a1(GA) + a2(A) + a3(TFW) + a4(TFL) + a5(PIM) +
a6(MAJORS) + a7(ESG) + a8(PPG) + a9(SHG) +
a10(G/SHOTS) + a11(PLSMIN) + a12(SAV) + a13(YEAR) + e

GA = a0 + a1LOG(TFW) + a2LOG(TFL) + a3LOG(PIM) +
a4LOG(G) + a5LOG(SAVPCT) + a6LOG(SHOTS) +
a7LOG(MAJORS) + a8LOG(YEAR) + e

The logs are used in the GA model "due to increased fit."

Right away, one glaring mistake stands out. To get useful results out of a regression analysis, you have to have independent variables (PIM, ESG, MAJORS etc.) that correlate highly with your dependent variable (PTS) but correlate minimally with each other. See caveat #5 here (pdf).

For example, in hockey it would be unwise to include both 'total team salary' *and* 'team average age' as separate variables. One will surely be correlated with the other.

Your independent variables should not depend on one another. That's why they're called independent. When they do, Wikipedia calls it multicollinearity. Multicollinearity makes the individual coefficients unstable and hard to interpret, which is one source of "correlation, but not causation" effects. Other bloggers and hockey analysis websites have addressed this topic, but I'm too lazy to go looking for them.
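To put a rough number on the problem, here is a minimal sketch in Python (standard library only). The payroll and age figures are invented purely for illustration and are not real NHL data, and the helper names `pearson_r` and `vif_two_predictors` are my own. With just two predictors, the variance inflation factor is 1 / (1 - r^2), where r is the correlation between them. The closer r gets to 1 or -1, the less you can trust either coefficient on its own.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

def vif_two_predictors(r):
    """Variance inflation factor when a model has exactly two predictors."""
    return 1.0 / (1.0 - r * r)

# Invented numbers for six hypothetical teams (NOT real data).
payroll_millions = [28.0, 35.0, 41.0, 47.0, 55.0, 62.0]
average_age = [25.9, 26.8, 27.1, 27.9, 28.6, 29.4]

r = pearson_r(payroll_millions, average_age)
print(f"correlation between predictors: {r:.3f}")
print(f"variance inflation factor:      {vif_two_predictors(r):.1f}")
```

A common rule of thumb treats a variance inflation factor above 10 or so as a warning sign, though that cutoff is a convention rather than a hard rule.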

Given that, how anyone could describe face-offs won and face-offs lost as separate, independent variables is beyond me. The authors of this paper do just that, and conclude that winning face-offs has a greater absolute effect than losing them:

"Thus, although TFW keeps an opposing team from scoring goals, TFL doesn’t necessarily imply a great chance for the opponent to score."
So, winning a face-off is better than not losing a face-off. Impeccable logic. We're headed for the age of the high-event face-off man, who wins *and* loses more than 50% of his draws.
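The face-off case can be pushed to its extreme in a small sketch. It assumes, purely for illustration, that every team takes the same number of draws in a season, so that TFL is just that total minus TFW. Real teams don't take exactly the same number, so the real-world overlap is strong rather than perfect. The win totals and the `DRAWS_PER_TEAM` figure are invented. Under that assumption, when TFW and TFL both enter the model linearly (as in the PTS equation), the matrix X'X that OLS has to invert has determinant zero, so there is no unique way to split the effect between the two.

```python
# Hypothetical: every team takes the same number of draws (NOT real data).
DRAWS_PER_TEAM = 4800
faceoffs_won = [2310, 2395, 2402, 2450, 2528]  # invented TFW values
faceoffs_lost = [DRAWS_PER_TEAM - w for w in faceoffs_won]  # TFL is implied

# Design matrix rows: intercept, TFW, TFL (integers keep the arithmetic exact).
rows = [[1, w, l] for w, l in zip(faceoffs_won, faceoffs_lost)]

def gram_matrix(X):
    """Return X'X for a list-of-rows matrix X."""
    k = len(X[0])
    return [[sum(row[i] * row[j] for row in X) for j in range(k)]
            for i in range(k)]

def det3(m):
    """Determinant of a 3x3 matrix by cofactor expansion."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))

xtx = gram_matrix(rows)
print("det(X'X) =", det3(xtx))  # zero means OLS has no unique solution
```

With real data the determinant would not be exactly zero, but it would likely be close enough that the TFW and TFL coefficients could swing wildly from one sample to the next.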

Of course, I picked out the most obviously interdependent variables to pick on but I think I made my point. Fighting majors could also correlate with one of the other variables. Other analyses have been done (again, too lazy to go look for them) that isolate the fighting majors and show that losing teams accumulate more of them than winning teams.

Reading the paper wasn't a complete waste of time. The most interesting line was in reference to a prior study by someone else whose results "...show that teams with unusually high or low numbers of French-Canadians tend to be less efficient." Unfortunately, that paper appears to be unavailable.

8 Comments:

Anonymous said...

Wow - you're waaaay smarter than me.

I'd just point out that the last season's worth of data used is from 2003-04 and that the game (we're told) is different now.

11/13/2006 3:37 p.m.  
Jeff J said...

Smrt? Heck, no. Smart folk can articulate their thoughts. Phil Birnbaum does a much better job of explaining things.

11/13/2006 10:29 p.m.  
Anonymous said...

The title of this post should be "When a bad Economist Watches a Hockey Game"...

11/14/2006 5:55 p.m.  
Dirk Hoag said...

Well said, Ben - you can't have folks like this giving all Economists a bad name...

11/14/2006 6:41 p.m.  
Anonymous said...

The methodology in question is for the most part beyond me, but here are a couple of commonsensical issues:

"Other analyses have been done... that isolate the fighting majors and show that losing teams accumulate more of them than winning teams."

Yes, but at what point in games are they accumulated? After the score is out of reach and the (already) losing team is trying to make a point? How important is the raw number of fighting penalties compared to the question of who is doing the fighting (and lost to his team for 5 min) and when?

The overall debate seems somewhat akin to the disputes over whether "clutch hitting" exists in baseball. Attempts to disprove it require some fixed definition of a clutch situation (x-th inning and later, game within x-runs, etc.), while defenses of the notion rest on the idea that defining such situations is highly subjective. This is of course somewhat circular - we'll only assess a player's performance in situations where we were impressed by him doing well - and the whole debate takes on the character of 'ships passing in the night.'

From the Birnbaum post:

"team performance is determined directly by goals scored and goals allowed (assuming timing is random, as is normally assumed for baseball), and the other variables are expected to impact on goals, not on wins directly"

Intuitively, I am more willing to accept the proposition that timing is random in the case of baseball - even if a team tacks on a bunch of 'meaningless' runs in a blowout game, everything should even out over the course of 162 games; hence pythagorean projections, etc.

However, it would seem to me that this premise needs to be more firmly established if we are going to use it for hockey. Baseball is a series of discrete 1 on 1 matchups, and because there is no time limit, it stands to reason that the factors affecting whether runs will be scored at any given time will remain the same within and across games.

In the third period of a 5-1 hockey game (or even a 3-1 game in the old era), when everyone knows it's over, do the same variables affect goals scored/goals allowed in the same way? Are the correlations between game events (faceoff wins/losses, penalties, etc.) and scoring events the same or close enough regardless of game situation? To put my question another way, in a league where teams only score 250 goals a year, how much does it affect the results of any analysis if for significant chunks of game time players are not giving 100% effort towards the objective of scoring and preventing goals?

Sorry if these questions are rambling and off-base. I do hope that statistical analysis can do for hockey some of what sabermetrics has done for baseball; I just think we need to keep checking our premises.

11/14/2006 7:04 p.m.  
Jeff J said...

hermit said...

"Yes, but at what point in games are they accumulated? After the score is out of reach and the (already) losing team is trying to make a point? How important is the raw number of fighting penalties compared to the question of who is doing the fighting (and lost to his team for 5 min) and when?"

That's an excellent point. Everyone knows fights are more likely to occur when the score is lopsided. To isolate the effect of fights, you have to account for this.

There are a couple of effects that fights reportedly have. There is (1) the in-game effect of motivating your team, and (2) the 'message sent' by starting fights when your team is on the wrong end of a lopsided score.

To isolate the first effect, you could look at only fights that occur when the score is tied. You still would have to deal with the fact that it takes two to tango. You would have to make the leap and say that teams that are aware that they get an advantage from fights start them more often. Then we could look for a connection between the number of tied-score fights and the teams' GF-GA differential after fights in those games.


"In the third period of a 5-1 hockey game (or even a 3-1 game in the old era), when everyone knows it's over, do the same variables affect goals scored/goals allowed in the same way?"

This analysis by Alan Ryder does a pretty good job of showing that scoring in hockey basically follows a Poisson distribution. It's not perfect. In particular, I don't think GF and GA are completely independent (due to teams sitting on a lead), but it's close enough for most purposes.
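For a sense of what that model implies, here is a minimal sketch, not Ryder's actual method. It takes the roughly 250 goals a year mentioned above, spreads them over an 82-game schedule, and treats a team's goal total in each game as Poisson with that average. The function name `poisson_pmf` is mine, and the even spread across games is an assumption.

```python
import math

def poisson_pmf(k, rate):
    """Probability of exactly k events when the expected count is `rate`."""
    return math.exp(-rate) * rate ** k / math.factorial(k)

GOALS_PER_SEASON = 250   # round figure quoted in the comments
GAMES_PER_SEASON = 82    # NHL regular-season schedule
rate = GOALS_PER_SEASON / GAMES_PER_SEASON  # about 3 goals per game

for goals in range(7):
    print(f"P({goals} goals in a game) = {poisson_pmf(goals, rate):.3f}")
```

Comparing a table like this against how often teams actually score 0, 1, 2 and so on is one way to check how well the approximation holds.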


"I do hope that statistical analysis can do for hockey some of what sabermetrics has done for baseball; I just think we need to keep checking our premises."

I hope so too, but I don't think there is any hope of hockey stats getting close to the level of baseball. Like you said, the 1-on-1 situation is infinitely easier to analyze than the n-on-n situation. A suitable analogy from the world of physics would be the two-body problem vs. the n-body problem. The two-body problem is trivial to solve. Three or more? Damn near impossible.

11/15/2006 11:17 a.m.  
Anonymous said...

Jeff, thanks for the response. I've saved the Ryder article for future reading.

"I don't think there is any hope of hockey stats getting close to the level of baseball. Like you said, the 1-on-1 situation is infinitely easier to analyze than the n-on-n situation."

Of course many of the things that might help us analyze it are not measured (board battles won, penalties drawn, etc.) and others are, of necessity, subjective measures (hits, giveaways, even faceoff wins and shots on goal).

11/15/2006 6:26 p.m.  
Anonymous said...

It's not just bad economists: one of the most common errors made in statistics of any type (I taught a few classes in biostatistics) is failure to consider the fundamental assumptions of the statistical test being employed.

Thanks for recognizing the fact. Too many people see "statistics show" x and swallow it hook, line, and sinker without any critical thought whatsoever.

11/15/2006 9:14 p.m.  
