Predicting the Heisman via Search Volume: It (Usually) Works!

Sam Darnold with USC (Brian Rothmuller/Icon Sportswire)

The Heisman Trophy is the highest individual honor in college football. It is also a fiendishly difficult award to predict; it’s uniquely subjective, even in the world of individual-honors-in-team-sports. If you’re a baseball player and you post the best WAR in your league, congratulations, you’re probably going to be MVP. If you’re a basketball player and you average a triple-double (even on a lacklustre team), you’re the favorite to be MVP. There’s no easy quantitative measure for the Heisman, because players of all positions from ten different leagues are (technically) eligible. How do you compare an SEC running back to an ACC quarterback to a Big Ten defensive Swiss Army knife? How do you factor in values like leadership and character?

Is search volume a Heisman predictor?

I’ve been fiddling around with Google Trends, the free tool that lets you track search volume. Search volume is a really rough metric for how much people are talking about something, asking Google about it, or looking for specific videos. In that way, it’s a passable measure of how much “buzz” is surrounding a specific player, that weird, indefinable quality that’s so important to Heisman voting. Turns out there’s a decent correlation between search volume and which Heisman finalist goes home with the statue of an NYU running back.

If you look at the Heisman finalists from last year, for example, Lamar Jackson leads in search volume in all but three weeks, and by a considerable margin. This is mostly my fault (I’ve been playing highlights of the FSU game on repeat since last September), but I can’t account for all that volume. In the latter parts of the season, Jabrill Peppers started making some moves but was hampered by two things:

  1. Peppers’ best day of the season (search-volume score of 39) would have been Jackson’s third-best day in September (high of 70)
  2. Jabrill Peppers (mostly) plays defense, and the Heisman voter cannot abide defense

Some quick points on method: I compared the Heisman finalists over the course of the season, starting the data somewhere in late August and ending in January. Of course, the Heisman voting is all done by early December, so search volume after that isn’t hugely useful beyond spectacular bowl-game bumps. I didn’t include players that weren’t Heisman finalists (although it would be very interesting to see how a prominent player compares to a Heisman finalist in terms of search volume). Lastly, Google Trends reports volume as a percentage of the maximum value in the defined range. So Lamar Jackson’s 72 the week after the FSU game is 72% of the highest value, which is Deshaun Watson after the National Championship. Any numbers I use are the numbers I get directly from Google Trends and are thus formatted this way.
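To make that scaling concrete, here is a minimal sketch of the "percentage of the maximum value" idea. It assumes the scaling is nothing more than dividing by the biggest number in the comparison; what Google actually does behind the scenes may be more involved. The function name and the raw counts are made up for illustration.

```python
def normalize_to_peak(raw_counts):
    """Rescale every series so the largest value across all players is 100."""
    # The peak is taken over every player and every week in the range,
    # so one huge week for one player shrinks everybody else's numbers.
    peak = max(max(series) for series in raw_counts.values())
    return {
        player: [round(100 * value / peak) for value in series]
        for player, series in raw_counts.items()
    }


# Invented raw counts for two hypothetical players over three weeks.
raw = {
    "Player A": [10, 36, 20],
    "Player B": [5, 12, 50],
}
scaled = normalize_to_peak(raw)
print(scaled)
```

Under that assumption, a 72 just means "72% as big as the single biggest week anyone in the comparison had," which is why adding or removing a player, or changing the date range, can change every number on the chart.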

The rather rudimentary test works for 2015 as well, even though Google Trends data get a little worse any time before New Year’s Day 2016.* Christian McCaffrey and Derrick Henry trade weeks back and forth at a pretty low level until Henry gets a huge week (high of 32) in early November (the same week Henry ran for 210 yards and three touchdowns against LSU). McCaffrey doesn’t put up similar numbers until very late in the season. In fact, Christian McCaffrey’s best week in the season was the week the Heisman finalists were announced, which could be a really bad thing if those searches are of the “who the f*** is Christian McCaffrey” variety. McCaffrey, of course, stuns in the Rose Bowl, and puts up the best numbers of the year, and wins this insufferable Stanford fan’s vote, weeks after Derrick Henry has taken home the trophy.

Christian McCaffrey stiff arming a Wildcat
Christian McCaffrey (5) outran everyone in the Pac-12, but couldn’t beat Derrick Henry in the 2015 Heisman race. (Photo by Carlos Herrera/Icon Sportswire)

Going back in time even further: Marcus Mariota led most weeks in his Heisman-winning 2014 season, other than the week Melvin Gordon ran through Nebraska like a haunted combine harvester. The first year this doesn’t work is 2013, where Johnny Manziel clearly won the search volume battle but lost the Heisman war. You can explain this a few ways: Johnny Football was at that point more a celebrity than a football player, Jameis Winston put up some great weeks late in the Heisman campaign (and perhaps those weeks are more important to voters), or maybe Manziel failed to live up to the meteoric expectations everyone set for him after his enchanted 2012.

I thought 2012 was going to be more fun, with the whole Manti Te’o affair, but no. Johnny Football goes to Tuscaloosa, drives a stake into Nick Saban’s heart, and never relinquishes his iron grip of search volume. The 2011 season is something of an anomaly, with Andrew Luck leading in search, barely, but losing to Robert Griffin III. Cam Newton’s rout of the 2010 field is even more obvious through our lens.

Altogether, I looked at seven years, and in five of them, the leader in search volume won the Heisman. In one of the other years, the leader (Manziel) had won the Heisman the year prior. In the other, search volume was almost a dead heat between Andrew Luck and RGIII. If I were building a predictive model for the Heisman Trophy, I would use average search volume over the course of the season as a metric. I would control for defending Heisman winners. I would weight volume in November more heavily than volume early in the season, but not hugely.
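For the curious, here is a rough sketch of what that metric might look like in code. Everything specific in it is an assumption: the 1.5 November weight is just one reading of "more heavily, but not hugely," the flat 0.8 discount is only one crude way to "control for" a defending winner, and the function name and sample numbers are invented.

```python
from datetime import date

NOVEMBER_WEIGHT = 1.5            # assumed; the text gives no exact figure
DEFENDING_WINNER_DISCOUNT = 0.8  # assumed; a crude stand-in for a real control


def season_score(weekly, defending_winner=False):
    """Weighted average of weekly search volume.

    weekly is a list of (week_start_date, trends_value) pairs.
    """
    if not weekly:
        raise ValueError("need at least one week of data")
    total = 0.0
    weight_sum = 0.0
    for week, value in weekly:
        # November weeks count a bit more than the rest of the season.
        weight = NOVEMBER_WEIGHT if week.month == 11 else 1.0
        total += weight * value
        weight_sum += weight
    score = total / weight_sum
    if defending_winner:
        # Knock down the holdover buzz a previous winner carries.
        score *= DEFENDING_WINNER_DISCOUNT
    return score


# Invented numbers for a hypothetical player.
weeks = [
    (date(2016, 9, 3), 20),
    (date(2016, 10, 1), 20),
    (date(2016, 11, 5), 40),
]
plain = season_score(weeks)
discounted = season_score(weeks, defending_winner=True)
print(round(plain, 2), round(discounted, 2))
```

With only seven seasons to look at, any weights like these are guesses to argue about rather than values fitted to data.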

Can we use that information to make money?

That’s all well and good, you say, but what does it mean for 2017 and how I should wager on the Heisman futures?

Well, here are the five players Vegas thinks are most likely to win the Heisman, compared over the last year. Jackson, obviously, dominates the field with his regular-season performances, but notably quiets down in bowl season. The Rose Bowl game that we all know and love brings a big spike, and a surprisingly uneven one. I would have thought Saquon Barkley and Sam Darnold would come out of that game with comparable numbers, but Darnold posted a number three times as high as Barkley. Baker Mayfield put up some great search numbers in February, with impressive footwork in his road loss to Washington County police.

Right now, Darnold’s leading the pack, which squares with what Vegas expects (+400), but Jackson (a +800 underdog) is right there with him. That could be Heisman holdover, a la 2013 Johnny Manziel, or it could be that Jackson is being undervalued.

Baker Mayfield is also at +800 and about equal in search volume with Darnold and Jackson. That makes Mayfield, for now at least, the value pick at your sportsbook of choice.
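If the plus-numbers are unfamiliar, this small sketch converts American odds to the win probability they imply, using the standard conversion. The function name is mine, and the implied numbers include the sportsbook's cut, so treat them as a rough guide rather than a true probability.

```python
def implied_probability(american_odds):
    """Convert American odds (e.g. +400 or -150) to an implied win probability."""
    if american_odds > 0:
        # Underdog-style odds: +400 means a 100 stake wins 400.
        return 100 / (american_odds + 100)
    # Favorite-style odds: -150 means a 150 stake wins 100.
    return -american_odds / (-american_odds + 100)


for name, odds in [("Darnold", 400), ("Jackson", 800), ("Mayfield", 800)]:
    print(f"{name}: {implied_probability(odds):.1%}")
```

So +400 works out to an implied 20%, and +800 to about 11%. If search volume really puts the three of them level, the two players priced at +800 are the ones the market looks lower on.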

As the season gets going and we get a few games and some real buzz surrounding the players, we’ll check back in. We’ll also incorporate players who are further down the Vegas list (or even off it completely), just for fun.

My hope, obviously, is that search volume can give us a new and interesting way to measure the pulse of the college football world in a way that meaningfully relates to Heisman prospects. We are at a slight handicap, living in a world with a previous Heisman winner dominating our YouTube feeds (although probably not the ACC Atlantic). But hopefully we can overcome that.

*Something about Houston beating FSU in the Chick-fil-A Peach Bowl changed the way Google thought about search volume, apparently.