Buster the Dog or:

How I learned to stop worrying and love the blog…

Hockey Statistics & Technology: Piggybacking off the big guy. April 13, 2013

There are quite a few things that I am passionate about in this world. Two of them are sports and technology. I’ve recently become very interested in some of the mathematics behind the statistics in sports, most specifically hockey.

I by no means fancy myself a statistician, but this does not mean I can’t play one on the internet 🙂 In my quest to develop a relationship between some of the statistics out there, I ran into a brick wall: these stats ain’t free. Companies pay quite a bit of money for these sorts of data feeds.

I immediately looked at one of the big boys. The biggest, in fact…but who will remain nameless for the duration of this article.

I wasn’t quite sure what I was looking for at first, but it had something to do with hockey. Given there are already quite a few advanced statistics sites out there to calculate things like a player/team’s Corsi or Fenwick rating, I was looking for something with more of a direct graphical relationship.

What I set out to do was gather statistical relationships between the quadrants of the ice (graphed using standard Cartesian coordinates) and the percentage of goals that happen as a result.
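To make the idea concrete, here is a minimal sketch of that bucketing, assuming each shot has already been reduced to an (x, y, is_goal) tuple. The function names and the sample points are made up for illustration; they are not from the data source.

```python
from collections import Counter

def quadrant(x, y):
    """Return the Cartesian quadrant (1-4) of a point; 0 if it sits on an axis."""
    if x == 0 or y == 0:
        return 0
    if x > 0:
        return 1 if y > 0 else 4
    return 2 if y > 0 else 3

def goal_share_by_quadrant(shots):
    """shots: iterable of (x, y, is_goal). Returns {quadrant: goals / attempts}."""
    attempts = Counter()
    goals = Counter()
    for x, y, is_goal in shots:
        q = quadrant(x, y)
        attempts[q] += 1
        if is_goal:
            goals[q] += 1
    return {q: goals[q] / attempts[q] for q in attempts}

# Made-up points, purely for illustration (not real game data)
sample_shots = [
    (-80, 3, True),
    (-60, 20, False),
    (-85, -4, True),
    (-40, -30, False),
    (-70, 10, False),
]
shares = goal_share_by_quadrant(sample_shots)
print(shares)
```

The same shape works for finer grids than quadrants; only the bucketing function would change.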

When I looked at the available data that my target data source has, this seemed fully available. There is a game cast section that streams and stores all of the goals, shots, hits, blocked shots and penalties all graphed on an ice overlay (which is just a big Cartesian plane).

This page (wrapped in a Flash interface) was just sending simple AJAX calls, parsing the results, and mapping the events on the graph. The problem here was reading this data.

Image

It took a few nights of looking at the data before realizing how to parse it, but after you get the hang of how it’s formatted, it’s quite simple. The issue I had was with my X,Y mapping. I wrote a quick app to download and parse the data, but when I tried to map the data how it was presented, it didn’t make much sense when compared to what actually happened in the game.

Image

It should have looked like this.

Image

Silly oversight on my part: the X and Y coordinates are relative to where the goal/shot/whatever took place, and not absolute like they would appear to be in the above depiction. Simply reflect the points across both axes (negating X and Y, depending upon the team) to make them absolute with respect to a certain area of the ice.
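As a rough sketch, the reflection is just a sign flip on both coordinates. Deciding which plays need flipping (by team, by period, or both) is left to the caller here, since that rule is my assumption and not something the feed spells out; the function name and `flip` flag are hypothetical.

```python
def normalize(x, y, flip):
    """Mirror a point through center ice when flip is True; otherwise leave it alone.

    `flip` is a caller-supplied flag (hypothetical): the caller decides, per
    team and period, whether a play needs to be mirrored.
    """
    return (-x, -y) if flip else (x, y)

# Illustrative values only
print(normalize(58, 8, flip=True))
print(normalize(58, 8, flip=False))
```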

My initial idea was just focused on gathering data for the Pittsburgh Penguins (my team of preference…Let’s go Pens!). I’m still in the process of deriving some tangible data from the data that’s out there, but at first glance, this stood out to me.

Image

The above is a graph of all goals scored in the 2013 season that our data source has available. I have to preface this: not all data was available on the site, and there were probably around 8 games with no XML data for us to parse. The goal mark is at around (-95, -5 through 5). You can immediately see the large majority of goals coming from within the -75 X coordinate (which equates to the 10ft mark).
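Putting a number on "large majority" is a one-liner once the goals are normalized to the same end of the ice. This sketch assumes goals are (x, y) pairs with the net near x = -95, and treats "within the -75 X coordinate" as x <= -75; the function name and sample points are invented for illustration.

```python
def share_within(goals, x_limit=-75):
    """Fraction of goals whose x is at or beyond x_limit (toward the net at x = -95)."""
    if not goals:
        return 0.0
    close = sum(1 for x, _y in goals if x <= x_limit)
    return close / len(goals)

# Made-up coordinates, not real game data
sample_goals = [(-90, 2), (-82, -6), (-78, 10), (-60, 15)]
print(share_within(sample_goals))
```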

Be on the lookout for another post in the near future where I’ll attempt to develop some trends from this season’s data.

As a footnote, I wasn’t able to find anything on the internet about this when I was developing it, so I figured I’d share my success. These are my notes from the data I was able to gather from the XML file that the Flash player is downloading. Happy to answer questions about it, as it took me a few looks to get it just right. I should also mention that the lines inside each node should be split using the ‘~’ character; each field will mean something, but tracking down exactly what that something is for every entry would be very tough and, frankly, not needed.

  • The “Game” XML node has some high-level data about the game being played and the teams involved, such as the date/time, teams playing, and home/away team. Sample node as follows:
<Game id="400443012">
<![CDATA[
3~1~6~3~0~7:30 PM~Apr 3~April 3, 2013~Madison Square Garden~New York~New York~27~39~0~1~0~ ~3~2~1~ ~13~New York~Rangers~nyr~0x0B3D91~16~Pittsburgh~Penguins~pit~0xC3B263~2~(28-10-0, 56 pts)~(18-15-3, 39 pts)~Series starts 1/20
]]>
</Game>
  • The “Player” node has data about some of the players where a “Play” node has been referenced. This is key as the “Play” nodes reference these internal player ID’s for instances where a reference is needed. For example, if the below player, Arron Asham, were to score a goal (however unlikely :p ), he would have a “Play” entry and a “Player” entry as well so there is no need for external API (or whatever) calls to pull things like his head shot.
    <Player id="f24">
    <![CDATA[
    24~1822~Arron Asham~45~13~RW~http://somewebsite.com/i/headshots/nhl/players/35/24.jpg~0~0~0~0~531~0~1~0~0~7~2~0~2~2~7875~11~30~0~0~24
    ]]>
    </Player>
  • The one where most of the magic happens is the “Play” node. For each NHL hockey play (e.g., shot, goal, blocked shot, stoppage, etc.) one entry is logged into this file. Breakdown of the entry is as follows. Numbers referenced are the zero-based index of the parameter in question.
    <Play id="4004430120000681">
    <![CDATA[
    58~8~506~19:55~3~2179~0~0~Shot on goal by Chris Kunitz saved by Henrik Lundqvist(Snap 32 ft)~0~6~1~0~701~16~802~901~20~0~0
    ]]>
    </Play>
  • 0
    • X coordinate of where the Play took place. Remember that this is absolute and would change depending upon the period of play (as teams switch sides)
  • 1
    • Y Coordinate of where the Play took place – same footnote as [0]
  • 2
    • What type of play was it? An integer value for the type of play. I didn’t look for all of them, but these are the ones I cared about
      • 502 = faceoff win
      • 503 = hit
      • 505 = goal
      • 506 = shot
      • 507 = missed shot
      • 508 = shot blocked
      • 512 = penalty
      • 502 = faceoff
      • 516 = stoppage (icing)
      • 516 = goal stoppage
      • 1401 = takeaway
      • 1402 = giveaway
  • 3
    • How far into the period did this Play happen?
  • 4
    • What period did it happen?
  • 5
    • What was the primary player ID for the player performing the action?
  • 6
    • The secondary action on the Play (e.g., 1st assist on a goal)
  • 7
    • The tertiary action on the Play (e.g., 2nd assist)
  • 14
    • The internal team ID value – Can be cross referenced against the Game node.
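Pulling the notes above together, here is a minimal sketch of parsing a “Play” node in Python. The `<Plays>` wrapper element, the dictionary keys, and the choice to read the coordinates as integers are my own assumptions for illustration; only the field indexes and type codes come from the notes. Codes 502 and 516 each appear twice in my list, so they get a generic label here.

```python
import xml.etree.ElementTree as ET

# Sample Play node from the notes, wrapped in a hypothetical <Plays> root
SAMPLE = """<Plays>
<Play id="4004430120000681">
<![CDATA[
58~8~506~19:55~3~2179~0~0~Shot on goal by Chris Kunitz saved by Henrik Lundqvist(Snap 32 ft)~0~6~1~0~701~16~802~901~20~0~0
]]>
</Play>
</Plays>"""

# Play type codes from the notes (502 and 516 are listed twice there)
PLAY_TYPES = {
    502: "faceoff", 503: "hit", 505: "goal", 506: "shot",
    507: "missed shot", 508: "shot blocked", 512: "penalty",
    516: "stoppage", 1401: "takeaway", 1402: "giveaway",
}

def parse_play(node):
    """Split a Play node's CDATA on '~' and pick out the documented fields."""
    fields = node.text.strip().split("~")
    play_type = int(fields[2])
    return {
        "id": node.get("id"),
        "x": int(fields[0]),          # assumes integer coordinates, as in the sample
        "y": int(fields[1]),
        "type": play_type,
        "type_name": PLAY_TYPES.get(play_type, "unknown"),
        "clock": fields[3],           # how far into the period
        "period": int(fields[4]),
        "player_id": fields[5],       # primary player
        "secondary": fields[6],       # e.g., 1st assist
        "tertiary": fields[7],        # e.g., 2nd assist
        "team_id": fields[14],        # cross-reference against the Game node
    }

root = ET.fromstring(SAMPLE)
plays = [parse_play(node) for node in root.iter("Play")]
print(plays[0])
```

In the sample, team ID 16 lines up with the Pittsburgh entry in the Game node above, which is how I cross-referenced it.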
 

Hacking the Nike+ Fuelband….ok, maybe not June 17, 2012

Filed under: Uncategorized — stevethaber @ 12:53 pm

I know you might not be able to tell by my svelte physique, but I’ve recently become very into running; and being a technologist at heart, I am always fascinated with seeing all the new cool gadgets being used out there.

When Buster (the Boxer) and I first started back in March, I found this very cool Nike+ GPS app which will track your runs via the GPS in your iPhone, and give you feedback and cheers as you reach your goals. Tim Tebow himself told me how well I was doing! The app is very cool, and will sync all of the data it keeps about your run to the Nike Running website. Probably all stored in a data warehouse somewhere, so Nike can sell you products catered to you, or maybe just make fun of how slow I am; but that’s beside the point.

Nike seems to have a clear dominance in the running advertising space, so when I was browsing their website and stumbled across the Nike+ FuelBand, I was intrigued. This little “Livestrong” type wristband will track all of your movement throughout the day, so you can set goals and track your movement day over day. Your movement is tracked in an arbitrary value known as “fuel”. Since the band has data for your height, weight, and step counts, it is able to determine how much “fuel” you are burning in your various movements.

Now on to the techy stuff….

The band itself has a standard USB port at one end of it, and built in Bluetooth. You can choose to sync the band with a Bluetooth enabled device, or plug it into a Mac or Windows based PC; the PC must be used in order to charge the device.

In order to get the device operational, you must visit the Nike+ website and download the Fuelband (or at least I thought it was dedicated to the Fuelband) software in order to sync the device. Once the software is installed, simply enter your Nike+ login details, and the device will sync away.

Again, being a techy at heart, I must know what is going on in that sync process. How is my fuel data being synced to the Nike cloud?

At a high level, after you plug in your device, the following occurs:
1) FuelBand software auto-opens
2) Connection is made from the software to the Nike website
3) Data is downloaded from the FuelBand
4) Data from the FuelBand is uploaded to the website
5) A new browser window is opened to the Nike+ website to show you all the data that was just uploaded

To start disassembling this, I thought, what better way to do so than to decompile the Nike+ FuelBand software? My attempts were foiled when every decompiler I tried failed miserably. Looking closer at the software, it looks like it was written in some variant of C (being cross-OS compatible), and without digging deeper into tools like IDA, I was going to get nowhere.

The software itself has some localization support in it for multiple languages – I believe the device is only available in the US…maybe Nike has plans to roll out elsewhere?

Using procmon (on my Windows 7 PC), I was also able to see that the app is modifying a file called “config.dat”. This is where my perspective on the software changed. The brilliant folks at Nike seem to have made this one application for use with multiple different devices. The program files directory also contains some DLL files, which I assume are various drivers for performing IO operations to the different devices. The config.dat was also a dead end; the file is just a very large XML file with URLs and locations for things like software updates for the devices.
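If you want to skim a file like that without reading it top to bottom, a quick sketch is to walk every element and collect anything that looks like a URL. The sample XML below is a made-up stand-in with invented element names; I am not reproducing the real layout of config.dat here.

```python
import xml.etree.ElementTree as ET

def find_urls(xml_text):
    """Collect attribute values and element text that start with http:// or https://."""
    root = ET.fromstring(xml_text)
    urls = []
    for elem in root.iter():
        candidates = list(elem.attrib.values())
        if elem.text:
            candidates.append(elem.text.strip())
        urls.extend(v for v in candidates if v.startswith(("http://", "https://")))
    return urls

# Hypothetical stand-in for the config file; element names are invented
SAMPLE_CONFIG = (
    '<config>'
    '<update url="https://example.com/update.xml"/>'
    '<site>http://example.com/</site>'
    '</config>'
)
found = find_urls(SAMPLE_CONFIG)
print(found)
```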

So, what next? Wireshark. Having very little experience with the product, I wasn’t able to do much immediately. After some fiddling, I was able to see the network operations that the software was performing, which would be the key to “hacking” the FuelBand.

From step 1 to step 5 (mentioned above) there were around 120 packets being sent/received. Right out of the gate, the software is performing DNS queries…ET phone home. These DNS queries (for nikerunning.nike.com) were just so the software can download the XML file referenced above. Following the TCP stream, we can see an HTTP GET issued to /nikeplus/connect5/config/config.xml on nikerunning.nike.com; the file looks to be cached locally, thus the “config.dat” file.

Sometime after this XML file download is where I assumed the meat and potatoes to be. So, scrolling to packet 65 in the capture, we again see some DNS operations being performed…this time to api.nike.com. Again, the URL looks to be indicative of things to come in the future. Google also tells me that there does appear to be some movement on the FuelBand API front; no real details yet though.

Also, if you, just as I did, perform a dig against that api.nike.com domain, you will see the glorious amazonaws.com domain hosting the API (I assume it’s web services of some type). As a side note, I am very happy to see a brand as prestigious as Nike link its brand to Amazon; it says a lot for the awesome work Amazon is doing in the cloud space.

So you all must be dying to know what data is being sent back and forth to the Amazon AWS/Nike API site! Well, the answer is….I can’t tell. What I can tell is that the network traffic is encrypted (using the Thawte SSL root CA). From the (very little) clear text I do see inside the TLS v1 payload, there are references to api-preprod.nike.com, api-tie1.nike.com, developer.nike.com and api.stage.nike.com. I would assume that the data being sent up to the AWS website is a serialized and encrypted version of the object data being stored on the device. I can’t really blame Nike for encrypting & serializing the data before sending it up to the API; that’s best practice.

All in, the device is very cool – and has some very interesting technology behind it. Looking forward to the API, whenever that comes out…

Add some comments below if you got any further than I did 🙂