The goal of this project was to use spatial kriging to create heat maps for NBA players. In doing so, we hoped to shed light on the following questions.
Here we go through an example of the steps to create a kriged shot chart for a given NBA player.
We’ll use the following packages to perform our analysis.
1. tidyverse - provides many important data wrangling, cleaning, and visualization tools.
2. SpatialBall - contains shot data for the 2016-17 NBA season, including the location of every shot.
3. sp and gstat - provide methods for implementing spatial kriging.
4. spdep - provides the neighbor graphs and Moran’s I tests used at the end for spatial autocorrelation.
# Dependencies
library(tidyverse)
library(SpatialBall)
library(sp)
library(gstat)
library(spdep)
First, we need to wrangle the data into a format suited to our analysis. We are interested in the scoring efficiency of NBA players, which is traditionally shown with a half-court shot chart. We’ll filter the data to include only half-court shots by our player of interest. We’ll also add a new feature, points per shot (PPS), to the data frame to measure scoring efficiency.
# Player of Interest
player <- "Damian Lillard"
#Filter player's shot data to half court
PlayerShots <- season2017 %>%
select(PLAYER_NAME,LOC_X, LOC_Y, SHOT_TYPE, SHOT_MADE_FLAG) %>%
filter(PLAYER_NAME == player, LOC_Y < 500)
#Add columns to show shot type,
# whether it was made/missed, and points per shot
PlayerShots$TYPE <- ifelse(PlayerShots$SHOT_TYPE == "2PT Field Goal", 2, 3)
PlayerShots$Made_Miss <- ifelse(PlayerShots$SHOT_MADE_FLAG == 0, 0, 1)
PlayerShots$PPS <- (PlayerShots$TYPE)*(PlayerShots$Made_Miss)
PlayerShots
Note that a player may take multiple shots from the same court location, so we need to adjust our PPS value to account for this. First, we group the shot data by location.
# Find all of the duplicate shot locations
nvals <- PlayerShots %>%
group_by(LOC_X, LOC_Y) %>%
count()
# Number of shots taken at each location
nvals %>% arrange(desc(n))
After finding the number of shots taken at each location, we join these values back to the original table. Knowing the number of shots at each location, we now compute the average points per shot at each location.
#Join nvals to the original table and create pps avg - thin data set to include only distinct values.
PlayerShots2 <- PlayerShots %>%
left_join(nvals) %>%
group_by(LOC_X, LOC_Y) %>%
mutate(ppsum = sum(PPS), ppsavg = ppsum/n) %>%
distinct(LOC_X, LOC_Y, .keep_all = TRUE)
## Joining, by = c("LOC_X", "LOC_Y")
PlayerShots2
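To see what the join-and-average step produces, here is a minimal base R sketch on a made-up three-shot data frame. The `toy` data and its values are hypothetical, not taken from `season2017`, and the sketch uses `aggregate()` rather than the tidyverse pipeline above; both should give the same per-location mean.

```r
# Hypothetical toy data: two shots from (10, 20) and one from (-5, 30)
toy <- data.frame(
  LOC_X = c(10, 10, -5),
  LOC_Y = c(20, 20, 30),
  PPS   = c(3, 0, 2)      # points scored on each shot
)

# Average points per shot at each distinct location
toy_avg <- aggregate(PPS ~ LOC_X + LOC_Y, data = toy, FUN = mean)
names(toy_avg)[names(toy_avg) == "PPS"] <- "ppsavg"
toy_avg
```

The made three-pointer and the miss from the same spot collapse into a single row, which is the thinning that `distinct()` performs in the pipeline above.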
Now that we have our feature of interest (PPS), it’s time to create some spatial objects so that we can perform the analysis. We’ll need the following objects:
- spatial points data frame
- prediction grid
# Create a spatial points dataframe from shot locations
coordinates(PlayerShots2)<- ~LOC_X + LOC_Y
#class(PlayerShots2)
#Make the spatial grid for predictions
grid <- expand.grid(x = seq(-255, 255, by = 7.5), y = seq(-75, 400, by= 7.5))
#class(grid)
#plot(grid)
coordinates(grid)<- ~x+y
#class(grid)
courtgrid <- SpatialPixels(grid)
#class(courtgrid)
plot(courtgrid)
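As a quick sanity check on the grid resolution, the sketch below counts the prediction cells using base R only. Because `seq()` never overshoots its upper bound, a 7.5-unit step puts the last row of cells at y = 397.5 rather than 400.

```r
xs <- seq(-255, 255, by = 7.5)   # grid columns across the court width
ys <- seq(-75, 400, by = 7.5)    # grid rows along the court length

length(xs)                # number of grid columns
length(ys)                # number of grid rows
max(ys)                   # largest y value actually generated
length(xs) * length(ys)   # total number of prediction cells
```

A smaller step gives a smoother map at the cost of more kriging predictions, so the step size is a trade-off rather than a fixed requirement.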
One of the advantages of spatial kriging is that it takes into account how our feature of interest varies as a function of distance. Thus, we first need to capture this variation by building a sample variogram.
# Plot Variogram cloud
#vargram.cloud <- gstat::variogram(ppsavg~1, data = PlayerShots2, cloud = TRUE)
#plot(vargram.cloud)
# Plot variogram
vargram <- gstat::variogram(ppsavg~1, data = PlayerShots2)
plot(vargram)
# find the best fit for the variogram
# pass the candidate models as one character vector so that each is tried
fit <- fit.variogram(vargram, vgm(c("Sph", "Mat", "Exp")))
plot(vargram, model = fit)
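For intuition about what a fitted model looks like, here is a minimal base R sketch of the spherical variogram. The function name `sph_variogram` and the parameter values are illustrative assumptions, not the values fitted above; the formula is the standard spherical model with a nugget, partial sill, and range.

```r
# Spherical semivariogram: rises from the nugget and levels off at the range
sph_variogram <- function(h, nugget, psill, range) {
  r <- pmin(h / range, 1)                  # cap at 1 so the curve is flat beyond the range
  g <- nugget + psill * (1.5 * r - 0.5 * r^3)
  g[h == 0] <- 0                           # semivariance is zero at distance zero
  g
}

# Hypothetical parameters, chosen only for illustration
h <- c(0, 25, 50, 100, 200)
sph_variogram(h, nugget = 0.2, psill = 1.0, range = 100)
```

Beyond the range the curve stays at nugget + partial sill, meaning shots farther apart than that are treated as uncorrelated.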
Now we are ready to perform the kriging and visualize the results!
# Make Kriged predictions
z.krige <- gstat::krige(ppsavg~1, PlayerShots2, courtgrid, model = fit)
## [using ordinary kriging]
# Visualize
spplot(z.krige["var1.pred"], at = seq(0,3, by =.25))
#spplot(z.krige["var1.var"], at = seq(0,3, by = .25))
Our resulting plot shows Damian Lillard’s expected PPS mapped to the entire half-court surface. By adding the court lines, we get an even better idea of the areas on the court where Damian Lillard seems to be most effective.
A kriged shot chart for Damian Lillard helps us visualize his areas of high scoring efficiency, and it especially highlights his effectiveness from deep in 3-point territory. Additionally, our chart predicts Lillard to be more effective from the left side near the basket.
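The `[using ordinary kriging]` message printed by `krige` refers to the ordinary kriging predictor, a weighted average of the observations whose weights sum to one. The following base R sketch solves the ordinary kriging system by hand for a few made-up points. The `ok_predict` and `vg_exp` helpers, the exponential variogram parameters, and all data values are hypothetical; `gstat::krige` does the same job far more efficiently and robustly.

```r
# Exponential semivariogram with no nugget (illustrative parameters)
vg_exp <- function(h, psill = 1, range = 50) psill * (1 - exp(-h / range))

# Ordinary kriging prediction at a single location x0
ok_predict <- function(coords, z, x0, vg = vg_exp) {
  n  <- nrow(coords)
  D  <- as.matrix(dist(coords))                  # distances between observations
  d0 <- sqrt(rowSums(sweep(coords, 2, x0)^2))    # distances from observations to x0
  # Kriging system: semivariances bordered by the sum-to-one constraint
  A <- rbind(cbind(vg(D), 1), c(rep(1, n), 0))
  b <- c(vg(d0), 1)
  sol <- solve(A, b)
  w <- sol[1:n]                                  # kriging weights
  list(pred = sum(w * z), weights = w)
}

# Hypothetical shot locations and average PPS values
coords <- cbind(x = c(0, 50, -30, 10), y = c(10, 100, 200, 250))
z <- c(1.2, 0.8, 1.5, 0.9)

res <- ok_predict(coords, z, x0 = c(20, 120))
res$pred
sum(res$weights)   # equals 1 up to rounding
```

With no nugget, predicting at an observed location returns the observed value exactly, which is why kriged maps pass through the data when the fitted nugget is zero.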
As a final step, we might run some tests for spatial autocorrelation. For now, let’s try Moran’s I.
# Run Moran's I
#Create neighbor list (graph-based since working with points)
grph <- relativeneigh(PlayerShots2)
neib <- graph2nb(grph)
neib.listw <- nb2listw(neib, style="B", zero.policy = TRUE)
mtest <- moran.test(PlayerShots2@data$ppsavg, listw=neib.listw, alternative="two.sided", zero.policy = TRUE)
mtest
##
## Moran I test under randomisation
##
## data: PlayerShots2@data$ppsavg
## weights: neib.listw n reduced by no-neighbour observations
##
##
## Moran I statistic standard deviate = 1.1125, p-value = 0.2659
## alternative hypothesis: two.sided
## sample estimates:
## Moran I statistic Expectation Variance
## 0.0261992965 -0.0010989011 0.0006020485
sim1 <- moran.mc(PlayerShots2@data$ppsavg, listw=neib.listw, nsim=99, zero.policy = TRUE, alternative="less")
sim1
##
## Monte-Carlo simulation of Moran I
##
## data: PlayerShots2@data$ppsavg
## weights: neib.listw
## number of simulations + 1: 100
##
## statistic = 0.026199, observed rank = 94, p-value = 0.94
## alternative hypothesis: less
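After running Moran’s I, we see that the two-sided p-value (0.2659) is fairly high and the Moran’s I statistic (0.0262) is very close to zero, so there is no evidence of strong spatial autocorrelation in ppsavg. Note that the Monte Carlo test above uses alternative = "less", which tests for negative autocorrelation, so its p-value of 0.94 only indicates no evidence of negative autocorrelation.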
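To make the statistic concrete, the sketch below computes Moran’s I directly from a binary weight matrix in base R. The `morans_i` helper and the four-point example are hypothetical. With binary weights (as with `style = "B"`), the statistic is (n / S0) times the weighted sum of cross-products of deviations from the mean, divided by the sum of squared deviations, where S0 is the sum of all weights. Under the null hypothesis of no spatial autocorrelation its expectation is -1/(n - 1), which is why the expectation reported above is slightly negative rather than exactly zero.

```r
morans_i <- function(x, W) {
  n  <- length(x)
  d  <- x - mean(x)          # deviations from the mean
  S0 <- sum(W)               # sum of all weights
  (n / S0) * sum(W * outer(d, d)) / sum(d^2)
}

# Hypothetical example: four points on a line, each linked to its adjacent points
W <- matrix(0, 4, 4)
W[cbind(1:3, 2:4)] <- 1
W <- W + t(W)                # make the neighbour relation symmetric

x <- c(1, 2, 3, 4)           # smoothly increasing values
morans_i(x, W)               # positive: neighbours have similar values
-1 / (length(x) - 1)         # expectation under no autocorrelation
```

Values well above the null expectation suggest that nearby locations have similar values; values near it, as in the Lillard results, suggest little spatial structure under the chosen neighbour graph.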
This process can be repeated for any player of interest in the dataset to visualize their scoring efficiency with a heat map.