Trying to figure out why the disc golf course is so crowded these days, with code.
Anyone who plays disc golf has heard that disc golf grew this year. It’s been a very easy socially distant activity, and it’s awesome. I actually coached a disc golf team in 2015-2017 while I was a high school biology teacher, so it’s near and dear to me. A large part of me is happy for the growth, but I’d be lying if I didn’t admit a small part of me is annoyed at how busy the courses are now, hah! While waiting on a teepad, I had the idea to quantify this growth, since I hadn’t seen anyone try yet.
I’m going to generally refer to the growth of disc golf based on its search popularity in Google Trends. This is an imperfect proxy for the overall growth of disc golf, but I am okay with that, and it’s my only source of data for this project.
Every time someone is looking for a disc, a nearby course, a YouTube tutorial, or a bit of Disc Golf Pro Tour coverage, they probably search for it on Google (or something owned by Google, e.g., YouTube).
Here’s what I hope to answer:
knitr::include_graphics(here::here("images", "ted-johnson.jpeg"))
photo by Ted Johnson
Before we hop in, pay attention to the units of relative popularity that Google gives:
“Numbers represent search interest relative to the highest point on the chart for the given region and time. A value of 100 is the peak popularity for the term. A value of 50 means that the term is half as popular. A score of 0 means there was not enough data for this term.”
So basically, when I query data from 2004-2020, all data will be scaled with 100 being the peak popularity at any time in that window. I have no information on absolute numbers, just change. Anyways, let’s dive in.
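To make the scaling concrete, here is a minimal sketch of how a 0-100 relative index behaves. The raw counts are made up (Google does not publish them), and Google works from the share of all searches rather than raw counts, so this only illustrates the "100 = peak in the window" idea.

```r
# Hypothetical raw monthly search counts; these numbers are invented
# purely to illustrate the scaling and are not from Google.
raw_counts <- c(Jan = 1200, Apr = 3600, Jul = 4800, Oct = 2400)

# Rescale so the largest value in the window becomes 100.
relative_popularity <- 100 * raw_counts / max(raw_counts)

print(relative_popularity)
```

The peak month lands at 100 and a month with half as many searches lands at 50, which is all the index can tell you: relative change, not absolute volume.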
This GitHub Repo has all my data if you’re interested.
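# Packages inferred from the functions used in this post
library(rio)        # import()
library(tidyverse)  # dplyr, tidyr, ggplot2
library(lubridate)  # ymd(), parse_date_time(), year(), month()
library(ggthemes)   # theme_economist()
library(usmap)      # plot_usmap()

# State-level snapshots, one file per year, joined on Region
dat <- import("geoMap 2017.csv") %>%
  left_join(import("geoMap 2018.csv"), by = "Region") %>%
  left_join(import("geoMap 2019.csv"), by = "Region") %>%
  left_join(import("geoMap 2020.csv"), by = "Region") %>%
  mutate(Region = factor(Region)) %>%
  rename(`2017` = 'disc golf: (2017)',
         `2018` = 'disc golf: (2018)',
         `2019` = 'disc golf: (2019)',
         `2020` = 'disc golf: (2020)')

# Monthly national trend since 2004, with year and month split out
dat_trend <-
  import("Search DG Since 2004.csv") %>%
  transmute(Date = ymd(parse_date_time(date, "ym")),
            searches = searches,
            year = year(Date),
            month = month(Date))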
Let’s start with the average Google search popularity of the words “Disc Golf” for each year. I have data for each month, and the error bars show the variation across months within a year. Where the error bars of two years do not overlap, I treat the difference as significant.
months_vector <-
  c('Jan', 'Feb', 'Mar', 'April', 'May', 'Jun',
    'July', 'Aug', 'Sept', 'Oct', 'Nov', 'Dec')
# Yearly mean, SD, and standard error across the 12 monthly values
dat_trend %>%
  group_by(year) %>%
  summarize(mean_yearly = mean(searches),
            sd = sd(searches),
            se_yearly = sd / sqrt(n())) %>%
  ggplot(aes(x = year,
             y = mean_yearly,
             ymin = mean_yearly - 1.96 * se_yearly,
             ymax = mean_yearly + 1.96 * se_yearly)) +
  scale_x_continuous(breaks = 2004:2020) +
  geom_col(fill = '#cc0000', show.legend = FALSE) +
  geom_point() +
  geom_errorbar() +
  geom_line() +
  labs(
    title = 'Figure 1. Average Yearly Google Search\nPopularity of the Term `Disc Golf`',
    caption = 'Error Bars Represent 95% Confidence Intervals',
    y = 'Relative Search Popularity',
    x = 'Year') +
  theme(axis.text.x =
          element_text(angle = 30, vjust = 0.5, hjust = 0.5))
As you can see, 2020 is the first year since 2004 whose relative search popularity differs significantly from the year before it. Again, keep in mind this is using Google’s scaled units.
# Monthly mean, SD, and standard error across years
dat_trend %>%
  group_by(month) %>%
  summarize(mean_monthly = mean(searches),
            sd = sd(searches),
            # Standard error is sd / sqrt(n), matching the yearly calculation
            se_monthly = sd / sqrt(n())) %>%
  ggplot(aes(x = month,
             y = mean_monthly,
             ymin = mean_monthly - 1.96 * se_monthly,
             ymax = mean_monthly + 1.96 * se_monthly)) +
  scale_x_continuous(breaks = 1:12,
                     labels = months_vector) +
  geom_col(fill = '#cc0000', show.legend = FALSE) +
  geom_point() +
  geom_errorbar() +
  labs(title = 'Figure 2. Average Monthly Google Search\nPopularity of the Term `Disc Golf`',
       caption = 'Error Bars Represent 95% Confidence Intervals',
       y = 'Relative Search Popularity',
       x = 'Month') +
  theme(axis.text.x =
          element_text(angle = 30, vjust = 0.5, hjust = 0.5))
It shouldn’t surprise me, but it really surprises me how clean the distribution of popularity over months is. Consistently, the highest search interest is in the warmer months, and the colder months get less.
Another way to look at this would be line graphs over time, with a different line for every year. I’ve done that below (Figure 3), and I added a dashed black line for the overall average search popularity for all other years (2004-2019) and a solid red line (2020 only) to show the increase in search popularity this year.
dat_trend %>%
  ggplot(aes(x = month,
             y = searches,
             group = year,
             color = year)) +
  scale_x_continuous(breaks = 1:12,
                     labels = months_vector) +
  geom_line() +
  labs(title = 'Figure 3. Relative Search Popularity of\nDisc Golf Every Month Since 2004',
       y = 'Relative Search Popularity',
       x = 'Month',
       color = 'Year') +
  theme(axis.text.x =
          element_text(angle = 30, vjust = 0.5, hjust = 1),
        legend.position = 'right',
        legend.direction = 'vertical') +
  # Dashed line: average search popularity for 2004-2019
  geom_hline(yintercept = 45.67188,
             linetype = 2) +
  # Solid line: average search popularity for 2020
  geom_hline(yintercept = 71.41667,
             color = '#cc0000')
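The two reference lines in Figure 3 are typed in as constants. As a sanity check, here is a small sketch that recovers them from the yearly averages listed in the table further down. Because every year has 12 monthly values, the mean of the yearly means equals the mean of all monthly values. The variable names are mine, not from the original analysis.

```r
# Yearly averages copied from the post's table (2004 through 2020).
years <- 2004:2020
mean_yearly <- c(39.66667, 39.91667, 39.83333, 43.41667, 39.66667,
                 44.08333, 42.25000, 48.66667, 50.66667, 48.25000,
                 48.08333, 50.25000, 50.16667, 51.08333, 46.66667,
                 48.08333, 71.41667)

# Dashed line: average over every year before 2020.
baseline_mean <- mean(mean_yearly[years <= 2019])

# Solid line: the 2020 average on its own.
mean_2020 <- mean_yearly[years == 2020]

print(round(baseline_mean, 5))
print(mean_2020)
```

Computing the values this way, rather than hard-coding them, would keep the plot in sync if the underlying data file were refreshed.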
Figure 4 shows similar data, except the dashed (orange) line represents the average across all years (2004-2020), and the solid (color-coded) lines show each year’s mean.
# Attach each year's mean (plus SD and SE) to every monthly row
dat_plot <- dat_trend %>%
  group_by(year) %>%
  mutate(mean_yearly = mean(searches),
         sd = sd(searches),
         se_yearly = sd / sqrt(n())) %>%
  ungroup()
dat_plot %>%
  ggplot(aes(x = month, y = searches)) +
  geom_col(aes(fill = year), show.legend = FALSE) +
  # Dashed orange line: mean across all years (2004-2020)
  geom_hline(aes(yintercept = mean(searches)),
             color = 'orange',
             linetype = 2,
             show.legend = FALSE) +
  facet_wrap(~year, ncol = 6) +
  # Solid line: that year's own mean
  geom_hline(aes(yintercept = mean_yearly, color = year),
             show.legend = FALSE) +
  scale_x_continuous(breaks = 1:12,
                     labels = months_vector) +
  labs(
    title = 'Figure 4. Relative Search Popularity\nof Disc Golf Every Year Since 2004',
    y = 'Relative Search Popularity',
    x = 'Month') +
  coord_flip() +
  theme_economist(horizontal = FALSE) +
  theme(axis.text.y = element_text(size = 7, angle = 45),
        axis.text.x = element_text(size = 7, angle = 45, vjust = 0.75))
Any way you cut it, disc golf became more popular in terms of Google search popularity. Here are the actual numbers.
Webtraffic over time
Year | Average Webtraffic | Standard Deviation | Standard Error |
---|---|---|---|
2004 | 39.66667 | 13.91751 | 4.017638 |
2005 | 39.91667 | 13.93790 | 4.023526 |
2006 | 39.83333 | 14.97169 | 4.321955 |
2007 | 43.41667 | 14.07421 | 4.062874 |
2008 | 39.66667 | 13.64707 | 3.939569 |
2009 | 44.08333 | 13.89871 | 4.012213 |
2010 | 42.25000 | 11.20978 | 3.235984 |
2011 | 48.66667 | 14.18279 | 4.094219 |
2012 | 50.66667 | 16.40030 | 4.734358 |
2013 | 48.25000 | 15.87522 | 4.582782 |
2014 | 48.08333 | 15.13250 | 4.368375 |
2015 | 50.25000 | 15.53369 | 4.484189 |
2016 | 50.16667 | 14.91694 | 4.306150 |
2017 | 51.08333 | 14.29850 | 4.127620 |
2018 | 46.66667 | 12.54326 | 3.620927 |
2019 | 48.08333 | 12.71691 | 3.671054 |
2020 | 71.41667 | 21.80683 | 6.295090 |
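
The "error bars don't overlap" claim for 2019 versus 2020 can be checked by hand from the last two rows of the table. This is a minimal sketch using the same normal-approximation multiplier (1.96) as the figures; the data frame and column names are my own.

```r
# Means and standard errors copied from the 2019 and 2020 rows of the table.
yearly <- data.frame(
  year = c(2019, 2020),
  mean = c(48.08333, 71.41667),
  se   = c(3.671054, 6.295090)
)

# Normal-approximation 95% interval: mean +/- 1.96 * SE.
yearly$lower <- yearly$mean - 1.96 * yearly$se
yearly$upper <- yearly$mean + 1.96 * yearly$se

# The intervals are disjoint when 2019's upper bound is below 2020's lower bound.
no_overlap <- yearly$upper[yearly$year == 2019] < yearly$lower[yearly$year == 2020]

print(yearly)
print(no_overlap)
```

With only 12 monthly values per year, a t-based multiplier (`qt(0.975, df = 11)`, about 2.20) would be slightly more conservative than 1.96. The 2019 and 2020 intervals still do not overlap under it.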
So how many times (and when) has disc golf search popularity significantly increased? I’ll spare you the details, but I can do some exploratory analysis with something called a ‘generalized linear mixed effects regression tree’, which is an emerging exploratory technique for finding group differences.
I wrote the model to account for seasonal trends with a random intercept of month, and then I asked the model to tell me between which years differences occurred.
It looks like the first bit of growth was relatively small but significant, and it happened in 2011. The window from 2004-2010 averaged 41.26 in relative search popularity, and 2011-2019 saw a significant (but modest) increase of about 8 points, or roughly 19%, to 49.10. Both windows were significantly different from 2020, which averaged 71.42! That’s an increase of over 22 points, or about 45%, from the prior window (2011-2019).
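Since Google's index runs from 0 to 100, it is easy to mix up point changes with percentage changes. Here is a quick sketch of the arithmetic using only the three window averages quoted above; the vector names are mine.

```r
# Window averages as quoted in the text.
window_means <- c(`2004-2010` = 41.26, `2011-2019` = 49.10, `2020` = 71.42)

# Change in index points from each window to the next.
point_change <- diff(window_means)

# The same change expressed relative to the earlier window.
percent_change <- 100 * point_change / head(window_means, -1)

print(round(point_change, 2))
print(round(percent_change, 1))
```

The first step is a little under 8 index points (about 19% in relative terms), and the 2020 step is a little over 22 points (about 45%).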
This shows that no matter how the computer groups the years, there are only 2 significant increases in disc golf search popularity: one in 2011 and one in 2020. And the latter jump was much larger.
# Reshape to one row per state and year
dat_long <- dat %>%
  pivot_longer(`2017`:`2020`,
               names_to = "year",
               values_to = "searches") %>%
  mutate(year = factor(year))
Google Trends doesn’t give you webtraffic trends over time by state, but for any single year you can get one (averaged) snapshot of the relative disc golf webtraffic in each state. So I gathered webtraffic data for 2017-2020 individually and merged the data files. You really need to keep in mind what Google says about this webtraffic for Regions before you look at the data:
“A higher value means a higher proportion of all queries, not a higher absolute query count. So a tiny country where 80% of the queries are for ‘bananas’ will get twice the score of a giant country where only 40% of the queries are for ‘bananas’.”
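Google's bananas example can be written out in a few lines. This is a sketch of the idea only: the total query counts are invented, and the rescaling to 100 is my assumption about how the shares become scores, based on the "relative to the highest point" description quoted earlier.

```r
# Two regions from Google's example; total_queries values are made up.
regions <- data.frame(
  region        = c("tiny country", "giant country"),
  total_queries = c(1e4, 1e8),
  share_bananas = c(0.80, 0.40)   # proportions from Google's example
)

# Absolute number of banana queries (the thing the score does NOT reflect).
regions$banana_queries <- regions$total_queries * regions$share_bananas

# Score based on the share of queries, rescaled so the top region is 100.
regions$score <- 100 * regions$share_bananas / max(regions$share_bananas)

print(regions)
```

The giant country runs far more banana searches in absolute terms, yet it gets half the score. The same logic applies to states: a high score means disc golf is a large share of that state's searches, not that the state searches the most.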
The vertical line is the average across all states for 2017-2020, and the error bars represent the 95% confidence intervals across the four yearly snapshots. It’s pretty clear that Maine is holding it down for disc golf webtraffic (as a share of each state’s total webtraffic), whatever is going on there. Other places (e.g., California) may appear really low, potentially because a well-established disc golf scene means everyone already knows where the courses are, there are in-person pro shops, etc. The score is also confounded by each state’s overall webtraffic, so this isn’t as clean an analysis as the one above, but it’s still interesting to see who is conducting relatively more searches.
# Mean, SD, and standard error per state across the four yearly snapshots
dat_long %>%
  group_by(Region) %>%
  summarize(mean = mean(searches),
            sd = sd(searches),
            se = sd / sqrt(n())) %>%
  ggplot(aes(y = reorder(Region, mean),
             x = mean)) +
  geom_errorbar(
    aes(xmin = mean - 1.96 * se,
        xmax = mean + 1.96 * se,
        color = mean),
    show.legend = FALSE) +
  geom_point(
    aes(color = mean),
    show.legend = FALSE) +
  # Vertical line: average of the state means
  geom_vline(
    aes(xintercept = mean(mean))) +
  labs(title = 'Figure 6. Relative Search\nInterest in `Disc Golf` by State',
       caption = 'Error Bars Represent 95% Confidence Interval',
       y = 'State',
       x = 'Average Relative Search Interest on Google (2017-2020)') +
  theme_economist(horizontal = FALSE) +
  theme(axis.text.y =
          element_text(angle = 30, vjust = 0.5, hjust = 1, size = 5))
Here’s the 2017-2020 relative search popularity by state, for you visual learners.
plot_usmap(data = dat_plot2017,
           values = 'searches',
           labels = TRUE,
           label_color = "black") +
  labs(title = 'Figure 7. Search Popularity by State in 2017',
       fill = 'Relative Search Popularity')
plot_usmap(data = dat_plot2018,
           values = 'searches',
           labels = TRUE,
           label_color = "black") +
  labs(title = 'Figure 8. Search Popularity by State in 2018',
       fill = 'Relative Search Popularity')
plot_usmap(data = dat_plot2019,
           values = 'searches',
           labels = TRUE,
           label_color = "black") +
  labs(title = 'Figure 9. Search Popularity by State in 2019',
       fill = 'Relative Search Popularity')
plot_usmap(data = dat_plot2020,
           values = 'searches',
           labels = TRUE,
           label_color = "black") +
  labs(title = 'Figure 10. Search Popularity by State in 2020',
       fill = 'Relative Search Popularity')
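The maps read from per-year data frames (`dat_plot2017` through `dat_plot2020`) whose construction isn't shown in the post. Here is one way they might be built, sketched in base R on a toy stand-in for `dat_long`. The toy values are invented, the `_toy` names are hypothetical, and the rename to `state` assumes `plot_usmap()` identifies regions through a `state` (or `fips`) column.

```r
# Toy stand-in for dat_long: made-up values, three states, two years.
dat_long_toy <- data.frame(
  Region   = rep(c("Maine", "Iowa", "California"), times = 2),
  year     = rep(c("2019", "2020"), each = 3),
  searches = c(100, 80, 30, 100, 85, 35)
)

# Rename Region to state so the mapping function can match rows to states.
names(dat_long_toy)[names(dat_long_toy) == "Region"] <- "state"

# Split into one data frame per year, keeping only the columns the map needs.
by_year <- split(dat_long_toy[, c("state", "searches")], dat_long_toy$year)

# Hypothetical analogue of dat_plot2020.
dat_plot2020_toy <- by_year[["2020"]]

print(dat_plot2020_toy)
```

Splitting once and indexing by year avoids writing four nearly identical filter steps, and the same list could drive the four map calls in a loop.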