I recently read an article in the December issue of Significance titled “Does Christmas really come earlier every year?” by Nathan Cunningham of the University College of Dublin. His premise was that, by using cluster analysis of Google Trends data, we can see how people have begun thinking about the holidays earlier and earlier each year. It’s a good read: http://www.statslife.org.uk/significance/1892. I should note that Nathan graciously answered my emails asking for clarification and saw real value in this technique for emergency management work.
I decided to replicate his results using a FEMA-related search term: “hurricane”.
Google Trends
Google Trends (http://www.google.com/trends/) allows you to view the volume of searches on particular terms. The values are not raw counts or a share of all Google searches; they are an index scaled from 0 to 100, where 100 marks the peak search interest for the chosen terms over the chosen time range. For example, the week that Hurricane Katrina made landfall, “hurricane” scored almost 100, meaning interest in the term was near its peak for the period. If you sign in with your Google ID, you can also download the data as a CSV file.
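Once the data are downloaded, they need to be in a year / week / value layout before clustering. Here is a minimal sketch of that step. It assumes the CSV has already been trimmed to two columns that I have named week_start and hurricane; those names, and the sample values, are made up for illustration and are not the actual Google Trends export format.

```r
# Assumption: the downloaded file has been trimmed to two columns,
# week_start (YYYY-MM-DD) and hurricane (the 0-100 search index).
# The inline text below is made-up sample data standing in for that file.
csv_text <- "week_start,hurricane
2005-08-21,45
2005-08-28,98
2005-09-04,60
2006-01-01,3"

raw <- read.csv(text = csv_text, stringsAsFactors = FALSE)
dates <- as.Date(raw$week_start)

# Build a data frame with one row per week: year, week of year, index value
gtrends <- data.frame(
  year      = as.integer(format(dates, "%Y")),
  week      = as.integer(format(dates, "%U")),  # week of year, weeks start on Sunday
  hurricane = raw$hurricane
)
print(gtrends)
```

With a real file you would replace the text = csv_text argument with the path to your own trimmed CSV.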
Cunningham used Google Trends to analyze search volumes on holiday-related terms (“Christmas”, “Santa Claus”, etc.). Here I’ve compared the search terms “hurricane” and “tornado”. You can see a somewhat repetitive pattern: interest in both terms rises mid-year. I wanted to explore this pattern.
Cluster Analysis
Cluster analysis looks at data and organizes it into groups that share similarities. Once Cunningham had each year’s data, he used cluster analysis to determine in which week of the year the volume of holiday-related searches began to increase.
Similar analysis can be done on FEMA-related search terms; a cluster analysis of the Google Trends data for the search term “hurricane” reveals continuous periods of increased interest for the following weeks from 2004 to 2014. This was simple to implement using R (see code below). The accompanying graphic shows the “shape” of the cluster; the x-axis is the week number of the year, and the y-axis is the Google Trends search index (0 to 100) for the term “hurricane”.
In hindsight, it is possible to find explanations for these clusters; for example, 2005 and 2012 had periods of exceptionally high interest corresponding to the hurricane activity of those years. By contrast, 2009 and 2013 showed little search activity (look at the y-axis), corresponding to light hurricane seasons.
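To turn a set of cluster labels into a "continuous period of increased interest", one option is to look for the longest unbroken run of weeks that fall in the higher-valued cluster. The sketch below is my own illustration, not Cunningham's method: longest_high_run is a hypothetical helper, the numbers are made up, and it assumes one row per week with no missing weeks. In practice the labels could come from the classification component of an Mclust fit.

```r
# Hypothetical helper: find the longest unbroken run of weeks assigned
# to the cluster with the higher mean search index.
longest_high_run <- function(week, value, cluster) {
  ord <- order(week)
  week <- week[ord]
  value <- value[ord]
  cluster <- cluster[ord]

  # The "high interest" cluster is the one with the larger mean value
  means <- tapply(value, cluster, mean)
  high <- names(means)[which.max(means)]

  # Run-length encode membership in the high cluster
  runs <- rle(as.character(cluster) == high)
  ends <- cumsum(runs$lengths)
  starts <- ends - runs$lengths + 1

  # Keep only the runs of high-cluster weeks and pick the longest one
  is_high <- runs$values
  best <- which(is_high)[which.max(runs$lengths[is_high])]
  c(start_week = week[starts[best]], end_week = week[ends[best]])
}

# Made-up illustration: 12 weeks with two cluster labels
week    <- 1:12
value   <- c(2, 3, 2, 4, 30, 45, 60, 38, 5, 3, 25, 2)
cluster <- c(1, 1, 1, 1, 2, 2, 2, 2, 1, 1, 2, 1)

period <- longest_high_run(week, value, cluster)
print(period)
```

In this made-up series the isolated spike in week 11 is ignored because it does not form part of the longest run.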
Further Investigation
This simple example shows how cluster analysis can illustrate the behavior of data that have more than one pattern. This could find application in data that vary from Region to Region or JFO to JFO, or that change with disaster type.
Although Cunningham used cluster analysis to look at Google Trends data, it is easy to see that the data returned also lend themselves to time series analysis.
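As a rough idea of what a time series approach might look like, here is a minimal sketch using stl() from base R to separate a weekly series into seasonal, trend and remainder parts. The series is synthetic (a made-up late-summer bump plus noise), so nothing here is a result from the real Google Trends data; with real data you would pass the weekly "hurricane" values instead.

```r
# Made-up weekly series: three years of a seasonal bump plus noise,
# standing in for the "hurricane" search index
set.seed(1)
weeks <- 1:(52 * 3)
index <- 10 + 30 * exp(-((weeks %% 52 - 36)^2) / 40) +
  rnorm(length(weeks), sd = 2)

# Treat the data as a time series with 52 observations per year
series <- ts(index, start = c(2011, 1), frequency = 52)

# Split into seasonal, trend and remainder components
parts <- stl(series, s.window = "periodic")

# Week of the year at which the seasonal component peaks
seasonal <- parts$time.series[, "seasonal"]
peak_week <- cycle(series)[which.max(seasonal)]
print(peak_week)
```

Calling plot(parts) afterwards draws the three components in stacked panels, which makes the yearly pattern easy to compare against the cluster plots.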
R Code used in this example
## Crow's nest Clustering example - Tim Allen
# Adapted from http://www.statslife.org.uk/significance/1892
# Nathan Cunningham - Does Christmas really come earlier every year?
# Significance Magazine 11 November 2014

# gtrends is assumed to be a data frame with the columns year, week and
# hurricane, built beforehand from the CSV downloaded from Google Trends

# Allow multiple plots (2 rows x 6 columns)
par(mfrow=c(2,6))

# You have to install and load the mclust package
library(mclust)

# Calculate clusters for each year
for (yr in 2007:2013) {
  # 1) load this year's data in a matrix
  observations <- as.matrix(subset(gtrends, year==yr, select=c("week","hurricane")))
  # 2) find two clusters, choosing among the candidate models by BIC
  fit <- Mclust(observations, G=2)
  # 3) Plot the clusters and print the model summary
  plot(fit, what="classification", xlab=yr)
  print(summary(fit))
}
Acknowledgement
My sincere appreciation to Nathan Cunningham of the University College of Dublin for his kind help in preparation of this article. Please read his article, "Does Christmas really come earlier every year?"