My research is interdisciplinary, encompassing social media/computational social science, web/data science, web archiving, and (local) news.


StoryGraph provides a collection of tools that analyze the news cycle. USA generates a news similarity graph every 10 minutes by computing the similarity of news stories from 17 US news sources across the partisanship spectrum (left, center, and right). In these graphs, the nodes represent news articles, and an edge between a pair of nodes represents a high degree of similarity between the nodes (similar news stories).
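
To make the graph construction concrete, here is a minimal sketch of how such a similarity graph could be built. It is not StoryGraph's actual implementation: the Jaccard similarity over word sets, the `threshold` value of 0.3, and the function names `jaccard` and `build_similarity_graph` are all assumptions chosen for illustration.

```python
from itertools import combinations


def jaccard(a, b):
    """Jaccard similarity between two token sets (assumed similarity measure)."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def build_similarity_graph(articles, threshold=0.3):
    """Return edges (i, j) for article pairs whose similarity meets the threshold.

    Nodes are article indices; the threshold is an illustrative assumption.
    """
    tokens = [set(text.lower().split()) for text in articles]
    edges = []
    for i, j in combinations(range(len(articles)), 2):
        if jaccard(tokens[i], tokens[j]) >= threshold:
            edges.append((i, j))
    return edges


# Hypothetical headlines: the first two report the same story.
articles = [
    "senate passes budget bill after long debate",
    "budget bill passes senate after debate",
    "local team wins championship game",
]
edges = build_similarity_graph(articles)
```

In this toy example only the first two headlines are linked, so they would form one story while the third article remains an isolated node.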

Slow news cycle story graph; Split attention story graph; Mueller report story graph
Three news similarity graphs illustrating the dynamics of the news cycle. In these graphs, a single node represents a news article, and a connected component (multiple connected nodes) represents a single news story reported by the connected nodes. StoryGraph uses the average degree of the connected components to quantify the level of attention stories receive. The first graph shows what is often referred to as a slow news day: low overlap across different news media organizations. The second graph shows a scenario where the attention of the media is split across multiple news stories. The third graph, which is about the release of the Mueller Report, shows a major news event: a high degree of overlap/connectivity across different news media organizations.
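
The attention measure described above can be sketched as follows. This is an illustration under stated assumptions, not StoryGraph's code: it uses the standard definition of average degree (twice the number of edges divided by the number of nodes) applied per connected component, and the helper names `connected_components` and `average_degree` are hypothetical.

```python
from collections import defaultdict


def connected_components(nodes, edges):
    """Group nodes into connected components using an iterative depth-first search."""
    adjacency = defaultdict(set)
    for u, v in edges:
        adjacency[u].add(v)
        adjacency[v].add(u)

    seen = set()
    components = []
    for start in nodes:
        if start in seen:
            continue
        stack = [start]
        seen.add(start)
        component = set()
        while stack:
            node = stack.pop()
            component.add(node)
            for neighbor in adjacency[node]:
                if neighbor not in seen:
                    seen.add(neighbor)
                    stack.append(neighbor)
        components.append(component)
    return components


def average_degree(component, edges):
    """Average degree of a component: 2 * (internal edges) / (number of nodes)."""
    internal_edges = sum(1 for u, v in edges if u in component and v in component)
    return 2 * internal_edges / len(component)


# Hypothetical graph: one story covered by three articles, one by two,
# and one article with no similar counterpart.
nodes = ["a", "b", "c", "d", "e", "f"]
edges = [("a", "b"), ("b", "c"), ("a", "c"), ("d", "e")]
for component in connected_components(nodes, edges):
    print(sorted(component), average_degree(component, edges))
```

A densely connected component (many outlets publishing similar articles) yields a higher average degree than a sparse one, which is why the measure can serve as a proxy for attention.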


StoryGraphBot is a Twitter bot that runs every hour, tracking top news stories and creating tweet threads that report updates (rising/falling/same attention) on the stories. See also: Chronicling the life-cycle of top news stories with StoryGraphBot.

Story Attention Dynamics Graph
Story Attention Dynamics chart illustrating the life-cycle of two top news stories from May 18, 2018 to May 19, 2018. Each line (red or blue) represents a top news story. The x-axis represents time, while the y-axis represents the average degree of the connected component that represents the story. Within our window of observation, the Santa Fe High School Shooting story received peak attention on Friday, May 18, 2018 at 4:40 PM. This attention then waned, with the lowest point coinciding with the rise of a new story, the Royal Wedding of Prince Harry and Meghan Markle.

Local Memory Project

Local Memory Project helps users and small communities discover, collect, build, archive, and share collections of stories for important local events from local sources.



Sumgram is a Python tool that summarizes text collections with their most frequent conjoined n-grams. See also, Introducing sumgram, a tool for generating the most frequent conjoined ngrams.
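
To illustrate the problem that conjoined n-grams address, the sketch below counts plain fixed-length n-grams. It is not sumgram's algorithm or API: the function `top_ngrams`, the whitespace tokenization, and the two sample sentences are assumptions made for this example.

```python
from collections import Counter


def top_ngrams(documents, n, k=3):
    """Return the k most frequent n-grams across a list of documents."""
    counts = Counter()
    for doc in documents:
        words = doc.lower().split()
        counts.update(
            " ".join(words[i:i + n]) for i in range(len(words) - n + 1)
        )
    return counts.most_common(k)


# Hypothetical documents mentioning a multi-word proper noun.
documents = [
    "centers for disease control and prevention issued guidance",
    "the centers for disease control and prevention confirmed a case",
]

# Bigrams split the proper noun into fragments such as "disease control".
print(top_ngrams(documents, 2, k=5))

# Six-grams recover the full name but also produce many one-off phrases.
print(top_ngrams(documents, 6, k=5))
```

With a fixed n, a name longer than n words is fragmented, while a larger n yields mostly infrequent phrases. Sumgram aims to avoid choosing a single n by conjoining frequent n-grams.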

Sumgrams vs. ngrams
Comparison of the top 20 bigrams (first column), top 20 six-grams (second column), and top 20 sumgrams (conjoined ngrams, third column) generated by sumgram for a collection of documents about the 2014 Ebola Virus Outbreak. Proper nouns of more than two words (e.g., "centers for disease control and prevention") are split when generating bigrams; sumgram strives to remedy this. Generating six-grams surfaces non-salient six-

What Did It Look Like

What Did It Look Like is a Twitter bot that replies to a tweet that contains the #whatdiditlooklike hashtag and a URL, with a Tumblr post of the yearly snapshot of what the webpage looked like.