Temporal Graph Datasets



All datasets are publicly available under a CC BY-NC licence and can be accessed here.



Reddit


Reddit is a bipartite interaction graph covering one month of posts made by users on subreddits. Users and subreddits are nodes, and edges are interactions in which a user writes a post to a subreddit. The text of each post is converted into a LIWC feature vector of length 172, which serves as the edge feature. This public dataset provides 366 positive labels among 672,447 interactions; these are ground-truth labels of users banned from Reddit.


The temporal distribution of Reddit.
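
As a rough illustration of the structure described above, one timestamped edge of a bipartite interaction graph can be held in a small record. This is a minimal sketch, not the dataset's actual file format: the `Interaction` class, its field names, and the helper functions are hypothetical, and the toy data uses 3-dimensional features instead of the 172 described for Reddit.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Interaction:
    """One timestamped edge of a bipartite interaction graph (hypothetical layout)."""
    user: int              # index of the user node
    item: int              # index of the item node (a subreddit for Reddit)
    timestamp: float       # time of the interaction
    features: List[float]  # edge feature vector (length 172 for Reddit)
    label: int             # 1 for a ground-truth positive, else 0


def sort_chronologically(interactions: List[Interaction]) -> List[Interaction]:
    """Return the interactions ordered by timestamp."""
    return sorted(interactions, key=lambda e: e.timestamp)


def count_positive(interactions: List[Interaction]) -> int:
    """Count the interactions that carry a positive label."""
    return sum(e.label for e in interactions)


# Toy data with 3-dimensional features, for illustration only.
toy = [
    Interaction(user=0, item=1, timestamp=5.0, features=[0.1, 0.0, 0.2], label=0),
    Interaction(user=1, item=0, timestamp=2.0, features=[0.0, 0.3, 0.1], label=1),
    Interaction(user=0, item=0, timestamp=9.0, features=[0.4, 0.1, 0.0], label=0),
]
ordered = sort_chronologically(toy)
print([e.timestamp for e in ordered], count_positive(ordered))
```

The same record shape would apply to the other bipartite datasets in this list, with a different feature length and a different meaning for the item node.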



Wikipedia


Wikipedia is a bipartite interaction graph containing one month of edits made by editors. This public dataset selects the 1,000 most edited pages as items and the editors who made at least 5 edits as users. Editors and pages are nodes, and edges are interactions in which an editor edits a page. Edge features of length 172 are the edits converted into LIWC feature vectors. The Wikipedia dataset treats 217 public ground-truth labels of banned users among 157,474 interactions as positive labels.


The temporal distribution of Wikipedia.
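
The selection rule above (most edited pages, then sufficiently active editors) can be sketched as a two-step filter. This is an illustration under assumptions: the function name `filter_edits` and the tuple layout are hypothetical, and the description does not say whether an editor's edits are counted before or after the page filter; the sketch counts them afterwards.

```python
from collections import Counter
from typing import List, Tuple

Edit = Tuple[str, str, float]  # (editor, page, timestamp), a hypothetical layout


def filter_edits(edits: List[Edit], top_pages: int = 1000, min_edits: int = 5) -> List[Edit]:
    """Keep edits on the most edited pages made by sufficiently active editors."""
    # Step 1: keep the `top_pages` pages with the most edits.
    page_counts = Counter(page for _, page, _ in edits)
    kept_pages = {page for page, _ in page_counts.most_common(top_pages)}
    on_kept_pages = [e for e in edits if e[1] in kept_pages]

    # Step 2 (assumed order): count each editor's edits on the kept pages
    # and keep editors with at least `min_edits` of them.
    editor_counts = Counter(editor for editor, _, _ in on_kept_pages)
    return [e for e in on_kept_pages if editor_counts[e[0]] >= min_edits]


# Toy example with small thresholds so the effect is visible.
toy_edits = [
    ("alice", "PageA", 1.0),
    ("alice", "PageA", 2.0),
    ("bob", "PageA", 3.0),
    ("alice", "PageB", 4.0),
    ("carol", "PageC", 5.0),
    ("alice", "PageB", 6.0),
]
kept = filter_edits(toy_edits, top_pages=2, min_edits=2)
print(kept)
```

In the toy data, PageC is dropped as the least edited page, and bob is dropped because he has a single edit on the remaining pages.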



MOOC


MOOC is a bipartite online network of students and online course content units. Students and course content units are nodes, and edges with features of length 4 are interactions such as a student viewing a video or submitting an answer. This public dataset treats 4,066 dropout events out of 411,749 interactions as positive labels.


The temporal distribution of MOOC.
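
The label counts quoted for Reddit, Wikipedia, and MOOC imply a strong class imbalance: positives are below 1% of interactions in all three. The short calculation below makes that concrete. It uses only the numbers from the descriptions above; the variable names are illustrative.

```python
# (positive labels, total interactions), as quoted in the dataset descriptions.
label_counts = {
    "Reddit": (366, 672_447),
    "Wikipedia": (217, 157_474),
    "MOOC": (4_066, 411_749),
}

# Fraction of interactions that carry a positive label.
positive_rate = {name: pos / total for name, (pos, total) in label_counts.items()}

for name, rate in positive_rate.items():
    print(f"{name}: {rate:.4%} of interactions are positive")
```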



LastFM


LastFM is a user-song bipartite network. Users and songs are nodes, and edges are user-listens-song interactions. This public dataset includes 1,293,103 interactions between all 1,000 users and the 1,000 most listened-to songs.


The temporal distribution of LastFM.



Enron


Enron is an email communication network comprising about half a million emails sent over several years. Nodes are email addresses, and edges are emails exchanged between accounts.


The temporal distribution of Enron.



SocialEvo


SocialEvo is a mobile phone proximity network from an experiment that closely tracked the everyday life of a whole undergraduate dormitory. The data were collected by a cell phone application every six minutes and contain physical proximity and location information for students living in halls of residence.


The temporal distribution of SocialEvo.



UCI


UCI is a Facebook-like online communication network among students at the University of California, Irvine. Nodes are students, and edges are messages exchanged between them.


The temporal distribution of UCI.



CollegeMsg


CollegeMsg is provided by the SNAP team at Stanford. It is derived from the Facebook-like social network introduced in the UCI dataset, which the SNAP team parsed into a temporal network. Each edge has 172 features.


The temporal distribution of CollegeMsg.



CanParl


CanParl is a Canadian parliament bill voting network extracted from an open website. Nodes are members of parliament (MPs), and edges are interactions between MPs from 2006 to 2019.


The temporal distribution of CanParl.



Contact


Contact is a temporal, weighted network of physical proximity among study participants. Nodes are participants, and edges are proximity events between them. Edge features indicate the physical proximity between participants.


The temporal distribution of Contact.



Flights


Flights is a weighted flight network. Nodes are airports, and edges are tracked flights. The weights of edges indicate the number of flights between two given airports within a day.


The temporal distribution of Flights.
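
The edge weighting described for Flights (number of flights between two airports within a day) amounts to a grouped count. The sketch below shows one way to compute it from raw flight records. The record layout, the function name `daily_edge_weights`, and the choice to treat an airport pair as unordered are assumptions, and the airport codes and dates are made-up toy values.

```python
from collections import Counter
from typing import Dict, List, Tuple

Flight = Tuple[str, str, str]  # (origin airport, destination airport, date)


def daily_edge_weights(flights: List[Flight]) -> Dict[Tuple[str, str, str], int]:
    """Count flights per (airport pair, day)."""
    weights: Counter = Counter()
    for origin, destination, day in flights:
        # Assumption: edges are undirected, so both directions share one key.
        a, b = sorted((origin, destination))
        weights[(a, b, day)] += 1
    return dict(weights)


# Toy records, for illustration only.
toy_flights = [
    ("YYZ", "JFK", "2020-01-01"),
    ("JFK", "YYZ", "2020-01-01"),
    ("YYZ", "JFK", "2020-01-02"),
    ("LHR", "JFK", "2020-01-01"),
]
edge_weights = daily_edge_weights(toy_flights)
print(edge_weights)
```

If the network is meant to be directed, dropping the `sorted` call keeps the two directions as separate edges.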



UNTrade


UNTrade is a weighted food and agriculture trade network among 181 nations over 30 years. Nodes are countries, and edges are trades between two countries. Edge weights are the total sum of normalized agriculture import or export values between two given countries.


The temporal distribution of UNTrade.



USLegis


USLegis is a Senate co-sponsorship network that captures the social relations of legislators through their co-sponsorship of bills. Nodes are members of Congress, and edge weights are the number of times two members co-sponsor a bill in a given Congress.


The temporal distribution of USLegis.



UNVote


UNVote is a weighted network of roll-call votes in the UN General Assembly from 1946 to 2021. Nodes are nations, and edge weights are the number of times both nations voted "yes" on an item.


The temporal distribution of UNVote.
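
The UNVote edge weight (number of times both nations voted "yes" on an item) can be computed by counting, for each roll call, every pair of nations that voted "yes". The sketch below does this over a toy set of roll calls. The input layout, the function name `joint_yes_weights`, and the nation labels are hypothetical, and the sketch aggregates over all roll calls at once rather than per time window.

```python
from collections import Counter
from itertools import combinations
from typing import Dict, List, Tuple

RollCall = Dict[str, str]  # nation -> vote ("yes", "no", or "abstain")


def joint_yes_weights(roll_calls: List[RollCall]) -> Dict[Tuple[str, str], int]:
    """Count, for each pair of nations, the roll calls where both voted yes."""
    weights: Counter = Counter()
    for votes in roll_calls:
        # Sort so each unordered pair always maps to the same key.
        yes_nations = sorted(n for n, v in votes.items() if v == "yes")
        for pair in combinations(yes_nations, 2):
            weights[pair] += 1
    return dict(weights)


# Toy roll calls with placeholder nation labels.
toy_roll_calls = [
    {"A": "yes", "B": "yes", "C": "no"},
    {"A": "yes", "B": "yes", "C": "yes"},
    {"A": "abstain", "B": "yes", "C": "yes"},
]
vote_weights = joint_yes_weights(toy_roll_calls)
print(vote_weights)
```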



Taobao


Taobao is a subset of the Taobao user behavior dataset, restricted to the period from 8:00 to 18:00 on 26 November 2017. This public dataset is a user-item bipartite network. Nodes are users and items, and edges are behaviors between users and items, such as favoriting, clicking, purchasing, and adding an item to the shopping cart. Each edge has 4 features, corresponding to the 4 types of behavior.


The temporal distribution of Taobao.
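
One plausible reading of "4 features, corresponding to 4 different types of behaviors" is a one-hot encoding of the behavior type. The sketch below illustrates that reading only. The description does not confirm the encoding, and the behavior names, their order in `BEHAVIORS`, and the function `one_hot_behavior` are assumptions.

```python
from typing import List

# Assumed names and ordering of the four behavior types; the dataset's
# actual column order is not stated in the description.
BEHAVIORS = ["click", "favor", "add_to_cart", "purchase"]


def one_hot_behavior(behavior: str) -> List[float]:
    """Encode a behavior type as a length-4 one-hot edge feature vector."""
    if behavior not in BEHAVIORS:
        raise ValueError(f"unknown behavior: {behavior!r}")
    return [1.0 if b == behavior else 0.0 for b in BEHAVIORS]


purchase_features = one_hot_behavior("purchase")
print(purchase_features)
```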