All datasets are publicly available under the CC BY-NC licence and can be accessed here.
Reddit
Reddit is a bipartite interaction graph consisting of one month of posts made by users on subreddits. Users and subreddits are nodes, and edges are interactions in which a user writes a post to a subreddit. The text of each post is converted into a LIWC feature vector of length 172, which serves as the edge feature.
This public dataset provides 366 positive labels among 672,447 interactions; these are ground-truth labels of users banned from Reddit.
The temporal distribution of Reddit.
Wikipedia
Wikipedia is a bipartite interaction graph containing one month of edits made by editors.
This public dataset selects the 1,000 most edited pages as items and the editors who made at least 5 edits over the month as users. Editors and pages are nodes, and edges are interactions in which an editor edits a page. Edge features of length 172 are the edits converted into LIWC feature vectors. The Wikipedia dataset treats 217 public ground-truth labels of banned users, out of 157,474 interactions, as positive labels.
The temporal distribution of Wikipedia.
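The structure shared by Reddit and Wikipedia (a user node, an item node, a timestamp, a label, and a 172-dimensional edge feature) can be sketched as a small record type. This is only an illustrative sketch: the class name `Interaction`, its field names, the timestamp unit, and the 0/1 label encoding are assumptions, not the datasets' actual file schema.

```python
from dataclasses import dataclass
from typing import List

# Edge feature length stated in the text for Reddit and Wikipedia.
EDGE_FEATURE_DIM = 172


@dataclass
class Interaction:
    """One timestamped edge of a bipartite interaction graph (hypothetical layout)."""
    user: int             # user (Reddit) or editor (Wikipedia) node id
    item: int             # subreddit or page node id
    timestamp: float      # time of the interaction; the unit is an assumption
    label: int            # assumed encoding: 1 = positive (banned user), 0 = otherwise
    features: List[float]  # LIWC feature vector attached to the edge

    def __post_init__(self) -> None:
        # Reject edges whose feature vector does not have the stated length.
        if len(self.features) != EDGE_FEATURE_DIM:
            raise ValueError(
                f"expected {EDGE_FEATURE_DIM} features, got {len(self.features)}"
            )


# A placeholder edge with an all-zero feature vector.
example = Interaction(user=0, item=0, timestamp=0.0, label=0,
                      features=[0.0] * EDGE_FEATURE_DIM)
```

Validating the feature length at construction time is a design choice of this sketch; it catches malformed rows early when edges are read from a file.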
MOOC
MOOC is a bipartite online network of students and online course content units.
Students and course content units are nodes, and edges, with features of length 4, are interactions such as a student viewing a video or submitting an answer.
This public dataset treats 4,066 dropout events out of 411,749 interactions as positive labels.
The temporal distribution of MOOC.
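The label counts quoted for Reddit, Wikipedia, and MOOC imply a strong class imbalance. The following sketch computes the positive-label rate from the figures stated in the text; the helper name `positive_rate` is illustrative only.

```python
# (positive labels, total interactions) as stated in the text.
datasets = {
    "Reddit": (366, 672_447),
    "Wikipedia": (217, 157_474),
    "MOOC": (4_066, 411_749),
}


def positive_rate(positives: int, interactions: int) -> float:
    """Fraction of interactions that carry a positive label."""
    return positives / interactions


rates = {name: positive_rate(p, n) for name, (p, n) in datasets.items()}

for name, rate in rates.items():
    print(f"{name}: {rate:.4%} of interactions are positive")
```

In all three datasets fewer than 1% of interactions are positive.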
LastFM
LastFM is a user-song bipartite network.
Users and songs are nodes, and edges are user-listens-song interactions.
This public dataset includes 1,293,103 interactions between 1,000 users and the 1,000 most listened-to songs.
The temporal distribution of LastFM.
Enron
Enron is an email communication network that collects about half a million emails over several years.
Nodes of the network are email addresses, and edges are emails exchanged between accounts.
The temporal distribution of Enron.
SocialEvo
SocialEvo is a network built from experiments that closely track the everyday life of an entire undergraduate dormitory using mobile phones.
This public dataset was collected by a cell phone application every six minutes and contains physical proximity and location information for students living in halls of residence.
The temporal distribution of SocialEvo.
UCI
UCI is a Facebook-like online communication network among students of the University of California, Irvine.
Nodes of the network are students, and edges are messages exchanged between them.
The temporal distribution of UCI.
CollegeMsg
CollegeMsg is provided by the SNAP team at Stanford.
This dataset is derived from the Facebook-like social network introduced in the UCI dataset.
The SNAP team has parsed it into a temporal network. Each edge has 172 features.
The temporal distribution of CollegeMsg.
CanParl
CanParl is a Canadian parliament bill-voting network extracted from an open website.
Nodes are members of parliament (MPs), and edges are the interactions between MPs from 2006 to 2019.
The temporal distribution of CanParl.
Contact
Contact is a temporal, weighted network of physical proximity among study participants.
Nodes are participants, and edges are proximity events between them.
Edge features indicate the physical proximity between participants.
The temporal distribution of Contact.
Flights
Flights is a weighted flight network.
Nodes are airports, and edges are tracked flights.
The edge weights indicate the number of flights between two given airports within a day.
The temporal distribution of Flights.
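The edge weighting described for Flights (number of flights between two airports within a day) can be illustrated with a simple count. This is a minimal sketch on made-up records: the airport codes, the dates, and the choice to treat edges as directed (origin to destination) are assumptions, since the text does not specify direction.

```python
from collections import Counter
from datetime import date

# Hypothetical flight records: (origin airport, destination airport, day).
flights = [
    ("YUL", "YYZ", date(2020, 1, 1)),
    ("YUL", "YYZ", date(2020, 1, 1)),
    ("YYZ", "YUL", date(2020, 1, 1)),
    ("YUL", "YYZ", date(2020, 1, 2)),
]


def daily_edge_weights(records):
    """Count flights per (origin, destination, day) triple (directed, by assumption)."""
    return Counter(records)


flight_weights = daily_edge_weights(flights)

for (origin, dest, day), weight in sorted(flight_weights.items()):
    print(f"{day} {origin}->{dest}: weight {weight}")
```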
UNTrade
UNTrade is a weighted food and agriculture trade network among 181 nations over 30 years.
Nodes are countries, and edges are trade relations between two countries.
The edge weights are the total sum of normalized agriculture import or export values between two given countries.
The temporal distribution of UNTrade.
USLegis
USLegis is a Senate co-sponsorship network that captures the social relations of legislators through their co-sponsorship of bills.
Nodes are members of Congress, and edge weights are the number of times that two members co-sponsor a bill in a given Congress.
The temporal distribution of USLegis.
UNVote
UNVote is a weighted network of roll-call votes in the UN General Assembly from 1946 to 2021.
Nodes are nations, and edge weights are the number of times both nations have voted "yes" on an item.
The temporal distribution of UNVote.
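The UNVote edge weight (how often two nations both voted "yes") can be sketched as a pairwise count over roll calls. The nation names and votes below are made up, and the sketch aggregates over a single set of roll calls; how the actual dataset groups votes into time steps is not specified in the text.

```python
from collections import Counter
from itertools import combinations

# Hypothetical roll calls: each maps a nation to its vote on one item.
roll_calls = [
    {"A": "yes", "B": "yes", "C": "no"},
    {"A": "yes", "B": "yes", "C": "yes"},
    {"A": "abstain", "B": "yes", "C": "yes"},
]


def joint_yes_weights(calls):
    """For each unordered pair of nations, count the items on which both voted 'yes'."""
    weights = Counter()
    for call in calls:
        # Sort so that each pair always appears in the same order.
        yes_nations = sorted(n for n, vote in call.items() if vote == "yes")
        for pair in combinations(yes_nations, 2):
            weights[pair] += 1
    return weights


vote_weights = joint_yes_weights(roll_calls)
```

With these example votes, nations A and B agree on "yes" twice, B and C twice, and A and C once.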
Taobao
Taobao is a subset of the Taobao user behavior dataset, extracted from the period 8:00 to 18:00 on 26 November 2017.
This public dataset is a user-item bipartite network.
Nodes are users and items, and edges are behaviors between users and items, such as favoriting, clicking, purchasing, and adding an item to the shopping cart.
Each edge has 4 features, corresponding to the 4 different types of behavior.
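One plausible reading of the 4 edge features is a one-hot indicator over the 4 behavior types. The sketch below assumes that encoding; the string labels in `BEHAVIORS` and their order are hypothetical and may differ from the released data.

```python
from typing import List

# Assumed behavior labels and ordering (not taken from the dataset files).
BEHAVIORS = ("favor", "click", "purchase", "cart")


def encode_behavior(behavior: str) -> List[int]:
    """Return a length-4 one-hot vector marking the given behavior type."""
    if behavior not in BEHAVIORS:
        raise ValueError(f"unknown behavior: {behavior!r}")
    return [int(behavior == b) for b in BEHAVIORS]


# Edge feature for a user clicking an item, under the assumed ordering.
click_features = encode_behavior("click")
```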