The datasets that have been preprocessed by BenchTemp are available here. You can download them directly and put them into the directory './data'. In addition, BenchTemp provides a DataPreprocessor class for preprocessing your own TGNN datasets.
Class:
class DataPreprocessor(data_path: str, data_name: str)
Args:
Function:
DataPreprocessor.data_preprocess(bipartite: bool)
Args:
Returns:
ml_{data_name}.csv - The CSV file of the Temporal Graph. This file has five columns: u (source node), i (destination node), ts (timestamp), label, and idx (edge index).
ml_{data_name}.npy - The edge features corresponding to the interactions (edges) in the Temporal Graph.
Example:
import benchtemp as bt
processor = bt.DataPreprocessor(data_path="./data/", data_name="mooc")
# If the dataset is a bipartite graph, i.e. the users (source nodes) and the items (destination nodes) are of different types.
processor.data_preprocess(bipartite=True)
# If the dataset is a non-bipartite graph.
processor.data_preprocess(bipartite=False)
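To make the CSV layout concrete, the following sketch writes and re-reads a tiny file using the column names that the examples in this document read (`u`, `i`, `ts`, `label`, `idx`). The file name `ml_toy.csv`, the column order, and the values are illustrative assumptions, not output produced by BenchTemp.

```python
import csv
import os
import tempfile

# Hypothetical toy interactions: source, destination, timestamp, label, edge index.
columns = ["u", "i", "ts", "label", "idx"]
rows = [
    {"u": 1, "i": 4, "ts": 0.0, "label": 0, "idx": 1},
    {"u": 2, "i": 5, "ts": 3.5, "label": 0, "idx": 2},
    {"u": 1, "i": 5, "ts": 7.2, "label": 1, "idx": 3},
]

with tempfile.TemporaryDirectory() as tmp_dir:
    csv_path = os.path.join(tmp_dir, "ml_toy.csv")  # illustrative name only
    with open(csv_path, "w", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=columns)
        writer.writeheader()
        writer.writerows(rows)

    # Read the file back; every row is one timestamped interaction.
    with open(csv_path, newline="") as f:
        loaded = list(csv.DictReader(f))

# Check that the toy interactions are stored in chronological order.
timestamps = [float(r["ts"]) for r in loaded]
is_chronological = timestamps == sorted(timestamps)
print(list(loaded[0].keys()), is_chronological)
```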
Class:
class TemporalGraph(sources: numpy.array, destinations: numpy.array, timestamps: numpy.array, edge_idxs: numpy.array, labels: numpy.array)
Args:
Returns:
Example:
import pandas as pd
import numpy as np
import benchtemp as bt
graph_df = pd.read_csv("dataset_path")
sources = graph_df.u.values
destinations = graph_df.i.values
edge_idxs = graph_df.idx.values
labels = graph_df.label.values
timestamps = graph_df.ts.values
# For example, the full Temporal Graph of the dataset is full_data.
full_data = bt.TemporalGraph(sources, destinations, timestamps, edge_idxs, labels)
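For readers who want a mental model of what such a container holds, here is a minimal sketch of a class that stores the five aligned arrays. `ToyTemporalGraph`, its length check, and its `unique_nodes` property are hypothetical and are not taken from BenchTemp's implementation.

```python
from dataclasses import dataclass

import numpy as np


@dataclass
class ToyTemporalGraph:
    """Hypothetical stand-in for a temporal graph container (not BenchTemp's class)."""

    sources: np.ndarray
    destinations: np.ndarray
    timestamps: np.ndarray
    edge_idxs: np.ndarray
    labels: np.ndarray

    def __post_init__(self):
        # All five arrays describe the same edges, so they must align one-to-one.
        n = len(self.sources)
        for name in ("destinations", "timestamps", "edge_idxs", "labels"):
            if len(getattr(self, name)) != n:
                raise ValueError(f"{name} must have {n} entries")

    def __len__(self):
        # Number of interactions (edges).
        return len(self.sources)

    @property
    def unique_nodes(self):
        # Every node that appears as a source or as a destination.
        return set(self.sources.tolist()) | set(self.destinations.tolist())


toy = ToyTemporalGraph(
    sources=np.array([1, 2, 1]),
    destinations=np.array([4, 5, 5]),
    timestamps=np.array([0.0, 3.5, 7.2]),
    edge_idxs=np.array([1, 2, 3]),
    labels=np.array([0, 0, 1]),
)
print(len(toy), sorted(toy.unique_nodes))
```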
The DataLoader class for link prediction tasks.
In transductive link prediction, DataLoader splits the temporal graph chronologically into 70%-15%-15% train, validation, and test sets according to edge timestamps.
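A chronological 70%-15%-15% split can be sketched with timestamp quantiles. This is a minimal illustration, assuming the cut points are the 0.70 and 0.85 quantiles of the edge timestamps; the function name `chronological_split` and its parameters are hypothetical, and BenchTemp's DataLoader may compute the split differently.

```python
import numpy as np


def chronological_split(timestamps, val_ratio=0.15, test_ratio=0.15):
    """Return boolean masks (train, val, test) split by timestamp quantiles."""
    timestamps = np.asarray(timestamps, dtype=float)
    # Cut points: everything up to val_time is training data,
    # everything after test_time is test data.
    val_time, test_time = np.quantile(
        timestamps, [1.0 - val_ratio - test_ratio, 1.0 - test_ratio]
    )
    train_mask = timestamps <= val_time
    val_mask = (timestamps > val_time) & (timestamps <= test_time)
    test_mask = timestamps > test_time
    return train_mask, val_mask, test_mask


ts = np.arange(100, dtype=float)  # toy timestamps 0, 1, ..., 99
train_mask, val_mask, test_mask = chronological_split(ts)
print(train_mask.sum(), val_mask.sum(), test_mask.sum())
```

Because the masks are defined by time thresholds, every training edge precedes every validation edge, which in turn precedes every test edge.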
In inductive link prediction, DataLoader performs the same split as in the transductive setting and randomly masks 10% of the nodes as unseen nodes. Any edges associated with these unseen nodes are removed from the training set. To reflect different inductive scenarios, DataLoader further generates three inductive test sets from the transductive test set by filtering edges in different ways. The corresponding splits are returned by load() as the new_node_*, new_old_node_*, and new_new_node_* datasets.
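The masking and filtering steps for the inductive setting can be sketched as follows. The helper names are hypothetical, and the three filters (at least one unseen endpoint, exactly one unseen endpoint, both endpoints unseen) are assumptions inferred from the names of the returned datasets rather than BenchTemp's confirmed definitions.

```python
import random


def sample_unseen_nodes(nodes, fraction=0.1, seed=0):
    """Randomly pick a fraction of the nodes to treat as unseen during training."""
    nodes = sorted(set(nodes))
    count = max(1, int(len(nodes) * fraction))
    return set(random.Random(seed).sample(nodes, count))


def drop_unseen_edges(edges, unseen):
    """Keep only edges whose endpoints are both seen (used for the training set)."""
    return [(u, v) for u, v in edges if u not in unseen and v not in unseen]


def inductive_test_sets(test_edges, unseen):
    """Filter test edges by how many endpoints are unseen (assumed definitions)."""
    any_new = [(u, v) for u, v in test_edges if u in unseen or v in unseen]
    new_old = [(u, v) for u, v in any_new if (u in unseen) != (v in unseen)]
    new_new = [(u, v) for u, v in any_new if u in unseen and v in unseen]
    return any_new, new_old, new_new


train_edges = [(0, 1), (1, 2), (2, 9), (3, 4)]
test_edges = [(0, 2), (1, 9), (9, 8), (8, 3)]
unseen = {8, 9}  # fixed by hand here so the result is easy to follow

clean_train = drop_unseen_edges(train_edges, unseen)
any_new, new_old, new_new = inductive_test_sets(test_edges, unseen)
print(clean_train, any_new, new_old, new_new)
```

In practice the unseen set would come from something like `sample_unseen_nodes`, with the seed fixed so that the split is reproducible.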
Class:
class lp.DataLoader(dataset_path: str, dataset_name: str, different_new_nodes_between_val_and_test: bool, randomize_features: bool)
Args:
Function:
lp.DataLoader.load()
Returns:
Example:
import benchtemp as bt
data = bt.lp.DataLoader(dataset_path="./data/", dataset_name='mooc')
node_features, edge_features, full_data, train_data, val_data, test_data, new_node_val_data, new_node_test_data, new_old_node_val_data, new_old_node_test_data, new_new_node_val_data, new_new_node_test_data, unseen_nodes_num = data.load()
For the link prediction task, BenchTemp provides a unified, seeded negative edge sampler class named RandEdgeSampler, which samples as many negative edges as there are positive interactions.
Class:
RandEdgeSampler(src_list: numpy.array, dst_list: numpy.array, seed: int)
Args:
Function:
RandEdgeSampler.sample(size: int)
Args:
Returns:
Example:
import benchtemp as bt
# For example, when training, create a training RandEdgeSampler based on the training dataset.
train_rand_sampler = bt.lp.RandEdgeSampler(train_data.sources, train_data.destinations)
...
for epoch in range(args.epochs):
    ...
    # Sample as many negatives as there are positive interactions.
    size = len(train_data)
    _, negatives_batch = train_rand_sampler.sample(size)
    ...
...
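To illustrate what a seeded negative edge sampler does, here is a minimal sketch. `ToyRandEdgeSampler` is a hypothetical class written for this example; it assumes negatives are drawn uniformly and independently from the observed sources and destinations, which may differ from BenchTemp's actual sampling strategy.

```python
import numpy as np


class ToyRandEdgeSampler:
    """Illustrative seeded negative sampler (not BenchTemp's implementation)."""

    def __init__(self, src_list, dst_list, seed=None):
        # Sample from the distinct sources and destinations that were observed.
        self.src_list = np.unique(src_list)
        self.dst_list = np.unique(dst_list)
        self.rng = np.random.RandomState(seed)

    def sample(self, size):
        # Draw endpoints independently and uniformly, with replacement.
        src_index = self.rng.randint(0, len(self.src_list), size)
        dst_index = self.rng.randint(0, len(self.dst_list), size)
        return self.src_list[src_index], self.dst_list[dst_index]


sources = np.array([1, 1, 2, 3])
destinations = np.array([10, 11, 10, 12])

# Two samplers with the same seed produce the same negatives.
sampler_a = ToyRandEdgeSampler(sources, destinations, seed=0)
sampler_b = ToyRandEdgeSampler(sources, destinations, seed=0)
neg_src, neg_dst = sampler_a.sample(len(sources))
neg_src_b, neg_dst_b = sampler_b.sample(len(sources))
print(neg_src, neg_dst)
```

Note that purely random pairs can occasionally coincide with true edges; this sketch does not filter them out.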
The DataLoader class for the node classification task. The DataLoader module sorts edges and splits the input dataset (70%-15%-15%) according to edge timestamps.
Class:
nc.DataLoader(dataset_path: str, dataset_name: str, use_validation: bool)
Args:
Function:
nc.DataLoader.load()
Returns:
Example:
import benchtemp as bt
data = bt.nc.DataLoader(dataset_path="./data/", dataset_name='mooc', use_validation=True)
node_features, edge_features, full_data, train_data, val_data, test_data = data.load()
BenchTemp provides a unified EarlyStopMonitor to improve training efficiency and save resources.
Class:
EarlyStopMonitor(max_round: int, higher_better: bool, tolerance: float)
Args:
Function:
EarlyStopMonitor.early_stop_check(curr_val: float)
Args:
Returns:
Example:
import benchtemp as bt
...
early_stopper = bt.EarlyStopMonitor(max_round=args.patience)
for epoch in range(args.epochs):
    ...
    val_ap = model(val_datasets)
    if early_stopper.early_stop_check(val_ap):
        break
    ...
...
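The idea behind an early-stopping monitor with these three parameters can be sketched as below. `ToyEarlyStopMonitor` is a hypothetical class; the default values and the use of an absolute (rather than relative) tolerance are assumptions made for this example, not BenchTemp's confirmed behavior.

```python
class ToyEarlyStopMonitor:
    """Illustrative early-stopping monitor (not BenchTemp's implementation)."""

    def __init__(self, max_round=3, higher_better=True, tolerance=1e-10):
        self.max_round = max_round
        self.higher_better = higher_better
        self.tolerance = tolerance
        self.num_round = 0  # consecutive checks without improvement
        self.best_val = None
        self.best_epoch = 0
        self.epoch_count = 0

    def early_stop_check(self, curr_val):
        # Flip the sign so that "larger is better" holds in both modes.
        score = curr_val if self.higher_better else -curr_val
        if self.best_val is None or score - self.best_val > self.tolerance:
            self.best_val = score
            self.best_epoch = self.epoch_count
            self.num_round = 0
        else:
            self.num_round += 1
        self.epoch_count += 1
        # Stop once max_round consecutive checks brought no improvement.
        return self.num_round >= self.max_round


monitor = ToyEarlyStopMonitor(max_round=2)
history = [0.70, 0.75, 0.74, 0.75, 0.73]  # toy validation AP per epoch
stopped_at = None
for epoch, val_ap in enumerate(history):
    if monitor.early_stop_check(val_ap):
        stopped_at = epoch
        break
print(stopped_at, monitor.best_epoch)
```

Here the best score is reached at epoch 1, and the two following epochs fail to beat it, so the loop stops at epoch 3.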
Different evaluation metrics are available, including Area Under the Receiver Operating Characteristic Curve (ROC AUC) and Average Precision (AP). Typically, both ROC AUC and AP are reported for the link prediction task, while ROC AUC is reported for the node classification task.
Class:
Evaluator(task_name: str)
Args:
Function:
Evaluator.eval(pred_score: numpy.array, true_label: numpy.array)
Args:
Returns:
Example:
import benchtemp as bt
# For example, Link prediction task. Evaluation Metrics: AUC, AP.
evaluator = bt.Evaluator("LP")
...
# test data
pred_score = model(test_data)
test_auc, test_ap = evaluator.eval(pred_score, true_label)
...
import benchtemp as bt
# For example, node classification task. Evaluation Metrics: AUC.
evaluator = bt.Evaluator("NC")
...
# test data
pred_score = model(test_data)
test_auc = evaluator.eval(pred_score, true_label)
...
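For reference, the two metrics can be computed from scratch as in the sketch below. The functions `roc_auc` and `average_precision` are written for this example and assume binary 0/1 labels; the AP version additionally assumes there are no tied scores. They are not BenchTemp's Evaluator, which may rely on a library implementation.

```python
def roc_auc(pred_score, true_label):
    """Probability that a random positive outranks a random negative (ties count half)."""
    pos = [s for s, y in zip(pred_score, true_label) if y == 1]
    neg = [s for s, y in zip(pred_score, true_label) if y == 0]
    if not pos or not neg:
        raise ValueError("ROC AUC needs at least one positive and one negative")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def average_precision(pred_score, true_label):
    """Mean of precision@k over the ranks k at which a positive appears."""
    ranked = sorted(zip(pred_score, true_label), key=lambda pair: -pair[0])
    hits, precisions = 0, []
    for rank, (_, label) in enumerate(ranked, start=1):
        if label == 1:
            hits += 1
            precisions.append(hits / rank)
    if not precisions:
        raise ValueError("average precision needs at least one positive")
    return sum(precisions) / len(precisions)


pred_score = [0.9, 0.8, 0.3, 0.1]
true_label = [1, 0, 1, 0]
auc = roc_auc(pred_score, true_label)
ap = average_precision(pred_score, true_label)
print(auc, round(ap, 4))  # 0.75 0.8333
```

The pairwise AUC loop is quadratic in the number of samples, which is fine for illustration but too slow for large test sets.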