Reference



BenchTemp Modules

Image 1

DataPreprocessor

The datasets that have been preprocessed by BenchTemp are Here. You can directly download the datasets and then put them into the directory './data'. In addition, BenchTemp provides DataPreprocessor class for you to preprocess yours TGNNs datasets.

Class:

class DataPreprocessor(data_path: str, data_name: str)

Args:

  1. data_path: str - The path of the dataset.
  2. data_name: str - The name of the dataset.

Function:

DataPreprocessor.data_preprocess(bipartite: bool)

Args:

  1. bipartite: bool - Whether the Temporal Graph is a bipartite graph (Heterogeneous or Homogeneous)..

Returns:

  • ml_{data_name}.csv - The csv file of the Temporal Graph. This file have five columns with properties:

    • 'u': The id of the user.
    • 'i': The id of the item.
    • 'ts': The timestamp of the interaction (edge) between the user and the item.
    • 'label': The label of the interaction (edge).
    • 'idx': The index of the interaction (edge).
  • ml_{data_name}.npy - The edge features corresponding to the interactions (edges) in the the Temporal Graph..

  • ml_{data_name}_node.npy - The initialization node features of the Temporal Graph.

Example:

import benchtemp as bt

processor = bt.DataPreprocessor(data_path="./data/", data_name="mooc")
# If the dataset is bipartite graph, i.e. the user (source nodes) and the item (destination nodes) are of the same type.
processor.data_preprocess(bipartite=True)

# If the dataset is non-bipartite graph.
processor.data_preprocess(bipartite=False)

TemporalGraph

The class of a temporal graph. A temporal graph can be represented as an ordered sequence of temporal user-item interactions $$I_{r}=(u_{r}, i_{r}, t_{r}, e_{r})$$, $0 \leq t_{1} \leq \dots t_{r} \dots \leq T$. The $r$-th interaction $I_{r}$ happens at time $t_{r}$ between user $u_{r}$ and item $i_{r}$ with edge feature $e_{r}$.


Class
:

class TemporalGraph(sources: numpy.array, destinations: numpy.array, timestamps: numpy.array, edge_idxs: numpy.array, labels: numpy.array)

Args:

  • sources: numpy.array - Array of sources of Temporal Graph edges.
  • destinations: numpy.array - Array of destinations of Temporal Graph edges.
  • timestamps: numpy.array - Array of timestamps of Temporal Graph edges.
  • edge_idxs: numpy.array - Array of edge IDs of Temporal Graph edges.
  • labels: numpy.array - Array of labels of Temporal Graphe dges.

Returns:

  • benchtemp.TemporalGraph. A Temporal Graph.

Example:

import pandas as pd
import numpy as np
import benchtemp as bt


graph_df = pd.read_csv("dataset_path")

sources = graph_df.u.values
destinations = graph_df.i.values
edge_idxs = graph_df.idx.values
labels = graph_df.label.values
timestamps = graph_df.ts.values

# For example, the full Temporal Graph of the dataset is full_data.
full_data = bt.TemporalGraph(sources, destinations, timestamps, edge_idxs, labels)

lp.DataLoader

The DataLoader class for link prediction tasks.

In transductive link prediction, Dataloader splits the temporal graphs chronologically into 70%-15%-15% for train, validation and test sets according to edge timestamps.

In inductive link prediction, Dataloader performs the same split as the transductive setting, and randomly masks 10% nodes as unseen nodes. Any edges associated with these unseen nodes are removed from the training set. To reflect different inductive scenarios, DataLoader further generates three inductive test sets from the transductive test dataset, by filtering edges in different manners:

  • Inductive - selects edges with at least one unseen node.
  • Inductive New-Old - selects edges between a seen node and an unseen node.
  • Inductive New-New - selects edges between two unseen nodes.

Class:

class lp.DataLoader(dataset_path: str, dataset_name: str, different_new_nodes_between_val_and_test: bool, randomize_features: bool)

Args:

  • dataset_path: str - The path of the dataset.
  • dataset_name: str - The name of dataset.
  • different_new_nodes_between_val_and_test: bool - The new nodes are between validation set and test set.
  • randomize_features: str - Random initialization of node features.

Function:

lp.DataLoader.load()

Returns:

  • node_features: numpy.array - Array of the Node Features of the Temporal Graph.
  • edge_features: numpy.array - Array of the Edge Features of the Temporal Graph.
  • full_data: benchtemp.TemporalGraph - Full Temporal Graph dataset.
  • train_data: benchtemp.TemporalGraph - The training set.
  • val_data: benchtemp.TemporalGraph - The validation set.
  • test_data: benchtemp.TemporalGraph - The Transductive test set.
  • new_node_val_data: benchtemp.TemporalGraph - The Inductive validation set.
  • new_node_test_data: benchtemp.TemporalGraph - The Inductive test set.
  • new_old_node_val_data: benchtemp.TemporalGraph - The Inductive New-Old validation set.
  • new_old_node_test_data: benchtemp.TemporalGraph - The Inductive New-Old test set.
  • new_new_node_val_data: benchtemp.TemporalGraph - The Inductive New-New validation set.
  • new_new_node_test_data: benchtemp.TemporalGraph - The Inductive New-New test set.
  • unseen_nodes_num: int - The number of unseen nodes in inductive setting.

Example:

import benchtemp as bt

data = bt.lp.DataLoader(dataset_path="./data/", dataset_name='mooc')
  
node_features, edge_features, full_data, train_data, val_data, test_data, new_node_val_data, new_node_test_data, new_old_node_val_data, new_old_node_test_data, new_new_node_val_data, new_new_node_test_data, unseen_nodes_num = data.load()

lp.RandEdgeSampler

BenchTemp provides the unified negative edge sampler class with a seed named RandEdgeSampler for link prediction task to sample an equal amount of negatives to the positive interactions.

Class:

RandEdgeSampler(src_list: numpy.array, dst_list: numpy.array, seed: int)

Args:

  • src_list: numpy.array - Array of source nodes.
  • dst_list: numpy.array - Array of destination nodes.
  • seed: numpy.array - The seed of random.

Function:

RandEdgeSampler.sample(size: int)

Args:

  • size: int - The size of the sampling negative edges.

Returns:

  • src_list: numpy.array - Array of source nodes of the sampling negative edges.
  • dst_list: numpy.array - Array of destination nodes of the sampling negative edges.

Example:

import benchtemp as bt

# For example, if you are training , you should create a training  RandEdgeSampler based on the training dataset.
train_rand_sampler = bt.lp.RandEdgeSampler(train_data.sources, train_data.destinations)

...
for epoch in range(args.epochs):
    ...
    # sample an equal amount of negatives to the positive interactions.
    size = len(train_data)
    _, negatives_batch = train_rand_sampler.sample(size)
    ...
...

nc.DataLoader

The DataLoader class for the node classification task. The DataLoader module sorts edges and splits the input dataset (70%-15%-15%) according to edge timestamps.

Class:

nc.DataLoader(dataset_path: str, dataset_name: str, use_validation: bool)

Args:

  • dataset_path: str - The path of the dataset.
  • dataset_name: str - The name of the dataset.
  • use_validation: bool - Whether use validation dataset or not.

Function:

nc.DataLoader.load()

Returns:

  • node_features: numpy.array - Array of the Node Features of the Temporal Graph.
  • edge_features: numpy.array - Array of the Edge Features of the Temporal Graph.
  • full_data: benchtemp.TemporalGraph - Full Temporal Graph dataset for node classification task.
  • train_data: benchtemp.TemporalGraph - The training set for node classification task.
  • val_data: benchtemp.TemporalGraph - The validation set for node classification task.
  • test_data: benchtemp.TemporalGraph - The test set for node classification task.

Example:

import benchtemp as bt

data = bt.nc.DataLoader(dataset_path="./data/", dataset_name='mooc', use_validation=True)

node_features, edge_features, full_data, train_data, val_data, test_data = data.load()

EarlyStopMonitor

BenchTemp provides a unified EarlyStopMonitor to improve training efficiency and save resources.

Class:

EarlyStopMonitor(max_round: int, higher_better: bool, tolerance: float)

Args:

  • max_round: int - The number of rounds for early stop.
  • higher_better: bool - The higher the value, the better the performance.
  • tolerance: float - The tolerance of the EarlyStopMonitor.

Function:

EarlyStopMonitor.early_stop_check(curr_val:float)

Args:

  • curr_val: float - The value to check for early stop.

Returns:

  • True - If the value matches the setting of the EarlyStopMonitor.
  • False - If the value does not match the setting of the EarlyStopMonitor.

Example:

import benchtemp as bt

...
early_stopper = bt.EarlyStopMonitor(max_round=args.patience)
for epoch in range(args.epochs):
    ...
    val_ap = model(val_datasets)
    if early_stopper.early_stop_check(val_ap):
        break
    ...
...

Evaluator

Different evaluation metrics are available, including Area Under the Receiver Operating Characteristic Curve (ROC AUC) and Average Precision (AP). Usually, metrics Area Under the Receiver Operating Characteristic Curve (ROC AUC) and average precision (AP) are for the link prediction task, while metrics AUC is for the node classification task.

Class:

Evaluator(task_name: str)

Args:

  • task_name: str - the name of the task, choice in ["LP", "NC"], LP for the link prediction task and NC for the node classification task.

Function:

Evaluator.eval(pred_score: numpy.array, true_label: numpy.array)

Args:

  • pred_score: numpy.array- Array of prediction scores.
  • true_label: numpy.array - Array of true labels.

Returns:

  • AUC: float - the value of the AUC.
  • AP: float - the value of the AP.

Example:

import benchtemp as bt

# For example, Link prediction task. Evaluation Metrics: AUC, AP.
evaluator = bt.Evaluator("LP")

...
# test data
pred_score = model(test_data)
test_auc, test_ap = evaluator.eval(pred_score, true_label)
...
import benchtemp as bt

# For example, node classification task. Evaluation Metrics: AUC.
evaluator = bt.Evaluator("NC")

...
# test data
pred_score = model(test_data)
test_auc = evaluator.eval(pred_score, true_label)
...