Minimal Example

Let’s delve into the fundamental concepts of PyGHO using a minimal example. The complete code can be found in example/minimal.py. You can execute the code with the following command:

python minimal.py

This example demonstrates the implementation of a basic HOGNN model, Nested Graph Neural Network (NGNN), introduced in the paper of the same name. NGNN works by first sampling a k-hop subgraph for each node \(i\) and then applying a Graph Neural Network (GNN) to all these subgraphs simultaneously. It generates a 2-D representation \(H\in \mathbb{R}^{n\times n\times d}\), where \(H_{ij}\) represents the representation of node \(j\) in the subgraph rooted at node \(i\). The message passing within all subgraphs can be expressed as:

\[h_{ij}^{t+1} \leftarrow \sum_{k\in N_i(j)} \text{MLP}(h^t_{ik}),\]

where \(N_i(j)\) represents the set of neighbors of node \(j\) in the subgraph rooted at \(i\). After several layers of message passing, tuple representations \(H\) are pooled to generate the node representations:

\[h_i = P(\{h_{ij} | j\in V_i\}).\]
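For intuition, the two equations above can be written in a few lines of dense PyTorch. This is only a sketch: PyGHO stores these quantities as sparse tensors, and all names and shapes below are illustrative placeholders.

import torch

n, d = 10, 16
H = torch.randn(n, n, d)                      # H[i, j]: representation of node j in subgraph i
A_sub = (torch.rand(n, n, n) > 0.5).float()   # A_sub[i]: adjacency of the subgraph rooted at i
mask = (torch.rand(n, n) > 0.5).float()       # mask[i, j] = 1 if node j belongs to subgraph i
mlp = torch.nn.Linear(d, d)                   # stand-in for the MLP

# Message passing: h_{ij}^{t+1} = sum_{k in N_i(j)} MLP(h_{ik}^t)
H = torch.einsum("ijk,ikd->ijd", A_sub, mlp(H))
# Pooling (using sum as the pooling function P): h_i = sum_{j in V_i} h_{ij}
h = (H * mask.unsqueeze(-1)).sum(dim=1)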

This example serves as a fundamental illustration of our work.

Dataset Preprocessing

As HOGNNs address the same tasks as ordinary GNNs, they can use the datasets provided by PyG. However, NGNN still needs to sample subgraphs, which is equivalent to providing initial features for the tuple representations \(h_{ij}\). You can achieve this transformation with the following code:

# Load an ordinary PyG dataset
from torch_geometric.datasets import ZINC
trn_dataset = ZINC("dataset/ZINC", subset=True, split="train")

# Transform it into a high-order graph dataset
from functools import partial
from pygho.hodata import Sppretransform, ParallelPreprocessDataset
# KhopSampler is pygho's k-hop subgraph sampler; keys is generated below
trn_dataset = ParallelPreprocessDataset(
    "dataset/ZINC_trn", trn_dataset,
    pre_transform=Sppretransform(tuplesamplers=[partial(KhopSampler, hop=3)],
                                 annotate=[""], keys=keys),
    num_workers=8)

The ParallelPreprocessDataset class takes a standard PyG dataset as input and transforms each graph in parallel, utilizing 8 processes in this example. The tuplesamplers parameter is a list of functions that take a graph as input and produce a sparse tensor. In this example, we use partial(KhopSampler, hop=3), a sampler designed for NGNN, to sample a 3-hop ego-network rooted at each node; the shortest-path distance to the root node serves as the tuple feature. The produced SparseTensor is then saved and can be used directly to initialize the tuple representations. The keys variable is a list of strings specifying the required precomputation; it can be generated automatically once the model is defined:

from pygho.honn.SpOperator import parse_precomputekey
keys = parse_precomputekey(model)
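Since keys depends on the model architecture, the model is typically constructed before the dataset is preprocessed. A sketch of the overall order, with the model construction elided (KhopSampler and the dataset path are the same as above):

model = ...                                  # build the NGNN model first (see below)
keys = parse_precomputekey(model)            # derive the precomputation keys from it
trn_dataset = ParallelPreprocessDataset(     # then run the parallel preprocessing
    "dataset/ZINC_trn", trn_dataset,
    pre_transform=Sppretransform(tuplesamplers=[partial(KhopSampler, hop=3)],
                                 annotate=[""], keys=keys),
    num_workers=8)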

Mini-batch and DataLoader

Enabling batch training in HOGNNs requires handling graphs of varying sizes, which can be challenging. This library concatenates the SparseTensors of each graph along the diagonal of a larger tensor. For instance, in a batch of \(B\) graphs with adjacency matrices \(A_i\in \mathbb{R}^{n_i\times n_i}\), node features \(x_i\in \mathbb{R}^{n_i\times d}\), and tuple features \(X_i\in \mathbb{R}^{n_i\times n_i\times d'}\) for \(i=1,2,\ldots,B\), the features for the entire batch are represented as \(A\in \mathbb{R}^{n\times n}\), \(x\in \mathbb{R}^{n\times d}\), and \(X\in \mathbb{R}^{n\times n\times d'}\), where \(n=\sum_{i=1}^B n_i\). The concatenation is as follows:

\[\begin{split}A=\begin{bmatrix} A_1&0&0&\cdots &0\\ 0&A_2&0&\cdots &0\\ 0&0&A_3&\cdots &0\\ \vdots&\vdots&\vdots&\ddots&\vdots\\ 0&0&0&\cdots&A_B \end{bmatrix} ,x=\begin{bmatrix} x_1\\ x_2\\ x_3\\ \vdots\\ x_B \end{bmatrix} ,X=\begin{bmatrix} X_1&0&0&\cdots &0\\ 0&X_2&0&\cdots &0\\ 0&0&X_3&\cdots &0\\ \vdots&\vdots&\vdots&\ddots&\vdots\\ 0&0&0&\cdots&X_B \end{bmatrix}\end{split}\]
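As a rough illustration of this concatenation in plain PyTorch (PyGHO performs it with sparse tensors inside the DataLoader; the shapes below are arbitrary):

import torch

d, d2 = 16, 8
A1, A2 = torch.rand(3, 3), torch.rand(5, 5)           # adjacency matrices of two graphs
x1, x2 = torch.rand(3, d), torch.rand(5, d)           # node features
X1, X2 = torch.rand(3, 3, d2), torch.rand(5, 5, d2)   # tuple features

A = torch.block_diag(A1, A2)                          # (8, 8) block-diagonal adjacency
x = torch.cat([x1, x2], dim=0)                        # (8, d) stacked node features
X = torch.zeros(8, 8, d2)                             # (8, 8, d2) block-diagonal tuple features
X[:3, :3], X[3:, 3:] = X1, X2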

We provide our own DataLoader as part of PyGHO. It accepts the same parameters as PyTorch's DataLoader and combines the sparse tensors of different graphs as described above:

from pygho.subgdata import SpDataloader
trn_dataloader = SpDataloader(trn_dataset, batch_size=128, shuffle=True, drop_last=True)

Using this DataLoader is similar to an ordinary PyG DataLoader:

for batch in trn_dataloader:
    batch = batch.to(device, non_blocking=True)

However, in addition to the ordinary PyG batch attributes (like edge_index, x, and batch), this batch also contains the adjacency matrix A and the initial tuple features X as SparseTensors.
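Inside the loop, the higher-order attributes can be accessed alongside the ordinary ones. A minimal sketch, assuming the names described above (A and X) are exposed as batch attributes:

x, edge_index = batch.x, batch.edge_index   # ordinary PyG attributes
A, X = batch.A, batch.X                     # SparseTensor adjacency and tuple features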

Learning Methods on Graphs

To execute message passing on each subgraph simultaneously, you can utilize the NGNNConv in our library:

# Definition
self.subggnns = nn.ModuleList([
    NGNNConv(hiddim, hiddim, "sum", "SS", mlp)
    for _ in range(num_layer)
])

...
# Forward pass
for conv in self.subggnns:
    tX = conv.forward(A, X, datadict)
    X = X.add(tX, True)

Here, A and X are SparseTensors representing the adjacency matrix and tuple representation, respectively. X.add implements a residual connection.
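To show where these pieces live, here is a hedged skeleton of an NGNN-style module. The class name, hiddim, num_layer, and mlp (the MLP configuration passed to NGNNConv) are illustrative placeholders rather than the exact code of example/minimal.py.

import torch.nn as nn
from pygho.honn.Conv import NGNNConv

class NestedGNN(nn.Module):
    def __init__(self, hiddim: int, num_layer: int, mlp):
        super().__init__()
        self.subggnns = nn.ModuleList([
            NGNNConv(hiddim, hiddim, "sum", "SS", mlp)
            for _ in range(num_layer)
        ])

    def forward(self, A, X, datadict):
        # A: SparseTensor adjacency matrix, X: SparseTensor tuple representation
        for conv in self.subggnns:
            tX = conv.forward(A, X, datadict)
            X = X.add(tX, True)  # residual connection
        # X now holds the tuple representations; pool them (cf. the pooling
        # equation above) to obtain node- or graph-level representations.
        return X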

We also provide other convolution layers in pygho.honn.Conv, including GNNAK, DSSGNN, SSWL, PPGN, SUN, and I2GNN.