Operators
In this section, we provide a detailed introduction to some basic operators on high-order tensors.
Code Architecture
The code for these high-order graph neural network (HOGNN) operations is organized into three layers:
Layer 1: Backend: This layer, found in the pygho.backend module, contains basic data structures and operations focused on tensor manipulation. It involves no graph-specific learning concepts and includes the following functionalities:
Matrix multiplication: This supports general matrix multiplication, including products of two SparseTensors, of one SparseTensor and one MaskedTensor, and of two MaskedTensors. It also handles batched matrix multiplication and offers variants that replace the sum in ordinary matrix multiplication with max or mean.
Matrix addition: Operations for adding two sparse or two dense matrices.
Reduce operations: These operations include sum, mean, max, and min, which reduce dimensions in tensors.
Expand operation: This operation adds new dimensions to tensors.
Tuplewise apply(func): It applies a given function to the underlying data tensor.
Diagonal apply(func): This operation applies a function to diagonal elements of tensors.
Layer 2: Graph operations: Building upon Layer 1, the pygho.honn.SpOperator and pygho.honn.MaOperator modules provide graph operations specifically tailored to the Sparse and Masked Tensor structures. Additionally, the pygho.honn.TensorOp layer wraps these operators, abstracting away the differences between the Sparse and Masked Tensor data structures. These operations encompass:
General message passing between tuples: Facilitating message passing between tuples of nodes.
Pooling: This operation reduces high-order tensors to lower-order ones by summing, taking the maximum, or computing the mean across specific dimensions.
Diagonal: It reduces high-order tensors to lower-order ones by extracting diagonal elements.
Unpooling: This operation extends low-order tensors to high-order ones.
Layer 3: Models: Building on Layer 2, this layer provides a collection of representative high-order GNN layers, including NGNN, GNNAK, DSSGNN, SUN, SSWL, PPGN, and I2GNN. Layer 3 offers numerous ready-to-use methods, and with Layer 2, users can design additional models using general graph operations. Layer 1 allows for the development of novel operations, expanding the library’s flexibility and utility. Now let’s explore these layers in more detail.
Layer 1: Backend
Spspmm
One of the most complex operators in this layer is sparse-sparse matrix multiplication (Spspmm). Given two sparse matrices C and D, let their product be B = CD.
The Spspmm operator uses the COO format to represent elements. Assuming element C_ik corresponds to C.values[c_ik] and D_kj corresponds to D.values[d_kj], we can create a tensor bcd of shape (3, m), where m is the number of triples (i, j, k) for which both C_ik and D_kj exist: bcd[0, t] stores the position of the output element B_ij in B.values, while bcd[1, t] = c_ik and bcd[2, t] = d_kj. The multiplication can then be performed as follows:
B.values = zeros(...)
for i in range(m):
    B.values[bcd[0, i]] += C.values[bcd[1, i]] * D.values[bcd[2, i]]
This summation can be implemented efficiently in parallel on a GPU using torch.Tensor.scatter_reduce_. The bcd tensor can be precomputed with pygho.backend.Spspmm.spspmm_ind() and shared among matrices with the same indices.
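As a concrete illustration, the parallel version can be written with scatter_reduce_ roughly as follows. This is a minimal sketch rather than the library's implementation: the helper name spspmm_values, the out_nnz argument, and the assumption of 1-D value tensors are ours.

import torch

def spspmm_values(C_values: torch.Tensor, D_values: torch.Tensor,
                  bcd: torch.Tensor, out_nnz: int) -> torch.Tensor:
    # C_ik * D_kj for every stored (i, k, j) triple
    prod = C_values[bcd[1]] * D_values[bcd[2]]
    out = torch.zeros(out_nnz, dtype=prod.dtype, device=prod.device)
    # scatter each product into its output position bcd[0] and sum
    return out.scatter_reduce_(0, bcd[0], prod, reduce="sum", include_self=True)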
The Hadamard product of two sparse matrices, \(C = A \odot B\), can also be implemented. Since the nonzero pattern of C is a subset of that of A, C can reuse the indices of \(A\):
C.values[a_ij] = A.values[a_ij] * B.values[b_ij]
A tensor b2a can be defined, where b2a[b_ij] = a_ij if A contains the element (i, j); otherwise, it is set to -1. The Hadamard product can then be computed as follows:
C.values = zeros(...)
for i in range(B.nnz):
    if b2a[i] >= 0:
        C.values[b2a[i]] = A.values[b2a[i]] * B.values[i]
The operation can also be efficiently implemented in parallel on a GPU.
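A vectorized sketch of this step, under the same b2a convention (each stored element of B maps to its position in A, or -1), might look as follows; the helper name is ours.

import torch

def sparse_hadamard_values(A_values: torch.Tensor, B_values: torch.Tensor,
                           b2a: torch.Tensor) -> torch.Tensor:
    C_values = torch.zeros_like(A_values)
    keep = b2a >= 0                  # elements present in both A and B
    pos = b2a[keep]                  # their positions in A (and hence in C)
    C_values[pos] = A_values[pos] * B_values[keep]
    return C_values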
To compute \(A\odot (CD)\), you can define a tensor acd of shape (3, m') where acd[0] = b2a[bcd[0]], acd[1] = bcd[1], and acd[2] = bcd[2], and then remove the columns i where acd[0, i] = -1. The computation can be done as follows:
ret.values = zeros(...)
for i in range(acd.shape[1]):
    ret.values[acd[0, i]] += A.values[acd[0, i]] * C.values[acd[1, i]] * D.values[acd[2, i]]
Like the previous operations, this can also be implemented efficiently in parallel on a GPU. Additionally, by setting A.values[acd[0, i]] to 1, A can act as a mask, ensuring that only elements existing in A are computed.
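A minimal parallel sketch of this masked product, again assuming 1-D value tensors and a helper name of our own choosing, is:

import torch

def masked_spspmm_values(A_values: torch.Tensor, C_values: torch.Tensor,
                         D_values: torch.Tensor, acd: torch.Tensor) -> torch.Tensor:
    # A_ij * C_ik * D_kj for every surviving (i, j, k) column of acd
    prod = A_values[acd[0]] * C_values[acd[1]] * D_values[acd[2]]
    ret = torch.zeros_like(A_values)  # ret shares A's indices; pass ones to use A as a mask
    return ret.scatter_reduce_(0, acd[0], prod, reduce="sum", include_self=True)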
The overall wrapper for these functions is pygho.honn.Spspmm.spspmm(), which performs sparse-sparse matrix multiplication with precomputed indices. pygho.honn.Spspmm.spspmpnn() provides a more general operator that goes beyond matrix multiplication and allows various graph operations to be implemented. It can, in fact, implement the following framework:
\[\mathrm{ret}_{ij} = \phi\big(\{\!\!\{(A_{ij},\, C_{ik},\, D_{kj}) \mid k,\ C_{ik}\ne 0,\ D_{kj}\ne 0\}\!\!\}\big),\]
where \(\phi\) is a general multiset function passed as a functional parameter of spspmpnn. For example, with it we can implement GAT on each subgraph as follows:
self.attentionnn1 = nn.Linear(hiddim, hiddim)
self.attentionnn2 = nn.Linear(hiddim, hiddim)
self.attentionnn3 = nn.Linear(hiddim, hiddim)
self.subggnn = NGNNConv(
    hiddim, hiddim, args.aggr, "SS", transfermlpparam(mlp),
    message_func=lambda a, b, c, tarid:
        scatter_softmax(self.attentionnn1(a) * b * self.attentionnn2(c), tarid)
        * self.attentionnn3(c))
TuplewiseApply
Both Sparse and Masked Tensors provide the tuplewiseapply function. The most common usage is:
mlp = ...
X.tuplewiseapply(mlp)
However, in practice, this function directly applies mlp to the underlying values or data tensor. Since linear layers, non-linearities, and layer normalization all operate on the last dimension, this is essentially equivalent to a tuplewise apply. For batch normalization, we provide a version that is not affected by this issue in pygho.honn.utils.
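Concretely, any module whose layers act on the last dimension can be passed directly; the sketch below assumes a hidden width hiddim and continues the X from above.

import torch.nn as nn

hiddim = 64
# layers acting on the last dimension behave tuplewise when applied to the
# underlying value/data tensor
mlp = nn.Sequential(nn.Linear(hiddim, hiddim), nn.ReLU(), nn.LayerNorm(hiddim))
X = X.tuplewiseapply(mlp)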
DiagonalApply
Both Sparse and Masked Tensors also provide the diagonalapply function. Unlike tuplewiseapply, this function passes both the data/values tensor and a mask indicating whether the corresponding elements are on the diagonal to the input function. A common use case is:
mlp1 = ...
mlp2 = ...
func = lambda x, diagonalmask: torch.where(diagonalmask, mlp1(x), mlp2(x))
X.diagonalapply(func)
Here, mlp1 is applied to diagonal elements and mlp2 to non-diagonal elements. You can also use torch_geometric.nn.HeteroLinear for a faster implementation.
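For example, a single HeteroLinear can replace the mlp1/mlp2 pair by treating diagonal and off-diagonal elements as two types. This is a hedged sketch for the sparse case, assuming the mask can be flattened to one boolean per stored tuple.

from torch_geometric.nn import HeteroLinear

hiddim = 64
lin = HeteroLinear(hiddim, hiddim, num_types=2)
# type 0: off-diagonal elements, type 1: diagonal elements
X.diagonalapply(lambda x, diagonalmask: lin(x, diagonalmask.reshape(-1).long()))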
Layer 2: Operators
pygho.honn.SpOperator and pygho.honn.MaOperator wrap the backend for SparseTensor and MaskedTensor separately, and their APIs are unified in pygho.honn.TensorOp. The basic operators include OpNodeMessagePassing (node-level message passing), OpMessagePassing (tuple-level message passing, wrapping matrix multiplication), OpPooling (reduce high-order tensors to lower-order ones by sum, mean, or max), OpDiag (reduce high-order tensors to lower-order ones by extracting diagonal elements), and OpUnpooling (extend lower-order tensors to higher-order ones). Special cases of these operators are also defined.
Sparse OpMessagePassing
As described in Layer 1, the OpMessagePassing operator wraps the Spspmm machinery and is defined with parameters such as op0, op1, dim1, op2, dim2, and aggr. During the forward pass, it retrieves precomputed index data from a data dictionary using keys of the form f"{op0}___{op1}___{dim1}___{op2}___{dim2}". Here's the forward method signature:
def forward(self,
            A: SparseTensor,
            B: SparseTensor,
            datadict: Dict,
            tarX: Optional[SparseTensor] = None) -> SparseTensor:
In this signature, tarX corresponds to op0 and provides the target indices, while A and B correspond to op1 and op2, respectively. The datadict can be obtained from the data loader using for batch in dataloader: batch.to_dict().
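As a hedged usage sketch (the constructor arguments below simply mirror the op0/op1/dim1/op2/dim2/aggr parameters listed above, and the batch.X / batch.A attribute names are assumptions, so the actual API may differ):

# hedged sketch only; argument values and batch attribute names are assumptions
mp = OpMessagePassing("X", "X", 1, "A", 0, "sum")

for batch in dataloader:
    datadict = batch.to_dict()       # supplies the precomputed index tensors
    X, A = batch.X, batch.A          # tuple features and adjacency (assumed names)
    X = mp.forward(X, A, datadict, tarX=X)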
Example
To illustrate how these operators work, let's use NGNN as an example. Although our operators can handle batched data, for simplicity, we'll focus on the single-graph case. Let X denote the tuple representation tensor in \(\mathbb{R}^{n\times n\times d}\), and let A denote the adjacency matrix in \(\mathbb{R}^{n\times n}\). The Graph Isomorphism Network (GIN) operation on all subgraphs can then be defined as:
\[X_{ij} \leftarrow \sum_{k \in N(j)} \mathrm{MLP}(X_{ik}),\]
where \(N(j)\) denotes the neighbors of node \(j\) in \(A\).
This operation can be represented using two steps:
Apply the MLP function to each tuple’s representation:
X' = X.tuplewiseapply(MLP)
Perform matrix multiplication to sum over neighbors:
X = X' * A^T
In the matrix multiplication step, the multiplication is batched over the last (channel) dimension of X. This conversion may seem straightforward, but there are several key points to consider (a combined sketch follows the list below):
Optimization for induced subgraph input: The original equation involves a sum over neighbors in the subgraph, but the matrix multiplication version includes neighbors from the entire graph. However, our implementation optimizes for induced subgraph cases, where neighbors outside the subgraph are automatically handled by setting their values to zero.
Optimization for sparse output: The operation X’ * A^T may produce non-zero elements for pairs (i, j) that do not exist in the subgraph. For sparse input tensors X and A, we optimize the multiplication to avoid computing such non-existent elements.
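Putting the two steps together, an NGNN-style layer could be sketched roughly as follows; the wrapper class and the mp_op argument are illustrative assumptions, not the library's NGNNConv.

import torch.nn as nn

class NGNNLayerSketch(nn.Module):
    def __init__(self, mlp: nn.Module, mp_op):
        super().__init__()
        self.mlp = mlp    # applied tuplewise (step 1)
        self.mp = mp_op   # an OpMessagePassing-style operator computing X' A^T (step 2)

    def forward(self, X, A, datadict):
        Xp = X.tuplewiseapply(self.mlp)                  # step 1
        return self.mp.forward(Xp, A, datadict, tarX=X)  # step 2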
Pooling processes can also be considered as reductions of \(X\). For instance, the subgraph-level pooling
\[Xn_i = \sum_{j} \mathrm{MLP}_1(X_{ij})\]
can be implemented as follows:
# definition
self.pool = OpPoolingSubg2D(...)
...
# forward
Xn = self.pool(X.tuplewiseapply(MLP_1))
This example demonstrates how our library's operators can be used to efficiently implement various HOGNNs.