.. _operator-label:

Operators
=========

In this section, we'll provide a detailed introduction to some basic operators on high-order tensors.

Code Architecture
-----------------

The code for these high-order graph neural network (HOGNN) operations is organized into three layers:

**Layer 1: Backend:** This layer, found in the :py:mod:`pygho.backend` module, contains basic data structures and operations focused on tensor manipulation. It contains no graph-specific learning concepts and provides the following functionalities:

- **Matrix multiplication:** This method supports general matrix multiplication, including operations on two SparseTensors, one sparse and one MaskedTensor, and two MaskedTensors. It also handles batched matrix multiplication and offers variants that replace the sum in ordinary matrix multiplication with max and mean operations.
- **Two matrix addition:** Operations for adding two sparse or two dense matrices.
- **Reduce operations:** These operations include sum, mean, max, and min, which reduce dimensions in tensors.
- **Expand operation:** This operation adds new dimensions to tensors.
- **Tuplewise apply(func):** It applies a given function to the underlying data tensor.
- **Diagonal apply(func):** This operation applies a function to the diagonal elements of tensors.

**Layer 2: Graph operations:** Building upon Layer 1, the :py:mod:`pygho.honn.SpOperator` and :py:mod:`pygho.honn.MaOperator` modules provide graph operations specifically tailored to the Sparse and Masked Tensor structures. Additionally, the :py:mod:`pygho.honn.TensorOp` layer wraps these operators, abstracting away the differences between the Sparse and Masked Tensor data structures. These operations encompass:

- **General message passing between tuples:** Facilitating message passing between tuples of nodes.
- **Pooling:** This operation reduces high-order tensors to lower-order ones by summing, taking the maximum, or computing the mean across specific dimensions.
- **Diagonal:** It reduces high-order tensors to lower-order ones by extracting diagonal elements.
- **Unpooling:** This operation extends low-order tensors to high-order ones.

**Layer 3: Models:** Building on Layer 2, this layer provides a collection of representative high-order GNN layers, including NGNN, GNNAK, DSSGNN, SUN, SSWL, PPGN, and I2GNN. Layer 3 offers numerous ready-to-use methods, and with Layer 2, users can design additional models using general graph operations. Layer 1 allows for the development of novel operations, expanding the library's flexibility and utility.

Now let's explore these layers in more detail.

Layer 1: Backend
~~~~~~~~~~~~~~~~

Spspmm
^^^^^^

One of the most complex operators in this layer is sparse-sparse matrix multiplication (Spspmm). Given two sparse matrices C and D whose product is B:

.. math::

   B_{ij} = \sum_k C_{ik} D_{kj}

The Spspmm operator uses the COO format to represent elements. Assuming element :math:`C_{ik}` corresponds to ``C.values[c_ik]`` and :math:`D_{kj}` corresponds to ``D.values[d_kj]``, we can create a tensor ``bcd`` of shape (3, m), where m is the number of triples (i, j, k) for which both :math:`C_{ik}` and :math:`D_{kj}` exist. The multiplication can then be performed as follows:

.. code:: python

   B.values = zeros(...)
   for i in range(m):
       B.values[bcd[0, i]] += C.values[bcd[1, i]] * D.values[bcd[2, i]]

This summation can be implemented efficiently in parallel on a GPU using ``torch.Tensor.scatter_reduce_``.
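For intuition, here is a minimal, self-contained sketch (not the library's actual implementation) of how this scatter-based multiplication can be vectorized. The helper name ``spspmm_from_bcd`` and the assumption of one-dimensional value tensors are illustrative only; ``bcd`` is the precomputed (3, m) index tensor described above.

.. code:: python

   import torch

   def spspmm_from_bcd(C_values: torch.Tensor, D_values: torch.Tensor,
                       bcd: torch.Tensor, b_nnz: int) -> torch.Tensor:
       """Sketch: compute B.values from the precomputed bcd indices.

       Assumes 1-D value tensors (one scalar per non-zero element);
       `b_nnz` is the number of non-zero elements of the output B.
       """
       # elementwise products C_ik * D_kj for every surviving (i, j, k) triple
       prod = C_values[bcd[1]] * D_values[bcd[2]]
       # sum all products that share the same output position b_ij in parallel
       out = torch.zeros(b_nnz, dtype=prod.dtype, device=prod.device)
       out.scatter_reduce_(0, bcd[0], prod, reduce="sum")
       return out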
The ``bcd`` tensor can be precomputed with :py:func:`pygho.backend.Spspmm.spspmm_ind` and shared among matrices with the same indices.

The Hadamard product of two sparse matrices, :math:`C = A \odot B`, can also be implemented. C can use the same indices as :math:`A`:

.. code:: python

   C.values[a_ij] = A.values[a_ij] * B.values[b_ij]

We can define a tensor ``b2a``, where ``b2a[b_ij] = a_ij`` if A has the element (i, j) and -1 otherwise. The Hadamard product can then be computed as follows:

.. code:: python

   C.values = zeros(...)
   for b_ij in range(B.nnz):
       a_ij = b2a[b_ij]
       if a_ij >= 0:
           C.values[a_ij] = A.values[a_ij] * B.values[b_ij]

This operation can also be implemented efficiently in parallel on a GPU.

To compute :math:`A\odot (CD)`, you can define a tensor ``acd`` of shape (3, m'), where ``acd[0] = b2a[bcd[0]]``, ``acd[1] = bcd[1]``, and ``acd[2] = bcd[2]``, and remove the columns i where ``acd[0, i] = -1``. The computation can then be done as follows:

.. code:: python

   ret.values = zeros(...)
   for i in range(acd.shape[1]):
       ret.values[acd[0, i]] += A.values[acd[0, i]] * C.values[acd[1, i]] * D.values[acd[2, i]]

Like the previous operations, this can also be implemented efficiently in parallel on a GPU. Additionally, by setting ``A.values[acd[0, i]]`` to 1, A can act as a mask, ensuring that only elements existing in A are computed.

The overall wrapper for these functions is :py:func:`pygho.honn.Spspmm.spspmm`, which performs sparse-sparse matrix multiplication with precomputed indices. :py:func:`pygho.honn.Spspmm.spspmpnn` provides a more general operator that goes beyond matrix multiplication and allows you to implement various graph operations. In fact, it implements the following framework:

.. math::

   ret_{ij} = \phi(\{(A_{ij}, B_{ik}, C_{kj}) \mid B_{ik}, C_{kj} \text{ exist}\})

where :math:`\phi` is a general multiset function passed to ``spspmpnn`` as the functional parameter ``message_func``. For example, with it we can implement GAT on each subgraph as follows:

.. code:: python

   self.attentionnn1 = nn.Linear(hiddim, hiddim)
   self.attentionnn2 = nn.Linear(hiddim, hiddim)
   self.attentionnn3 = nn.Linear(hiddim, hiddim)
   self.subggnn = NGNNConv(
       hiddim,
       hiddim,
       args.aggr,
       "SS",
       transfermlpparam(mlp),
       message_func=lambda a, b, c, tarid: scatter_softmax(
           self.attentionnn1(a) * b * self.attentionnn2(c), tarid) *
       self.attentionnn3(c))

TuplewiseApply
^^^^^^^^^^^^^^

Both Sparse and Masked Tensors provide the ``tuplewiseapply`` function. The most common usage is:

.. code:: python

   mlp = ...
   X.tuplewiseapply(mlp)

Note, however, that this function directly applies ``mlp`` to the underlying values or data tensor. Since linear layers, non-linearities, and layer normalization all operate on the last dimension, this is essentially equivalent to a tuplewise apply. For batch normalization, we provide a version that is not affected by this issue in :py:mod:`pygho.honn.utils`.

DiagonalApply
^^^^^^^^^^^^^

Both Sparse and Masked Tensors provide the ``diagonalapply`` function. Unlike ``tuplewiseapply``, this function passes both the data/values tensor and a mask indicating whether each element lies on the diagonal to the input function. A common use case is:

.. code:: python

   mlp1 = ...
   mlp2 = ...
   X.diagonalapply(lambda x, diagonalmask: torch.where(diagonalmask, mlp1(x), mlp2(x)))

Here, ``mlp1`` is applied to diagonal elements and ``mlp2`` to non-diagonal elements. You can also use ``torch_geometric.nn.HeteroLinear`` for a faster implementation.
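As a sketch of that faster alternative (not the library's code), the diagonal mask can be converted into a type vector so a single ``HeteroLinear`` replaces the two MLPs. The dimensions, the helper name ``diag_func``, and the assumption that the mask broadcasts over the feature dimension are all illustrative.

.. code:: python

   import torch
   from torch_geometric.nn import HeteroLinear

   hiddim = 64
   # one fused linear layer with two "types": 0 = off-diagonal, 1 = diagonal
   hetero_lin = HeteroLinear(hiddim, hiddim, num_types=2)

   def diag_func(x: torch.Tensor, diagonalmask: torch.Tensor) -> torch.Tensor:
       # flatten so that every tuple is one row, then derive a per-row type index
       flat_x = x.reshape(-1, x.shape[-1])
       type_vec = diagonalmask.reshape(flat_x.shape[0], -1)[:, 0].long()
       return hetero_lin(flat_x, type_vec).reshape(*x.shape[:-1], -1)

   # X.diagonalapply(diag_func)  # X is a Sparse/Masked tensor as above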
Layer 2: Operators
~~~~~~~~~~~~~~~~~~

:py:mod:`pygho.honn.SpOperator` and :py:mod:`pygho.honn.MaOperator` wrap the backend for SparseTensor and MaskedTensor separately. Their APIs are unified in :py:mod:`pygho.honn.TensorOp`. The basic operators include ``OpNodeMessagePassing`` (node-level message passing), ``OpMessagePassing`` (tuple-level message passing, wrapping matrix multiplication), ``OpPooling`` (reduce high-order tensors to lower-order ones by sum, mean, or max), ``OpDiag`` (reduce high-order tensors to lower-order ones by extracting diagonal elements), and ``OpUnpooling`` (extend lower-order tensors to higher-order ones). Special cases are also defined.

Sparse OpMessagePassing
^^^^^^^^^^^^^^^^^^^^^^^

As described in Layer 1, the ``OpMessagePassing`` operator wraps the Spspmm machinery and is defined with parameters like ``op0``, ``op1``, ``dim1``, ``op2``, ``dim2``, and ``aggr``. During the forward pass, it retrieves precomputed data from a data dictionary using keys like ``f"{op0}___{op1}___{dim1}___{op2}___{dim2}"``. Here's the forward method signature:

.. code:: python

   def forward(self,
               A: SparseTensor,
               B: SparseTensor,
               datadict: Dict,
               tarX: Optional[SparseTensor] = None) -> SparseTensor:

In this signature, ``tarX`` corresponds to ``op0`` and provides the target indices, while ``A`` and ``B`` correspond to ``op1`` and ``op2``. The ``datadict`` can be obtained from the data loader via ``for batch in dataloader: datadict = batch.to_dict()``.

Example
-------

To illustrate how these operators work, let's use NGNN as an example. Although our operators can handle batched data, for simplicity we focus on the single-graph case. Let :math:`H` represent the representation tensor in :math:`\mathbb{R}^{n\times n\times d}`, and :math:`A` the adjacency matrix in :math:`\mathbb{R}^{n\times n}`. The Graph Isomorphism Network (GIN) operation on all subgraphs can be defined as:

.. math::

   h_{ij} \leftarrow \sum_{k\in N_i(j)} \text{MLP}(h_{ik})

This operation can be expressed in two steps:

1. Apply the MLP to each tuple's representation:

   .. code:: python

      X' = X.tuplewiseapply(MLP)

2. Perform matrix multiplication to sum over neighbors:

   .. code:: python

      X = X' * A^T

In the matrix multiplication step, batching is applied to the last dimension of X. This conversion may seem straightforward, but there are several key points to consider:

- **Optimization for induced subgraph input:** The original equation sums over neighbors within the subgraph, while the matrix multiplication version sums over neighbors from the entire graph. However, our implementation is optimized for the induced-subgraph case, where neighbors outside the subgraph are handled automatically because their values are zero.
- **Optimization for sparse output:** The operation :math:`X' A^T` may produce non-zero elements for pairs (i, j) that do not exist in the subgraph. For sparse input tensors X and A, we optimize the multiplication to avoid computing such non-existent elements.

Pooling can also be viewed as a reduction of :math:`X`. For instance,

.. math::

   h_i=\sum_{j\in V_i}\text{MLP}_2(h_{ij})

can be implemented as follows:

::

   # definition
   self.pool = OpPoolingSubg2D(...)
   ...
   # forward
   Xn = self.pool(X.tuplewiseapply(MLP_2))

This example demonstrates how our library's operators can be used to efficiently implement various HOGNNs.
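To make the two-step conversion above concrete, here is a minimal dense PyTorch sketch for a single graph. It is only an illustration under the shapes assumed above (H of shape (n, n, d), adjacency A of shape (n, n)), not the library's sparse implementation; all variable names are illustrative.

.. code:: python

   import torch
   import torch.nn as nn

   n, d = 5, 16
   H = torch.randn(n, n, d)                       # tuple representations h_ij
   A = (torch.rand(n, n) < 0.3).float()           # adjacency matrix

   mlp = nn.Sequential(nn.Linear(d, d), nn.ReLU(), nn.Linear(d, d))
   mlp2 = nn.Sequential(nn.Linear(d, d), nn.ReLU())

   # step 1: tuplewise apply, i.e. run the MLP on every tuple representation
   H1 = mlp(H)                                    # (n, n, d)

   # step 2: multiply by A^T with batching over the feature (last) dimension;
   # this realizes h_ij <- sum_k A_jk * MLP(h_ik)
   H2 = torch.einsum("ikd,jk->ijd", H1, A)        # (n, n, d)

   # subgraph pooling h_i = sum_j MLP_2(h_ij) reduces tuples to node features
   Xn = mlp2(H2).sum(dim=1)                       # (n, d)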