.. _hodata-label:

Efficient High Order Data Processing
====================================

In this section, we delve into the efficient high-order data processing capabilities provided by PyGHO, focusing on the handling of high-order tensors, tuple feature precomputation, and data loading strategies for both sparse and masked tensor data structures.

Adding High Order Features to PyG Dataset
-----------------------------------------

HOGNNs and MPNNs share common tasks, allowing us to leverage PyTorch Geometric's (PyG) data processing routines. However, to cater to the unique requirements of HOGNNs, PyGHO significantly extends these routines while maintaining compatibility with PyG. This extension enables convenient high-order feature precomputation and preservation.

Efficient High-Order Feature Precomputation
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

High-order feature precomputation can be carried out efficiently in parallel using PyGHO. Consider the following example:

.. code:: python

    from functools import partial

    # Ordinary PyG dataset
    from torch_geometric.datasets import ZINC
    trn_dataset = ZINC("dataset/ZINC", subset=True, split="train")

    # High-order graph dataset. KhopSampler is PyGHO's k-hop ego-network
    # tuple sampler; `keys` lists the precomputations to perform and is
    # explained in the precomputation section below.
    from pygho.hodata import Sppretransform, ParallelPreprocessDataset
    trn_dataset = ParallelPreprocessDataset(
        "dataset/ZINC_trn", trn_dataset,
        pre_transform=Sppretransform(tuplesamplers=partial(KhopSampler, hop=3), keys=keys),
        num_workers=8)

The ``ParallelPreprocessDataset`` class takes an ordinary PyG dataset as input and transforms each graph in parallel (using 8 processes in this example). The ``tuplesamplers`` parameter holds functions that take a graph as input and produce a sparse tensor. You can apply multiple samplers simultaneously, and the resulting outputs can be given specific names via the ``annotate`` parameter. In this example, we use ``partial(KhopSampler, hop=3)``, a sampler designed for NGNN, to sample a 3-hop ego-network rooted at each node. The shortest path distance to the root node serves as the tuple feature. The produced SparseTensor is then saved and can be used to initialize tuple representations.

Since the dataset preprocessing routine is closely tied to the data structure, we provide two separate routines for sparse and dense tensors. These routines differ only in the ``pre_transform`` function. For dense tensors, we can simply use ``Mapretransform(None, tuplesamplers)``, where ``tuplesamplers`` is a function producing dense tuple features. In :py:mod:`pygho.hodata.MaTupleSampler` we provide ``spdsampler`` and ``rdsampler`` to compute the shortest path distance and resistance distance between nodes. One example is

.. code:: python

    trn_dataset = ParallelPreprocessDataset(
        "dataset/ZINC_trn", trn_dataset,
        pre_transform=Mapretransform(partial(spdsampler, hop=4)),
        num_worker=0)

Defining Custom Tuple Samplers
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

In addition to the provided tuple samplers, you can define your own. For sparse data, a sampler is a function or callable object that takes a ``torch_geometric.data.Data`` object as input and produces a sparse tensor as output. Here's an example of a custom sparse tuple sampler that assigns ``0`` as the feature of each tuple ``(i, i)`` in the graph:

.. code:: python

    import torch
    from torch_geometric.data import Data as PygData
    from pygho import SparseTensor

    def SparseToySampler(data: PygData) -> SparseTensor:
        """
        Assign feature 0 to each tuple (i, i) of the input graph.

        Args:
        - data (PygData): The input PyG data.

        Returns:
        - SparseTensor for the precomputed tuple features.
        """
        n = data.num_nodes
        tupleid = torch.stack((torch.arange(n), torch.arange(n)))
        tuplefeat = torch.zeros((n,))
        ret = SparseTensor(tupleid, tuplefeat, shape=(n, n))
        return ret
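A custom sampler plugs into the same preprocessing pipeline as the built-in ones. The snippet below is only an illustrative sketch: it passes the custom function directly as ``tuplesamplers`` and reuses the ``keys`` list from the first example, which is explained in the precomputation section below.

.. code:: python

    from pygho.hodata import Sppretransform, ParallelPreprocessDataset

    # `keys` is assumed to be defined as in the precomputation section below.
    trn_dataset = ParallelPreprocessDataset(
        "dataset/ZINC_trn", trn_dataset,
        pre_transform=Sppretransform(tuplesamplers=SparseToySampler, keys=keys),
        num_workers=8)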
""" n = data.num_nodes tupleid = torch.stack((torch.arange(n), torch.arange(n))) tuplefeat = torch.zeros((n,)) ret = SparseTensor(tupleid, tuplefeat, shape=(n, n)) return ret For dense data, a sampler is a function or callable object that takes a ``torch_geometric.data.Data`` object as input and produces a tensor along with the masked shape of the features. Here's a custom dense tuple sampler that assigns ``0`` as a feature for each tuple ``(i, i)`` in the graph: .. code:: python def DenseToySampler(data: PygData) -> Tuple[Tensor, List[int]]: """ Sample k-hop subgraph on a given PyG graph. Args: - data (PygData): The input PyG data. Returns: - Tensor: The precomputed tuple features. - List[int]: The masked shape of the features. """ n = data.num_nodes val = torch.eye(n) return val, [n, n] Please note that for dense data, the function returns a tuple consisting of the value and the masked shape, as opposed to returning a MaskedTensor. This is because the mask can typically be inferred from the feature itself, making it unnecessary to explicitly include it in the returned data. In such cases, the mask can be determined as val == 1 . Using Multiple Tuple Samplers ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ You can use multiple tuple samplers simultaneously. For instance: .. code:: python trn_dataset = ParallelPreprocessDataset( "dataset/ZINC_trn", trn_dataset, pre_transform=Sppretransform(tuplesamplers=[partial(KhopSampler, hop=1),partial(KhopSampler, hop=2)], annotate=["1hop", "2hop"], keys=keys), num_workers=8) This code precomputes two tuple features simultaneously and assigns them different annotations, "1hop" and "2hop," to distinguish between them. For dense, it works similarly .. code:: python trn_dataset = ParallelPreprocessDataset( "dataset/ZINC_trn", trn_dataset, pre_transform=Mapretransform( [partial(spdsampler,hop=1),partial(spdsampler,hop=2)], annotate=["1hop","2hop"]), num_worker=0) Sparse-Sparse Matrix Multiplication Precomputation ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Efficient Sparse-Sparse Matrix Multiplication in our library can be achieved through precomputation. The ``keys`` parameter in ``Sppretransform`` is a list of strings, where each string indicates a specific precomputation. For example, consider the key: :: "X___A___1___X___0" Here, the precomputation involves sparse matrix multiplication :math:`AX`, but only computes the output elements that exist in :math:`X`. These precomputation results can be shared among matrices with the same indices. The key elements signify the following: - The first ``X`` refers to the target sparse matrix indices. - ``A`` and ``X`` represent the matrices involved in the multiplication, the adjacency matrix ``A``, and the tuple feature ``X``. - ``1`` denotes that dimension ``1`` of ``A`` will be reduced. - ``0`` signifies that dimension ``0`` of ``X`` will be reduced. You don't need to manually feed the precomputation results to the model. Converting the batch to a dictionary and using it as the ``datadict`` parameter is sufficient: .. code:: python for batch in dataloader: datadict = batch.to_dict() Dense data does not require precomputation currently. If you use annotate in transformation, for example, :: Sppretransform(tuplesamplers=partial(KhopSampler, hop=1),annotate=["1hop"], keys=keys) Then the key can be :: "X1hop___A___1___X1hop___0" More details are shown in :ref:`multi-tensor-tutorial-label` Mini-batch and DataLoader Enabling batch training in HOGNNs demands handling graphs of varying sizes, which presents a challenge. 
Mini-batch and DataLoader
-------------------------

Enabling batch training in HOGNNs demands handling graphs of varying sizes, which presents a challenge. We employ different strategies for sparse and masked tensor data structures.

Sparse Tensor Data
~~~~~~~~~~~~~~~~~~

For sparse tensor data, we adopt a relatively straightforward solution: we concatenate the tensors of each graph along the diagonal of a larger tensor. For example, in a batch of :math:`B` graphs with adjacency matrices :math:`A_i\in \mathbb{R}^{n_i\times n_i}`, node features :math:`x_i\in \mathbb{R}^{n_i\times d}`, and tuple features :math:`X_i\in \mathbb{R}^{n_i\times n_i\times d'}` for :math:`i=1,2,\ldots,B`, the features of the entire batch are represented as :math:`A\in \mathbb{R}^{n\times n}`, :math:`x\in \mathbb{R}^{n\times d}`, and :math:`X\in \mathbb{R}^{n\times n\times d'}`, where :math:`n=\sum_{i=1}^B n_i`. This arrangement allows the tensors of batched data to have the same number of dimensions as those of a single graph, facilitating the sharing of common operators.

We provide PyGHO's own DataLoader to simplify this process:

.. code:: python

    from pygho.subgdata import SpDataloader

    trn_dataloader = SpDataloader(trn_dataset, batch_size=32, shuffle=True, drop_last=True)

Masked Tensor Data
~~~~~~~~~~~~~~~~~~

As concatenation along the diagonal produces many non-existing elements, masked tensor data is handled with a different, space-saving strategy: tensors are padded to the same shape and stacked along a new axis. For instance, in a batch of :math:`B` graphs with adjacency matrices :math:`A_i\in \mathbb{R}^{n_i\times n_i}`, node features :math:`x_i\in \mathbb{R}^{n_i\times d}`, and tuple features :math:`X_i\in \mathbb{R}^{n_i\times n_i\times d'}` for :math:`i=1,2,\ldots,B`, the features of the entire batch are represented as :math:`A\in \mathbb{R}^{B\times \tilde{n}\times \tilde{n}}`, :math:`x\in \mathbb{R}^{B\times \tilde{n}\times d}`, and :math:`X\in \mathbb{R}^{B\times \tilde{n}\times \tilde{n}\times d'}`, where :math:`\tilde{n}=\max\{n_i\mid i=1,2,\ldots,B\}`.

.. math::

    A=\begin{bmatrix}
    \begin{pmatrix}
    A_1&0_{n_1,\tilde n-n_1}\\
    0_{\tilde n-n_1, n_1}&0_{\tilde n-n_1,\tilde n-n_1}\\
    \end{pmatrix}\\
    \begin{pmatrix}
    A_2&0_{n_2,\tilde n-n_2}\\
    0_{\tilde n-n_2, n_2}&0_{\tilde n-n_2,\tilde n-n_2}\\
    \end{pmatrix}\\
    \vdots\\
    \begin{pmatrix}
    A_B&0_{n_B,\tilde n-n_B}\\
    0_{\tilde n-n_B, n_B}&0_{\tilde n-n_B,\tilde n-n_B}\\
    \end{pmatrix}\\
    \end{bmatrix}
    ,x=\begin{bmatrix}
    \begin{pmatrix}
    x_1\\
    0_{\tilde n-n_1, d}\\
    \end{pmatrix}\\
    \begin{pmatrix}
    x_2\\
    0_{\tilde n-n_2, d}\\
    \end{pmatrix}\\
    \vdots\\
    \begin{pmatrix}
    x_B\\
    0_{\tilde n-n_B, d}\\
    \end{pmatrix}\\
    \end{bmatrix}
    ,X=\begin{bmatrix}
    \begin{pmatrix}
    X_1&0_{n_1,\tilde n-n_1}\\
    0_{\tilde n-n_1, n_1}&0_{\tilde n-n_1,\tilde n-n_1}\\
    \end{pmatrix}\\
    \begin{pmatrix}
    X_2&0_{n_2,\tilde n-n_2}\\
    0_{\tilde n-n_2, n_2}&0_{\tilde n-n_2,\tilde n-n_2}\\
    \end{pmatrix}\\
    \vdots\\
    \begin{pmatrix}
    X_B&0_{n_B,\tilde n-n_B}\\
    0_{\tilde n-n_B, n_B}&0_{\tilde n-n_B,\tilde n-n_B}\\
    \end{pmatrix}\\
    \end{bmatrix}

The zeros used for padding are masked out in the resulting MaskedTensor. We also provide a DataLoader for this purpose:

.. code:: python

    from pygho.subgdata import MaDataloader

    trn_dataloader = MaDataloader(trn_dataset, batch_size=256, device=device, shuffle=True, drop_last=True)

This padding and stacking strategy ensures consistent shapes across tensors, allowing for efficient processing of dense data.
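To make the padding-and-stacking idea concrete, here is a small, library-independent sketch in plain PyTorch that mimics what ``MaDataloader`` does for a batch of adjacency matrices. The helper ``pad_and_stack`` is illustrative only and not part of PyGHO.

.. code:: python

    import torch
    import torch.nn.functional as F

    def pad_and_stack(adjs):
        """Pad each n_i x n_i matrix with zeros to nmax x nmax, stack the
        results along a new batch axis, and record which entries are real."""
        nmax = max(a.shape[0] for a in adjs)
        padded, masks = [], []
        for a in adjs:
            n = a.shape[0]
            # pad = (cols_left, cols_right, rows_top, rows_bottom)
            padded.append(F.pad(a, (0, nmax - n, 0, nmax - n)))
            mask = torch.zeros(nmax, nmax, dtype=torch.bool)
            mask[:n, :n] = True
            masks.append(mask)
        return torch.stack(padded), torch.stack(masks)

    # Two graphs with 2 and 3 nodes yield value and mask tensors of shape (2, 3, 3).
    values, mask = pad_and_stack([torch.ones(2, 2), torch.ones(3, 3)])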