Basic Data Structure
============================

In this section, we explain the basic data structures used in HOGNNs to
address their unique requirements: MaskedTensor and SparseTensor.

MaskedTensor
------------

HOGNNs demand specialized data structures to handle high-order tensors
efficiently. One such structure is the **MaskedTensor**, consisting of two
components: ``data`` and ``mask``.

- ``data`` has a shape of :math:`(\text{masked shape}, \text{dense shape})`,
  for example :math:`\mathbb{R}^{n\times n\times d}`, where :math:`n` is the
  number of nodes and :math:`d` is the dimensionality of the data.
- ``mask`` has a shape of :math:`(\text{masked shape})` and contains Boolean
  values, typically :math:`\{0,1\}^{n\times n}`. The element :math:`(i,j)` in
  ``mask`` is :math:`1` if the tuple :math:`(i,j)` exists in the tensor.

Unused elements in ``data`` do not affect the output of the operators in this
library. For example, when performing operations like summation, MaskedTensor
treats the non-existing elements as :math:`0`, effectively ignoring them.

Here's an example of creating a MaskedTensor:

.. code:: python

    from pygho import MaskedTensor
    import torch

    n = 3
    data = torch.tensor([[4, 1, 4], [4, 4, 2], [3, 4, 4]])
    mask = torch.tensor([[0, 1, 0], [0, 0, 1], [1, 0, 0]], dtype=torch.bool)
    A = MaskedTensor(data, mask)

The non-existing elements in ``data`` can be assigned arbitrary values. The
created masked tensor is

.. math::

    \begin{bmatrix}
        -&1&-\\
        -&-&2\\
        3&-&-
    \end{bmatrix}

SparseTensor
------------

The **SparseTensor**, on the other hand, stores only the existing elements,
making it more efficient when only a small fraction of the elements is valid.
A SparseTensor with shape ``(sparse_shape, dense_shape)`` consists of two
tensors: ``indices`` and ``values``.

- ``indices`` is an integer tensor of shape ``(sparse_dim, nnz)``, where
  ``sparse_dim`` is the number of dimensions in the sparse shape and ``nnz``
  is the number of existing elements.
- ``values`` has a shape of ``(nnz, dense_shape)``.

The columns of ``indices`` and the rows of ``values`` correspond to the
non-zero elements, simplifying retrieval and manipulation of the required
information.

For instance, in the context of NGNN's representation
:math:`H\in \mathbb{R}^{n\times n\times d}`, where :math:`m` is the total
number of nodes in all subgraphs, you can represent :math:`H` with
``indices`` :math:`a\in \mathbb{N}^{2\times m}` and ``values``
:math:`v\in \mathbb{R}^{m\times d}`. Specifically, for
:math:`i=1,2,\ldots,m`, :math:`H_{a_{1,i},a_{2,i}}=v_i`.

Creating a SparseTensor is illustrated in the following example:

.. code:: python

    from pygho import SparseTensor
    import torch

    n = 3
    indices = torch.tensor([[0, 1, 2], [1, 2, 0]], dtype=torch.long)
    values = torch.tensor([1, 2, 3])
    A = SparseTensor(indices, values, shape=(n, n))

representing the matrix

.. math::

    \begin{bmatrix}
        -&1&-\\
        -&-&2\\
        3&-&-
    \end{bmatrix}

Please note that in the SparseTensor format, each non-zero element is
represented by ``sparse_dim`` int64 indices plus the element itself. If the
tensor is not sparse enough, a SparseTensor may occupy more memory than its
dense counterpart.
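As a back-of-envelope illustration of this trade-off (plain Python
arithmetic, not a pygho utility; the matrix size and element count are made
up for the example), consider an :math:`n\times n` matrix of float32 values:

.. code:: python

    # Each stored element of a 2D SparseTensor costs two int64 indices
    # (8 bytes each) plus one float32 value (4 bytes); a dense tensor
    # stores all n * n values at 4 bytes each.
    n, nnz = 1000, 5000
    sparse_bytes = nnz * (2 * 8 + 4)   # 100,000 bytes
    dense_bytes = n * n * 4            # 4,000,000 bytes
    print(sparse_bytes < dense_bytes)  # True: sparse wins at this density

Setting the two expressions equal gives a break-even density of
:math:`4/(2\cdot 8+4)=20\%`: once roughly a fifth of the elements exist, the
dense (masked) layout becomes the cheaper choice.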
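To make the index-value correspondence :math:`H_{a_{1,i},a_{2,i}}=v_i`
concrete, the following plain-PyTorch sketch (illustration only, not pygho
API) recovers one existing element of the example matrix above directly from
``indices`` and ``values``:

.. code:: python

    import torch

    indices = torch.tensor([[0, 1, 2], [1, 2, 0]], dtype=torch.long)
    values = torch.tensor([1, 2, 3])

    # Find the column i of ``indices`` whose tuple equals (2, 0); the
    # matching row of ``values`` is the stored element H[2, 0] = 3.
    i = ((indices[0] == 2) & (indices[1] == 0)).nonzero(as_tuple=True)[0]
    print(values[i])  # tensor([3])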
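Finally, recall the MaskedTensor example above, where non-existing elements
of ``data`` hold arbitrary values. Here is a minimal plain-PyTorch sketch of
the intended summation semantics (illustration only; not pygho's actual
implementation):

.. code:: python

    import torch

    data = torch.tensor([[4, 1, 4], [4, 4, 2], [3, 4, 4]])
    mask = torch.tensor([[0, 1, 0], [0, 0, 1], [1, 0, 0]], dtype=torch.bool)

    # Replace non-existing elements with 0 before reducing, so the
    # arbitrary placeholder values in ``data`` cannot affect the result.
    masked_sum = torch.where(mask, data, torch.zeros_like(data)).sum()
    print(masked_sum)  # tensor(6), i.e. 1 + 2 + 3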