pygho.hodata package

Submodules

pygho.hodata.MaData module

Utilities for dense high order data.

Bases: Data

A data class for dense high order graph data.

pygho.hodata.MaData.batch2dense(batch: Batch, batch_size: int | None = None, max_num_nodes: int | None = None, denseadj: bool = False, keys: List[str] = ['']) → Batch

A main wrapper for converting and padding data in a batch object to dense forms.

Args:

batch (PygBatch): The input batch object.
batch_size (int): Batch size.
max_num_nodes (int): Maximum number of nodes in the graph.
denseadj (bool): If True, convert the adjacency to a dense form; otherwise convert it to a sparse form.
keys (List[str]): List of keys for additional attributes.

Returns:

PygBatch: The processed batch object.

pygho.hodata.MaData.ma_datapreprocess(data: Data, tuplesamplers: List[Callable[[Data], Tuple[Tensor, List[int]]]], annotate: List[str] = ['']) → MaHoData

A wrapper for preprocessing dense data.

Args:

data (PygData): Input data object.
tuplesamplers (Union[Callable[[PygData], Tuple[Tensor, List[int]]], List[Callable[[PygData], Tuple[Tensor, List[int]]]]]): Tuple samplers for extracting data.
annotate (List[str]): List of annotation strings.

Returns:

MaHoData: Preprocessed data object.

pygho.hodata.MaData.to_dense_adj(edge_index: LongTensor, edge_batch: LongTensor, edge_attr: Tensor | None = None, max_num_nodes: int | None = None, batch_size: int | None = None, filled_value: float = 0) → MaskedTensor

Convert sparse adjacency to dense matrix.

Args:

edge_index (LongTensor): Coalesced edge indices of shape (2, nnz).
edge_batch (LongTensor): Batch assignments of shape (nnz).
edge_attr (Optional[Tensor]): Edge attributes of shape (nnz, *).
max_num_nodes (Optional[int]): Maximum number of nodes in the graph.
batch_size (Optional[int]): Batch size.
filled_value (float): Value to fill in the dense matrix.

Returns:

MaskedTensor: A masked dense tensor.
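
The conversion described above can be illustrated with a minimal sketch. It is not the library's implementation: plain Python lists stand in for tensors, the name dense_adj_sketch is hypothetical, and it assumes that edge_index holds node indices local to each graph and that the mask marks positions holding an edge.

```python
from typing import List, Optional, Tuple


def dense_adj_sketch(
    edge_index: List[List[int]],            # [[rows...], [cols...]], i.e. shape (2, nnz)
    edge_batch: List[int],                  # graph id of each edge, shape (nnz)
    edge_attr: Optional[List[float]] = None,
    max_num_nodes: Optional[int] = None,
    batch_size: Optional[int] = None,
    filled_value: float = 0.0,
) -> Tuple[list, list]:
    rows, cols = edge_index
    # Infer the missing sizes from the data, as the Optional arguments suggest.
    if batch_size is None:
        batch_size = max(edge_batch, default=-1) + 1
    if max_num_nodes is None:
        max_num_nodes = max(rows + cols, default=-1) + 1
    # Assumption: without edge attributes, an existing edge is stored as 1.0.
    if edge_attr is None:
        edge_attr = [1.0] * len(rows)
    n = max_num_nodes
    dense = [[[filled_value] * n for _ in range(n)] for _ in range(batch_size)]
    mask = [[[False] * n for _ in range(n)] for _ in range(batch_size)]
    # Scatter every edge into the slot (graph, row, col).
    for b, i, j, v in zip(edge_batch, rows, cols, edge_attr):
        dense[b][i][j] = v
        mask[b][i][j] = True
    return dense, mask


# Graph 0 has the edge 0-1 in both directions, graph 1 has the edge 2->0.
dense, mask = dense_adj_sketch([[0, 1, 2], [1, 0, 0]], [0, 0, 1])
```

Every position not covered by an edge keeps filled_value and is marked False in the mask, so both graphs end up with the same (3, 3) shape.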

pygho.hodata.MaData.to_dense_tuplefeat(tuplefeat: Tensor, tupleshape: LongTensor, tuplefeatptr: LongTensor, max_tupleshape: LongTensor | None = None, batch_size: int | None = None, feat2mask: Callable[[Tensor], BoolTensor] | None = None) → MaskedTensor

Convert tuple features of different subgraphs to a dense matrix.

Args:

tuplefeat (Tensor): Tuple features of shape (total number of tuples in batch, *denseshapeshape).
tupleshape (LongTensor): Shape of tuple features.
tuplefeatptr (LongTensor): Pointer to tuple features. tuplefeat[tuplefeatptr[i]:tuplefeatptr[i+1]] represents the tuple features of subgraph i.
max_tupleshape (Optional[LongTensor]): Maximum shape of tuple features.
batch_size (Optional[int]): Batch size.
feat2mask (Optional[Callable[[Tensor], BoolTensor]]): Function to generate masks for tuple features.

Returns:

MaskedTensor: A masked dense tensor of shape (b, n1, n2, ..., *denseshapeshape), where ret[i] holds the tuple features of subgraph i and (n1, n2, ...) are the maximum sizes of the tuple features over all subgraphs.

To align tuple features of different sizes, padding is applied.
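
A minimal sketch of this padding for 2-tuples follows. It is not the library's code: plain lists replace tensors, the name dense_tuplefeat_sketch is hypothetical, and it assumes that each subgraph's features are stored flattened in row-major order and that padded positions are simply masked out.

```python
from typing import List, Tuple


def dense_tuplefeat_sketch(
    tuplefeat: List[float],              # flattened features of all subgraphs
    tupleshape: List[Tuple[int, int]],   # (n1_i, n2_i) of each subgraph i
    tuplefeatptr: List[int],             # subgraph i owns tuplefeat[ptr[i]:ptr[i+1]]
    filled_value: float = 0.0,
) -> Tuple[list, list]:
    b = len(tupleshape)
    # Pad every subgraph up to the largest size along each tuple dimension.
    n1 = max(shape[0] for shape in tupleshape)
    n2 = max(shape[1] for shape in tupleshape)
    dense = [[[filled_value] * n2 for _ in range(n1)] for _ in range(b)]
    mask = [[[False] * n2 for _ in range(n1)] for _ in range(b)]
    for i, (r, c) in enumerate(tupleshape):
        chunk = tuplefeat[tuplefeatptr[i]:tuplefeatptr[i + 1]]
        if len(chunk) != r * c:
            raise ValueError("pointer range does not match tupleshape")
        # Assumption: row-major layout, so element k sits at (k // c, k % c).
        for k, value in enumerate(chunk):
            dense[i][k // c][k % c] = value
            mask[i][k // c][k % c] = True
    return dense, mask


feat = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 10.0]
# Subgraph 0 is 2 x 2 (4 entries), subgraph 1 is 2 x 3 (6 entries).
dense_t, mask_t = dense_tuplefeat_sketch(feat, [(2, 2), (2, 3)], [0, 4, 10])
```

Here the smaller subgraph gains one padded column, which stays at filled_value and is False in the mask.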

pygho.hodata.MaData.to_dense_x(nodeX: Tensor, Xptr: LongTensor, max_num_nodes: int | None = None, batch_size: int | None = None, filled_value: float = 0) → MaskedTensor

Convert node features of different subgraphs to a dense matrix.

Args:

nodeX (Tensor): Node features of shape (total number of nodes in the batch, *denseshapeshape).
Xptr (LongTensor): Pointer to subgraphs. nodeX[Xptr[i]:Xptr[i+1]] represents the node features of subgraph i.
max_num_nodes (Optional[int]): Maximum number of nodes in a subgraph.
batch_size (Optional[int]): Batch size.
filled_value (float): Value to fill in the dense matrix.

Returns:

MaskedTensor: A masked dense tensor of shape (b, n, *denseshapeshape).

To align graphs of different sizes, padding is applied.
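
The pointer-based slicing and padding can be sketched as below. This is an illustration under assumptions, not the library's implementation: lists replace tensors, each node carries a single scalar feature, and dense_x_sketch is a hypothetical name.

```python
from typing import List, Optional, Tuple


def dense_x_sketch(
    nodeX: List[float],                  # features of all nodes in the batch
    Xptr: List[int],                     # subgraph i owns nodeX[Xptr[i]:Xptr[i+1]]
    max_num_nodes: Optional[int] = None,
    batch_size: Optional[int] = None,
    filled_value: float = 0.0,
) -> Tuple[List[List[float]], List[List[bool]]]:
    if batch_size is None:
        batch_size = len(Xptr) - 1
    sizes = [Xptr[i + 1] - Xptr[i] for i in range(batch_size)]
    if max_num_nodes is None:
        max_num_nodes = max(sizes, default=0)
    dense, mask = [], []
    for i, size in enumerate(sizes):
        pad = max_num_nodes - size
        # Real nodes first, then padding filled with filled_value.
        dense.append(list(nodeX[Xptr[i]:Xptr[i + 1]]) + [filled_value] * pad)
        mask.append([True] * size + [False] * pad)
    return dense, mask


# Two subgraphs with 2 and 3 nodes.
dense_x, mask_x = dense_x_sketch([10.0, 11.0, 12.0, 13.0, 14.0], [0, 2, 5])
```

The first subgraph receives one padded slot, which the mask marks as False.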

pygho.hodata.MaData.to_sparse_adj(edge_index: LongTensor, edge_batch: LongTensor, edge_attr: Tensor | None = None, max_num_nodes: int | None = None, batch_size: int | None = None) → SparseTensor

Convert sparse edge_index and edge_attr to a SparseTensor.

Args:

edge_index (LongTensor): Coalesced edge indices of shape (2, nnz).
edge_batch (LongTensor): Batch assignments of shape (nnz).
edge_attr (Optional[Tensor]): Edge attributes of shape (nnz, *).
max_num_nodes (Optional[int]): Maximum number of nodes in the graph.
batch_size (Optional[int]): Batch size.

Returns:

SparseTensor: A sparse tensor representation.

pygho.hodata.MaTupleSampler module

pygho.hodata.MaTupleSampler.rdsampler(data: Data) → Tuple[Tensor, List[int]]

Compute the resistance distance between nodes.

Args:

data (PygData): The input PyG dataset.

Returns:

Tensor: the precomputed tuple features.
List[int]: the masked shape of the features.

pygho.hodata.MaTupleSampler.spdsampler(data: Data, hop: int = 2) → Tuple[Tensor, List[int]]

Sample a k-hop subgraph on a given PyG graph.

Args:

data (PygData): The input PyG dataset.
hop (int, optional): The number of hops for subgraph sampling. Defaults to 2.

Returns:

Tensor: the precomputed tuple features.
List[int]: the masked shape of the features.

pygho.hodata.ParallelPreprocess module

class pygho.hodata.ParallelPreprocess.ParallelPreprocessDataset(root: str, data_list: Iterable[Data], pre_transform: Callable[[Data], Data], num_worker: int, processedname: str | None = None, transform: Callable[[Data], Data] | None = None)

Bases: InMemoryDataset

Transform a PyG dataset in parallel.

This dataset class allows parallel preprocessing of a list of PygData or PygDataset instances.

Args:

root (str): The directory to save processed data.
data_list (Iterable[PygData]): A list of PygData or PygDataset instances.
pre_transform (Callable[[PygData], PygData]): A function that maps PygData to PygData. It is executed only once for all data and is typically a tuple sampler.
num_worker (int): The number of processes for parallel preprocessing. It can be set to the number of available CPU cores.
processedname (Optional[str]): The name to save the processed data. If None, the name will be a hash of the pre_transform function.
transform (Optional[Callable[[PygData], PygData]]): A function to dynamically transform data during data loading.

process(): Processes the dataset to the self.processed_dir folder.

property processed_dir: str

property processed_file_names: The name of the files in the self.processed_dir folder that must be present in order to skip processing.

pygho.hodata.SpData module

Utilities for sparse high order data.

Bases: Data

A data class for sparse high order graph data.

pygho.hodata.SpData.batch2sparse(batch: Batch, keys: List[str] = ['']) → Batch

A main wrapper for converting data in a batch object to SparseTensor.

Args:

batch (PygBatch): The batch object containing graph data.
keys (List[str]): The list of keys to convert to SparseTensor.

Returns:

PygBatch: The batch object with converted data.

pygho.hodata.SpData.parsekey(key: str) → Tuple[str, str, int, str, int]

Parse the operators in precomputation keys.

Args:

key (str): The precomputation key.

Returns:

Tuple[str, str, int, str, int]: A tuple containing parsed operators and dimensions.

pygho.hodata.SpData.parseop(op: str)

Get the increment for a tensor when combining graphs.

Args:

op (str): The operator string.

Returns:

str or NotImplementedError: The increment information or NotImplementedError if the operator is not implemented.

pygho.hodata.SpData.sp_datapreprocess(data: Data, tuplesamplers: List[Callable[[Data], SparseTensor]], annotate: List[str] = [''], keys: List[str] = ['']) → SpHoData

A wrapper for preprocessing dense data for sparse high order graphs.

Args:

data (PygData): The input dense data in PyG Data format.
tuplesamplers (Union[Callable, List[Callable]]): A single or list of tuple sampling functions.
annotate (List[str]): A list of annotation strings for tuple sampling.
keys (List[str]): A list of precomputation keys.

Returns:

SpHoData: The preprocessed sparse high order data in SpHoData format.

pygho.hodata.SpTupleSampler module

pygho.hodata.SpTupleSampler.I2Sampler(data: Data, hop: int = 3) → SparseTensor

Perform subgraph sampling on a given graph for I2GNN.

Args:

data (PygData): The input PyG dataset.
hop (int, optional): The number of hops for subgraph sampling. Defaults to 3.

Returns:

SparseTensor for the precomputed tuple features.

pygho.hodata.SpTupleSampler.KhopSampler(data: Data, hop: int = 2) → SparseTensor

Sample a k-hop subgraph on a given PyG graph.

Args:

data (PygData): The input PyG dataset.
hop (int, optional): The number of hops for subgraph sampling. Defaults to 2.

Returns:

SparseTensor for the precomputed tuple features.

pygho.hodata.SpTupleSampler.k_hop_subgraph(node_idx: int | List[int] | LongTensor, num_hops: int, edge_index: LongTensor, relabel_nodes: bool = False, num_nodes: int | None = None, flow: str = 'source_to_target', directed: bool = False) → Tuple[Tensor, Tensor, Tensor, Tensor, Tensor]

Compute the k-hop subgraph around a set of nodes in an edge list.

Args:

node_idx (Union[int, List[int], LongTensor]): The root node(s) for the subgraph.
num_hops (int): The number of hops for the subgraph.
edge_index (LongTensor): The edge indices of the graph.
relabel_nodes (bool, optional): Whether to relabel node indices. Defaults to False.
num_nodes (Optional[int], optional): The total number of nodes. Defaults to None.
flow (str, optional): The direction of traversal ('source_to_target' or 'target_to_source'). Defaults to 'source_to_target'.
directed (bool, optional): Whether the graph is directed. Defaults to False.

Returns:

Tuple[Tensor, Tensor, Tensor, Tensor, Tensor]: A tuple containing:

subset (Tensor): The node indices in the subgraph.

edge_index (Tensor): The edge indices of the subgraph.

inv (Tensor): The inverse mapping of node indices in the original graph to the subgraph.

edge_mask (Tensor): A mask indicating which edges are part of the subgraph.

dist (Tensor): The distance from each node to the root node.
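
How subset, edge_mask and dist relate to each other can be shown with a small breadth-first search. This is a sketch only, not the function's actual code: it handles a single root, uses lists and a dict instead of tensors, and assumes the 'source_to_target' convention in which the neighbourhood of a node is made of the sources of its incoming edges. The name k_hop_bfs_sketch is hypothetical.

```python
from collections import deque
from typing import Dict, List, Tuple


def k_hop_bfs_sketch(
    root: int, num_hops: int, edge_index: List[List[int]]
) -> Tuple[List[int], Dict[int, int], List[bool]]:
    src, dst = edge_index
    # incoming[t] lists every node s with an edge s -> t.
    incoming: Dict[int, List[int]] = {}
    for s, t in zip(src, dst):
        incoming.setdefault(t, []).append(s)
    dist = {root: 0}
    queue = deque([root])
    while queue:
        node = queue.popleft()
        if dist[node] == num_hops:
            continue  # do not expand beyond num_hops
        for neighbour in incoming.get(node, []):
            if neighbour not in dist:
                dist[neighbour] = dist[node] + 1
                queue.append(neighbour)
    subset = sorted(dist)
    # An edge belongs to the subgraph when both endpoints were reached.
    edge_mask = [s in dist and t in dist for s, t in zip(src, dst)]
    return subset, dist, edge_mask


# Path graph 0-1-2-3, stored with both directions of every edge.
path = [[0, 1, 1, 2, 2, 3], [1, 0, 2, 1, 3, 2]]
subset, dist, edge_mask = k_hop_bfs_sketch(0, 2, path)
```

With two hops from node 0, node 3 is out of reach, so the two edges touching it are masked out.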

pygho.hodata.Wrapper module

class pygho.hodata.Wrapper.IterWrapper(iterator: Iterable, batch_transform: Callable, device)

Bases: object

A wrapper for the iterator of a data loader.
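
The idea of wrapping a loader's iterator can be sketched as follows. The class IterWrapperSketch is hypothetical and only mirrors the constructor arguments listed above; moving the batch to the device before applying batch_transform is an assumption.

```python
from typing import Any, Callable, Iterable


class IterWrapperSketch:
    """Apply batch_transform to every batch produced by an iterator."""

    def __init__(self, iterator: Iterable, batch_transform: Callable[[Any], Any], device=None):
        self.iterator = iter(iterator)
        self.batch_transform = batch_transform
        self.device = device

    def __iter__(self) -> "IterWrapperSketch":
        return self

    def __next__(self) -> Any:
        batch = next(self.iterator)  # StopIteration ends the loop as usual
        # Assumption: batches exposing .to() are moved to the device first.
        if self.device is not None and hasattr(batch, "to"):
            batch = batch.to(self.device)
        return self.batch_transform(batch)


# Plain lists stand in for batches; the transform doubles every entry.
batches = [[1, 2], [3]]
out = list(IterWrapperSketch(batches, lambda b: [2 * v for v in b]))
```

Because the transform runs inside __next__, the conversion is applied lazily, one batch at a time.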

class pygho.hodata.Wrapper.MaDataloader(dataset: Dataset | Sequence[BaseData] | DatasetAdapter, batch_size: int = 1, shuffle: bool = False, follow_batch: List[str] | None = None, exclude_keys: List[str] | None = None, device=None, denseadj: bool = True, **kwargs)

Bases: DataLoader

A data loader for dense data that converts the inner data format to MaskedTensor.

Args:

dataset (Dataset | Sequence[BaseData] | DatasetAdapter): The input dataset or data sequence.
device (optional): The device to place the data on. Defaults to None.
denseadj (bool, optional): Whether to use dense adjacency. Defaults to True.
**kwargs: Additional keyword arguments for DataLoader. Same as the PyG DataLoader.

pygho.hodata.Wrapper.Mapretransform(tuplesamplers: List[Callable[[Data], MaskedTensor]] | Callable[[Data], MaskedTensor], annotate: List[str] = [''])

Create a data pre-transformation function for dense data.

Args:

tuplesamplers (Union[Callable[[PygData], Tuple[Tensor, List[int]]], List[Callable[[PygData], Tuple[Tensor, List[int]]]]]): A tuple sampler or a list of tuple samplers.
annotate (List[str], optional): A list of annotations. Defaults to [''].

Returns:

Callable: A data pre-transformation function.
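
The factory pattern behind such a function can be sketched as below. Everything here is illustrative: pretransform_factory_sketch is a hypothetical name, a dict stands in for a PyG data object, and storing each sampler's output under "tuplefeat" plus its annotation is an assumption about how annotate is used.

```python
from typing import Any, Callable, Dict, List, Optional, Union

Sampler = Callable[[Dict[str, Any]], Any]


def pretransform_factory_sketch(
    tuplesamplers: Union[Sampler, List[Sampler]],
    annotate: Optional[List[str]] = None,
) -> Callable[[Dict[str, Any]], Dict[str, Any]]:
    # Accept a single sampler or a list of samplers.
    if callable(tuplesamplers):
        tuplesamplers = [tuplesamplers]
    if annotate is None:
        annotate = [""]
    if len(annotate) != len(tuplesamplers):
        raise ValueError("one annotation is needed per tuple sampler")

    def pre_transform(data: Dict[str, Any]) -> Dict[str, Any]:
        out = dict(data)
        for sampler, tag in zip(tuplesamplers, annotate):
            # Hypothetical key naming: the annotation tells the outputs apart.
            out["tuplefeat" + tag] = sampler(data)
        return out

    return pre_transform


# Two toy samplers working on a dict with a node list "x".
count_nodes = lambda d: len(d["x"])
sum_nodes = lambda d: sum(d["x"])
transform = pretransform_factory_sketch([count_nodes, sum_nodes], annotate=["A", "B"])
result = transform({"x": [1, 2, 3]})
```

The returned closure has the single-argument form that a pre_transform hook expects, while the samplers and annotations stay fixed inside it.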

class pygho.hodata.Wrapper.SpDataloader(dataset: Dataset | Sequence[BaseData] | DatasetAdapter, batch_size: int = 1, shuffle: bool = False, follow_batch: List[str] | None = None, exclude_keys: List[str] | None = None, device=None, **kwargs)

Bases: DataLoader

A data loader for sparse data that converts the inner data format to SparseTensor.

Args:

dataset (Dataset | Sequence[BaseData] | DatasetAdapter): The input dataset or data sequence.
device (optional): The device to place the data on. Defaults to None.
**kwargs: Additional keyword arguments for DataLoader. Same as Pyg Dataloader.

pygho.hodata.Wrapper.Sppretransform(tuplesamplers: List[Callable[[Data], SparseTensor]] | Callable[[Data], SparseTensor], annotate: List[str] = [''], keys: List[str] = [''])

Create a data pre-transformation function for sparse data.

Args:

tuplesamplers (Union[Callable[[PygData], Tuple[Tensor, Tensor, Union[List[int], int]]], List[Callable[[PygData], Tuple[Tensor, Tensor, Union[List[int], int]]]]]): A tuple sampler or a list of tuple samplers.
annotate (List[str], optional): A list of annotations. Defaults to [''].
keys (List[str], optional): A list of keys. Defaults to [''].

Returns:

Callable: A data pre-transformation function.
