pygho.hodata package

Submodules

pygho.hodata.MaData module

Utilities for dense high-order data.

class pygho.hodata.MaData.MaHoData(x: Tensor | None = None, edge_index: Tensor | None = None, edge_attr: Tensor | None = None, y: Tensor | None = None, pos: Tensor | None = None, **kwargs)

Bases: Data

A data class for dense high-order graph data.

pygho.hodata.MaData.batch2dense(batch: Batch, batch_size: int | None = None, max_num_nodes: int | None = None, denseadj: bool = False, keys: List[str] = ['']) -> Batch

A main wrapper for converting and padding data in a batch object to dense forms.

Args:

  • batch (PygBatch): The input batch object.

  • batch_size (int): Batch size.

  • max_num_nodes (int): Maximum number of nodes in the graph.

  • denseadj (bool): Whether to convert the adjacency to a dense form (True) or a sparse form (False).

  • keys (List[str]): List of keys for additional attributes.

Returns:

  • PygBatch: The processed batch object.

pygho.hodata.MaData.ma_datapreprocess(data: Data, tuplesamplers: List[Callable[[Data], Tuple[Tensor, List[int]]]], annotate: List[str] = ['']) -> MaHoData

A wrapper for preprocessing dense data.

Args:

  • data (PygData): Input data object.

  • tuplesamplers (Union[Callable[[PygData], Tuple[Tensor, List[int]]], List[Callable[[PygData], Tuple[Tensor, List[int]]]]]): Tuple samplers for extracting data.

  • annotate (List[str]): List of annotation strings.

Returns:

  • MaHoData: Preprocessed data object.

pygho.hodata.MaData.to_dense_adj(edge_index: LongTensor, edge_batch: LongTensor, edge_attr: Tensor | None = None, max_num_nodes: int | None = None, batch_size: int | None = None, filled_value: float = 0) -> MaskedTensor

Convert sparse adjacency to dense matrix.

Args:

  • edge_index (LongTensor): Coalesced edge indices of shape (2, nnz).

  • edge_batch (LongTensor): Batch assignments of shape (nnz).

  • edge_attr (Optional[Tensor]): Edge attributes of shape (nnz, *).

  • max_num_nodes (Optional[int]): Maximum number of nodes in the graph.

  • batch_size (Optional[int]): Batch size.

  • filled_value (float): Value to fill in the dense matrix.

Returns:

  • MaskedTensor: A masked dense tensor.
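
As an illustration of this conversion, the plain-Python sketch below scatters a batched edge list into a padded (b, n, n) array plus a boolean mask. It is not pygho's implementation: the helper name dense_adj_sketch is hypothetical, nested lists stand in for tensors and for MaskedTensor, edge attributes are assumed to be scalars, and node indices are assumed to be local to each graph.

```python
from typing import List, Optional, Sequence, Tuple


def dense_adj_sketch(
    edge_index: Sequence[Sequence[int]],
    edge_batch: Sequence[int],
    edge_attr: Optional[Sequence[float]] = None,
    max_num_nodes: Optional[int] = None,
    batch_size: Optional[int] = None,
    filled_value: float = 0.0,
) -> Tuple[List[List[List[float]]], List[List[List[bool]]]]:
    """Hypothetical helper: scatter edges into a (b, n, n) array and a mask."""
    src, dst = edge_index
    # Infer missing sizes from the data, as the optional arguments suggest.
    if batch_size is None:
        batch_size = max(edge_batch, default=-1) + 1
    if max_num_nodes is None:
        max_num_nodes = max(list(src) + list(dst), default=-1) + 1
    if edge_attr is None:
        edge_attr = [1.0] * len(src)  # assumption: unweighted edges become 1.0
    n = max_num_nodes
    adj = [[[filled_value] * n for _ in range(n)] for _ in range(batch_size)]
    mask = [[[False] * n for _ in range(n)] for _ in range(batch_size)]
    for g, u, v, w in zip(edge_batch, src, dst, edge_attr):
        adj[g][u][v] = w       # entry for edge u -> v of graph g
        mask[g][u][v] = True   # only real edges are marked valid
    return adj, mask


# Two graphs: graph 0 has edges 0->1 and 1->0, graph 1 has edge 0->1.
adj, mask = dense_adj_sketch([[0, 1, 0], [1, 0, 1]], [0, 0, 1])
```

Positions that carry no edge keep filled_value and are marked False in the mask, which is the role the mask of a MaskedTensor plays here.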

pygho.hodata.MaData.to_dense_tuplefeat(tuplefeat: Tensor, tupleshape: LongTensor, tuplefeatptr: LongTensor, max_tupleshape: LongTensor | None = None, batch_size: int | None = None, feat2mask: Callable[[Tensor], BoolTensor] | None = None) -> MaskedTensor

Convert tuple features of different subgraphs to a dense matrix.

Args:

  • tuplefeat (Tensor): Tuple features of shape (total number of tuples in the batch, *denseshape).

  • tupleshape (LongTensor): Shape of tuple features.

  • tuplefeatptr (LongTensor): Pointer to tuple features. tuplefeat[tuplefeatptr[i]:tuplefeatptr[i+1]] holds the tuple features of subgraph i.

  • max_tupleshape (Optional[LongTensor]): Maximum shape of tuple features.

  • batch_size (Optional[int]): Batch size.

  • feat2mask (Optional[Callable[[Tensor], BoolTensor]]): Function to generate masks for tuple features.

Returns:

  • MaskedTensor: A masked dense tensor of shape (b, n1, n2, ..., *denseshape), where ret[i] holds the features of subgraph i and (n1, n2, ...) are the maximum sizes of the tuple features over all subgraphs.

To align tuple features of different sizes, padding is applied.
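
The padding step can be sketched in plain Python for the 2-D case. Everything here is an assumption made for illustration rather than pygho's code: the helper name dense_tuplefeat_sketch, the filled_value argument, scalar features, the layout of tupleshape as one (rows, cols) pair per subgraph, and row-major storage of each subgraph's entries inside tuplefeat.

```python
from typing import List, Sequence, Tuple


def dense_tuplefeat_sketch(
    tuplefeat: Sequence[float],
    tupleshape: Sequence[Tuple[int, int]],
    tuplefeatptr: Sequence[int],
    filled_value: float = 0.0,
) -> Tuple[List[List[List[float]]], List[List[List[bool]]]]:
    """Hypothetical helper: pad per-subgraph 2-D blocks to a common size."""
    n1 = max((shape[0] for shape in tupleshape), default=0)
    n2 = max((shape[1] for shape in tupleshape), default=0)
    dense, mask = [], []
    for i, (rows, cols) in enumerate(tupleshape):
        # Slice out the flat entries that belong to subgraph i.
        flat = tuplefeat[tuplefeatptr[i]:tuplefeatptr[i + 1]]
        block = [[filled_value] * n2 for _ in range(n1)]
        valid = [[False] * n2 for _ in range(n1)]
        for r in range(rows):
            for c in range(cols):
                block[r][c] = flat[r * cols + c]  # assumed row-major layout
                valid[r][c] = True
        dense.append(block)
        mask.append(valid)
    return dense, mask


# Subgraph 0 is 2x2, subgraph 1 is 1x4; both are padded to 2x4.
dense, mask = dense_tuplefeat_sketch(
    [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0], [(2, 2), (1, 4)], [0, 4, 8]
)
```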

pygho.hodata.MaData.to_dense_x(nodeX: Tensor, Xptr: LongTensor, max_num_nodes: int | None = None, batch_size: int | None = None, filled_value: float = 0) -> MaskedTensor

Convert node features of different subgraphs to a dense matrix.

Args:

  • nodeX (Tensor): Node features of shape (total number of nodes in the batch, *denseshape).

  • Xptr (LongTensor): Pointer to subgraphs. nodeX[Xptr[i]:Xptr[i+1]] holds the node features of subgraph i.

  • max_num_nodes (Optional[int]): Maximum number of nodes in a subgraph.

  • batch_size (Optional[int]): Batch size.

  • filled_value (float): Value to fill in the dense matrix.

Returns:

  • MaskedTensor: A masked dense tensor of shape (b, n, *denseshape).

To align graphs of different sizes, padding is applied.
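
The pointer-based slicing and padding can be sketched without any tensor library. This is only an illustration under stated assumptions: pad_node_features is a hypothetical name, each node carries a single scalar feature, and the returned (values, mask) pair of nested lists stands in for a MaskedTensor.

```python
from typing import List, Optional, Sequence, Tuple


def pad_node_features(
    node_x: Sequence[float],
    x_ptr: Sequence[int],
    max_num_nodes: Optional[int] = None,
    filled_value: float = 0.0,
) -> Tuple[List[List[float]], List[List[bool]]]:
    """Hypothetical helper: pad per-graph node features to shape (b, n)."""
    # Graph i owns node_x[x_ptr[i]:x_ptr[i + 1]].
    sizes = [x_ptr[i + 1] - x_ptr[i] for i in range(len(x_ptr) - 1)]
    n = max_num_nodes if max_num_nodes is not None else max(sizes, default=0)
    dense, mask = [], []
    for i, size in enumerate(sizes):
        rows = list(node_x[x_ptr[i]:x_ptr[i + 1]])
        dense.append(rows + [filled_value] * (n - size))  # pad on the right
        mask.append([True] * size + [False] * (n - size))
    return dense, mask


# Graph 0 has 2 nodes, graph 1 has 3 nodes; both rows are padded to length 3.
dense, mask = pad_node_features([1.0, 2.0, 3.0, 4.0, 5.0], [0, 2, 5])
```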

pygho.hodata.MaData.to_sparse_adj(edge_index: LongTensor, edge_batch: LongTensor, edge_attr: Tensor | None = None, max_num_nodes: int | None = None, batch_size: int | None = None) -> SparseTensor

Convert sparse edge_index and edge_attr to a SparseTensor.

Args:

  • edge_index (LongTensor): Coalesced edge indices of shape (2, nnz).

  • edge_batch (LongTensor): Batch assignments of shape (nnz).

  • edge_attr (Optional[Tensor]): Edge attributes of shape (nnz, *).

  • max_num_nodes (Optional[int]): Maximum number of nodes in the graph.

  • batch_size (Optional[int]): Batch size.

Returns:

  • SparseTensor: A sparse tensor representation.
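
One way to picture the sparse counterpart is as a coordinate (COO) structure whose first index is the graph id. The sketch below assumes exactly that layout, which the text above does not confirm; sparse_adj_sketch is a hypothetical name, and the returned (indices, values, shape) triple only stands in for a SparseTensor.

```python
from typing import List, Optional, Sequence, Tuple


def sparse_adj_sketch(
    edge_index: Sequence[Sequence[int]],
    edge_batch: Sequence[int],
    edge_attr: Optional[Sequence[float]] = None,
    max_num_nodes: Optional[int] = None,
    batch_size: Optional[int] = None,
) -> Tuple[List[List[int]], Optional[List[float]], Tuple[int, int, int]]:
    """Hypothetical helper: build COO indices of a (b, n, n) sparse array."""
    src, dst = edge_index
    if batch_size is None:
        batch_size = max(edge_batch, default=-1) + 1
    if max_num_nodes is None:
        max_num_nodes = max(list(src) + list(dst), default=-1) + 1
    # Assumed layout: one index row per dimension (graph, source, target).
    indices = [list(edge_batch), list(src), list(dst)]
    values = list(edge_attr) if edge_attr is not None else None
    return indices, values, (batch_size, max_num_nodes, max_num_nodes)


indices, values, shape = sparse_adj_sketch([[0, 1, 0], [1, 0, 1]], [0, 0, 1])
```

Unlike the dense form, no padding entries are stored: only the nnz coordinates and their optional values are kept.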

pygho.hodata.MaTupleSampler module

pygho.hodata.MaTupleSampler.rdsampler(data: Data) -> Tuple[Tensor, List[int]]

Compute the resistance distance between nodes.

Args:

  • data (PygData): The input PyG graph.

Returns:

  • Tensor: the precomputed tuple features.

  • List[int]: the masked shape of the features.
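
For readers unfamiliar with the quantity, the resistance distance between nodes i and j is the effective resistance between them when every edge is a unit resistor. The sketch below computes it for a single pair by grounding node j and solving the reduced Laplacian system with Gauss-Jordan elimination. It is a standard textbook construction, not pygho's implementation; the function name is hypothetical, and the graph is assumed to be connected, unweighted, and given as an undirected edge list with each edge listed once.

```python
from typing import List, Sequence, Tuple


def resistance_distance(
    num_nodes: int, edges: Sequence[Tuple[int, int]], i: int, j: int
) -> float:
    """Effective resistance between i and j (connected, unweighted graph)."""
    if i == j:
        return 0.0
    # Build the graph Laplacian L = D - A.
    lap: List[List[float]] = [[0.0] * num_nodes for _ in range(num_nodes)]
    for u, v in edges:
        lap[u][u] += 1.0
        lap[v][v] += 1.0
        lap[u][v] -= 1.0
        lap[v][u] -= 1.0
    # Ground node j: drop its row and column, then solve L_red x = e_i.
    keep = [k for k in range(num_nodes) if k != j]
    m = len(keep)
    aug = [[lap[r][c] for c in keep] + [1.0 if r == i else 0.0] for r in keep]
    for col in range(m):
        # Partial pivoting for numerical stability.
        pivot = max(range(col, m), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(m):
            if r != col:
                factor = aug[r][col] / aug[col][col]
                for c in range(col, m + 1):
                    aug[r][c] -= factor * aug[col][c]
    # The potential at node i under a unit current from i to j is the answer.
    row = keep.index(i)
    return aug[row][m] / aug[row][row]


path_r = resistance_distance(3, [(0, 1), (1, 2)], 0, 2)          # path 0-1-2
triangle_r = resistance_distance(3, [(0, 1), (1, 2), (0, 2)], 0, 2)
```

On the path graph the two unit resistors are in series, and on the triangle the direct edge is in parallel with a two-edge path, which gives a quick sanity check for the solver.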

pygho.hodata.MaTupleSampler.spdsampler(data: Data, hop: int = 2) -> Tuple[Tensor, List[int]]

Sample k-hop subgraphs on a given PyG graph.

Args:

  • data (PygData): The input PyG graph.

  • hop (int, optional): The number of hops for subgraph sampling. Defaults to 2.

Returns:

  • Tensor: the precomputed tuple features.

  • List[int]: the masked shape of the features.

pygho.hodata.ParallelPreprocess module

class pygho.hodata.ParallelPreprocess.ParallelPreprocessDataset(root: str, data_list: Iterable[Data], pre_transform: Callable[[Data], Data], num_worker: int, processedname: str | None = None, transform: Callable[[Data], Data] | None = None)

Bases: InMemoryDataset

Transform a PyG dataset in parallel.

This dataset class allows parallel preprocessing of a list of PygData or PygDataset instances.

Args:

  • root (str): The directory to save processed data.

  • data_list (Iterable[PygData]): A list of PygData or PygDataset instances.

  • pre_transform (Callable[[PygData], PygData]): A function that maps PygData to PygData. It is executed only once for all data and is typically a tuple sampler.

  • num_worker (int): The number of processes for parallel preprocessing. It can be set to the number of available CPU cores.

  • processedname (Optional[str]): The name to save the processed data. If None, the name will be a hash of the pre_transform function.

  • transform (Optional[Callable[[PygData], PygData]]): A function to dynamically transform data during data loading.

process()

Processes the dataset to the self.processed_dir folder.

property processed_dir: str
property processed_file_names

The name of the files in the self.processed_dir folder that must be present in order to skip processing.
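
The idea of running pre_transform once per item across several workers can be sketched with the standard library. This is a simplified stand-in, not the class's actual logic: parallel_pre_transform is a hypothetical name, and a thread pool is used only to keep the example self-contained, whereas the num_worker description above refers to processes.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Iterable, List, TypeVar

T = TypeVar("T")


def parallel_pre_transform(
    data_list: Iterable[T], pre_transform: Callable[[T], T], num_worker: int
) -> List[T]:
    """Hypothetical helper: apply pre_transform to every item with a pool."""
    # Executor.map returns results in input order, so the dataset order
    # is preserved regardless of which worker finishes first.
    with ThreadPoolExecutor(max_workers=num_worker) as pool:
        return list(pool.map(pre_transform, data_list))


# Toy usage: the "graphs" are integers and the transform doubles them.
processed = parallel_pre_transform([1, 2, 3], lambda d: d * 2, num_worker=2)
```

For CPU-bound samplers a process pool would be the closer analogue, at the cost of requiring the transform and the data to be picklable.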

pygho.hodata.SpData module

Utilities for sparse high-order data.

class pygho.hodata.SpData.SpHoData(x: Tensor | None = None, edge_index: Tensor | None = None, edge_attr: Tensor | None = None, y: Tensor | None = None, pos: Tensor | None = None, **kwargs)

Bases: Data

A data class for sparse high-order graph data.

pygho.hodata.SpData.batch2sparse(batch: Batch, keys: List[str] = ['']) -> Batch

A main wrapper for converting data in a batch object to SparseTensor.

Args:

  • batch (PygBatch): The batch object containing graph data.

  • keys (List[str]): The list of keys to convert to SparseTensor.

Returns:

  • PygBatch: The batch object with converted data.

pygho.hodata.SpData.parsekey(key: str) -> Tuple[str, str, int, str, int]

Parse the operators in precomputation keys.

Args:

  • key (str): The precomputation key.

Returns:

  • Tuple[str, str, int, str, int]: A tuple containing parsed operators and dimensions.

pygho.hodata.SpData.parseop(op: str)

Get the increment for a tensor when combining graphs.

Args:

  • op (str): The operator string.

Returns:

  • str or NotImplementedError: The increment information or NotImplementedError if the operator is not implemented.

pygho.hodata.SpData.sp_datapreprocess(data: Data, tuplesamplers: List[Callable[[Data], SparseTensor]], annotate: List[str] = [''], keys: List[str] = ['']) -> SpHoData

A wrapper for preprocessing dense data for sparse high order graphs.

Args:

  • data (PygData): The input dense data in PyG Data format.

  • tuplesamplers (Union[Callable, List[Callable]]): A single or list of tuple sampling functions.

  • annotate (List[str]): A list of annotation strings for tuple sampling.

  • keys (List[str]): A list of precomputation keys.

Returns:

  • SpHoData: The preprocessed sparse high order data in SpHoData format.

pygho.hodata.SpTupleSampler module

pygho.hodata.SpTupleSampler.I2Sampler(data: Data, hop: int = 3) -> SparseTensor

Perform subgraph sampling on a given graph for I2GNN.

Args:

  • data (PygData): The input PyG graph.

  • hop (int, optional): The number of hops for subgraph sampling. Defaults to 3.

Returns:

SparseTensor for the precomputed tuple features.

pygho.hodata.SpTupleSampler.KhopSampler(data: Data, hop: int = 2) -> SparseTensor

Sample k-hop subgraphs on a given PyG graph.

Args:

  • data (PygData): The input PyG graph.

  • hop (int, optional): The number of hops for subgraph sampling. Defaults to 2.

Returns:

SparseTensor for the precomputed tuple features.

pygho.hodata.SpTupleSampler.k_hop_subgraph(node_idx: int | List[int] | LongTensor, num_hops: int, edge_index: LongTensor, relabel_nodes: bool = False, num_nodes: int | None = None, flow: str = 'source_to_target', directed: bool = False) -> Tuple[Tensor, Tensor, Tensor, Tensor, Tensor]

Compute the k-hop subgraph around a set of nodes in an edge list.

Args:

  • node_idx (Union[int, List[int], LongTensor]): The root node(s) for the subgraph.

  • num_hops (int): The number of hops for the subgraph.

  • edge_index (LongTensor): The edge indices of the graph.

  • relabel_nodes (bool, optional): Whether to relabel node indices. Defaults to False.

  • num_nodes (Optional[int], optional): The total number of nodes. Defaults to None.

  • flow (str, optional): The direction of traversal ('source_to_target' or 'target_to_source'). Defaults to 'source_to_target'.

  • directed (bool, optional): Whether the graph is directed. Defaults to False.

Returns:

Tuple[Tensor, Tensor, Tensor, Tensor, Tensor]: A tuple containing:
  • subset (Tensor): The node indices in the subgraph.

  • edge_index (Tensor): The edge indices of the subgraph.

  • inv (Tensor): The inverse mapping of node indices in the original graph to the subgraph.

  • edge_mask (Tensor): A mask indicating which edges are part of the subgraph.

  • dist (Tensor): The distance from each node to the root node.
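
A breadth-first search truncated at num_hops conveys what the subset, edge_mask, and dist outputs mean. The sketch below is a simplified plain-Python illustration, not the function documented above: k_hop_sketch is a hypothetical name, it omits relabel_nodes, inv, flow, and directed, it follows edges from target back to source (the 'source_to_target' convention), and it returns dist aligned with subset, which is an assumption about the layout.

```python
from collections import deque
from typing import Dict, List, Sequence, Tuple


def k_hop_sketch(
    roots: Sequence[int],
    num_hops: int,
    edge_index: Sequence[Sequence[int]],
    num_nodes: int,
) -> Tuple[List[int], List[bool], List[int]]:
    """Hypothetical helper: nodes within num_hops of the roots, via BFS."""
    src, dst = edge_index
    # incoming[v] lists the sources of edges that end at v.
    incoming: List[List[int]] = [[] for _ in range(num_nodes)]
    for u, v in zip(src, dst):
        incoming[v].append(u)
    dist: Dict[int, int] = {r: 0 for r in roots}
    queue = deque(roots)
    while queue:
        v = queue.popleft()
        if dist[v] == num_hops:
            continue  # do not expand past the hop limit
        for u in incoming[v]:
            if u not in dist:
                dist[u] = dist[v] + 1
                queue.append(u)
    subset = sorted(dist)
    # An edge is kept when both of its endpoints are in the subset.
    edge_mask = [u in dist and v in dist for u, v in zip(src, dst)]
    return subset, edge_mask, [dist[n] for n in subset]


# Path graph 0-1-2-3 with both edge directions listed.
edges = [[0, 1, 1, 2, 2, 3], [1, 0, 2, 1, 3, 2]]
subset, edge_mask, dist = k_hop_sketch([0], 2, edges, num_nodes=4)
```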

pygho.hodata.Wrapper module

class pygho.hodata.Wrapper.IterWrapper(iterator: Iterable, batch_transform: Callable, device)

Bases: object

A wrapper for the iterator of a data loader.
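
The wrapping pattern can be sketched as a small iterator class. The details below are assumptions rather than the documented behavior: IterWrapperSketch is a hypothetical name, the batch is moved to the device before batch_transform is applied, and the move relies on a PyTorch-style to(device) method that is skipped when the batch does not provide one.

```python
from typing import Any, Callable, Iterator


class IterWrapperSketch:
    """Hypothetical sketch: transform each batch produced by an iterator."""

    def __init__(
        self, iterator: Iterator[Any], batch_transform: Callable[[Any], Any], device: Any
    ) -> None:
        self.iterator = iterator
        self.batch_transform = batch_transform
        self.device = device

    def __iter__(self) -> "IterWrapperSketch":
        return self

    def __next__(self) -> Any:
        # StopIteration from the inner iterator propagates and ends the loop.
        batch = next(self.iterator)
        # Assumption: device placement happens before the transform.
        move = getattr(batch, "to", None)
        if self.device is not None and callable(move):
            batch = move(self.device)
        return self.batch_transform(batch)


# Toy usage: plain integers as batches, multiplied by ten.
wrapped = IterWrapperSketch(iter([1, 2, 3]), lambda b: b * 10, device=None)
results = list(wrapped)
```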

class pygho.hodata.Wrapper.MaDataloader(dataset: Dataset | Sequence[BaseData] | DatasetAdapter, batch_size: int = 1, shuffle: bool = False, follow_batch: List[str] | None = None, exclude_keys: List[str] | None = None, device=None, denseadj: bool = True, **kwargs)

Bases: DataLoader

A data loader for dense data that converts the inner data format to MaskedTensor.

Args:

  • dataset (Dataset | Sequence[BaseData] | DatasetAdapter): The input dataset or data sequence.

  • device (optional): The device to place the data on. Defaults to None.

  • denseadj (bool, optional): Whether to use dense adjacency. Defaults to True.

  • **kwargs: Additional keyword arguments, the same as those of the PyG DataLoader.

pygho.hodata.Wrapper.Mapretransform(tuplesamplers: List[Callable[[Data], MaskedTensor]] | Callable[[Data], MaskedTensor], annotate: List[str] = [''])

Create a data pre-transformation function for dense data.

Args:

  • tuplesamplers (Union[Callable[[PygData], Tuple[Tensor, List[int]]], List[Callable[[PygData], Tuple[Tensor, List[int]]]]]): A tuple sampler or a list of tuple samplers.

  • annotate (List[str], optional): A list of annotations. Defaults to [''].

Returns:

  • Callable: A data pre-transformation function.
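
The factory pattern, where samplers and annotations are captured once and a per-graph function is returned, can be sketched as a closure. This is an illustration only: make_pretransform is a hypothetical name, a plain dict stands in for the PyG data object, and the key names tuplefeat and tupleshape with the annotation appended as a suffix are assumptions, not confirmed attribute names.

```python
from typing import Any, Callable, Dict, List, Sequence, Tuple, Union

Sampler = Callable[[Dict[str, Any]], Tuple[Any, List[int]]]


def make_pretransform(
    tuplesamplers: Union[Sampler, Sequence[Sampler]],
    annotate: Sequence[str] = ("",),
) -> Callable[[Dict[str, Any]], Dict[str, Any]]:
    """Hypothetical factory: bind samplers into a single pre-transform."""
    samplers = [tuplesamplers] if callable(tuplesamplers) else list(tuplesamplers)
    if len(samplers) != len(annotate):
        raise ValueError("need exactly one annotation per tuple sampler")

    def pre_transform(data: Dict[str, Any]) -> Dict[str, Any]:
        out = dict(data)  # leave the input untouched
        for sampler, tag in zip(samplers, annotate):
            feat, shape = sampler(data)
            # Hypothetical key names; the suffix keeps samplers apart.
            out[f"tuplefeat{tag}"] = feat
            out[f"tupleshape{tag}"] = shape
        return out

    return pre_transform


def toy_sampler(data: Dict[str, Any]) -> Tuple[List[int], List[int]]:
    """Toy sampler: one feature per ordered node pair."""
    n = data["num_nodes"]
    return [0] * (n * n), [n, n]


transform = make_pretransform(toy_sampler, annotate=["A"])
result = transform({"num_nodes": 2})
```

Distinct annotations let several samplers write their outputs onto the same data object without overwriting one another.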

class pygho.hodata.Wrapper.SpDataloader(dataset: Dataset | Sequence[BaseData] | DatasetAdapter, batch_size: int = 1, shuffle: bool = False, follow_batch: List[str] | None = None, exclude_keys: List[str] | None = None, device=None, **kwargs)

Bases: DataLoader

A data loader for sparse data that converts the inner data format to SparseTensor.

Args:

  • dataset (Dataset | Sequence[BaseData] | DatasetAdapter): The input dataset or data sequence.

  • device (optional): The device to place the data on. Defaults to None.

  • **kwargs: Additional keyword arguments, the same as those of the PyG DataLoader.

pygho.hodata.Wrapper.Sppretransform(tuplesamplers: List[Callable[[Data], SparseTensor]] | Callable[[Data], SparseTensor], annotate: List[str] = [''], keys: List[str] = [''])

Create a data pre-transformation function for sparse data.

Args:

  • tuplesamplers (Union[Callable[[PygData], Tuple[Tensor, Tensor, Union[List[int], int]]], List[Callable[[PygData], Tuple[Tensor, Tensor, Union[List[int], int]]]]]): A tuple sampler or a list of tuple samplers.

  • annotate (List[str], optional): A list of annotations. Defaults to [''].

  • keys (List[str], optional): A list of keys. Defaults to [''].

Returns:

  • Callable: A data pre-transformation function.

Module contents