pygho.hodata package
Submodules
pygho.hodata.MaData module
utilities for dense high order data
- class pygho.hodata.MaData.MaHoData(x: Tensor | None = None, edge_index: Tensor | None = None, edge_attr: Tensor | None = None, y: Tensor | None = None, pos: Tensor | None = None, **kwargs)[source]
Bases: Data
A data class for dense high order graph data.
- pygho.hodata.MaData.batch2dense(batch: Batch, batch_size: int | None = None, max_num_nodes: int | None = None, denseadj: bool = False, keys: List[str] = ['']) Batch [source]
A main wrapper for converting and padding data in a batch object to dense forms.
Args:
batch (PygBatch): The input batch object.
batch_size (int): Batch size.
max_num_nodes (int): Maximum number of nodes in the graph.
denseadj (bool): If True, convert the adjacency to dense form; otherwise keep it sparse.
keys (List[str]): List of keys for additional attributes.
Returns:
PygBatch: The processed batch object.
- pygho.hodata.MaData.ma_datapreprocess(data: Data, tuplesamplers: List[Callable[[Data], Tuple[Tensor, List[int]]]], annotate: List[str] = ['']) MaHoData [source]
A wrapper for preprocessing dense data.
Args:
data (PygData): Input data object.
tuplesamplers (Union[Callable[[PygData], Tuple[Tensor, List[int]]], List[Callable[[PygData], Tuple[Tensor, List[int]]]]]): Tuple samplers for extracting data.
annotate (List[str]): List of annotation strings.
Returns:
MaHoData: Preprocessed data object.
- pygho.hodata.MaData.to_dense_adj(edge_index: LongTensor, edge_batch: LongTensor, edge_attr: Tensor | None = None, max_num_nodes: int | None = None, batch_size: int | None = None, filled_value: float = 0) MaskedTensor [source]
Convert sparse adjacency to dense matrix.
Args:
edge_index (LongTensor): Coalesced edge indices of shape (2, nnz).
edge_batch (LongTensor): Batch assignments of shape (nnz).
edge_attr (Optional[Tensor]): Edge attributes of shape (nnz, *).
max_num_nodes (Optional[int]): Maximum number of nodes in the graph.
batch_size (Optional[int]): Batch size.
filled_value (float): Value to fill in the dense matrix.
Returns:
MaskedTensor: A masked dense tensor.
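The conversion described above can be illustrated with a minimal plain-Python sketch. It is not the library's implementation: the helper name `dense_adjacency` is hypothetical, nested lists stand in for tensors, edge attributes are taken to be scalars, and it assumes that `edge_index` holds node indices local to each graph.

```python
from typing import List, Optional, Sequence


def dense_adjacency(
    edge_index: Sequence[Sequence[int]],
    edge_batch: Sequence[int],
    edge_attr: Optional[Sequence[float]] = None,
    max_num_nodes: Optional[int] = None,
    batch_size: Optional[int] = None,
    filled_value: float = 0.0,
) -> List[List[List[float]]]:
    """Scatter batched edges into a (batch_size, n, n) nested list.

    Assumption: edge_index uses per-graph (local) node indices.
    """
    rows, cols = edge_index
    if batch_size is None:
        batch_size = max(edge_batch, default=-1) + 1
    if max_num_nodes is None:
        max_num_nodes = max([*rows, *cols], default=-1) + 1
    # Start from a block filled with filled_value, then write one entry per edge.
    adj = [
        [[filled_value] * max_num_nodes for _ in range(max_num_nodes)]
        for _ in range(batch_size)
    ]
    for k, (b, r, c) in enumerate(zip(edge_batch, rows, cols)):
        adj[b][r][c] = 1.0 if edge_attr is None else edge_attr[k]
    return adj


# Two graphs: graph 0 has edges 0->1 and 1->0, graph 1 has edge 0->2.
edge_index = [[0, 1, 0], [1, 0, 2]]
edge_batch = [0, 0, 1]
adj = dense_adjacency(edge_index, edge_batch)
```

Positions that receive no edge keep `filled_value`, which is how a dense layout distinguishes stored entries from padding.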
- pygho.hodata.MaData.to_dense_tuplefeat(tuplefeat: Tensor, tupleshape: LongTensor, tuplefeatptr: LongTensor, max_tupleshape: LongTensor | None = None, batch_size: int | None = None, feat2mask: Callable[[Tensor], BoolTensor] | None = None) MaskedTensor [source]
Convert tuple features of different subgraphs to a dense matrix.
Args:
tuplefeat (Tensor): Tuple features of shape (total number of tuples in batch, *denseshape).
tupleshape (LongTensor): Shape of tuple features.
tuplefeatptr (LongTensor): Pointer to tuple features. tuplefeat[tuplefeatptr[i]:tuplefeatptr[i+1]] holds the tuple features of subgraph i.
max_tupleshape (Optional[LongTensor]): Maximum shape of tuple features.
batch_size (Optional[int]): Batch size.
feat2mask (Callable[[Tensor], BoolTensor]): Function to generate masks for tuple features.
Returns:
MaskedTensor: A masked dense tensor of shape (b, n1, n2, ..., *denseshape), where ret[i] holds the tuple features of subgraph i and (n1, n2, ...) are the maximum sizes of the tuple features over all subgraphs.
To align tuple features of different sizes, padding is applied.
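As a rough illustration of the pointer-and-shape layout described above, here is a plain-Python sketch for 2-D tuple features. The helper name `pad_tuple_features` is hypothetical, nested lists stand in for tensors, and the sketch assumes each subgraph's features are stored flattened in row-major order. The library's actual storage order may differ.

```python
from typing import List, Sequence, Tuple


def pad_tuple_features(
    tuplefeat: Sequence[int],
    tupleshape: Sequence[Tuple[int, int]],
    tuplefeatptr: Sequence[int],
    pad: int = 0,
) -> Tuple[List[List[List[int]]], List[List[List[bool]]]]:
    """Pad per-subgraph (n1, n2) blocks to a common (max n1, max n2) shape.

    Assumption: subgraph i occupies tuplefeat[ptr[i]:ptr[i+1]] in row-major order.
    """
    batch_size = len(tupleshape)
    max_rows = max((s[0] for s in tupleshape), default=0)
    max_cols = max((s[1] for s in tupleshape), default=0)
    dense = [[[pad] * max_cols for _ in range(max_rows)] for _ in range(batch_size)]
    mask = [[[False] * max_cols for _ in range(max_rows)] for _ in range(batch_size)]
    for i, (n1, n2) in enumerate(tupleshape):
        flat = tuplefeat[tuplefeatptr[i]:tuplefeatptr[i + 1]]
        if len(flat) != n1 * n2:
            raise ValueError(f"subgraph {i}: expected {n1 * n2} entries, got {len(flat)}")
        for r in range(n1):
            for c in range(n2):
                dense[i][r][c] = flat[r * n2 + c]
                mask[i][r][c] = True  # True marks a real (non-padded) entry
    return dense, mask


# Subgraph 0 is 2x2, subgraph 1 is 1x1; both are padded to 2x2.
dense, mask = pad_tuple_features([1, 2, 3, 4, 5], [(2, 2), (1, 1)], [0, 4, 5])
```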
- pygho.hodata.MaData.to_dense_x(nodeX: Tensor, Xptr: LongTensor, max_num_nodes: int | None = None, batch_size: int | None = None, filled_value: float = 0) MaskedTensor [source]
Convert node features of different subgraphs to a dense matrix.
Args:
nodeX (Tensor): Node features of shape (total number of nodes in the batch, *denseshape).
Xptr (LongTensor): Pointer to subgraphs. nodeX[Xptr[i]:Xptr[i+1]] holds the node features of subgraph i.
max_num_nodes (Optional[int]): Maximum number of nodes in a subgraph.
batch_size (Optional[int]): Batch size.
filled_value (float): Value to fill in the dense matrix.
Returns:
MaskedTensor: A masked dense tensor of shape (b, n, *denseshape).
To align graphs of different sizes, padding is applied.
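The pointer-based slicing and padding described above can be sketched in plain Python. This is an illustration only: the helper name `pad_node_features` is hypothetical, lists stand in for tensors, and each node carries a single scalar feature.

```python
from typing import List, Sequence, Tuple


def pad_node_features(
    node_x: Sequence[float],
    x_ptr: Sequence[int],
    filled_value: float = 0.0,
) -> Tuple[List[List[float]], List[List[bool]]]:
    """Pad the slices node_x[x_ptr[i]:x_ptr[i+1]] to a common length."""
    batch_size = len(x_ptr) - 1
    sizes = [x_ptr[i + 1] - x_ptr[i] for i in range(batch_size)]
    max_num_nodes = max(sizes, default=0)
    dense, mask = [], []
    for i, n in enumerate(sizes):
        row = list(node_x[x_ptr[i]:x_ptr[i + 1]])
        padding = max_num_nodes - n
        dense.append(row + [filled_value] * padding)
        mask.append([True] * n + [False] * padding)  # False marks padding
    return dense, mask


# Two subgraphs with 2 and 3 nodes respectively.
dense, mask = pad_node_features([1.0, 2.0, 3.0, 4.0, 5.0], [0, 2, 5])
```

The mask records which positions are real nodes, so downstream operations can ignore the padded entries.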
- pygho.hodata.MaData.to_sparse_adj(edge_index: LongTensor, edge_batch: LongTensor, edge_attr: Tensor | None = None, max_num_nodes: int | None = None, batch_size: int | None = None) SparseTensor [source]
Convert sparse edge_index and edge_attr to a SparseTensor.
Args:
edge_index (LongTensor): Coalesced edge indices of shape (2, nnz).
edge_batch (LongTensor): Batch assignments of shape (nnz).
edge_attr (Optional[Tensor]): Edge attributes of shape (nnz, *).
max_num_nodes (Optional[int]): Maximum number of nodes in the graph.
batch_size (Optional[int]): Batch size.
Returns:
SparseTensor: A sparse tensor representation.
pygho.hodata.MaTupleSampler module
- pygho.hodata.MaTupleSampler.rdsampler(data: Data) Tuple[Tensor, List[int]] [source]
Compute the resistance distance between nodes.
Args:
data (PygData): The input PyG dataset.
Returns:
Tensor: the precomputed tuple features.
List[int]: the masked shape of the features.
- pygho.hodata.MaTupleSampler.spdsampler(data: Data, hop: int = 2) Tuple[Tensor, List[int]] [source]
Sample k-hop subgraphs on a given PyG graph.
Args:
data (PygData): The input PyG dataset.
hop (int, optional): The number of hops for subgraph sampling. Defaults to 2.
Returns:
Tensor: the precomputed tuple features.
List[int]: the masked shape of the features.
pygho.hodata.ParallelPreprocess module
- class pygho.hodata.ParallelPreprocess.ParallelPreprocessDataset(root: str, data_list: Iterable[Data], pre_transform: Callable[[Data], Data], num_worker: int, processedname: str | None = None, transform: Callable[[Data], Data] | None = None)[source]
Bases: InMemoryDataset
Transform a PyG dataset in parallel.
This dataset class allows parallel preprocessing of a list of PyGData or PyGDataset instances.
Args:
root (str): The directory to save processed data.
data_list (Iterable[PygData]): A list of PygData or PygDataset instances.
pre_transform (Callable[[PygData], PygData]): A function that maps PygData to PygData. It is executed only once for all data and is typically a tuple sampler.
num_worker (int): The number of processes for parallel preprocessing. It can be set to the number of available CPU cores.
processedname (Optional[str]): The name to save the processed data. If None, the name will be a hash of the pre_transform function.
transform (Optional[Callable[[PygData], PygData]]): A function to dynamically transform data during data loading.
- property processed_dir: str
- property processed_file_names
The name of the files in the self.processed_dir folder that must be present in order to skip processing.
pygho.hodata.SpData module
utilities for sparse high order data
- class pygho.hodata.SpData.SpHoData(x: Tensor | None = None, edge_index: Tensor | None = None, edge_attr: Tensor | None = None, y: Tensor | None = None, pos: Tensor | None = None, **kwargs)[source]
Bases: Data
A data class for sparse high order graph data.
- pygho.hodata.SpData.batch2sparse(batch: Batch, keys: List[str] = ['']) Batch [source]
A main wrapper for converting data in a batch object to SparseTensor.
Args:
batch (PygBatch): The batch object containing graph data.
keys (List[str]): The list of keys to convert to SparseTensor.
Returns:
PygBatch: The batch object with converted data.
- pygho.hodata.SpData.parsekey(key: str) Tuple[str, str, int, str, int] [source]
Parse the operators in precomputation keys.
Args:
key (str): The precomputation key.
Returns:
Tuple[str, str, int, str, int]: A tuple containing parsed operators and dimensions.
- pygho.hodata.SpData.parseop(op: str)[source]
Get the increment for a tensor when combining graphs.
Args:
op (str): The operator string.
Returns:
str or NotImplementedError: The increment information or NotImplementedError if the operator is not implemented.
- pygho.hodata.SpData.sp_datapreprocess(data: Data, tuplesamplers: List[Callable[[Data], SparseTensor]], annotate: List[str] = [''], keys: List[str] = ['']) SpHoData [source]
A wrapper for preprocessing dense data for sparse high order graphs.
Args:
data (PygData): The input dense data in PyG Data format.
tuplesamplers (Union[Callable, List[Callable]]): A single or list of tuple sampling functions.
annotate (List[str]): A list of annotation strings for tuple sampling.
keys (List[str]): A list of precomputation keys.
Returns:
SpHoData: The preprocessed sparse high order data in SpHoData format.
pygho.hodata.SpTupleSampler module
- pygho.hodata.SpTupleSampler.I2Sampler(data: Data, hop: int = 3) SparseTensor [source]
Perform subgraph sampling on a given graph for I2GNN.
Args:
data (PygData): The input PyG dataset.
hop (int, optional): The number of hops for subgraph sampling. Defaults to 3.
Returns:
SparseTensor for the precomputed tuple features.
- pygho.hodata.SpTupleSampler.KhopSampler(data: Data, hop: int = 2) SparseTensor [source]
Sample k-hop subgraphs on a given PyG graph.
Args:
data (PygData): The input PyG dataset.
hop (int, optional): The number of hops for subgraph sampling. Defaults to 2.
Returns:
SparseTensor for the precomputed tuple features.
- pygho.hodata.SpTupleSampler.k_hop_subgraph(node_idx: int | List[int] | LongTensor, num_hops: int, edge_index: LongTensor, relabel_nodes: bool = False, num_nodes: int | None = None, flow: str = 'source_to_target', directed: bool = False) Tuple[Tensor, Tensor, Tensor, Tensor, Tensor] [source]
Compute the k-hop subgraph around a set of nodes in an edge list.
Args:
node_idx (Union[int, List[int], LongTensor]): The root node(s) for the subgraph.
num_hops (int): The number of hops for the subgraph.
edge_index (LongTensor): The edge indices of the graph.
relabel_nodes (bool, optional): Whether to relabel node indices. Defaults to False.
num_nodes (Optional[int], optional): The total number of nodes. Defaults to None.
flow (str, optional): The direction of traversal ('source_to_target' or 'target_to_source'). Defaults to 'source_to_target'.
directed (bool, optional): Whether the graph is directed. Defaults to False.
Returns:
Tuple[Tensor, Tensor, Tensor, Tensor, Tensor]: A tuple containing:
subset (Tensor): The node indices in the subgraph.
edge_index (Tensor): The edge indices of the subgraph.
inv (Tensor): The inverse mapping of node indices in the original graph to the subgraph.
edge_mask (Tensor): A mask indicating which edges are part of the subgraph.
dist (Tensor): The distance from each node to the root node.
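The node selection and distance outputs described above can be sketched as a breadth-first search. This is a minimal plain-Python illustration, not the library's tensor-based implementation. The helper name `k_hop_nodes` is hypothetical, it handles a single root node, and it assumes an undirected graph whose `edge_index` stores every edge in both directions, so the `flow` direction does not matter.

```python
from collections import deque
from typing import List, Sequence, Tuple


def k_hop_nodes(
    root: int,
    num_hops: int,
    edge_index: Sequence[Sequence[int]],
    num_nodes: int,
) -> Tuple[List[int], List[int]]:
    """Return (subset, dist): nodes within num_hops of root and their hop distances."""
    neighbors: List[List[int]] = [[] for _ in range(num_nodes)]
    for src, dst in zip(edge_index[0], edge_index[1]):
        neighbors[src].append(dst)
    dist = {root: 0}
    queue = deque([root])
    while queue:
        u = queue.popleft()
        if dist[u] == num_hops:
            continue  # do not expand beyond the hop limit
        for v in neighbors[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    subset = sorted(dist)
    return subset, [dist[v] for v in subset]


# Path graph 0 - 1 - 2 - 3, with each edge stored in both directions.
edge_index = [[0, 1, 1, 2, 2, 3], [1, 0, 2, 1, 3, 2]]
subset, dist = k_hop_nodes(root=0, num_hops=2, edge_index=edge_index, num_nodes=4)
```

Here node 3 is three hops from the root, so it is excluded from the 2-hop subset.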
pygho.hodata.Wrapper module
- class pygho.hodata.Wrapper.IterWrapper(iterator: Iterable, batch_transform: Callable, device)[source]
Bases: object
A wrapper for the iterator of a data loader.
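A wrapper of this kind can be sketched as an iterator that applies a transform to every batch it yields. The class name `BatchTransformIterator` is hypothetical, and the sketch leaves out the `device` handling that the real class accepts.

```python
from typing import Any, Callable, Iterable, Iterator


class BatchTransformIterator:
    """Wrap an iterable and apply batch_transform to each item it yields."""

    def __init__(self, iterator: Iterable[Any], batch_transform: Callable[[Any], Any]):
        self.iterator: Iterator[Any] = iter(iterator)
        self.batch_transform = batch_transform

    def __iter__(self) -> "BatchTransformIterator":
        return self

    def __next__(self) -> Any:
        # StopIteration from the inner iterator propagates and ends the loop.
        return self.batch_transform(next(self.iterator))


batches = [[1, 2], [3]]
doubled = list(BatchTransformIterator(batches, lambda b: [2 * v for v in b]))
```

Wrapping the iterator rather than the dataset means the transform runs once per batch, after collation.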
- class pygho.hodata.Wrapper.MaDataloader(dataset: Dataset | Sequence[BaseData] | DatasetAdapter, batch_size: int = 1, shuffle: bool = False, follow_batch: List[str] | None = None, exclude_keys: List[str] | None = None, device=None, denseadj: bool = True, **kwargs)[source]
Bases: DataLoader
A data loader for dense data that converts the inner data format to MaskedTensor.
Args:
dataset (Dataset | Sequence[BaseData] | DatasetAdapter): The input dataset or data sequence.
device (optional): The device to place the data on. Defaults to None.
denseadj (bool, optional): Whether to use dense adjacency. Defaults to True.
**kwargs: Additional keyword arguments for DataLoader, same as the PyG DataLoader.
- pygho.hodata.Wrapper.Mapretransform(tuplesamplers: List[Callable[[Data], MaskedTensor]] | Callable[[Data], MaskedTensor], annotate: List[str] = [''])[source]
Create a data pre-transformation function for dense data.
Args:
tuplesamplers (Union[Callable[[PygData], Tuple[Tensor, List[int]]], List[Callable[[PygData], Tuple[Tensor, List[int]]]]]): A tuple sampler or a list of tuple samplers.
annotate (List[str], optional): A list of annotations. Defaults to [''].
Returns:
Callable: A data pre-transformation function.
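The factory pattern described above (take one or more tuple samplers, return a single pre-transformation function) can be sketched as a closure. Everything here is illustrative: `make_pretransform` and `toy_sampler` are hypothetical names, a plain dict stands in for a PyG data object, and the `tuplefeat<annotation>` / `tupleshape<annotation>` key names are an assumption rather than a documented contract.

```python
from typing import Any, Callable, Dict, List, Sequence, Tuple, Union

Sampler = Callable[[Dict[str, Any]], Tuple[Any, List[int]]]


def make_pretransform(
    tuplesamplers: Union[Sampler, Sequence[Sampler]],
    annotate: Sequence[str] = ("",),
) -> Callable[[Dict[str, Any]], Dict[str, Any]]:
    """Build a function that runs every sampler and stores its outputs."""
    samplers = [tuplesamplers] if callable(tuplesamplers) else list(tuplesamplers)
    if len(samplers) != len(annotate):
        raise ValueError("need exactly one annotation per tuple sampler")

    def pretransform(data: Dict[str, Any]) -> Dict[str, Any]:
        out = dict(data)  # leave the input untouched
        for sampler, suffix in zip(samplers, annotate):
            feat, shape = sampler(data)
            # Assumed key naming; the annotation keeps samplers from colliding.
            out[f"tuplefeat{suffix}"] = feat
            out[f"tupleshape{suffix}"] = shape
        return out

    return pretransform


def toy_sampler(data: Dict[str, Any]) -> Tuple[List[List[int]], List[int]]:
    """Hypothetical sampler: an identity matrix over the nodes."""
    n = data["num_nodes"]
    return [[int(i == j) for j in range(n)] for i in range(n)], [n, n]


transform = make_pretransform(toy_sampler, annotate=["_id"])
result = transform({"num_nodes": 2})
```

Distinct annotations let several samplers write to the same data object without overwriting each other's features.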
- class pygho.hodata.Wrapper.SpDataloader(dataset: Dataset | Sequence[BaseData] | DatasetAdapter, batch_size: int = 1, shuffle: bool = False, follow_batch: List[str] | None = None, exclude_keys: List[str] | None = None, device=None, **kwargs)[source]
Bases: DataLoader
A data loader for sparse data that converts the inner data format to SparseTensor.
Args:
dataset (Dataset | Sequence[BaseData] | DatasetAdapter): The input dataset or data sequence.
device (optional): The device to place the data on. Defaults to None.
**kwargs: Additional keyword arguments for DataLoader, same as the PyG DataLoader.
- pygho.hodata.Wrapper.Sppretransform(tuplesamplers: List[Callable[[Data], SparseTensor]] | Callable[[Data], SparseTensor], annotate: List[str] = [''], keys: List[str] = [''])[source]
Create a data pre-transformation function for sparse data.
Args:
tuplesamplers (Union[Callable[[PygData], Tuple[Tensor, Tensor, Union[List[int], int]]], List[Callable[[PygData], Tuple[Tensor, Tensor, Union[List[int], int]]]]]): A tuple sampler or a list of tuple samplers.
annotate (List[str], optional): A list of annotations. Defaults to [''].
keys (List[str], optional): A list of keys. Defaults to [''].
Returns:
Callable: A data pre-transformation function.