PyTorch Geometric Temporal Dataset

Datasets

class ChickenpoxDatasetLoader(index=False)

A dataset of county-level chickenpox cases in Hungary between 2004 and 2014. We made it public during the development of PyTorch Geometric Temporal. The underlying graph is static: vertices are counties and edges connect neighbouring counties. Vertex features are lagged weekly counts of chickenpox cases (we included 4 lags). The target is the number of cases in the upcoming week (signed integers). Our dataset consists of more than 500 snapshots (weeks).

Parameters:

index (bool, optional) – If True, initializes the dataloader to use index-based batching. Defaults to False.
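
The idea behind index-based batching can be pictured as follows: rather than materializing every lagged window up front, only the raw series is kept and each window is sliced out when a sample is requested. This is a minimal plain-Python sketch under that assumption; the class name IndexWindowDataset and the toy values are hypothetical and are not part of the library.

```python
from typing import List, Sequence, Tuple


class IndexWindowDataset:
    """Hypothetical helper: serves (features, target) windows by index.

    Only the raw series is held in memory; each window is sliced on access.
    """

    def __init__(self, series: Sequence[float], lags: int) -> None:
        self.series = series
        self.lags = lags

    def __len__(self) -> int:
        # One sample per position that has `lags` inputs and one target.
        return max(len(self.series) - self.lags, 0)

    def __getitem__(self, index: int) -> Tuple[List[float], float]:
        if not 0 <= index < len(self):
            raise IndexError(index)
        features = list(self.series[index:index + self.lags])
        target = self.series[index + self.lags]
        return features, target


weekly_cases = [3.0, 5.0, 2.0, 8.0, 13.0, 21.0]  # toy values
dataset = IndexWindowDataset(weekly_cases, lags=4)
batch = [dataset[i] for i in range(len(dataset))]
```

In this sketch consecutive windows overlap in the source series, so memory use grows with the length of the series rather than with the number of windows times the number of lags.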

get_dataset(lags: int = 4) → torch_geometric_temporal.signal.StaticGraphTemporalSignal

Returning the Chickenpox Hungary data iterator.

Args types:
  • lags (int) - The number of time lags.

Return types:
  • dataset (StaticGraphTemporalSignal) - The Chickenpox Hungary dataset.
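
As a rough illustration of what the lags argument means, the sketch below builds lagged vertex features and next-week targets from a toy count matrix in plain Python. The helper name make_lagged_snapshots and the numbers are hypothetical; the loader itself may assemble its snapshots differently.

```python
from typing import List, Tuple


def make_lagged_snapshots(
    counts: List[List[int]], lags: int = 4
) -> List[Tuple[List[List[int]], List[int]]]:
    """counts[t][v] is the count for vertex v in week t.

    Returns one (features, target) pair per snapshot, where features[v]
    holds the `lags` previous counts of vertex v and target[v] is the
    count in the following week.
    """
    num_vertices = len(counts[0])
    snapshots = []
    for start in range(len(counts) - lags):
        window = counts[start:start + lags]
        features = [[week[v] for week in window] for v in range(num_vertices)]
        target = counts[start + lags]
        snapshots.append((features, target))
    return snapshots


# Toy series: 6 weeks, 2 vertices (made-up numbers).
toy_counts = [[1, 10], [2, 20], [3, 30], [4, 40], [5, 50], [6, 60]]
snapshots = make_lagged_snapshots(toy_counts, lags=4)
first_features, first_target = snapshots[0]
```

In this sketch a series of T weeks yields T - lags snapshots, since the first lags weeks can only serve as inputs.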

get_index_dataset(lags=4, batch_size=4, shuffle=False, allGPU=-1, ratio=(0.7, 0.1, 0.2), dask_batching=False)

Returns torch dataloaders using index batching for Chickenpox Hungary dataset.

Parameters:
  • lags (int, optional) – The number of time lags. Defaults to 4.

  • batch_size (int, optional) – Batch size. Defaults to 4.

  • shuffle (bool, optional) – If the data should be shuffled. Defaults to False.

  • allGPU (int, optional) – GPU device ID for performing preprocessing in GPU memory. If -1, computation is done on CPU. Defaults to -1.

  • ratio (tuple of float, optional) – The desired train, validation, and test split ratios, respectively.

Returns:

A 5-tuple containing:
  • train_dataLoader (torch.utils.data.DataLoader): Dataloader for the training set.

  • val_dataLoader (torch.utils.data.DataLoader): Dataloader for the validation set.

  • test_dataLoader (torch.utils.data.DataLoader): Dataloader for the test set.

  • edges (torch.Tensor): The graph edges as a 2D matrix, shape [2, num_edges].

  • edge_weights (torch.Tensor): Each graph edge’s weight, shape [num_edges].

Return type:

Tuple[torch.utils.data.DataLoader, torch.utils.data.DataLoader, torch.utils.data.DataLoader, torch.Tensor, torch.Tensor]
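
The ratio argument describes how the samples are divided among the three dataloaders. The sketch below shows one way such a split could be computed over sample indices; the helper name split_indices is hypothetical, and the rounding rule (round down, test split takes the remainder) is an assumption rather than the library's documented behaviour.

```python
from typing import List, Tuple


def split_indices(
    num_samples: int, ratio: Tuple[float, float, float] = (0.7, 0.1, 0.2)
) -> Tuple[List[int], List[int], List[int]]:
    """Split range(num_samples) into contiguous train/val/test index lists.

    Assumption: boundaries are rounded down and the test split takes
    whatever remains.
    """
    train_end = int(num_samples * ratio[0])
    val_end = train_end + int(num_samples * ratio[1])
    indices = list(range(num_samples))
    return indices[:train_end], indices[train_end:val_end], indices[val_end:]


train_idx, val_idx, test_idx = split_indices(100)
```

Keeping the splits contiguous in time, as done here, avoids training on snapshots that come after the ones used for validation or testing.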

class PedalMeDatasetLoader

A dataset of PedalMe bicycle delivery orders in London between 2020 and 2021. We made it public during the development of PyTorch Geometric Temporal. The underlying graph is static: vertices are localities and edges are spatial connections. Vertex features are lagged weekly counts of delivery demand (we included 4 lags). The target is the number of deliveries in the upcoming week. Our dataset consists of more than 30 snapshots (weeks).

get_dataset(lags: int = 4) → torch_geometric_temporal.signal.StaticGraphTemporalSignal

Returning the PedalMe London demand data iterator.

Args types:
  • lags (int) - The number of time lags.

Return types:
  • dataset (StaticGraphTemporalSignal) - The PedalMe dataset.

class WikiMathsDatasetLoader

A dataset of vital mathematics articles from Wikipedia. We made it public during the development of PyTorch Geometric Temporal. The underlying graph is static - vertices are Wikipedia pages and edges are links between them. The graph is directed and weighted. Weights represent the number of links found at the source Wikipedia page linking to the target Wikipedia page. The target is the daily user visits to the Wikipedia pages between March 16th 2019 and March 15th 2021 which results in 731 periods.

get_dataset(lags: int = 8) → torch_geometric_temporal.signal.StaticGraphTemporalSignal

Returning the Wikipedia Vital Mathematics data iterator.

Args types:
  • lags (int) - The number of time lags.

Return types:
  • dataset (StaticGraphTemporalSignal) - The Wiki Maths dataset.
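
To make the "directed and weighted" description concrete, the sketch below counts links between pages and lays them out as two index rows plus a weight list, i.e. a [2, num_edges] edge layout with one weight per edge. The page names, link data and helper name build_weighted_edges are made up for illustration.

```python
from collections import Counter
from typing import Dict, List, Tuple


def build_weighted_edges(
    links: List[Tuple[str, str]]
) -> Tuple[Dict[str, int], List[List[int]], List[int]]:
    """Turn (source_page, target_page) link occurrences into weighted edges.

    Returns a page-to-index map, a 2 x num_edges edge index
    (row 0: sources, row 1: targets) and one weight per edge, where the
    weight is the number of links from source to target.
    """
    pages = sorted({page for link in links for page in link})
    page_to_id = {page: i for i, page in enumerate(pages)}
    link_counts = Counter(links)
    edge_index: List[List[int]] = [[], []]
    edge_weight: List[int] = []
    for (source, target), count in sorted(link_counts.items()):
        edge_index[0].append(page_to_id[source])
        edge_index[1].append(page_to_id[target])
        edge_weight.append(count)
    return page_to_id, edge_index, edge_weight


toy_links = [
    ("Algebra", "Group"),
    ("Algebra", "Group"),
    ("Group", "Algebra"),
    ("Calculus", "Algebra"),
]
page_to_id, edge_index, edge_weight = build_weighted_edges(toy_links)
```

Because the graph is directed, the edge from "Algebra" to "Group" and the edge from "Group" to "Algebra" are stored separately and can carry different weights.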

class WindmillOutputLargeDatasetLoader(raw_data_dir=os.path.join(os.getcwd(), 'data'), index=False)

Hourly energy output of windmills from a European country for more than 2 years. Vertices represent 319 windmills and weighted edges describe the strength of relationships. The target variable allows for regression tasks.

Parameters:

index (bool, optional) – If True, initializes the dataloader to use index-based batching. Defaults to False.

get_dataset(lags: int = 8) → torch_geometric_temporal.signal.StaticGraphTemporalSignal

Returning the Windmill Output data iterator.

Args types:
  • lags (int) - The number of time lags.

Return types:
  • dataset (StaticGraphTemporalSignal) - The Windmill Output dataset.

get_index_dataset(lags=8, batch_size=64, shuffle=False, allGPU=-1, ratio=(0.7, 0.1, 0.2), dask_batching=False)

Returns torch dataloaders using index batching for WindmillLarge dataset.

Parameters:
  • lags (int, optional) – The number of time lags. Defaults to 8.

  • batch_size (int, optional) – Batch size. Defaults to 64.

  • shuffle (bool, optional) – If the data should be shuffled. Defaults to False.

  • allGPU (int, optional) – GPU device ID for performing preprocessing in GPU memory. If -1, computation is done on CPU. Defaults to -1.

  • ratio (tuple of float, optional) – The desired train, validation, and test split ratios, respectively.

Returns:

A 7-tuple containing:
  • train_dataLoader (torch.utils.data.DataLoader): Dataloader for the training set.

  • val_dataLoader (torch.utils.data.DataLoader): Dataloader for the validation set.

  • test_dataLoader (torch.utils.data.DataLoader): Dataloader for the test set.

  • edges (torch.Tensor): The graph edges as a 2D matrix, shape [2, num_edges].

  • edge_weights (torch.Tensor): Each graph edge’s weight, shape [num_edges].

  • means (torch.Tensor): The means of each feature dimension.

  • stds (torch.Tensor): The standard deviations of each feature dimension.

Return type:

Tuple[torch.utils.data.DataLoader, torch.utils.data.DataLoader, torch.utils.data.DataLoader, torch.Tensor, torch.Tensor, torch.Tensor, torch.Tensor]
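
The returned means and stds are typically needed to map standardized model outputs back to the original units. A minimal sketch with a single feature dimension and made-up numbers follows. Whether the loader uses the population or the sample standard deviation is not stated here, so the population form below is an assumption, and both helper names are hypothetical.

```python
from statistics import fmean, pstdev
from typing import List, Tuple


def standardize(values: List[float]) -> Tuple[List[float], float, float]:
    """Z-score a list of values; returns (scaled, mean, std)."""
    mean = fmean(values)
    std = pstdev(values)  # population standard deviation (assumption)
    scaled = [(v - mean) / std for v in values]
    return scaled, mean, std


def destandardize(scaled: List[float], mean: float, std: float) -> List[float]:
    """Invert the z-score so predictions are back in original units."""
    return [s * std + mean for s in scaled]


hourly_output = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]  # toy values
scaled, mean, std = standardize(hourly_output)
restored = destandardize(scaled, mean, std)
```

Errors reported in original units are usually easier to interpret than errors on the standardized scale, which is one reason to keep the statistics alongside the dataloaders.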

class WindmillOutputMediumDatasetLoader

Hourly energy output of windmills from a European country for more than 2 years. Vertices represent 26 windmills and weighted edges describe the strength of relationships. The target variable allows for regression tasks.

get_dataset(lags: int = 8) → torch_geometric_temporal.signal.StaticGraphTemporalSignal

Returning the Windmill Output data iterator.

Args types:
  • lags (int) - The number of time lags.

Return types:
  • dataset (StaticGraphTemporalSignal) - The Windmill Output dataset.

class WindmillOutputSmallDatasetLoader

Hourly energy output of windmills from a European country for more than 2 years. Vertices represent 11 windmills and weighted edges describe the strength of relationships. The target variable allows for regression tasks.

get_dataset(lags: int = 8) → torch_geometric_temporal.signal.StaticGraphTemporalSignal

Returning the Windmill Output data iterator.

Args types:
  • lags (int) - The number of time lags.

Return types:
  • dataset (StaticGraphTemporalSignal) - The Windmill Output dataset.

class METRLADatasetLoader(raw_data_dir=os.path.join(os.getcwd(), 'data'), index: bool = False)

A traffic forecasting dataset based on Los Angeles Metropolitan traffic conditions. The dataset contains traffic readings collected from 207 loop detectors on highways in Los Angeles County, aggregated into 5-minute intervals over 4 months from March 2012 to June 2012.

For further details on the version of the sensor network and discretization see: “Diffusion Convolutional Recurrent Neural Network: Data-Driven Traffic Forecasting”

get_dataset(num_timesteps_in: int = 12, num_timesteps_out: int = 12) → torch_geometric_temporal.signal.StaticGraphTemporalSignal

Returns data iterator for METR-LA dataset as an instance of the static graph temporal signal class.

Return types:
  • dataset (StaticGraphTemporalSignal) - The METR-LA traffic forecasting dataset.
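
The roles of num_timesteps_in and num_timesteps_out can be sketched with a sliding window over a single sensor's readings: each sample pairs an input window with the output window that directly follows it. The helper name sliding_windows and the toy readings are hypothetical, and the loader's own windowing may differ in detail.

```python
from typing import List, Tuple


def sliding_windows(
    readings: List[float], num_timesteps_in: int = 12, num_timesteps_out: int = 12
) -> List[Tuple[List[float], List[float]]]:
    """Pair each input window with the output window that follows it."""
    span = num_timesteps_in + num_timesteps_out
    windows = []
    for start in range(len(readings) - span + 1):
        split = start + num_timesteps_in
        windows.append((readings[start:split], readings[split:start + span]))
    return windows


# One sensor, ten 5-minute readings (toy values).
speeds = [60.0, 61.0, 59.0, 58.0, 40.0, 35.0, 42.0, 55.0, 60.0, 62.0]
pairs = sliding_windows(speeds, num_timesteps_in=3, num_timesteps_out=2)
```

With 5-minute intervals, the defaults of 12 input and 12 output steps correspond to one hour of history and one hour of forecast horizon.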

get_index_dataset(lags: int = 12, batch_size: int = 64, shuffle: bool = False, allGPU: int = -1, ratio: Tuple[float, float, float] = (0.7, 0.1, 0.2), world_size: int = -1, ddp_rank: int = -1, dask_batching: bool = False) → Tuple[torch.utils.data.DataLoader, torch.utils.data.DataLoader, torch.Tensor, torch.Tensor, torch.Tensor, torch.Tensor]

Returns torch dataloaders using index batching for Metr-LA dataset.

Parameters:
  • lags (int, optional) – The number of time lags. Defaults to 12.

  • batch_size (int, optional) – Batch size. Defaults to 64.

  • shuffle (bool, optional) – If the data should be shuffled. Defaults to False.

  • allGPU (int, optional) – GPU device ID for performing preprocessing in GPU memory. If -1, computation is done on CPU. Defaults to -1.

  • world_size (int, optional) – The number of workers if DDP is being used. Defaults to -1.

  • ddp_rank (int, optional) – The DDP rank of the worker if DDP is being used. Defaults to -1.

  • ratio (tuple of float, optional) – The desired train, validation, and test split ratios, respectively.

Returns:

A 7-tuple containing:
  • train_dataLoader (torch.utils.data.DataLoader): Dataloader for the training set.

  • val_dataLoader (torch.utils.data.DataLoader): Dataloader for the validation set.

  • test_dataLoader (torch.utils.data.DataLoader): Dataloader for the test set.

  • edges (torch.Tensor): The graph edges as a 2D matrix, shape [2, num_edges].

  • edge_weights (torch.Tensor): Each graph edge’s weight, shape [num_edges].

  • means (torch.Tensor): The means of each feature dimension.

  • stds (torch.Tensor): The standard deviations of each feature dimension.

Return type:

Tuple[torch.utils.data.DataLoader, torch.utils.data.DataLoader, torch.utils.data.DataLoader, torch.Tensor, torch.Tensor, torch.Tensor, torch.Tensor]
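
When world_size and ddp_rank are set, each DDP worker would be expected to handle only part of the data. How the loader partitions samples is not described above; the sketch below shows one common scheme (contiguous shards, remainder dropped) purely as an assumption, with a hypothetical helper name.

```python
from typing import List


def shard_for_rank(num_samples: int, world_size: int, ddp_rank: int) -> List[int]:
    """Return the sample indices one worker would process.

    world_size == -1 mirrors the documented default and is treated here
    as "no DDP": every index is returned.
    """
    if world_size == -1:
        return list(range(num_samples))
    if not 0 <= ddp_rank < world_size:
        raise ValueError("ddp_rank must be in [0, world_size)")
    per_worker = num_samples // world_size  # remainder is dropped
    start = ddp_rank * per_worker
    return list(range(start, start + per_worker))


shards = [shard_for_rank(10, world_size=4, ddp_rank=r) for r in range(4)]
all_samples = shard_for_rank(5, world_size=-1, ddp_rank=-1)
```

Giving every rank the same number of samples, as this sketch does, keeps workers in step during synchronized gradient updates at the cost of discarding a few trailing samples.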

class PemsBayDatasetLoader(raw_data_dir: str = os.path.join(os.getcwd(), 'data'), index: bool = False)

A traffic forecasting dataset as described in the Diffusion Convolution Layer paper.

This traffic dataset is collected by California Transportation Agencies (CalTrans) Performance Measurement System (PeMS). It is represented by a network of 325 traffic sensors in the Bay Area with 6 months of traffic readings ranging from Jan 1st 2017 to May 31st 2017 in 5-minute intervals.

For details see: “Diffusion Convolutional Recurrent Neural Network: Data-Driven Traffic Forecasting”

Parameters:
  • raw_data_dir (string, optional) – The directory to download the PeMS-Bay files to. Defaults to “data/”.

  • index (bool, optional) – If True, initializes the dataloader to use index-based batching. Defaults to False.

get_dataset(num_timesteps_in: int = 12, num_timesteps_out: int = 12) → torch_geometric_temporal.signal.StaticGraphTemporalSignal

Returns data iterator for PEMS-BAY dataset as an instance of the static graph temporal signal class.

Return types:
  • dataset (StaticGraphTemporalSignal) - The PEMS-BAY traffic forecasting dataset.

get_index_dataset(lags: int = 12, batch_size: int = 64, shuffle: bool = False, allGPU: int = -1, ratio: Tuple[float, float, float] = (0.7, 0.1, 0.2), world_size: int = -1, ddp_rank: int = -1, dask_batching: bool = False) → Tuple[torch.utils.data.DataLoader, torch.utils.data.DataLoader, torch.Tensor, torch.Tensor, torch.Tensor, torch.Tensor]

Returns torch dataloaders using index batching for PeMSBay dataset.

Parameters:
  • lags (int, optional) – The number of time lags. Defaults to 12.

  • batch_size (int, optional) – Batch size. Defaults to 64.

  • shuffle (bool, optional) – If the data should be shuffled. Defaults to False.

  • allGPU (int, optional) – GPU device ID for performing preprocessing in GPU memory. If -1, computation is done on CPU. Defaults to -1.

  • world_size (int, optional) – The number of workers if DDP is being used. Defaults to -1.

  • ddp_rank (int, optional) – The DDP rank of the worker if DDP is being used. Defaults to -1.

  • ratio (tuple of float, optional) – The desired train, validation, and test split ratios, respectively.

Returns:

A 7-tuple containing:
  • train_dataLoader (torch.utils.data.DataLoader): Dataloader for the training set.

  • val_dataLoader (torch.utils.data.DataLoader): Dataloader for the validation set.

  • test_dataLoader (torch.utils.data.DataLoader): Dataloader for the test set.

  • edges (torch.Tensor): The graph edges as a 2D matrix, shape [2, num_edges].

  • edge_weights (torch.Tensor): Each graph edge’s weight, shape [num_edges].

  • means (torch.Tensor): The means of each feature dimension.

  • stds (torch.Tensor): The standard deviations of each feature dimension.

Return type:

Tuple[torch.utils.data.DataLoader, torch.utils.data.DataLoader, torch.utils.data.DataLoader, torch.Tensor, torch.Tensor, torch.Tensor, torch.Tensor]

class PemsAllLADatasetLoader(raw_data_dir=os.path.join(os.getcwd(), 'data'), index=False)

A traffic forecasting dataset that covers Los Angeles. This traffic dataset is collected by California Transportation Agencies (CalTrans) Performance Measurement System (PeMS).

For details see: “Graph-partitioning-based diffusion convolutional recurrent neural network for large-scale traffic forecasting”

Parameters:
  • raw_data_dir (string, optional) – The directory to download the PeMS-All-LA files to. Defaults to “data/”.

  • index (bool, optional) – If True, initializes the dataloader to use index-based batching. Defaults to False.

get_index_dataset(lags: int = 12, batch_size: int = 64, shuffle: bool = False, allGPU: int = -1, ratio: Tuple[float, float, float] = (0.7, 0.1, 0.2), world_size: int = -1, ddp_rank: int = -1, dask_batching: bool = False) → Tuple[torch.utils.data.DataLoader, torch.utils.data.DataLoader, torch.Tensor, torch.Tensor, torch.Tensor, torch.Tensor]

Returns torch dataloaders using index batching for PeMS dataset.

Parameters:
  • lags (int, optional) – The number of time lags. Defaults to 12.

  • batch_size (int, optional) – Batch size. Defaults to 64.

  • shuffle (bool, optional) – If the data should be shuffled. Defaults to False.

  • allGPU (int, optional) – GPU device ID for performing preprocessing in GPU memory. If -1, computation is done on CPU. Defaults to -1.

  • world_size (int, optional) – The number of workers if DDP is being used. Defaults to -1.

  • ddp_rank (int, optional) – The DDP rank of the worker if DDP is being used. Defaults to -1.

  • ratio (tuple of float, optional) – The desired train, validation, and test split ratios, respectively.

Returns:

A 7-tuple containing:
  • train_dataLoader (torch.utils.data.DataLoader): Dataloader for the training set.

  • val_dataLoader (torch.utils.data.DataLoader): Dataloader for the validation set.

  • test_dataLoader (torch.utils.data.DataLoader): Dataloader for the test set.

  • edges (torch.Tensor): The graph edges as a 2D matrix, shape [2, num_edges].

  • edge_weights (torch.Tensor): Each graph edge’s weight, shape [num_edges].

  • means (torch.Tensor): The means of each feature dimension.

  • stds (torch.Tensor): The standard deviations of each feature dimension.

Return type:

Tuple[torch.utils.data.DataLoader, torch.utils.data.DataLoader, torch.utils.data.DataLoader, torch.Tensor, torch.Tensor, torch.Tensor, torch.Tensor]

class PemsDatasetLoader(raw_data_dir=os.path.join(os.getcwd(), 'data'), index=False)

A traffic forecasting dataset for the entirety of California. This traffic dataset is collected by California Transportation Agencies (CalTrans) Performance Measurement System (PeMS).

For details see: “Graph-partitioning-based diffusion convolutional recurrent neural network for large-scale traffic forecasting”

Parameters:
  • raw_data_dir (string, optional) – The directory to download the PeMS files to. Defaults to “data/”.

  • index (bool, optional) – If True, initializes the dataloader to use index-based batching. Defaults to False.

get_index_dataset(lags: int = 12, batch_size: int = 64, shuffle: bool = False, allGPU: int = -1, ratio: Tuple[float, float, float] = (0.7, 0.1, 0.2), world_size: int = -1, ddp_rank: int = -1, dask_batching: bool = False) → Tuple[torch.utils.data.DataLoader, torch.utils.data.DataLoader, torch.Tensor, torch.Tensor, torch.Tensor, torch.Tensor]

Returns torch dataloaders using index batching for PeMS dataset.

Parameters:
  • lags (int, optional) – The number of time lags. Defaults to 12.

  • batch_size (int, optional) – Batch size. Defaults to 64.

  • shuffle (bool, optional) – If the data should be shuffled. Defaults to False.

  • allGPU (int, optional) – GPU device ID for performing preprocessing in GPU memory. If -1, computation is done on CPU. Defaults to -1.

  • world_size (int, optional) – The number of workers if DDP is being used. Defaults to -1.

  • ddp_rank (int, optional) – The DDP rank of the worker if DDP is being used. Defaults to -1.

  • ratio (tuple of float, optional) – The desired train, validation, and test split ratios, respectively.

Returns:

A 7-tuple containing:
  • train_dataLoader (torch.utils.data.DataLoader): Dataloader for the training set.

  • val_dataLoader (torch.utils.data.DataLoader): Dataloader for the validation set.

  • test_dataLoader (torch.utils.data.DataLoader): Dataloader for the test set.

  • edges (torch.Tensor): The graph edges as a 2D matrix, shape [2, num_edges].

  • edge_weights (torch.Tensor): Each graph edge’s weight, shape [num_edges].

  • means (torch.Tensor): The means of each feature dimension.

  • stds (torch.Tensor): The standard deviations of each feature dimension.

Return type:

Tuple[torch.utils.data.DataLoader, torch.utils.data.DataLoader, torch.utils.data.DataLoader, torch.Tensor, torch.Tensor, torch.Tensor, torch.Tensor]

class EnglandCovidDatasetLoader

A dataset of mobility and reported COVID-19 case history in England NUTS3 regions, from 3 March to 12 May. The dataset is segmented by day, and the graph is directed and weighted. The graph indicates how many people moved from one region to another each day, based on Facebook Data For Good disease prevention maps. The node features are the numbers of COVID-19 cases in the region over the preceding window of days. The task is to predict the number of cases in each node one day ahead. For details see this paper: “Transfer Graph Neural Networks for Pandemic Forecasting.”

get_dataset(lags: int = 8) → torch_geometric_temporal.signal.DynamicGraphTemporalSignal

Returning the England COVID19 data iterator.

Args types:
  • lags (int) - The number of time lags.

Return types:
  • dataset (DynamicGraphTemporalSignal) - The England Covid dataset.
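
The difference between a static and a dynamic graph signal can be pictured as follows: in the dynamic case every daily snapshot carries its own edge list and weights. The sketch uses plain Python containers and invented mobility numbers; the Snapshot type is hypothetical and is not the library's class.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Snapshot:
    """Hypothetical container for one day of a dynamic graph signal."""

    edge_index: List[List[int]]  # 2 x num_edges: sources, targets
    edge_weight: List[float]     # people moving along each edge that day
    features: List[List[float]]  # per region: case counts over the lag window
    target: List[float]          # per region: cases one day ahead


day_0 = Snapshot(
    edge_index=[[0, 1], [1, 0]],
    edge_weight=[120.0, 80.0],
    features=[[3.0, 4.0], [1.0, 0.0]],
    target=[5.0, 1.0],
)
# The next day has a different edge set: only movement from region 0 to 1.
day_1 = Snapshot(
    edge_index=[[0], [1]],
    edge_weight=[95.0],
    features=[[4.0, 5.0], [0.0, 1.0]],
    target=[6.0, 2.0],
)
dynamic_signal = [day_0, day_1]
edge_counts = [len(snapshot.edge_weight) for snapshot in dynamic_signal]
```

A static signal would instead share a single edge_index and edge_weight across all snapshots, with only features and targets changing over time.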

class MontevideoBusDatasetLoader

A dataset of passenger inflow at bus stop level from Montevideo city. It comprises hourly passenger inflow data at bus stop level for 11 bus lines during October 2020 in Montevideo (Uruguay). The selected bus lines are those that carry people to the center of the city, and they carry more than 25% of the total daily inflow traffic. Vertices are bus stops, edges link bus stops that are connected by a bus line, and the weights represent the road distance. The target is the passenger inflow. This is a curated dataset built from different data sources of the Metropolitan Transportation System (STM) of Montevideo. These datasets are freely available to anyone in the National Catalog of Open Data from the government of Uruguay (https://catalogodatos.gub.uy/).

get_dataset(lags: int = 4, target_var: str = 'y', feature_vars: List[str] = ['y']) → torch_geometric_temporal.signal.StaticGraphTemporalSignal

Returning the MontevideoBus passenger inflow data iterator.

Parameters:
  • lags (int, optional) – The number of time lags, by default 4.

  • target_var (str, optional) – Target variable name, by default “y”.

  • feature_vars (List[str], optional) – List of feature variables, by default [“y”].

Returns:

The MontevideoBus dataset.

Return type:

StaticGraphTemporalSignal
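
How target_var and feature_vars interact with lags can be sketched for a single bus stop: lagged values of every feature variable form the input, and the next value of the target variable is the label. The helper name select_variables, the "temperature" variable and the concatenation order are assumptions made for illustration only.

```python
from typing import Dict, List, Tuple


def select_variables(
    variables: Dict[str, List[float]],
    lags: int = 4,
    target_var: str = "y",
    feature_vars: Tuple[str, ...] = ("y",),
) -> List[Tuple[List[float], float]]:
    """For one bus stop, build (features, target) pairs.

    Features concatenate the `lags` previous values of every feature
    variable; the target is the next value of `target_var`.
    """
    series_length = len(variables[target_var])
    samples = []
    for start in range(series_length - lags):
        features: List[float] = []
        for name in feature_vars:
            features.extend(variables[name][start:start + lags])
        samples.append((features, variables[target_var][start + lags]))
    return samples


# Toy hourly data for one stop; "temperature" is a made-up extra variable.
stop_data = {
    "y": [5.0, 7.0, 9.0, 4.0],
    "temperature": [18.0, 19.0, 21.0, 20.0],
}
samples = select_variables(stop_data, lags=2, feature_vars=("y", "temperature"))
```

The sketch uses a tuple as the default for feature_vars so that the default value cannot be mutated between calls.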

class MTMDatasetLoader

A dataset of Methods-Time Measurement-1 (MTM-1) motions, signalled as consecutive video frames of 21 3D hand keypoints, acquired via MediaPipe Hands from RGB-Video material. Vertices are the finger joints of the human hand and edges are the bones connecting them. The targets are manually labeled for each frame, according to one of the five MTM-1 motions (classes \(C\)): Grasp, Release, Move, Reach, Position plus a negative class for frames without graph signals (no hand present). This is a classification task where \(T\) consecutive frames need to be assigned to the corresponding class \(C\). The data x is returned in shape (3, 21, T), the target is returned one-hot-encoded in shape (T, 6).

get_dataset(frames: int = 16) → torch_geometric_temporal.signal.StaticGraphTemporalSignal

Returning the MTM-1 motion data iterator.

Args types:
  • frames (int) - The number of consecutive frames T, default 16.

Return types:
  • dataset (StaticGraphTemporalSignal) - The MTM-1 dataset.
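
The shapes mentioned in the class description, (3, 21, T) for the data and (T, 6) for the one-hot targets, can be illustrated with nested Python lists. The class order, the name "Negative" for the no-hand class, and the helper name one_hot_labels are assumptions made for this sketch.

```python
from typing import List

# Five MTM-1 motions from the description plus a negative class.
# The order of the classes is an assumption made for this sketch.
CLASSES = ["Grasp", "Release", "Move", "Reach", "Position", "Negative"]


def one_hot_labels(frame_labels: List[str]) -> List[List[int]]:
    """Encode per-frame labels as rows of length 6, giving shape (T, 6)."""
    encoded = []
    for label in frame_labels:
        row = [0] * len(CLASSES)
        row[CLASSES.index(label)] = 1
        encoded.append(row)
    return encoded


T = 4  # number of consecutive frames in this toy example
# x has shape (3, 21, T): 3 coordinates, 21 hand keypoints, T frames.
x = [[[0.0] * T for _ in range(21)] for _ in range(3)]
y = one_hot_labels(["Reach", "Grasp", "Move", "Negative"])
```

Each row of y contains exactly one 1, so taking the position of the largest entry per row recovers the class index for that frame.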