Example Datasets
pgmpy ships a curated collection of benchmark datasets for causal discovery and graphical modeling. Each dataset includes not only the data table but also metadata such as ground-truth graphs, expert knowledge, and dataset characteristics. The datasets are ready to use without any manual download or preparation.
Tip
When to use this vs. Example Models: Example Datasets provide data tables for learning graph structures and parameters. Example Models provide pre-built graph structures (and optionally parameters) for inference, simulation, and benchmarking.
At a Glance
Unified API: Discover datasets with list_datasets(...) and load them with load_dataset(...).
Rich Filtering: Filter datasets by type, size, and available metadata before loading.
Bundled Metadata: Each dataset carries its ground-truth graph, expert knowledge, and tags alongside the data.
API
The dataset API has two entry points — discover, then load:
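```python
from pgmpy.datasets import list_datasets, load_dataset

# Discover: find discrete datasets that ship with a ground-truth graph
matches = list_datasets(is_discrete=True, has_ground_truth=True)

# Load: fetch the first match as a dataset object
dataset = load_dataset(matches[0])

print(dataset.name)        # dataset identifier
print(dataset.data.shape)  # (rows, columns) = (samples, variables) of the DataFrame
print(dataset.tags)        # dictionary of dataset properties
```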
The loaded object exposes the data as a pandas.DataFrame and keeps all metadata on
the same object for downstream workflows.
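To make the single-object design concrete, here is a minimal sketch of a container holding the attributes this page describes. The class name `DatasetSketch`, its `has_ground_truth()` helper, and the choice to represent `ground_truth` as a plain set of directed edges are all hypothetical assumptions for illustration, not pgmpy's actual class. In the real API `data` is a pandas.DataFrame; it is left untyped here so the sketch runs with the standard library alone.

```python
from dataclasses import dataclass, field
from typing import Any, Optional


@dataclass
class DatasetSketch:
    """Hypothetical stand-in for a loaded dataset: data plus metadata on one object."""

    name: str
    data: Any  # a pandas.DataFrame in the real API; untyped in this sketch
    ground_truth: Optional[set] = None  # assumed: set of (parent, child) edges, if available
    expert_knowledge: Optional[dict] = None  # domain constraints, if available
    tags: dict = field(default_factory=dict)  # fresh dict per instance

    def has_ground_truth(self) -> bool:
        """Hypothetical helper: True when a ground-truth graph is attached."""
        return self.ground_truth is not None


# Downstream code can branch on available metadata without a second lookup.
ds = DatasetSketch(
    name="toy",
    data=[[0, 1], [1, 1]],
    ground_truth={("A", "B")},
    tags={"is_discrete": True},
)
if ds.has_ground_truth():
    print(f"{ds.name}: {len(ds.ground_truth)} true edge(s)")  # prints "toy: 1 true edge(s)"
```

Keeping optional metadata as `None` rather than omitting the attribute lets workflows test for its presence uniformly.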
Filtering
list_datasets(**filters) narrows the catalog before loading. Supported filters include
data type (is_discrete, is_continuous, is_mixed), available metadata
(has_ground_truth, has_expert_knowledge), size (n_variables, n_samples), and
data origin (is_simulated, is_interventional).
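A natural reading of these filters is that they combine conjunctively, returning only datasets that satisfy every filter passed. The sketch below assumes that behavior and illustrates keyword-based narrowing over metadata records in plain Python; it is not pgmpy's implementation. The catalog entries are made-up placeholders, and treating the size filters (`n_variables`, `n_samples`) as inclusive `(low, high)` ranges is an assumption of this sketch, not documented pgmpy behavior.

```python
from typing import Any

# Hypothetical catalog: placeholder names and metadata, for illustration only.
CATALOG = [
    {"name": "toy_discrete_small", "is_discrete": True, "has_ground_truth": True,
     "n_variables": 5, "n_samples": 1000},
    {"name": "toy_discrete_large", "is_discrete": True, "has_ground_truth": False,
     "n_variables": 50, "n_samples": 20000},
    {"name": "toy_continuous", "is_discrete": False, "has_ground_truth": True,
     "n_variables": 8, "n_samples": 500},
]

# Assumed: size filters take an inclusive (low, high) tuple.
RANGE_KEYS = {"n_variables", "n_samples"}


def _satisfies(entry: dict[str, Any], key: str, wanted: Any) -> bool:
    """Check a single filter against one catalog entry."""
    value = entry.get(key)
    if key in RANGE_KEYS:
        low, high = wanted
        return value is not None and low <= value <= high
    # Boolean/type filters must match exactly; a missing key never matches True.
    return value == wanted


def filter_catalog(catalog: list[dict[str, Any]], **filters: Any) -> list[str]:
    """Return names of entries satisfying every filter (AND semantics)."""
    return [
        entry["name"]
        for entry in catalog
        if all(_satisfies(entry, key, wanted) for key, wanted in filters.items())
    ]


print(filter_catalog(CATALOG, is_discrete=True, has_ground_truth=True))
print(filter_catalog(CATALOG, n_variables=(1, 10)))
```

With no filters, `all()` over an empty sequence is `True`, so the whole catalog is returned; each added filter can only shrink the result.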
Bundled Metadata
Each loaded dataset provides structured access to:
data: The tabular data as a pandas.DataFrame, ready for estimator.fit(data).
ground_truth: The true causal graph (when available), for use with supervised metrics.
expert_knowledge: Domain constraints (when available), for use with discovery algorithms.
tags: Dictionary of dataset properties and characteristics.
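As an example of a supervised metric that consumes a ground-truth graph, the sketch below computes the structural Hamming distance (SHD) between a true and a learned directed graph, both given as plain sets of `(parent, child)` edges. This is a hand-rolled illustration, not pgmpy's metrics API, and the edge-set representation is an assumption. SHD conventions vary; this one charges 1 for each missing edge, extra edge, or reversed edge (some definitions charge a reversal 2).

```python
def structural_hamming_distance(true_edges, learned_edges):
    """Count missing, extra, and reversed edges between two directed graphs.

    Edges are (parent, child) tuples. A reversed edge counts once.
    """
    true_edges = set(true_edges)
    learned_edges = set(learned_edges)
    # Compare adjacencies (skeletons) with direction ignored.
    true_skel = {frozenset(e) for e in true_edges}
    learned_skel = {frozenset(e) for e in learned_edges}
    missing = len(true_skel - learned_skel)
    extra = len(learned_skel - true_skel)
    # Among shared adjacencies, count learned edges with the wrong orientation.
    reversed_count = sum(
        1
        for (u, v) in learned_edges
        if frozenset((u, v)) in true_skel and (u, v) not in true_edges
    )
    return missing + extra + reversed_count


true_graph = {("A", "B"), ("B", "C"), ("C", "D")}
learned_graph = {("A", "B"), ("C", "B"), ("A", "D")}
# One missing (C-D), one extra (A-D), one reversed (C->B vs B->C)
print(structural_hamming_distance(true_graph, learned_graph))  # prints 3
```

Lower is better, and a learned graph identical to the ground truth scores 0.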
See Also
Causal Discovery — Use loaded datasets to learn causal graphs.
Example Models — Pre-built graph structures to pair with these datasets.
API Reference
For the full list of available datasets: