Expert In The Loop

class pgmpy.estimators.ExpertInLoop(data=None, **kwargs)[source]
estimate(pval_threshold=0.05, effect_size_threshold=0.05, use_llm=True, llm_model='gemini/gemini-1.5-flash', variable_descriptions=None, show_progress=True, **kwargs)[source]

Estimates a DAG from the data by utilizing expert knowledge.

The method iteratively adds and removes edges between variables (similar to Greedy Equivalence Search algorithm) based on a score metric that improves the model’s fit to the data the most. The score metric used is based on conditional independence testing. When adding an edge to the model, the method asks for expert knowledge to decide the orientation of the edge. Alternatively, an LLM can used to decide the orientation of the edge.

Parameters:
  • pval_threshold (float) – The p-value threshold to use for the test to determine whether there is a significant association between the variables or not.

  • effect_size_threshold (float) – The effect size threshold to use to suggest a new edge. If the conditional effect size between two variables is greater than the threshold, the algorithm would suggest to add an edge between them. And if the effect size for an edge is less than the threshold, would suggest to remove the edge.

  • use_llm (bool) – Whether to use a Large Language Model for edge orientation. If False, prompts the user to specify the direction between the edges.

  • llm_model (str (default: gemini/gemini-1.5-flash)) – The LLM model to use. Please refer to litellm documentation (https://docs.litellm.ai/docs/providers) for available model options. Default is gemini-1.5-flash

  • variable_descriptions (dict) – A dict of the form {var: description} giving a text description of each variable in the model.

  • show_progress (bool (default: True)) – If True, prints info of the running status.

  • kwargs (kwargs) – Any additional parameters to pass to litellm.completion method. Please refer documentation at: https://docs.litellm.ai/docs/completion/input#input-params-1

Returns:

pgmpy.base.DAG

Return type:

A DAG representing the learned causal structure.

Examples

>>> from pgmpy.utils import get_example_model
>>> from pgmpy.estimators import ExpertInLoop
>>> model = get_example_model('cancer')
>>> df = model.simulate(int(1e3))
>>> variable_descriptions = {
...     "Smoker": "A binary variable representing whether a person smokes or not.",
...     "Cancer": "A binary variable representing whether a person has cancer. ",
...     "Xray": "A binary variable representing the result of an X-ray test.",
...     "Pollution": "A binary variable representing whether the person is in a high-pollution area or not."
...     "Dyspnoea": "A binary variable representing whether a person has shortness of breath. "}
>>> dag = ExpertInLoop(df).estimate(
...                 effect_size_threshold=0.0001,
...                 use_llm=True,
...                 variable_descriptions=variable_descriptions)
>>> dag.edges()
OutEdgeView([('Smoker', 'Cancer'), ('Cancer', 'Xray'), ('Cancer', 'Dyspnoea'), ('Pollution', 'Cancer')])
test_all(dag)[source]

Runs CI tests on all possible combinations of variables in dag.

Parameters:

dag (pgmpy.base.DAG) – The DAG on which to run the tests.

Returns:

pd.DataFrame

Return type:

The results with p-values and effect sizes of all the tests.