Reading from and Writing to pgmpy file formats
pgmpy is a Python library for the creation, manipulation and implementation of probabilistic graphical models (PGMs). There are various standard file formats for representing PGM data. PGM data consists of a graph, a distribution associated with each node, and a few other attributes of the graph.
pgmpy can read networks from, and write networks to, these standard file formats. Currently pgmpy supports five file formats: ProbModelXML, PomDPX, XMLBIF, XMLBeliefNetwork and UAI. Using these modules, models can be specified in a uniform file format and readily converted to Bayesian or Markov model objects.
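The reader and writer classes for these formats live in pgmpy.readwrite and mostly follow a [fileformat]Reader / [fileformat]Writer naming scheme, with some names abbreviated. The mapping below is a plain-Python sketch, assuming the class names of the pgmpy releases this tutorial was written for; classes_for is a hypothetical helper, so check the module in your installed version.

```python
from typing import Dict, Tuple

# Reader/writer class names in pgmpy.readwrite for the five formats above.
# Assumption: names as in the pgmpy releases this tutorial targets.
READWRITE_CLASSES: Dict[str, Tuple[str, str]] = {
    "ProbModelXML": ("ProbModelXMLReader", "ProbModelXMLWriter"),
    "PomDPX": ("PomdpXReader", "PomdpXWriter"),
    "XMLBIF": ("XMLBIFReader", "XMLBIFWriter"),
    "XMLBeliefNetwork": ("XBNReader", "XBNWriter"),
    "UAI": ("UAIReader", "UAIWriter"),
}


def classes_for(file_format: str) -> Tuple[str, str]:
    """Return the (reader, writer) class names for a format, ignoring case."""
    for name, pair in READWRITE_CLASSES.items():
        if name.lower() == file_format.lower():
            return pair
    raise KeyError(f"unsupported format: {file_format!r}")


reader_name, writer_name = classes_for("xmlbeliefnetwork")
print(reader_name, writer_name)  # XBNReader XBNWriter
```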
Now, let’s read a ProbModelXML file and get the corresponding model instance.
[55]:
from pgmpy.readwrite import ProbModelXMLReader
[56]:
reader_string = ProbModelXMLReader('../files/example.pgmx')
To get the corresponding model instance, call get_model:
[57]:
model = reader_string.get_model()
Now we can query this model according to our requirements. It is an instance of BayesianModel or MarkovModel, depending on the type of model described in the file.
For example, to list all the nodes in the model:
[58]:
print(model.nodes())
['X-ray', 'Bronchitis', 'Dyspnea', 'VisitToAsia', 'Smoker', 'LungCancer', 'Tuberculosis', 'TuberculosisOrCancer']
To get all the edges, use the model.edges() method.
[59]:
model.edges()
[59]:
OutEdgeView([('Bronchitis', 'Dyspnea'), ('VisitToAsia', 'Tuberculosis'), ('Smoker', 'Bronchitis'), ('Smoker', 'LungCancer'), ('LungCancer', 'TuberculosisOrCancer'), ('Tuberculosis', 'TuberculosisOrCancer'), ('TuberculosisOrCancer', 'Dyspnea'), ('TuberculosisOrCancer', 'X-ray')])
To get all the CPDs of the model, use model.get_cpds(). To inspect their values, iterate over the CPDs and print each one:
[60]:
cpds = model.get_cpds()
for cpd in cpds:
    print(cpd)
+----------------------+-------------------------+-------------------------+
| TuberculosisOrCancer | TuberculosisOrCancer(0) | TuberculosisOrCancer(1) |
+----------------------+-------------------------+-------------------------+
| X-ray(0)             | 0.95                    | 0.05                    |
+----------------------+-------------------------+-------------------------+
| X-ray(1)             | 0.02                    | 0.98                    |
+----------------------+-------------------------+-------------------------+
+---------------+-----------+-----------+
| Smoker        | Smoker(0) | Smoker(1) |
+---------------+-----------+-----------+
| Bronchitis(0) | 0.7       | 0.3       |
+---------------+-----------+-----------+
| Bronchitis(1) | 0.4       | 0.6       |
+---------------+-----------+-----------+
+----------------------+-------------------------+-------------------------+-------------------------+-------------------------+
| TuberculosisOrCancer | TuberculosisOrCancer(0) | TuberculosisOrCancer(0) | TuberculosisOrCancer(1) | TuberculosisOrCancer(1) |
+----------------------+-------------------------+-------------------------+-------------------------+-------------------------+
| Bronchitis           | Bronchitis(0)           | Bronchitis(1)           | Bronchitis(0)           | Bronchitis(1)           |
+----------------------+-------------------------+-------------------------+-------------------------+-------------------------+
| Dyspnea(0)           | 0.9                     | 0.1                     | 0.3                     | 0.7                     |
+----------------------+-------------------------+-------------------------+-------------------------+-------------------------+
| Dyspnea(1)           | 0.2                     | 0.8                     | 0.1                     | 0.9                     |
+----------------------+-------------------------+-------------------------+-------------------------+-------------------------+
+----------------+------+
| VisitToAsia(0) | 0.99 |
+----------------+------+
| VisitToAsia(1) | 0.01 |
+----------------+------+
+-----------+-----+
| Smoker(0) | 0.5 |
+-----------+-----+
| Smoker(1) | 0.5 |
+-----------+-----+
+---------------+-----------+-----------+
| Smoker        | Smoker(0) | Smoker(1) |
+---------------+-----------+-----------+
| LungCancer(0) | 0.99      | 0.01      |
+---------------+-----------+-----------+
| LungCancer(1) | 0.9       | 0.1       |
+---------------+-----------+-----------+
+-----------------+----------------+----------------+
| VisitToAsia     | VisitToAsia(0) | VisitToAsia(1) |
+-----------------+----------------+----------------+
| Tuberculosis(0) | 0.99           | 0.01           |
+-----------------+----------------+----------------+
| Tuberculosis(1) | 0.95           | 0.05           |
+-----------------+----------------+----------------+
+-------------------------+-----------------+-----------------+-----------------+-----------------+
| LungCancer              | LungCancer(0)   | LungCancer(0)   | LungCancer(1)   | LungCancer(1)   |
+-------------------------+-----------------+-----------------+-----------------+-----------------+
| Tuberculosis            | Tuberculosis(0) | Tuberculosis(1) | Tuberculosis(0) | Tuberculosis(1) |
+-------------------------+-----------------+-----------------+-----------------+-----------------+
| TuberculosisOrCancer(0) | 1.0             | 0.0             | 0.0             | 1.0             |
+-------------------------+-----------------+-----------------+-----------------+-----------------+
| TuberculosisOrCancer(1) | 0.0             | 1.0             | 0.0             | 1.0             |
+-------------------------+-----------------+-----------------+-----------------+-----------------+
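Each printed CPD has one row per state of the variable and one column per combination of its parents' states, with the first parent in the header varying slowest and the last varying fastest. A small plain-Python sketch of that indexing, using the Dyspnea table above (column_index and lookup are hypothetical helpers, not pgmpy functions):

```python
from itertools import product
from typing import List, Sequence


def column_index(evidence_card: Sequence[int], assignment: Sequence[int]) -> int:
    """Column of a CPD table for one assignment of parent states.

    Columns enumerate parent configurations with the first parent varying
    slowest and the last parent varying fastest, as in the printed tables.
    """
    if len(evidence_card) != len(assignment):
        raise ValueError("need exactly one state per parent")
    index = 0
    for card, state in zip(evidence_card, assignment):
        if not 0 <= state < card:
            raise ValueError(f"state {state} out of range for cardinality {card}")
        index = index * card + state  # mixed-radix (row-major) position
    return index


def lookup(values: List[List[float]], evidence_card: Sequence[int],
           state: int, assignment: Sequence[int]) -> float:
    """Table entry for variable = state given the parent assignment."""
    return values[state][column_index(evidence_card, assignment)]


# Dyspnea table printed above; parents are (TuberculosisOrCancer, Bronchitis).
dyspnea = [[0.9, 0.1, 0.3, 0.7],
           [0.2, 0.8, 0.1, 0.9]]

# Dyspnea(0) given TuberculosisOrCancer(1), Bronchitis(0): the third column.
print(lookup(dyspnea, [2, 2], state=0, assignment=(1, 0)))  # 0.3

# Enumerating parent states in this order reproduces the table's column order.
for combo in product(range(2), range(2)):
    print(combo, column_index([2, 2], combo))
```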
Besides reading models from these file formats, pgmpy can also write a model to them. Let’s write a sample model to a ProbModelXML file.
First, define the data for the model.
[61]:
import numpy as np

# Directed edges (parent, child) of the network.
edges_list = [('VisitToAsia', 'Tuberculosis'),
              ('LungCancer', 'TuberculosisOrCancer'),
              ('Smoker', 'LungCancer'),
              ('Smoker', 'Bronchitis'),
              ('Tuberculosis', 'TuberculosisOrCancer'),
              ('Bronchitis', 'Dyspnea'),
              ('TuberculosisOrCancer', 'Dyspnea'),
              ('TuberculosisOrCancer', 'X-ray')]

# Node attributes: states, role, type, display coordinates and extra properties.
nodes = {
    'Smoker': {
        'States': {'no': {}, 'yes': {}},
        'role': 'chance',
        'type': 'finiteStates',
        'Coordinates': {'y': '52', 'x': '568'},
        'AdditionalProperties': {'Title': 'S', 'Relevance': '7.0'}},
    'Bronchitis': {
        'States': {'no': {}, 'yes': {}},
        'role': 'chance',
        'type': 'finiteStates',
        'Coordinates': {'y': '181', 'x': '698'},
        'AdditionalProperties': {'Title': 'B', 'Relevance': '7.0'}},
    'VisitToAsia': {
        'States': {'no': {}, 'yes': {}},
        'role': 'chance',
        'type': 'finiteStates',
        'Coordinates': {'y': '58', 'x': '290'},
        'AdditionalProperties': {'Title': 'A', 'Relevance': '7.0'}},
    'Tuberculosis': {
        'States': {'no': {}, 'yes': {}},
        'role': 'chance',
        'type': 'finiteStates',
        'Coordinates': {'y': '150', 'x': '201'},
        'AdditionalProperties': {'Title': 'T', 'Relevance': '7.0'}},
    'X-ray': {
        'States': {'no': {}, 'yes': {}},
        'role': 'chance',
        'AdditionalProperties': {'Title': 'X', 'Relevance': '7.0'},
        'Coordinates': {'y': '322', 'x': '252'},
        # "Indicates whether the X-ray test was positive"
        'Comment': 'Indica si el test de rayos X ha sido positivo',
        'type': 'finiteStates'},
    'Dyspnea': {
        'States': {'no': {}, 'yes': {}},
        'role': 'chance',
        'type': 'finiteStates',
        'Coordinates': {'y': '321', 'x': '533'},
        'AdditionalProperties': {'Title': 'D', 'Relevance': '7.0'}},
    'TuberculosisOrCancer': {
        'States': {'no': {}, 'yes': {}},
        'role': 'chance',
        'type': 'finiteStates',
        'Coordinates': {'y': '238', 'x': '336'},
        'AdditionalProperties': {'Title': 'E', 'Relevance': '7.0'}},
    'LungCancer': {
        'States': {'no': {}, 'yes': {}},
        'role': 'chance',
        'type': 'finiteStates',
        'Coordinates': {'y': '152', 'x': '421'},
        'AdditionalProperties': {'Title': 'L', 'Relevance': '7.0'}}}

# Edge attributes as a nested mapping {parent: {child: attributes}}.
edges = {
    'LungCancer': {'TuberculosisOrCancer': {'directed': 'true'}},
    'Smoker': {'LungCancer': {'directed': 'true'},
               'Bronchitis': {'directed': 'true'}},
    'Dyspnea': {},
    'X-ray': {},
    'VisitToAsia': {'Tuberculosis': {'directed': 'true'}},
    'TuberculosisOrCancer': {'X-ray': {'directed': 'true'},
                             'Dyspnea': {'directed': 'true'}},
    'Bronchitis': {'Dyspnea': {'directed': 'true'}},
    'Tuberculosis': {'TuberculosisOrCancer': {'directed': 'true'}}}

# CPDs: 'Variables' maps each variable to its parents (evidence); 'Values' holds the table.
cpds = [{'Values': np.array([[0.95, 0.05], [0.02, 0.98]]),
         'Variables': {'X-ray': ['TuberculosisOrCancer']}},
        {'Values': np.array([[0.7, 0.3], [0.4, 0.6]]),
         'Variables': {'Bronchitis': ['Smoker']}},
        {'Values': np.array([[0.9, 0.1, 0.3, 0.7], [0.2, 0.8, 0.1, 0.9]]),
         'Variables': {'Dyspnea': ['TuberculosisOrCancer', 'Bronchitis']}},
        {'Values': np.array([[0.99], [0.01]]),
         'Variables': {'VisitToAsia': []}},
        {'Values': np.array([[0.5], [0.5]]),
         'Variables': {'Smoker': []}},
        {'Values': np.array([[0.99, 0.01], [0.9, 0.1]]),
         'Variables': {'LungCancer': ['Smoker']}},
        {'Values': np.array([[0.99, 0.01], [0.95, 0.05]]),
         'Variables': {'Tuberculosis': ['VisitToAsia']}},
        {'Values': np.array([[1, 0, 0, 1], [0, 1, 0, 1]]),
         'Variables': {'TuberculosisOrCancer': ['LungCancer', 'Tuberculosis']}}]
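The edges dictionary carries the same links as edges_list, stored as a nested mapping {parent: {child: attributes}} so that per-edge attributes travel with them. A plain-Python sketch that flattens such a mapping back into (parent, child, attributes) triples, which is a quick way to check that the two structures agree (flatten_edges and the excerpt are hypothetical, for illustration only):

```python
from typing import Dict, List, Tuple

Attrs = Dict[str, str]


def flatten_edges(adjacency: Dict[str, Dict[str, Attrs]]) -> List[Tuple[str, str, Attrs]]:
    """Turn {parent: {child: attributes}} into (parent, child, attributes) triples."""
    return [(u, v, attrs)
            for u, children in adjacency.items()
            for v, attrs in children.items()]


# A small excerpt in the same shape as the `edges` dictionary above.
adjacency = {
    "Smoker": {"LungCancer": {"directed": "true"},
               "Bronchitis": {"directed": "true"}},
    "Dyspnea": {},  # nodes without outgoing edges contribute no triples
    "VisitToAsia": {"Tuberculosis": {"directed": "true"}},
}

triples = flatten_edges(adjacency)
pairs = {(u, v) for u, v, _ in triples}
print(sorted(pairs))
```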
Now let’s create a BayesianModel for this data.
[62]:
from pgmpy.models import BayesianModel
from pgmpy.factors.discrete import TabularCPD

model = BayesianModel(edges_list)

# Copy the node attributes (states, coordinates, ...) onto the graph nodes.
for node in nodes:
    for key, value in nodes[node].items():
        model.nodes[node][key] = value

# Copy the edge attributes (e.g. directed='true') onto the graph edges.
for u in edges.keys():
    for v in edges[u].keys():
        for key, value in edges[u][v].items():
            model.edges[(u, v)][key] = value

# Build one TabularCPD per entry in `cpds`.
tabular_cpds = []
for cpd in cpds:
    var = list(cpd['Variables'].keys())[0]
    evidence = cpd['Variables'][var]
    values = cpd['Values']
    states = len(nodes[var]['States'])
    evidence_card = [len(nodes[evidence_var]['States'])
                     for evidence_var in evidence]
    tabular_cpds.append(
        TabularCPD(var, states, values,
                   evidence=evidence, evidence_card=evidence_card))
model.add_cpds(*tabular_cpds)
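The CPD loop derives every cardinality from the States entries in nodes. TabularCPD expects values with one row per state of the variable and one column per configuration of its parents, so the required shape is (number of states, product of the parents' state counts), where the empty product for a root node is 1. A plain-Python sketch of that rule (expected_shape and demo_nodes are hypothetical names):

```python
from math import prod
from typing import Dict, List, Tuple


def expected_shape(var: str, evidence: List[str],
                   nodes: Dict[str, dict]) -> Tuple[int, int]:
    """(states of var, number of parent configurations) for a CPD values array."""
    rows = len(nodes[var]["States"])
    # prod() of an empty sequence is 1, which covers nodes without parents.
    cols = prod(len(nodes[parent]["States"]) for parent in evidence)
    return rows, cols


# Minimal excerpt in the shape of `nodes`: every variable has states no/yes.
demo_nodes = {name: {"States": {"no": {}, "yes": {}}}
              for name in ["Dyspnea", "TuberculosisOrCancer", "Bronchitis", "Smoker"]}

print(expected_shape("Dyspnea", ["TuberculosisOrCancer", "Bronchitis"], demo_nodes))  # (2, 4)
print(expected_shape("Smoker", [], demo_nodes))  # (2, 1)
```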
[63]:
from pgmpy.readwrite import ProbModelXMLWriter, get_probmodel_data
To get the data that ProbModelXMLWriter needs in order to produce the file, use the get_probmodel_data function. This step is specific to the ProbModelXML format; for the other file formats, the model is passed directly to the corresponding Writer class.
[64]:
model_data = get_probmodel_data(model)
writer = ProbModelXMLWriter(model_data=model_data)
print(writer.__str__().decode('utf-8'))
<ProbModelXML formatVersion="1.0">
<ProbNet type="BayesianNetwork">
<Variables>
<Variable name="Bronchitis" role="chance" type="finiteStates">
<Coordinates x="698" y="181" />
<Property name="Relevance" value="7.0" />
<Property name="Title" value="B" />
<States>
<State name="no">
<AdditionalProperties />
</State>
<State name="yes">
<AdditionalProperties />
</State>
</States>
</Variable>
<Variable name="Dyspnea" role="chance" type="finiteStates">
<Coordinates x="533" y="321" />
<Property name="Relevance" value="7.0" />
<Property name="Title" value="D" />
<States>
<State name="no">
<AdditionalProperties />
</State>
<State name="yes">
<AdditionalProperties />
</State>
</States>
</Variable>
<Variable name="LungCancer" role="chance" type="finiteStates">
<Coordinates x="421" y="152" />
<Property name="Relevance" value="7.0" />
<Property name="Title" value="L" />
<States>
<State name="no">
<AdditionalProperties />
</State>
<State name="yes">
<AdditionalProperties />
</State>
</States>
</Variable>
<Variable name="Smoker" role="chance" type="finiteStates">
<Coordinates x="568" y="52" />
<Property name="Relevance" value="7.0" />
<Property name="Title" value="S" />
<States>
<State name="no">
<AdditionalProperties />
</State>
<State name="yes">
<AdditionalProperties />
</State>
</States>
</Variable>
<Variable name="Tuberculosis" role="chance" type="finiteStates">
<Coordinates x="201" y="150" />
<Property name="Relevance" value="7.0" />
<Property name="Title" value="T" />
<States>
<State name="no">
<AdditionalProperties />
</State>
<State name="yes">
<AdditionalProperties />
</State>
</States>
</Variable>
<Variable name="TuberculosisOrCancer" role="chance" type="finiteStates">
<Coordinates x="336" y="238" />
<Property name="Relevance" value="7.0" />
<Property name="Title" value="E" />
<States>
<State name="no">
<AdditionalProperties />
</State>
<State name="yes">
<AdditionalProperties />
</State>
</States>
</Variable>
<Variable name="VisitToAsia" role="chance" type="finiteStates">
<Coordinates x="290" y="58" />
<Property name="Relevance" value="7.0" />
<Property name="Title" value="A" />
<States>
<State name="no">
<AdditionalProperties />
</State>
<State name="yes">
<AdditionalProperties />
</State>
</States>
</Variable>
<Variable name="X-ray" role="chance" type="finiteStates">
<Comment>Indica si el test de rayos X ha sido positivo</Comment>
<Coordinates x="252" y="322" />
<Property name="Relevance" value="7.0" />
<Property name="Title" value="X" />
<States>
<State name="no">
<AdditionalProperties />
</State>
<State name="yes">
<AdditionalProperties />
</State>
</States>
</Variable>
</Variables>
<Links>
<Link directed="true" var1="Bronchitis" var2="Dyspnea">
<AdditionalProperties />
</Link>
<Link directed="true" var1="LungCancer" var2="TuberculosisOrCancer">
<AdditionalProperties />
</Link>
<Link directed="true" var1="Smoker" var2="Bronchitis">
<AdditionalProperties />
</Link>
<Link directed="true" var1="Smoker" var2="LungCancer">
<AdditionalProperties />
</Link>
<Link directed="true" var1="Tuberculosis" var2="TuberculosisOrCancer">
<AdditionalProperties />
</Link>
<Link directed="true" var1="TuberculosisOrCancer" var2="Dyspnea">
<AdditionalProperties />
</Link>
<Link directed="true" var1="TuberculosisOrCancer" var2="X-ray">
<AdditionalProperties />
</Link>
<Link directed="true" var1="VisitToAsia" var2="Tuberculosis">
<AdditionalProperties />
</Link>
</Links>
<Potentials>
<Potential role="conditionalProbability" type="Table">
<Variables>
<Variable name="X-ray" />
<Variable name="TuberculosisOrCancer" />
</Variables>
<Values>0.95 0.05 0.02 0.98 </Values>
</Potential>
<Potential role="conditionalProbability" type="Table">
<Variables>
<Variable name="Bronchitis" />
<Variable name="Smoker" />
</Variables>
<Values>0.7 0.3 0.4 0.6 </Values>
</Potential>
<Potential role="conditionalProbability" type="Table">
<Variables>
<Variable name="Dyspnea" />
<Variable name="Bronchitis" />
<Variable name="TuberculosisOrCancer" />
</Variables>
<Values>0.9 0.1 0.3 0.7 0.2 0.8 0.1 0.9 </Values>
</Potential>
<Potential role="conditionalProbability" type="Table">
<Variables>
<Variable name="VisitToAsia" />
</Variables>
<Values>0.99 0.01 </Values>
</Potential>
<Potential role="conditionalProbability" type="Table">
<Variables>
<Variable name="Smoker" />
</Variables>
<Values>0.5 0.5 </Values>
</Potential>
<Potential role="conditionalProbability" type="Table">
<Variables>
<Variable name="LungCancer" />
<Variable name="Smoker" />
</Variables>
<Values>0.99 0.01 0.9 0.1 </Values>
</Potential>
<Potential role="conditionalProbability" type="Table">
<Variables>
<Variable name="Tuberculosis" />
<Variable name="VisitToAsia" />
</Variables>
<Values>0.99 0.01 0.95 0.05 </Values>
</Potential>
<Potential role="conditionalProbability" type="Table">
<Variables>
<Variable name="TuberculosisOrCancer" />
<Variable name="LungCancer" />
<Variable name="Tuberculosis" />
</Variables>
<Values>1.0 0.0 0.0 1.0 0.0 1.0 0.0 1.0 </Values>
</Potential>
</Potentials>
<AdditionalConstraints />
<AdditionalProperties />
<DecisionCriteria />
</ProbNet>
</ProbModelXML>
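Each Potential element stores its table as a flat, whitespace-separated Values list. In this output the lists match the rows of the numpy arrays in cpds, read left to right and top to bottom. A standard-library sketch that reads one potential back and regroups the numbers into rows, assuming that row-major layout and a known cardinality for the first variable (read_potential is a hypothetical helper, not part of pgmpy):

```python
import xml.etree.ElementTree as ET
from typing import List, Tuple

# One <Potential> element copied from the writer output above.
POTENTIAL = """
<Potential role="conditionalProbability" type="Table">
  <Variables>
    <Variable name="LungCancer" />
    <Variable name="Smoker" />
  </Variables>
  <Values>0.99 0.01 0.9 0.1 </Values>
</Potential>
"""


def read_potential(xml_text: str, card: int) -> Tuple[List[str], List[List[float]]]:
    """Return the variable names and the values split into `card` rows.

    Assumes the flat <Values> list is row-major, as in the output above.
    """
    potential = ET.fromstring(xml_text.strip())
    names = [v.get("name") for v in potential.find("Variables")]
    flat = [float(x) for x in potential.find("Values").text.split()]
    if len(flat) % card:
        raise ValueError("number of values is not a multiple of the cardinality")
    width = len(flat) // card
    rows = [flat[i * width:(i + 1) * width] for i in range(card)]
    return names, rows


names, rows = read_potential(POTENTIAL, card=2)
print(names)  # ['LungCancer', 'Smoker']
print(rows)   # [[0.99, 0.01], [0.9, 0.1]]
```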
To write the XML data to a file, use the writer’s write_file method.
[65]:
writer.write_file('probmodelxml.pgmx')
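To sanity-check the written file without going through pgmpy, Python's xml.etree.ElementTree can list its links. A minimal sketch, assuming the element layout shown in the writer output (Link elements with var1 and var2 attributes); read_links is a hypothetical helper, and the example parses an in-memory excerpt rather than probmodelxml.pgmx so that it runs on its own:

```python
import io
import xml.etree.ElementTree as ET
from typing import List, Tuple


def read_links(source) -> List[Tuple[str, str]]:
    """List (var1, var2) pairs from the Link elements of a ProbModelXML document.

    `source` can be a path such as 'probmodelxml.pgmx' or a file object.
    """
    root = ET.parse(source).getroot()
    return [(link.get("var1"), link.get("var2")) for link in root.iter("Link")]


# Self-contained check on an excerpt of the output instead of a real file.
excerpt = io.StringIO(
    '<ProbModelXML formatVersion="1.0"><ProbNet type="BayesianNetwork"><Links>'
    '<Link directed="true" var1="Smoker" var2="LungCancer" />'
    '<Link directed="true" var1="VisitToAsia" var2="Tuberculosis" />'
    '</Links></ProbNet></ProbModelXML>'
)
links = read_links(excerpt)
print(links)
```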
General workflow of the readwrite module
pgmpy.readwrite.[fileformat]Reader is the class for reading a given file format; replace [fileformat] with the format you want to read from. Each reader class defines methods for parsing the file. For example, the reader for the XMLBeliefNetwork format is XBNReader, and its methods include the following:
[66]:
from pgmpy.readwrite.XMLBeliefNetwork import XBNReader
reader = XBNReader('../files/xmlbelief.xml')
get_model: returns an instance of the corresponding model, e.g. a BayesianModel for the XMLBeliefNetwork format.
[67]:
model = reader.get_model()
print(model.nodes())
print(model.edges())
['a', 'b', 'c', 'd', 'e']
[('a', 'b'), ('a', 'c'), ('b', 'd'), ('c', 'd'), ('c', 'e')]
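Each node's CPD is conditioned on its parents in this edge list. A plain-Python sketch that groups the printed edges by child (parents_of is a hypothetical helper); for d it yields b and c, the same CONDSET used for d in the writer example that follows:

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def parents_of(edges: Iterable[Tuple[str, str]]) -> Dict[str, List[str]]:
    """Map each child node to its parents, in the order the edges are listed."""
    parents: Dict[str, List[str]] = defaultdict(list)
    for parent, child in edges:
        parents[child].append(parent)
    return dict(parents)


# Edge list printed above for the XMLBeliefNetwork example.
xbn_edges = [('a', 'b'), ('a', 'c'), ('b', 'd'), ('c', 'd'), ('c', 'e')]
print(parents_of(xbn_edges))
# {'b': ['a'], 'c': ['a'], 'd': ['b', 'c'], 'e': ['c']}
```

Root nodes such as a do not appear as keys, since they have no incoming edges.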
pgmpy.readwrite.[fileformat]Writer is the class for writing a model to a given file format; replace [fileformat] with the format you want to write to. It takes a model as an argument, which can be an instance of BayesianModel or MarkovModel. Each writer class defines methods that set the contents of the new file from the given model. For example, XBNWriter (for the XMLBeliefNetwork format) defines methods such as set_analysisnotebook that help set up the network data.
[53]:
from pgmpy.models import BayesianModel
from pgmpy.factors.discrete import TabularCPD
import numpy as np

# Node attributes in the XMLBeliefNetwork style.
nodes = {'c': {'STATES': ['Present', 'Absent'],
               'DESCRIPTION': '(c) Brain Tumor',
               'YPOS': '11935',
               'XPOS': '15250',
               'TYPE': 'discrete'},
         'a': {'STATES': ['Present', 'Absent'],
               'DESCRIPTION': '(a) Metastatic Cancer',
               'YPOS': '10465',
               'XPOS': '13495',
               'TYPE': 'discrete'},
         'b': {'STATES': ['Present', 'Absent'],
               'DESCRIPTION': '(b) Serum Calcium Increase',
               'YPOS': '11965',
               'XPOS': '11290',
               'TYPE': 'discrete'},
         'e': {'STATES': ['Present', 'Absent'],
               'DESCRIPTION': '(e) Papilledema',
               'YPOS': '13240',
               'XPOS': '17305',
               'TYPE': 'discrete'},
         'd': {'STATES': ['Present', 'Absent'],
               'DESCRIPTION': '(d) Coma',
               'YPOS': '12985',
               'XPOS': '13960',
               'TYPE': 'discrete'}}

model = BayesianModel([('b', 'd'), ('a', 'b'), ('a', 'c'), ('c', 'd'), ('c', 'e')])

# DPIS: one row per configuration of the parents (CONDSET), one column per state.
cpd_distribution = {
    'a': {'TYPE': 'discrete', 'DPIS': np.array([[0.2, 0.8]])},
    'e': {'TYPE': 'discrete', 'DPIS': np.array([[0.8, 0.2], [0.6, 0.4]]),
          'CONDSET': ['c'], 'CARDINALITY': [2]},
    'b': {'TYPE': 'discrete', 'DPIS': np.array([[0.8, 0.2], [0.2, 0.8]]),
          'CONDSET': ['a'], 'CARDINALITY': [2]},
    'c': {'TYPE': 'discrete', 'DPIS': np.array([[0.2, 0.8], [0.05, 0.95]]),
          'CONDSET': ['a'], 'CARDINALITY': [2]},
    'd': {'TYPE': 'discrete',
          'DPIS': np.array([[0.8, 0.2],
                            [0.9, 0.1],
                            [0.7, 0.3],
                            [0.05, 0.95]]),
          'CONDSET': ['b', 'c'], 'CARDINALITY': [2, 2]}}

tabular_cpds = []
for var, values in cpd_distribution.items():
    evidence = values.get('CONDSET', [])
    evidence_card = values.get('CARDINALITY', [])
    # TabularCPD expects one row per state of `var` and one column per parent
    # configuration, so the DPIS table is transposed.
    cpd = values['DPIS'].T
    states = nodes[var]['STATES']
    cpd = TabularCPD(var, len(states), cpd,
                     evidence=evidence,
                     evidence_card=evidence_card)
    tabular_cpds.append(cpd)
model.add_cpds(*tabular_cpds)

# Attach the node attributes so the writer can include them in the file.
for var, properties in nodes.items():
    for key, value in properties.items():
        model.nodes[var][key] = value
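In this XMLBeliefNetwork-style data, each DPIS row gives the probabilities of the variable's states for one configuration of its parents (each row sums to 1), while TabularCPD expects one row per state of the variable. A plain-Python sketch of the conversion for d (dpis_to_cpd_rows is a hypothetical helper; with numpy arrays the same step is the .T transpose):

```python
from typing import List


def dpis_to_cpd_rows(dpis: List[List[float]]) -> List[List[float]]:
    """Transpose DPIS rows (one per parent configuration) into
    CPD rows (one per state of the variable)."""
    return [list(column) for column in zip(*dpis)]


# DPIS for 'd' from the example above: parents (b, c), states (Present, Absent).
d_dpis = [[0.8, 0.2],
          [0.9, 0.1],
          [0.7, 0.3],
          [0.05, 0.95]]
d_rows = dpis_to_cpd_rows(d_dpis)
print(d_rows)  # [[0.8, 0.9, 0.7, 0.05], [0.2, 0.1, 0.3, 0.95]]

# After transposing, every column (parent configuration) should sum to 1.
column_sums = [sum(col) for col in zip(*d_rows)]
print(all(abs(s - 1.0) < 1e-9 for s in column_sums))  # True
```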
[54]:
from pgmpy.readwrite.XMLBeliefNetwork import XBNWriter
writer = XBNWriter(model=model)