Data can be locally loaded/saved when you use JavaBayes as an application. Note that applets cannot load/save data (they are forbidden by the browsers)!
Applications and applets can read Bayesian networks through the Internet; this opens the possibility of using JavaBayes to help process and organize the huge amounts of data and knowledge on the Internet.
This section contains a detailed description of the formats that can be manipulated by JavaBayes. If you have no interest in this kind of information (that is, if you are not reading or writing files for JavaBayes), you can skip this section entirely.
There are three different formats, and all three are supported by JavaBayes in the sense that JavaBayes can read files written in them.
The Bayesian Interchange Format version 0.1 (BIF 0.1) is a simple format that has been successfully used to represent a variety of networks. But BIF 0.1 had certain problems, and it has been replaced by BIF version 0.15. BIF 0.15 is a more mature format and should work for most applications.
XMLBIF 0.3 is an experimental format, based on the new XML specification. The best way to understand it is to read about BIF 0.15, then read something about XML, then read the description of XMLBIF 0.3.
Because BIF 0.15 supersedes BIF 0.1, JavaBayes no longer saves files in BIF 0.1. You can choose between XMLBIF 0.3 and BIF 0.15 in the Options menu.
Note that no format supports Noisy functions (since JavaBayes does not support those functions yet). The BIF formats also use the general concept of a property; implementations of the BIF format can use specific properties. JavaBayes handles some properties, such as observed, explanation and credal-set, which are explained later on.
For files, any extension is possible, but the extension bif is recommended for BIF 0.15, and the extension xml is tentatively used for XMLBIF 0.3.
It is important to understand how the JavaBayes formats handle the specification of probability values. All distributions are specified as arrays of real numbers, and the meaning of the numbers depends on the definition of the distribution. Note that the same representation is used in internal arrays to store and manipulate probability values.
The distribution p(f) in the example above can be specified as follows:
A more complicated example would be a function p(A|B,C) where A has 3 values, B has 2 values and C has 4 values. The function is represented as:
IMPORTANT: Notice that there is some redundancy in the values, because all probability functions must add up to one. Right now the BayesianNetworks package does not attempt to fill blanks or ensure consistency; the user has to provide the data in the correct format (it has to have the correct number of values, has to add to one, etc).
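Since the package does not check consistency, it can help to validate an array before writing it to a file. The following Python sketch is not part of JavaBayes: the function name check_distribution and the tolerance are illustrative assumptions, and it assumes the flat layout used by the table attribute of the BIF formats (the first variable changes slowest, the last one fastest).

```python
def check_distribution(values, sizes, tol=1e-9):
    """Check a flat array representing p(X | parents).

    sizes[0] is the number of values of X; the remaining entries are the
    numbers of values of the conditioning variables, in declaration order.
    Assumed layout: first variable slowest, last variable fastest.
    Returns the expected number of values, or raises ValueError.
    """
    n_child = sizes[0]
    n_configs = 1
    for size in sizes[1:]:
        n_configs *= size
    expected = n_child * n_configs
    if len(values) != expected:
        raise ValueError(f"expected {expected} values, got {len(values)}")
    # For each configuration of the conditioning variables, the values of X
    # are n_configs positions apart in the flat array.
    for j in range(n_configs):
        total = sum(values[i * n_configs + j] for i in range(n_child))
        if abs(total - 1.0) > tol:
            raise ValueError(f"configuration {j} adds to {total}, not 1")
    return expected


# p(A|B,C) with 3, 2 and 4 values needs 3 * 2 * 4 = 24 numbers.
check_distribution([1.0 / 3.0] * 24, [3, 2, 4])
```

A call with the wrong number of values, or with a column that does not add to one, raises ValueError instead of silently producing an inconsistent network.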
White spaces, tabs and newlines are ignored; the C/C++ style of comments is adopted. The ``,'' character is also ignored when it occurs between tokens.
The basic unit of information is a block: a piece of text which starts with a keyword and ends with the end of an attribute list (to be explained later). Arbitrary characters are allowed between blocks. This allows the user to insert arbitrarily long comments outside the blocks. It also allows user-specific blocks and commands to be placed outside the standard blocks.
Other than blocks, BIF 0.15 refers to three entities: words, non-negative integers and non-negative reals.
A word is a contiguous sequence of characters, with the restriction that the first character be a letter. Characters are letters, digits, the underscore symbol (_) and the dash symbol (-).
A non-negative integer is a sequence of numeric characters; a non-negative real is a sequence of numeric characters containing a decimal point or an exponent or both.
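As an illustration of these lexical rules, here is a minimal Python sketch that classifies a single token. It is not the JavaBayes parser; the function name classify, the exact exponent syntax (e or E with an optional sign) and the treatment of a bare digit sequence as an integer are assumptions made for the example.

```python
import re

# A word starts with a letter, followed by letters, digits, "_" or "-".
WORD = re.compile(r"[A-Za-z][A-Za-z0-9_-]*")
# A non-negative integer is a plain run of digits.
INTEGER = re.compile(r"[0-9]+")
# A non-negative real has a decimal point, an exponent, or both
# (the exponent syntax is an assumption).
REAL = re.compile(r"(?:[0-9]+\.[0-9]*|\.[0-9]+|[0-9]+)(?:[eE][+-]?[0-9]+)?")


def classify(token):
    """Return 'word', 'integer', 'real' or 'unknown' for one token."""
    if WORD.fullmatch(token):
        return "word"
    if INTEGER.fullmatch(token):
        return "integer"
    if REAL.fullmatch(token):
        return "real"
    return "unknown"
```

For instance, classify("light-on") gives "word" and classify("0.999") gives "real", while a token such as "2nd" is rejected because a word cannot start with a digit.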
A block is a unit of information. The general format of a block is:
    block-type block-name {
        attribute-name attribute-value;
        attribute-name attribute-value;
        attribute-name attribute-value;
    }
with as many attributes as necessary. The closing semicolon is mandatory after each attribute.
There are three possible blocks: network, variable and probability blocks.
    network "Robot-Planning" {
        property version 1.1;
        property author Nobody;
    }
    variable Leg {
        type discrete[2] { long, short };
        property temporary yes;
    }
    probability ( "Leg" | "Arm" ) {
        table 0.1 0.9 0.9 0.1;
    }
The blocks must be placed in the following order:
Several attributes are defined at this point: property, type, table, default and entry attributes (the entry attribute is not associated with any keyword).
The attribute property can appear in all types of blocks. A property is just a string of arbitrary text to be associated with a block. Examples of properties:
    property "size 12";
    property "name Trial number ten";
Any text is valid in the string following the keyword property. The idea is to store information that is specific to a particular system or network in the properties. Any number of property attributes can appear in a block.
The type attribute is specific to variable blocks. It lists the values of a discrete variable:
    type discrete[ number-of-values ] { list-of-values };
The number-of-values token is a non-negative integer which indicates how many different values this variable may assume (the size of the list-of-values). The list-of-values is a sequence of words, each one the name of a variable value.
There are attributes that are specific to probability blocks (these attributes are discussed in the next section):
JavaBayes uses a number of properties to load and save information about Bayesian networks:
Suppose a network contains the variable
    variable "light-on" { //2 values
        type discrete[2] { "true" "false" };
        property "position = (218, 195)" ;
    }
and you want to indicate that variable light-on is observed with value true (i.e., light-on = true is the evidence). You do this with the observed property:
    variable "light-on" { //2 values
        property "observed true";
        type discrete[2] { "true" "false" };
        property "position = (218, 195)" ;
    }
You can set as many variables as you want as observed; the syntax is simple:
    property "observed observed-value";
Suppose a network contains the variable
    variable "light-on" { //2 values
        type discrete[2] { "true" "false" };
        property "position = (218, 195)" ;
    }
and you want to indicate that variable light-on is to be estimated. You can set light-on as an explanation variable, i.e., a variable which will be estimated. Marking a variable as explanatory means that you would like to know which value of the variable produces the highest probability or expectation. It is not necessarily true that you can operate on the variable and change it at will; it is just that you want to know which value would be best in the face of evidence. You set explanatory variables with the explanation property:
    variable "light-on" { //2 values
        property "explanation";
        type discrete[2] { "true" "false" };
        property "position = (218, 195)" ;
    }
If you request JavaBayes to produce the ``best'' configuration for the explanation variables, JavaBayes will only process the variables that are marked through an explanation property. You can set as many variables as you want as explanation variables; the syntax is simple:
    property "explanation";
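A program that reads BIF 0.15 files has to pick these special properties out of the free-form property strings. The Python sketch below shows one way to do that once the property strings of a variable have been extracted; the function name interpret_properties and the returned dictionary layout are assumptions for illustration, not the JavaBayes implementation.

```python
def interpret_properties(properties):
    """Separate the observed and explanation properties from the rest.

    properties is a list of property strings (without the surrounding
    quotes), e.g. ["observed true", "position = (218, 195)"].
    """
    info = {"observed": None, "explanation": False, "other": []}
    for text in properties:
        words = text.split()
        if len(words) == 2 and words[0] == "observed":
            info["observed"] = words[1]      # value given as evidence
        elif words == ["explanation"]:
            info["explanation"] = True       # variable to be estimated
        else:
            info["other"].append(text)       # system-specific property
    return info


light_on = interpret_properties(["observed true", "position = (218, 195)"])
```

Here light_on["observed"] holds the evidence value, and the position property is kept untouched among the other properties.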
There are also properties that are related to robustness analysis in JavaBayes. Since robustness analysis is still an ongoing research project, the support for it is minimal. If you want to use robustness analysis now, please send me email. The properties related to robustness analysis always start with the keyword credal-set; if you are defining your own properties, please do not use this keyword.
Probability blocks are used to define the actual network topology and conditional probability tables.
An example of a standard probability block is:
    probability("GasGauge" | "Gas", "BatteryPower") {
        ("yes", "high") 0.999 0.001;
        ("yes", "low") 0.850 0.150;
        ("yes", "medium") 0.000 1.000;
        ("no", "high") 0.000 1.000;
        ("no", "low") 0.000 1.000;
        ("no", "medium") 0.000 1.000;
    }
As explained before, the symbol ``,'' is ignored between tokens, so it does not affect the list of variables given after the keyword probability. The variables, however, must be enclosed in parentheses.
The example above uses the entry attribute, which is different from the other attributes in that it has no keyword. It simply starts with an opening parenthesis, and has a list of values for all the conditioning variables. After the closing parenthesis, a list of probability values for the first variable is given (the numbers should add to 1, but this is not enforced when the file is read).
The probability vectors can be listed in any order, since the names in parentheses uniquely identify the parent instantiation.
In addition to the entry attribute, BIF 0.15 supports the concept of a default entry, which applies to every configuration of the conditioning variables that is not listed explicitly. So the above CPT could have been specified equivalently as:
    probability("GasGauge" | "Gas", "BatteryPower") {
        default 0.000 1.000;
        ("yes", "high") 0.999 0.001;
        ("yes", "low") 0.850 0.150;
    }
Note that each number is a separate token, so we can use ``,'' between numbers.
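The effect of a default entry can be sketched in a few lines of Python. This is an illustration only: the function name expand_entries is hypothetical, and the value orders (yes, no) and (high, low, medium) are assumed from the order in which the values appear in the six-entry example.

```python
from itertools import product


def expand_entries(parent_values, default, entries):
    """Fill in every parent configuration, using default where none is given.

    parent_values: one list of value names per conditioning variable.
    default: probability vector used for unlisted configurations.
    entries: dict mapping a tuple of parent values to a probability vector.
    """
    full = {}
    for config in product(*parent_values):
        full[config] = list(entries.get(config, default))
    return full


gas = ["yes", "no"]
battery = ["high", "low", "medium"]
cpt = expand_entries(
    [gas, battery],
    [0.0, 1.0],
    {("yes", "high"): [0.999, 0.001], ("yes", "low"): [0.850, 0.150]},
)
```

The resulting dictionary has all six configurations: the two listed explicitly keep their own vectors, and the remaining four receive the default vector.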
Another way to define a probability distribution is through the table attribute. The body of this attribute is a sequence of non-negative real numbers, in the counting order of the declared variables (if all variables were binary, we would say binary counting with the least significant digit on the right). So, for the example above, we could simply say:
    probability("GasGauge" | "Gas", "BatteryPower") {
        table 0.999 0.850 0.0 0.0 0.0 0.0
              0.001 0.15 1.0 1.0 1.0 1.0;
    }
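To make the counting order concrete, the following Python sketch converts the entry form of the GasGauge example into the flat table shown above. The function name entries_to_table is hypothetical, and the value orders (yes, no) and (high, low, medium) are assumptions taken from the order of the entries.

```python
from itertools import product


def entries_to_table(child_size, parent_values, entries):
    """Flatten entries into table order.

    The first declared variable (the child) changes slowest and the last
    conditioning variable changes fastest.
    """
    table = []
    for i in range(child_size):
        # product() varies its last argument fastest, which matches the
        # "least significant digit on the right" rule.
        for config in product(*parent_values):
            table.append(entries[config][i])
    return table


entries = {
    ("yes", "high"): [0.999, 0.001],
    ("yes", "low"): [0.850, 0.150],
    ("yes", "medium"): [0.0, 1.0],
    ("no", "high"): [0.0, 1.0],
    ("no", "low"): [0.0, 1.0],
    ("no", "medium"): [0.0, 1.0],
}
table = entries_to_table(2, [["yes", "no"], ["high", "low", "medium"]], entries)
```

The first six numbers of the result are the probabilities of the first GasGauge value for each parent configuration, and the last six are those of the second value.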
There are some subtle rules that regulate these declarations.
Here are some of the available examples:
Here is the dog-problem.bif network:
    // Bayesian Network in the Interchange Format
    // Produced by BayesianNetworks package in JavaBayes
    // Output created Sun Nov 02 17:49:49 GMT+00:00 1997
    // Bayesian network
    network "Dog-Problem" { //5 variables and 5 probability distributions
        property "credal-set constant-density-bounded 1.1" ;
    }
    variable "light-on" { //2 values
        type discrete[2] { "true" "false" };
        property "position = (218, 195)" ;
    }
    variable "bowel-problem" { //2 values
        type discrete[2] { "true" "false" };
        property "position = (335, 99)" ;
    }
    variable "dog-out" { //2 values
        type discrete[2] { "true" "false" };
        property "position = (300, 195)" ;
    }
    variable "hear-bark" { //2 values
        type discrete[2] { "true" "false" };
        property "position = (296, 268)" ;
    }
    variable "family-out" { //2 values
        type discrete[2] { "true" "false" };
        property "position = (257, 99)" ;
    }
    probability ( "light-on" "family-out" ) { //2 variable(s) and 4 values
        table 0.6 0.05 0.4 0.95 ;
    }
    probability ( "bowel-problem" ) { //1 variable(s) and 2 values
        table 0.01 0.99 ;
    }
    probability ( "dog-out" "bowel-problem" "family-out" ) { //3 variable(s) and 8 values
        table 0.99 0.97 0.9 0.3 0.01 0.03 0.1 0.7 ;
    }
    probability ( "hear-bark" "dog-out" ) { //2 variable(s) and 4 values
        table 0.7 0.01 0.3 0.99 ;
    }
    probability ( "family-out" ) { //1 variable(s) and 2 values
        table 0.15 0.85 ;
    }
White spaces, tabs and newlines are ignored; the C/C++ style of comments is adopted. Two other characters are also ignored when they occur between tokens: ``,'' and ``|''. These characters can be used to separate variables in the definition of a probability distribution.
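These lexical conventions can be illustrated with a small tokenizer. The Python sketch below is a simplification and not the JavaBayes parser: the function name tokenize is hypothetical, quoted strings are not treated specially, and text between blocks is not skipped.

```python
import re


def tokenize(text):
    """Split BIF 0.1 text into tokens, dropping comments and separators."""
    # Remove C-style /* ... */ comments, then C++-style // comments.
    text = re.sub(r"/\*.*?\*/", " ", text, flags=re.DOTALL)
    text = re.sub(r"//[^\n]*", " ", text)
    # "," and "|" are ignored between tokens, so treat them as blanks.
    text = re.sub(r"[,|]", " ", text)
    # Punctuation characters are tokens of their own; everything else is
    # split on white space.
    return re.findall(r"[{}()\[\];]|[^\s{}()\[\];]+", text)


tokens = tokenize("probability ( Leg | Arm ) { // cpt\n table 0.1, 0.9 0.9 0.1; }")
```

After this step the variable list ( Leg | Arm ) and the list ( Leg Arm ) yield exactly the same tokens, which is why both spellings are accepted.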
The basic unit of information is a block: a piece of text which starts with a keyword and ends with the end of an attribute list (to be explained later). Arbitrary characters are allowed between blocks. This allows the user to insert arbitrarily long comments outside the blocks. It also allows user-specific blocks and commands to be placed outside the standard blocks.
Other than blocks, BIF 0.1 refers to three entities: words, non-negative integers and non-negative reals.
A word is a contiguous sequence of characters, with the restriction that the first character be a letter. Characters are letters, digits, the underscore symbol (_) and the dash symbol (-).
A non-negative integer is a sequence of numeric characters; a non-negative real is a sequence of numeric characters containing a decimal point or an exponent or both.
A block is a unit of information. The general format of a block is:
    block-type block-name {
        attribute-name attribute-value;
        attribute-name attribute-value;
        attribute-name attribute-value;
    }
with as many attributes as necessary. The closing semicolon is mandatory after each attribute.
There are three possible blocks: network, variable and probability blocks.
    network Robot-Planning {
        property version 1.1;
        property author Nobody;
    }
    variable Leg {
        type discrete[2] { long, short };
        property temporary yes;
    }
    probability ( Leg | Arm ) {
        table 0.1 0.9 0.9 0.1;
    }
The blocks must be placed in the following order:
Several attributes are defined at this point: property, type, table, default and entry attributes (the entry attribute is not associated with any keyword).
The attribute property can appear in all types of blocks. A property is just a string of arbitrary text to be associated with a block. Examples of properties:
    property size 12;
    property name "Trial number ten";
Any text is valid between the keyword property and the ending semicolon. The idea is to store information that is specific to a particular system or network in the properties. Any number of property attributes can appear in a block.
The type attribute is specific to variable blocks. It lists the values of a discrete variable:
    type discrete[ number-of-values ] { list-of-values };
The number-of-values token is a non-negative integer which indicates how many different values this variable may assume (the size of the list-of-values). The list-of-values is a sequence of words, each one the name of a variable value.
There are attributes that are specific to probability blocks (these attributes are discussed in the next section):
JavaBayes uses a number of properties to load and save information about Bayesian networks:
Suppose a network contains the variable
    variable light-on { //2 values
        type discrete[2] { true false };
        property position = (218, 195) ;
    }
and you want to indicate that variable light-on is observed with value true (i.e., light-on = true is the evidence). You do this with the observed property:
    variable light-on { //2 values
        property observed true;
        type discrete[2] { true false };
        property position = (218, 195) ;
    }
You can set as many variables as you want as observed; the syntax is simple:
    property observed [ observed-value ];
Suppose a network contains the variable
    variable light-on { //2 values
        type discrete[2] { true false };
        property position = (218, 195) ;
    }
and you want to indicate that variable light-on is to be estimated. You can set light-on as an explanation variable, i.e., a variable which will be estimated. Marking a variable as explanatory means that you would like to know which value of the variable produces the highest probability or expectation. It is not necessarily true that you can operate on the variable and change it at will; it is just that you want to know which value would be best in the face of evidence. You set explanatory variables with the explanation property:
    variable light-on { //2 values
        property explanation;
        type discrete[2] { true false };
        property position = (218, 195) ;
    }
If you request JavaBayes to produce the ``best'' configuration for the explanation variables, JavaBayes will only process the variables that are marked through an explanation property. You can set as many variables as you want as explanation variables; the syntax is simple:
    property explanation;
There are also properties that are related to robustness analysis in JavaBayes. Since robustness analysis is still an ongoing research project, the support for it is minimal. If you want to use robustness analysis now, please send me email. The properties related to robustness analysis always start with the keyword credal-set; if you are defining your own properties, please do not use this keyword.
Probability blocks are used to define the actual network topology and conditional probability tables.
An example of a standard probability block is:
    probability(GasGauge | Gas, BatteryPower) {
        (yes, high) 0.999 0.001;
        (yes, low) 0.850 0.150;
        (yes, medium) 0.000 1.000;
        (no, high) 0.000 1.000;
        (no, low) 0.000 1.000;
        (no, medium) 0.000 1.000;
    }
As explained before, the symbols ``|'' and ``,'' are ignored between tokens, so they do not affect the list of variables given after the keyword probability. The variables, however, must be enclosed in parentheses.
The example above uses the entry attribute, which is different from the other attributes in that it has no keyword. It simply starts with an opening parenthesis, and has a list of values for all the conditioning variables. After the closing parenthesis, a list of probability values for the first variable is given (the numbers should add to 1, but this is not enforced when the file is read).
The probability vectors can be listed in any order, since the names in parentheses uniquely identify the parent instantiation.
In addition to the entry attribute, BIF 0.1 supports the concept of a default entry, which applies to every configuration of the conditioning variables that is not listed explicitly. So the above CPT could have been specified equivalently as:
    probability(GasGauge | Gas, BatteryPower) {
        default 0.000 1.000;
        (yes, high) 0.999 0.001;
        (yes, low) 0.850 0.150;
    }
Note that each number is a separate token, so we can use ``,'' and ``|'' between numbers; these symbols are ignored.
Another way to define a probability distribution is through the table attribute. The body of this attribute is a sequence of non-negative real numbers, in the counting order of the declared variables (if all variables were binary, we would say binary counting with the least significant digit on the right). So, for the example above, we could simply say:
    probability(GasGauge | Gas, BatteryPower) {
        table 0.999 0.850 0.0 0.0 0.0 0.0
              0.001 0.15 1.0 1.0 1.0 1.0;
    }
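Reading a single probability back from such a table requires computing its position from the value indices. The Python sketch below illustrates the counting order; the function name table_index is hypothetical, and the index assignment (yes = 0, no = 1; high = 0, low = 1, medium = 2) is assumed from the order of the entries in the example.

```python
def table_index(sizes, indices):
    """Position in the flat table of one combination of value indices.

    sizes and indices follow the declaration order of the variables; the
    first variable is the most significant digit, the last one the least.
    """
    position = 0
    for size, index in zip(sizes, indices):
        if not 0 <= index < size:
            raise IndexError(f"index {index} out of range for size {size}")
        position = position * size + index
    return position


table = [0.999, 0.850, 0.0, 0.0, 0.0, 0.0, 0.001, 0.15, 1.0, 1.0, 1.0, 1.0]
sizes = [2, 2, 3]  # GasGauge, Gas, BatteryPower
# Second GasGauge value, Gas = yes, BatteryPower = low.
p = table[table_index(sizes, [1, 0, 1])]
```

The lookup lands on position 7 of the table, which holds the value 0.15 given in the (yes, low) entry.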
There are some subtle rules that regulate these declarations.
Here is the dog-problem.bif network in BIF 0.1:
    // Bayesian Network in the Interchange Format
    // Produced by BayesianNetworks package in JavaBayes
    // Output created Tue Feb 25 12:55:25 1997
    // Bayesian network
    network Internal-Network { //5 variables and 5 probability distributions
    }
    variable light-on { //2 values
        type discrete[2] { true false };
        property position = (218, 195) ;
    }
    variable bowel-problem { //2 values
        type discrete[2] { true false };
        property position = (335, 99) ;
    }
    variable dog-out { //2 values
        type discrete[2] { true false };
        property position = (300, 195) ;
    }
    variable hear-bark { //2 values
        type discrete[2] { true false };
        property position = (296, 268) ;
    }
    variable family-out { //2 values
        type discrete[2] { true false };
        property position = (257, 99) ;
    }
    probability ( light-on family-out ) { //2 variable(s) and 4 values
        table 0.6 0.05 0.4 0.95 ;
    }
    probability ( bowel-problem ) { //1 variable(s) and 2 values
        table 0.01 0.99 ;
    }
    probability ( dog-out bowel-problem family-out ) { //3 variable(s) and 8 values
        table 0.99 0.97 0.9 0.3 0.01 0.03 0.1 0.7 ;
    }
    probability ( hear-bark dog-out ) { //2 variable(s) and 4 values
        table 0.7 0.01 0.3 0.99 ;
    }
    probability ( family-out ) { //1 variable(s) and 2 values
        table 0.15 0.85 ;
    }
The XMLBIF format provides a different perspective for the storage and manipulation of Bayesian networks. Instead of focusing on a readable and simplified description of Bayesian networks, the XMLBIF format emphasizes ease of distribution through wide area networks. The XMLBIF format is defined through XML, a dialect of SGML that is used to specify formats. The advantage of XML is that it has industry-wide support, and many software developers plan to introduce parsers, search-engines, and browsers for XML. The power of XML is that it is a standard language for editing formats, and XMLBIF attempts to use XML to reduce to a minimum the burden of distributing graphical models to a large audience.
The XMLBIF format is actually quite similar to BIF 0.15, but it is stated in a manner that is XML-compliant. Note the similarity of XMLBIF to HTML; this happens because both HTML and XML are dialects of SGML.
White spaces, tabs and newlines are ignored. The XML style of comments and declarations is used to detect text that should be ignored: any character between <! and > is ignored. Note that XML comments should be enclosed by <!-- and -->.
The XMLBIF format is defined by a set of XML-compliant tags. Other than XML tags, XMLBIF 0.3 refers to three entities: words, non-negative integers and non-negative reals.
A word is a contiguous sequence of characters, with the restriction that the first character be a letter. Characters are letters, digits, the underscore symbol (_) and the dash symbol (-).
A non-negative integer is a sequence of numeric characters; a non-negative real is a sequence of numeric characters containing a decimal point or an exponent or both.
Note that every XML file starts with the expression <?xml version="1.0"?>, indicating the XML version. Other attributes and directives can be contained within this tag; for example, the tag <?xml version="1.0" encoding="US-ASCII"?> specifies the file encoding. This initial tag is followed by any XML definitions and statements that define the DTD for the document (the DTD is always optional in XML).
The first tag of an XMLBIF 0.3 file is the <BIF> tag; the last tag is the closing </BIF> tag. All the information about the model is contained between these tags. There are three basic units of information: networks, variables and probability densities.
A network is defined by its name, followed by a list of properties (optional), followed by a list of variables and probability densities. For example, a network may be defined as:
    <BIF VERSION="0.3">
    <NETWORK>
    <NAME>Dog-Problem</NAME>
    <PROPERTY>date Sunday, 19 July, 1998</PROPERTY>
    <PROPERTY>author John</PROPERTY>
    variables and probabilities go here
    </NETWORK>
    </BIF>
The VERSION attribute in the BIF tag is mandatory.
Variables are defined by their names, types and properties:
    <VARIABLE TYPE="chance">
        <NAME>light-on</NAME>
        <OUTCOME>true</OUTCOME>
        <OUTCOME>false</OUTCOME>
        <PROPERTY>position = (73, 165)</PROPERTY>
    </VARIABLE>
Conditional probability densities can be specified in various ways inside the DEFINITION tag. One example is:
    <DEFINITION>
        <FOR>hear-bark</FOR>
        <GIVEN>dog-out</GIVEN>
        <TABLE>0.7 0.01 0.3 0.99 </TABLE>
    </DEFINITION>
There is no mandatory order of variable and probability blocks.
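Because XMLBIF is plain XML, a generic XML parser is enough to read these tags. The following Python sketch uses the standard xml.etree.ElementTree module; the function name read_xmlbif and the shape of the returned data are assumptions for illustration and do not reflect how JavaBayes itself reads the file.

```python
import xml.etree.ElementTree as ET

SAMPLE = """<?xml version="1.0"?>
<BIF VERSION="0.3">
<NETWORK>
<NAME>Dog-Problem</NAME>
<VARIABLE TYPE="chance">
    <NAME>hear-bark</NAME>
    <OUTCOME>true</OUTCOME>
    <OUTCOME>false</OUTCOME>
</VARIABLE>
<VARIABLE TYPE="chance">
    <NAME>dog-out</NAME>
    <OUTCOME>true</OUTCOME>
    <OUTCOME>false</OUTCOME>
</VARIABLE>
<DEFINITION>
    <FOR>hear-bark</FOR>
    <GIVEN>dog-out</GIVEN>
    <TABLE>0.7 0.01 0.3 0.99 </TABLE>
</DEFINITION>
</NETWORK>
</BIF>"""


def read_xmlbif(text):
    """Return (network name, variables, definitions) from XMLBIF text."""
    network = ET.fromstring(text).find("NETWORK")
    variables = {}
    for var in network.findall("VARIABLE"):
        name = var.findtext("NAME").strip()
        variables[name] = [o.text.strip() for o in var.findall("OUTCOME")]
    definitions = {}
    for definition in network.findall("DEFINITION"):
        child = definition.findtext("FOR").strip()
        parents = [g.text.strip() for g in definition.findall("GIVEN")]
        table = [float(x) for x in definition.findtext("TABLE").split()]
        definitions[child] = (parents, table)
    return network.findtext("NAME").strip(), variables, definitions


name, variables, definitions = read_xmlbif(SAMPLE)
```

Since findall collects tags by name, the sketch works regardless of the order in which VARIABLE and DEFINITION blocks appear.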
A property is just a string of arbitrary text to be associated with a block. Examples of properties:
    <PROPERTY>size 12</PROPERTY>
    <PROPERTY>comment Trial number ten</PROPERTY>
Any text is valid in the string inside the PROPERTY opening and closing tags. The idea is to store information that is specific to a particular system or network in the properties. Any number of property attributes can appear in a block.
A variable is defined by a VARIABLE tag (with the TYPE attribute) that contains a NAME tag and one OUTCOME tag for each possible value:
    <VARIABLE TYPE="chance">
        <NAME>light-on</NAME>
        <OUTCOME>true</OUTCOME>
        <OUTCOME>false</OUTCOME>
        <PROPERTY>position = (73, 165)</PROPERTY>
    </VARIABLE>
Currently the content of a TYPE attribute must be one of the keywords ``chance'', ``decision'' or ``utility''.
The TABLE tag is specific to the DEFINITION block (note that a definition can be a probability distribution, a set of decision values or a set of utility values, depending on the TYPE attributes of the referred variable). DEFINITION blocks are used to define the actual network topology, by specifying conditional probability tables.
An example of a standard probability block is:
    <DEFINITION>
        <FOR>GasGauge</FOR>
        <GIVEN>BatteryPower</GIVEN>
        <GIVEN>GasInTank</GIVEN>
        <TABLE>1.0 0.0 0.2 0.0 0.0 1.0 0.8 1.0 </TABLE>
    </DEFINITION>
for a variable GasGauge that is defined with TYPE equal to ``chance''. The body of the TABLE tag is a sequence of non-negative real numbers, in the counting order of the declared variables (if all variables were binary, we would say binary counting with the least significant digit on the right). If multiple table declarations exist, only the last one is valid.
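The rule about multiple table declarations can be expressed in a few lines. This Python sketch, based on the standard xml.etree.ElementTree module, is an illustration only; the function name effective_table is hypothetical.

```python
import xml.etree.ElementTree as ET


def effective_table(definition_xml):
    """Return the numbers of the last TABLE in a DEFINITION, or None."""
    definition = ET.fromstring(definition_xml)
    tables = definition.findall("TABLE")  # in document order
    if not tables:
        return None
    # Only the last declaration counts; earlier ones are discarded.
    return [float(x) for x in tables[-1].text.split()]


values = effective_table(
    "<DEFINITION><FOR>hear-bark</FOR><GIVEN>dog-out</GIVEN>"
    "<TABLE>0.5 0.5 0.5 0.5</TABLE>"
    "<TABLE>0.7 0.01 0.3 0.99</TABLE></DEFINITION>"
)
```

In this usage example the first table is ignored and values holds the four numbers of the second one.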
JavaBayes uses a number of properties to load and save information about Bayesian networks:
Suppose a network contains the variable
    <VARIABLE TYPE="chance">
        <NAME>light-on</NAME>
        <OUTCOME>true</OUTCOME>
        <OUTCOME>false</OUTCOME>
        <PROPERTY>position = (73, 165)</PROPERTY>
    </VARIABLE>
and you want to indicate that variable light-on is observed with value true (i.e., light-on = true is the evidence). You do this with the observed property:
    <VARIABLE TYPE="chance">
        <NAME>light-on</NAME>
        <OUTCOME>true</OUTCOME>
        <OUTCOME>false</OUTCOME>
        <PROPERTY>observed true</PROPERTY>
        <PROPERTY>position = (73, 165)</PROPERTY>
    </VARIABLE>
You can set as many variables as you want as observed; the syntax is simple:
    <PROPERTY>observed (observed-value)</PROPERTY>
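A reader of XMLBIF files can recover the evidence by scanning the PROPERTY tags of each variable. The Python sketch below shows the idea with the standard xml.etree.ElementTree module; the function name observed_value is hypothetical, and rejecting a value that is not among the outcomes is an assumption of this sketch rather than documented JavaBayes behavior.

```python
import xml.etree.ElementTree as ET


def observed_value(variable_xml):
    """Return the observed outcome of a VARIABLE block, or None."""
    variable = ET.fromstring(variable_xml)
    outcomes = [o.text.strip() for o in variable.findall("OUTCOME")]
    for prop in variable.findall("PROPERTY"):
        words = (prop.text or "").split()
        if len(words) == 2 and words[0] == "observed":
            if words[1] not in outcomes:
                raise ValueError(f"unknown outcome: {words[1]}")
            return words[1]
    return None


evidence = observed_value(
    '<VARIABLE TYPE="chance"><NAME>light-on</NAME>'
    "<OUTCOME>true</OUTCOME><OUTCOME>false</OUTCOME>"
    "<PROPERTY>observed true</PROPERTY>"
    "<PROPERTY>position = (73, 165)</PROPERTY></VARIABLE>"
)
```

A variable without an observed property simply yields None, so the same function can be applied to every variable in the network.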
To mark light-on as an explanation variable (a variable to be estimated), use the explanation property:
    <VARIABLE TYPE="chance">
        <NAME>light-on</NAME>
        <OUTCOME>true</OUTCOME>
        <OUTCOME>false</OUTCOME>
        <PROPERTY>explanation</PROPERTY>
        <PROPERTY>position = (73, 165)</PROPERTY>
    </VARIABLE>
If you request JavaBayes to produce the ``best'' configuration for the explanation variables, JavaBayes will only process the variables that are marked through an explanation property.
There are also properties that are related to robustness analysis in JavaBayes. Since robustness analysis is still an ongoing research project, the support for it is minimal. If you want to use robustness analysis now, please send me email. The properties related to robustness analysis always start with the keyword credal-set; if you are defining your own properties, please do not use this keyword.
Here are some of the available examples:
Here is the dog-problem.xml network:
    <?xml version="1.0" encoding="US-ASCII"?>
    <!--
        Bayesian network in XMLBIF v0.3 (BayesNet Interchange Format)
        Produced by JavaBayes (http://www.cs.cmu.edu/~javabayes/
        Output created Wed Aug 12 21:16:40 GMT+01:00 1998
    -->
    <!-- DTD for the XMLBIF 0.3 format -->
    <!DOCTYPE BIF [
        <!ELEMENT BIF ( NETWORK )*>
            <!ATTLIST BIF VERSION CDATA #REQUIRED>
        <!ELEMENT NETWORK ( NAME, ( PROPERTY | VARIABLE | DEFINITION )* )>
        <!ELEMENT NAME (#PCDATA)>
        <!ELEMENT VARIABLE ( NAME, ( OUTCOME | PROPERTY )* ) >
            <!ATTLIST VARIABLE TYPE (chance|decision|utility) "chance">
        <!ELEMENT OUTCOME (#PCDATA)>
        <!ELEMENT DEFINITION ( FOR | GIVEN | TABLE | PROPERTY )* >
        <!ELEMENT FOR (#PCDATA)>
        <!ELEMENT GIVEN (#PCDATA)>
        <!ELEMENT TABLE (#PCDATA)>
        <!ELEMENT PROPERTY (#PCDATA)>
    ]>
    <BIF VERSION="0.3">
    <NETWORK>
    <NAME>Dog-Problem</NAME>
    <!-- Variables -->
    <VARIABLE TYPE="chance">
        <NAME>light-on</NAME>
        <OUTCOME>true</OUTCOME>
        <OUTCOME>false</OUTCOME>
        <PROPERTY>position = (73, 165)</PROPERTY>
    </VARIABLE>
    <VARIABLE TYPE="chance">
        <NAME>bowel-problem</NAME>
        <OUTCOME>true</OUTCOME>
        <OUTCOME>false</OUTCOME>
        <PROPERTY>position = (190, 69)</PROPERTY>
    </VARIABLE>
    <VARIABLE TYPE="chance">
        <NAME>dog-out</NAME>
        <OUTCOME>true</OUTCOME>
        <OUTCOME>false</OUTCOME>
        <PROPERTY>position = (155, 165)</PROPERTY>
    </VARIABLE>
    <VARIABLE TYPE="chance">
        <NAME>hear-bark</NAME>
        <OUTCOME>true</OUTCOME>
        <OUTCOME>false</OUTCOME>
        <PROPERTY>position = (154, 241)</PROPERTY>
    </VARIABLE>
    <VARIABLE TYPE="chance">
        <NAME>family-out</NAME>
        <OUTCOME>true</OUTCOME>
        <OUTCOME>false</OUTCOME>
        <PROPERTY>position = (112, 69)</PROPERTY>
    </VARIABLE>
    <!-- Probability distributions -->
    <DEFINITION>
        <FOR>light-on</FOR>
        <GIVEN>family-out</GIVEN>
        <TABLE>0.6 0.05 0.4 0.95 </TABLE>
    </DEFINITION>
    <DEFINITION>
        <FOR>bowel-problem</FOR>
        <TABLE>0.01 0.99 </TABLE>
    </DEFINITION>
    <DEFINITION>
        <FOR>dog-out</FOR>
        <GIVEN>bowel-problem</GIVEN>
        <GIVEN>family-out</GIVEN>
        <TABLE>0.99 0.97 0.9 0.3 0.01 0.03 0.1 0.7 </TABLE>
    </DEFINITION>
    <DEFINITION>
        <FOR>hear-bark</FOR>
        <GIVEN>dog-out</GIVEN>
        <TABLE>0.7 0.01 0.3 0.99 </TABLE>
    </DEFINITION>
    <DEFINITION>
        <FOR>family-out</FOR>
        <TABLE>0.15 0.85 </TABLE>
    </DEFINITION>
    </NETWORK>
    </BIF>