Both libraries (libbbrc and liblast) within the Fminer2 package are now usable in a multinomial context. The efficient pruning technique specific to libbbrc has been generalized to this setting.

Correlation to Multiple Target Classes

In short, statistical metric pruning works as follows in the binary case: any pattern $t$ (here: a subgraph) may occur in $y_1$ instances of class 1 and $y_2$ instances of class 2.

[Figure: Two Target Classes]

The value of pattern $t$ is thus determined by $y_1$ and $y_2$, referred to as $\chi^2(y_1, y_2)$. Importantly, no pattern larger than $t$, i.e. no supergraph of $t$, can have a value higher than the maximum of $\chi^2(y_1, 0)$ and $\chi^2(0, y_2)$ (referred to as the upper bound). This follows from the convexity of $\chi^2$ and allows pruning the search once the upper bound falls below a given threshold (for details see the paper).
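A minimal Python sketch of the binary bound, assuming the value is Pearson's χ² statistic of the 2×2 table (pattern present/absent × class 1/class 2) with fixed class sizes n1 and n2. The function names and the threshold parameter are illustrative and not part of the Fminer API. The closed form N(ad − bc)² / (product of margins) simplifies because ad − bc = y1·n2 − y2·n1.

```python
def chi2_binary(y1: int, y2: int, n1: int, n2: int) -> float:
    """Pearson chi-square of the 2x2 table for a pattern that occurs in
    y1 of the n1 class-1 instances and y2 of the n2 class-2 instances."""
    total = n1 + n2
    occ = y1 + y2
    if occ == 0 or occ == total:
        return 0.0  # pattern occurs nowhere or everywhere: no association
    # N * (ad - bc)^2 / (product of margins), where ad - bc = y1*n2 - y2*n1
    return total * (y1 * n2 - y2 * n1) ** 2 / (occ * (total - occ) * n1 * n2)


def upper_bound_binary(y1: int, y2: int, n1: int, n2: int) -> float:
    """Largest chi-square any refinement can reach: its counts lie in
    [0, y1] x [0, y2], and the maximum sits at (y1, 0) or (0, y2)."""
    return max(chi2_binary(y1, 0, n1, n2), chi2_binary(0, y2, n1, n2))


def can_prune(y1: int, y2: int, n1: int, n2: int, threshold: float) -> bool:
    """True if no refinement of the pattern can reach the threshold."""
    return upper_bound_binary(y1, y2, n1, n2) < threshold
```

For example, with 50 instances per class, a pattern with counts (10, 5) has χ² ≈ 1.96 but an upper bound of χ²(10, 0) ≈ 11.1, so at the 5% critical value of 3.84 (one degree of freedom) its refinements must still be explored. A pattern with counts (1, 1) has an upper bound of about 1.01 and can be pruned.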

Now consider the multinomial case (three classes for the sake of presentation):

[Figure: Multinomial Setting (three classes)]

The setting is analogous: no refinement of pattern $t$ with occurrences $(y_1, y_2, y_3)$ can have a value larger than the maximum value associated with the red vertices (vertex $(0, 0, y_3)$ is not shown but must also be considered).

This requires evaluating many more values to find the red vertex with the maximum value (the number of vertices grows exponentially with the number of classes), so the approach is feasible only for a small number of classes.
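The same idea for k classes can be sketched as follows, again assuming Pearson's χ² on the k×2 table (class × pattern present/absent) and fixed, positive class sizes; all names are illustrative. The statistic is a sum of two quadratic-over-linear terms in the occurrence vector and hence convex, so its maximum over the box [0, y1] × ... × [0, yk] of possible refinement counts is attained at a vertex. This sketch simply evaluates all 2^k − 1 non-zero vertices (a superset of the red vertices), which for five classes means 31 evaluations per pattern.

```python
from itertools import product


def chi2_multiclass(occ: list[int], sizes: list[int]) -> float:
    """Pearson chi-square of the k x 2 table (class x pattern present/absent).

    occ[i] is the number of class-i instances containing the pattern,
    sizes[i] the (positive) number of class-i instances overall.
    """
    total = sum(sizes)
    n = sum(occ)
    if n == 0 or n == total:
        return 0.0  # no association possible
    stat = 0.0
    for y, size in zip(occ, sizes):
        expected_in = size * n / total             # expected occurrences
        expected_out = size * (total - n) / total  # expected non-occurrences
        stat += (y - expected_in) ** 2 / expected_in
        stat += ((size - y) - expected_out) ** 2 / expected_out
    return stat


def upper_bound_multiclass(occ: list[int], sizes: list[int]) -> float:
    """Maximum chi-square over the vertices of [0, occ[0]] x ... x [0, occ[k-1]].

    Each vertex sets every coordinate either to 0 or to occ[i], giving 2**k
    candidates; the all-zero vertex is skipped because its value is 0.
    """
    best = 0.0
    for mask in product((False, True), repeat=len(occ)):
        if not any(mask):
            continue
        vertex = [y if keep else 0 for y, keep in zip(occ, mask)]
        best = max(best, chi2_multiclass(vertex, sizes))
    return best
```

With two classes the candidate vertices are (y1, 0), (0, y2) and (y1, y2), so the sketch reproduces the binary bound described earlier in the post. Evaluating the pattern's own vertex is harmless for an upper bound; a tuned implementation might skip vertices that cannot be the maximum.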

Usage from the Command Line

Fminer currently supports up to five classes. You can use it from Ruby, Python, Java, and C++, since these APIs are integrated into the source and supported by the Makefile. Download and install the fminer2 package following the BBRC and LAST-PM installation instructions.

Run fminer from the main directory with the standard front-end application.

fminer/fminer libbbrc/ -f5  liblast/test/hamster_carcinogenicity.smi \

You will receive output such as the following:

- [ "[#6&a]:[#6&a](-[#8&A])(:[#6&a])", 0.9961, [ 20674, 20911 ], [], [ 21212, 21219, 21250 ] ]

This YAML string denotes the fragment [#6&a]:[#6&a](-[#8&A])(:[#6&a]) in SMARTS notation, with a p-value of 0.9961 as computed by the statistical test, occurring twice in the first class (in molecules 20674 and 20911), never in the second, and three times in the third (in molecules 21212, 21219 and 21250). The classes are ordered alphanumerically in descending order. YAML is fminer's standard output format; see the fminer README for details.
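For post-processing, a line like the one above can be read with a few lines of Python. This is a sketch assuming every output line has exactly this shape: the flow-style list after the leading "- " is then also valid JSON, so the standard json module suffices, whereas arbitrary YAML would need a proper YAML parser. The helper name parse_fminer_line is hypothetical.

```python
import json


def parse_fminer_line(line: str):
    """Split one fminer output line into SMARTS, p-value and per-class occurrences.

    Assumes the form '- [ "SMARTS", p, [ids...], [ids...], ... ]', whose
    flow-style body is valid JSON; general YAML needs a real YAML parser.
    """
    body = line.strip()
    if not body.startswith("- "):
        raise ValueError("expected a YAML sequence item: " + line)
    smarts, p_value, *occurrences = json.loads(body[2:])
    return smarts, p_value, occurrences


line = '- [ "[#6&a]:[#6&a](-[#8&A])(:[#6&a])", 0.9961, [ 20674, 20911 ], [], [ 21212, 21219, 21250 ] ]'
smarts, p_value, occurrences = parse_fminer_line(line)
counts = [len(ids) for ids in occurrences]  # one list of molecule IDs per class
print(smarts, p_value, counts)
```

Running it prints `[#6&a]:[#6&a](-[#8&A])(:[#6&a]) 0.9961 [2, 0, 3]`, matching the counts described above.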

The occurrences (molecules in which the fragment occurs) have the following SMILES codes (you can verify them in the .smi file):


To verify the output, enter the occurrences and the SMARTS pattern into Daylight's depictmatch application like so:



and hit the “Depict” button:

[Figure: Occurrences of SMARTS in Molecules.]

Integration into Opentox-Webservices

The Fminer functionality for multinomial settings is also available as a web service within Opentox. No special switches are necessary; multi-class data are detected automatically. A demo dataset with three class labels is available (an excerpt from the ISS cancer database).

# Upload Data
curl -X POST \
-F "file=@ISSCAN-multi.csv;type=text/csv" \
# retrieve dataset URI from task (not shown)

# Run Fminer on dataset URI
curl -X POST \
--data-urlencode "dataset_uri=..." \
--data-urlencode "prediction_feature=..." \
# retrieve dataset from task (not shown)

In the output you will find the fragments as usual (including, e.g., p-values and occurrences in compounds as feature values). However, the effect value now ranges over all three classes instead of two, e.g.:

# Effect can range across all class labels (here 0-2)
"2"
# "[#8&A]=[#16&A]-[#6&a]" 0.9927

Note: this dataset serves for demonstration only; you can use any Opentox-compliant dataset URI (e.g. from Ambit). Likewise, BBRC was used for demonstration, but LAST-PM supports the same functionality (please read the post about calling BBRC and LAST-PM).

This entry was posted in Opentox.
