Both libraries (libbbrc and liblast) within the Fminer2 package are now usable in a multinomial context. The efficient pruning technique specific to libbbrc has been generalized to this setting.
Correlation to Multiple Target Classes
In short, statistical metric pruning works as follows in the binary case: any pattern p (here: a subgraph) may occur in x instances of class 1 and y instances of class 2.
Pattern p's χ² value is thus determined by x and y, referred to as χ²(x, y). Importantly, any pattern larger than p, i.e. any supergraph of p, can never have a χ² value higher than the maximum of χ²(x, 0) and χ²(0, y) (referred to as the upper bound). This is due to the convexity of the χ² function and allows pruning the search once the upper bound falls below a certain threshold (for details see the paper).
Now consider the multinomial case (three classes for the sake of presentation):
The setting is analogous: no refinement of pattern p with occurrences (y1, y2, y3) can have a value larger than the maximum value associated with the red vertices (vertex (0, 0, y3) is not shown, but must also be considered).
One has to check more values (actually many) to find the maximum red vertex, so this is feasible for a low number of classes only.
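The vertex check described above can be sketched in a few lines of Python. This is an illustration only, not the libbbrc implementation: it assumes the statistical metric is the chi-square statistic of a 2 x k contingency table (pattern present/absent versus class), and that the candidate vertices are those obtained by setting at least one, but not every, occurrence count to zero. The function names `chi_square`, `candidate_vertices`, and `upper_bound` are hypothetical.

```python
from itertools import product


def chi_square(occurrences, class_sizes):
    """Chi-square statistic of the 2 x k table (pattern present/absent vs. class)."""
    total = sum(class_sizes)
    present = sum(occurrences)
    absent = total - present
    if present == 0 or absent == 0:
        return 0.0  # pattern occurs nowhere or everywhere: no correlation
    value = 0.0
    for occ, size in zip(occurrences, class_sizes):
        if size == 0:
            continue  # an empty class contributes nothing
        expected_present = present * size / total
        expected_absent = absent * size / total
        value += (occ - expected_present) ** 2 / expected_present
        value += ((size - occ) - expected_absent) ** 2 / expected_absent
    return value


def candidate_vertices(occurrences):
    """Vertices with at least one, but not every, coordinate set to zero (assumption)."""
    vertices = []
    for keep in product((False, True), repeat=len(occurrences)):
        if all(keep) or not any(keep):
            continue  # skip the pattern's own point and the all-zero origin
        vertices.append(tuple(o if k else 0 for o, k in zip(occurrences, keep)))
    return vertices


def upper_bound(occurrences, class_sizes):
    """Largest chi-square value over the candidate vertices."""
    return max(chi_square(v, class_sizes) for v in candidate_vertices(occurrences))


# Example: a pattern occurring 3, 1, and 2 times in three classes of 10 instances each.
bound = upper_bound((3, 1, 2), (10, 10, 10))
```

Under this assumption there are 2^k - 2 candidate vertices for k classes (6 for three classes, 30 for five), which illustrates why the check is practical only for a small number of classes.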
Usage from the Command Line
Fminer currently supports up to five different classes. You can use it from Ruby, Python, Java, and C++, as those APIs are integrated into the source and supported by the Makefile. Download and install the fminer2 package according to the BBRC instructions and LAST-PM instructions, respectively.
Run fminer from the main directory with the standard front-end application.
    unset FMINER_LAZAR
    export FMINER_PVALUES=1
    export FMINER_SMARTS=1
    export FMINER_P_VALUES=1
    fminer/fminer libbbrc/libbbrc.so -f5 liblast/test/hamster_carcinogenicity.smi \
        liblast/test/hamster_carcinogenicity-multinomial.class
You will receive output such as the following:
- [ "[#6&a]:[#6&a](-[#8&A])(:[#6&a])", 0.9961, [ 20674, 20911 ], , [ 21212, 21219, 21250 ] ]
This YAML string denotes the fragment [#6&a]:[#6&a](-[#8&A])(:[#6&a]) in SMARTS notation. It has a p-value of 0.9961, as inferred from the statistics, and occurs two times in the first class (in molecules 20674 and 20911), zero times in the second, and three times in the third class (the classes are ordered in alphanumeric descending order). YAML is the standard output format of fminer; see the fminer README for details.
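For post-processing, such a line can be split into its parts with a few lines of Python. The following is a minimal sketch, not part of fminer: it assumes that a class without occurrences is written as an empty list `[]`, and it relies on the fact that a flow-style line of this shape is also valid JSON, so the standard `json` module can stand in for a YAML parser. The function name `parse_fragment_line` and the sample line are hypothetical.

```python
import json


def parse_fragment_line(line):
    """Split one output line into SMARTS, p-value, and per-class occurrence lists."""
    text = line.strip()
    if text.startswith("- "):
        text = text[2:]  # drop the YAML list-item marker
    smarts, p_value, *occurrences = json.loads(text)
    return {"smarts": smarts, "p_value": p_value, "occurrences": occurrences}


# Sample line with the empty second class written as [] (assumption).
sample = '- [ "[#6&a]:[#6&a](-[#8&A])(:[#6&a])", 0.9961, [ 20674, 20911 ], [], [ 21212, 21219, 21250 ] ]'
fragment = parse_fragment_line(sample)
counts = [len(ids) for ids in fragment["occurrences"]]  # occurrences per class
```

Each entry of `occurrences` holds the molecule IDs for one class, in the class order described above.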
The occurrences (molecules where the fragments occur) have the following SMILES codes (you can check that from the .smi file):
To verify the output, import the occurrences and the SMARTS pattern into the depictmatch-application at Daylight like so:
and hit the “Depict” button:
Integration into Opentox-Webservices
The Fminer functionality for multinomial environments is also available as a web service within Opentox. No special switches are necessary; the number of classes is detected automatically. A demo dataset with three class labels is available (an excerpt from the ISS cancer database).
    # Upload Data
    curl -X POST \
         -F "file=@ISSCAN-multi.csv;type=text/csv" \
         "http://ot-test.in-silico.ch/dataset"
    # retrieve dataset URI from task (not shown)

    # Run Fminer on dataset URI
    curl -X POST \
         --data-urlencode "dataset_uri=..." \
         --data-urlencode "prediction_feature=..." \
         "http://ot-test.in-silico.ch/algorithm/fminer/bbrc"
    # retrieve dataset from task (not shown)
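The second call can also be prepared from Python with the standard library alone. This is a sketch under assumptions: it only builds the URL-encoded POST request that mirrors the curl command and does not send it, and the helper name `build_fminer_request` and the example.org URIs are hypothetical placeholders for the URIs returned by the service.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Endpoint taken from the curl example above.
FMINER_BBRC_URL = "http://ot-test.in-silico.ch/algorithm/fminer/bbrc"


def build_fminer_request(dataset_uri, prediction_feature, url=FMINER_BBRC_URL):
    """Build (but do not send) the POST request corresponding to the curl call."""
    body = urlencode({
        "dataset_uri": dataset_uri,
        "prediction_feature": prediction_feature,
    })
    return Request(url, data=body.encode("ascii"), method="POST")


# Placeholder URIs; substitute the dataset and feature URIs of your own upload.
request = build_fminer_request(
    "http://example.org/dataset/1",
    "http://example.org/feature/1",
)
```

Passing the request to `urllib.request.urlopen` would submit it; as with curl, the result dataset is then retrieved from the returned task.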
In the output, you will find the fragments as usual (including, e.g., p-values and occurrences in compounds as feature values). However, the effect value now ranges over all three classes instead of two, e.g.:
    # Effect can range across all class labels (here 0-2)
    http://www.opentox.org/api/1.1#effect: "2"
    #
    http://www.opentox.org/api/1.1#smarts: "[#8&A]=[#16&A]-[#6&a]"
    http://www.opentox.org/api/1.1#pValue: 0.9927
Note: This example dataset was used just for demonstration. You can use all Opentox compliant dataset URIs (e.g. from Ambit). Moreover, BBRC was used for demonstration, but LAST-PM supports the same functionality (please read the post about calling BBRC and LAST-PM).