libbbrc
LibBbrc (libbbrc)

LibBbrc

This is the Bbrc library, available at http://github.com/amaunz/fminer2/tree/master , subdirectory libbbrc (see below for download and build instructions).
The Fminer frontend application is available from http://github.com/amaunz/fminer2/tree/master , subdirectory fminer.
You may download the scientific documentation from http://cs.maunz.de . The paper is entitled "Large Scale Graph Mining using Backbone Refinement Classes".
Supporting information is available here: http://bbrc.maunz.de .
Note that scientific evaluation was only done for the settings given in the paper. Thus, for other settings no results can be guaranteed. The switches for the leave-one-out crossvalidation in the paper were:

  • Minimum frequency of 6
  • Disabled aromatic perception

There is a package that you can use to automatically reproduce the KDD 2009 results on BBRC features, see http://github.com/amaunz/fminer2-transition.

Contents

Contact details are located at the end of this page.



Abstract

We present a new approach to large-scale graph mining based on so-called backbone refinement classes. The method efficiently mines tree-shaped subgraph descriptors under minimum frequency and significance constraints, using classes of fragments to reduce feature set size and running times. The classes are defined in terms of fragments sharing a common backbone. The method is able to optimize structural inter-feature entropy as opposed to occurrences, which is characteristic for open or closed fragment mining. In the experiments, the proposed method reduces feature set sizes by >90 % and >30 % compared to complete tree mining and open tree mining, respectively. Evaluation using crossvalidation runs shows that their classification accuracy is similar to the complete set of trees but significantly better than that of open trees. Compared to open or closed fragment mining, a large part of the search space can be pruned due to an improved statistical constraint (dynamic upper bound adjustment), which is also confirmed in the experiments in lower running times compared to ordinary (static) upper bound pruning. Further analysis using large-scale datasets yields insight into important properties of the proposed descriptors, such as the dataset coverage and the class size represented by each descriptor. A final cross-validation run confirms that the novel descriptors render large training sets feasible which previously might have been intractable.

Euclidean embedding of a Backbone Refinement Class Descriptors and Salmonella Mutagenicity data
Co-occurrence-based 2D embedding of molecules and backbone refinement class features showing close to perfect separation of target classes along top left to bottom right. (De)activating features are (red) green, (In)active instances (salmon) blue. Data: CPDB salmonella mutagenicity; Euclidean embedding: Schulz et. al,.
Click here for a flash-animated version, indicating occurrences.

License

LibBbrc is licensed under the terms of the GNU General Public License (GPL, see LICENSE). LibBbrc is derived from (i.e. includes code from) the following project, licensed under GPL:

LibBbrc uses (i.e. links to) the following projects, also licensed under GPL:

These licensing conditions mean essentially that your published program may only use (i.e., link to) and/or derive code from LibBbrc under the condition that your source code is also freely available. This is to secure public availability and freedom of use.



Installation

LibBbrc is a library, written in C++. It dynamically links to OpenBabel and GSL libraries. This section describes the installation of both the library and the frontend application for Linux.

Compiling from source

Linux SO: install development tools (gcc, g++, GNU make) and GSL as well as OpenBabel development package, then compile LibBbrc. On Ubuntu, you can e.g. do it like this:

  • Install build tools and GSL:
        apt-get install build-essential             # development tools
        apt-get install libgsl0-dev                 # GSL binary lib and headers
    
  • OpenBabel: follow the installation instrucations at http://openbabel.org/wiki/Install_(source_code) to build yourself after doing:
        apt-get build-dep libopenbabel-dev          # build dependencies for OB
        apt-get source libopenbabel-dev             # extracts OB source to the current dir
    
    or try the repository version:
        apt-get install libopenbabel-dev            # OB binary lib and headers
    
  • Download the library source code by clicking on "Download Source" (with git: git clone git://github.com/amaunz/fminer2.git) and cd to libbbrc subdirectory. Use ./configure to configure the Makefile automatically or, in the Makefile, adjust the include (-I) and linker (-L) flags. Run make.
  • Cd to fminer subdirectory. Use ./configure to configure the Makefile automatically or, in the Makefile, adjust the include (-I) and linker (-L) flags. Run make.
  • To create this documentation with doxygen, type 'make doc'. The documentation explains API, constructor usage and options.

Language Portability

The API can be made available to other languages.

The Makefile features a target that creates ruby bindings. On Ubuntu, you can e.g. do this:

  • sudo apt-get install ruby1.8-dev
  • Use ./configure <version> to configure the Makefile automatically or, adjust the include flags (-I) in the Makefile in the line INCLUDE_RB = ... so that the directory contains file ruby.h. Also, let RUBY = ... point to the right executable.
  • Run make ruby. Use make rbtest to test. The configuration was tested with swig 1.3.40 and Ruby 1.8.

The Makefile features a target that creates python bindings. On Ubuntu, you can e.g. do this:

  • sudo apt-get install python2.7-dev
  • Adjust the include flags (-I) in the Makefile in the line INCLUDE_PY = ... so that the directory contains file Python.h. Also, let PYTHON = ... point to the right executable.
  • Run make python. Use make pytest to test. The configuration was tested with Python 2.7.

The Makefile features a target that creates java bindings. On Ubuntu, you can e.g. do this:

  • sudo apt-get install openjdk-6-jdk
  • Adjust the include flags (-I) in the Makefile in the line INCLUDE_JAVA = ... so that the directory contains file jni.h. Also, make sure that the paths in LD_PRELOAD=... in the jtest line are correct.
  • Run make java. Use make jtest to test. The configuration was tested with OpenJDK 1.6.

Important: There are swig interface files (*.i) and pre-configured swig output files (*.cxx). You need to re-create those output files if you are deploying for newer versions of the target languages, and you can find the necessary swig calls in the Makefile (commented out).

Guidance on Using (Lib)Bbrc

Bbrc descriptors are a sparse collection of structurally dissimilar, class-correlated descriptors. You must provide input molecules in SMILES format and a target class for every molecule (see examples below), maximally five different classes. You can also provide numeric values instead of target classes. In that case you must use SetRegression(true).
Note: Always do SetRegression(true) first, before adding any numeric value.

Most setting are sensible by default, see description of constructors and objects below. I would suggest to manipulate the minimum frequency only at first. The number of fragments output should not be more than a multitude of the number of input graphs. For minimum frequency, LibBbrc does not support percentage values. You will have to calculate absolute numbers.

Environment Variables

FMINER_SMARTS: Produce output in SMARTS format. In this case, each line is a YAML sequence, containing SMARTS fragment, p-value, and sequences denoting class occurrences (line numbers in Smiles file):

  - [ smarts,    p_chisq,    occ_list_class1,    occ_list_class2,    ... ]

Documentation for YAML can be found at: http://yaml.org/spec/cvs/current.html# (e.g. export FMINER_SMARTS=1).
FMINER_LAZAR : Produce output in linfrag format which can be used as input to Lazar (e.g. export FMINER_LAZAR=1).
FMINER_PVALUES : Produce p-values instead of chi-square values (e.g. export FMINER_PVALUES=1).
FMINER_NO_AROMATIC_WC: Disallow aromatic wildcard bonds on aliphatic bonds, when aromatic perception was switched off ('-a') (e.g. export FMINER_NO_AROMATIC_WC=1).
FMINER_SILENT : Redirect STDERR (debug output) of fminer to local file 'fminer_debug.txt'
FMINER_NR_HITS : Display (in the occurrence lists) the number of times each fragment occurs in a molecule.

Note: The value you set the environment variables to is irrelevant. Use unset to disable the environment variables, e.g. unset FMINER_LAZAR.

Examples using the LibBbrc API

LibBbrc uses the 'singleton' design pattern known from software engineering, i.e., class instantiation is restricted to one object. To empty the database after a run and before feeding new compounds, use the Bbrc::Reset() routine.

The following code demonstrate the use of the Bbrc API from C++, python, and ruby. It feeds a set of class-labelled molecules in SMILES format (the API currently allows no gSpan input, use the frontend application for that) and calculates a vector of fragments along with statistical relevance and occurrences and prints them out. Every root node corresponds to a single chemical element. The output consists of gSpan graphs.

C++

This example uses libBbrc in a C++ program. The example assumes that you have created the C++ library using make.

 #include "bbrc.h"
 #include <iostream>
 #include <string.h>
 using namespace std;

 Bbrc* MyFminer; // global singleton instance
 int main(int argc, char *argv[], char *envp) {
   MyFminer= new Bbrc();
   // Toy example: special settings for mining all fragments
   MyFminer->SetChisqSig(0); // use no significance constraint
   MyFminer->SetRefineSingles(true); // refine structures with support 1
   MyFminer->SetConsoleOut(false);
   // Add compounds below. IMPORTANT! Do not change settings after adding compounds!
   MyFminer->AddCompound ("COC1=CC=C(C=C1)C2=NC(=C([NH]2)C3=CC=CC=C3)C4=CC=CC=C4" , 1);
   MyFminer->AddCompound ("O=C1NC(=S)NC(=O)C1C(=O)NC2=CC=CC=C2" , 2);
      // ... continue adding compounds
   MyFminer->AddActivity((bool) 1.0, 1); // 1.0 denotes one class in this example,
   MyFminer->AddActivity((bool) 0.0, 2); // 0.0 denotes the other class (you can use more than two classes, max 5).
      // ... continue adding activities (true for active, false for inactive)
   cerr << MyFminer->GetNoCompounds() << " compounds" << endl;
   // gather results for every root node in vector instead of immediate output
   for ( int j = 0; j < (int) MyFminer->GetNoRootNodes(); j++ ) {
      vector<string>* result = MyFminer->MineRoot(j);
      for( int i = 0; i < result->size(); i++) {
        cout << (*result)[i] << endl;
      }
   }
   delete MyFminer; // or call Reset() and start over.
 }

Ruby

This example assumes that you have created ruby bindings using make ruby.

 require 'bbrc'
 MyFminer = Bbrc::Bbrc.new() # global singleton instance
 # Toy example: special settings for mining all fragments
 # use no significance constraint
 MyFminer.SetChisqSig(0) 
 # refine structures with support 1
 MyFminer.SetRefineSingles(true) 
 # gather results for every root node in vector instead of immediate output
 MyFminer.SetConsoleOut(false)
   # Add compounds below. IMPORTANT! Do not change settings after adding compounds!
 MyFminer.AddCompound("COC1=CC=C(C=C1)C2=NC(=C([NH]2)C3=CC=CC=C3)C4=CC=CC=C4" , 1)
 MyFminer.AddCompound("O=C1NC(=S)NC(=O)C1C(=O)NC2=CC=CC=C2" , 2)
    # ... continue adding compounds
 MyFminer.AddActivity(1,0, 1) # 1.0 denotes one class in this example...
 MyFminer.AddActivity(0.0, 2) # 0.0 denotes the other class (you can use more than two classes, max 5).
    # ... continue adding activities (true for active, false for inactive)
 print MyFminer.GetNoCompounds()  
 puts " compounds"
 (0 .. MyFminer.GetNoRootNodes()-1).each do |j|
    result = MyFminer.MineRoot(j)
    puts "Results"
    result.each do |res|
        puts res
   end
 end
 # Call Reset() to start over.

Python

This example assumes that you have created python bindings using make python.

 import bbrc
 MyFminer = bbrc.Bbrc() # global singleton instance
 # Toy example: special settings for mining all fragments
 # use no significance constraint
 MyFminer.SetChisqSig(0) 
 # refine structures with support 1
 MyFminer.SetRefineSingles(1) 
 MyFminer.SetConsoleOut(0)
   # Add compounds below. IMPORTANT! Do not change settings after adding compounds!
 MyFminer.AddCompound("COC1=CC=C(C=C1)C2=NC(=C([NH]2)C3=CC=CC=C3)C4=CC=CC=C4" , 1)
 MyFminer.AddCompound("O=C1NC(=S)NC(=O)C1C(=O)NC2=CC=CC=C2" , 2)
 # ... continue adding compounds
 MyFminer.AddActivity(1.0, 1) # 1.0 denotes one class in this example...
 MyFminer.AddActivity(0.0, 2) # 0.0 denotes the other class (you can use more than two classes, max 5).
 # ... continue adding activities (true for active, false for inactive)
 print repr(MyFminer.GetNoCompounds()) + ' compounds.'
 # gather results for every root node in vector instead of immediate output
 for j in range(0, MyFminer.GetNoRootNodes()):
      result = MyFminer.MineRoot(j);
      for i in range(0, result.size()):
                  print result[i];
 # Call Reset() to start over.

Java

This example assumes that you have created java bindings using make java.

 public class test {
     public static void main(String args[]) {
        System.loadLibrary("bbrc");
        Bbrc MyFminer;
        MyFminer = new Bbrc();
        // Toy example: special settings for mining all fragments
        MyFminer.SetChisqSig(0); // use no significance constraint
        MyFminer.SetRefineSingles(true); // refine structures with support 1
        MyFminer.SetConsoleOut(false);
        // Add compounds below. IMPORTANT! DO NOT CHANGE SETTINGS AFTER ADDING COMPOUNDS!
        MyFminer.AddCompound ("COC1=CC=C(C=C1)C2=NC(=C([NH]2)C3=CC=CC=C3)C4=CC=CC=C4", 1);
        MyFminer.AddCompound ("O=C1NC(=S)NC(=O)C1C(=O)NC2=CC=CC=C2", 2);
        // ... continue adding compounds
        MyFminer.AddActivity(1.0F, 1);
        MyFminer.AddActivity(0.0F, 2);
        // ... continue adding activities (1.0F for active, 0.0F for inactive)
        System.out.println(MyFminer.GetNoCompounds() + " compounds");
        // gather results for every root node in vector instead of immediate output
        for (int j = 0; j < (int) MyFminer.GetNoRootNodes(); j++)
        {
           SVector result = MyFminer.MineRoot(j);
           for(int i = 0; i < result.size(); i++)
           {
             System.out.println(result.get(i));
           }
        }
        MyFminer = null;
     }
 }

Description of Constructors and Options

For the purpose of demonstration we used a toy database of two compounds and an unusual parameter configuration. Please note, that in general the defaults set by the standard constructor are sensible for most databases. They switch on BBRC mining with upper bound pruning for 95% significance and a minimum frequency of 2. The complete standard settings are:

  • Minimum frequency: 2
  • Feature type: Trees
  • Mine BBRCs: true
  • Dynamic upper bound: true
  • Significance level: 95%
  • Console output: false
  • Aromatic perception: true
  • Refine Singles: false
  • Do Output: true
  • Separate BBRCs in output by blank line/vector: false
  Bbrc ();

There also exist more flexible constructors:

 Bbrc (int type, unsigned int minfreq);
 Bbrc (int type, unsigned int minfreq, float chisq_val, bool do_backbone);

It is recommended to increase minimum frequency as a first step when too many features are generated.



Contact

Dipl.-Inf. Andreas Maunz
Institute for Physics
Hermann-Herder-Str. 3
79104 Freiburg, Germany
Email: maunza@fdm.uni-freiburg.de
Web: http://cs.maunz.de

Author:
Andreas Maunz, 2010