LibLast (liblast)

This is the Last library, available at http://github.com/amaunz/fminer2/tree/master , subdirectory liblast (see below for download and build instructions).
The Fminer frontend application is available from http://github.com/amaunz/fminer2/tree/master , subdirectory fminer.
Supporting information is available here: http://last-pm.maunz.de .

Contact details are located at the end of this page.



Abstract

Pattern mining methods for graph data have largely been restricted to ground features, such as frequent or correlated subgraphs. Kazius et al. have demonstrated the use of elaborate patterns in the biochemical domain, summarizing several ground features at once. Such patterns bear the potential to reveal latent information not present in any individual ground feature. However, those patterns were handcrafted by chemical experts. In this paper, we present a data-driven bottom-up method for pattern generation that takes advantage of the embedding relationships among individual ground features. The method works fully automatically and does not require data preprocessing (e.g., to introduce abstract node or edge labels). Controlling the process of generating ground features, it is possible to align them canonically and merge (stack) them, yielding a weighted edge graph. In a subsequent step, the subgraph features are compressed by singular value decomposition (SVD). Our experiments show that the resulting features are chemically meaningful and that they can enable substantial performance improvements on chemical datasets that have been problematic so far for graph mining approaches.

License

LibLast is licensed under the terms of the GNU General Public License (GPL, see LICENSE). LibLast is derived from (i.e. includes code from) the following project, licensed under GPL:

LibLast uses (i.e. links to) the following projects, also licensed under GPL:

These licensing conditions mean essentially that your published program may only use (i.e., link to) and/or derive code from LibLast under the condition that your source code is also freely available. This is to secure public availability and freedom of use.



Installation

LibLast is a library written in C++. It links dynamically to the OpenBabel and GSL libraries. This section describes the installation of both the library and the frontend application for Linux.

Compiling from source

Linux shared object (SO): install the development tools (gcc, g++, GNU make), GSL, and the OpenBabel development package, then compile LibLast. On Ubuntu, for example, proceed as follows:

  • Install build tools and GSL:
        apt-get install build-essential             # development tools
        apt-get install libgsl0-dev                 # GSL binary lib and headers
    
  • OpenBabel: follow the installation instructions to build it yourself after running:
        apt-get build-dep libopenbabel-dev          # build dependencies for OB
        apt-get source libopenbabel-dev             # extracts OB source to the current dir
    
    or try the repository version:
        apt-get install libopenbabel-dev            # OB binary lib and headers
    
    Note: you will need the former if you want to use LAST-UTILS, see the INSTALL there.
  • Download the library source code by clicking on "Download Source" (or, with git: git clone git://github.com/amaunz/fminer2.git) and cd to the liblast subdirectory. Use ./configure to configure the Makefile automatically, or adjust the include (-I) and linker (-L) flags in the Makefile by hand. Run make.
  • Cd to the fminer subdirectory. Use ./configure to configure the Makefile automatically, or adjust the include (-I) and linker (-L) flags in the Makefile by hand. Run make.
  • To create this documentation with doxygen, type 'make doc'. The documentation explains API, constructor usage and options.

Language Portability

The API can be made available to other languages.

The Makefile features a target that creates ruby bindings. On Ubuntu, you can e.g. do this:

  • sudo apt-get install ruby1.8-dev
  • Use ./configure <version> to configure the Makefile automatically, or adjust the include flags (-I) in the Makefile in the line INCLUDE_RB = ... so that the directory contains the file ruby.h. Also, let RUBY = ... point to the right executable.
  • Run make ruby. Use make rbtest to test. The configuration was tested with Ruby 1.8.

The Makefile features a target that creates python bindings. On Ubuntu, you can e.g. do this:

  • sudo apt-get install python2.7-dev
  • Adjust the include flags (-I) in the Makefile in the line INCLUDE_PY = ... so that the directory contains file Python.h. Also, let PYTHON = ... point to the right executable.
  • Run make python. Use make pytest to test. The configuration was tested with Python 2.7.

The Makefile features a target that creates java bindings. On Ubuntu, you can e.g. do this:

  • sudo apt-get install openjdk-6-jdk
  • Adjust the include flags (-I) in the Makefile in the line INCLUDE_JAVA = ... so that the directory contains file jni.h. Also, make sure that the paths in LD_PRELOAD=... in the jtest line are correct.
  • Run make java. Use make jtest to test. The configuration was tested with OpenJDK 1.6.

Important: There are swig interface files (*.i) and pre-configured swig output files (*.cxx). You need to re-create those output files if you are deploying for newer versions of the target languages, and you can find the necessary swig calls in the Makefile (commented out).

Guidance on Using (Lib)Last

Last-PM descriptors are a sparse collection of latent (hidden), class-correlated motifs in the data. You must provide input molecules in SMILES format and a target class for every molecule (see examples below), with at most five different classes. You can also provide numeric values instead of target classes. In that case you must use SetRegression(true).
Note: Always do SetRegression(true) first, before adding any numeric value.
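The input requirements above can be checked before any compound is handed to the library. The following is a minimal sketch in plain Python that does not call LibLast at all; the function name check_dataset and the dictionary layout are assumptions made for illustration only.

```python
def check_dataset(smiles_by_id, activity_by_id, regression=False, max_classes=5):
    """Validate a dataset before feeding it to a miner (hypothetical helper).

    smiles_by_id   -- dict mapping compound id -> SMILES string
    activity_by_id -- dict mapping compound id -> class label or numeric value
    Returns a list of human-readable problems (empty if none were found).
    """
    problems = []
    # Every molecule needs a target, and every target needs a molecule.
    missing = sorted(set(smiles_by_id) - set(activity_by_id))
    orphaned = sorted(set(activity_by_id) - set(smiles_by_id))
    if missing:
        problems.append("no activity for compound ids: %s" % missing)
    if orphaned:
        problems.append("no SMILES for compound ids: %s" % orphaned)
    # Empty SMILES strings cannot describe a molecule.
    empty = sorted(i for i, s in smiles_by_id.items() if not s.strip())
    if empty:
        problems.append("empty SMILES for compound ids: %s" % empty)
    # Classification allows at most `max_classes` distinct labels;
    # numeric (regression) targets are not limited in this way.
    if not regression:
        classes = set(activity_by_id.values())
        if len(classes) > max_classes:
            problems.append("%d classes found, at most %d allowed"
                            % (len(classes), max_classes))
    return problems


smiles = {1: "OC1=CC=C(CCCCCCCC)C=C1", 2: "Oc1ccc(cc1)[C@@H]2Cc3ccc(O)cc3OC2"}
activities = {1: 1.0, 2: 0.0}
print(check_dataset(smiles, activities))  # an empty list means no problems
```

The check only covers bookkeeping (ids, labels, class count); whether a SMILES string is chemically valid is left to the library's own parser.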

Most settings are sensible by default, see the description of constructors and options below. I suggest adjusting only the minimum frequency at first. The number of fragments output should not exceed a small multiple of the number of input graphs. LibLast does not support percentage values for the minimum frequency, so you will have to calculate absolute numbers yourself.
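Since the minimum frequency has to be given as an absolute count, a relative threshold must be converted by hand. The sketch below shows one way to do this in plain Python, rounding up so that the absolute threshold is never weaker than the requested percentage. Both function names and the factor of 3 in the second helper are illustrative assumptions, not part of LibLast.

```python
import math


def absolute_min_frequency(n_compounds, percent):
    """Convert a relative minimum frequency (in percent) to an absolute count.

    The result is rounded up, so that a fragment meeting the absolute
    threshold also meets the requested percentage, and is never below 1.
    """
    if n_compounds < 0 or not 0 <= percent <= 100:
        raise ValueError("need n_compounds >= 0 and 0 <= percent <= 100")
    return max(1, int(math.ceil(n_compounds * percent / 100.0)))


def too_many_fragments(n_fragments, n_compounds, factor=3):
    """Rule-of-thumb check: flag runs whose output dwarfs the input.

    `factor` is an arbitrary illustrative value, not a LibLast default.
    """
    return n_fragments > factor * n_compounds


print(absolute_min_frequency(200, 5))   # 5% of 200 compounds
print(too_many_fragments(5000, 200))    # 5000 fragments from 200 compounds
```

If the second check fires, raising the minimum frequency and mining again is the natural first reaction.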

Examples using the LibLast API

LibLast uses the 'singleton' design pattern known from software engineering, i.e., class instantiation is restricted to one object. To empty the database after a run and before feeding new compounds, use the Last::Reset() routine.
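For readers unfamiliar with the pattern, the following toy class sketches what "one object plus a reset routine" means in practice. It is a generic Python illustration and not LibLast's implementation; the class CompoundStore and its methods are hypothetical.

```python
class CompoundStore(object):
    """Toy singleton: every call to CompoundStore() yields the same object."""

    _instance = None

    def __new__(cls):
        # Create the single instance on first use, then keep returning it.
        if cls._instance is None:
            cls._instance = super(CompoundStore, cls).__new__(cls)
            cls._instance.compounds = {}
        return cls._instance

    def add_compound(self, smiles, compound_id):
        self.compounds[compound_id] = smiles

    def reset(self):
        """Empty the database so that new compounds can be fed."""
        self.compounds.clear()


first = CompoundStore()
first.add_compound("OC1=CC=C(CCCCCCCC)C=C1", 1)
second = CompoundStore()           # same object, still holds compound 1
print(second is first, len(second.compounds))
second.reset()                     # empty the shared database before a new run
print(len(first.compounds))
```

The practical consequence is that creating a "second" object does not give a fresh database; only an explicit reset does.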

The following code demonstrates the use of the Last API from C++, Ruby, Python, and Java. It feeds a set of class-labelled molecules in SMILES format (the API currently allows no gSpan input, use the frontend application for that), calculates a set of latent fragments, and prints them out. Every root node corresponds to a single chemical element. The output consists of GraphML, which can be postprocessed to SMARTS patterns using the LAST-UTILS.

Environment Variables

FMINER_SILENT : Redirect STDERR (debug output) of fminer to local file 'fminer_debug.txt'

Note: The value you set the environment variable to is irrelevant. Use unset to disable the environment variable, e.g. unset FMINER_SILENT.
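The "presence, not value" behaviour of such a switch can be sketched as follows. This is a plain-Python illustration of the idea, not code taken from LibLast; the helper names are hypothetical, and treating an empty value as "set" is an assumption.

```python
import os


def is_switched_on(name, environ=None):
    """Return True if the variable is set at all, whatever its value."""
    environ = os.environ if environ is None else environ
    return name in environ


def with_switch(name, enabled, environ=None):
    """Return a copy of the environment with the switch set or unset."""
    env = dict(os.environ if environ is None else environ)
    if enabled:
        env[name] = "1"      # the value itself carries no meaning
    else:
        env.pop(name, None)  # equivalent of `unset NAME` in the shell
    return env


env = with_switch("FMINER_SILENT", True, environ={})
print(is_switched_on("FMINER_SILENT", env))
print(is_switched_on("FMINER_SILENT", with_switch("FMINER_SILENT", False, env)))
```

A dictionary built this way can be passed as the env argument of Python's subprocess functions when launching the frontend from a script.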

C++

This example uses LibLast in a C++ program. The example assumes that you have created the C++ library using make.

 #include "last.h"
 #include <iostream>
 #include <string>
 #include <vector>
 using namespace std;

 Last* MyFminer; // global singleton instance.
 int main(int argc, char *argv[], char *envp[]) {
     MyFminer = new Last();
     MyFminer->SetConsoleOut(false);
     // Add compounds below. IMPORTANT! Do not change settings after adding compounds!
     MyFminer->AddCompound ("O=C(C(C(C=C3)=CC=C3O)=CO2)C1=C2C=C(O)C=C1O" , 1);
     MyFminer->AddCompound ("Oc1ccc(cc1)[C@@H]2Cc3ccc(O)cc3OC2" , 2);
     MyFminer->AddCompound ("O=C1C(C3=CC=C(O)C=C3)=COC2=C1C=CC(O)=C2" , 3);
     MyFminer->AddCompound ("O=C1C(C3=CC=C(OC)C=C3)=COC2=C1C=CC(O)=C2" , 4);
     MyFminer->AddCompound ("OC1=CC=C(CCCCCCCC)C=C1" , 5);
     MyFminer->AddCompound ("C1(C=CC=CC=1C(=C(Cl)Cl)C2=CC=C(C=C2)Cl)Cl" , 6);
     MyFminer->AddCompound ("O=C(C1=C(C=CC=C1)C(=O)OCC(CCCC)CC)OCC(CCCC)CC" , 7);
     MyFminer->AddCompound ("Oc1cc(O)cc2CCCCC[C@@H](O)CCC[C@H](C)OC(=O)c12" , 8);
     MyFminer->AddCompound ("O=C1C2=C(C=C(C=C2O)O)OC(=C1O)C3=CC(=C(C=C3)O)O" , 9);
     MyFminer->AddCompound ("C1(=C(C(=O)C2=C(O1)C=C(C=C2)O)O)C3=CC(O)=C(C=C3)O" , 10);
     //... continue adding compounds
     MyFminer->AddActivity((bool) 1.0, 1); // 1.0 denotes one class in this example,
     MyFminer->AddActivity((bool) 1.0, 2);
     MyFminer->AddActivity((bool) 1.0, 3);
     MyFminer->AddActivity((bool) 1.0, 4);
     MyFminer->AddActivity((bool) 1.0, 5);
     MyFminer->AddActivity((bool) 1.0, 6);
     MyFminer->AddActivity((bool) 1.0, 7);
     MyFminer->AddActivity((bool) 1.0, 8);
     MyFminer->AddActivity((bool) 0.0, 9); // 0.0 the other class (you can use more than two classes, max 5).
     MyFminer->AddActivity((bool) 0.0, 10);
     //... continue adding activities (1.0 for active, 0.0 for inactive)
     cerr << MyFminer->GetNoCompounds() << " compounds" << endl;
     // gather results for every root node in vector instead of immediate output
     for ( int j = 0; j < (int) MyFminer->GetNoRootNodes(); j++ ) {
        vector<string>* result = MyFminer->MineRoot(j);
        for ( size_t i = 0; i < result->size(); i++) {
           cout << (*result)[i] << endl;
        }
     }
     delete MyFminer; // or call Reset() and start over.
 }

Ruby

This example assumes that you have created ruby bindings using make ruby.

 require 'last'
 MyFminer = Last::Last.new()
 MyFminer.SetConsoleOut(false)
 # Add compounds below. IMPORTANT! Do not change settings after adding compounds!
 MyFminer.AddCompound("O=C(C(C(C=C3)=CC=C3O)=CO2)C1=C2C=C(O)C=C1O" , 1)
 MyFminer.AddCompound("Oc1ccc(cc1)[C@@H]2Cc3ccc(O)cc3OC2" , 2)
 MyFminer.AddCompound("O=C1C(C3=CC=C(O)C=C3)=COC2=C1C=CC(O)=C2" , 3)
 MyFminer.AddCompound("O=C1C(C3=CC=C(OC)C=C3)=COC2=C1C=CC(O)=C2" , 4)
 MyFminer.AddCompound("OC1=CC=C(CCCCCCCC)C=C1" , 5)
 MyFminer.AddCompound("C1(C=CC=CC=1C(=C(Cl)Cl)C2=CC=C(C=C2)Cl)Cl" , 6)
 MyFminer.AddCompound("O=C(C1=C(C=CC=C1)C(=O)OCC(CCCC)CC)OCC(CCCC)CC" , 7)
 MyFminer.AddCompound("Oc1cc(O)cc2CCCCC[C@@H](O)CCC[C@H](C)OC(=O)c12" , 8)
 MyFminer.AddCompound("O=C1C2=C(C=C(C=C2O)O)OC(=C1O)C3=CC(=C(C=C3)O)O" , 9)
 MyFminer.AddCompound("C1(=C(C(=O)C2=C(O1)C=C(C=C2)O)O)C3=CC(O)=C(C=C3)O" , 10)
 # ... continue adding compounds
 MyFminer.AddActivity(1.0, 1) # 1.0 denotes one class in this example,
 MyFminer.AddActivity(1.0, 2)
 MyFminer.AddActivity(1.0, 3)
 MyFminer.AddActivity(1.0, 4)
 MyFminer.AddActivity(1.0, 5)
 MyFminer.AddActivity(1.0, 6)
 MyFminer.AddActivity(1.0, 7)
 MyFminer.AddActivity(1.0, 8)
 MyFminer.AddActivity(0.0, 9) # 0.0 the other class (you can use more than two classes, max 5).
 MyFminer.AddActivity(0.0, 10)
 # ... continue adding activities (true (1.0) for active, false (0.0) for inactive)
 print MyFminer.GetNoCompounds()  
 puts " compounds"
 # gather results for every root node in vector instead of immediate output
 (0 .. MyFminer.GetNoRootNodes()-1).each do |j|
    result = MyFminer.MineRoot(j)
    puts "Results"
    result.each do |res|
        puts res
    end
 end
 # call MyFminer.Reset() to start over.

Python

This example assumes that you have created python bindings using make python.

 import liblast
 MyFminer = liblast.Last() # global singleton instance.
 MyFminer.SetConsoleOut(0)
 # Add compounds below. IMPORTANT! Do not change settings after adding compounds!
 MyFminer.AddCompound("O=C(C(C(C=C3)=CC=C3O)=CO2)C1=C2C=C(O)C=C1O" , 1)
 MyFminer.AddCompound("Oc1ccc(cc1)[C@@H]2Cc3ccc(O)cc3OC2" , 2)
 MyFminer.AddCompound("O=C1C(C3=CC=C(O)C=C3)=COC2=C1C=CC(O)=C2" , 3)
 MyFminer.AddCompound("O=C1C(C3=CC=C(OC)C=C3)=COC2=C1C=CC(O)=C2" , 4)
 MyFminer.AddCompound("OC1=CC=C(CCCCCCCC)C=C1" , 5)
 MyFminer.AddCompound("C1(C=CC=CC=1C(=C(Cl)Cl)C2=CC=C(C=C2)Cl)Cl" , 6)
 MyFminer.AddCompound("O=C(C1=C(C=CC=C1)C(=O)OCC(CCCC)CC)OCC(CCCC)CC" , 7)
 MyFminer.AddCompound("Oc1cc(O)cc2CCCCC[C@@H](O)CCC[C@H](C)OC(=O)c12" , 8)
 MyFminer.AddCompound("O=C1C2=C(C=C(C=C2O)O)OC(=C1O)C3=CC(=C(C=C3)O)O" , 9)
 MyFminer.AddCompound("C1(=C(C(=O)C2=C(O1)C=C(C=C2)O)O)C3=CC(O)=C(C=C3)O" , 10)
 # ... continue adding compounds
 MyFminer.AddActivity(1.0, 1) # 1.0 denotes one class in this example,
 MyFminer.AddActivity(1.0, 2)
 MyFminer.AddActivity(1.0, 3)
 MyFminer.AddActivity(1.0, 4)
 MyFminer.AddActivity(1.0, 5)
 MyFminer.AddActivity(1.0, 6)
 MyFminer.AddActivity(1.0, 7)
 MyFminer.AddActivity(1.0, 8)
 MyFminer.AddActivity(0.0, 9) # 0.0 the other class (you can use more than two classes, max 5).
 MyFminer.AddActivity(0.0, 10)
 # ... continue adding activities (true (1.0) for active, false (0.0) for inactive)
 print(repr(MyFminer.GetNoCompounds()) + ' compounds')
 # gather results for every root node in vector instead of immediate output
 for j in range(0, MyFminer.GetNoRootNodes()):
    result = MyFminer.MineRoot(j)
    for i in range(0, result.size()):
        print(result[i])
 # call MyFminer.Reset() to start over.

Java

This example assumes that you have created java bindings using make java.

 public class test {
     public static void main(String args[]) {
        System.loadLibrary("last");
        Last MyFminer;
        MyFminer = new Last();
        // Toy example: special settings for mining all fragments
        MyFminer.SetMaxHops(25);
        MyFminer.SetConsoleOut(false);
        // Add compounds below. IMPORTANT! DO NOT CHANGE SETTINGS AFTER ADDING COMPOUNDS!
        MyFminer.AddCompound("O=C(C(C(C=C3)=CC=C3O)=CO2)C1=C2C=C(O)C=C1O" , 1);
        MyFminer.AddCompound("Oc1ccc(cc1)[C@@H]2Cc3ccc(O)cc3OC2" , 2);
        MyFminer.AddCompound("O=C1C(C3=CC=C(O)C=C3)=COC2=C1C=CC(O)=C2" , 3);
        MyFminer.AddCompound("O=C1C(C3=CC=C(OC)C=C3)=COC2=C1C=CC(O)=C2" , 4);
        MyFminer.AddCompound("OC1=CC=C(CCCCCCCC)C=C1" , 5);
        MyFminer.AddCompound("C1(C=CC=CC=1C(=C(Cl)Cl)C2=CC=C(C=C2)Cl)Cl" , 6);
        MyFminer.AddCompound("O=C(C1=C(C=CC=C1)C(=O)OCC(CCCC)CC)OCC(CCCC)CC" , 7);
        MyFminer.AddCompound("Oc1cc(O)cc2CCCCC[C@@H](O)CCC[C@H](C)OC(=O)c12" , 8);
        MyFminer.AddCompound("O=C1C2=C(C=C(C=C2O)O)OC(=C1O)C3=CC(=C(C=C3)O)O" , 9);
        MyFminer.AddCompound("C1(=C(C(=O)C2=C(O1)C=C(C=C2)O)O)C3=CC(O)=C(C=C3)O" , 10);
        // ... continue adding compounds
        MyFminer.AddActivity(1.0F, 1);
        MyFminer.AddActivity(1.0F, 2);
        MyFminer.AddActivity(1.0F, 3);
        MyFminer.AddActivity(1.0F, 4);
        MyFminer.AddActivity(1.0F, 5);
        MyFminer.AddActivity(1.0F, 6);
        MyFminer.AddActivity(1.0F, 7);
        MyFminer.AddActivity(1.0F, 8);
        MyFminer.AddActivity(0.0F, 9);
        MyFminer.AddActivity(0.0F, 10);
        // ... continue adding activities (1.0F for active, 0.0F for inactive)
        System.out.println(MyFminer.GetNoCompounds() + " compounds");
        // gather results for every root node in vector instead of immediate output
        for (int j = 0; j < (int) MyFminer.GetNoRootNodes(); j++)
        {
           SVector result = MyFminer.MineRoot(j);
           for(int i = 0; i < result.size(); i++)
           {
             System.out.println(result.get(i));
           }
        }
        MyFminer = null;
     }
 }

Description of Constructors and Options

For the purpose of demonstration we used a toy database of ten compounds in two classes and an unusual parameter configuration. Please note that, in general, the defaults set by the standard constructor are sensible for most databases. They switch on LAST-PM for 95% significance and a minimum frequency of 2. The complete standard settings are:

  • Minimum frequency: 2
  • Feature type: Trees
  • Console output: true
  • Aromatic perception: true
  • Refine Singles: false
  • Do Output: true
  • Maximum hops: 25
  Last ();

It is recommended to set the maximum number of hops as a first step when too many features are generated or calculation takes too long.



Contact

Dipl.-Inf. Andreas Maunz
Institute for Physics
Hermann-Herder-Str. 3
79104 Freiburg, Germany
Email: maunza@fdm.uni-freiburg.de
Web: http://cs.maunz.de

Author:
Andreas Maunz, 2010