cs.maunz.de  

Computer Science

Euclidean Embedding of Molecules and Backbone Refinement Class Features based on Co-Occurrence and Entropy. (Schulz et al.)

The 2-D embedding reflects feature-feature and instance-feature co-occurrence as well as entropy of the features with respect to the target classes.
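The two kinds of quantities named above can be illustrated on toy data. The following is a minimal sketch, not the embedding method of the study: it only computes pairwise feature co-occurrence counts and the Shannon entropy of the target classes among the instances that contain a feature. The function names, feature names, and data are hypothetical.

```python
import math
from collections import Counter
from itertools import combinations


def class_entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    if total == 0:
        return 0.0
    counts = Counter(labels)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def feature_entropy(occurrences, labels):
    """Entropy of the target classes among the instances containing a feature."""
    return class_entropy([labels[i] for i in occurrences])


def feature_cooccurrence(features):
    """For each pair of features, count the instances in which both occur."""
    return {
        (a, b): len(features[a] & features[b])
        for a, b in combinations(sorted(features), 2)
    }


# Hypothetical toy data: feature name -> set of instance indices containing it.
features = {"f1": {0, 1, 2}, "f2": {2, 3}, "f3": {0, 1, 2, 3}}
labels = ["active", "active", "active", "inactive"]

entropies = {name: feature_entropy(occ, labels) for name, occ in features.items()}
cooccurrence = feature_cooccurrence(features)
```

In this toy example, a feature that occurs only in active instances has entropy 0, while a feature split evenly between the two classes has entropy 1 bit.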

Usage: Activating features are green and deactivating features are red; active instances are blue and inactive instances are salmon. Point the mouse at a feature to mark its matching instances, or at an instance to mark its matching features. The brighter a feature, the more significantly it correlates with the endpoint. Use the mouse wheel to zoom in and out and the left mouse button to drag.

Observations: The target classes separate along the diagonal from top left to bottom right, and many features are highly descriptive. Features and instances are well distributed, with well-characterized groups of instances in the outer parts. The data seems suitable for classification tasks.

Links to the datasets used in this study (the thresholds and fminer switches used are given with each dataset below):


Data: CPDB Salmonella Mutagenicity
Thresholds: 95% significance, minimum frequency 6 (Switches: -f 6)

Euclidean embedding of Backbone Refinement Class descriptors and Salmonella Mutagenicity data

Click here for the static version (opens in a separate window).

Click here for the animation (opens in a separate window).

Data: CPDB Multicell Call
Thresholds: 95% significance, minimum frequency 6 (Switches: -f 6)

Euclidean embedding of Backbone Refinement Class descriptors and Multicell Call data

Click here for the static version (opens in a separate window).

Click here for the animation (opens in a separate window).

Data: Mouse Carcinogenicity
Thresholds: 95% significance, minimum frequency 6 (Switches: -f 6)

Euclidean embedding of Backbone Refinement Class descriptors and Mouse Carcinogenicity data

Click here for the static version (opens in a separate window).

Click here for the animation (opens in a separate window).

Data: Rat Carcinogenicity
Thresholds: 95% significance, minimum frequency 6 (Switches: -f 6)

Euclidean embedding of Backbone Refinement Class descriptors and Rat Carcinogenicity data

Click here for the static version (opens in a separate window).

Click here for the animation (opens in a separate window).
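The thresholds listed for each dataset above (95% significance, minimum frequency 6) can be read as a filter on candidate features. The sketch below assumes a Pearson chi-square test on a 2x2 contingency table with one degree of freedom; the exact statistic that fminer applies is not stated on this page, so this test, the function names, and the example counts are assumptions made for illustration.

```python
# Chi-square critical value for 1 degree of freedom at the 95% level.
CHI2_CRITICAL_95 = 3.841


def chi_square_2x2(a, b, c, d):
    """Pearson chi-square statistic for the 2x2 table [[a, b], [c, d]]."""
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:
        return 0.0
    return n * (a * d - b * c) ** 2 / denom


def passes_thresholds(active_with, inactive_with, n_active, n_inactive,
                      min_frequency=6):
    """Check a feature against a minimum frequency and a 95% chi-square test.

    active_with / inactive_with: number of active / inactive instances
    that contain the feature (hypothetical parameter names).
    """
    frequency = active_with + inactive_with
    if frequency < min_frequency:
        return False
    chi2 = chi_square_2x2(
        active_with, inactive_with,
        n_active - active_with, n_inactive - inactive_with,
    )
    return chi2 >= CHI2_CRITICAL_95


# Hypothetical counts for a dataset with 50 active and 50 inactive instances.
frequent_and_significant = passes_thresholds(18, 2, 50, 50)
too_rare = passes_thresholds(3, 0, 50, 50)
uninformative = passes_thresholds(5, 5, 50, 50)
```

A feature is kept here only if it occurs in at least six instances and its distribution over the two classes deviates significantly from what independence would predict.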


BBRC Feature Validation

CPDB and large-scale datasheet: comparison to Open Trees and Maximal Trees (large scale), and static vs. upper bound pruning.

Excel Sheet (.xls) for MS Office 2000/2003/2007 (132K)

  • Runtime
  • Feature Count
  • Leave-One-Out Crossvalidation Accuracy
  • Large-Scale Analysis
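As an illustration of the leave-one-out crossvalidation accuracy listed above, the following minimal sketch holds out each instance once, trains on the remainder, and reports the share of correct predictions. The nearest-neighbor classifier over feature sets is a hypothetical stand-in and is not the classifier used for the datasheet; all names and data are illustrative.

```python
def leave_one_out_accuracy(instances, labels, fit_predict):
    """Hold out each instance once, train on the rest, return the accuracy."""
    correct = 0
    for i in range(len(instances)):
        train_x = instances[:i] + instances[i + 1:]
        train_y = labels[:i] + labels[i + 1:]
        if fit_predict(train_x, train_y, instances[i]) == labels[i]:
            correct += 1
    return correct / len(instances)


def nearest_neighbor(train_x, train_y, query):
    """Stand-in classifier: label of the most similar training instance.

    Similarity is the Jaccard index of the two feature sets.
    """
    def jaccard(a, b):
        union = a | b
        return len(a & b) / len(union) if union else 0.0

    best = max(range(len(train_x)), key=lambda j: jaccard(train_x[j], query))
    return train_y[best]


# Hypothetical toy data: each instance is the set of features it contains.
instances = [{"f1", "f2"}, {"f1", "f2", "f3"}, {"f4"}, {"f4", "f5"}]
labels = ["active", "active", "inactive", "inactive"]

accuracy = leave_one_out_accuracy(instances, labels, nearest_neighbor)
```

Passing the classifier as a function keeps the crossvalidation loop independent of the model, so any other predictor with the same signature could be substituted.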

Repeatability: The thresholds and fminer switches used are given in the file.


Contact

Dipl.-Inf. Andreas Maunz
Machine Learning Lab
University Freiburg
Georges-Köhler-Allee 79
79110 Freiburg, Germany
Phone: +49761/203-8442, Fax: +49761/203-7700
Email: maunza@fdm.uni-freiburg.de
Web: http://cs.maunz.de