1. What is the MOLE db - Molecular Descriptors Data Base for?

The MOLE db - Molecular Descriptors Data Base is a free online database of 1124 molecular descriptors calculated on 234773 molecules. It is dedicated to all scientists who are interested in molecular descriptors or who apply them in scientific research. In recent years, molecular descriptors have become a key explanatory tool for research in QSAR/QSPR, toxicology, pharmacology, environmental problems, analytical chemistry, food chemistry and materials science. Consequently, the aims of this database are:

- to collect the information related to several molecular descriptors, thus helping researchers in their daily work
- to provide the values of these descriptors on a huge database of molecules
- to facilitate the use of molecular descriptors in research applications

The molecules in the MOLE db - Molecular Descriptors Data Base were mainly collected from the NCI database, and the molecular descriptors were calculated with the DRAGON software. The MOLE db - Molecular Descriptors Data Base allows you to:

- search for a specific group of molecules and analyse the corresponding values of molecular descriptors
- save to an output file the values of a block of molecular descriptors calculated on a group of molecules.


2. Warranty and conditions

In short, no guarantees whatsoever are given for the quality of the MOLE db - Molecular Descriptors Data Base or for the consequences of its use. Some bugs are inevitable, although we have tried to test the database thoroughly.

This service is not intended to offer medical advice on any specific compound, or in any other way. The MOLE db - Molecular Descriptors Data Base is intended as a research and teaching tool; consequently, it may be used for non-commercial purposes only, and for research from which any resulting intellectual property remains in the public domain.

Note that the database limits each query to a maximum of 1000 molecules.
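
One way to work within this limit when a larger NCI number range is needed is to split the range into consecutive sub-ranges and run one query per sub-range. The following is only a minimal sketch: it assumes that the limit applies to the number of molecules a query may return and that each NCI number identifies at most one molecule. The function name nci_ranges is hypothetical and is not part of the database.

```python
MAX_MOLECULES_PER_QUERY = 1000  # limit stated in the text


def nci_ranges(first, last, size=MAX_MOLECULES_PER_QUERY):
    """Yield inclusive (start, end) NCI number ranges of at most `size` numbers.

    Hypothetical helper: each range can then be typed into the
    NCI number query row as a separate search.
    """
    if first > last:
        raise ValueError("first must not exceed last")
    if size < 1:
        raise ValueError("size must be at least 1")
    start = first
    while start <= last:
        end = min(start + size - 1, last)
        yield start, end
        start = end + 1


# Ranges covering NCI numbers 1 to 3000
ranges = list(nci_ranges(1, 3000))
print(ranges)
```

Each sub-range spans at most 1000 NCI numbers, so under the assumptions above no single query can match more than 1000 molecules.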

First release of MOLE db - Molecular Descriptors Data Base: March, 2008


3. Molecules

The MOLE db - Molecular Descriptors Data Base consists of 234773 molecules, mainly collected from the NCI database. Each molecule is identified by two different numbers:

- NCI Number is available for most of the molecules in the database, as they were mainly taken from the NCI database.
- MC Number is the numbering we assigned to the molecules in the database. MC number = 20000 + NCI number, since the first 20000 positions of the database are reserved for other molecules, external to the NCI database, that will be added in the near future.
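
The relation between the two numbers can be sketched in a few lines of code. This is only an illustration of the rule stated above; the function names are hypothetical, and the sketch assumes that NCI numbers start at 1 and that MC numbers up to 20000 belong to the reserved positions.

```python
MC_OFFSET = 20000  # offset stated in the text: MC number = 20000 + NCI number


def nci_to_mc(nci_number):
    """Return the MC number of a molecule taken from the NCI database."""
    if nci_number < 1:
        raise ValueError("NCI numbers are assumed to start at 1")
    return MC_OFFSET + nci_number


def mc_to_nci(mc_number):
    """Return the NCI number, or None for a reserved (non-NCI) position."""
    if mc_number <= MC_OFFSET:
        return None
    return mc_number - MC_OFFSET


# NCI number 14 maps to MC number 20014, and back again
mc = nci_to_mc(14)
print(mc, mc_to_nci(mc))
```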


4. Molecular descriptors

The molecules have been provided as SDF files by the Enhanced NCI database browser. Molecular descriptors were calculated on these files with the DRAGON software. Not all the descriptors calculated by DRAGON were retained; a subset of 1124 descriptors was inserted in the database, so the complete dataset can be seen as a matrix of 234773 rows (the molecules) and 1124 columns (the molecular descriptors). The following descriptor blocks are included in the database:
- constitutional descriptors (DRAGON block number 1)
- topological descriptors (DRAGON block number 2)
- connectivity indices (DRAGON block number 4)
- information indices (DRAGON block number 5)
- 2D autocorrelations (DRAGON block number 6)
- Burden eigenvalues descriptors (DRAGON block number 8)
- eigenvalue-based indices (DRAGON block number 10)
- geometrical descriptors (DRAGON block number 12)
- WHIM descriptors (DRAGON block number 15)
- GETAWAY descriptors (DRAGON block number 16)
- functional group counts (DRAGON block number 17)
- atom-centred fragments (DRAGON block number 18)
- molecular properties (DRAGON block number 20)

For a complete list of the included molecular descriptors, see the descriptor list here.

Some descriptors could not be calculated on all the molecules. These missing values are reported in the database as "n.a." and as the numerical code -999 when query results are exported to text files. See, for example, descriptor U-105 (belonging to the atom-centred fragments group) for the molecule 2-hexylpiperidine (NCI number: 14).
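
When an exported file is processed further, the -999 code should be converted to a proper missing value before any statistics are computed. The sketch below shows one possible way to do this with the Python standard library. The file layout is an assumption, not a description of the actual export: it supposes a tab-separated file with a header row and an identifier column named "NCI". The column DESC_A and all sample values are made-up placeholders.

```python
import csv
import io
import math

MISSING_CODE = -999.0  # numerical code for missing values in exported text files


def parse_value(text):
    """Convert one exported field to float, mapping missing values to NaN."""
    text = text.strip()
    if text == "n.a.":
        return math.nan
    value = float(text)
    return math.nan if value == MISSING_CODE else value


def read_export(handle, id_column="NCI", delimiter="\t"):
    """Read an exported descriptor block into {id: {descriptor: value}}.

    The identifier column name and the delimiter are assumptions about
    the export layout and may need to be adapted.
    """
    reader = csv.DictReader(handle, delimiter=delimiter)
    table = {}
    for row in reader:
        key = row.pop(id_column)
        table[key] = {name: parse_value(text) for name, text in row.items()}
    return table


# Made-up sample in the assumed layout; the values are placeholders, not real data.
sample = "NCI\tU-105\tDESC_A\n14\t-999\t1.5\n15\t0\t2.0\n"
table = read_export(io.StringIO(sample))
print(table["14"])
```

Note that this approach cannot tell a missing value apart from a descriptor whose true value is exactly -999, so it is worth checking whether that value is plausible for the descriptors being exported.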

For a complete reference on the included molecular descriptors, you can consult the following book: Roberto Todeschini and Viviana Consonni, Handbook of Molecular Descriptors, Wiley-VCH, 667 pp., 2000.

Moreover, a useful website dedicated to molecular descriptors is www.moleculardescriptors.eu, which collects tutorials, software, books, links, events and news related to molecular descriptors.


5. Credits

The MOLE db - Molecular Descriptors Data Base has been implemented by the Milano Chemometrics and QSAR Research Group. The molecular descriptors have been calculated with the DRAGON software. The SDF files representing all the molecules derived from the NCI database have been provided by the Enhanced NCI database browser. The molecular structures are displayed with the MarvinView Java applet produced by ChemAxon Ltd. Much of the structural data presented on this web site is derived from information freely available to the public. For the NCI open database structures, see the web site of the Developmental Therapeutics Program (DTP).

Useful websites:
Milano Chemometrics and QSAR Research Group: http://michem.disat.unimib.it/chm/
DRAGON software: https://chm.kode-solutions.net/products_dragon.php
Enhanced NCI database browser: http://cactus.nci.nih.gov/ncidb2/


6. References and contacts

If you use or publish results obtained with the MOLE db - Molecular Descriptors Data Base, please cite:

D. Ballabio, A. Manganaro, V. Consonni, A. Mauri, R. Todeschini, Introduction to MOLE DB - on-line Molecular Descriptors Database, MATCH Communications in Mathematical and in Computer Chemistry, (2009), 62, 199-207

Please e-mail Roberto Todeschini, Davide Ballabio and Alberto Manganaro with bug reports, comments and questions.


7. Help and example of use

Detailed help is provided for each page of the MOLE db - Molecular Descriptors Data Base, where you can find suggestions and explanations on all the database tools. Here we report only a general example of a molecule search to show how the database can be used.

Goal: search for molecules that have an NOx group and an NCI number between 1 and 3000, then save the corresponding topological molecular descriptors for all of them:

1. In the “Search for formula” query row, enter "NO", and in the “Search for NCI number” query row, enter from "1" to "3000". Leave all the other query rows empty.

2. Click the "Search" button.

3. A list of all the molecules that match your query will appear. In the status frame at the top, you will see the total number of molecules.

4. By clicking on “Show details”, you can see the details of each molecule: Formula, Molecular Weight, CAS number, MC number, NCI number, SMILES string, molecular structure and the values of all the molecular descriptors (grouped by descriptor block).

5. At the bottom of the molecule list, you can see and/or plot the value of a specific descriptor for all the molecules in the list. To do so, select the molecular descriptor to be plotted in the “Show a descriptor for all the resulting molecules” combo boxes.

6. At the bottom of the molecule list, you can also save and export (in a text file) a block of molecular descriptors for all the molecules in the list. You can specify the block of descriptors to be saved and some additional fields (MC and NCI numbers, Name, Formula, CAS number, Molecular Weight, SMILES) in the “Select descriptor block and fields to be saved” combo box. Finally, the dataset saved in the text file can be used for further processing in various contexts.

Note that the MC Number is the numbering we assigned to the molecules in the database. MC number = 20000 + NCI number, since the first 20000 positions of the database are reserved for other molecules, external to the NCI database, that we have inserted in the MOLE db - Molecular Descriptors Data Base (or will insert in the near future).