Reference Data Sets
Notes:

If the data are used, please remember to give proper references.
Missing values are represented by -999.
Downloaded structures can be used to calculate molecular descriptors with the DRAGON software.
Obj.: no. of objects;
X-var.: independent variables;
Y-var.: response variables;
C-var.: class variables
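
The missing-value convention above can be handled at load time. The following is a minimal sketch, assuming the downloaded .txt files hold whitespace-delimited numeric rows with no header or label columns; the page does not describe the file layout, so this is an assumption, and the function name `parse_rows` and the sample rows are hypothetical.

```python
import math

MISSING_CODE = -999.0  # missing-value code stated in the notes


def parse_rows(text, missing_code=MISSING_CODE):
    """Parse whitespace-delimited numeric rows, mapping the missing code to NaN."""
    rows = []
    for line in text.splitlines():
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        values = [float(f) for f in fields]
        rows.append([math.nan if v == missing_code else v for v in values])
    return rows


# Hypothetical three-object sample, not taken from any of the data sets.
sample = """\
1.2  3.4  -999
5.6  -999 7.8
9.0  1.1  2.2
"""
rows = parse_rows(sample)
n_missing = sum(math.isnan(v) for row in rows for v in row)
print(len(rows), n_missing)
```

Converting -999 to NaN before any modelling step keeps the code from being treated as a real measurement. Files that contain object names or header lines would need those stripped first.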

Available Data Sets:
 BENZODIAZEPINES
 BENZYL
 HALD
 IRIS
 MUSCARINIC
 OCTANES
 OLITOS
 PHENETYLAMINES
 SELWOOD
 WINES
 Links to other data sets

 
 BENZODIAZEPINES

Data download (.txt, 5 Kb)

Obj. 245 // Y-var. 1
Courtesy of Prof. Frank R. Burden

Data set for QSAR modelling consisting of 245 benzodiazepine compounds that act on the benzodiazepine receptor. The compounds share no common substructure.

Original reference: P.W.Harrison, G.B.Barlin, L.P.Davies, S.J.Ireland, P.Matyus, and M.G.Wong, Eur.J.Med.Chem., 1996, 31, 651-662.

Some other sources: (a) F.R.Burden, M.G.Ford, D.C.Whitley, and D.A.Winkler, J.Chem.Inf.Comput.Sci., 2000, 40, 1423-1430.

 

 BENZYL

Data download (.txt, 1 Kb)

Obj. 11 // X-var. 3 / Y-var. 1

3 independent variables (substituent descriptors) and a Y response.

Original reference: P.P.Mager, H.Rothe, Pharmazie, 1990, 45, 758-769.

Other sources: (a) M.Stone, P.Jonathan, J.Chemometrics, 1993, 7, 455-475. (b) P.P.Mager, J.Chemometrics, 1995, 9, 211-221. (c) R.Todeschini, V.Consonni, A.Maiocchi, Chemom.Intell.Lab.Syst., 1999, 46, 13-29.

 

 HALD

Data download (.txt, 1 Kb)

Obj. 13 // X-var. 4 / Y-var. 1

Original reference: A.Hald, Statistical Theory with Engineering Applications, Wiley, New York, 1952; p.647.

Other sources: (a) N.Draper and H.Smith, "Applied Regression Analysis", 2nd ed., Wiley, New York, 1981. (b) H.Kubinyi, J.Chemometrics, 1996, 10, 119-133.

 

 IRIS

Data download (.txt, 4 Kb)

Obj. 150 // X-var. 4 / C-var. 1

Data set for testing classification methods, consisting of 150 samples of Iris flowers described by 4 independent variables and 1 class variable with 3 classes (Iris species).
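
The summary line of each entry uses the abbreviations defined in the notes (Obj., X-var., Y-var., C-var.). As a small sketch, such a line can be parsed into counts and used to sanity-check a downloaded file. The function name `parse_summary` is hypothetical, and the idea that each variable occupies exactly one column is an assumption about the file layout, not something this page states.

```python
import re

# Abbreviations defined in the notes: Obj., X-var., Y-var., C-var.
FIELD = re.compile(r"(Obj|X-var|Y-var|C-var)\.\s*(\d+)")


def parse_summary(line):
    """Return a dict of counts from a line such as 'Obj. 150 // X-var. 4 / C-var. 1'."""
    counts = {"Obj": 0, "X-var": 0, "Y-var": 0, "C-var": 0}
    for name, number in FIELD.findall(line):
        counts[name] = int(number)
    return counts


iris = parse_summary("Obj. 150 // X-var. 4 / C-var. 1")
# Expected number of values per object if each variable occupies one column
# (an assumption about the file layout).
columns = iris["X-var"] + iris["Y-var"] + iris["C-var"]
print(iris["Obj"], columns)
```

Comparing these counts with the number of rows and columns actually read from a file is a quick way to catch a truncated download or an unexpected extra column.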

Original reference: R.A.Fisher, Annals of Eugenics, 1936, 7, 179-188.

Some other sources: (a) R.Todeschini, Analytica Chimica Acta, 1997, 348, 419-430.

 

 MUSCARINIC

Data download (.txt, 3 Kb)

Obj. 162 // Y-var. 1
Courtesy of Prof. Frank R. Burden

Data set for QSAR modelling consisting of 162 compounds that act on the M1 muscarinic receptor. The compounds share no common substructure.

Original reference: B.S.Orlek, F.E.Blaney, F.Brown, M.S.G.Clark, M.S.Hadley, J.Hatcher, G.J.Riley, H.E.Rosenberg, H.J.Wadsworth, and P.Wyman, J.Med.Chem., 1991, 34, 2726-2735.

Some other sources: (a) F.R.Burden, M.G.Ford, D.C.Whitley, and D.A.Winkler, J.Chem.Inf.Comput.Sci., 2000, 40, 1423-1430.

 

 OCTANES

Data download (.txt, 3 Kb)
HyperChem structures download (.zip, 18 Kb)

Obj. 18 // Y-var. 19

19 physico-chemical responses of the 18 octane isomers. Standard data set for testing molecular descriptors. 3D structures from HyperChem.

Original reference: M.Randic, X.Guo, T.Oxley, H.Krishnapriyan, and L.Naylor, J.Chem.Inf.Comput.Sci., 1994, 34, 361-367.

Other sources: (a) M.Randic, J.Mol.Struct. - Theochem, 1991, 233, 45-59. (b) M.Randic, Croat.Chem.Acta, 1993, 66, 289-312. (c) M.V.Diudea, O.M.Minailiuc and G.Katona, Rev.Roum.Chim., 1997, 42, 239-249. (d)

 

 OLITOS

Data download (.txt, 20 Kb)

Obj. 120 // X-var. 25 / C-var. 1
Courtesy of Prof. Michele Forina

25 independent variables; 1 class variable with 4 classes.

Original reference: C.Armanino, R.Leardi, S.Lanteri, and G.Modi, Chemom.Intell.Lab.Syst., 1989, 5, 343-354.

Other sources: (a) R.Todeschini, Analytica Chimica Acta, 1997, 348, 419-430.

 

 PHENETYLAMINES

Data download (.txt, 89 Kb)
HyperChem structures download (.zip, 24 Kb)

Obj. 22 // X-var. 628 / Y-var. 1

620 theoretical molecular descriptors calculated with the DRAGON software for 22 N,N-dimethyl-2-Br-phenethylamines; 1 biological response. Structures optimized with the Amber force field (HyperChem software).

Original reference: H.Kubinyi (Ed.), "QSAR: Hansch Analysis and Related Approaches", VCH, Weinheim (Germany), 1993, pp.57-68.

Other sources: (a) R.Todeschini and P.Gramatica, in "Perspectives in Drug Discovery and Design", 1998, 355-380.

 

 SELWOOD

Data download (.txt, 13 Kb)

Obj. 31 // X-var. 53 / Y-var. 1
Courtesy of Dr. Hugo Kubinyi

31 antifilarial antimycin A1 analogues represented by 53 physicochemical descriptors for modelling in vitro antifilarial activity. Data set used for testing variable selection approaches.

Original reference: D.L.Selwood, D.J.Livingstone, J.C.W.Comley, A.B.O'Dowd, A.T.Hudson, P.Jackson, K.S.Jandu, V.S.Rose, and J.N.Stables, J.Med.Chem., 1990, 33, 136-142.

Other sources: (a) H.Kubinyi, Quant.Struct.-Act.Relat., 1994, 13, 285-294. (b) D.Rogers and A.J.Hopfinger, J.Chem.Inf.Comput.Sci., 1994, 34, 854-866. (c) S.S.So and M.Karplus, J.Med.Chem., 1996, 39, 1521-1530.

 

 WINES

Data download (.txt, 12 Kb) 

Obj. 174 // X-var. 13 / C-var. 1

13 independent variables; 1 class variable with 3 classes.

Original reference: M.Forina, C.Armanino, M.Castino, and M.Ubigli, Vitis, 1986, 25, 189.

Other sources: (a) R.Todeschini, Analytica Chimica Acta, 1997, 348, 419-430.

 

 Links to other data sets
Chemometrics
  Data Bases at Food Technology
  Data Sets and Tutorials at Clarkson University
  OnLine Databases, NIST (National Institute of Standards and Technology)

QSAR
  Biodegradation and Bioaccumulation Data of Existing Chemicals (Chemicals Evaluation Research Institute)
  ChemFinder
  Chemical Databases (Chemdex)
  Chemical Databases (Chemical Information Network)
  Chemical Databases (Virtual Library)
  Databases at The QSAR and Molecular Modelling Society
  Environmental Fate DataBase (Syracuse Research Corporation)
  MathMol Library (Library of 3-D Molecular Structures) 
  OnLine Databases, NIST (National Institute of Standards and Technology)
  The University of Minnesota Biocatalysis/Biodegradation Database
  TOXNET (Toxicology Data Network)