Benchmark Data Sets
Notes:

If the data are used, please remember to give proper references.
Missing values are represented by -999.
Downloaded structures can be used to calculate molecular descriptors with the DRAGON software.
Obj.: no. of objects;
X-var.: independent variables;
Y-var.: response variables;
C-var.: class variables
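
A minimal sketch of handling the -999 missing-value code in Python. It assumes a whitespace-delimited, purely numeric .txt file with no header or label columns, which these notes do not confirm; the function name `parse_numeric_table` and the sample rows are hypothetical and are not taken from any data set listed here.

```python
from typing import List, Optional

MISSING_CODE = -999.0  # missing-value marker stated in the notes


def parse_numeric_table(
    text: str, missing: float = MISSING_CODE
) -> List[List[Optional[float]]]:
    """Parse whitespace-delimited numeric rows, mapping the missing code to None.

    Assumption: every field is numeric and there is no header line.
    """
    rows = []
    for line in text.splitlines():
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        row = []
        for field in fields:
            value = float(field)
            row.append(None if value == missing else value)
        rows.append(row)
    return rows


# Hypothetical three-row excerpt, for illustration only.
sample = """\
1.0 2.5 -999
4.0 -999 6.0
7.0 8.0 9.0
"""
table = parse_numeric_table(sample)

# Keep only the objects that have no missing values.
complete_rows = [row for row in table if None not in row]
```

Mapping -999 to `None` before any calculation keeps the marker from being treated as a real measurement, for example when computing means or fitting a regression.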

Available Data Sets:
 BENZODIAZEPINES
 BENZYL
 HALD
 IRIS
 MUSCARINIC
 OCTANES
 OLITOS
 PHENETYLAMINES
 SELWOOD
 WINES

 
 BENZODIAZEPINES

Data download (.txt, 5 Kb)

Obj. 245 // Y-var. 1
Courtesy of Prof. Frank R. Burden

Data set for QSAR modelling consisting of 245 benzodiazepine compounds that act on the benzodiazepine receptor. No common substructure.

Original reference: P.W.Harrison, G.B.Barlin, L.P.Davies, S.J.Ireland, P.Matyus, and M.G.Wong, Eur.J.Med.Chem., 1996, 31, 651-662.

Some other sources: (a) F.R.Burden, M.G.Ford, D.C.Whitley, and D.A.Winkler, J.Chem.Inf.Comp.Sci., 2000, 40, 1423-1430.

 

 BENZYL

Data download (.txt, 1 Kb)

Obj. 11 // X-var. 3 / Y-var. 1

3 independent variables (substituent descriptors) and a Y response.
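
The summary notation used on this page (Obj., X-var., Y-var., C-var.) can be read mechanically, for instance to check a downloaded table against the stated counts. The sketch below is illustrative only: the helper names `parse_summary` and `expected_columns` are hypothetical, and the column check assumes one column per variable with no extra label or identifier columns, which the page does not state.

```python
import re
from typing import Dict

# Matches one field of a summary line, e.g. "X-var. 3".
SUMMARY_FIELD = re.compile(r"(Obj|X-var|Y-var|C-var)\.\s*(\d+)")


def parse_summary(line: str) -> Dict[str, int]:
    """Return the counts named in a line such as 'Obj. 11 // X-var. 3 / Y-var. 1'."""
    return {name: int(count) for name, count in SUMMARY_FIELD.findall(line)}


def expected_columns(summary: Dict[str, int]) -> int:
    """Total number of variables (X + Y + C).

    Assumption: one column per variable and no label columns.
    """
    return sum(summary.get(key, 0) for key in ("X-var", "Y-var", "C-var"))


benzyl = parse_summary("Obj. 11 // X-var. 3 / Y-var. 1")
iris = parse_summary("Obj. 150 // X-var. 4 / C-var. 1")
```

Here `benzyl["Obj"]` gives the number of rows to expect and `expected_columns(benzyl)` the number of variable columns, under the assumption stated above.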

Original reference: P.P.Mager, H.Rothe, Pharmazie, 1990, 45, 758-769.

Other sources: (a) M.Stone, P.Jonathan, J.Chemometrics, 1993, 7, 455-475. (b) P.P.Mager, J.Chemometrics, 1995, 9, 211-221. (c) R.Todeschini, V.Consonni, A.Maiocchi, Chemom.Intell.Lab.Syst., 1999, 46, 13-29.

 

 HALD

Data download (.txt, 1 Kb)

Obj. 13 // X-var. 4 / Y-var. 1

Original reference: A.Hald, "Statistical Theory with Engineering Applications", Wiley, New York, 1952; p.647.

Other sources: (a) N.Draper and H.Smith, "Applied Regression Analysis", 2nd ed., Wiley, New York, 1981. (b) H.Kubinyi, J.Chemometrics, 1996, 10, 119-133.

 

 IRIS

Data download (.txt, 4 Kb)

Obj. 150 // X-var. 4 / C-var. 1

Data set for testing classification methods, consisting of 150 samples of Iris flowers described by 4 independent variables and 1 class variable with 3 classes (Iris species).

Original reference: R.A.Fisher, Annals of Eugenics, 1936, 7, 179-188.

Some other sources: (a) R.Todeschini, Analytica Chimica Acta, 1997, 348, 419-430.

 

 MUSCARINIC

Data download (.txt, 3 Kb)

Obj. 162 // Y-var. 1
Courtesy of Prof. Frank R. Burden

Data set for QSAR modelling consisting of 162 compounds that act on the M1 muscarinic receptor. No common substructure.

Original reference: B.S.Orlek, F.E.Blaney, F.Brown, M.S.G.Clark, M.S.Hadley, J.Hatcher, G.J.Riley, H.E.Rosenberg, H.J.Wadsworth, and P.Wyman, J.Med.Chem., 1991, 34, 2726-2735.

Some other sources: (a) F.R.Burden, M.G.Ford, D.C.Whitley, and D.A.Winkler, J.Chem.Inf.Comp.Sci., 2000, 40, 1423-1430.

 

 OCTANES

Data download (.txt, 3 Kb)
HyperChem structures download (.zip, 18 Kb)

Obj. 18 // Y-var. 19

19 physico-chemical responses of the 18 octane isomers. Standard data set for molecular descriptor monitoring. 3D structures from HyperChem.

Original reference: M.Randic, X.Guo, T.Oxley, H.Krishnapriyan, and L.Naylor, J.Chem.Inf.Comput.Sci., 1994, 34, 361-367.

Other sources: (a) M.Randic, J.Mol.Struct. (Theochem), 1991, 233, 45-59. (b) M.Randic, Croat.Chem.Acta, 1993, 66, 289-312. (c) M.V.Diudea, O.M.Minailiuc, and G.Katona, Rev.Roum.Chim., 1997, 42, 239-249. (d)

 

 OLITOS

Data download (.txt, 20 Kb)

Obj. 120 // X-var. 25 / C-var. 1
Courtesy of Prof. Michele Forina

25 independent variables; 1 class variable with 4 classes.

Original reference: C.Armanino, R.Leardi, S.Lanteri, and G.Modi, Chemom.Intell.Lab.Syst., 1989, 5, 343-354.

Other sources: (a) R.Todeschini, Analytica Chimica Acta, 1997, 348, 419-430.

 

 PHENETYLAMINES

Data download (.txt, 89 Kb)
HyperChem structures download (.zip, 24 Kb)

Obj. 22 // X-var. 628 / Y-var. 1

620 theoretical molecular descriptors calculated with the DRAGON software for 22 N,N-dimethyl-2-Br-phenethylamines; 1 biological response. Structures optimized with the Amber force field (HyperChem software).

Original reference: H.Kubinyi (Ed.), "QSAR: Hansch Analysis and Related Approaches", VCH, Weinheim (Germany), 1993, pp.57-68.

Other sources: (a) R.Todeschini and P.Gramatica, in "Perspectives in Drug Discovery and Design", 1998, 355-380.

 

 SELWOOD

Data download (.txt, 13 Kb)

Obj. 31 // X-var. 53 / Y-var. 1
Courtesy of Dr. Hugo Kubinyi

31 antifilarial antimycin A1 analogues represented by 53 physicochemical descriptors for modelling in vitro antifilarial activity. Data set used for testing variable selection approaches.

Original reference: D.L.Selwood, D.J.Livingstone, J.C.W.Comley, A.B.O'Dowd, A.T.Hudson, P.Jackson, K.S.Jandu, V.S.Rose, and J.N.Stables, J.Med.Chem., 1990, 33, 136-142.

Other sources: (a) H.Kubinyi, Quant.Struct.-Act.Relat., 1994, 13, 285-294. (b) D.Rogers and A.J.Hopfinger, J.Chem.Inf.Comput.Sci., 1994, 34, 854-866. (c) S.S.So and M.Karplus, J.Med.Chem., 1996, 39, 1521-1530.

 

 WINES

Data download (.txt, 12 Kb) 

Obj. 174 // X-var. 13 / C-var. 1

13 independent variables; 1 class variable with 3 classes.

Original reference: M.Forina, C.Armanino, M.Castino, and M.Ubigli, Vitis, 1986, 25, 189.

Other sources: (a) R.Todeschini, Analytica Chimica Acta, 1997, 348, 419-430.