Internal and external validations of QSAR model: Review

Citation: QIN Litang, LIU Shushen, XIAO Qianfen, WU Qingsheng. Internal and external validations of QSAR model: Review[J]. Environmental Chemistry, 2013, 32(7): 1205-1211. doi: 10.7524/j.issn.0254-6108.2013.07.012

  • Funding:

    Supported by the National Natural Science Foundation of China (21177097)

    Supported by the China Postdoctoral Science Foundation (2012M520932).

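  • Abstract: Validating a quantitative structure-activity relationship (QSAR) model is an essential prerequisite for ensuring that the model reliably predicts the biological activity of unknown samples. However, some published QSAR papers do not validate their models effectively. This paper therefore reviews in detail the internal and external validation methods for QSAR models. Internal validation methods include leave-one-out (LOO) cross-validation, leave-many-out (LMO) or leave-N-out (LNO) cross-validation, y-randomization, and the bootstrap. Measures of a model's external predictive ability include $Q^2_{F1}$, $Q^2_{F2}$, $Q^2_{F3}$, the concordance correlation coefficient (CCC), $r_m^2$, and the Golbraikh-Tropsha method. In addition, reference values of these statistics for acceptable QSAR models are summarized from the literature to provide guidance for QSAR modelers.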
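
The abstract names several external validation statistics without defining them on this page. The following is a minimal Python sketch of four of them, assuming the definitions commonly given in the QSAR literature: each $Q^2_{Fn}$ is one minus a ratio of the test-set prediction error to a reference variance, and the three variants differ only in that reference. The function names and the toy data are hypothetical and serve only as illustration; they are not taken from the paper.

```python
from statistics import fmean


def press(y_obs, y_pred):
    """Sum of squared prediction errors over the external (test) set."""
    return sum((y - p) ** 2 for y, p in zip(y_obs, y_pred))


def q2_f1(y_ext, y_pred, y_train):
    # Reference: test-set deviations from the TRAINING-set mean.
    mean_tr = fmean(y_train)
    return 1.0 - press(y_ext, y_pred) / sum((y - mean_tr) ** 2 for y in y_ext)


def q2_f2(y_ext, y_pred):
    # Reference: test-set deviations from the TEST-set mean.
    mean_ext = fmean(y_ext)
    return 1.0 - press(y_ext, y_pred) / sum((y - mean_ext) ** 2 for y in y_ext)


def q2_f3(y_ext, y_pred, y_train):
    # Reference: training-set variance; both terms are averaged per sample,
    # so the value does not depend on how the test set is distributed.
    mean_tr = fmean(y_train)
    ss_tr = sum((y - mean_tr) ** 2 for y in y_train)
    return 1.0 - (press(y_ext, y_pred) / len(y_ext)) / (ss_tr / len(y_train))


def ccc(y_ext, y_pred):
    # Concordance correlation coefficient: penalizes both scatter and
    # systematic shift away from the 1:1 line.
    n = len(y_ext)
    mx, my = fmean(y_ext), fmean(y_pred)
    sxy = sum((x - mx) * (y - my) for x, y in zip(y_ext, y_pred))
    sxx = sum((x - mx) ** 2 for x in y_ext)
    syy = sum((y - my) ** 2 for y in y_pred)
    return 2.0 * sxy / (sxx + syy + n * (mx - my) ** 2)


# Hypothetical toy data, for illustration only.
y_train = [1.0, 2.0, 3.0, 4.0, 5.0]   # observed activities, training set
y_ext = [2.0, 4.0, 6.0]               # observed activities, external set
y_pred = [2.5, 3.5, 6.0]              # model predictions for the external set

for name, value in [
    ("Q2_F1", q2_f1(y_ext, y_pred, y_train)),
    ("Q2_F2", q2_f2(y_ext, y_pred)),
    ("Q2_F3", q2_f3(y_ext, y_pred, y_train)),
    ("CCC", ccc(y_ext, y_pred)),
]:
    print(f"{name}: {value:.4f}")
```

On the same predictions the four statistics generally give different values, because each compares the prediction error against a different baseline. Acceptance thresholds are deliberately left out of the sketch; the reference values belong to the paper itself.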
Publication history
  • Received:  2013-01-25
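Author affiliations
  • 1.  Key Laboratory of Yangtze River Water Environment, Ministry of Education, Tongji University, Shanghai, 200092;
  • 2.  Department of Chemistry, Tongji University, Shanghai, 200092;
  • 3.  College of Environmental Science and Engineering, Tongji University, Shanghai, 200092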