Binding Activity Prediction of Cyclin-Dependent Inhibitors
PBN-AR
Institution
Centrum Nowych Technologii UW (Uniwersytet Warszawski)
Basic information
Main language of publication
en
Journal
Journal of Chemical Information and Modeling
ISSN
1549-9596
EISSN
1549-960X
Publisher
AMER CHEMICAL SOC
DOI
URL
Year of publication
2015
Issue number
7
Pages (from-to)
1469-1482
Volume number
55
DOI identifier
Number of sheets
Authors
Other authors
+ 5
Abstracts
Language
en
Content
Cyclin-dependent kinases (CDKs) are the core components coordinating the eukaryotic cell division cycle. Generally, the crystal structure of CDKs provides information on possible molecular mechanisms of ligand binding. However, reliable and robust estimation of ligand binding activity has been a challenging task in drug design. In this regard, various machine learning techniques, such as Support Vector Machine, Naive Bayesian classifier, Decision Tree, and K-Nearest Neighbor classifier, have been used. The performance of these heterogeneous classification techniques depends on proper selection of features from the data set. This fact motivated us to propose an integrated classification technique using Genetic Algorithm (GA), Rotational Feature Selection (RFS) scheme, and Ensemble of Machine Learning methods, named the Genetic Algorithm integrated Rotational Ensemble based classification technique, for the prediction of ligand binding activity of CDKs. This technique can automatically find the important features and the ensemble size. For this purpose, the GA encodes the features and the ensemble size in a chromosome as a binary string. The encoded features are then used to create diverse sets of training points using RFS in order to train the machine learning method multiple times. The RFS scheme uses Principal Component Analysis (PCA) to preserve the variability information of the rotational nonoverlapping subsets of the original data. Thereafter, the testing points are fed to the different instances of the trained machine learning method in order to produce the ensemble result. Here, accuracy after 10-fold cross-validation is computed as the final result and is also used as the objective function for the GA to maximize. The effectiveness of the proposed classification technique has been demonstrated quantitatively and visually in comparison with different machine learning methods for 16 ligand binding CDK docking and rescoring data sets. In addition, the best possible features have been reported for CDK docking and rescoring data sets separately. Finally, the Friedman test has been conducted to judge the statistical significance of the results produced by the proposed technique. The results indicate that the integrated classification technique has high relevance in predicting protein-ligand binding activity.
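The chromosome encoding described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. The abstract states only that features and ensemble size are encoded as a binary string and that 10-fold cross-validation accuracy is the objective. Everything else here is an assumption made for illustration: the constants N_FEATURES and SIZE_BITS, the functions decode, evolve, and toy_fitness, and the choice of truncation selection, single-point crossover, and bit-flip mutation. The toy fitness merely stands in for the cross-validated accuracy of the rotational ensemble, which is not implemented.

```python
import random

N_FEATURES = 8   # hypothetical number of candidate features
SIZE_BITS = 3    # hypothetical: 3 bits encode ensemble sizes 1..8


def decode(chromosome):
    """Split a bit list into (selected feature indices, ensemble size)."""
    mask = chromosome[:N_FEATURES]
    size_bits = chromosome[N_FEATURES:]
    features = [i for i, bit in enumerate(mask) if bit]
    # Offset by 1 so that the ensemble always has at least one member.
    ensemble_size = 1 + int("".join(str(b) for b in size_bits), 2)
    return features, ensemble_size


def toy_fitness(features, ensemble_size):
    """Stand-in objective (assumption): in the paper's setting this would be
    the 10-fold cross-validation accuracy of the trained ensemble."""
    informative = {0, 3, 5}  # pretend only these features are useful
    hits = len(informative & set(features))
    noise = len(set(features) - informative)
    return hits - 0.25 * noise - 0.01 * ensemble_size


def evolve(fitness, pop_size=20, generations=30, p_mut=0.05, seed=0):
    """Simple generational GA over fixed-length binary chromosomes."""
    rng = random.Random(seed)
    length = N_FEATURES + SIZE_BITS
    pop = [[rng.randint(0, 1) for _ in range(length)] for _ in range(pop_size)]
    for _ in range(generations):
        ranked = sorted(pop, key=lambda c: fitness(*decode(c)), reverse=True)
        parents = ranked[: pop_size // 2]   # truncation selection
        children = [ranked[0][:]]           # elitism: keep the current best
        while len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            cut = rng.randrange(1, length)  # single-point crossover
            child = a[:cut] + b[cut:]
            # Bit-flip mutation with probability p_mut per bit.
            child = [bit ^ 1 if rng.random() < p_mut else bit for bit in child]
            children.append(child)
        pop = children
    return max(pop, key=lambda c: fitness(*decode(c)))


if __name__ == "__main__":
    best = evolve(toy_fitness)
    print("chromosome:", best)
    print("decoded (features, ensemble size):", decode(best))
```

To move from this toy to the setting described in the abstract, the fitness callable would train the chosen classifier on PCA-rotated feature subsets for each ensemble member and return the cross-validated accuracy; the GA loop itself would stay unchanged.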
Publication characteristics
ORIGINAL_ARTICLE
Other
System-identifier
607407
Citations
Number of works citing this work
No data
References
Number of works cited by this work
No data