In many absorption, distribution, metabolism,
and excretion (ADME) modeling problems, imbalanced
data can negatively affect the classification performance of
machine learning algorithms. Solutions for handling imbalanced datasets have been proposed, but their application to
ADME modeling tasks is underexplored. In this paper, various strategies, including cost-sensitive learning and resampling methods, were studied to tackle the moderate imbalance
problem of a large Caco-2 cell permeability database. Simple physicochemical molecular descriptors were utilized for
data modeling. Support vector machine classifiers were constructed and compared using multiple comparison tests.
Results showed that the models developed on the basis of
resampling strategies displayed better performanc...