You searched for +publisher:"University of North Carolina" +contributor:("Zheng, Weifan"). One record found.


University of North Carolina

1. Wang, Kun. Classifier Design to Improve Pattern Classification and Knowledge Discovery for Imbalanced Datasets.

Degree: 2010, University of North Carolina

Imbalanced dataset mining is a nontrivial issue. It has extensive applications in a variety of fields, such as scientific research, medical diagnosis, business, and multiple industries. Standard machine learning algorithms fail to produce satisfactory classifiers: they tend to over-fit the larger class but ignore the smaller class. Numerous algorithms have been developed to handle class imbalance, and limited progress has been achieved in improving prediction accuracy for the smaller class. However, real-world datasets may have hidden detrimental characteristics other than class imbalance. Those characteristics are usually dataset-specific, and can cause algorithms that are otherwise robust on other imbalanced datasets to fail. Mining such datasets can only be improved by algorithms tailored to domain characteristics (Weiss, 2004); therefore, it is important and necessary to do exploratory data analysis before classifier design. On the other hand, unmet needs in knowledge discovery, such as lead optimization during drug discovery, demand novel algorithms. In this study, we have developed a framework for imbalanced dataset mining tailored to data characteristics and adapted to knowledge discovery in chemical datasets. First, we explored the dataset and visualized domain characteristics, and then we designed different classifiers accordingly: for class imbalance, active learning (AL), cost-sensitive learning (CSL) and re-sampling methods were designed; for class overlap, Class Boundary Cleaning (CBC) and Class Boundary Mining (CBM) were developed. CBM was also designed for lead optimization: ideally, it would detect fine structural differences between different classes of compounds, and these differences could serve as options for lead optimization. The methods developed were applied to two datasets, hERG and CPDB. The results from the imbalanced hERG liability dataset showed that CBC, CBM and AL were effective in correcting class imbalance/overlap and improving the classifier's performance. Highly predictive models were built; discriminating patterns were discovered; and lead optimization options were proposed. The methodology developed and knowledge discovered will benefit drug discovery and improve hazard test prioritization, risk assessment, and governmental regulatory work on human health and environmental protection.

Advisors/Committee Members: Wang, Kun; Tropsha, Alexander; Golbraikh, Alexander; Roth, Bryan; Marron, James Stephen; Zheng, Weifan.

Subjects/Keywords: Eshelman School of Pharmacy; Division of Chemical Biology and Medicinal Chemistry

APA (6th Edition):

Wang, K. (2010). Classifier Design to Improve Pattern Classification and Knowledge Discovery for Imbalanced Datasets. (Thesis). University of North Carolina. Retrieved from https://cdr.lib.unc.edu/record/uuid:87a64bfc-38f3-43d3-8b27-b3f97cebdd2a

Note: this citation may be lacking information needed for this citation format:
Not specified: Masters Thesis or Doctoral Dissertation

Chicago Manual of Style (16th Edition):

Wang, Kun. “Classifier Design to Improve Pattern Classification and Knowledge Discovery for Imbalanced Datasets.” 2010. Thesis, University of North Carolina. Accessed December 02, 2020. https://cdr.lib.unc.edu/record/uuid:87a64bfc-38f3-43d3-8b27-b3f97cebdd2a.

Note: this citation may be lacking information needed for this citation format:
Not specified: Masters Thesis or Doctoral Dissertation

MLA Handbook (7th Edition):

Wang, Kun. “Classifier Design to Improve Pattern Classification and Knowledge Discovery for Imbalanced Datasets.” 2010. Web. 02 Dec 2020.

Vancouver:

Wang K. Classifier Design to Improve Pattern Classification and Knowledge Discovery for Imbalanced Datasets. [Internet] [Thesis]. University of North Carolina; 2010. [cited 2020 Dec 02]. Available from: https://cdr.lib.unc.edu/record/uuid:87a64bfc-38f3-43d3-8b27-b3f97cebdd2a.

Note: this citation may be lacking information needed for this citation format:
Not specified: Masters Thesis or Doctoral Dissertation

Council of Science Editors:

Wang K. Classifier Design to Improve Pattern Classification and Knowledge Discovery for Imbalanced Datasets. [Thesis]. University of North Carolina; 2010. Available from: https://cdr.lib.unc.edu/record/uuid:87a64bfc-38f3-43d3-8b27-b3f97cebdd2a

Note: this citation may be lacking information needed for this citation format:
Not specified: Masters Thesis or Doctoral Dissertation