|Title||Semi-analytical method for analyzing models and model selection measures|
Considering the large amounts of data that is collected everyday in various domains such as health care, financial services, astrophysics and many others, there is a pressing need to convert this information into knowledge. Machine learning and data mining are both concerned with achieving this goal in a scalable fashion. The main theme of my work has been to analyze and better understand prevalent classification techniques and paradigms which are an integral part of machine learning and data mining research, with an aim to reduce the hiatus between theory and practice. Machine learning and data mining researchers have developed a plethora of classification algorithms to tackle classification problems. Unfortunately, no one algorithm is superior to the others in all scenarios and neither is it totally clear as to which algorithm should be preferred over others under specific circumstances. Hence, an important question now is, what is the best choice of a classification algorithm for a particular application? This problem is termed as classification model selection and is a very important problem in machine learning and data mining. The primary focus of my research has been to propose a novel methodology to study these classification algorithms accurately and efficiently in the non-asymptotic regime. In particular, we propose a moment based method where by focusing on the probabilistic space of classifiers induced by the classification algorithm and datasets of size N drawn independently and identically from a joint distribution i.i.d.), we obtain efficient characterizations for computing the moments of the generalization error. Moreover, we can also study model selection techniques such as cross-validation, leave-one-out and hold out set in our proposed framework. This is possible since we have also established general relationships between the moments of the generalization error and moments of the hold-out-set error, cross-validation error and leave-one-out error. Deploying the methodology we were able to provide interesting explanations for the behavior of cross-validation. The methodology aims at covering the gap between results predicted by theory and the behavior observed in practice.
How to get this paper's electronic documents?
1, Click the "Buy Now" button to complete the online payment
2, Download the paper's electronic document from the successful payment return page/Or the system will send this paper's electronic document to your E-Mail within 24 hours
|Favorite||ADD TO FAVORITE|
Perhaps You will be interested in these papers
2012-03-08 On generalizations of Gowers norms
2012-03-08 Indeterminate strings