Feature selection research dates back to the 1960s. Hughes used a general parametric model to study the accuracy of a Bayesian classifier as a function of the number of features [Hughes, 1968]. He concluded: [...]“measurement selection, reduction and combination are not proposed as developed techniques. Rather, they are illustrative of a framework for further investigation.”
Since then, feature selection has remained a challenging research field, and some have been sceptical about it. In the discussion of the paper [Miller, 1984], J.B. Copas pessimistically commented: “It has been said: if you torture the data for long enough, in the end they will confess. Errors of grammar apart, what more brutal torture can there be than subset selection? The data will always confess, and the confession will usually be wrong.” R.L. Plackett likewise stated: “If variable elimination has not been sorted out after two decades of work assisted by high-speed computing, then perhaps the time has come to move on to other problems.”
Despite the computational challenges, research in this direction continued. Guyon and Elisseeff observe: “As of 1997, when a special issue on relevance including several papers on variable and feature selection was published [Blum and Langley, 1997, Kohavi and John, 1997], few domains explored used more than 40 features.” [Guyon and Elisseeff, 2003].
Nowadays, advances in machine learning and data acquisition demand the processing of data with thousands of features. One example is microarray processing. Wang and Gotoh, working on molecular classification, note that “one intractable problem [...] is how to reduce the exceedingly high-dimensional gene expression data, which contain a large amount of noise” [Wang and Gotoh, 2009].
Thanks to the advances in entropy estimation during the last two decades, the subject of my Ph.D. thesis [Bonev, 2010] was feature selection in high-dimensional feature spaces.
[Hughes, 1968] G. F. Hughes. "On the mean accuracy of statistical pattern recognizers". IEEE Transactions on Information Theory.
[Miller, 1984] A. J. Miller. "Selection of subsets of regression variables". Journal of the Royal Statistical Society.
[Blum and Langley, 1997] A. Blum and P. Langley. "Selection of relevant features and examples in machine learning". Artificial Intelligence.
[Kohavi and John, 1997] R. Kohavi and G. H. John. "Wrappers for feature subset selection". Artificial Intelligence.
[Guyon and Elisseeff, 2003] I. Guyon and A. Elisseeff. "An introduction to variable and feature selection". Journal of Machine Learning Research.
[Wang and Gotoh, 2009] X. Wang and O. Gotoh. "Accurate molecular classification of cancer using simple rules". BMC Medical Genomics.
[Bonev, 2010] B. Bonev. "Feature Selection based on Information Theory". Thesis (pdf 10MB)
In the citation of Copas we can read "[...] Errors of grammar apart [...]".
Are there any errors? "Data" is plural, that is why "they will confess".