
Friday, May 18, 2012

Entropy will find the way

The most awesome measure in Information Theory is entropy. It is widely used in pattern recognition [Escolano, Suau, Bonev 2009] because it quantifies the expected value of the information contained in a message.
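For reference, the entropy of a discrete random variable X with probability mass function p is

H(X) = -\sum_{x} p(x)\,\log_2 p(x),

that is, the average number of bits needed to encode an outcome of X.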
As far as I remember, the first application I gave to entropy was helping a robot find its way in a semi-structured environment using only vision [Bonev, Cazorla, Escolano 2007]. Not in terms of high-level knowledge, but simply heading to where there seem to be things in the distance. When people take a walk they don't get stuck trying to walk into a building; instead, they see something at the end of a street and start walking that way.
In an image representing 360º of the environment, that "something at the end of a street" is visually perceived as a more entropic region:
To avoid ambiguities I forced there to be only two most entropic regions at a time: the two ends of a corridor or a street. A Fourier approximation results in the following map: for each moment on the time line there are only two hot regions on the angle axis. The robot should head towards one of them: the one which agrees with its current heading.
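The following is a minimal sketch of the idea, not the code from the paper: it assumes a grayscale panoramic image whose columns span 360º, and the histogram bin count, the number of Fourier terms kept, and all function names are choices of my own.

import numpy as np

def column_entropy(panorama, n_bins=32):
    # Shannon entropy of the gray-level histogram of each image column.
    h, w = panorama.shape
    entropies = np.empty(w)
    for i in range(w):
        counts, _ = np.histogram(panorama[:, i], bins=n_bins, range=(0, 255))
        p = counts / counts.sum()
        p = p[p > 0]
        entropies[i] = -np.sum(p * np.log2(p))
    return entropies

def fourier_smooth(signal, n_terms=3):
    # Keep only the lowest Fourier frequencies (a coarse approximation).
    coeffs = np.fft.rfft(signal)
    coeffs[n_terms:] = 0
    return np.fft.irfft(coeffs, n=len(signal))

def candidate_headings(panorama):
    # Angles (in degrees) of the two most entropic regions of the panorama.
    e = fourier_smooth(column_entropy(panorama))
    w = len(e)
    # circular local maxima of the smoothed entropy profile, strongest two
    peaks = [i for i in range(w) if e[i] >= e[i - 1] and e[i] >= e[(i + 1) % w]]
    peaks.sort(key=lambda i: e[i], reverse=True)
    return [360.0 * i / w for i in peaks[:2]]

With only a few low frequencies kept, the smoothed profile typically has at most two maxima, which matches the two-ends-of-a-corridor constraint described above.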
This simplistic approach had to be aided by another vision-based mechanism for avoiding obstacles. I refer to it as visual sonars, but no range sensors and no GPS are used in the following video: only vision.


[Escolano, Suau, Bonev 2009] F. Escolano, P. Suau, B. Bonev. "Information Theory in Computer Vision and Pattern Recognition". Springer, 2009.
[Bonev, Cazorla, Escolano 2007] B. Bonev, M. A. Cazorla, F. Escolano. "Robot Navigation Behaviors based on Omnidirectional Vision and Information Theory". Journal of Physical Agents, September 2007.

Tuesday, November 1, 2011

Information Theory in Computer Vision and Mobile Robotics


Information Theory is both a set of tools and a theoretical framework for many pattern recognition problems, one of which is Computer Vision. The following slides are from a talk I gave at the Max Planck Institute in Tübingen in 2010. They give a picture of the uses and significance of Information Theory in Computer Vision, mainly in the context of our research at the Dept. of Computer Science and Artificial Intelligence in Alicante.
Information-theoretic Computer Vision for Autonomous Robots

A pdf is also available.

"In the end they will confess"


Feature selection research dates back to the 1960s. Hughes used a general parametric model to study the accuracy of a Bayesian classifier as a function of the number of features [Hughes, 1968]. He concluded: [...] "measurement selection, reduction and combination are not proposed as developed techniques. Rather, they are illustrative of a framework for further investigation."

Since then, feature selection has been a challenging research field, and some have been sceptical about it. In the discussion of the paper [Miller, 1984], J.B. Copas pessimistically commented: "It has been said: if you torture the data for long enough, in the end they will confess. Errors of grammar apart, what more brutal torture can there be than subset selection? The data will always confess, and the confession will usually be wrong." Also, R.L. Plackett stated: "If variable elimination has not been sorted out after two decades of work assisted by high-speed computing, then perhaps the time has come to move on to other problems."

Despite the computationally challenging scenario, the research in this direction continued. "As of 1997, when a special issue on relevance including several papers on variable and feature selection was published [Blum and Langley, 1997, Kohavi and John, 1997], few domains explored used more than 40 features" [Guyon and Elisseeff, 2003].

Nowadays, advances in machine learning and data acquisition demand the processing of data with thousands of features. An example is microarray processing. Wang and Gotoh, working on molecular classification, note that "one intractable problem [...] is how to reduce the exceedingly high-dimensional gene expression data, which contain a large amount of noise" [Wang and Gotoh, 2009].

Thanks to the advances in entropy estimation during the last two decades, the subject of my Ph.D. thesis [Bonev, 2010] was feature selection in high-dimensional feature spaces.


[Hughes, 1968] G. F. Hughes. "On the mean accuracy of statistical pattern recognizers". IEEE Transactions on Information Theory.
[Miller, 1984] A. J. Miller. "Selection of subsets of regression variables". Journal of the Royal Statistical Society.
[Blum and Langley, 1997] A. Blum and P. Langley. "Selection of relevant features and examples in machine learning". Artificial Intelligence.
[Kohavi and John, 1997] R. Kohavi and G. H. John. "Wrappers for feature subset selection". Artificial Intelligence.
[Guyon and Elisseeff, 2003] I. Guyon and A. Elisseeff. "An introduction to variable and feature selection". Journal of Machine Learning Research.
[Wang and Gotoh, 2009] X. Wang and O. Gotoh. "Accurate molecular classification of cancer using simple rules". BMC Medical Genomics. 
[Bonev, 2010] B. Bonev. "Feature Selection based on Information Theory". Thesis (pdf 10MB)

Monday, October 31, 2011

Entropy maximizing distributions

Entropy maximizing distributions play an important role in several entropy estimation methods. Among all probability distributions with a given mean and variance, the Gaussian is the one with maximum entropy. This can be proved using Lagrange multipliers. A detailed proof follows.
GaussianIntegral
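In outline (the embedded document carries out the full computation), the proof maximizes the differential entropy

h(p) = -\int p(x)\,\ln p(x)\,dx

subject to \int p(x)\,dx = 1, \int x\,p(x)\,dx = \mu and \int (x-\mu)^2\,p(x)\,dx = \sigma^2. Setting to zero the functional derivative of the Lagrangian

L[p] = -\int p\ln p\,dx + \lambda_0\left(\int p\,dx - 1\right) + \lambda_1\left(\int x\,p\,dx - \mu\right) + \lambda_2\left(\int (x-\mu)^2\,p\,dx - \sigma^2\right)

gives -\ln p(x) - 1 + \lambda_0 + \lambda_1 x + \lambda_2 (x-\mu)^2 = 0, so p(x) is the exponential of a quadratic in x, that is, a Gaussian. Imposing the three constraints fixes \lambda_1 = 0 and \lambda_2 = -1/(2\sigma^2), which yields

p(x) = \frac{1}{\sqrt{2\pi\sigma^2}}\, e^{-(x-\mu)^2/(2\sigma^2)}, \qquad h = \tfrac{1}{2}\ln(2\pi e\,\sigma^2).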


The pdf is also available at ITinCVPR.