martes, 26 de noviembre de 2013

How do you see with Google Glass?

What does the Google Glass image look like? Most previews show graphical recreations rather than real photos. The best way to find out is to try Glass, but it is also possible to take a photo.

Google Glass projects light into a prism which reflects the image into your right eye. The prism is positioned over your eye, so you have to look upwards in order to see it. Also, you cannot focus on something that close to your eye: you need to focus at about 30 cm away from the glass in order to see the image clearly. That is how I managed to take the photo in which we can see the time projected by Glass.




The experience is nice and you don't need to close your left eye or make any special effort to see the image sharply; you focus on it quite naturally. However, it is not like overlaying the Glass projection on the real-world image. In the photo I show, it only looks that way because the paper is close enough to be in focus as well. If you are on the street and you look at the Glass viewer, everything else will be out of focus. Then you will need to move your gaze and focus on the street again: as distracting as looking at your phone's display. The only difference is that you don't need to hold Glass; it is just a kind of hands-free device.

Potentially it is much more: a camera attached to your head is a valuable source of information for computer vision. If it had eye tracking, it would open the door to attention-controlled user interfaces. (In the end, eye-movement data will confess.)

Related: Google Glass GDK

viernes, 22 de noviembre de 2013

Google Glass GDK

Today I tested the Google Glass Development Kit (GDK).



(Related: How do you see with Google Glass?)


I am interested in computer vision, so I checked the camera parameters when capturing frames from the preview. The Google Glass GDK has very poor documentation, very few people have the glasses, and there is no emulator. So here I post the parameters I obtained, for those who need to know them:


11-23 19:03:21.117: I/CAMERA PARAMETERS(1345): =s3d-prv-frame-layout-values=none;zoom=0;max-num-detected-faces-hw=35;ti-algo-external-gamma=false;sensor-orientation=0;whitebalance=daylight;autofocus=false;ti-algo-gic=true;preview-format-values=yuv420sp,yuv420p,yuv422i-yuyv,yuv420p;auto-convergence-mode-values=;jpeg-thumbnail-quality=60;preview-format=yuv420sp;exposure-mode-values=manual,auto,night,backlighting,spotlight,sports,snow,beach,aperture,small-aperture;exif-make=Google;iso=auto;flash-mode-values=off;supported-manual-convergence-min=-100;supported-preview-sidebyside-size-values=;preview-frame-rate=60;ti-algo-nsf1=true;camera-name=S5K4E1GA;ti-algo-nsf2=true;jpeg-thumbnail-width=480;scene-mode-values=auto,closeup,landscape,aqua,sports,mood,night-portrait,night-indoor,fireworks,document,barcode,super-night,cine,old-film,action,beach,candlelight,night,party,portrait,snow,steadyphoto,sunset,theatre;exif-model=Glass 1;preview-fps-range-values=(5000,5000),(10000,10000),(15000,15000),(20000,20000),(24000,24000),(15000,30000),(30000,30000),(60000,60000);gbce=false;preview-size-values=1920x1080,1280x960,1280x720,1024x768,1024x576,960x720,800x480,768x576,720x576,720x480,640x480,640x368,640x360,512x384,512x288,416x304,416x240,352x288,320x240,320x192,256x144,240x160,224x160,176x144,960x1280,720x1280,768x1024,576x1024,720x960,480x800,576x768,576x720,480x720,480x640,368x640,384x512,288x512,304x416,240x416,288x352,240x320,192x320,144x256,160x240,160x224,144x176;manual-exposure-right=1;vnf-supported=true;supported-picture-sidebyside-size-values=;preview-fps-range=60000,60000;auto-whitebalance-lock=true;min-exposure-compensation=-30;antibanding=auto;supported-manual-gain-iso-max=800;max-num-focus-areas=0;supported-manual-gain-iso-min=100;vertical-view-angle=42.5;video-stabilization-supported=true;exif-image-description=;iso-mode-values=auto,100,200,400,800;manual-gain-iso=100;s3d-cap-frame-layout=none;supported-manual-gain-iso-step=100;glbce=false;supported-manual-exposure-step=1;ti-algo-sharpening=true;picture-format-values=unused,yuv420sp,yuv420p,yuv422i-yuyv,rgb565,bayer-rggb,jpeg;supported-preview-topbottom-size-values=;glbce-supported=true;exposure-compensation-step=0.1;manual-convergence=0;picture-size=2528x1856;saturation=100;whitebalance-values=auto,daylight,cloudy-daylight,tungsten,fluorescent,incandescent,horizon,sunset,shade,twilight,warm-fluorescent;picture-format=jpeg;supported-picture-subsampled-size-values=;current-iso=100;preview-fps-range-ext-values=(5000,5000),(10000,10000),(15000,15000),(20000,20000),(24000,24000),(15000,30000),(30000,30000),(60000,60000);ipp=ldc-nsf;raw-height=1944;recording-hint=;video-stabilization=false;ipp-values=off,ldc,nsf,ldc-nsf;zoom-supported=true;sharpness=100;contrast=100;scene-mode=auto;jpeg-quality=95;supported-manual-exposure-min=1;manual-gain-iso-right=100;preview-size=320x240;focal-length=2.95;mode-values=high-quality,video-mode,high-performance,high-quality-zsl,cp-cam,zoom-bracketing,exposure-bracketing,temporal-bracketing;vnf=false;preview-fps-ext-values=5,10,15,20,24,30,60;preview-frame-rate-values=5,10,15,20,24,30,60;max-num-metering-areas=20;s3d-prv-frame-layout=none;manual-exposure=1;focus-mode-values=infinity;jpeg-thumbnail-size-values=640x480,160x120,200x120,320x240,512x384,352x144,176x144,96x96,0x0;supported-manual-exposure-max=125;zoom-ratios=100,104,107,111,115,119,123,127,132,137,141,146,152,157,162,168,174,180,187,193,200,207,214,222,230,238,246,255,264,273,283,293,303,314,325,336,348,361,373,386,400,414,429,444,459,476,492,510,528,546,566,586,606,628,650,673,696,721,746,773,800;gbce-supported=true;exif-software=Glass-1 XE11 901188;exposure=auto;picture-size-values=2592x1944,2560x1888,2528x1856,2592x1728,2592x1458,2560x1888,2400x1350,2304x1296,2240x1344,2160x1440,2112x1728,2112x1188,2048x1152,2048x1536,2016x1512,2016x1134,2000x1600,1920x1080,1600x1200,1600x900,1536x864,1408x792,1344x756,1296x972,1280x1024,1280x720,1152x864,1280x960,1024x768,1024x576,640x480,320x240;mechanical-misalignment-correction-supported=false;s3d-cap-frame-layout-values=none;auto-convergence-mode=fra
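For reference, a dump like the one above can be obtained by grabbing the Camera.Parameters object and logging its flattened form. This is only a minimal sketch using the android.hardware.Camera API of the time; the activity name and log tag are my own choices, not part of the GDK:

import android.app.Activity;
import android.hardware.Camera;
import android.os.Bundle;
import android.util.Log;

// Minimal sketch (hypothetical activity name): dump all camera parameters to logcat.
public class DumpCameraParamsActivity extends Activity {
    private static final String TAG = "CAMERA PARAMETERS";

    @Override
    protected void onCreate(Bundle savedInstanceState) {
        super.onCreate(savedInstanceState);
        Camera camera = Camera.open();                   // Glass has a single camera
        Camera.Parameters params = camera.getParameters();
        // flatten() serializes the whole parameter set as one
        // semicolon-separated key=value string, as shown above.
        Log.i(TAG, params.flatten());
        camera.release();                                // free the camera for other apps
    }
}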

I also tested the BoofCV library with real-time edge detection. It works in real time; the delay seen in the video below is mostly due to the screencast performance.
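I won't reproduce the demo code here, but the essence of processing preview frames is the same whether you use BoofCV or roll your own filter. Below is a minimal, self-contained stand-in (my own sketch, not BoofCV's API): it takes the luminance plane of an NV21 preview frame, which is what the yuv420sp preview format above delivers, and computes a Sobel gradient magnitude as a crude edge map.

import android.hardware.Camera;

// Stand-in edge detector for preview frames (not the BoofCV implementation):
// NV21 frames start with width*height luminance bytes, which is all we need here.
public class EdgePreviewCallback implements Camera.PreviewCallback {
    private final int width, height;

    public EdgePreviewCallback(int width, int height) {
        this.width = width;
        this.height = height;
    }

    @Override
    public void onPreviewFrame(byte[] data, Camera camera) {
        int[] edges = new int[width * height];
        for (int y = 1; y < height - 1; y++) {
            for (int x = 1; x < width - 1; x++) {
                int i = y * width + x;
                // Horizontal and vertical Sobel responses on the luma plane.
                int gx = -(data[i - width - 1] & 0xFF) + (data[i - width + 1] & 0xFF)
                        - 2 * (data[i - 1] & 0xFF) + 2 * (data[i + 1] & 0xFF)
                        - (data[i + width - 1] & 0xFF) + (data[i + width + 1] & 0xFF);
                int gy = -(data[i - width - 1] & 0xFF) - 2 * (data[i - width] & 0xFF) - (data[i - width + 1] & 0xFF)
                        + (data[i + width - 1] & 0xFF) + 2 * (data[i + width] & 0xFF) + (data[i + width + 1] & 0xFF);
                edges[i] = Math.min(255, Math.abs(gx) + Math.abs(gy)); // crude edge strength
            }
        }
        // 'edges' can now be drawn to a Bitmap or handed to further processing.
    }
}

It would be registered with camera.setPreviewCallback(new EdgePreviewCallback(w, h)) after choosing a preview size from the list above. BoofCV wraps this kind of per-pixel work behind its own image types and does it considerably faster.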




domingo, 20 de mayo de 2012

Human vision: saccades and fixations

Human vision explores an image through fixations on specific regions and fast movements, or saccades, from one region to another. Long saccades are, in general, less frequent. As we found in [Bonev, Chuang, Escolano 2012], the need for a long saccade varies with the task to be performed and with the complexity of the image.
In the following slow-motion video we show an example of the behaviour of human saccades.
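As an aside, if you record gaze positions with an eye tracker, a common first step is to separate the samples into fixations and saccades with a simple velocity threshold (the I-VT idea). The sketch below is only illustrative; the units, sampling rate and threshold are assumptions of mine, not values used in the paper.

// Illustrative velocity-threshold (I-VT) labelling of gaze samples.
// Assumed: positions in degrees of visual angle, uniform sampling rate.
public class SaccadeLabeller {
    public static boolean[] labelSaccades(double[] x, double[] y,
                                          double sampleRateHz, double thresholdDegPerSec) {
        boolean[] isSaccade = new boolean[x.length];
        for (int i = 1; i < x.length; i++) {
            double dx = x[i] - x[i - 1];
            double dy = y[i] - y[i - 1];
            double velocity = Math.hypot(dx, dy) * sampleRateHz; // deg/s between consecutive samples
            isSaccade[i] = velocity > thresholdDegPerSec;        // fast = saccade, slow = fixation
        }
        return isSaccade;
    }
}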


[Bonev, Chuang, Escolano, 2012] B. Bonev, L.L. Chuang, F. Escolano. "How do image complexity, task demands and looking biases influence human gaze behavior?" Pattern Recognition Letters 2012.

viernes, 18 de mayo de 2012

Entropy will find the way

The most awesome measure in Information Theory is entropy. It is widely used in pattern recognition [Escolano, Suau, Bonev 2009] because it quantifies the expected value of the information contained in a message.
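For a discrete variable X taking values x with probability p(x), this is the usual Shannon definition:

\[
H(X) = -\sum_{x} p(x)\,\log p(x)
\]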
As far as I remember, the first application I gave to entropy was helping a robot find its way in a semi-structured environment using only vision [Bonev, Cazorla, Escolano 2007]. Not in terms of high-level knowledge, but simply heading to where there seem to be things in the distance. When people go for a walk they don't get stuck trying to walk into a building: they see something at the end of a street and start walking that way.
In an image covering 360º of the environment, that "something at the end of a street" is perceived as a more entropic region:
To avoid ambiguities I constrained the system to keep only the two most entropic regions at a time: the two ends of a corridor or a street. A Fourier approximation results in the following map: for each moment on the timeline there are only two hot regions along the angle axis. The robot should head towards one of them: the one which agrees with its current heading.
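A minimal version of the idea can be sketched as follows. It assumes a grayscale panoramic image stored row-major in a byte array, splits it into vertical angular sectors, and scores each sector by the Shannon entropy of its gray-level histogram; the two highest-scoring sectors would be the candidate headings. This is an illustration of the principle, not the code that ran on the robot.

// Illustrative only: entropy of gray-level histograms over angular sectors
// of a 360-degree panoramic image (row-major grayscale bytes).
public class EntropyMap {
    public static double[] sectorEntropies(byte[] image, int width, int height, int numSectors) {
        double[] entropies = new double[numSectors];
        int sectorWidth = width / numSectors;
        for (int s = 0; s < numSectors; s++) {
            int[] histogram = new int[256];
            int count = 0;
            for (int y = 0; y < height; y++) {
                for (int x = s * sectorWidth; x < (s + 1) * sectorWidth; x++) {
                    histogram[image[y * width + x] & 0xFF]++;
                    count++;
                }
            }
            double h = 0.0;
            for (int bin : histogram) {
                if (bin == 0) continue;
                double p = (double) bin / count;
                h -= p * Math.log(p);            // H = -sum p log p
            }
            entropies[s] = h;
        }
        return entropies;  // the two largest values mark the candidate headings
    }
}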
This simplistic approach had to be aided by another vision-based mechanism for obstacle avoidance, which I refer to as visual sonars; still, no range sensors and no GPS are used in the following video, only vision:


[Escolano, Suau, Bonev 2009] F. Escolano, P. Suau, B. Bonev. "Information Theory in Computer Vision and Pattern Recognition". (Hardcover) Springer, 2009
[Bonev, Cazorla, Escolano 2007] B. Bonev, M. A. Cazorla, F. Escolano. "Robot Navigation Behaviors based on Omnidirectional Vision and Information Theory". Journal of Physical Agents - September 2007

jueves, 17 de mayo de 2012

Sentiment analysis

In natural language processing (NLP), the term sentiment analysis does not refer to a psychological analysis of a text (we are not yet planning to take the psychologists' job away). Sentiment analysis refers to extracting subjective information from a text.
Over the last few months I have been working on a sentiment analysis application that classifies user opinions as positive or negative. It is called Opinum [Bonev et al, 2012] and it can be trained for any domain, provided you have a (labelled) data set to train it on. It could be trained for most European languages.

An example of how it works: the following two sentences share almost the same words, but in a different order:
“Al tener la web, no pierdes el tiempo por teléfono.” (“Since they have the website, you don't waste time on the phone.”) [Recognized as: positive]
“En el teléfono os hacen perder el tiempo y no tienen web.” (“On the phone they make you waste your time, and they have no website.”) [Recognized as: negative]
Many traditional analyzers rely on counting keywords considered positive or negative, but in Opinum what matters is the sequence.

The first test of Opinum was done with opinions from the financial domain: banks, savings banks, financial products... a lovely topic. (Speaking of irony, Opinum is not prepared for ironic opinions, although there is work on sentiment analysis that does address this issue.) You can try this version online at http://aplica.prompsit.com/es/opinum. The idea is to enter a complete opinion, not just a few words or a single sentence:

How does it work? Two models are trained: one for positive opinions and one for negative ones. The models are based on n-grams, which capture the sequence information (a simplified sketch follows after these questions).
What is the difficulty? Building this kind of model requires a huge amount of text, and labelled opinion corpora are rarely that large.
How is the problem overcome? By simplifying the texts we simplify the language model, so that training it becomes feasible with the data we have.
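To make the n-gram idea concrete, here is a heavily simplified sketch of the classification step: two bigram models with add-one smoothing, one per class, and the opinion goes to the class whose model assigns it the higher log-probability. Opinum's actual models are more elaborate; the class and method names and the smoothing choice here are mine.

import java.util.HashMap;
import java.util.Map;

// Toy bigram language model with add-one smoothing (illustrative, not Opinum's actual model).
public class BigramModel {
    private final Map<String, Integer> bigramCounts = new HashMap<>();
    private final Map<String, Integer> unigramCounts = new HashMap<>();
    private int vocabularySize = 0;

    public void train(String sentence) {
        String[] tokens = ("<s> " + sentence.toLowerCase() + " </s>").split("\\s+");
        for (int i = 1; i < tokens.length; i++) {
            unigramCounts.merge(tokens[i - 1], 1, Integer::sum);
            bigramCounts.merge(tokens[i - 1] + " " + tokens[i], 1, Integer::sum);
        }
        vocabularySize = unigramCounts.size();
    }

    public double logProbability(String sentence) {
        String[] tokens = ("<s> " + sentence.toLowerCase() + " </s>").split("\\s+");
        double logP = 0.0;
        for (int i = 1; i < tokens.length; i++) {
            int bigram = bigramCounts.getOrDefault(tokens[i - 1] + " " + tokens[i], 0);
            int unigram = unigramCounts.getOrDefault(tokens[i - 1], 0);
            logP += Math.log((bigram + 1.0) / (unigram + vocabularySize)); // add-one smoothing
        }
        return logP;
    }

    // Classify by comparing the two class-conditional models.
    public static String classify(String opinion, BigramModel positive, BigramModel negative) {
        return positive.logProbability(opinion) >= negative.logProbability(opinion)
                ? "positive" : "negative";
    }
}

Training would simply call train(...) on every labelled positive opinion for one model and on every negative opinion for the other; classify(...) then compares the two scores.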

The simplification consists in removing (almost) all the morphology from the texts: singulars, plurals, verb tenses and grammatical persons disappear. The essence of the text remains, but its complexity is drastically reduced. Thus, the following sentences become equivalent:
Me di cuenta de mi error cuando hablé con la directora.
Te darás cuenta de tus errores cuando hables con el director.
Os daríais cuenta de vuestro error cuando hablarais con los directores.
because they all become:
PrnPers dar cuenta de DetPos error cuando hablar con el director.
It may seem like an abuse, but if you torture the data long enough, in the end they will confess.
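Just to illustrate the normalization step with the example above, here is a toy token-by-token replacement; the lookup table is hand-made from these three sentences (Opinum's real pipeline uses a proper morphological analyser, not a hard-coded map).

import java.util.HashMap;
import java.util.Map;

// Toy illustration of the morphological stripping step (hypothetical helper,
// not Opinum's actual pipeline).
public class Simplifier {
    private static final Map<String, String> LEXICON = new HashMap<>();
    static {
        LEXICON.put("me", "PrnPers");   LEXICON.put("te", "PrnPers");    LEXICON.put("os", "PrnPers");
        LEXICON.put("di", "dar");       LEXICON.put("darás", "dar");     LEXICON.put("daríais", "dar");
        LEXICON.put("mi", "DetPos");    LEXICON.put("tus", "DetPos");    LEXICON.put("vuestro", "DetPos");
        LEXICON.put("errores", "error");
        LEXICON.put("hablé", "hablar"); LEXICON.put("hables", "hablar"); LEXICON.put("hablarais", "hablar");
        LEXICON.put("la", "el");        LEXICON.put("los", "el");
        LEXICON.put("directora", "director"); LEXICON.put("directores", "director");
    }

    public static String simplify(String sentence) {
        StringBuilder out = new StringBuilder();
        for (String token : sentence.toLowerCase().replaceAll("[.,]", "").split("\\s+")) {
            out.append(LEXICON.getOrDefault(token, token)).append(' '); // unknown tokens pass through
        }
        return out.toString().trim();
    }

    public static void main(String[] args) {
        System.out.println(simplify("Me di cuenta de mi error cuando hablé con la directora."));
        // prints: PrnPers dar cuenta de DetPos error cuando hablar con el director
    }
}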

[Bonev et al, 2012] B. Bonev, G. Ramírez-Sánchez, S. Ortiz Rojas. "Opinum: statistical sentiment analysis for opinion classification". WASSA - ACL2012.