InvisibleEye: Mobile Eye Tracking Using Multiple Low-Resolution Cameras and Learning-Based Gaze Estimation


Analysis of everyday human gaze behaviour has signi cant potential for ubiquitous computing, as evidenced by a large body of work in gaze-based human-computer interaction, attentive user interfaces, and eye-based user modelling. However, current mobile eye trackers are still obtrusive, which not only makes them uncomfortable to wear and socially unacceptable in daily life, but also prevents them from being widely adopted in the social and behavioural sciences. To address these challenges we present InvisibleEye, a novel approach for mobile eye tracking that uses millimetre-size RGB cameras that can be fully embedded into normal glasses frames. To compensate for the cameras’ low image resolution of only a few pixels, our approach uses multiple cameras to capture di erent views of the eye, as well as learning-based gaze estimation to directly regress from eye images to gaze directions. We prototypically implement our system and characterise its performance on three large-scale, increasingly realistic, and thus challenging datasets: 1) eye images synthesised using a recent computer graphics eye region model, 2) real eye images recorded of 17 participants under controlled lighting, and 3) eye images recorded of four participants over the course of four recording sessions in a mobile setting. We show that InvisibleEye achieves a top person-speci c gaze estimation accuracy of 1.79° using four cameras with a resolution of only 5 × 5 pixels. Our evaluations not only demonstrate the feasibility of this novel approach but, more importantly, underline its signi cant potential for  nally realising the vision of invisible mobile eye tracking and pervasive attentive user interfaces.


We used this first hardware prototype to record a dataset of more than 280,000 close-up eye images with ground truth annotation of the gaze location. A total of 17 participants were recorded, covering a wide range of appearances:
• Gender: Five (29%) female and 12 (71%) male
• Nationality: Seven (41%) German, seven (41%) Indian, one (6%) Bangladeshi, one (6%) Iranian,
and one (6%) Greek
• Eye Color: 12 (70%) brown, four (23%) blue, and one (5%) green
• Glasses: Four participants (23%) wore regular glasses and one (6%) wore contact lenses
For each participant, two sets of data were recorded: one set of of training data and a separate set of test data. For each set, a series of gaze targets was shown on a display that participants were instructed to look at. For both training and test data the gaze targets covered a uniform grid in a random order, where the grid corresponding to the test data was positioned to lie in between the training points. Since the NanEye cameras record at about 44 FPS, we gathered approximately 22 frames per camera and gaze target. The training data was recorded using a uniform 24 × 17 grid of points, with an angular distance in gaze angle of 1.45° horizontally and 1.30° vertically between the points. In total the training set contained about 8,800 images per camera and participant. The test set’s points belonged to a 23 × 16 grid of points and it contains about 8,000 images per camera and participant. This way, the gaze targets covered a field of view of 35° horizontally and 22° vertically.


The dataset consists of 17 folders. Each folder contains the two subfolders for train and test sets with the video frames from each of the four NanEye cameras as well as a .npy file with pixel coordinates on the display.



Please download the full dataset here(49.2GB).