Labelled Pupils in the Wild (LPW): Pupil detection in unconstrained environments

Abstract

We present labelled pupils in the wild (LPW), a novel dataset of 66 high-quality, high-speed eye region videos for the development and evaluation of pupil detection algorithms. The videos in our dataset were recorded from 22 participants in everyday locations at about 95 FPS using a state-of-the-art dark-pupil head-mounted eye tracker. They cover people with different ethnicities, a diverse set of everyday indoor and outdoor illumination environments, as well as natural gaze direction distributions. The dataset also includes participants wearing glasses, contact lenses, as well as make-up. We benchmark five state-of-the-art pupil detection algorithms on our dataset with respect to robustness and accuracy. We further study the influence of image resolution, vision aids, as well as recording location (indoor, outdoor) on pupil detection performance. Our evaluations provide valuable insights into the general pupil detection problem and allow us to identify key challenges for robust pupil detection on head-mounted eye trackers.

[Figure: example eye images from the dataset]

Above you can see a few example images from the dataset. The first row shows eyes with very different appearances, indicating the variety of our dataset. The second row shows the most difficult cases according to our evaluation: 1. strong shade, 2. eyelid occlusion, 3. reflections on glasses, and 4. strong make-up. The third row shows cropped images of the pupil region under challenging conditions: 1. reflection on the pupil, 2. self-occlusion, 3. strong sunlight and shade, and 4. occlusion by glasses.

Download

Download: Please download the full dataset here (2.4 GB).
Contact: Andreas Bulling Campus E1.4, room 628, E-mail: bulling@mpi-inf.mpg.de

The data is only to be used for non-commercial scientific purposes. If you use this dataset in a scientific publication, please cite the following paper:

Marc Tonsen, Xucong Zhang, Yusuke Sugano, Andreas Bulling
Labelled pupils in the wild: A dataset for studying pupil detection in unconstrained environments. In Proc. of the 9th ACM International Symposium on Eye Tracking Research & Applications (ETRA 2016), pp. 139-142, 2016.

doi: 10.1145/2857491.2857520
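
To get started with the recordings, the following minimal Python sketch shows one way to load an eye video together with its pupil-center annotations. The folder layout, file names, and annotation format assumed here (one folder per participant, paired .avi/.txt files, one whitespace-separated pupil-center entry per video frame) are illustrative assumptions, not a specification; please check the unpacked archive.

```python
import glob
import os

import cv2          # OpenCV, used here only for video decoding
import numpy as np

def load_recording(video_path):
    """Load one eye video and its pupil-center annotations.

    Assumes the annotation file sits next to the video with the same base
    name and holds one whitespace-separated pupil-center entry per frame.
    """
    annot_path = os.path.splitext(video_path)[0] + ".txt"
    centers = np.loadtxt(annot_path)            # one row per annotated frame

    frames = []
    cap = cv2.VideoCapture(video_path)
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        frames.append(cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY))
    cap.release()
    return frames, centers

# Hypothetical usage: iterate over all recordings under the unpacked root "LPW/".
for video_path in sorted(glob.glob("LPW/*/*.avi")):
    frames, centers = load_recording(video_path)
    print(video_path, len(frames), "frames,", len(centers), "annotation rows")
```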

The LPW Dataset

We designed a data collection procedure with two main goals in mind: 1) to record samples of participants under different conditions, i.e. different lighting conditions and eye camera positions, and 2) to have a large variability in appearance of participants, such as gender, ethnicity and use of vision aids. We took each participant to a different set of locations and recorded their eye movements while looking at a moving gaze target.

Participants

Detailed information about our participants can be found in Table 2 of the paper. We recruited 22 participants (9 of them female) through university mailing lists and personal communication. Among them are five different ethnicities: 11 Indian, 6 German, 2 Pakistani, 2 Iranian, and 1 Egyptian. In total we had five different eye colors: 12 brown, 5 black, 3 blue-gray, 1 blue-green, and 1 green. Five participants had impaired vision; of these, 2 wore glasses and 1 wore contact lenses. Strong eye make-up was worn by 1 person (participant ID 22).

Apparatus

The eye tracker used for the recording was a high-speed Pupil Pro head-mounted eye tracker that records eye videos at 120 Hz [1]. In order to also capture high-frame-rate scene videos, we replaced the original scene camera with a PointGrey Chameleon3 USB3.0 camera recording at up to 149 Hz. The hardware setup is shown in Figure 2a and Figure 2b of the paper. It allowed us to record all videos at 95 FPS, a speed at which even fast eye movements span several frames.

Procedure

As shown in the right image below, the participants were instructed to look at a moving red ball as a fixation target during the data collection. The position of the red ball in the visual field of the participant is shown in the middle image below, as captured by the scene camera. In order to cover as many different conditions as possible, we randomly picked the recording locations in and around several buildings. No location was used more than once during the whole recording of all participants. 34.3% of the recordings were done outdoors; in 84.7% natural light was present, and in 33.6% artificial light was present. Besides varying the locations, we also varied the angle of the eye camera so that the dataset contains a wide range of camera angles, from frontal views to highly off-axis views. This was done by either asking the participant to take the tracker off and put it back on, or by manually moving the camera. For each of the 22 participants we recorded three videos of around 20 seconds in length, yielding 130,856 images overall. Participants could keep their glasses and contact lenses on during the recording.

Ground truth annotation

We used different methods for annotation. In many easy cases, such as some indoor recordings, the pupil area has a clear boundary and no strong reflections inside. We annotated these frames by manually selecting 1 or 2 points inside the pupil area and using them as seed points to find the largest connected area with similar intensity values. The pupil center is defined as the centroid of this area. Some recordings have a clear scene video but strong reflections or noise in the eye video, such as outdoor recordings under strong sunlight. In those cases, we tracked the fixation target (red ball) in the scene videos and manually annotated a subset of the pupil positions in the eye videos. From this calibration data we computed a mapping function from target positions to pupil positions. In addition, we examined the annotated videos again to find wrong annotations, and corrected them by selecting 5 or more points on the pupil boundary and fitting an ellipse to them. The center of the ellipse was used as the refined pupil center position.
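
As an illustration of the first and last of these annotation steps, here is a minimal OpenCV sketch of seed-based region growing with a centroid, and of ellipse fitting to manually selected boundary points. The function names and the intensity tolerance are our own choices for illustration, not values taken from the paper.

```python
import cv2
import numpy as np

def center_from_seed(gray, seed, tol=15):
    """Grow a region of similar intensity around a seed point (x, y) inside
    the pupil and return the centroid of that region as the pupil center.
    `tol` is an assumed intensity tolerance, not a value from the paper."""
    h, w = gray.shape
    mask = np.zeros((h + 2, w + 2), np.uint8)       # floodFill needs a padded mask
    cv2.floodFill(gray.copy(), mask, seedPoint=seed, newVal=255,
                  loDiff=tol, upDiff=tol,
                  flags=4 | cv2.FLOODFILL_MASK_ONLY | (255 << 8))
    ys, xs = np.nonzero(mask[1:-1, 1:-1])           # pixels of the grown region
    return xs.mean(), ys.mean()                     # centroid = pupil center

def center_from_boundary(points):
    """Fit an ellipse to 5 or more manually selected pupil-boundary points
    and return its center (used to correct wrong annotations)."""
    (cx, cy), _, _ = cv2.fitEllipse(np.asarray(points, dtype=np.float32))
    return cx, cy
```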

Evaluation

To assess the difficulty and challenges contained in our dataset, we analysed the performance of five state-of-the-art pupil detection algorithms. Pupil Labs [1] is the algorithm used in the Pupil Pro eye tracker. Swirski [2] and ExCuSe [3] are taken as further examples of state-of-the-art algorithms. Isophote [4] and Gradient [5] are two simple algorithms designed for the iris shape fitting task on low-resolution remote eye images. In the following sections we examine several performance measures and highlight key challenges in our dataset. We ran the evaluations on a Linux desktop with an Intel E5800 CPU at 3.16 GHz and 8 GB of memory. The average processing speed of each algorithm was: Isophote 225.59 fps, Pupil Labs 45.09 fps, Gradient 43.52 fps, Swirski 5.44 fps, ExCuSe 1.90 fps.
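
For reference, such processing speeds can be obtained by timing a detector over a sequence of frames; the sketch below shows one straightforward way to do this. `detect` is a stand-in for any of the five detectors, not an interface provided by the paper or the algorithms' implementations.

```python
import time

def average_fps(detect, frames):
    """Run a pupil detector over a list of grayscale frames and return the
    average processing speed in frames per second."""
    start = time.perf_counter()
    for frame in frames:
        detect(frame)                 # assumed to return an (x, y) pupil estimate
    elapsed = time.perf_counter() - start
    return len(frames) / elapsed
```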

Accuracy and Robustness

The top-left image below shows the cumulative error distribution of all algorithms on the entire dataset. One can see that Pupil Labs, Swirski and ExCuSe all return very good results, with less than 5 px error, in roughly 30% of all cases; however, their performance falls off quickly, with ExCuSe falling off last. The Gradient detector follows a similar curve but shifted to the right, indicating a higher error on average. The Isophote detector's curve rises the least steeply, indicating the highest error on average. Pupil Labs stands out by cutting off very early: while it gives fairly accurate results in almost 40% of all cases, it completely fails in the other 60%. ExCuSe, Swirski and the Gradient detector return reasonable results with an error of roughly 40 px in about 70% of all cases, indicating a higher robustness compared to Pupil Labs.
Overall, no algorithm yet achieves satisfying performance on the dataset for gaze estimation. This underlines the difficulty of our dataset, i.e., pupil detection in the wild is still challenging for current methods. According to our observations, the hardest samples are mainly cases of strong shadows, eyelid occlusions, reflections from glasses, and strong make-up.
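
A cumulative error distribution of this kind reports, for every pixel threshold, the fraction of frames whose Euclidean distance between detected and ground-truth pupil center stays below that threshold. Below is a minimal sketch of how such a curve can be computed from per-frame pixel errors; the variable and function names are illustrative.

```python
import numpy as np

def cumulative_error_curve(errors_px, max_px=100):
    """Return (thresholds, detection rates) for an array of per-frame errors,
    where each error is the Euclidean pixel distance between the detected
    and the ground-truth pupil center."""
    errors_px = np.asarray(errors_px, dtype=float)
    thresholds = np.arange(max_px + 1)
    rates = np.array([(errors_px <= t).mean() for t in thresholds])
    return thresholds, rates

# Hypothetical usage with matplotlib, given one error array per algorithm:
#   import matplotlib.pyplot as plt
#   for name, errors in errors_by_algorithm.items():
#       t, r = cumulative_error_curve(errors)
#       plt.plot(t, r, label=name)
#   plt.xlabel("error threshold (px)"); plt.ylabel("detection rate"); plt.legend(); plt.show()
```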


Indoor vs Outdoor

Outdoor images are especially challenging for pupil detection algorithms, since the infrared portion of strong sunlight can create intense reflections and shadows on the pupil and iris. Light falling directly into the camera lens can create additional reflections. The figure in the top-right below shows the cumulative error distribution for the mean error of all algorithms on indoor and outdoor scenes. While on indoor scenes roughly 60% of all detections had an error of 50 px or lower, on outdoor scenes it is only about 50%.


Glasses and Makeup

For users with impaired vision, the possibility to wear glasses along with the eye tracker is very important. However, glasses can cause intense reflections in the images, and the pupil is often partially occluded. The performance of the examined algorithms is significantly worse for participants wearing glasses compared to those without glasses (see the bottom-left figure below). According to our evaluation, make-up also greatly disturbs the performance of the examined algorithms, which is also visible in the figure. This is to be expected, since all algorithms look either for large black blobs or for strong edges, both of which can also be created by make-up.


Resolution

The examined algorithms have been designed for different systems working with different image resolutions. The Isophote and Gradient detectors have been designed to work on low-resolution images, while the others usually target higher resolutions. In the bottom-right figure below, we show the performance of each algorithm for different resolutions. The error is normalized by the image width, and the percentage of detections with a normalized error lower than 0.02 is shown. Parameters depending on the image size have been adjusted accordingly for all algorithms. The results for Swirski at 30p are missing because we could not get it to work at that resolution. It is important to note that in the implementations of the Gradient and Isophote detectors the input image is by default already downsampled to 80x35 pixels. Thus the performance of those algorithms remains constant, except for the smallest resolutions. As one can see, the other algorithms all start to drop significantly in performance at some point as the resolution decreases, until their performance becomes equal to or worse than that of the two aforementioned methods. Interestingly, the performance of Swirski and ExCuSe improved when downsampling from 480p to 240p. This indicates that 240p resolution is already sufficient for those methods, and that higher resolution can harm performance, possibly due to increased image noise.
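
The sketch below illustrates this evaluation protocol: downscale each frame to a target width, run a detector, and count detections whose error, normalized by the downscaled image width, stays below 0.02. As before, `detect` is a stand-in for any of the five algorithms, and the helper name is ours.

```python
import cv2
import numpy as np

def detection_rate_at_resolution(detect, frames, centers, target_width,
                                 norm_threshold=0.02):
    """Downscale grayscale frames to `target_width`, run `detect` on each,
    and return the fraction of frames whose pixel error, normalized by the
    image width, is below `norm_threshold`."""
    hits = 0
    for frame, (gx, gy) in zip(frames, centers):
        scale = target_width / frame.shape[1]
        new_size = (target_width, int(round(frame.shape[0] * scale)))
        small = cv2.resize(frame, new_size, interpolation=cv2.INTER_AREA)
        px, py = detect(small)                    # detector estimate in the small image
        error = np.hypot(px - gx * scale, py - gy * scale)
        if error / target_width < norm_threshold:
            hits += 1
    return hits / len(frames)
```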

[Figure: cumulative error distributions on the entire dataset (top-left), for indoor vs. outdoor recordings (top-right), for participants with glasses and make-up (bottom-left), and detection rates at different image resolutions (bottom-right)]

References

[1] Kassner, M., Patera, W., and Bulling, A.
Pupil: An open source platform for pervasive eye tracking and mobile gaze-based interaction.
In Adj. Proc. UbiComp 2014, 1151–1160.

[2] Swirski, L., Bulling, A., and Dodgson, N.
Robust real-time pupil tracking in highly off-axis images.
In Proc. ETRA 2012, 173–176.

[3] Fuhl, W., Kübler, T. C., Sippel, K., Rosenstiel, W., and Kasneci, E.
ExCuSe: Robust pupil detection in real-world scenarios.
In Proc. CAIP 2015.

[4] Valenti, R., and Gevers, T.
Accurate eye center location through invariant isocentric patterns.
IEEE Transactions on Pattern Analysis and Machine Intelligence 34(9), 2012, 1785–1798.

[5] Timm, F., and Barth, E.
Accurate eye centre localisation by means of gradients.
In Proc. VISAPP 2011, 125–130.