EyeDentify: A Dataset for Pupil Diameter Estimation based on Webcam Images

1German Research Center for Artificial Intelligence (DFKI), Germany, 2RPTU Kaiserslautern-Landau, Germany

Abstract

In this work, we introduce EyeDentify, a dataset specifically designed for pupil diameter estimation from webcam images. EyeDentify addresses the lack of available datasets for pupil diameter estimation, a domain crucial for understanding physiological and psychological states that has traditionally been dominated by highly specialized eye-tracking systems such as Tobii. Unlike these costly sensor systems, webcams are widely available in practice, yet deep learning models that can estimate pupil diameters from standard webcam data are scarce. By providing cropped eye images alongside corresponding pupil diameter measurements, EyeDentify enables the development and refinement of models designed specifically for less-equipped environments, democratizing pupil diameter estimation by making it more accessible and broadly applicable. This, in turn, contributes to multiple domains concerned with understanding human activity and supporting healthcare.

Dataset Collection and Processing

Figure 1. Data recording flow. The Tobii eye tracker records the pupil diameter, and ChameleonView captures facial recordings using a webcam. Facial recording starts when the participant clicks the button in the center. The start and end timestamps of each recording are collected in order to synchronize the data with the eye tracker.

Figure 2. Data alignment flow of a single recording. To synchronize the 90 webcam frames with the 270 Tobii-captured data points, the 90 values recorded at each of the three unique timestamps in the Tobii-captured CSV file are stacked horizontally per metric column, and a row-wise mean is computed to obtain one value per frame.
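The alignment step described above can be sketched as follows. This is an illustrative reconstruction, not the authors' code: the exact ordering of the 270 CSV rows (three blocks of 90, one per unique timestamp) is an assumption.

```python
import numpy as np

def align_tobii_to_frames(tobii_values, n_frames=90, samples_per_frame=3):
    """Average the Tobii samples of one metric column down to one value per webcam frame.

    Assumes the 270 values arrive as three consecutive blocks of 90,
    one block per unique timestamp (a hypothetical layout).
    """
    values = np.asarray(tobii_values, dtype=float)
    assert values.size == n_frames * samples_per_frame
    # stack the three timestamp blocks as rows: shape (3, 90)
    blocks = values.reshape(samples_per_frame, n_frames)
    # row-wise mean across the three blocks -> one averaged value per frame
    return blocks.mean(axis=0)
```

The same reduction would be applied to every metric column of the CSV file before pairing the result with the corresponding webcam frame.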

Figure 3. Pipeline of our data preprocessing. For face detection and landmark localization, we used Mediapipe to extract separate cropped images (32x16) of the left and right eye. Next, we applied blink detection to the cropped eyes using the Eye Aspect Ratio (EAR) and a pre-trained vision transformer. Cropped eye images are then saved based on the EAR threshold and the model's confidence score.
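The EAR-based part of the blink filter can be sketched as below, following the standard Eye Aspect Ratio formulation over six eye landmarks (e.g. as produced by Mediapipe). The landmark ordering and the 0.2 threshold are illustrative assumptions, not necessarily the values used in the paper.

```python
import numpy as np

def eye_aspect_ratio(pts):
    """EAR over six eye landmarks p1..p6: p1/p4 are the horizontal corners,
    (p2, p6) and (p3, p5) are the two vertical landmark pairs."""
    p1, p2, p3, p4, p5, p6 = [np.asarray(p, dtype=float) for p in pts]
    vertical = np.linalg.norm(p2 - p6) + np.linalg.norm(p3 - p5)
    horizontal = np.linalg.norm(p1 - p4)
    return vertical / (2.0 * horizontal)

def is_blink(pts, threshold=0.2):
    # a small EAR means the eyelid is (nearly) closed -> treat as blink
    return eye_aspect_ratio(pts) < threshold
```

Frames flagged by this check (or by a low confidence score from the vision transformer) would then be pruned before the cropped eyes are saved.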

Dataset Statistics & Distribution

Figure 4. Visualization of total frames vs. frames remaining after pruning by blink detection for one participant across all recording sessions (50 in total), as outlined in Section 3.4. Note that each recording lasts three seconds in total, which is why the impact of a blink and the number of blinks can vary significantly (each blink lasts around 40-200 ms).

Figure 5. Pupil diameter distribution of one participant during the recordings. A set of different pupil diameter measurements and webcam images was captured during the three-second sessions (50 in total). The colors of the boxes indicate the display color used during the recordings (white, black, red, blue, yellow, green, gray, and white again).

Dataset Comparison

Table 1. Comparison of related datasets for eye monitoring. While most datasets provide gaze coordinates [1, 2, 3, 4, 5, 6], there is a significant gap in datasets providing pupil diameter information [7, 8].
Dataset Participants Amount of data [frames] Public Gaze Coordinates Pupil Diameter
MAEB [1] 20 1,440 ✗ ✓ ✗
MPIIFaceGaze [2] 15 213,659 ✓ ✓ ✗
Dembinsky et al. [3] 19 648,000 ✓ ✓ ✗
Gaze360 [4] 238 172,000 ✓ ✓ ✗
ETH-XGaze [5] 110 1,083,492 ✓ ✓ ✗
VideoGazeSpeech [6] unknown 35,231 ✓ ✓ ✗
Ricciuti et al. [7] 17 20,400 ✗ ✓ ✓
Caya et al. [8] 16 unknown ✗ ✓ ✓
EyeDentify (ours) 51 212,073 ✓ ✓ ✓

Results

Table 2. 5-fold cross-validation of ResNet-18 and ResNet-50, evaluated separately for left and right eyes. Each fold contains 10 randomly selected participants: 5 for validation and 5 for testing. The remaining participants were used to train the models. ResNet-18 performs best for pupil diameter estimation in terms of mean MAE on the test partitions, whereas ResNet-50 shows a lower standard deviation, indicating greater robustness across varied test partitions.
Eye Model Validation MAE ↓ Test MAE ↓
Left ResNet-18 0.0837 ± 0.0135 0.1340 ± 0.0196
ResNet-50 0.1001 ± 0.0197 0.1426 ± 0.0167
Right ResNet-18 0.1054 ± 0.0173 0.1403 ± 0.0328
ResNet-50 0.1089 ± 0.0204 0.1588 ± 0.0203

Figure 6. Class Activation Map (CAM) visualizations of ResNet-50 and ResNet-18 for a test participant's left and right eyes while viewing different display colors on a monitor. True and Predicted values indicate the ground-truth and estimated pupil diameters of the left and right eyes in millimeters.


EyeDentify [Dataset and Code] by Vijul Shah, Ko Watanabe, Brian Moser, and Prof. Dr. Andreas Dengel are licensed under Creative Commons Attribution-NonCommercial 4.0 International