EyeDentify: A Dataset for Pupil Diameter Estimation based on Webcam Images

1German Research Center for Artificial Intelligence (DFKI), Germany, 2RPTU Kaiserslautern-Landau, Germany

Abstract

In this work, we introduce EyeDentify, a dataset specifically designed for pupil diameter estimation from webcam images. EyeDentify addresses the lack of available datasets for pupil diameter estimation, a domain crucial for understanding physiological and psychological states that has traditionally been dominated by highly specialized eye-tracking systems such as Tobii. Unlike these costly sensor systems, webcams are widely available in practice, yet deep learning models that can estimate pupil diameters from standard webcam data are scarce. By providing cropped eye images alongside corresponding pupil diameter measurements, EyeDentify enables the development and refinement of models designed specifically for less-equipped environments, democratizing pupil diameter estimation by making it more accessible and broadly applicable. This, in turn, contributes to multiple domains concerned with understanding human activity and supporting healthcare.

Dataset Collection and Processing

Figure 1. Data recording flow. The Tobii eye tracker records the pupil diameter, and ChameleonView captures facial recordings using a webcam. Facial recording starts when the participant clicks the button in the center. The start and end timestamps of each recording are collected in order to synchronize the data with the eye tracker.

Figure 2. Data alignment flow of a single recording. To synchronize the 90 webcam frames with the 270 Tobii-captured data points, the 90 values recorded at each of the three unique timestamps in the Tobii-captured CSV file are stacked horizontally per metric column, and a row-wise mean is computed to obtain one value per frame.
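The alignment step described above can be sketched as follows. This is an illustrative reconstruction, not the authors' code: the exact ordering of the 270 CSV rows (three blocks of 90, one per unique timestamp) is an assumption.

```python
import numpy as np

def align_tobii_to_frames(tobii_values, n_frames=90, samples_per_frame=3):
    """Average the Tobii samples of one metric column down to one value per webcam frame.

    Assumes the 270 values arrive as three consecutive blocks of 90,
    one block per unique timestamp (a hypothetical layout).
    """
    values = np.asarray(tobii_values, dtype=float)
    assert values.size == n_frames * samples_per_frame
    # stack the three timestamp blocks as rows: shape (3, 90)
    blocks = values.reshape(samples_per_frame, n_frames)
    # row-wise mean across the three blocks -> one averaged value per frame
    return blocks.mean(axis=0)
```

The same reduction would be applied to every metric column of the CSV file before pairing the result with the corresponding webcam frame.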

Figure 3. Pipeline of our data preprocessing. For face detection and landmark localization, we used Mediapipe to extract separate cropped images (32x16) of the left and right eye. Next, we applied blink detection to the cropped eyes using the Eye Aspect Ratio (EAR) and a pre-trained vision transformer. Cropped eye images are then saved based on the EAR threshold and the model's confidence score.
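The EAR-based part of the blink filter can be sketched as below, following the standard Eye Aspect Ratio formulation over six eye landmarks (e.g. as produced by Mediapipe). The landmark ordering and the 0.2 threshold are illustrative assumptions, not necessarily the values used in the paper.

```python
import numpy as np

def eye_aspect_ratio(pts):
    """EAR over six eye landmarks p1..p6: p1/p4 are the horizontal corners,
    (p2, p6) and (p3, p5) are the two vertical landmark pairs."""
    p1, p2, p3, p4, p5, p6 = [np.asarray(p, dtype=float) for p in pts]
    vertical = np.linalg.norm(p2 - p6) + np.linalg.norm(p3 - p5)
    horizontal = np.linalg.norm(p1 - p4)
    return vertical / (2.0 * horizontal)

def is_blink(pts, threshold=0.2):
    # a small EAR means the eyelid is (nearly) closed -> treat as blink
    return eye_aspect_ratio(pts) < threshold
```

Frames flagged by this check (or by a low confidence score from the vision transformer) would then be pruned before the cropped eyes are saved.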

Dataset Statistics & Distribution

Figure 4. Visualization of total frames vs. frames remaining after pruning by blink detection for one participant across all recording sessions (50 in total), as outlined in Section 3.4. Note that each recording lasts three seconds in total, which is why the impact of a blink and the number of blinks can vary significantly (each blink lasts around 40-200 ms).

Figure 5. Pupil diameter distribution of one participant during the recordings. A set of different pupil diameter measurements and webcam images was captured during the three-second sessions (50 in total). The colors of the boxes indicate the display color used during the recordings (white, black, red, blue, yellow, green, gray, and white again).

Dataset Comparison

Table 1. Comparison of related datasets for eye monitoring. While most datasets provide gaze coordinates [1, 2, 3, 4, 5, 6], there is a significant gap in datasets providing pupil diameter information [7, 8].
Dataset Participants Amount of data [frames] Public Gaze Coordinates Pupil Diameter
MAEB [1] 20 1,440 ✗ ✓ ✗
MPIIFaceGaze [2] 15 213,659 ✓ ✓ ✗
Dembinsky et al. [3] 19 648,000 ✓ ✓ ✗
Gaze360 [4] 238 172,000 ✓ ✓ ✗
ETH-XGaze [5] 110 1,083,492 ✓ ✓ ✗
VideoGazeSpeech [6] unknown 35,231 ✓ ✓ ✗
Ricciuti et al. [7] 17 20,400 ✗ ✓ ✓
Caya et al. [8] 16 unknown ✗ ✓ ✓
EyeDentify (ours) 51 212,073 ✓ ✓ ✓

Results

Table 2. 5-fold cross-validation of ResNet-18 and ResNet-50, evaluated separately for left and right eyes. Each fold contains 10 randomly selected participants: 5 for validation and 5 for testing. The remaining participants were used to train the models. ResNet-18 performs best for pupil diameter estimation in terms of mean MAE on the test partitions, whereas ResNet-50 shows a lower standard deviation, indicating greater robustness across varied test partitions.
Eye Model Validation MAE ↓ Test MAE ↓
Left ResNet-18 0.0837 ± 0.0135 0.1340 ± 0.0196
ResNet-50 0.1001 ± 0.0197 0.1426 ± 0.0167
Right ResNet-18 0.1054 ± 0.0173 0.1403 ± 0.0328
ResNet-50 0.1089 ± 0.0204 0.1588 ± 0.0203

Figure 6. Class Activation Map (CAM) visualizations of ResNet-50 and ResNet-18 for a test participant's left and right eyes while viewing different display colors on a monitor. True and Predicted values indicate the ground-truth and estimated pupil diameters of the left and right eyes in millimeters.


EyeDentify [Dataset and Code] by Vijul Shah, Ko Watanabe, Brian Moser, and Prof. Dr. Andreas Dengel are licensed under Creative Commons Attribution-NonCommercial 4.0 International