Person Recognition in Personal Photo Collections

Seong Joon Oh, Rodrigo Benenson, Mario Fritz, and Bernt Schiele

Abstract — Recognising persons in everyday photos presents major challenges (occluded faces, different clothing, locations, etc.) for machine vision. We propose a convnet based person recognition system on which we provide an in-depth analysis of informativeness of different body cues, impact of training data, and the common failure modes of the system. In addition, we discuss the limitations of existing benchmarks and propose more challenging ones. Our method is simple and is built on open source and open data, yet it improves the state of the art results on a large dataset of social media photos (PIPA).

This paper has been accepted to ICCV 2015.

  title={Person Recognition in Personal Photo Collections},
  author={Oh, Seong Joon and Benenson, Rodrigo and Fritz, Mario and Schiele, Bernt},
  booktitle = {ICCV},
  year={2015} }

Data & Downloads

Validation / Test set splits

We provide additional evaluation protocols (splits) on the People In Photo Albums (PIPA) dataset published in [2]. These broaden the scope of person recognition scenarios considered.

Splits can be downloaded here: pipa-splits.tar.gz.



All models are based on Caffe and use the same network architecture (AlexNet). The network prototxt is given here: alexnet_extraction.prototxt

By default, models are pretrained on ImageNet and finetuned on PIPA for the person recognition task. However, some models are finetuned either on a different database (e.g. CASIA heads) or on a different task (e.g. gender prediction).



We also release the naeil (final system in the paper) scores in four different settings (original, album, time, day) and the evaluation code for regenerating the naeil results in the paper: naeil-evaluation.tar.gz
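As a rough illustration of how released scores could be turned into per-setting recognition accuracy, here is a minimal sketch. It is not the released evaluation code. It assumes, hypothetically, that the scores for each setting form one row per test instance and one column per candidate identity, and that ground-truth labels are column indices; the actual file layout in naeil-evaluation.tar.gz may differ. The names top1_accuracy and accuracy_per_setting are made up for this example.

```python
from typing import Dict, Sequence

# The four settings named on this page.
SETTINGS = ("original", "album", "time", "day")


def top1_accuracy(scores: Sequence[Sequence[float]], labels: Sequence[int]) -> float:
    """Fraction of instances whose highest-scoring identity is the true one.

    Assumption: scores[i][k] is the score of identity k for instance i,
    and labels[i] is the index of the true identity of instance i.
    """
    if len(scores) != len(labels):
        raise ValueError("scores and labels must have the same length")
    if not labels:
        raise ValueError("no instances to evaluate")
    correct = 0
    for row, label in zip(scores, labels):
        # Index of the maximum score in this row (first one on ties).
        predicted = max(range(len(row)), key=row.__getitem__)
        if predicted == label:
            correct += 1
    return correct / len(labels)


def accuracy_per_setting(
    scores_by_setting: Dict[str, Sequence[Sequence[float]]],
    labels_by_setting: Dict[str, Sequence[int]],
) -> Dict[str, float]:
    """Apply top1_accuracy to every setting present in the input."""
    return {
        setting: top1_accuracy(scores_by_setting[setting], labels_by_setting[setting])
        for setting in SETTINGS
        if setting in scores_by_setting
    }
```

The released evaluation code remains the reference for regenerating the numbers in the paper; the sketch only shows the shape of the computation under the stated assumptions.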


Attribute annotations

Long term attributes are the attributes that are fixed for a given identity. We have annotated five long term attributes (age, gender, glasses, hair colour, hair length) per identity based on the PIPA heads. The attributes are determined by manually observing multiple instances of each identity.

Long term attribute signals give a coarser supervision than the identity signal. Nonetheless, we find that the long term attribute and the identity supervisions are complementary [1].

PIPA attribute annotations can be downloaded here: attribute-annotations.tar.gz.


Attribute     Class                 Criterion
Age           Infant                Not walking due to young age, in many pictures.
              Child                 Body size is not fully grown.
              Young Adult           Body size is fully grown & Age < 45.
              Middle Age            45 <= Age < 60
              Senior                Age >= 60
              Unknown / changing    Little visual evidence to determine. Not included in the finetuning of h_age.
Gender        Female                Female looking persons.
              Male                  Male looking persons.
              Unknown / changing    Little visual evidence to determine. Not included in the finetuning of h_gender.
Glasses       None                  No eyewear.
              Glasses               Glasses without major eye occlusion.
              Sunglasses            Glasses with major eye occlusion.
              Unknown / changing    Little visual evidence to determine. Not included in the finetuning of h_glasses.
Hair colour   Black                 Completely black hair.
              White                 Any hint of whiteness.
              Others                Neither of the above.
              Unknown / changing    Little visual evidence to determine. Not included in the finetuning of h_haircolour.
Hair length   No hair               No hair on the scalp.
              Less hair             Hairless for > 1/2 of the upper scalp.
              Short hair            Hair length < 10 cm (when straightened).
              Med hair              Hair does not extend below chin (when straightened).
              Long hair             Hair extends below chin (when straightened).
              Unknown / changing    Little visual evidence to determine. Not included in the finetuning of h_hairlength.
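
Identities labelled "Unknown / changing" are left out when finetuning the corresponding attribute model. A minimal sketch of that filtering step follows. It assumes, hypothetically, that the annotations for one attribute have been read into a dictionary from identity to class name; the actual file format in attribute-annotations.tar.gz is not described here, and the names ATTRIBUTE_CLASSES and finetuning_labels are made up for this example.

```python
from typing import Dict

UNKNOWN = "Unknown / changing"

# Class names per attribute, in the order listed in the definitions above.
ATTRIBUTE_CLASSES = {
    "age": ["Infant", "Child", "Young Adult", "Middle Age", "Senior"],
    "gender": ["Female", "Male"],
    "glasses": ["None", "Glasses", "Sunglasses"],
    "hair colour": ["Black", "White", "Others"],
    "hair length": ["No hair", "Less hair", "Short hair", "Med hair", "Long hair"],
}


def finetuning_labels(attribute: str, annotations: Dict[str, str]) -> Dict[str, int]:
    """Map identities to integer class labels, skipping unknown ones.

    annotations maps an identity (hypothetical string id) to a class name.
    Identities marked "Unknown / changing" are dropped; any other
    unrecognised class name raises ValueError.
    """
    classes = ATTRIBUTE_CLASSES[attribute]
    labels = {}
    for identity, class_name in annotations.items():
        if class_name == UNKNOWN:
            continue  # not used for finetuning this attribute
        labels[identity] = classes.index(class_name)
    return labels
```

The integer encoding (position in the class list) is an arbitrary choice for the sketch; any consistent encoding would serve.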


For the upper body attributes, as described in the paper, we finetune on the PETA database of pedestrians [3] with five long term attributes:

  • Age1: personalLess30
  • Age2: personalLess45
  • Gender: personalMale
  • Short hair: hairShort
  • Black hair: hair(multiclass) Black



We share the "photo-taken-date" metadata used for generating the "Time split": data_timestamp.mat. Each timestamp is given as YYYY mm DD HH MM SS, and the timestamps follow the instance order given by: index.txt. The data were collected using the Flickr API.
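
A minimal sketch of how the timestamps could be paired with instances once loaded. It assumes, hypothetically, that each timestamp is available as a row of six numbers (YYYY, mm, DD, HH, MM, SS) and that index.txt holds one instance identifier per line; the function names are made up for this example.

```python
from datetime import datetime
from typing import Iterable, List, Sequence, Tuple


def rows_to_datetimes(rows: Iterable[Sequence[float]]) -> List[datetime]:
    """Convert rows of (YYYY, mm, DD, HH, MM, SS) into datetime objects.

    Values are cast to int; a row that is not a valid date raises ValueError.
    """
    return [datetime(*(int(v) for v in row)) for row in rows]


def pair_with_index(
    index_lines: Iterable[str], rows: Iterable[Sequence[float]]
) -> List[Tuple[str, datetime]]:
    """Pair each instance identifier with its timestamp, in file order."""
    ids = [line.strip() for line in index_lines if line.strip()]
    times = rows_to_datetimes(rows)
    if len(ids) != len(times):
        raise ValueError("index and timestamp counts differ")
    return list(zip(ids, times))


def sort_chronologically(
    pairs: List[Tuple[str, datetime]]
) -> List[Tuple[str, datetime]]:
    """Return the pairs ordered from oldest to newest photo."""
    return sorted(pairs, key=lambda pair: pair[1])
```

The .mat file itself could be read with scipy.io.loadmat, which returns a dictionary of the stored variables; the variable name inside data_timestamp.mat is not stated here and would need to be inspected.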


For further information or data, please contact Seong Joon Oh <joon at>.


[1] Person Recognition in Personal Photo Collections. S. Oh, R. Benenson, M. Fritz and B. Schiele. IEEE International Conference on Computer Vision (ICCV), 2015 (to appear).

[2] Beyond Frontal Faces: Improving Person Recognition Using Multiple Cues. N. Zhang, M. Paluri, Y. Taigman, R. Fergus and L. Bourdev. IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2015.

[3] Pedestrian Attribute Recognition at Far Distance. Y. Deng, P. Luo, C. C. Loy and X. Tang. Proceedings of ACM Multimedia (ACM MM), 2014.