We use the standard split and train a WideResNet [Zagoruyko and Komodakis, 2016] with depth 40.
For the OOD test dataset, we use the following six datasets:
Textures [Cimpoi et al., 2014],
SVHN [Netzer et al., 2011],
Places365 [Zhou et al., 2017],
LSUN-Crop [Yu et al., 2015],
LSUN-Resize [Yu et al., 2015],
iSUN [Xu et al., 2015].
Evaluation metrics. We evaluate the performance of OOD detection by measuring the following metrics:
(1) the false positive rate (FPR95) of OOD examples when the true positive rate of in-distribution examples is 95%;
(2) the area under the receiver operating characteristic curve (AUROC);
(3) the area under the precision-recall curve (AUPR).
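To make the three metrics concrete, the following is a minimal pure-Python sketch of how they could be computed from detector scores. It is an illustration under stated assumptions, not the evaluation code used in our experiments: the function names (`fpr_at_tpr`, `auroc`, `aupr`) are hypothetical, a higher score is assumed to mean "more in-distribution", in-distribution examples are treated as the positive class, and AUPR is approximated by average precision.

```python
import math

def fpr_at_tpr(id_scores, ood_scores, tpr=0.95):
    """Fraction of OOD examples accepted at the threshold that keeps `tpr` of ID examples.

    Assumption: higher score = more in-distribution.
    """
    ranked = sorted(id_scores, reverse=True)
    # Number of ID examples that must score at or above the threshold.
    # The small epsilon guards against floating-point error in tpr * n.
    k = math.ceil(tpr * len(ranked) - 1e-9)
    threshold = ranked[k - 1]
    false_positives = sum(1 for s in ood_scores if s >= threshold)
    return false_positives / len(ood_scores)

def auroc(id_scores, ood_scores):
    """Probability that a random ID example outscores a random OOD example (ties count 0.5).

    Quadratic in the number of examples; fine for illustration only.
    """
    wins = 0.0
    for a in id_scores:
        for b in ood_scores:
            if a > b:
                wins += 1.0
            elif a == b:
                wins += 0.5
    return wins / (len(id_scores) * len(ood_scores))

def aupr(id_scores, ood_scores):
    """Average precision with ID as the positive class (a step-wise approximation of AUPR)."""
    labeled = [(s, 1) for s in id_scores] + [(s, 0) for s in ood_scores]
    labeled.sort(key=lambda pair: -pair[0])
    n_pos = len(id_scores)
    tp = 0
    ap = 0.0
    i = 0
    while i < len(labeled):
        # Treat all examples with the same score as one threshold step.
        j = i
        tp_group = 0
        while j < len(labeled) and labeled[j][0] == labeled[i][0]:
            tp_group += labeled[j][1]
            j += 1
        tp += tp_group
        precision = tp / j              # j examples are at or above this threshold
        ap += (tp_group / n_pos) * precision  # recall increment times precision
        i = j
    return ap

# Toy scores, invented purely for illustration.
id_scores = [0.9, 0.8, 0.7, 0.6]
ood_scores = [0.75, 0.3, 0.2, 0.1]
print(fpr_at_tpr(id_scores, ood_scores), auroc(id_scores, ood_scores), aupr(id_scores, ood_scores))
```

On the toy scores above, one OOD example (0.75) exceeds the threshold that retains all four in-distribution examples, so FPR95 is 0.25; AUROC is 0.875 and AUPR is approximately 0.8875. Lower FPR95 and higher AUROC and AUPR indicate better OOD detection.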
The CIFAR-10 dataset consists of 60000 32x32 colour images in 10 classes, with 6000 images per class. There are 50000 training images and 10000 test images.
