Shape-from-Template

A Database for SfT

We provide synthetic and real datasets with groundtruth. These were designed to help researchers develop their algorithms, evaluate their performance, benchmark the field and motivate future research on unsolved cases. We provide the templates and the input images separately; the input images are grouped into datasets. Any SfT algorithm takes a template-dataset pair as input. We designed our database to include such pairs both within and beyond the scope of current algorithms, and we based our choice on the properties of the template and the input images.

Properties

More precisely, we identified 25 properties in 5 categories revealed by the SfT setup. We identify each property by a two-letter code. The first letter gives the category and the second letter gives the property within the category. For instance, DI refers to the Isometry property in the Deformation category. A property may be related to the template or to the dataset. The categories and properties are as follows:

The Deformation category depends on the object's deformation law. It has 2 properties for the template, Isometry and Localness, and 3 properties for the dataset: Non-extension, Smooth and Topology.
The Shape category only concerns the template and depends on the object's shape. It has 4 properties: Flattenable, Zero-genus, Thin-shell and Smooth.
The Appearance category only concerns the template and depends on the object's texturemap and reflectance law. It has 2 properties: Texture and Matte.
The Imaging category only concerns the dataset and depends on the geometric and photometric imaging conditions. It has 7 properties: Perspective, Calibration, Sharpness, Resolution, Exposure, Noise and Video.
The Layout category only concerns the dataset and describes what was photographed. It has 7 properties: Illumination, Clutter, Presence, Instances, Templates, Baseline and Occlusion.

Each property has a set of possible values, each associated with a difficulty score. The values are coded by one character, the most common being Y and N for Yes and No. A weight is also attached to each property to reflect the extent to which it contributes to the overall difficulty. The weighted sum of the difficulty scores over all properties gives the overall difficulty score 𝒯 of a template or a dataset. A detailed description and analysis of the properties may be found in our submitted paper. The following two tables summarize the 8 template properties and the 17 dataset properties.

Template properties

Categories	Properties	Code	Weight	Pairs of (value, difficulty score)
1 - Deformation
	Isometry	DI	1.0	(Yes, 0.0) / (No, 1.0)
	Localness	DL	0.5	(Yes, 0.0) / (No, 1.0)
2 - Shape
	Flattenable	SF	0.1	(Yes, 0.0) / (No, 1.0)
	Zero-genus	SZ	0.1	(Yes, 0.0) / (No, 1.0)
	Thin-shell	ST	0.1	(Yes, 0.0) / (No, 1.0)
	Smooth	SS	0.5	(Yes, 0.0) / (No, 1.0)
3 - Appearance
	Texture	AT	2.0	(Strong, 0.0) / (Poor, 0.5) / (Repetitive, 0.7) / (Absent, 1.0)
	Matte	AM	0.5	(Yes, 0.0) / (No, 1.0)

Dataset properties

Categories	Properties	Code	Weight	Pairs of (value, difficulty score)
1 - Deformation
	Non-extension	DN	1.0	(Yes, 0.0) / (No, 1.0)
	Smooth	DS	1.0	(Yes, 0.0) / (No, 1.0)
	Topology	DT	1.0	(Preserved, 0.0) / (Altered, 1.0)
4 - Imaging
	Perspective	IP	0.5	(Yes, 0.0) / (No, 1.0)
	Calibration	IC	1.0	(Yes, 0.0) / (No, 1.0)
	Sharpness	IS	1.0	(Sharp, 0.0) / (Optical-blur, 0.7) / (Motion-blur, 1.0)
	Resolution	IR	0.5	(High, 0.0) / (Medium, 0.5) / (Low, 1.0)
	Exposure	IE	0.5	(Regular, 0.0) / (Under-exposed, 1.0) / (Over-exposed, 1.0)
	Noise	IN	0.5	(Regular, 0.0) / (Strong, 1.0)
	Video	IV	0.5	(Yes, 0.0) / (No, 1.0)
5 - Layout
	Illumination	LI	0.5	(Uniform, 0.0) / (Varying, 1.0)
	Clutter	LC	0.5	(Tidy, 0.0) / (Cluttered, 1.0)
	Presence	LP	0.5	(Always, 0.0) / (Perhaps, 1.0)
	Instances	LI	0.5	(Single, 0.0) / (Multiple, 1.0)
	Templates	LT	0.5	(Single, 0.0) / (Multiple, 1.0)
	Baseline	LB	1.0	(Short, 0.0) / (Wide, 1.0)
	Occlusion	LO	1.0	(None, 0.0) / (External, 0.5) / (Self, 1.0)
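
As a rough illustration of the weighted sum described above, the following Python sketch recomputes 𝒯 from the dataset table. The weights and scores are copied from the table; the function name, the use of full property names as keys, and the choice to let unlisted properties default to their easiest value are assumptions made for this example and are not part of the benchmark's Matlab code.

```python
# Weight and (value -> difficulty score) pairs copied from the dataset table.
# Keys are property names rather than two-letter codes (an illustrative choice).
DATASET_PROPERTIES = {
    "Non-extension": (1.0, {"Yes": 0.0, "No": 1.0}),
    "Smooth":        (1.0, {"Yes": 0.0, "No": 1.0}),
    "Topology":      (1.0, {"Preserved": 0.0, "Altered": 1.0}),
    "Perspective":   (0.5, {"Yes": 0.0, "No": 1.0}),
    "Calibration":   (1.0, {"Yes": 0.0, "No": 1.0}),
    "Sharpness":     (1.0, {"Sharp": 0.0, "Optical-blur": 0.7, "Motion-blur": 1.0}),
    "Resolution":    (0.5, {"High": 0.0, "Medium": 0.5, "Low": 1.0}),
    "Exposure":      (0.5, {"Regular": 0.0, "Under-exposed": 1.0, "Over-exposed": 1.0}),
    "Noise":         (0.5, {"Regular": 0.0, "Strong": 1.0}),
    "Video":         (0.5, {"Yes": 0.0, "No": 1.0}),
    "Illumination":  (0.5, {"Uniform": 0.0, "Varying": 1.0}),
    "Clutter":       (0.5, {"Tidy": 0.0, "Cluttered": 1.0}),
    "Presence":      (0.5, {"Always": 0.0, "Perhaps": 1.0}),
    "Instances":     (0.5, {"Single": 0.0, "Multiple": 1.0}),
    "Templates":     (0.5, {"Single": 0.0, "Multiple": 1.0}),
    "Baseline":      (1.0, {"Short": 0.0, "Wide": 1.0}),
    "Occlusion":     (1.0, {"None": 0.0, "External": 0.5, "Self": 1.0}),
}


def difficulty(values, table=DATASET_PROPERTIES):
    """Return the weighted sum of per-property difficulty scores.

    `values` maps property names to chosen values. Assumption: a property
    missing from `values` takes its easiest (lowest-score) value.
    """
    total = 0.0
    for name, (weight, scores) in table.items():
        value = values.get(name)
        score = min(scores.values()) if value is None else scores[value]
        total += weight * score
    return total


# A hypothetical dataset of still images with every other property at its
# easiest value: only Video = No contributes (weight 0.5, score 1.0).
print(difficulty({"Video": "No"}))  # 0.5
```

The same function can be applied to the template table by passing a dictionary built from its eight rows as `table`.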

The Templates

A template is named as T###{D,P}-Keywords, where T simply stands for 'template', ### is a unique numerical identifier and Keywords are concatenated simple words describing the object. For instance, T001D-A2PaperCaptain is the first template of the set and represents a piece of paper. The data for a template come as a zip archive named T###{D,P}-Keywords.zip. The letter D or P coming after the identifier indicates how the template was constructed. D stands for 'digital-first' and means that the template was created on the computer before the physical object existed. P stands for 'physical-first' and means that the template was acquired by scanning a physical object. We now give the list of available templates with a short description. Each Id links to the template's archive:

Id	Name	DI	DL	SF	SZ	ST	SS	AT	AM	𝒯	Description
001	T001D-A2PaperCaptain	Y	Y	Y	Y	Y	Y	S	Y	0.0	Well-textured A2 piece of paper showing a Captain America comics cover page
002	T002D-A4PaperMagazine	Y	Y	Y	Y	Y	Y	S	Y	0.0	Well-textured A4 piece of paper showing a magazine cover page

** The other templates will be made available depending on the outcome of our paper submission **
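
The template naming convention can be decoded mechanically. The Python sketch below is one possible way to do so; the function name and the returned field names are hypothetical, and the three-digit identifier width is inferred from the listed examples.

```python
import re

# Pattern for T###{D,P}-Keywords. Assumption: the identifier has three digits,
# as in the templates listed above.
TEMPLATE_NAME = re.compile(r"^T(?P<id>\d{3})(?P<construction>[DP])-(?P<keywords>\w+)$")
CONSTRUCTION = {"D": "digital-first", "P": "physical-first"}


def parse_template_name(name):
    """Split a template name (with or without '.zip') into its fields."""
    if name.endswith(".zip"):
        name = name[:-4]
    match = TEMPLATE_NAME.match(name)
    if match is None:
        raise ValueError(f"not a template name: {name!r}")
    return {
        "id": int(match.group("id")),
        "construction": CONSTRUCTION[match.group("construction")],
        "keywords": match.group("keywords"),
    }


print(parse_template_name("T001D-A2PaperCaptain.zip"))
```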

The Datasets

The basic content of a dataset is a set of input images. In most cases a dataset's images show a single object in various conditions and the dataset is thus related to a single template. The datasets come with additional information, in particular regarding groundtruth. In some cases, designed to evaluate SfT as an object detector, the dataset is still related to a single template but the images may or may not show the object. In some other cases, the images may show multiple instances of the object or multiple objects for which a template is available. In the latter case the dataset is related to multiple templates.

A dataset is named as D***{S,R}{A,B,C}{1,2,3}-T###-Keywords, where D simply stands for 'dataset', *** is a unique numerical identifier, ### is the corresponding template's identifier and Keywords are concatenated simple words describing the dataset's deformation and imaging conditions. For instance, D001RC2-T001-PaperSimple is the first dataset; it is related to the first template and shows very simple deformation and imaging cases. For the datasets concerned with multiple templates the template's identifier is left as '###'. The letter S or R coming after the identifier indicates whether the dataset is synthetic or real. It is followed by a letter A, B or C indicating the dataset class. This is the type of groundtruth available: A for registration only, B for 3D shape only and C for registration and 3D shape. The dataset class determines which class of algorithms can use the dataset as input and which evaluation metrics can be used. We thus use the names A-dataset, B-dataset and C-dataset to unambiguously refer to the datasets. The number 1, 2 or 3 following the dataset class is the dataset visibility, an important characteristic which we describe in the next section.

The data for a dataset come as a zip archive named D***{S,R}{A,B,C}{1,2,3}-T###-Keywords.zip. We now give the list of available datasets with a short description. Each Id links to the dataset's archive:

Id	Name	#images	𝒯	Description
001	D001RC2-T001-PaperSimple	010	0.5	Usual simple case: still images; smooth, isometric deformations
002	D002SC2-T002-PaperSimple	016	0.5	Usual simple case: still images; smooth, isometric deformations

** The other datasets will be made available depending on the outcome of our paper submission **
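
A dataset name encodes its origin, groundtruth class, visibility and template. The following Python sketch shows one way to decode it; the function and field names are hypothetical, and the three-digit identifier width is inferred from the listed examples.

```python
import re

# Pattern for D***{S,R}{A,B,C}{1,2,3}-T###-Keywords. A literal '###' in the
# template slot marks a dataset related to multiple templates.
DATASET_NAME = re.compile(
    r"^D(?P<id>\d{3})(?P<origin>[SR])(?P<cls>[ABC])(?P<vis>[123])"
    r"-T(?P<template>\d{3}|###)-(?P<keywords>\w+)$"
)
ORIGIN = {"S": "synthetic", "R": "real"}
GROUNDTRUTH = {
    "A": "registration only",
    "B": "3D shape only",
    "C": "registration and 3D shape",
}


def parse_dataset_name(name):
    """Split a dataset name (with or without '.zip') into its fields."""
    if name.endswith(".zip"):
        name = name[:-4]
    match = DATASET_NAME.match(name)
    if match is None:
        raise ValueError(f"not a dataset name: {name!r}")
    template = match.group("template")
    return {
        "id": int(match.group("id")),
        "origin": ORIGIN[match.group("origin")],
        "class": match.group("cls"),
        "groundtruth": GROUNDTRUTH[match.group("cls")],
        "visibility": int(match.group("vis")),
        # None stands for a multi-template dataset (identifier left as '###').
        "template_id": None if template == "###" else int(template),
        "keywords": match.group("keywords"),
    }


print(parse_dataset_name("D001RC2-T001-PaperSimple"))
```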

About Visibility

We define three visibility classes to characterize a dataset's groundtruth. These classes are ordered from weakest to strongest. 1-visibility is for datasets with groundtruth available only for the part of the object visible in the image. These datasets typically occur when groundtruth is captured for a video with a depth sensor. 2-visibility is for datasets with groundtruth available for the object's complete outer surface. These datasets typically occur when groundtruth is captured for a still deformation using Structure-from-Motion. 3-visibility is for datasets with groundtruth available for the object's outer and inner parts. These datasets are the most difficult to acquire and are only available from simulation. The visibility class applies not only to the datasets but also to the error statistics and to the algorithms.

Archive Contents

A template archive contains the following files:

texturemap.png -- this image contains the object's texturemap.
shape.obj and shape.mtl -- these form a 3D shape model which uses the texturemap image. This is conveniently viewed with the Meshlab software.
scale.txt -- for flattenable objects only. This file contains a single number in px/mm which allows one to convert lengths in mm to px and vice versa. More precisely, one converts a length measured on the texturemap in px (number of pixels) to mm (millimeters) by dividing it by the provided scale.
construction -- this optional folder contains data used to construct the template, such as the images from which we ran Structure-from-Motion or the 3D scanner's data file.
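
The unit conversion described for scale.txt amounts to a division or a multiplication by the scale. A minimal Python sketch follows; the function names are hypothetical, the scale value in the usage line is made up for illustration, and the reader assumes the file holds the number as plain text.

```python
def read_scale(path):
    """Read the single px/mm number from a scale.txt file.

    Assumption: the file stores the number as plain text, possibly
    followed by whitespace.
    """
    with open(path) as handle:
        return float(handle.read().split()[0])


def px_to_mm(length_px, scale_px_per_mm):
    """Convert a texturemap length from pixels to millimeters."""
    return length_px / scale_px_per_mm


def mm_to_px(length_mm, scale_px_per_mm):
    """Convert a length from millimeters to texturemap pixels."""
    return length_mm * scale_px_per_mm


# With a made-up scale of 4 px/mm, 100 px on the texturemap correspond to 25 mm.
print(px_to_mm(100, 4.0))  # 25.0
```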

A dataset archive usually includes several input images indexed from 01. The images may be stills or video frames. The archive contains the following files where %% is the image index:

%%_image.png -- this is the input image.
%%_calib.txt -- this contains camera calibration. It contains 5 coefficients for matrix K and 5 distortion coefficients following Matlab's Computer Vision toolbox format.
%%_GT_crsp.txt -- for {A,C}-datasets. This file contains template-image point correspondences. These were obtained using SIFT and thus may be noisy but do not contain mismatches. Each line concerns a single point correspondence and gives 7 numbers. The first 2 numbers are the point coordinates on the template's texturemap in px. The next 3 numbers give the point coordinates on the template's 3D shape in mm. The last 2 numbers give the point coordinates in the input image in px.
%%_GT_crsp_3D.txt -- for C-datasets. This file contains the groundtruth 3D position for each correspondence from %%_GT_crsp.txt. Similarly, each correspondence is on a line, which gives 3 numbers representing the point's position in camera coordinates in mm.
%%_GT_shape.obj, %%_GT_shape.mtl and %%_GT_shape.png -- for {B,C}-datasets. These files form the groundtruth 3D shape model. Note that this is not a deformation of the template but an independently computed groundtruth. This is conveniently viewed with the Meshlab software.
%%_eval.txt -- this gives evaluation data. The format closely follows that of %%_GT_crsp.txt and %%_GT_crsp_3D.txt but for so-called landmark points regularly sampled on the template's 3D shape. For A-datasets, each line gives 7 numbers formatted as in %%_GT_crsp.txt. For C-datasets this is complemented by 3 numbers as in %%_GT_crsp_3D.txt. For B-datasets, only the first 5 numbers are given, representing evaluation points in the template's texturemap and 3D shape.
construction -- this optional folder contains data used to construct the dataset, such as the images from which we ran stereo or the Kinect frames.
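
The per-line layouts described above (5, 7 or 10 numbers) can be read without the Matlab functions. The Python sketch below is an illustration only: the function and field names are hypothetical, and it assumes the numbers on a line are separated by whitespace and/or commas, which the description does not specify.

```python
def parse_point_line(line):
    """Parse one line of %%_GT_crsp.txt or %%_eval.txt into coordinate groups.

    Assumption: numbers are separated by whitespace and/or commas.
    """
    numbers = [float(token) for token in line.replace(",", " ").split()]
    if len(numbers) not in (5, 7, 10):
        raise ValueError(f"expected 5, 7 or 10 numbers, got {len(numbers)}")
    point = {
        "texturemap_px": tuple(numbers[0:2]),  # point on the texturemap, in px
        "template_mm": tuple(numbers[2:5]),    # point on the template's 3D shape, in mm
    }
    if len(numbers) >= 7:
        point["image_px"] = tuple(numbers[5:7])    # point in the input image, in px
    if len(numbers) == 10:
        point["camera_mm"] = tuple(numbers[7:10])  # groundtruth 3D point, camera coordinates, in mm
    return point


def parse_point_file(text):
    """Parse the whole text of a file, skipping blank lines."""
    return [parse_point_line(line) for line in text.splitlines() if line.strip()]


# Made-up sample line in the 7-number layout.
print(parse_point_line("10 20 1.5 2.5 0 300 400"))
```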

Some archives were made for evaluation purposes and do not disclose groundtruth.

Data Storage Structure and Access Functions

The data are stored in a folder which we recommend naming 'SfTBenchmark'. This is the benchmark's root folder. It has three sub-folders: 'templates', 'datasets' and 'code'. The first two contain the template and dataset folders, named after the zip archives described above. The third contains Matlab functions which we provide to set up and access the database easily and to run evaluation. These are as follows:

SfTbm_setRootPath -- this is to set the root path as a global variable named 'SfTbmRoot' and add the code to Matlab's access path. Example: SfTbm_setRootPath('c:\data\SfTBenchmark'). We advise one to call this function in Matlab's startup.m.
SfTbm_getTemplate -- this is to get properties of a template from the template's id. Example: SfTbm_getTemplate(1,'Flatness').
SfTbm_getDataset -- this is to get properties of a dataset from the dataset's id and possibly the image number. Example: SfTbm_getDataset(1,'NumberOfImages').
SfTbm_readTemplate -- this is to read a template's data from the template's id. Example: SfTbm_readTemplate(1,'Texturemap').
SfTbm_readDataset -- this is to read a dataset's image data from the dataset's id and possibly the image number. Example: SfTbm_readDataset(1,1,'Image').
SfTbm_eval -- this is to evaluate one's results and uses the dataset's evaluation structure.

A basic example of use is as follows:

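datasetId = 1;
imageId = 1;
% p: points on the template's texturemap, P: points on the template's 3D shape,
% q: points in the input image, K: camera intrinsics
[p,P,q,K] = SfTbm_readDataset(datasetId,imageId,'TemplateTexturemapPoints','Template3DShapePoints','ImagePoints','Intrinsics');
% Q: shape inferred by the user's own SfT algorithm (placeholder function name)
Q = my_sft_shape_inference_algorithm(p,q,K);
% E: the dataset's evaluation data for this image
E = SfTbm_readDataset(datasetId,imageId,'EvaluationData');
shapeError = SfTbm_eval(E,'ShapeError','NonLandmarks',P,Q);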
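
For readers working outside Matlab, the folder layout described in this section can also be navigated directly. The Python sketch below builds file paths under the root folder; the function names are hypothetical, and it assumes that each extracted folder carries the archive's name and that image indices are written with two digits, as suggested by the '01' starting index.

```python
from pathlib import Path


def template_file(root, template_name, filename):
    """Path of a file inside an extracted template folder."""
    return Path(root) / "templates" / template_name / filename


def dataset_file(root, dataset_name, image_index, suffix):
    """Path of a per-image dataset file such as 01_image.png.

    Assumption: the image index is zero-padded to two digits.
    """
    return Path(root) / "datasets" / dataset_name / f"{image_index:02d}_{suffix}"


root = "SfTBenchmark"
print(template_file(root, "T001D-A2PaperCaptain", "texturemap.png"))
print(dataset_file(root, "D001RC2-T001-PaperSimple", 1, "image.png"))
```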