The Benchmark
The goal of the benchmark is to evaluate current methods, detect solved cases, track progress on currently studied unresolved cases, and stimulate research on new unresolved cases.
This goal is achieved through evaluation tracks.
Each track defines a specific SfT scenario and uses the corresponding template-dataset pairs from the database.
For the data used in the evaluation tracks, the registration and/or 3D shape groundtruth is reserved for evaluation purposes only.
The SfT benchmark proposes a complete evaluation methodology, designed to accommodate the variety of SfT algorithms in a unified and consistent manner.
Error Classes
All errors are computed using evaluation landmarks, which are predefined points evenly distributed over the template 3D shape.
These evaluation landmarks do not necessarily cover the entire template 3D shape.
We use three classes of errors.
A-errors are registration errors computed in the image and expressed in px.
B-errors and C-errors are 3D shape errors computed in camera coordinates and expressed in mm.
The difference between B-errors and C-errors is that the former is unregistered and the latter is registered.
Both errors are based on predicted evaluation landmarks in camera coordinates.
B-errors use the distance between these predictions and the groundtruth 3D shape while C-errors use the distance to the corresponding groundtruth evaluation landmarks in camera coordinates.
Consequently, the B-errors always underestimate the C-errors.
The C-errors are more meaningful and are preferred over the B-errors whenever possible.
We append the visibility class whenever required to specify where the errors are or can be computed.
For instance, the A1-errors are registration errors computed only on the object's visible part in the image.
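To make the distinction between B-errors and C-errors concrete, here is a minimal sketch of how the two could be computed with NumPy/SciPy; the function names and array shapes are illustrative assumptions, not the benchmark's reference implementation:

```python
import numpy as np
from scipy.spatial import cKDTree

def c_errors(pred_landmarks, gt_landmarks):
    """C-errors: registered distances between predicted evaluation landmarks
    and their corresponding groundtruth landmarks, in camera coordinates (mm)."""
    return np.linalg.norm(pred_landmarks - gt_landmarks, axis=1)

def b_errors(pred_landmarks, gt_shape_points):
    """B-errors: unregistered distances from predicted evaluation landmarks
    to the closest point of the groundtruth 3D shape (mm)."""
    distances, _ = cKDTree(gt_shape_points).query(pred_landmarks)
    return distances  # lower-bounds the C-errors, hence the underestimation
```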
Handling Various Inputs and Outputs via Algorithm Classes and Optional Interpolation
Existing SfT algorithms may have different inputs and outputs.
They fall into three main classes: A-algorithms compute registration, B-algorithms require registration and compute 3D shape, and C-algorithms compute both registration and 3D shape.
Consequently, not all algorithms can be run on all datasets, and the class of computable errors depends on the algorithm's and the dataset's classes:
| Algorithms | Inputs | Outputs | A-datasets | B-datasets | C-datasets |
|---|---|---|---|---|---|
| A-algorithms | Template, image | Registration | A-errors | All errors uncomputable | A-errors |
| B-algorithms | Template, image, registration | 3D shape | All errors uncomputable | Do not run | C-errors |
| C-algorithms | Template, image | Registration, 3D shape | A-errors | B-errors | A-errors and C-errors |
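For reference, this table can be encoded as a simple lookup; the following sketch is purely illustrative:

```python
# Error classes computable for each (algorithm class, dataset class) pair;
# None means the algorithm should not be run on that dataset class.
COMPUTABLE_ERRORS = {
    ("A", "A"): {"A"},  ("A", "B"): set(),  ("A", "C"): {"A"},
    ("B", "A"): set(),  ("B", "B"): None,   ("B", "C"): {"C"},
    ("C", "A"): {"A"},  ("C", "B"): {"B"},  ("C", "C"): {"A", "C"},
}
```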
In addition, the outputs of SfT algorithms have different possible representations, which may be sparse or dense.
The benchmark uses the following rules to conduct evaluation and to accommodate as many types of outputs as possible:
- Landmark points for all error classes. The evaluation is based on a predefined set of landmark points whose location is given in the template. They were chosen to cover the template's 3D shape uniformly while taking the visibility class into account.
- A-errors. The evaluation of registration errors is based on the predicted location of the landmark points in the image. If the registration output of an algorithm is the set of landmark points, these are directly compared to the groundtruth image landmarks for evaluation. Alternatively, the registration output may be a set of 3D-2D correspondences between the template and the image other than the landmark points. In that case we use these correspondences to estimate the position of the landmark points using interpolation.
- C-errors. The evaluation of 3D shape errors is based on the predicted location of the landmark points in camera coordinates. If the 3D shape output of an algorithm is the set of landmark points, these are directly compared to the groundtruth 3D landmarks for evaluation. Alternatively, the 3D shape output may be a set of 3D-3D correspondences between the template and the camera coordinates other than the landmark points. In that case we use these correspondences to estimate the position of the landmark points using interpolation.
- B-errors. The evaluation of 3D shape errors is similar to the case of C-errors with the difference that the groundtruth position of the landmark points is unavailable. This is because groundtruth registration is not available. We use the closest point on the 3D shape groundtruth to compute the error for each landmark point. This systematically underestimates the 3D shape error.
- Registration groundtruth for B-algorithms. {A,C}-datasets provide registration groundtruth as a set of 3D-2D point correspondences between the template and the image. These correspondences were obtained realistically using SIFT. They are different from the evaluation landmarks, which were chosen to uniformly cover the 3D shape.
- Interpolation. Whenever interpolation of corresponding points is required, we use the smoothest interpolant. This is simply a Radial Basis Function whose kernel is the Thin-Plate Spline's for a flat template and the absolute value for a non-flat template (a sketch is given after this list).
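As an illustration, the interpolation step could look like the following sketch, using SciPy's RBFInterpolator; the function name, array shapes, and the flat-template flag are assumptions for illustration:

```python
import numpy as np
from scipy.interpolate import RBFInterpolator

def interpolate_landmarks(template_pts, observed_pts, landmark_pts, flat_template=True):
    """Estimate landmark positions from a set of correspondences.

    template_pts : (n, d) template coordinates of the correspondences
                   (2D for a flat template, 3D otherwise).
    observed_pts : (n, k) corresponding points in the image (k=2, A-errors)
                   or in camera coordinates (k=3, B- and C-errors).
    landmark_pts : (m, d) template coordinates of the evaluation landmarks.
    """
    # Thin-Plate Spline kernel for a flat template, absolute value (linear) otherwise.
    kernel = "thin_plate_spline" if flat_template else "linear"
    rbf = RBFInterpolator(template_pts, observed_pts, kernel=kernel)
    return rbf(landmark_pts)  # (m, k) interpolated landmark positions
```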
Providing Various Cases via Evaluation Tracks
We defined evaluation tracks to accommodate the different types of SfT scenarios.
These evaluation tracks will be dynamically updated to keep the benchmark up-to-date with the state of research.
As of today, the benchmark has the following 20 tracks:
| Id | Name | Datasets | 𝒯 | Description |
|---|---|---|---|---|
| 001 | ET001-SimpleDefault | 001, 002 | 0.5 | Usual simple case: still images; thin-shell, flattenable, well-textured objects; smooth, isometric deformations |
**The other evaluation tracks will be made available depending on the outcome of our paper submission.**
For each evaluation track, the benchmark provides in-class and inter-class algorithm rankings.
This is because some classes partially share their outputs, which may thus be directly compared.
For instance, both A-algorithms and C-algorithms compute registration and may be evaluated and compared in this respect.
This provides valuable insights into how SfT should be solved.
Indeed, in the above example, it provides empirical evidence on whether constraining registration by the 3D deformable shape, as in C-algorithms, brings an improvement over using image-level constraints only, as in A-algorithms.
Error Statistics
Each evaluation landmark provides an error, whether for registration or 3D shape.
Our statistics are established over the set of landmarks in an image, over the joint set of landmarks of all images in a dataset, or over the joint set of landmarks of all datasets in an evaluation track.
The error statistics are labelled with shortnames defined as follows:
- Error class. The first letter A, B or C indicates the error class.
- RMS statistic. The Root Mean Square Error statistic is written *RE, where * is A, B or C.
- Average statistic. The Average Error statistic is written *AE, where * is A, B or C.
- Median statistic. The Median Error statistic is written *ME, where * is A, B or C.
- Robustness statistics. The Robustness statistics are written *Rx, where * is A, B or C. They represent the percentage of evaluation landmarks whose error is below a given value x.
- Accuracy statistics. The Accuracy statistics are written *Ax, where * is A, B or C. They represent the error at the x-th percentile.
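As a sketch, these statistics could be computed as follows, shown for the A class; the function name and the threshold and percentile values are illustrative assumptions:

```python
import numpy as np

def error_statistics(errors, x_robust=2.0, x_accuracy=95):
    """Summary statistics over a 1D array of per-landmark errors (px or mm)."""
    errors = np.asarray(errors, dtype=float)
    return {
        "ARE": np.sqrt(np.mean(errors ** 2)),  # RMS statistic
        "AAE": np.mean(errors),                # average statistic
        "AME": np.median(errors),              # median statistic
        # Robustness: percentage of landmarks whose error is below x.
        f"AR{x_robust:g}": 100.0 * np.mean(errors < x_robust),
        # Accuracy: error at the x-th percentile.
        f"AA{x_accuracy:g}": np.percentile(errors, x_accuracy),
    }
```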