SLAM-for-AR Competition @ ISMAR2019
Click here to visit our competition Homepage.
The competition results and the system descriptions have been published.
Competition Results - V-SLAM
Competition Results - VI-SLAM
Download
- TrainingData : OneDrive or Our Server.
- Registration Form : doc
Dataset Format
Each sequence provides several ‘sensors’, each with a sensor.yaml file that specifies the sensor type and the intrinsic and extrinsic parameters. The sensor measurements (or, for the camera, the measurement indices) are stored in data.csv. In our case, both camera and IMU data are provided. The vicon and groundtruth data are also treated as ‘sensors’. A minimal loading sketch is given after the example below.
Here is an example:
A01
|--camera
| |--sensor.yaml
| |--data.csv
| `--data
| |--771812250517066.png
| |--771812283849357.png
| `--...
|--imu
| |--sensor.yaml
| `--data.csv
|--vicon
| |--sensor.yaml
| `--data.csv
`--groundtruth
|--sensor.yaml
`--data.csv
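Below is a minimal loading sketch (in Python) for one sequence in this layout. It assumes EuRoC-style data.csv contents, i.e. optional comment lines starting with '#' followed by "timestamp,filename" rows for the camera; the helper name load_sensor is illustrative only, and each sensor.yaml is the authoritative description of its sensor.

```python
# Minimal sketch: load one 'sensor' folder (camera, imu, vicon, groundtruth).
# Assumption: data.csv is comma-separated with optional '#' comment lines,
# and camera rows reference images stored under camera/data/.
import csv
import os

import yaml  # PyYAML


def load_sensor(sequence_dir, sensor_name):
    """Return (config, rows) for one sensor folder such as 'camera' or 'imu'."""
    sensor_dir = os.path.join(sequence_dir, sensor_name)
    with open(os.path.join(sensor_dir, "sensor.yaml")) as f:
        config = yaml.safe_load(f)  # sensor type, intrinsics, extrinsics, ...
    with open(os.path.join(sensor_dir, "data.csv")) as f:
        rows = [row for row in csv.reader(f)
                if row and not row[0].lstrip().startswith("#")]
    return config, rows


if __name__ == "__main__":
    cam_cfg, cam_rows = load_sensor("A01", "camera")
    timestamp, image_name = cam_rows[0][0], cam_rows[0][1].strip()
    print(timestamp, os.path.join("A01", "camera", "data", image_name))
```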
Evaluation Instruction
The Format of Submission Result
The estimated 6 DoF camera poses (from the camera coordinate system to the world coordinate system) are required to evaluate the performance. Since the estimation involves a certain amount of randomness, each sequence is required to be run 5 times, resulting in 5 pose files and 5 running time files. We will select the median of the five results for evaluation. The format for each pose file is as follows:
timestamp[i] p_x p_y p_z q_x q_y q_z q_w
where (p_x, p_y, p_z) is the camera position and the unit quaternion (q_x, q_y, q_z, q_w) is the camera orientation. You should output the real-time pose after each frame is processed (not the poses after final global optimization), and the poses should be output at the same frame rate as the input camera images (otherwise, the completeness evaluation would be affected).
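As a reference for the expected quaternion ordering, here is a small sketch that formats one pose line from a camera-to-world rotation matrix and translation. Using scipy is just one convenient option (an assumption, not a requirement); its as_quat() returns (x, y, z, w), which matches the required order.

```python
# Sketch: format one line "timestamp p_x p_y p_z q_x q_y q_z q_w"
# from a camera-to-world rotation matrix R_wc and translation t_wc.
# scipy's Rotation.as_quat() uses scalar-last (x, y, z, w) ordering,
# matching the required format; any equivalent quaternion library works.
import numpy as np
from scipy.spatial.transform import Rotation


def pose_line(timestamp, R_wc, t_wc):
    qx, qy, qz, qw = Rotation.from_matrix(R_wc).as_quat()
    px, py, pz = t_wc
    return (f"{timestamp} {px:.6f} {py:.6f} {pz:.6f} "
            f"{qx:.6f} {qy:.6f} {qz:.6f} {qw:.6f}")


print(pose_line(771812250517066, np.eye(3), np.zeros(3)))
```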
The format for each running time file is as follows:
timestamp[i] t_pose
where t_pose denotes the system time (or the cumulative running time in seconds, with at least three decimal places, even for black frames in D6) when the pose is estimated.
Please submit a zip file containing all the pose and running time files. The structure of the zip file should be as follows:
YourSLAMName/sequence_name/Round-pose.txt
YourSLAMName/sequence_name/Round-time.txt
e.g.
MY-SLAM/C0_test/0-pose.txt
MY-SLAM/C0_test/0-time.txt
You can click here to download the example.
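The sketch below shows one way to lay out the pose and running time files for a single run before zipping. The write_run helper and the in-memory result lists are illustrative assumptions, not part of any provided tooling.

```python
# Sketch: write one run's files in the required layout, e.g.
# MY-SLAM/C0_test/0-pose.txt and MY-SLAM/C0_test/0-time.txt.
import os


def write_run(root, system_name, sequence_name, round_idx, pose_lines, times):
    """pose_lines: preformatted "timestamp p q" strings (see pose_line above);
    times: list of (timestamp, t_pose) with t_pose in seconds."""
    out_dir = os.path.join(root, system_name, sequence_name)
    os.makedirs(out_dir, exist_ok=True)
    with open(os.path.join(out_dir, f"{round_idx}-pose.txt"), "w") as f:
        f.write("\n".join(pose_lines) + "\n")
    with open(os.path.join(out_dir, f"{round_idx}-time.txt"), "w") as f:
        for timestamp, t_pose in times:
            f.write(f"{timestamp} {t_pose:.3f}\n")  # at least three decimals


# Five rounds per sequence, indexed 0..4:
# for round_idx in range(5):
#     write_run("submission", "MY-SLAM", "C0_test", round_idx, poses, times)
```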
Evaluation
We evaluate the overall performance of a SLAM system considering tracking accuracy, initialization quality, tracking robustness, relocalization time, and computational efficiency. The criteria are as follows:
- $\color{black}{\varepsilon_{APE} / \varepsilon_{ARE}}$ - absolute positional / rotational error
- $\color{black}{\varepsilon_{RPE} / \varepsilon_{RRE}}$ - relative positional / rotational error
- $\color{black}{\varepsilon_{bad}}$ - the ratio of bad poses (100% - completeness)
- $\color{black}{\varepsilon_{init}}$ - initialization quality
- $\color{black}{\varepsilon_{RO}}$ - tracking robustness
- $\color{black}{t_{RL}}$ - relocalization time
The detailed description of the above criteria can be found in the following paper:
Jinyu Li, Bangbang Yang, Danpeng Chen, Nan Wang, Guofeng Zhang, Hujun Bao. Survey and evaluation of monocular visual-inertial SLAM algorithms for augmented reality. Journal of Virtual Reality & Intelligent Hardware, 2019, 1(4): 386-410. DOI:10.3724/SP.J.2096-5796.2018.0011. URL: http://www.vr-ih.com/vrih/html/EN/10.3724/SP.J.2096-5796.2018.0011/article.html
We convert each criterion's error $\color{black}{\varepsilon_{i}}$ into a normalized score by $\color{black}{s_i=\frac{{\sigma_i}^2}{{\sigma_i}^2+{\varepsilon_i}^2}\times100\%}$, where $\color{black}{{\sigma_i}^2}$ is the variance controlling the shape of the normalization function. The complete score is a weighted sum of all the individual scores:
$\color{black}{S=w_{APE}s_{APE}+w_{ARE}s_{ARE}+w_{RPE}s_{RPE}+w_{RRE}s_{RRE}+w_{bad}s_{bad}+w_{init}s_{init}+w_{RO}s_{RO}+w_{RL}s_{RL}}$
The weight $\color{black}{w}$ and variance $\color{black}{\sigma}$ (V-SLAM / VI-SLAM) for each criterion are listed below:
$\color{black}{w_{APE}}$ | $\color{black}{w_{ARE}}$ | $\color{black}{w_{RPE}}$ | $\color{black}{w_{RRE}}$ | $\color{black}{w_{bad}}$ | $\color{black}{w_{init}}$ | $\color{black}{w_{RO}}$ | $\color{black}{w_{RL}}$ |
---|---|---|---|---|---|---|---|
1.0 | 1.0 | 0.5 | 0.5 | 1.0 | 1.0 | 1.0 | 1.0 |
$\color{black}{\sigma_{APE}}$ | $\color{black}{\sigma_{ARE}}$ | $\color{black}{\sigma_{RPE}}$ | $\color{black}{\sigma_{RRE}}$ | $\color{black}{\sigma_{bad}}$ | $\color{black}{\sigma_{init}}$ | $\color{black}{\sigma_{RO}}$ | $\color{black}{\sigma_{RL}}$ |
---|---|---|---|---|---|---|---|
72.46 / 55.83 | 7.41 / 2.48 | 6.72 / 2.92 | 0.26 / 0.17 | 20.68 / 2.38 | 2.79 / 1.85 | 2.27 / 0.95 | 0.65 / 1.42 |
The variances $\color{black}{\sigma}$ listed above are obtained by computing the median of our previous evaluation results, which cover 4 V-SLAM systems (PTAM, ORB-SLAM2, LSD-SLAM, DSO) and 4 VI-SLAM systems (MSCKF, OKVIS, VINS-Mono, SenseSLAM) evaluated on our previously released dataset.
You can evaluate your SLAM system with our training dataset using the evaluation tool: https://github.com/zju3dv/eval-vislam.
In the final round of the competition, we will test all systems on benchmark PCs with the same hardware configuration. The running time will be taken into account when computing the final score according to the following equation:
$\color{black}{S^*=\frac{\text{min}(30,framerate)}{30}S}$
where $\color{black}{framerate}$ denotes the average framerate of the system.
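To make the arithmetic concrete, here is a small sketch that applies the normalization, the weighted sum, and the frame-rate penalty exactly as written above. The weights and the V-SLAM variances are taken from the tables; the example errors are invented purely to show the computation.

```python
# Sketch of the scoring formulas: s_i = sigma_i^2 / (sigma_i^2 + eps_i^2),
# S = sum_i w_i * s_i, and S* = min(30, framerate) / 30 * S.
WEIGHTS = {"APE": 1.0, "ARE": 1.0, "RPE": 0.5, "RRE": 0.5,
           "bad": 1.0, "init": 1.0, "RO": 1.0, "RL": 1.0}
SIGMA_VSLAM = {"APE": 72.46, "ARE": 7.41, "RPE": 6.72, "RRE": 0.26,
               "bad": 20.68, "init": 2.79, "RO": 2.27, "RL": 0.65}


def normalized_score(error, sigma):
    return sigma ** 2 / (sigma ** 2 + error ** 2)


def final_score(errors, sigmas=SIGMA_VSLAM, weights=WEIGHTS, framerate=30.0):
    S = sum(w * normalized_score(errors[k], sigmas[k]) for k, w in weights.items())
    return min(30.0, framerate) / 30.0 * S


# Invented errors, only to demonstrate the arithmetic:
example_errors = {k: 0.5 * SIGMA_VSLAM[k] for k in WEIGHTS}
print(final_score(example_errors, framerate=25.0))
```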
It should be noted that not all sequences are evaluated for all the criteria. The corresponding criteria for the sequences are listed below:
Sequences | Corresponding Criteria |
---|---|
C0-C11, D8-D10 | APE, RPE, ARE, RRE, Badness, Initialization Quality |
D0-D4 | Tracking Robustness |
D5-D7 | Relocalization Time |
Motion and Scene Type of Sequences
Device | Sequence | Motion | Scene | Description |
---|---|---|---|---|
Xiaomi MI8 | C0 | inspect+patrol | floor | Walking and looking around the glossy floor. |
 | C1 | inspect+patrol | clean | Walking around some texture-less areas. |
 | C2 | inspect+patrol | mess | Walking around some random objects. |
 | C3 | aiming+inspect | mess+floor | Random objects first, and then glossy floor. |
 | C4 | aiming+inspect | desktop+clean | From a small scene to a texture-less area. |
 | C5 | wave+inspect | desktop+mess | From a small scene to a texture-rich area. |
 | C6 | hold+inspect | desktop | Looking at a small desktop scene. |
 | C7 | inspect+aiming | desktop | Looking at a small desktop scene. |
 | C8 | inspect+forward | clean | Walking forward in a texture-less area at a low position. |
 | C9 | inspect+forward | mess | Walking forward in a texture-rich area at a low position. |
 | C10 | inspect+downward | clean | Walking backward in a texture-less area at a low position. |
 | C11 | inspect+downward | mess | Walking backward in a texture-rich area at a low position. |
 | D0 | rapid-rotation | desktop | Rotating the phone rapidly at times. |
 | D1 | rapid-translation | desktop | Moving the phone rapidly at times. |
 | D2 | rapid-shaking | desktop | Shaking the phone violently at times. |
 | D3 | inspect | moving people | A person walks in and out. |
 | D4 | inspect | covering camera | An object occasionally occludes the camera. |
 | D5 | inspect | desktop | Similar to A6 but with black frames. |
 | D6 | inspect | desktop | Similar to A6 but with black frames. |
 | D7 | inspect | desktop | Similar to A6 but with black frames. |
 | D8 | inspect | foreground+background | Walking around the near plane and far plane. |
 | D9 | inspect | plant | Walking around the plant. |
 | D10 | loop | office | Walking around the office with loop closure. |
Dataset Preview
Video Source From YouTube
Video Source From bilibili
Competition Results - V-SLAM
$\text{Final Score} = \text{Benchmark Score} \times \text{Speed Penalty} \times 100$
Click here to download the detailed result.
Configuration of the benchmark PC:
CPU : i7-9700K 3.60GHz
Memory : 32G
GPU : Nvidia RTX 2070-8G
Hard Disk : Samsung SSD 850EVO 500G
Rank | Participants | Affiliation | System Name | APE | RPE | ARE | RRE | Badness | Initialization Quality | Robustness | Relocalization Time | Benchmark Score | Average FrameRate | Speed Penalty | Final Score | System Description |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
1 | Zike Yan, Pijian Sun, Xin Wang, Shunkai Li, Sheng Zhang, Hongbin Zha | Key Laboratory of Machine Perception(MOE), School of EECS, Peking University | LF-SLAM | 0.7155 | 0.4037 | 0.8115 | 0.2078 | 0.4554 | 0.7166 | 0.9972 | 0.9965 | 0.8331 | 30.0687 | 1.0000 | 83.31 | Robust line tracking for monocular visual system (Doc would be uploaded after the acceptance of the paper) |
2 | Darius Rueckert | University of Erlangen-Nuremberg | Snake-SLAM | 0.7543 | 0.5700 | 0.8229 | 0.2709 | 0.5183 | 0.7818 | 0.6685* | 0.9974 | 0.8273* | 106.1527 | 1.0000 | 82.73* | Doc |
3 | Neo Yuan Rong Dexter, Toh Yu Heng | Pensees | AR-ORB-SLAM2 | 0.5511 | 0.3850 | 0.5280 | 0.2028 | 0.6020 | 0.7080 | 0.9947 | 0.9890 | 0.7778 | 31.0803 | 1.0000 | 77.78 | |
4 | Xinyu Wei, Zengming Tang, Huiyan Wu, Jun Huang | Shanghai Advanced Research Institute, Chinese Academy of Sciences | PL-SLAM | 0.6036 | 0.4754 | 0.7354 | 0.2376 | 0.4551 | 0.8062 | 0.9985 | 0.9916 | 0.8245 | 27.5107 | 0.9170 | 75.61 | Doc Slides |
5 | Ao Li, Yue Ni | University of Science and Technology of China | Dy-SLAM | 0.7588 | 0.4390 | 0.7953 | 0.2002 | 0.5776 | 0.6018 | 0.9981 | 0.8578 | 0.8182 | 3.8851 | 0.1295 | 10.60 | Doc |
* Note: the trajectory files output by Snake-SLAM (the executable submitted during the competition) denote invalid poses with an identity pose, i.e. (0 0 0 -0 -0 -0 1). Unfortunately, this does not match the invalid-pose format that we define, so our evaluation tool still regarded these poses as valid. As a result, the computed robustness score was affected. If we remove these invalid poses, the new robustness score of Snake-SLAM becomes 0.9348, and the final score becomes 87.17. Since this format issue was discovered after the competition and did not appear in the results of other teams, the competition ranking can no longer be changed.
Competition Results - VI-SLAM
$\text{Final Score} = \text{Benchmark Score} \times \text{Speed Penalty} \times 100$
Click here to download the detailed result.
Configuration of the benchmark PC:
CPU : i7-9700K 3.60GHz
Memory : 32G
GPU : Nvidia RTX 2070-8G
Hard Disk : Samsung SSD 850EVO 500G
Rank | Participants | Affiliation | System Name | APE | RPE | ARE | RRE | Badness | Initialization Quality | Robustness | Relocalization Time | Benchmark Score | Average FrameRate | Speed Penalty | Final Score | System Description |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
1 | Shaozu Cao, Jie Pan, Jieqi Shi, Shaojie Shen | Hong Kong University of Science and Technology | VINS-Mono | 0.6341 | 0.4225 | 0.4945 | 0.2429 | 0.8175 | 0.5678 | 0.8037 | 0.8572 | 0.7513 | 30.1062 | 1.0000 | 75.13 | Doc Slides |
2 | Xinyu Wei, Zengming Tang, Huiyan Wu, Jun Huang | Shanghai Advanced Research Institute, Chinese Academy of Sciences | PLVI-SLAM | 0.2767 | 0.0994 | 0.3383 | 0.1675 | 0.0097 | 0.2813 | 0.8183 | 0.6654 | 0.4205 | 18.3380 | 0.6113 | 25.71 | Doc |
3 | Jianhua Zhang, Shengyong Chen, Mengping Gui, Jialing Liu, Luzhen Ma, Kaiqi Chen | Zhejiang University of Technology | MMF-SLAM | 0.1203 | 0.0834 | 0.1760 | 0.1407 | 0.0010 | 0.3214 | 0.0102 | 0.0000 | 0.1235 | 29.9620 | 0.9987 | 12.33 | Doc Slides |
Competition Chairs
Guofeng Zhang
Zhejiang University, China
Jing Chen
Beijing Institute of Technology, China
Guoquan Huang
University of Delaware, USA
Acknowledgement
We thank Bangbang Yang for his great help in preparing the competition dataset, building the website, and evaluating the participating SLAM systems.