System Evaluation


We have open-sourced our evaluation toolkit as eval-vislam.

Accuracy (APE, RPE, ARE, RRE, Completeness)

Usage:
./accuracy <groundtruth> <input> <fix scale>

Arguments:
<groundtruth>  Path to sequence folder, e.g. ~/VISLAM-Dataset/A0.
<input>        SLAM camera trajectory file in TUM format.
<fix scale>    Set to 1 for VISLAM, set to 0 for VSLAM.
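
To make the expected `<input>` concrete, here is a minimal Python sketch of reading a TUM-format trajectory (one pose per line: `timestamp tx ty tz qx qy qz qw`, with `#` starting a comment line) and computing a position RMSE, which is one common way to define APE. It assumes the two trajectories are already time-associated and aligned to the same frame and scale. The toolkit's own association, alignment and metric definitions are not shown in this document, so the function names `parse_tum` and `position_rmse` are hypothetical and are not part of eval-vislam.

```python
import math


def parse_tum(text):
    """Parse TUM-format trajectory text into a list of (timestamp, (x, y, z))."""
    poses = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        fields = line.split()
        if len(fields) != 8:
            raise ValueError(f"expected 8 fields, got {len(fields)}: {line!r}")
        t, x, y, z = (float(v) for v in fields[:4])
        poses.append((t, (x, y, z)))  # orientation (qx qy qz qw) is ignored here
    return poses


def position_rmse(estimate, reference):
    """RMSE of position differences between two index-aligned trajectories.

    Assumption: both lists have the same length and pose i in one list
    corresponds to pose i in the other (no association or alignment is done).
    """
    if not estimate or len(estimate) != len(reference):
        raise ValueError("trajectories must be non-empty and of equal length")
    squared = [
        sum((a - b) ** 2 for a, b in zip(p_est, p_ref))
        for (_, p_est), (_, p_ref) in zip(estimate, reference)
    ]
    return math.sqrt(sum(squared) / len(squared))
```

The result is in the same unit as the input positions; the tables on this page report APE and RPE in millimetres.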

Initialization Scale Error and Time

Usage:
./initialization <groundtruth> <input> <has inertial>

Arguments:
<groundtruth>  Path to sequence folder, e.g. ~/VISLAM-Dataset/A0.
<input>        SLAM camera trajectory file in TUM format.
<has inertial> Set to 1 for VISLAM, set to 0 for VSLAM.

Robustness

Usage:
./robustness <groundtruth> <input> <fix scale>

Arguments:
<groundtruth>  Path to sequence folder, e.g. ~/VISLAM-Dataset/A0.
<input>        SLAM camera trajectory file in TUM format.
<fix scale>    Set to 1 for VISLAM, set to 0 for VSLAM.

Relocalization Time

Usage:
./relocalization <groundtruth> <input> <has inertial>

Arguments:
<groundtruth>  Path to sequence folder, e.g. ~/VISLAM-Dataset/A0.
<input>        SLAM camera trajectory file in TUM format.
<has inertial> Set to 1 for VISLAM, set to 0 for VSLAM.
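
All four tools above take the same three positional arguments, and the last one is `1` for VISLAM and `0` for VSLAM in every case. A batch script can therefore build the command lines with one small helper. This is only a sketch: the helper name `eval_command` and the trajectory file name in the usage note are hypothetical, and the binaries are assumed to sit in the current directory, as in the usage lines.

```python
TOOLS = ("accuracy", "initialization", "robustness", "relocalization")


def eval_command(tool, sequence_dir, trajectory_file, visual_inertial):
    """Build the argument list for one evaluation tool.

    visual_inertial=True maps to "1" (VISLAM), False to "0" (VSLAM),
    matching the <fix scale> / <has inertial> argument described above.
    """
    if tool not in TOOLS:
        raise ValueError(f"unknown tool: {tool!r}")
    flag = "1" if visual_inertial else "0"
    return [f"./{tool}", sequence_dir, trajectory_file, flag]
```

For example, `eval_command("accuracy", "/data/VISLAM-Dataset/A0", "trajectory.txt", True)` yields a list that can be passed to `subprocess.run(cmd, check=True)`. Note that `subprocess` does not expand `~`, so a path such as `~/VISLAM-Dataset/A0` should first go through `os.path.expanduser`.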

Representative Monocular SLAM Results


Because these SLAM systems are non-deterministic, repeated runs do not always produce the same results, so we run each benchmark 10 times and report the average. Some algorithms occasionally fail on some sequences. We identify these failure cases (runs with unusually large APEs) by inspection, remove them, and average the remaining runs.
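
The averaging procedure can be sketched in a few lines of Python. Failures are identified by inspection, so the sketch takes the indices of the failed runs as an explicit input rather than detecting them automatically. The function name `average_successful_runs` and the numbers in the usage note are illustrative assumptions, not values from the benchmark.

```python
from statistics import mean


def average_successful_runs(values_per_run, failed_runs):
    """Average a metric over repeated runs, skipping manually flagged failures.

    values_per_run: one metric value (e.g. APE) per run.
    failed_runs:    set of run indices judged to be failures by inspection.
    Returns (average over the remaining runs, number of runs removed).
    """
    kept = [v for i, v in enumerate(values_per_run) if i not in failed_runs]
    if not kept:
        raise ValueError("every run was marked as a failure")
    return mean(kept), len(values_per_run) - len(kept)
```

With the made-up values `[10.0, 12.0, 500.0, 11.0]` and run 2 flagged as a failure, the function returns an average of `11.0` and a failure count of `1`.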

Number of failed runs (out of 10) for each sequence

Sequence PTAM ORB-SLAM2 LSD-SLAM DSO MSCKF OKVIS VINS-Mono VINS-Mono-NoLoop SenseSLAM v1.0
A0 1 0 0 0 0 4 0 0 0
A1 3 5 1 6 3 0 0 0 0
A2 1 0 0 0 0 3 0 0 0
A3 0 0 0 0 0 2 0 0 0
A4 0 1 1 0 0 2 0 0 0
A5 1 0 0 2 0 6 0 0 0
A6 3 0 0 0 0 2 0 0 0
A7 0 0 0 0 0 0 0 0 0

VSLAM Tracking accuracy

(a) APE/RPE (mm), each cell given as APE/RPE
Sequence PTAM ORB-SLAM2 LSD-SLAM DSO
A0 75.442/6.696 96.777/5.965 105.963/11.761 231.860/10.456
A1 113.406/16.344 95.379/10.285 221.643/23.833 431.929/12.555
A2 67.099/6.833 69.486/5.706 310.963/8.156 216.893/5.337
A3 10.913/4.627 15.310/7.386 199.445/10.872 188.989/4.294
A4 21.007/4.773 10.061/2.995 155.692/10.756 115.477/4.595
A5 40.403/8.926 29.653/11.717 249.644/12.302 323.482/7.978
A6 19.483/3.051 12.145/6.741 49.805/3.018 14.864/2.561
A7 13.503/2.462 5.832/1.557 38.673/2.662 27.142/2.213
(b) ARE/RRE (deg), each cell given as ARE/RRE
Sequence PTAM ORB-SLAM2 LSD-SLAM DSO
A0 12.051/0.257 5.119/0.342 20.589/0.371 9.983/0.401
A1 53.954/0.291 8.534/0.242 51.122/0.288 39.007/0.524
A2 8.789/0.301 5.550/0.255 30.282/0.296 10.584/0.253
A3 6.225/0.293 1.431/0.264 31.370/0.475 20.580/0.241
A4 6.295/0.255 1.015/0.157 9.592/0.498 5.217/0.180
A5 14.030/0.452 1.963/0.546 36.789/0.810 40.939/0.324
A6 2.348/0.217 0.892/0.169 5.012/0.207 1.435/0.189
A7 1.218/0.153 0.569/0.115 3.052/0.147 2.239/0.135
(c) Completeness (%)
Sequence PTAM ORB-SLAM2 LSD-SLAM DSO
A0 79.386 65.175 49.513 14.476
A1 60.893 68.303 11.511 0.869
A2 85.348 79.263 21.804 22.878
A3 71.635 98.497 27.112 43.493
A4 95.418 100.000 64.283 80.371
A5 87.399 97.785 25.033 2.059
A6 97.399 99.786 94.883 100.000
A7 100.000 100.000 98.663 100.000

VISLAM Tracking accuracy

(a) APE/RPE (mm), each cell given as APE/RPE
Sequence MSCKF OKVIS VINS-Mono VINS-Mono-NoLoop SenseSLAM v1.0
A0 156.018/7.436 71.677/7.064 63.395/3.510 75.388/2.497 58.995/2.525
A1 294.091/14.580 87.73/4.283 80.687/3.472 161.444/1.676 55.097/2.876
A2 102.657/10.151 68.381/5.412 74.842/8.605 56.562/1.334 36.370/1.560
A3 44.493/3.780 22.949/8.739 19.964/1.234 23.643/0.837 17.792/0.779
A4 114.845/8.338 146.89/12.46 18.691/1.091 21.532/0.953 15.558/0.930
A5 82.885/8.388 77.924/7.588 42.451/2.964 49.790/1.473 34.810/1.954
A6 66.001/6.761 63.895/6.86 26.240/1.167 27.088/0.683 20.467/0.569
A7 105.492/4.576 47.465/6.352 18.226/1.465 19.973/0.746 10.777/0.831
(b) ARE/RRE (deg), each cell given as ARE/RRE
Sequence MSCKF OKVIS VINS-Mono VINS-Mono-NoLoop SenseSLAM v1.0
A0 6.584/0.203 3.637/0.741 3.441/0.205 3.378/0.206 3.660/0.197
A1 8.703/0.135 5.14/1.098 1.518/0.088 1.470/0.088 2.676/0.092
A2 3.324/0.195 2.493/0.869 1.775/0.201 1.766/0.201 1.674/0.181
A3 6.952/0.186 2.459/0.825 2.121/0.176 2.443/0.176 1.642/0.182
A4 4.031/0.104 3.765/0.603 1.185/0.063 1.419/0.063 1.129/0.071
A5 4.928/0.167 8.843/0.360 3.000/0.040 4.521/0.040 2.041/0.089
A6 2.625/0.170 2.275/0.629 1.478/0.131 1.511/0.131 1.656/0.134
A7 6.810/0.120 3.536/0.602 1.248/0.073 0.842/0.073 0.502/0.082
(c) Completeness (%)
Sequence MSCKF OKVIS VINS-Mono VINS-Mono-NoLoop SenseSLAM v1.0
A0 40.186 94.255 92.546 82.945 97.317
A1 1.646 98.235 86.508 19.674 95.072
A2 61.423 94.959 88.301 92.389 99.707
A3 97.814 95.972 100.000 100.000 100.000
A4 76.629 97.429 100.000 100.000 100.000
A5 76.738 98.162 98.795 98.733 99.143
A6 94.128 97.805 100.000 100.000 100.000
A7 68.341 96.690 100.000 100.000 100.000

Initialization quality

Sequence PTAM ORB-SLAM2 LSD-SLAM DSO MSCKF OKVIS VINS-Mono VINS-Mono-NoLoop SenseSLAM v1.0
A0 13.914 2.040 1.615 3.783 1.154 1.067 0.895 0.900 1.840
A1 18.334 6.930 25.578 15.598 5.182 2.892 3.220 3.135 3.674
A2 3.087 1.945 4.980 1.321 3.820 1.155 0.584 0.641 2.154
A3 1.667 0.974 0.810 0.683 3.730 0.690 1.254 1.273 0.764
A4 12.059 2.777 6.404 1.793 2.872 10.997 1.751 1.783 2.967
A5 18.743 4.062 12.934 17.815 4.366 2.119 1.866 1.895 1.183
A6 2.415 2.794 5.655 2.699 6.712 6.696 2.246 2.132 1.484
A7 1.037 0.772 1.624 0.671 9.532 1.413 1.164 1.126 0.835
Average 8.907 2.787 7.450 5.545 4.671 3.379 1.622 1.611 1.863
Max 18.743 6.930 25.578 17.815 9.532 10.997 3.220 3.135 3.674
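
The Average and Max rows are plain column statistics over sequences A0 to A7. As a quick check, the following Python sketch recomputes them for the PTAM column using the values copied from the table above. The helper name `summarize` is hypothetical.

```python
# PTAM column of the initialization quality table, sequences A0..A7.
ptam = [13.914, 18.334, 3.087, 1.667, 12.059, 18.743, 2.415, 1.037]


def summarize(values):
    """Return (mean rounded to 3 decimals, maximum) of a table column."""
    return round(sum(values) / len(values), 3), max(values)


average, maximum = summarize(ptam)
print(average, maximum)  # 8.907 18.743
```

The same helper can be applied to any other column to cross-check its Average and Max entries.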

Tracking robustness

Sequence PTAM ORB-SLAM2 LSD-SLAM DSO MSCKF OKVIS VINS-Mono VINS-Mono-NoLoop SenseSLAM v1.0
B0 (Rapid Rotation) 4.73 0.844 1.911 6.991 --- 1.071 2.789 2.835 0.306
B1 (Rapid Translation) 4.971 0.231 1.09 2.636 --- 0.597 1.211 1.415 0.199
B2 (Rapid Shaking) 5.475 0.294 1.387 --- --- 3.917 13.403 21.08 2.013
B3 (Moving People) 7.455 0.6 0.897 6.399 --- 0.673 0.785 0.531 0.465
B4 (Covering Camera) 16.033 2.702 0.727 --- --- 1.976 0.714 0.826 0.326

Relocalization time (seconds)

Sequence PTAM ORB-SLAM2 LSD-SLAM VINS-Mono SenseSLAM v1.0
B5 (1s black-out) 1.032 0.077 1.082 5.274 0.592
B6 (2s black-out) 0.366 0.465 5.413 3.755 1.567
B7 (3s black-out) 0.651 0.118 1.834 1.282 0.332
Average 0.683 0.220 2.776 3.437 0.830

System Reference


  [1] Klein G, Murray D. Parallel tracking and mapping for small AR workspaces. In: 6th IEEE and ACM International Symposium on Mixed and Augmented Reality. Nara, Japan, 2007: 225–234. DOI:10.1109/ISMAR.2007.4538852

  [2] Mur-Artal R, Tardos J D. ORB-SLAM2: an open-source SLAM system for monocular, stereo, and RGB-D cameras. IEEE Transactions on Robotics, 2017, 33(5): 1255–1262. DOI:10.1109/tro.2017.2705103

  [3] Engel J, Schöps T, Cremers D. LSD-SLAM: Large-Scale Direct Monocular SLAM. Computer Vision–ECCV 2014. Cham: Springer International Publishing, 2014: 834–849. DOI:10.1007/978-3-319-10605-2_54

  [4] Engel J, Koltun V, Cremers D. Direct sparse odometry. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2018, 40(3): 611–625. DOI:10.1109/tpami.2017.2658577

  [5] Mourikis A I, Roumeliotis S I. A multi-state constraint Kalman filter for vision-aided inertial navigation. In: IEEE International Conference on Robotics and Automation. Roma, Italy, 2007: 3565–3572. DOI:10.1109/ROBOT.2007.364024

  [6] Leutenegger S, Lynen S, Bosse M, Siegwart R, Furgale P. Keyframe-based visual–inertial odometry using nonlinear optimization. The International Journal of Robotics Research, 2015, 34(3): 314–334. DOI:10.1177/0278364914554813

  [7] Qin T, Li P L, Shen S J. VINS-mono: A robust and versatile monocular visual-inertial state estimator. IEEE Transactions on Robotics, 2018, 34(4): 1004–1020. DOI:10.1109/tro.2018.2853729