CVG (Computer Vision Group) is a research group within the State Key Lab of CAD&CG, Zhejiang University. Its main research interests are Structure-from-Motion, SLAM, 3D Reconstruction, Augmented Reality, and Video Segmentation/Matting and Editing.

  • SfM & SLAM
    • Offline Structure-from-Motion: we have addressed a few key issues in structure-from-motion. First, we propose a robust SfM method that efficiently and reliably handles long sequences with varying focal length. Second, we propose an efficient non-consecutive feature tracking method that matches common features across loop-back sequences and multiple sequences. Third, we propose an efficient segment-based bundle adjustment that performs global optimization for large datasets with limited memory. Building on this work, we have developed two robust and efficient SfM systems, ACTS and LS-ACTS.
    • Realtime Camera Tracking: a robust markerless real-time camera tracking system based on a novel keyframe selection and recognition method.
    • RDSLAM: a real-time monocular simultaneous localization and mapping system that works robustly in dynamic environments (ISMAR 2013 paper).
  • 3D Reconstruction
    • Depth Video Recovery: a novel method for recovering consistent depth maps from a video sequence. A set of applications has been developed based on this work, such as refilming, moving object extraction, and spatio-temporal segmentation.
    • 3D Reconstruction of Dynamic Scenes: this work focuses on recovering high-quality depth maps for dynamic scenes from very few synchronized cameras (i.e., 2 to 3 cameras). In our CVPR 2012 paper, we propose a novel dense depth recovery method for a trinocular video sequence. In our ECCV 2012 paper, we propose a more general method that uses only 2 to 3 handheld, freely moving cameras. Compared to traditional methods, this capture setup is much more flexible and easier to use.
  • Video Segmentation/Matting & Editing
    • Moving Object Extraction: a new method for high-quality extraction of moving objects from a video sequence taken by a handheld camera.
    • Fast Bilayer Segmentation: a novel fast bilayer segmentation method that effectively extracts the dynamic foreground under a rotational camera configuration.
    • Spatio-Temporal Segmentation: a novel spatio-temporal segmentation method for depth-inferred videos.
    • Refilming: a new content-based video editing system for creating various kinds of visual effects, including but not limited to video composition, the "predator" effect, bullet-time, depth-of-field, and fog synthesis.

Developed Software

RKSLAM is a real-time monocular simultaneous localization and mapping system that works robustly in challenging cases such as fast motion and strong rotation. It runs in real time on a mobile device and outperforms state-of-the-art systems (e.g., ORB-SLAM, PTAM, LSD-SLAM) in these cases. [software]


LS-ACTS is a robust and efficient structure-from-motion system that recovers camera motion and 3D scene structure from large video/sequence datasets. Compared to our previous SfM system ACTS, it is much faster (near real-time on a standard desktop PC) and can handle multiple extremely long sequences (over 100K frames). [software]

ACTS is an automatic camera tracking system that recovers camera motion and 3D scene structure from videos and film sequences without manual intervention. It tracks all kinds of camera motion, whether rotational or free-moving, efficiently and stably. It is a cornerstone for many other computer vision tasks. [software]

RDSLAM is a real-time simultaneous localization and mapping system that allows parts of the scene to be dynamic or the whole scene to change gradually. Compared to PTAM, RDSLAM not only works robustly in dynamic environments but also handles larger-scale scenes (the number of reconstructed 3D points can reach tens of thousands). [software]