Image Processing and Vision (Processamento de Imagem e Visão)

The project component of PIV accounts for 50% of the grade and includes two parts:

  1. 3D reconstruction - this part is mandatory; all groups must submit code and a full report.
  2. Challenge - you have to solve one challenge (optionally, you can develop an idea of your own).
                    The challenge will be about recognition of books on a shelf (to be detailed later).
    Submission: from 16 Dec to 23 Dec at http://printart.isr.ist.utl.pt/piv/submit

Part I - 3D scene "scanning" with Kinect images

Simultaneous Localization and Mapping (SLAM)

Consider a robot moving in an indoor scene, e.g. a factory or a museum, or a person moving with a hand-held device. We wish to build a 3D model of the scene and to track the robot's trajectory in the scene using its sensors. This problem is known as the simultaneous localization and mapping (SLAM) problem.



Figure 1 – 3D reconstruction of a scene


Sensors

The SLAM problem can be solved in many ways, depending on the type of sensors available on the robot. In this project, we assume that the robot is equipped with a Kinect sensor, which includes a depth camera and a color camera. These cameras provide two images (a depth image and a color image) that have the same resolution and are aligned. This means that the Kinect provides the depth and color associated with each pixel (a cloud of 3D points and their colors).

Pose estimation and spatial integration

The Kinect sensor provides a local representation of the environment (a cloud of 3D points), but this is not enough to reconstruct the scene from a single view, since only local information is available. Therefore, we need to gather information from multiple views, which must be aligned and fused into a single 3D model of the scene. This can be done assuming that the robot (and the Kinect) moves along a trajectory and acquires data at different positions and orientations.

In each position, we need to compute the Kinect pose (position and orientation) with respect to the scene coordinates. After computing the Kinect pose, we need to fuse the 3D cloud of points obtained at the current position with the information obtained at previous acquisitions.

So, given a sequence of depth+rgb images, the main tasks to accomplish are:

  1. Find the correspondence between point clouds. This can be done either by registering 3D point clouds (using only the depth images) or by using the rgb images to find point correspondences (feature matching).
  2. Given pairs of corresponding 3D points in two images, compute the transformation between them (rotation + translation).
  3. Given the transformations between pairs of images, propagate these transformations until all point clouds are in a single reference frame.
  4. Filter the final point cloud (so that it looks good).
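Step 2 can be solved in closed form: given two sets of corresponding 3D points, the rigid transformation that aligns them follows from an SVD of their cross-covariance (the Kabsch / orthogonal Procrustes method). A minimal MATLAB sketch (the function name rigid_transform is our own, not part of any required interface):

```matlab
function [R, T] = rigid_transform(P, Q)
% Estimate R (3x3) and T (3x1) such that Q ~ R*P + T, where P and Q
% are 3xN matrices of corresponding 3D points (Kabsch method).
% Uses implicit expansion (MATLAB R2016b or later).
cP = mean(P, 2);                      % centroid of each cloud
cQ = mean(Q, 2);
H = (P - cP) * (Q - cQ)';             % 3x3 cross-covariance
[U, ~, V] = svd(H);
D = diag([1, 1, sign(det(V * U'))]);  % guard against reflections
R = V * D * U';
T = cQ - R * cP;
end
```

In practice, the correspondences from step 1 contain outliers, so an estimator like this is typically wrapped in a RANSAC loop that keeps the transformation supported by the largest set of inliers.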

CODE DEVELOPMENT AND DELIVERY

Matlab Main Function
The main file must be named  reconstruction.m

You may have as many files as you want, and you may include any package or code not part of the standard MATLAB distribution.

The first line of your reconstruction.m file must look like this:

function [pcloud, transforms]=reconstruction( image_names, K_depth, K_rgb, Rdtrgb,Tdtrgb)

INPUT ARGUMENTS:

image_names - an array of structures with the names of the images. Each element of the array is a structure with fields:

image_names(k).depth - a string with the path of the .mat file with the depth data of image k.
image_names(k).rgb - a string with the path of the jpeg/png file with rgb image k.

K_depth - a 3x3 matrix with the intrinsic parameters of the depth camera

K_rgb - a 3x3 matrix with the intrinsic parameters of the rgb camera

Rdtrgb and Tdtrgb allow transforming 3D coordinates represented in the depth camera reference frame into the rgb camera reference frame:

Rdtrgb - a 3x3 rotation matrix
Tdtrgb - a 3x1 translation vector
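For reference, the way these inputs fit together can be sketched as follows: K_depth back-projects depth pixels into 3D, and Rdtrgb/Tdtrgb carry those points into the rgb camera frame so each point can be colored. The variable depth (a depth image in millimeters) is an assumption about the dataset format:

```matlab
% Back-project a depth image into a cloud of 3D points in the depth
% camera frame, then express the points in the rgb camera frame and
% project them into the rgb image to fetch each point's color.
[h, w] = size(depth);
[u, v] = meshgrid(1:w, 1:h);           % pixel coordinates
Z = double(depth(:))' / 1000;          % depth in meters (assuming mm)
p = [u(:)'; v(:)'; ones(1, h * w)];    % homogeneous pixels, 3xN
Xd = (K_depth \ p) .* Z;               % 3D points in the depth frame
Xrgb = Rdtrgb * Xd + Tdtrgb;           % same points in the rgb frame
uv = K_rgb * Xrgb;                     % project into the rgb image
uv = round(uv(1:2, :) ./ uv(3, :));    % pixel holding each point's color
```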
RETURN VALUES:

pcloud - [Xi Yi Zi Ri Gi Bi] - an N x 6 matrix with the 3D points and the RGB data of each point, represented in the world reference frame (you choose the reference frame! For example, it could be the depth camera coordinate system of the first image).

transforms - an array of structures with the same length as image_names, where element k contains the transformation between the depth camera reference frame of image k and the world reference frame (as selected above).

The fields are the following:
transforms(k).R - the rotation matrix between the depth camera and the world frame for image k
transforms(k).T - the translation vector between the depth camera and the world frame for image k
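A sketch of how the pairwise transformations from step 2 can be propagated into this output format, taking the world frame to be the depth frame of image 1. Here Rk{k}/Tk{k} (the pairwise transformation from image k to image k-1), Xk{k} (3xNk points in the depth frame of image k) and Ck{k} (Nk x 3 colors) are hypothetical intermediate variables, not part of the required interface:

```matlab
% Chain pairwise transformations into world-frame poses, then fuse
% all clouds into the single N x 6 matrix pcloud.
transforms(1).R = eye(3);                 % image 1 defines the world
transforms(1).T = zeros(3, 1);
for k = 2:numel(Xk)
    % x_world = R_{w,k-1} * (Rk{k} * x_k + Tk{k}) + T_{w,k-1}
    transforms(k).R = transforms(k-1).R * Rk{k};
    transforms(k).T = transforms(k-1).R * Tk{k} + transforms(k-1).T;
end
pcloud = [];
for k = 1:numel(Xk)
    Xw = transforms(k).R * Xk{k} + transforms(k).T;   % depth k -> world
    pcloud = [pcloud; Xw', Ck{k}];                    % rows [X Y Z R G B]
end
```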


DATASETS  http://printart.isr.ist.utl.pt/piv/project/datasets/reconstruction/

Details will be discussed in  lab sessions.

OTHER DATASETS: http://vision.in.tum.de/data/datasets/rgbd-dataset/download#    Intrinsic parameters for this dataset:
http://vision.in.tum.de/data/datasets/rgbd-dataset/file_formats#intrinsic_camera_calibration_of_the_kinect

OBJECT SCANS (lots of them): http://redwood-data.org/3dscan/


Part II (Challenge) - Book inventory

Consider the set of images below. From this sequence you should be able to detect the books on the shelf (in every image of the sequence) and identify which book is taken from the shelf and shown.





To identify the books, you have a file with the depth image and the rgb image of each book.




Details will be discussed in the lab classes.

CODE DEVELOPMENT AND DELIVERY

Matlab Main Function
The main file must be named  books.m

You may have as many files as you want, and you may include any package or code not part of the standard MATLAB distribution.

The first line of your books.m file must look like this:

function bookindex=books( test_image_names, training_images_names)

INPUT ARGUMENTS:

test_image_names - an array of structures with the names of the TEST images. Each element of the array is a structure with fields:
test_image_names(k).depth - a string with the path of the .mat file with the depth data of test image k.
test_image_names(k).rgb - a string with the path of the jpeg/png file with rgb test image k.

training_image_names - an array of structures with the names of the training images. Each element of the array is a structure with fields:
training_image_names(k).depth - a string with the path of the .mat file with the depth data of training image k.
training_image_names(k).rgb - a string with the path of the jpeg/png file with rgb training image k.
RETURN VALUES:


bookindex - a one-dimensional array of integers with length equal to the number of test images. Element bookindex(i) is 0 if test_image_names(i) does not contain any book from the training set, or an integer with the index of the training image most similar to test image i.
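One possible baseline (not the required method) is to count local feature matches between each test image and every training book. The sketch below uses SURF features from the Computer Vision Toolbox; the threshold min_matches is an arbitrary value to be tuned on the dataset:

```matlab
function bookindex = books(test_image_names, training_image_names)
% Baseline sketch: pick, for each test image, the training book with
% the most SURF feature matches; return 0 when no book matches well.
min_matches = 10;                              % assumed threshold
n_train = numel(training_image_names);
train_feats = cell(1, n_train);
for j = 1:n_train                              % precompute descriptors
    g = rgb2gray(imread(training_image_names(j).rgb));
    train_feats{j} = extractFeatures(g, detectSURFFeatures(g));
end
bookindex = zeros(1, numel(test_image_names));
for i = 1:numel(test_image_names)
    g = rgb2gray(imread(test_image_names(i).rgb));
    f = extractFeatures(g, detectSURFFeatures(g));
    scores = zeros(1, n_train);
    for j = 1:n_train                          % count matches per book
        scores(j) = size(matchFeatures(f, train_feats{j}), 1);
    end
    [best, idx] = max(scores);
    if best >= min_matches
        bookindex(i) = idx;
    end
end
end
```

Because the depth images are not used here, this baseline only exploits appearance; the depth channel could additionally help segment the shelf and the book being shown.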