PIV PROJECT


Below we describe the problem that must be solved in the PIV project. All projects must be implemented in Matlab according to a set of rules that will be defined in due time.  

Each group will submit a ZIP file with all the code and one report in PDF not exceeding 10 pages (extensions are allowed only for good reasons, such as experimental data and analysis).


THE PROJECT MUST BE SUBMITTED BY FRIDAY DECEMBER 21 17:00H

 

To get an idea of what a report should look like, see these commented examples: http://printart.isr.ist.utl.pt/piv/projectcomments_examples.pdf . I hope you improve on these examples. Some suggestions:


1) Start by describing the problem you are solving.

2) Describe how you approached it. If you divided the whole problem into subproblems, first describe the whole solution in general terms and then each subproblem in detail. Don't mix "high level" descriptions with details in the same sentence ("to remove outliers we implemented RANSAC with 10 iterations and a threshold of 0.34 for the error"). No code details! I don't care how many functions you wrote.

3) If you are a Portuguese native speaker, write in Portuguese. Sometimes it is hard to tell whether an error is due to ignorance or to bad English writing.

4) Show experiments and a critical analysis as detailed as possible. If things don't run well, there is no problem; just show that you are aware of it and know why things went wrong. Show it with DATA, not words!

5) Write decent, fluent and rigorous Portuguese/English. This is not a post for Instagram or Facebook!


Project submission will be done through an online form where you will upload the code and the report. Details will be posted later.




General Description

Object detection and tracking using multiple depth cameras

The goal of the project is the detection, localization and tracking of moving objects using information from depth cameras.

The cameras are in a fixed position and the scene is composed of a static set of objects (background) and a variable number of moving objects.

The project is divided into two parts, and all groups must submit code for each part.

Part 1 - Single camera

Consider the following sequence of images, which illustrates one possible single-camera scenario:


As you can see, the (noisy) background is still (the camera does not move) and there is one "object" moving. You must detect this object (and all other moving objects) and follow them along the sequence.
You can use any technique of your choosing to detect and follow the objects. In other words, you can use the RGB images alone (2D), the depth images alone (2D), the 3D point clouds, or any "mix" that you find best for the job.
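For instance, here is a minimal sketch of one possible depth-based approach (this is only an illustration, not the required method; the 20 cm threshold and the minimum blob size are arbitrary choices, and depth_array is assumed to be in millimetres, as in the lab):

N = length(imgseq1);
load(imgseq1(1).depth);                      % creates depth_array
depths = zeros([size(depth_array) N]);
for i = 1:N
    load(imgseq1(i).depth);                  % depth of frame i, in mm
    depths(:,:,i) = double(depth_array);
end
bg = median(depths, 3);                      % static background model
for i = 1:N
    fg = abs(depths(:,:,i) - bg) > 200 & depths(:,:,i) > 0;  % 20 cm
    fg = bwareaopen(fg, 500);                % drop small noisy blobs
    labels = bwlabel(fg);                    % one label per candidate object
    % ... match the labelled regions across frames to track each object
end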

The final output will consist of the 3D coordinates of a 3D box that encloses the 3D points of each object.

For each image and for each object, you must return the 8 xyz-coordinates of the box that encloses the point cloud.


In the figure above we illustrate the task you must accomplish. The output will be the sequence of the 8 points P1-P8.
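The statement does not fix the orientation of the box. Assuming an axis-aligned box, a minimal sketch of computing its 8 corners from the Nx3 matrix P of the xyz points of one object in one frame:

function corners = box_corners(P)
    % corners of the axis-aligned box enclosing the Nx3 point cloud P
    mn = min(P, [], 1);                      % [xmin ymin zmin]
    mx = max(P, [], 1);                      % [xmax ymax zmax]
    [cx, cy, cz] = ndgrid([mn(1) mx(1)], [mn(2) mx(2)], [mn(3) mx(3)]);
    corners = [cx(:) cy(:) cz(:)];           % 8x3, one corner per row

The corresponding row of objects(i).X (defined below) would then be corners(:,1)', and similarly for Y and Z.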

The definition of variables and formats for the inputs and outputs will be explained later.

Part 2

In part 2, all groups must do the exact same job as before but with 2 cameras. Both cameras are static: they do not move relative to each other, and the scene is similar (a fixed background with moving objects). The figure below explains the whole scenario:


A few notes:

1 - There are 2 cameras, so there are two sequences of RGB and depth images.

2 - Camera 1 defines the world coordinate system. The coordinates of the "boxes" representing the detected objects must be expressed in the camera 1 reference frame (world).

3 - This means that you must compute the rigid transformation from camera 2 to camera 1 (R2-1, T2-1) and always return coordinates in the camera 1 reference frame (one possible way to estimate this transformation is sketched after this list).

4 - Depending on the technique you use, you can process each camera separately and then fuse the information, or you can fuse the information first and detect/track in the "fused" data.
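As a suggestion only (not a required method): one standard way to obtain R2-1, T2-1 is to match features between the two RGB images, back-project the matches to 3D with the depth, and solve the rigid alignment in closed form (Procrustes/Kabsch), typically inside RANSAC to reject wrong matches. A minimal sketch of the closed-form step, where P1 and P2 are assumed to be Nx3 matrices of corresponding 3D points in the two camera frames:

c1 = mean(P1, 1);  c2 = mean(P2, 1);                     % centroids, 1x3
A = bsxfun(@minus, P2, c2)' * bsxfun(@minus, P1, c1);    % 3x3 cross-covariance
[U, ~, V] = svd(A);
R = V * diag([1 1 det(V*U')]) * U';                      % closest proper rotation
T = c1' - R * c2';                                       % 3x1 translation
% check: P1 should be close to (R * P2' + repmat(T, 1, size(P2,1)))'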


CODE TO DEVELOP:

You must have one Matlab function for each part, with the specifications below. You can write as many functions as you want, but the main function must be as defined:

PART 1: The function receives the images as inputs and returns the 8 points describing the time trajectories of the enclosing box of each object, in world (camera) coordinates.


objects = track3D_part1(imgseq1, cam_params)

INPUT VARIABLES
imgseq1
Array of structures with the file names of the RGB and depth images.
Each element has the following fields

imgseq1(i).rgb - the file name of rgb image i
imgseq1(i).depth - the file name of depth image i

RGB images are jpg or png (opened with imread()) and the depth files are Matlab .mat files that must be loaded with load(). The depth image is in the depth_array variable (as you did in the lab).
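For example, reading frame i (assuming depth_array holds the depth in millimetres, as in the lab):

im = imread(imgseq1(i).rgb);                 % RGB image of frame i
load(imgseq1(i).depth);                      % creates the depth_array variable
depth = double(depth_array);                 % depth image, in mm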
cam_params
A structure with the intrinsic and extrinsic camera parameters.
cam_params.Kdepth  - the 3x3 matrix for the intrinsic parameters for depth
cam_params.Krgb - the 3x3 matrix for the intrinsic parameters for rgb
cam_params.R - the Rotation matrix from depth to RGB (extrinsic params)
cam_params.T - The translation from depth to RGB
This is also as you did in the lab, except that everything is in one structure. For example, cam_params.R corresponds to the Rdtorgb you used in the lab.
A FILE WITH THE CAMERA PARAMETERS IS HERE (http://printart.isr.ist.utl.pt/project/cameraparametersAsus.mat)
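For reference, a minimal sketch of back-projecting a depth image to a 3D point cloud with these parameters (as in the lab; assumes depth_array is in millimetres and outputs metres):

K = cam_params.Kdepth;
load(imgseq1(i).depth);                      % creates depth_array
[rows, cols] = size(depth_array);
[u, v] = meshgrid(1:cols, 1:rows);           % pixel grid
Z = double(depth_array(:)) / 1000;           % depth in metres
X = (u(:) - K(1,3)) .* Z / K(1,1);
Y = (v(:) - K(2,3)) .* Z / K(2,2);
P = [X Y Z];                                 % Nx3 point cloud, depth camera frame
Prgb = bsxfun(@plus, cam_params.R * P', cam_params.T)';  % same points in the RGB frame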

OUTPUT VARIABLE
objects
An array of structures with the objects detected and the coordinates of the "box" in all frames where each object was detected.
Each element of the  array (say objects(i) ) has the following fields

objects(i).X - A matrix with 8 columns and a variable number of rows (which may differ for each objects(i)). Each row of this matrix contains the 8 X coordinates of the "box" in one frame where the object was detected. So if you track the same object for 6 images, this matrix should be 6x8.
objects(i).Y
The same as objects(i).X  but with the Y coordinate of the box
objects(i).Z
The same as objects(i).X but for the Z coordinate of the box
objects(i).frames_tracked
An array with the indices of all images where the object was detected. The length of this 1D array is the same as the number of rows of objects(i).X. For example, objects(i).X(3,:) contains the 8 X coordinates of the box of object i detected in image number objects(i).frames_tracked(3).
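To make the format concrete, here is a hypothetical way of filling the structure for one object detected in frames 4 to 6 (box_corners_of_frame is a made-up helper that would return the 8x3 corner matrix of the object in that frame, e.g. via the box_corners sketch above):

objects(1).X = [];  objects(1).Y = [];  objects(1).Z = [];
objects(1).frames_tracked = [];
for f = 4:6
    corners = box_corners_of_frame(f);       % hypothetical helper, 8x3
    objects(1).X(end+1,:) = corners(:,1)';
    objects(1).Y(end+1,:) = corners(:,2)';
    objects(1).Z(end+1,:) = corners(:,3)';
    objects(1).frames_tracked(end+1) = f;
end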
As an example, if I want to plot the trajectory of object 23 (the corners of the box) in one figure and see the images where it was detected in another figure, I would do:

for k = 1:length(objects(23).frames_tracked)
    figure(1);   % the box corners of object 23 at frame k
    plot3(objects(23).X(k,:), objects(23).Y(k,:), objects(23).Z(k,:), '*');
    figure(2);   % the RGB image where the object was detected
    im1 = imread(imgseq1(objects(23).frames_tracked(k)).rgb);
    imshow(im1);
end



 
PART 2: The function receives just the images (and the camera parameters) as input and must compute both the transformation (R2-1, T2-1) and the object trajectories. Camera 1 is assumed to be the world.

[objects, cam2toW] = track3D_part2(imgseq1, imgseq2, cam_params)

INPUT VARIABLES
imgseq1
Array of structures with the file names of the RGB and depth images.
Each element has the following fields

imgseq1(i).rgb - the file name of rgb image i
imgseq1(i).depth - the file name of depth image i

RGB images are jpg or png (opened with imread()) and the depth files are Matlab .mat files that must be loaded with load(). The depth image is in the depth_array variable (as you did in the lab).
imgseq2
 Like imgseq1 but for the second camera
cam_params
The same as in PART1

OUTPUT VARIABLES
objects

The same as in Part 1:
An array of structures with the objects detected and the coordinates of the "box" in all frames where each object was detected.
Each element of the  array (say objects(i) ) has the following fields

objects(i).X - A matrix with 8 columns and a variable number of rows (which may differ for each objects(i)). Each row of this matrix contains the 8 X coordinates of the "box" in one frame where the object was detected. So if you track the same object for 6 images, this matrix should be 6x8.
objects(i).Y
The same as objects(i).X  but with the Y coordinate of the box
objects(i).Z
The same as objects(i).X but for the Z coordinate of the box
objects(i).frames_tracked
An array with the indices of all images where the object was detected. The length of this 1D array is the same as the number of rows of objects(i).X. For example, objects(i).X(3,:) contains the 8 X coordinates of the box of object i detected in image number objects(i).frames_tracked(3).

cam2toW

A structure with the coordinate transformation between camera 2 and the world coordinate frame.

cam2toW.R - the 3x3 rotation matrix from camera 2 to the world (camera 1)
cam2toW.T - the 3x1 translation vector
So a point P2 in camera 2 coordinates has world coordinates Pw = cam2toW.R * P2 + cam2toW.T.
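Applied to a whole Nx3 point cloud P2 expressed in camera 2 coordinates, this reads:

Pw = bsxfun(@plus, cam2toW.R * P2', cam2toW.T)';   % Nx3, world coordinates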
 


TESTING YOUR SOFTWARE


DATASETS  http://printart.isr.ist.utl.pt/piv/project/datasets/