Image Processing and Vision (Processamento de Imagem e Visão), special exam season (Época Especial) 2016/17

Object Tracking with a Static (Fixed) Kinect

The goal is to detect and track objects (mostly people) in a sequence of depth + RGB images from a fixed Kinect. For example, imagine a Kinect installed in the elevator lobby of Torre Norte: we want to know how many people passed by and the trajectory of each one over time. For a glimpse of the expected result, check this video.

The main tasks are:
1 - detect the ground plane, so that the Kinect can be located relative to a gravity-aligned reference frame
2 - detect moving objects in each image
3 - determine the correspondence between objects in the current image and the objects detected in the previous image
4 - compute the location of each object (you must define one particular point or shape that represents the object: its top point, a bounding box ... anything!)
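
As an illustration of task 1, the sketch below fits a plane to a set of 3D points assumed to lie on the floor and builds a rotation to a frame whose Z axis is the floor normal. It is only one possible approach, not a required method: the sample points are synthetic, and the variable names (P, n, d, R, Pg) are hypothetical. With real data, the floor points would first have to be selected, for example by hand or with a robust method such as RANSAC.

```matlab
% Hypothetical floor samples in the camera frame: points on the plane
% z = 0.1*x + 0.2*y + 1 (metres).
[xg, yg] = meshgrid(-1:0.5:1, -1:0.5:1);
P = [xg(:), yg(:), 0.1*xg(:) + 0.2*yg(:) + 1];   % M x 3

% Least-squares plane: the normal is the direction of smallest variance,
% i.e. the last right singular vector of the centred points.
c = mean(P, 1);
[~, ~, V] = svd(bsxfun(@minus, P, c), 0);
n = V(:, 3);
if n' * c' > 0, n = -n; end    % make the normal point from the floor to the camera
d = -n' * c';                  % plane: n'*p + d = 0; d is the camera height

% Ground frame: Z along the normal, X = camera X axis projected onto the floor.
x = [1; 0; 0] - n * n(1);
x = x / norm(x);
y = cross(n, x);
R = [x, y, n]';                % rows are the ground axes in camera coordinates

% Camera-frame points -> ground frame (third column = height above the floor).
Pg = bsxfun(@plus, (R * P')', [0, 0, d]);
```

With this convention the camera centre maps to (0, 0, d), and any point on the floor gets height 0.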

CODING PROTOCOL (Interfacing) for PROJECT EpEspecial

Your code must have one main file named peopletracking.m, with the following signature (you may have more files with other functions):
function tracked_objs = peopletracking(file_names, depth_cam, rgb_cam, Rdtrgb, Tdtrgb)


file_names - an array of structures of size F (number of images) with the names of the files. Each element k of the array is a structure with two fields:
  file_names(k).depth - a string with the path of the .mat file holding the depth data of instant k. Loading this file gives a variable depth_array that contains the depth image
  file_names(k).rgb - a string with the path of a jpeg/png file holding the RGB image of instant k
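
The sketch below shows one way to read the depth files and flag moving pixels by comparing each frame with a per-pixel median background. It is a minimal illustration under stated assumptions: the three tiny depth files are synthetic and created on the fly so the example runs on its own, the depth unit is assumed to be millimetres, and the 100-unit threshold is arbitrary. The RGB image of instant k would be read in the same loop with imread(file_names(k).rgb).

```matlab
% Create three tiny synthetic depth files (hypothetical data).
tmp = tempname;
mkdir(tmp);
for k = 1:3
    depth_array = 1000 * ones(4, 5);
    if k == 2, depth_array(2, 3) = 500; end      % something "moves" in frame 2
    file_names(k).depth = fullfile(tmp, sprintf('depth_%d.mat', k));
    save(file_names(k).depth, 'depth_array');
end

% Load every depth image into an H x W x F stack.
F = numel(file_names);
S = load(file_names(1).depth);
[H, W] = size(S.depth_array);
D = zeros(H, W, F);
for k = 1:F
    S = load(file_names(k).depth);
    D(:, :, k) = double(S.depth_array);
end

% Static background = per-pixel median over time (the camera is fixed).
bg = median(D, 3);

% Foreground mask: pixels that differ from the background by more than
% a threshold (100 depth units, an assumed value).
fg = abs(bsxfun(@minus, D, bg)) > 100;
```

The median works here because each pixel shows the background in most frames; a scene where people stand still for most of the sequence would need a different background model.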

depth_cam - A structure with the intrinsic parameters of the depth camera
depth_cam.K - a 3x3 matrix with the intrinsic parameters
depth_cam.DistCoef - a 1x5 array with the lens distortion coefficients (you do not need to use these)
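
To show how depth_cam.K is typically used, the sketch below back-projects a depth image to 3D points in the depth camera frame with the pinhole model, ignoring lens distortion. The intrinsic values and the depth image are hypothetical, the depth is assumed to be in millimetres, and K is assumed to be expressed in 1-based pixel coordinates (u = column, v = row); check these conventions against the real data.

```matlab
% Hypothetical intrinsics and a synthetic depth image (a flat wall 2 m away).
depth_cam.K = [525 0 320; 0 525 240; 0 0 1];
depth_array = 2000 * ones(480, 640);             % assumed millimetres

Z = double(depth_array) / 1000;                  % metres
[u, v] = meshgrid(1:size(Z, 2), 1:size(Z, 1));   % u = column, v = row
pix = [u(:)'; v(:)'; ones(1, numel(Z))];         % 3 x N homogeneous pixels

% K \ pix gives, for each pixel, the viewing ray with unit Z component;
% scaling each ray by the measured depth gives the 3D point.
rays = depth_cam.K \ pix;
xyz = bsxfun(@times, rays, Z(:)');               % 3 x N, depth camera frame

% Pixels with zero depth usually carry no measurement and should be skipped.
valid = Z(:)' > 0;
```

Column j of xyz corresponds to the pixel with linear index j in depth_array, so a foreground mask of the same size can index the point cloud directly.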

rgb_cam - A structure with the intrinsic parameters of the rgb camera
rgb_cam.K - a 3x3 matrix with the intrinsic parameters
rgb_cam.DistCoef - a 1x5 array with the lens distortion coefficients

Rdtrgb - a 3x3 rotation matrix
Tdtrgb - a 3x1 translation vector
Rdtrgb and Tdtrgb transform 3D coordinates expressed in the depth camera reference frame into the RGB camera frame
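
As a minimal sketch of this transformation, the code below maps 3D points from the depth camera frame to the RGB camera frame and projects them onto the RGB image, which is how a colour can be attached to each depth point. All numeric values are hypothetical, the translation is named Tdtrgb as in the function signature, and lens distortion is ignored.

```matlab
% Hypothetical extrinsics and RGB intrinsics.
Rdtrgb = eye(3);
Tdtrgb = [0.025; 0; 0];                          % metres
rgb_cam.K = [520 0 320; 0 520 240; 0 0 1];

% Two example 3D points in the depth camera frame (3 x N, metres).
P_depth = [0, 0.5; 0, 0.1; 2, 3];

% Rigid transformation: P_rgb = Rdtrgb * P_depth + Tdtrgb.
P_rgb = bsxfun(@plus, Rdtrgb * P_depth, Tdtrgb);

% Pinhole projection onto the RGB image (no distortion correction).
uvw = rgb_cam.K * P_rgb;
uv = bsxfun(@rdivide, uvw(1:2, :), uvw(3, :));   % 2 x N pixel coordinates
```

Points that project outside the image bounds, or that lie behind the camera (non-positive third coordinate in P_rgb), have no corresponding RGB pixel.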


tracked_objs - a cell array of size N (number of objects detected), where each element i has the following form:
tracked_objs{i} = an array of size L_i x 4, where L_i is the number of images in which object i is visible. Each row of the array gives the position of object i at one time instant. Suppose object 4 appeared in frame 25 and you tracked it until frame 45. Then
                    X     Y     Z     t
tracked_objs{4} = [ X1,   Y1,   Z1,   25;
                    ...   ...   ...   ...
                    X21,  Y21,  Z21,  45 ];   in total 21 rows

XYZ is the position of the object (person!) relative to the ground. In other words, assuming there is a ground plane, XY represents the position of the person on the ground. Z may be 0 (representing only the position in the plane) or any value you choose to represent the object (e.g. the top coordinate of the object or the coordinate of its centroid). The last column is the number of the image in which the object was located.
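
The sketch below illustrates the output format together with a very simple solution to the correspondence task: each detection is attached to the nearest track updated in the previous frame, or starts a new track if none is close enough. This is only one possible strategy (a greedy nearest-neighbour match), not a prescribed one. The detections in dets, the gating distance max_jump, and the helper variables are hypothetical; real detections would come from the earlier processing steps, expressed in ground coordinates.

```matlab
% Hypothetical detections: dets{k} is an M_k x 3 array of ground positions
% [X Y Z] found in frame k.
dets = { [0 0 0], [0.1 0 0; 3 3 0], [0.2 0 0; 3.1 3 0], [3.2 3 0] };
max_jump = 0.5;        % largest accepted move between frames (assumed, metres)

tracked_objs = {};     % one L_i x 4 array [X Y Z t] per object
last_seen = [];        % last frame in which each track was updated

for k = 1:numel(dets)
    used = false(1, numel(tracked_objs));   % tracks already matched in frame k
    for j = 1:size(dets{k}, 1)
        p = dets{k}(j, :);
        best = 0;
        best_d = max_jump;
        for i = 1:numel(tracked_objs)
            % Only tracks seen in the previous frame and not yet matched.
            if used(i) || last_seen(i) ~= k - 1, continue; end
            dist = norm(tracked_objs{i}(end, 1:2) - p(1:2));
            if dist < best_d
                best = i;
                best_d = dist;
            end
        end
        if best == 0
            tracked_objs{end + 1} = [p, k];                  % new object
            last_seen(end + 1) = k;
            used(end + 1) = true;
        else
            tracked_objs{best} = [tracked_objs{best}; p, k]; % extend track
            last_seen(best) = k;
            used(best) = true;
        end
    end
end
```

With these detections the result holds two objects: the first is tracked over frames 1 to 3 and the second over frames 2 to 4. A greedy match can pick the wrong pairing when people cross paths, so a global assignment or an appearance cue from the RGB image may be worth considering.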