Algorithm for object segmentation and object boundary detection
Alexander Lipanov, PhD in Computer science, Softarex Technologies, Inc.
The aim of motion tracking is to detect changes in the positions of moving objects over time based on a sequence
of images. Motion tracking is widely used for monitoring activity in public places as well as for detailed
analysis of video sequences recorded by various video surveillance systems. This technology is also very
important for scientific, technical, and medical applications, such as industrial robots, road traffic
control systems, livestock RFID tags, and blood analysis.
In recent years, extensive research has been conducted in order to develop more advanced, robust, and multipurpose
object tracking algorithms. Presently, there are three large groups of such methods:

Correspondence-based object tracking: these algorithms perform object detection by representing discrete
objects as centroids or silhouettes and then establishing correspondence between them frame by frame.
Transformation-based object tracking: these algorithms perform object detection by transforming discrete
objects into planar surfaces, such as rectangles or ellipses, and estimating the changes in their relative
positions frame by frame.
Contour-based object tracking: these algorithms perform object detection by representing discrete objects
as contours (boundaries) and matching similar contours with each other or with a predefined template contour
frame by frame.
Based on our experience, we propose our own method for boundary detection: the
Integro-Differential Algorithm.
The main idea behind this approach is to combine the benefits of the three groups of algorithms mentioned above.
First, we define the average brightness C_{0} in the neighborhood of the reference object’s
center:
where B(i, j) is the brightness function of the image being analyzed.
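A minimal Python sketch of this step, assuming the neighborhood is a square window around the center and the image is a grayscale NumPy array. The function name and the `radius` parameter are hypothetical and chosen only for illustration; the actual neighborhood shape and size are not specified here.

```python
import numpy as np

def mean_brightness_near_center(image, center, radius=2):
    """Mean brightness of a square window centered on `center`.

    `radius` is an assumed half-width in pixels; the window is
    clipped where it would extend past the image borders.
    """
    row, col = center
    top, bottom = max(row - radius, 0), min(row + radius + 1, image.shape[0])
    left, right = max(col - radius, 0), min(col + radius + 1, image.shape[1])
    return float(image[top:bottom, left:right].mean())

# Usage: a dark 3x3 object on a bright background.
image = np.full((9, 9), 200, dtype=np.uint8)
image[3:6, 3:6] = 40
c0 = mean_brightness_near_center(image, center=(4, 4), radius=1)
print(c0)  # 40.0, because the window covers only object pixels
```

A larger `radius` would start to include background pixels and pull the average toward the background brightness, so the window should stay inside the object.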
The second step is to build a brightness distribution histogram based on the calculated value of C_{0}:
In the third step, we can define the optimal brightness threshold T:
where T_{0} = 0 is the initial value of the threshold,
N_{k} is the number of points with brightness k,
N_{cp} is the number of points with brightness belonging to the neighborhood of the average
brightness C_{0},
k is the current value of brightness,
P_{1} > 0 and P_{2} > 0 are predefined constants that bound the object’s brightness range.
Our experiments have shown that most computer vision systems deliver the best results with P_{1} = 64 and
P_{2} = 50. However, these two values are empirical and strongly depend on the type and quality
of the images being analyzed.
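The two counts used in the threshold definition, N_{k} and N_{cp}, can be sketched in Python as follows. This is only an illustration under stated assumptions: the image is an 8-bit grayscale NumPy array, and the width of the "neighborhood of the average brightness" is a hypothetical parameter `delta`. The threshold expression itself is not implemented here.

```python
import numpy as np

def brightness_histogram(image):
    """Return N_k for k = 0..255: the number of pixels with brightness k."""
    return np.bincount(image.ravel(), minlength=256)

def count_near_mean(hist, c0, delta):
    """Count pixels whose brightness is within `delta` of c0.

    `delta` is an assumed half-width of the brightness neighborhood.
    """
    center = int(round(c0))
    lo = max(center - delta, 0)
    hi = min(center + delta, 255)
    return int(hist[lo:hi + 1].sum())

# Usage on a tiny 8-bit image.
image = np.array([[10, 10, 12],
                  [200, 11, 10],
                  [250, 10, 9]], dtype=np.uint8)
hist = brightness_histogram(image)
n_cp = count_near_mean(hist, c0=10.4, delta=2)
print(hist[10], n_cp)  # 4 7
```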
At this point, the image comprises three subsets: the object’s points, the object’s boundaries,
and the background’s points. Therefore, we need to remove (subtract) all the
points belonging to the background and the inner areas of the object to obtain the subset of the object’s
boundaries.
Let’s define the supplementary d^{1} set in the following way:
where i and j run over the width and height of the image, respectively (in points),
m and n run through
from the beginning to the end.
In the d^{1} set, each element d^{1} (i, j) = 1 corresponds to a point of the object’s contour,
while each element d^{1} (i, j) = 0 stands for a point of the background, object’s inner area, or
their intersection, as shown in the picture below.
Using the d^{1} set, we can solve two important tasks at once: find the points of similar
brightness in the image and, at the same time, filter out noise, distortions, and glitches, since their
brightness significantly differs from the brightness of the points belonging to
.
Now, let’s define yet another set and name it d^{2}:
where i and j run over the width and height of the image, respectively (in points).
The d^{2} set contains the points with brightness close to the average brightness C_{0}
limited by the T threshold defined above. The main difference between the d^{1} and d^{2} sets is
that the former comprises points with brightness belonging to ,
while d^{2} consists of points with brightness between C_{0} - T and
C_{0} + T plus all the points of the d^{1} set. The d^{2}
set makes the entire algorithm more robust and accurate, and less dependent on the object type and image quality.
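The verbal description of d^{2} can be sketched in Python as follows. This is a hedged illustration rather than the exact definition: it assumes d^{1} is already available as a binary array, that the brightness band is inclusive at both ends, and the function name `build_d2` is hypothetical.

```python
import numpy as np

def build_d2(image, d1, c0, threshold):
    """Binary set: 1 where brightness lies in [c0 - T, c0 + T] or d1 is 1."""
    brightness = image.astype(np.int32)  # avoid uint8 wrap-around on subtraction
    in_band = np.abs(brightness - c0) <= threshold
    return (in_band | (d1 == 1)).astype(np.uint8)

# Usage: one point is far from c0 but already belongs to d1.
image = np.array([[40, 45, 200],
                  [60, 42, 210]], dtype=np.uint8)
d1 = np.array([[0, 0, 1],
               [0, 0, 0]], dtype=np.uint8)
d2 = build_d2(image, d1, c0=42, threshold=5)
print(d2.tolist())  # [[1, 1, 1], [0, 1, 0]]
```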
The last step is to define the resulting set L containing all the points of the object’s contour:
where .
This expression removes all the points outside the object’s contour by assigning them 255 (i.e., the white color code),
while leaving the points of the object’s contour unchanged. As a result, the L set contains only the points
of the object’s contour, and the task is solved.
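The effect of this final step can be sketched in Python as follows. The sketch assumes the contour points are already available as a binary mask; how that mask is derived from d^{1} and d^{2} is not reproduced here, and the function name `keep_contour_only` is hypothetical.

```python
import numpy as np

def keep_contour_only(image, contour_mask):
    """Copy `image`, setting every pixel outside `contour_mask` to 255 (white)."""
    result = np.full_like(image, 255)
    mask = contour_mask.astype(bool)
    result[mask] = image[mask]  # contour pixels keep their original brightness
    return result

# Usage: only the two masked pixels survive.
image = np.array([[10, 20],
                  [30, 40]], dtype=np.uint8)
contour_mask = np.array([[1, 0],
                         [0, 1]], dtype=np.uint8)
contour_image = keep_contour_only(image, contour_mask)
print(contour_image.tolist())  # [[10, 255], [255, 40]]
```

The input image is left untouched, which makes it easy to overlay the extracted contour on the original frame for visual inspection.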
In general, this algorithm is relatively simple, delivers reliable results in real time, and saves computing
resources, since it uses only basic arithmetic and logical operations. The latter is especially important for
entry-level workstations, which often cannot provide enough processing power.