<<

ECONTECHMOD. AN INTERNATIONAL QUARTERLY JOURNAL – 2017. Vol. 6. No. 2. 79–84

Analysis of computer vision and image analysis technics

Z. Rybchak¹, O. Basystiuk²

¹Lviv Polytechnic National University, e-mail: [email protected]
²Lviv Polytechnic National University, e-mail: [email protected]

Received February 15, 2017; accepted May 28, 2017

Abstract. Computer vision and image recognition are among the most popular topics nowadays. Moreover, the technology is developing very fast, so its field of application keeps growing. The main aims of this article are to explain the basic principles of the field and to review some interesting technologies that are widely used in computer vision and image recognition today.

Key words: computer vision, image recognition, object recognition, machine learning, computer with high-level understanding, digital processing, scene reconstruction.

INTRODUCTION

Our brains make visual recognition seem easy. It takes humans no effort to tell a dog from a cat or a car from a plane, to read a sign, or to recognize a face. But what about computer vision and image recognition: is the recognition problem just as easy for a computer? Definitely not. Teaching a computer to recognize images poses genuinely hard problems; they only look easy at first glance because our brains are so remarkably good at understanding images. Nevertheless, consider how many areas of human life could be improved by computer vision. The most common application area is manufacturing, for example quality control: a manufacturing business needs a quality control department, but if that department were replaced with computer vision and the freed people were involved in creating something new, the business would likely be more profitable. That is why, over the last few years, machine learning has made huge progress in computer vision. The key point of this progress is the creation of mathematical methods for image recognition that give highly accurate results. The most popular techniques for image recognition today come from deep learning, especially convolutional neural networks, which are much more advanced than earlier approaches such as Fourier transforms.

By involving deep learning methods, the field achieved a significant increase in accuracy, with rates approaching 95 percent. (Accuracy is usually measured against how humans classified the data set.) So keep in mind that if you have not yet looked at deep learning based image recognition and object detection algorithms for your applications, you may be missing a huge opportunity to get better results. Finally, we are ready to return to the main goal: understanding the main principles of image recognition methods that use traditional computer vision techniques.

COMPUTER VISION MAIN TASKS

Computer vision is the part of the artificial intelligence field that deals with how to organise computer work for gaining a high-level understanding of information from digital images or videos. Computer vision unites the following areas: image recognition, object recognition, object pose estimation, motion estimation, and scene reconstruction. There is a huge range of tasks that computer vision can handle, using methods that produce a numerical or symbolic representation of information acquired from the real world; the next steps are processing, analyzing and understanding the digital images, and the result of all these steps is high-dimensional data. Understanding in this context means the transformation of images into information that a computer can work with. This image understanding can be seen as the disentangling of symbolic information from image data using models constructed with the aid of geometry, physics, statistics, and machine learning.

Some examples of computer vision applications and goals:
• automatic face recognition and interpretation of expression;
• visual guidance of autonomous vehicles;
• automated medical image analysis, interpretation, and diagnosis;
• robotic manufacturing: manipulation, grading, and assembly of parts;
• OCR: recognition of printed or handwritten characters and words;
• agricultural robots: visual grading and harvesting of produce;
• smart offices: tracking of persons and objects; understanding gestures;
• biometric-based visual identification of persons;
• visually endowed robotic helpers;
• security monitoring and alerting; detection of anomalies;
• intelligent interpretive prostheses for the blind;
• tracking of moving objects; collision avoidance; stereoscopic depth;
• object-based (model-based) compression of video streams;
• general scene understanding.

In many respects, computer vision is an "AI-complete" problem: building general-purpose vision machines would entail, or require, solutions to most of the general goals of artificial intelligence. It would require finding ways of building flexible and robust visual representations of the world, maintaining and updating them, and interfacing them with attention, goals and plans.

Like other problems in AI, the challenge of vision can be described in terms of building a signal-to-symbol converter. The external world presents itself only as physical signals on sensory surfaces (such as a video camera, retina, or microphone), which explicitly express very little of the information required for intelligent understanding of the environment. These signals must ultimately be converted into symbolic representations whose manipulation allows the machine or organism to interact intelligently with the world. Researchers in this field try to automate tasks that the human visual system can do. So let us take a look at the most popular methods in the field of computer vision and examine how they work.

TECHNIQUES AND TOOLS

The process of image recognition in general is not really complicated. There are only three steps you need to go through to achieve a result:
1. Preprocessing – at this step you apply filters and try to make your image better suited for recognition.
2. Feature Extraction – at this step you try to recognize useful information, extract it, and throw away extraneous information.
3. Classification – analyzing and recognizing the information fetched at the feature extraction step.

PREPROCESSING METHODS

This step is important because the quality of preprocessing affects the application's outcome. The most common preprocessing steps are normalization of contrast, subtraction of the mean of the image intensities, correction of brightness effects, and division of the image by its standard deviation. Sometimes gamma correction gives slightly better results. When you are dealing with color images, a color space transformation may also help.

To achieve good preprocessing results, you need to do the following (a code sketch follows at the end of this section):
· Firstly, make reasonable guesses based on the type of images you need to recognize.
· Secondly, try a few different methods; some of them might give slightly better results.
· Finally, resize the image to a fixed size, because the next step is performed on fixed-size images.

However, keep in mind that nobody knows in advance which combination of preprocessing methods will produce the best results. Comparing the methods of image recognition and classification leads to the following conclusions:
· for structural preprocessing methods, what matters is the accuracy of boundary allocation and the information within those boundaries;
· for statistical methods, it is important to know the internal contours and the mutual distribution of brightness values;
· feature extraction methods (filters) should allocate contour components, which is typical of some existing methods, and obtain information about the distribution of structural elements outside the boundaries of objects.
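A minimal sketch of these preprocessing steps with OpenCV and NumPy follows. It is an illustration under assumptions, not the authors' implementation: the file name is a placeholder, the 64×128 target size is borrowed from the HOG discussion later in the article, and the gamma value is arbitrary.

    # Minimal preprocessing sketch: fixed-size resize, gamma correction,
    # then contrast normalization by mean and standard deviation.
    import cv2
    import numpy as np

    img = cv2.imread("input.png", cv2.IMREAD_GRAYSCALE)  # placeholder file name

    # Resize to a fixed size, since the next steps expect fixed-size images.
    img = cv2.resize(img, (64, 128))

    # Gamma correction on intensities scaled to [0, 1] (gamma is an assumption).
    img = np.power(img / 255.0, 0.8)

    # Subtract the mean of the image intensities and divide by the standard deviation.
    img = (img - img.mean()) / (img.std() + 1e-8)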
FEATURE EXTRACTION

Why is recognition so hard? The real world is made of a jumble of objects, which all occlude one another and appear in different poses. Furthermore, the variability intrinsic within a class (e.g., dogs), due to complex non-rigid articulation and extreme variations in shape and appearance (e.g., between different breeds), makes it unlikely that we can simply perform exhaustive matching against a database of exemplars.

The recognition problem can be broken down along several axes.
1. If we know what we are looking for, the problem is one of object detection, which involves quickly scanning an image to determine where a match may occur.
2. If we have a specific rigid object we are trying to recognize, we can search for characteristic feature points and verify that they align in a geometrically plausible way.
3. The most challenging version of recognition is general category (or class) recognition, which may involve recognizing instances of extremely varied classes such as animals or furniture.

In many instances, recognition depends heavily on the context of surrounding objects and scene elements. Woven into all of these techniques is the topic of learning, since handcrafting specific object recognizers seems a futile approach given the complexity of the problem. But let us investigate how the computer recognizes objects.

If you want to find numbers written on paper, you will notice significant variation in RGB pixel values (for example, white paper, black numbers). By running an edge detector on the image, an application will get the shape of each number, and by comparing the acquired shapes with default number shapes it will conclude which numbers are on the paper (a minimal sketch follows at the end of this section). The edge detector retains the essential information and ignores the non-essential. This step is called feature extraction.

In computer vision, the design of these features is critical to the performance of the algorithm. It turns out we can use already-created solutions for feature extraction. There is a huge range of such systems; some well-known features used in computer vision are the Haar-like features introduced by Viola and Jones, the Histogram of Oriented Gradients (HOG), the Scale-Invariant Feature Transform (SIFT), and the Speeded Up Robust Feature (SURF).
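As a sketch of the digits-on-paper example above, running an off-the-shelf edge detector is already enough to separate the essential shape information from the uniform background. This is an illustration, not code from the article; the file name and the Canny thresholds are assumptions.

    # Minimal edge-extraction sketch for the digits-on-paper example.
    # The file name and thresholds are illustrative assumptions.
    import cv2

    page = cv2.imread("numbers_on_paper.png", cv2.IMREAD_GRAYSCALE)  # placeholder file
    edges = cv2.Canny(page, 100, 200)  # keeps digit outlines, drops the flat paper
    cv2.imwrite("edges.png", edges)
    # The extracted shapes could then be matched against template digit shapes.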

Fig. 1. Absolute value of x-gradient. Fig. 2. Absolute value of y-gradient. Fig. 3. Magnitude of gradient.

Let us look closer at the algorithmic implementation of the Histogram of Oriented Gradients (HOG) feature extraction method. HOG is based on the idea that local object appearance can be effectively described by the distribution (histogram) of edge directions (oriented gradients). The algorithm consists of the following main steps (a code sketch of the gradient and histogram computation follows the list):

· Image preprocessing – the HOG feature descriptor used for pedestrian detection is calculated on a 64×128 patch of an image. Of course, an image may be of any size, but if it is bigger than 64×128, the preprocessor crops and scales it to this size.

· Calculate the gradient images – first, we calculate the horizontal and vertical gradients. This is easily achieved by filtering the image with the 1-D kernels [-1, 0, 1] and [-1, 0, 1]ᵀ, or by using the Sobel operator in OpenCV with kernel size 1. The next step is finding the magnitude and direction of the gradient with the following formulas:

g = √(gx² + gy²), θ = arctan(gy / gx).

If you are using OpenCV, the calculation can be done with the function cartToPolar. Figs. 1–3 show the gradients.

· Calculate histograms of gradients in 8×8 cells – the image is divided into 8×8 cells and a histogram of gradients is calculated for each cell. But why an 8×8 patch? It is a design choice informed by the scale of the features we are looking for. HOG was initially used for pedestrian detection, and 8×8 cells in a photo of a pedestrian scaled to 64×128 are big enough to capture interesting features such as the face. After creating the cells (matrices in the computer representation), we compute a histogram of gradients over each 8×8 cell; the histogram contains 9 bins corresponding to the angles 0, 20, 40, …, 160 degrees.

· Block normalization – gradients of an image are sensitive to overall lighting. If you make the image darker by dividing all pixel values by 2, the gradient magnitudes change by half, and therefore the histogram values change by half. Ideally, we want our descriptor to be independent of lighting variations; in other words, we would like to "normalize" the histograms so they are not affected by lighting. Say we have an RGB color vector [128, 64, 32]. The length of this vector, also called its L2 norm, is √(128² + 64² + 32²) = 146.64. Dividing each element by 146.64 gives the normalized vector [0.87, 0.43, 0.22]. Now consider another vector whose elements are twice those of the first: 2 × [128, 64, 32] = [256, 128, 64]. You can work it out yourself to see that normalizing [256, 128, 64] also results in [0.87, 0.43, 0.22], the same as the normalized version of the original vector: normalizing a vector removes the scale. Knowing this, you may be tempted to simply normalize each 9×1 histogram the same way we normalized the 3×1 vector above. It is not a bad idea, but a better idea is to normalize over a bigger 16×16 block. A 16×16 block contains 4 histograms, which can be concatenated to form a 36×1 vector and normalized just the way a 3×1 vector is. The window is then moved by 8 pixels, a normalized 36×1 vector is calculated over the new window, and the process is repeated.

· Calculate the HOG feature vector – the final feature vector is created by appending all the normalized 36×1 block vectors into one huge vector. How much memory do we need for it? Each block is represented by a 36×1 vector, and the blocks take 7 horizontal and 15 vertical positions:

7 × 15 = 105;
36 × 105 = 3780.

So we need a 3780-dimensional vector.
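The following sketch shows the gradient and cell-histogram steps with OpenCV. It is a simplified illustration on random data: the full HOG method splits each pixel's vote between the two nearest bins (bilinear interpolation), while this sketch uses hard assignment to a single bin.

    # Sketch of the gradient and 8x8 cell-histogram steps (simplified:
    # hard binning instead of HOG's bilinear vote-splitting between bins).
    import cv2
    import numpy as np

    # Random 64x128 patch stands in for a real preprocessed image patch.
    img = np.random.rand(128, 64).astype(np.float32)

    # Horizontal and vertical gradients via the Sobel operator, kernel size 1.
    gx = cv2.Sobel(img, cv2.CV_32F, 1, 0, ksize=1)
    gy = cv2.Sobel(img, cv2.CV_32F, 0, 1, ksize=1)

    # Magnitude and direction, as in the formulas above.
    mag, angle = cv2.cartToPolar(gx, gy, angleInDegrees=True)
    angle = angle % 180  # unsigned gradients: fold 0-360 degrees into 0-180

    # 9-bin histogram (bins at 0, 20, ..., 160 degrees) for the top-left 8x8 cell.
    cell_mag = mag[:8, :8].ravel()
    cell_ang = angle[:8, :8].ravel()
    hist = np.zeros(9, dtype=np.float32)
    for m, a in zip(cell_mag, cell_ang):
        hist[int(a // 20) % 9] += m  # each pixel votes with its gradient magnitude
    print(hist)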


LEARNING ALGORITHMS FOR CLASSIFICATION

In the section about feature extraction, we learned how to convert an image into a feature vector. In this section, we look at how a classification algorithm handles the recognition process.

Before a classification algorithm can recognize anything, it must be trained on labeled "training" data. Different learning algorithms learn differently, but the general principle is that they treat feature vectors as points in a high-dimensional space and try to find similarity among the vectors of the same class of images. They need massive amounts of data to do this; fortunately, there are massive, free-to-anyone databases containing millions of images labeled by keywords describing what is in the pictures. Once you have enough data for training your algorithm, it is time to build a machine that can learn from it (a HOG-plus-classifier sketch follows this list). There are many frameworks and open-source libraries that can help you build your recognition machine:
· Google TensorFlow – one of the better-known machine learning libraries.
· UC Berkeley's Caffe – a library created at Berkeley, popular because of its ease of customization and its large community of innovators.
· Torch – also popular, owing to its use by Facebook AI Research (FAIR), which open-sourced some of its modules in early 2015.
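As a sketch tying the pieces together: OpenCV's built-in HOGDescriptor with default parameters (64×128 window, 16×16 blocks with an 8-pixel stride, 8×8 cells, 9 bins) produces exactly the 3780-dimensional vector computed above, and a linear SVM can then be trained on such vectors. The random patches and labels below are placeholders for a real labeled data set.

    # HOG descriptor + linear SVM sketch; random placeholder data stands in
    # for a real labeled training set.
    import cv2
    import numpy as np

    hog = cv2.HOGDescriptor()  # defaults: 64x128 window, 16x16 blocks, 8x8 cells, 9 bins
    print(hog.getDescriptorSize())  # 3780, matching the calculation above

    rng = np.random.default_rng(0)
    patches = rng.integers(0, 256, size=(20, 128, 64), dtype=np.uint8)  # fake 64x128 patches
    labels = rng.integers(0, 2, size=20).astype(np.int32)               # fake 0/1 labels
    features = np.array([hog.compute(p).ravel() for p in patches], dtype=np.float32)

    # Train a linear SVM that treats each 3780-dimensional vector as a point in space.
    svm = cv2.ml.SVM_create()
    svm.setType(cv2.ml.SVM_C_SVC)
    svm.setKernel(cv2.ml.SVM_LINEAR)
    svm.train(features, cv2.ml.ROW_SAMPLE, labels)

    # Classify a new patch by computing its HOG vector and predicting its label.
    test = rng.integers(0, 256, size=(128, 64), dtype=np.uint8)
    _, prediction = svm.predict(hog.compute(test).reshape(1, -1))
    print(prediction)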
The most common way nowadays to train an algorithm for recognition is deep learning. Deep learning algorithms are based on distributed representations. The underlying assumption behind distributed representations is that the observed data are generated by the interactions of factors organized in layers. Deep learning adds the assumption that these layers of factors correspond to levels of abstraction or composition; varying the number of layers and the layer sizes provides different amounts of abstraction.

Deep learning exploits this idea of hierarchical explanatory factors, where higher-level, more abstract concepts are learned from the lower-level ones. These architectures are often constructed with a greedy layer-by-layer method. Deep learning helps to disentangle these abstractions and pick out which features are useful for learning.

For supervised learning tasks, deep learning methods obviate feature engineering by translating the data into compact intermediate representations akin to principal components, and they derive layered structures which remove redundancy in the representation.

Many deep learning algorithms are applied to unsupervised learning tasks. This is an important benefit, because unlabeled data are usually more abundant than labeled data. Examples of deep structures that can be trained in an unsupervised manner are neural history compressors and deep belief networks.

LEARNING AND VISION

When we recover the geometry of the world, or recognize objects, we are fitting models to the data provided by our eyes. These models are formed from our experience in two ways. In supervised learning, a teacher specifies class labels ("this is a tiger", for example) for images. In unsupervised learning, which is the norm in biological systems, the process has to be driven by an attempt to find good internal representations given the statistics of natural images.

Finding the best model is a compromise between fitting the data and minimizing the complexity of the model. In science the preference for simple theories over complex ones is known as Occam's razor and is often treated as a matter of aesthetics. However, for a visual organism that is constantly engaged in model construction, constructing better theories, i.e., ones with greater predictive power, is a matter of life and death!

A mathematical justification for preferring simple models or theories in accordance with the available data arises from statistical learning theory. Flexible models with many degrees of freedom adapt to stochastic fluctuations in the data, whereas overly simple models cannot represent essential aspects of the signal. Vapnik and Chervonenkis estimate how much the expected risk of a selected solution, i.e., the best solution on the available data, deviates from the optimal solution in the model class. The optimal solution with minimal expected risk most often does not minimize the cost on the available data. Uniform convergence of empirical risk to expected risk is a necessary and sufficient mathematical condition for learning.
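The Vapnik–Chervonenkis result referred to above is usually stated as a generalization bound. One standard form (added here for illustration; it does not appear in the original article) says that, for a model class of VC dimension h and n training samples, with probability at least 1 − η:

    R(f) \le R_{\mathrm{emp}}(f) + \sqrt{\frac{h\left(\ln(2n/h) + 1\right) - \ln(\eta/4)}{n}}

where R(f) is the expected risk of the selected solution f and R_emp(f) is its empirical risk on the available data. The square-root term shrinks as the amount of data n grows relative to the model complexity h, which is the sense in which simpler models generalize more reliably from limited data.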

Image analysis is inherently multiscale; segmentation, grouping, and classification have to be performed at the appropriate scales of resolution. The foliage of trees in a photograph of a forest might appear homogeneous at low resolution, but a closer look at high resolution reveals differences in leaf shapes that generate tree-specific foliage textures. Statistical learning theory relates the complexity of models to the amount of available data, i.e., to the appropriate image scale. In image segmentation these scales denote the spatial resolution of segments, the fuzziness of segment boundaries, and the number of segments, respectively. These scales have to be coupled by an underlying inference principle. The well-known stochastic optimization algorithm simulated annealing, or its deterministic variants, provides a computational temperature as a control parameter to couple these scales. These algorithms can be tuned for real-time applications and for just-in-time processing with limited resources, as they occur in vision-based surveillance and inspection.

Last but not least, there are many visual recognition challenges. The best known is ImageNet, launched by computer scientists at Stanford and Princeton in 2009 with 80,000 tagged images. It has since grown to include more than 14 million tagged images, any of which are available at any time for machine training purposes.

CONCLUSION

Nowadays, computer vision has become one of the most popular areas of machine learning. In my opinion, the main reason for this is the wide spectrum of uses for computer vision. There are many frameworks for creating your own recognition system, with which you can build an application in minutes. Applications based on computer vision technologies can solve not only trivial tasks, such as pattern recognition, but also complicated ones like scene reconstruction or motion analysis. Some companies use combinations of open data and open-source frameworks, as long as they have a team of engineers, or they may simply use hosted APIs if computer vision is not something on which they are staking their entire business.

And for companies with a wide range of very specific needs, there are custom solutions. No matter how it is approached, though, it is clear that image recognition rarely exists in isolation; it is made stronger by access to more and more pictures, real-time big data, unique applications, and speed. The businesses that make the most of these connections are the ones that will be best poised for success. The huge number of software tools (free-to-anyone databases, open-source libraries, frameworks, APIs, etc.) provided by the world's top IT companies creates really good opportunities for learning fast, as do the many worldwide computer vision competitions. In my opinion, computer vision will become commonplace within the next few years and will be one of the most widely used technologies of the future.

