SENSITIVE TEXT ICON CLASSIFICATION
FOR ANDROID APPS
by
ZHIHAO CAO
Submitted in partial fulfillment of the requirements for the degree of
Master of Science
Thesis Advisor: Dr. Xusheng Xiao
Department of Electrical Engineering and Computer Science
CASE WESTERN RESERVE UNIVERSITY
January, 2018
CASE WESTERN RESERVE UNIVERSITY SCHOOL OF GRADUATE STUDIES
We hereby approve the thesis/dissertation of
Zhihao Cao
candidate for the degree of Master of Science.
Committee Chair
Xusheng Xiao
Committee Member
Andy Podgurski
Committee Member
Ming-Chun Huang
Date of Defense
Nov. 30 2017
*We also certify that written approval has been obtained for any proprietary material contained therein.
Contents
List of Tables 3
List of Figures 4
List of Abbreviations 7
Abstract 8
1 Introduction 9
2 Background 16
2.1 Permission System in Android…………………………………………………..16
2.2 Sensitive UI Detection in Android……………………………………………….17
2.3 Pixel and Color Model…………………………………………………………...19
2.4 Optical Character Recognition…………………………………………………...21
3 Design of DroidIcon 23
3.1 Overview…………………………………………………………..……………..23
3.2 Image Mutation…………………………………………………………………..24
3.2.1 Image Scaling………………………………………………………………24
3.2.2 Color Inversion…………………………………………………………….31
3.2.3 Opacity Conversion………………………………………………………..33
3.2.4 Grayscale Conversion………..…………………………………………….37
3.2.5 Contrast Adjustment……………………………………………………….42
3.3 Text Icon Classification 48
3.3.1 Text Cleaning……………………...……………………………………….48
3.3.2 Keyword Dataset Construction…………………………………………….49
3.3.3 Classification Algorithm…………………………………………………...50
3.4 DroidIcon………..……………………………………………………………….58
4 Evaluations 60
4.1 Icon Dataset Construction………………………………………………………..60
4.2 Effectiveness of DroidIcon…..…………………………………………………..61
4.3 Case Study……………………………………………………………………….73
5 Related Work 76
6 Discussion and Conclusion 78
Bibliography 80
List of Tables
3.1 The pseudo code for image scaling……...……………………………….30
3.2 The pseudo code for color inversion……………………………………..32
3.3 The pseudo code for opacity conversion…………...……………………37
3.4 The pseudo code for grayscale conversion….…...………………………40
3.5 The pseudo code for contrast adjustment………………………………...46
3.6 Keyword Set……………………………………………………………...49
3.7 The pseudo code for text icon classification……………………………..57
3.8 The pseudo code for DroidIcon …………………………………………58
List of Figures
1.1 Motivation example of DroidIcon……………………..………………...13
2.1 Screenshots of Android Permission Requests………….………………...17
2.2 Example sensitive text label……………………………………………...18
2.3 Example of pixels in an image…………………………………………...19
2.4 RGB color model mapped to a cube……………………………………..20
3.1 Overview of DroidIcon……….………………………………………….23
3.2 Normalized sinc function………………………………………………...26
3.3 Lanczos window for a = 1, 2, 3…………………………………………..26
3.4 (a) Lanczos kernel for a = 2………………………………………………….28
3.4 (b) Lanczos kernel for a = 3………………………………………………….28
3.5 (a) Before scaling……...……………………………………………………..30
3.5 (b) After scaling……...………………………………………………………30
3.6 An example icon with bright characters and deep background………….31
3.7 The icon in Figure 3.6 after color inversion……..………………………33
3.8 User Interface containing ghost buttons………………………………....34
3.9 (a) Example ghost button……………………………………...……………..35
3.9 (b) Example ghost button without transparent ghost background...…………35
3.10 Converted icon with opacity mapped to RGB color……...……………...37
3.11(a) Image of colored bars…………………………………………………....39
3.11(b) Converted bars after using Intensity…………………………………..…39
3.11(c) Converted bars after using Luminance………………………………..…39
3.12(a) Example color icon that OCR fails to process……..…………………….40
3.12(b) The grayscale image for Figure 3.12 (a)……………………………...…41
3.12(c) Example icon after grayscale conversion and color inversion..…………41
3.13(a) Example of image with very low contrast……………………………….43
3.13(b) Example of image with very high contrast………………………………44
3.14(a) Example icon before contrast adjustment………………………….…….47
3.14(b) Example icon after contrast adjustment………...……………………….47
3.14(c) Example icon after contrast adjustment and color inversion….…………47
3.15 Example Email icon with extracted text “L\_/j Email”...………………..48
3.16 Demonstration of Levenshtein distance………………………………….52
4.1 Number of words in text icons………………………..………………….61
4.2 Recall of OCR…………………………………………………………...62
4.3 Recall of OCR + Classification ………………………………………...62
4.4 Comparison of the recalls of OCR and OCR + Classification……...….62
4.5 Recall of OCR + Classification + Image Scaling…………………...... 64
4.6 Recall of OCR + Classification + Color Inversion………………...….64
4.7 Recall of OCR + Classification + Opacity Conversion……………….64
4.8 Recall of OCR + Classification + Grayscale Conversion….………….65
4.9 Recall of OCR + Classification + Contrast Adjustment...... 65
4.10 Comparison of recalls among all the image mutation techniques……….66
4.11 Recall of DroidIcon…………………………...………………………...67
4.12 Comparison of recalls between OCR and DroidIcon……………….…..67
4.13 Recall, precision, accuracy, and F1-score achieved by DroidIcon……...69
4.14 Icons with unusual or decorative fonts…………………………………..69
4.15 Icons with unsuitable character and image size….………………………69
4.16 Scaled Icon from Figure 4.15……….……………………………………70
4.17 Email icon with image scaling and contrast adjustment.……………..….70
4.18 Icon with similar colors in text and background………………………....71
4.19 Icon in Figure 4.18 after contrast adjustment……………………………71
4.20 Comparison of DroidIcon’s performance using different similarity
thresholds……………………………………………………………...... 72
4.21 Influence of similarity threshold on effectiveness……………………….72
4.22 Case study: Text Icons for Location, Message, Email, Contacts and
Call……………………………………………………..………………...74
4.23 Case study: Text Icons for Messaging and Email
……………………………………………………………………………75
List of Abbreviations
UI User Interface
OCR Optical Character Recognition
Sensitive Text Icon Classification for Android Apps
Abstract
by
ZHIHAO CAO
As smartphones play an increasingly important role in people’s daily lives, users' privacy and security have become a serious concern. Previous research efforts in improving mobile app security mainly focused on the predefined sources of sensitive information managed by smartphone platforms. To the best of our knowledge, text icons, a type of user interface element that may indicate uses of the users' sensitive information, have been largely neglected. In this thesis, we propose an approach to automatically identify text icons in the UIs of smartphone apps and classify them into predefined categories of sensitive information. In particular, we develop an algorithm, DroidIcon, based on OCR (Optical Character Recognition) to determine whether the texts contained in text icons indicate uses of sensitive information. To evaluate the effectiveness of DroidIcon, we apply the algorithm to 707 text icons collected from 2000 popular Android apps. The algorithm achieves an accuracy of 90.52%, a precision of 91.28%, and a recall of 88.25% in classifying text icons into the predefined categories of sensitive information.
Chapter 1
Introduction
With the rapid development of mobile phones, smartphones have become increasingly popular and play an important role in people’s daily lives. Today, millions of mobile applications (i.e., apps) are available in app stores. These apps enable smartphones to address various kinds of user needs. In order to provide better services, apps use more and more of users’ sensitive information to customize their functionality. However, certain apps may have behaviors that are less than desirable or even harmful. For example, some apps obtain users’ personal data such as GPS coordinates, contact lists, and e-mail addresses without consent from the users, and advertisers exploit such data as a marketing channel to bundle pushy ads with apps [27].
To protect users’ sensitive information in smartphones, a lot of research effort has been spent on constraining the uses of private user data through a data-access control mechanism. That is, in order to access users’ sensitive information, apps need to request the corresponding permissions from the users. For example, to access the users’ contact list, apps need to request the READ_CONTACTS permission. However, this kind of protection mechanism has shown limited success [28], since many apps have legitimate reasons to request users’ permissions to use their private data, and it is difficult to distinguish such legitimate behaviors from undesired behaviors. For example, apps recommending restaurants use users’ GPS data to suggest restaurants near the users, and apps providing travel planning services let users make phone calls or send messages.
To detect undesired behavior in mobile apps, we are motivated by the vision: can analysis of an app’s program behavior be contrasted with the intents of the app to determine whether the app will perform within the user’s expectations? In other words, we aim to automatically check the compatibility between the intents expressed by an app and its behind-the-scenes behaviors. For example, if an app’s user interface (UI) has no texts or images to indicate that it will access users’ GPS data (i.e., no intents for GPS data), but the app sends out users’ GPS data when a button is pressed, then red flags should be raised.
Other useful scenarios include reading users’ contacts, sending SMS messages, and taking pictures.
Apps’ UIs contain various types of semantic information that express the intents of the apps. For example, a button with the text “Location” in the UI indicates that the app will access the user’s sensitive location data once the user clicks the button. Therefore, understanding these types of semantic information provides us with an important mechanism for automatically detecting apps’ intentions in using users’ sensitive data, which is the first step towards automatically checking the compatibility between apps’ behaviors and their intentions.
Existing research works [1][2] focus on detecting sensitive information by analyzing the texts in UIs, such as text labels and input fields. However, another important type of UI element, the icon, which also contains rich semantic information, has not been explored yet. Icons are an important component of UIs and have been widely used in mobile apps. As mentioned in [26], designers replace text labels with icons because icons make the UI more stylish, save screen space, and are fast to recognize at a glance.
Among the icons used in apps’ UIs, text icons, which refer to icons embedded with texts, are widely used to show the apps’ intentions in using users’ private information.
Unfortunately, existing works [1][2][29][30] focus on analyzing the textual artifacts of
Android apps, and have limited capability in analyzing the texts in icons to understand their semantic information. The reason is that these texts are represented using pixels in digital images, rather than texts that can be extracted directly from UI layout files [1][2]. Although these works may analyze the file names of icons to infer semantic information based on the keywords in the file names, many apps adopt file names such as “icon1.png” or “1.png”, which do not provide much semantic information and render these works ineffective.
To address the important problem of understanding the semantic information of icons in UIs, this thesis proposes an approach, Sensitive Text Icon Classification
(DroidIcon), that classifies text icons into one of the pre-defined semantic categories. More specifically, DroidIcon adapts Optical Character Recognition (OCR) techniques to extract characters from the icons, computes the similarity between the words formed by the extracted characters and the keywords in each semantic category, and classifies the icons into the semantic category with the highest similarity.
In particular, since icons in Android apps are usually small, diversified, and partially or totally transparent, OCR techniques face challenges in recognizing the characters with high precision. In fact, directly applying existing OCR techniques can only infer semantic information from less than 10% of the studied icons. To address this challenge, we propose DroidIcon, which explores the possibility of applying various image mutation techniques to convert the icons into OCR-friendly images. Our algorithm significantly improves the precision of character recognition, and thus the overall effectiveness of our approach as well.
To determine whether the texts in text icons indicate the uses of sensitive information, we define 9 categories of semantic information based on the frequently used sensitive information in mobile apps. Among the 9 categories, 7 of them indicate different types of sensitive information: Camera, Contacts, Location, Email, Phone, Photo, and SMS, and the remaining two do not: Non-sensitive text and Non-text. The Non-sensitive text category means that an icon contains text but the text does not indicate uses of the users’ sensitive information, and the Non-text category refers to icons with no embedded text. Based on the 9 predefined categories, given a text icon, our work aims to determine which category it belongs to. The semantic information provided by these 9 categories can be used by various types of privacy analysis, such as checking whether the semantic information represented by a category is compatible with the permissions requested by apps.
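For illustration only, the 9 categories can be written down as a simple enumeration. The sketch below is in Java (the language of DroidIcon's OCR component via the Asprise Java SDK) and only lists the category names; it is not the exact data structure used by DroidIcon.

```java
// Hypothetical sketch of the 9 predefined semantic categories.
enum IconCategory {
    CAMERA, CONTACTS, LOCATION, EMAIL, PHONE, PHOTO, SMS,  // 7 sensitive categories
    NON_SENSITIVE_TEXT,  // contains text that does not indicate sensitive information
    NON_TEXT             // contains no embedded text
}
```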
Based on our empirical study of icons from 2000 apps downloaded from Google
Play, most of the icons contain less than 3 words (Section 4.1). Thus, to classify a text icon into a semantic category, we adopt a keyword-based approach that compares the words formed by the characters extracted from the icon to the keywords used in each of the 7 sensitive semantic categories. If a match is found, then the icon is classified into the corresponding semantic category; otherwise, the icon is classified into the Non-sensitive text category. Icons without texts are classified into the Non-text category.
However, even though the precision of character recognition can be improved via iterative image mutations, it is still very difficult, if not impossible, to perfectly recognize every character from text icons, since texts could be presented using custom fonts and styles. Thus, in many cases, OCR may extract only part of the embedded text from the text icons and obtain a set of incomplete words. To address this challenge, we propose an edit-distance based algorithm to find the most similar keyword by computing the similarity between the extracted words and the keywords in each semantic category. If the similarity between an extracted word and a keyword is higher than a threshold, we consider it a match and the icon is classified into the corresponding semantic category.
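The Java sketch below illustrates the idea of edit-distance based matching. The keyword sets, the lower-casing, and the 0.7 threshold are illustrative assumptions for this sketch only; the actual classification algorithm is presented in Section 3.3.3.

```java
import java.util.List;
import java.util.Map;

// A minimal sketch of edit-distance based keyword matching (not DroidIcon's exact code).
public class KeywordMatcher {

    // Classic dynamic-programming Levenshtein (edit) distance.
    static int editDistance(String a, String b) {
        int[][] d = new int[a.length() + 1][b.length() + 1];
        for (int i = 0; i <= a.length(); i++) d[i][0] = i;
        for (int j = 0; j <= b.length(); j++) d[0][j] = j;
        for (int i = 1; i <= a.length(); i++) {
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                d[i][j] = Math.min(Math.min(d[i - 1][j] + 1, d[i][j - 1] + 1),
                                   d[i - 1][j - 1] + cost);
            }
        }
        return d[a.length()][b.length()];
    }

    // Normalized similarity in [0, 1]; 1 means identical strings.
    static double similarity(String a, String b) {
        int maxLen = Math.max(a.length(), b.length());
        return maxLen == 0 ? 1.0 : 1.0 - (double) editDistance(a, b) / maxLen;
    }

    // Returns the best-matching category, or null if no keyword reaches the threshold.
    static String classify(List<String> extractedWords,
                           Map<String, List<String>> keywordsByCategory,
                           double threshold) {
        String best = null;
        double bestScore = threshold;
        for (String word : extractedWords) {
            for (Map.Entry<String, List<String>> e : keywordsByCategory.entrySet()) {
                for (String keyword : e.getValue()) {
                    double s = similarity(word.toLowerCase(), keyword.toLowerCase());
                    if (s >= bestScore) { bestScore = s; best = e.getKey(); }
                }
            }
        }
        return best;
    }

    public static void main(String[] args) {
        Map<String, List<String>> keywords = Map.of(
                "Email", List.of("email", "mail"),
                "Location", List.of("location", "map", "gps"));
        // "emai1" simulates an OCR error; its similarity to "email" is 0.8, above the threshold.
        System.out.println(classify(List.of("emai1"), keywords, 0.7));  // prints Email
    }
}
```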
Figure 1.1 Motivation example of DroidIcon
To better illustrate the motivation of DroidIcon, we show a real Android app, named
MyCityWay, whose UI is shown in Figure 1.1. This app provides information about local places to users. As we can see, there exist five icons (marked in red) in this UI. Among these icons, three of them contain texts that indicate uses of sensitive information: “Call”, “Direction”, and “Map”. The “Call” icon indicates that it will access the user’s phone call information. The “Direction” and “Map” icons indicate that they will use the user’s location information. The developer will get access to the sensitive data when a user clicks the icons. This may lead to the risk of exploiting the user’s sensitive information if the app abuses the user’s phone call information or accesses other types of sensitive information contrary to what the user expects. Therefore, if we can detect the apps’ intentions in using the user’s sensitive information and classify them into the correct sensitive categories, we can apply appropriate behavioral analysis to check whether the corresponding behaviors of the program are within the user’s expectations. As shown in Figure 1.1, given an icon, our algorithm classifies it into one of the 9 predefined semantic categories. The “Call” icon should be classified into the Phone category. The “Direction” and “Map” icons should be classified into the Location category. The “Home” icon should be classified into the Non-sensitive text category. The remaining icon should be classified into the Non-text category. The classification result will be used for further behavioral analysis.
To evaluate the effectiveness of DroidIcon, we apply DroidIcon to 707 text icons extracted from 2000 apps downloaded from Google Play. Among the 707 text icons, 332 positive icons contain sensitive texts and the other 375 negative icons either contain non-sensitive texts or do not embed any text. We compare DroidIcon with OCR in recognizing texts and classifying text icons, and the results show that DroidIcon correctly classifies 90.1% of the 332 positive icons, while the approach based on OCR alone correctly classifies less than 10% of them. We also measure the effectiveness of the different image mutation techniques adopted in DroidIcon and show the improvement brought by each technique. Based on the results, we show that DroidIcon, which iteratively applies the different image mutation techniques, achieves the best results.
The rest of the thesis is organized as follows. In chapter 2, we present related work on sensitive UI detection and the background knowledge of our work. In chapter 3, we present an overview of our algorithm and introduce each of its components in detail. In chapter 4, we conduct experiments to evaluate the effectiveness of our algorithm and present a case study. In chapter 5, we discuss additional related work, and in chapter 6, we conclude our work.
Chapter 2
Background
In this chapter, we first introduce the background of the Android permission system and related work on sensitive UI detection in Android. Then we provide background knowledge about pixels, color models, and Optical Character Recognition (OCR) that will be used in later chapters.
2.1 Permission System in Android
Android has become a very popular platform for third-party applications because of its unrestricted application market and open-source nature. It supports third-party development with an extension API that includes access to phone hardware, settings, and user data [31].
Access to privacy- and security-relevant parts of Android’s rich API is controlled by an install-time application permission system: each application must declare what permissions it needs and notify users during installation (Figure 2.1(a)). By default, applications can only access their own files. Therefore, in order to access system resources such as text messages, the contact list, and private images, third-party apps have to obtain the access permissions from the user. For example, to access the list of contacts, the permission READ_CONTACTS must be requested and granted. Recent improvements of Android’s permission system support runtime permission requests (Figure 2.1(b)), which pop up a dialog to request a permission the first time an app uses users’ protected information.
Figure 2.1 Screenshots of Android Permission Requests
However, it is difficult for users to make decisions about whether to grant permissions to an app. The reason is that it is very difficult to distinguish the legitimate behaviors of benign apps from the undesired behaviors of malicious apps, since many benign apps request the same permissions as malicious apps. Therefore, if we can understand the intents expressed by an app and check whether the corresponding behaviors behind the screen are compatible with those intents, we can leverage such mismatches to detect undesired behaviors.
2.2 Sensitive UI Detection in Android
In Android apps, UIs communicate the intents of an app through texts and images, and thus contain lots of semantic information that may indicate uses of users’ sensitive information. There already exist some research efforts that focus on detecting the use of users’ sensitive information based on the semantic information in the UI. UIPicker [1] detects sensitive user information by applying supervised learning to the semantic information extracted from the program code of UI elements. Besides the features extracted from the texts and layout descriptions, UIPicker also considers the texts of the sibling elements in the layout file, which could include unrelated texts as features.
SUPOR [2] leverages the semantic information from the text labels that are physically close to input fields on the screen. Generally, text labels are used in UIs as descriptions that guide users in entering input. Therefore, understanding the semantics of text labels can help us determine whether the corresponding input fields need access to users’ sensitive information. Then we can analyze the corresponding program behind the screen to check whether the app behaves as expected or maliciously.
Figure 2.2 Example sensitive text label [2]
Figure 2.2 shows an example UI that requires users to input their sensitive information: User ID and Password. There are two text labels, “User ID” and “Password”, in the UI that guide the user to enter the information in the input fields. SUPOR first parses the layout files and finds the text labels together with the input fields. Then it compares the texts in the text labels against a predefined keyword dataset to determine the sensitiveness of the input fields. Finally, the result is sent to the privacy analysis part for behavioral analysis.
Both UIPicker and SUPOR have studied the possibility of detecting uses of users’ sensitive information from the semantic information in texts. However, images, especially icons, another important type of UI element, have been largely neglected by these approaches. Thus, UIs with sensitive text icons but without sensitive texts may be classified by these approaches as non-sensitive, causing many false negatives in privacy analysis. To address this important problem, we propose DroidIcon to detect sensitive text icons in the UIs of
Android apps.
2.3 Pixel and Color Model
In order to sense, represent, and display images in electronic systems, researchers proposed pixels, which are the smallest elements of a digital image. A digital image is a rectangular grid of pixels with a fixed number of rows and columns. Figure 2.3 shows an example of pixels [4]: an image with a portion enlarged, in which the individual pixels are rendered as small squares and can easily be seen.
Figure 2.3 Example of pixels in an image [4]
In an image, a pixel represents a single color dot. All the pixels arranged in a rectangular grid form a colorful image. To represent the full range of colors, researchers proposed color models, abstract mathematical models that represent colors as tuples of numeric values. The most commonly used color model is RGB, which is widely used in various digital image formats such as JPEG and PNG. RGB is an additive color model, which means that a color is created by mixing a number of different primary colors. RGB refers to the three primary colors “Red”, “Green”, and “Blue”. We can create millions of colors based on these three colors. In this thesis, all the icon images are in either JPEG or PNG format. Therefore, all of them adopt the RGB model.
A color in the RGB color model is expressed as an RGB triplet (r, g, b), where “r”,
“g”, and “b” are the numeric values that describe how much red, green, and blue are included in the color, respectively. Each value for a primary color can vary from zero to a defined maximum value. If all the values are zero, the resulting color is black; if all the values are maximum, the resulting color is the brightest color, i.e., white. Therefore, the geometric representation of RGB color model is a cube, where each color is a point within the cube, on its face, or along its edges.
Figure 2.4 RGB color model mapped to a cube [5]
Figure 2.4 shows a cube that the RGB color model is mapped to. The horizontal x-axis represents the values for the red color, the y-axis represents the blue color, and the z-axis represents the green color. The origin, representing the black color, is the vertex hidden from view.
The value of a primary color can be quantified in different ways. In computers, each primary color is often represented as an integer ranging from 0 to 255, so that it can be stored in a single 8-bit byte. For example, the RGB triplet of black is (0, 0, 0), red is (255, 0, 0), and white is (255, 255, 255). In this thesis, we utilize this representation to help us manipulate the colors of icons.
RGBA is a color space based on RGB that provides an extra alpha channel. The alpha channel is normally used to represent the degree of opacity of the color. Its value can also be represented using an integer between 0 and 255. If a pixel has 0 in its alpha channel, it is fully transparent (invisible); if it has 255, the pixel has a fully opaque color, which is the same as in traditional RGB.
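As an illustration of this representation, the Java snippet below reads one pixel from an icon and unpacks the four 8-bit channels with bit operations. The file name is hypothetical, and the snippet is not part of DroidIcon itself.

```java
import java.awt.image.BufferedImage;
import java.io.File;
import javax.imageio.ImageIO;

public class PixelDemo {
    public static void main(String[] args) throws Exception {
        BufferedImage img = ImageIO.read(new File("icon.png")); // hypothetical icon path
        int argb = img.getRGB(0, 0);          // packed 32-bit ARGB value of the top-left pixel
        int alpha = (argb >>> 24) & 0xFF;     // opacity: 0 (transparent) .. 255 (opaque)
        int red   = (argb >>> 16) & 0xFF;     // each channel fits in one 8-bit byte, 0 .. 255
        int green = (argb >>> 8)  & 0xFF;
        int blue  =  argb         & 0xFF;
        System.out.printf("a=%d r=%d g=%d b=%d%n", alpha, red, green, blue);
    }
}
```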
In our studies, we find that many PNG images describe their contents via color opacity instead of using different colors. Thus, we need to transform the opacity differences to color differences to make the image content distinguishable by OCR.
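A rough sketch of this idea is shown below: it maps each pixel's alpha value to a gray level so that content drawn purely with transparency differences becomes visible. This is only an illustration of the idea; the concrete opacity conversion used by DroidIcon is described in Section 3.2.3.

```java
import java.awt.image.BufferedImage;

public class OpacityToColor {
    // Maps alpha differences to gray-level differences (illustrative sketch only).
    static BufferedImage alphaToGray(BufferedImage src) {
        BufferedImage out = new BufferedImage(src.getWidth(), src.getHeight(),
                                              BufferedImage.TYPE_INT_RGB);
        for (int y = 0; y < src.getHeight(); y++) {
            for (int x = 0; x < src.getWidth(); x++) {
                int alpha = (src.getRGB(x, y) >>> 24) & 0xFF;
                int gray = 255 - alpha;   // opaque pixels become dark, transparent ones white
                out.setRGB(x, y, (gray << 16) | (gray << 8) | gray);
            }
        }
        return out;
    }
}
```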
2.4 Optical Character Recognition
Optical Character Recognition (OCR) is an important research field of computer vision. The task of OCR is to identify characters in images of printed or handwritten texts and convert them into machine-encoded text (e.g., ASCII), so that the texts can be recognized and edited by computer programs.
Currently, many types of OCR libraries are publicly available [7], such as Tesseract
OCR, FreeOCR, Asprise OCR, etc. We build DroidIcon upon Asprise OCR, which provides high-performance open-source APIs for common OCR tasks. It has a very high detection accuracy, as mentioned in [8]: “By running a sample of 200 image e-mails, we determined that Asprise OCR was performing with an accuracy of 95%. It had the best detection rate among the approaches we analyzed”. It supports various kinds of image formats such as JPEG, PNG, etc. It also provides different SDKs to support multiple programming languages and can recognize texts in more than 20 languages, such as English, Spanish, and French. In this thesis, we use its Java SDK and focus only on English texts.
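For reference, the snippet below shows how text is typically extracted with the Asprise OCR Java SDK. It follows Asprise's published sample usage, so the exact class and constant names may differ slightly from the SDK version used in this thesis, and the icon path is hypothetical.

```java
import com.asprise.ocr.Ocr;
import java.io.File;

public class OcrDemo {
    public static void main(String[] args) {
        Ocr.setUp();                                   // one-time library setup
        Ocr ocr = new Ocr();
        ocr.startEngine("eng", Ocr.SPEED_FASTEST);     // English only, as in this thesis
        String text = ocr.recognize(
                new File[] { new File("icon.png") },   // hypothetical icon path
                Ocr.RECOGNIZE_TYPE_TEXT,
                Ocr.OUTPUT_FORMAT_PLAINTEXT);
        ocr.stopEngine();
        System.out.println("Extracted text: " + text);
    }
}
```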
Although Asprise OCR is introduced as a high-performance OCR engine, it does not perform well on Android icons for several reasons. First of all, due to the size limitation of smartphone screens, icon sizes are usually small. For example, the smallest size in our collected icon dataset is 48 x 48. This leads to low resolutions of the texts, which in turn affects the OCR accuracy. Second, the OCR engine works much better on icons with deep-color characters on a bright background than on icons with bright characters on a deep-color background. However, due to the diversified icon styles in Android apps, there are many icons that have bright characters on a deep-color background, posing challenges for OCR. Third, in order to provide a better user experience, there exist many icons with low contrast and ghost buttons (icons designed via opacity differences) in the UI. It is also difficult for the OCR engine to correctly recognize the embedded texts in these icons. Therefore, we propose to use different image mutations to convert all these kinds of images into OCR-friendly icons.
Chapter 3
Design of DroidIcon
In this chapter, we first present an overview of DroidIcon and then explain each of its components in detail.
3.1 Overview
Figure 3.1 Overview of DroidIcon
Figure 3.1 is the overview of DroidIcon. It consists of three major components:
Image Mutation, Optical Character Recognition, and Text Icon Classification. DroidIcon takes an APK icon as input and outputs the classification of the icon.
The image mutation component accepts an APK icon image and applies different mutations to the icon iteratively. The mutated icons are used as the input to the optical character recognition component. The character recognition component detects and extracts the texts embedded in the icon. The extracted texts are sent to the text icon classification component. The text icon classification component determines the semantic category of the icon by checking the extracted texts against the keywords in each semantic category. We predefine 9 semantic categories, 7 of which indicate the uses of sensitive information, while the remaining two categories (Non-sensitive text and Non-text) indicate that there are no sensitive texts in the icon.
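A condensed sketch of this pipeline is given below. The mutation, OCR, and keyword-matching helpers are placeholders with illustrative names; the real components are described in the remainder of this chapter.

```java
import java.awt.image.BufferedImage;
import java.util.List;

// Illustrative sketch of the DroidIcon pipeline, not the actual implementation.
public class DroidIconPipeline {

    interface Mutation { BufferedImage apply(BufferedImage img); }

    static String classifyIcon(BufferedImage icon, List<Mutation> mutations) {
        boolean sawText = false;
        for (Mutation m : mutations) {               // iteratively try each image mutation
            BufferedImage mutated = m.apply(icon);   // e.g., scaling, color inversion, ...
            String text = runOcr(mutated);           // extract embedded text with the OCR engine
            if (text != null && !text.isEmpty()) sawText = true;
            String category = matchKeywords(text);   // compare against per-category keyword sets
            if (category != null) return category;   // stop at the first sensitive match
        }
        return sawText ? "Non-sensitive text" : "Non-text";
    }

    // Placeholders for the components described in Sections 3.2 and 3.3.
    static String runOcr(BufferedImage img) { return ""; }
    static String matchKeywords(String text) { return null; }
}
```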
3.2 Image Mutation
To address the challenges faced by OCR, we leverage different image mutation techniques to convert icons into OCR-friendly images. As shown in Figure 3.1, we use five techniques: Image Scaling, Color Inversion, Opacity Conversion, Grayscale
Conversion, and Contrast Adjustment.
3.2.1 Image Scaling
Resolution (pixels per inch) [16] is an important factor that controls image quality, and thus it directly affects the accuracy of OCR. Lower resolutions typically produce images where the pixels of a character are condensed in a small region, compromising the accuracy of character recognition. On the other hand, higher resolutions produce larger images where the pixels of a character are spread over different areas, also affecting the accuracy of character recognition. Therefore, it is important to scale the image size so that it is neither too big nor too small. We crawl icons from apps downloaded from Google Play, and these icons have variable resolutions. In our dataset, small icons have a size of only 48x48 and large icons have a size of 300x300. Based on our empirical observations in applying OCR to these icons, the OCR engine performs better when the image size is around 100x100.
Thus, we adopt 100 as the standard dimension for image scaling.
Enlarging or shrinking images can be interpreted as a form of resampling or image reconstruction. Currently, many image scaling algorithms have been proposed.
Theoretically, sinc cardinal resampling [10] provides the best performance. However, the assumptions behind sinc resampling are not completely met in real-world digital images.
Therefore, we implement Lanczos resampling, an approximation to the sinc cardinal method, as our image scaling algorithm, since in practice it yields better results than sinc cardinal resampling.
Lanczos resampling is an interpolation algorithm. Interpolation is a method of constructing new data points within the range of a discrete set of known data points. Given a set of input samples, the effect of each input sample on the interpolated values is defined by the reconstruction kernel L(x), called the Lanczos kernel. The kernel is composed of two parts: the normalized sinc function sinc(x), windowed (multiplied) by the Lanczos window function sinc(x/a) for −a ≤ x ≤ a. It is defined as [11]:
\[ L(x) = \begin{cases} \operatorname{sinc}(x)\,\operatorname{sinc}(x/a) & \text{if } -a \le x \le a \\ 0 & \text{otherwise} \end{cases} \tag{3.1} \]
where a is the size of the kernel, the normalized sinc function is defined as:
\[ \operatorname{sinc}(x) = \frac{\sin(\pi x)}{\pi x} \tag{3.2} \]
and the window function is defined as:
\[ \operatorname{sinc}(x/a) = \frac{a\,\sin(\pi x / a)}{\pi x} \tag{3.3} \]
Figure 3.2 is the plot of the normalized sinc function. We can see that the function is symmetric about x = 0. A lobe has a smaller absolute peak value when it is farther from the y-axis. The plot tells us that the farther a sample is, the lower its effect is.
Figure 3.2 Normalized sinc function
Figure 3.3 is the plot of the Lanczos window for a = 1, 2, 3. When x is outside the range [−a, a], the values are set to zero, limiting the Lanczos kernel within [−a, a].
Figure 3.3 Lanczos window for a = 1, 2, 3
Based on (3.2) and (3.3), (3.1) can be written as [11]:
\[ L(x) = \begin{cases} \dfrac{a\,\sin(\pi x)\,\sin(\pi x / a)}{\pi^2 x^2} & \text{if } -a \le x \le a,\ x \ne 0 \\ 1 & \text{if } x = 0 \\ 0 & \text{otherwise} \end{cases} \tag{3.4} \]
Based on (3.4), given one-dimensional input samples s_i, we can define the effect of all the input samples on the interpolated value as S(x) for an arbitrary real argument x. It can be represented as the discrete convolution of these input samples with the Lanczos kernel [11]:
\[ S(x) = \sum_{i=\lfloor x \rfloor - a + 1}^{\lfloor x \rfloor + a} s_i \, L(x - i) \tag{3.5} \]
where \(\lfloor x \rfloor\) is the floor function. Based on the one-dimensional Lanczos kernel, we extend it to a two-dimensional kernel [11]:
\[ L(x, y) = L(x)\,L(y) \tag{3.6} \]
where x and y each represent one dimension. Then we can conduct the two-dimensional discrete convolution S(x, y) based on (3.5) [11]:
\[ S(x, y) = \sum_{i=\lfloor x \rfloor - a + 1}^{\lfloor x \rfloor + a} \ \sum_{j=\lfloor y \rfloor - a + 1}^{\lfloor y \rfloor + a} s_{ij} \, L(x - i)\,L(y - j) \tag{3.7} \]
An image represented in the RGB model can be interpreted as a two-dimensional grid, where each pixel is a point in this grid and the coordinates of the pixel correspond to the i (row) and j (column) values in the grid. In this way, we can apply the two-dimensional Lanczos kernel to scale the image to a given size.
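The following Java sketch shows a direct, single-channel implementation of equations (3.4)–(3.7); out-of-range samples are simply skipped, and the per-channel loop over R, G, B, and A is omitted. It is only an illustration; the pseudo code actually used for image scaling is given in Table 3.1.

```java
// Minimal sketch of Lanczos interpolation for one channel (illustrative only).
public class Lanczos {

    // Normalized sinc function of equation (3.2).
    static double sinc(double x) {
        if (x == 0) return 1.0;
        double px = Math.PI * x;
        return Math.sin(px) / px;
    }

    // Lanczos kernel L(x) with window size a, as in equations (3.1) and (3.4).
    static double kernel(double x, int a) {
        if (x <= -a || x >= a) return 0.0;
        return sinc(x) * sinc(x / a);
    }

    // Discrete 2-D convolution S(x, y) of equation (3.7) over samples s[row][col].
    static double interpolate(double[][] s, double x, double y, int a) {
        double sum = 0.0;
        int x0 = (int) Math.floor(x), y0 = (int) Math.floor(y);
        for (int i = x0 - a + 1; i <= x0 + a; i++) {
            for (int j = y0 - a + 1; j <= y0 + a; j++) {
                if (i < 0 || j < 0 || i >= s.length || j >= s[0].length) continue; // skip out-of-range samples
                sum += s[i][j] * kernel(x - i, a) * kernel(y - j, a);
            }
        }
        return sum;
    }
}
```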
As claimed by Turkowski and Gabriel [12], the Lanczos filter (with a = 2) achieves
"the best compromise in terms of reduction of aliasing, sharpness, and minimal ringing".
Therefore, we use Lanczos kernel with a = 2 for scaling down icons.
According to Jim Blinn [13], the Lanczos kernel (with a = 3) "keeps low frequencies and rejects high frequencies better than any (achievable) filter we've seen so far”. Therefore, we use Lanczos kernel with a = 3 for scaling up icons. Figure 3.4(a) and
3.4(b) show the Lanczos kernel when a = 2 and a = 3.
Figure 3.4 (a) Lanczos kernel for a = 2 Figure 3.4 (b) Lanczos kernel for a = 3
To scale an image with width W (rows) and height H (columns), the image is represented as a discrete function f(i, j):
\[ f(i, j) = (x_i, y_j) \tag{3.8} \]
where i ∈ [0, W−1] and j ∈ [0, H−1]. Each pair (x_i, y_j) represents the coordinates of a pixel in the image. The pair (0, 0) represents the coordinates of the upper-left pixel and the pair (W−1, H−1) represents the coordinates of the lower-right pixel in the image. For each pixel, we define each channel of its color as R(x_i, y_j) for red, B(x_i, y_j) for blue, G(x_i, y_j) for green, and A(x_i, y_j) for the transparency channel if the image has one.
Accordingly, the output image with width W′ and height H′ is defined as:
\[ g(p, q) = (x_p, y_q) \tag{3.9} \]
where p ∈ [0, W′−1] and q ∈ [0, H′−1]. To obtain the pixels in the output image, we use the Lanczos kernel to compute each channel value of a pixel based on the corresponding channel of a set of pixels from the input image.
For example, if we want to compute the red channel value R(x_p, y_q), we first compute S(x_p, y_q):