A Smart Targeting System for Online Advertising
Total Page:16
File Type:pdf, Size:1020Kb
778 JOURNAL OF COMPUTERS, VOL. 4, NO. 8, AUGUST 2009 A Smart Targeting System for Online Advertising Weihui Dai School of Management, Fudan University, Shanghai 200433, P.R.China Email: [email protected] Xingyun Dai and Tao Sun School of Management, Fudan University, Shanghai 200433, P.R.China Email: {xingyundai, stephensun2004}@163.com Abstract—With the rapid increase of online advertisements solved by traditional advertising, such as “I know I have in marketing, as well as the user's attention resources are wasted half of the money in advertising, but I don’t know becoming scarce, targeting technique is applied to improve which half”. Therefore, if we send advertisements to the the efficiency and effectiveness of online advertising by right audience through online advertisements targeting delivering the right advertisement to the right audience, at technology, it can improve the efficiency of the online the right time and situation, and with the right method. The attention of current targeting is transformed from content advertisements platform, make visitors become buyers, targeting, frequency targeting, time targeting and save the cost of advertisers and improve customer’s geographical targeting to the extended targeting based on loyalty. It is greatly valuable for today’s companies, who user’s characteristics, with data mining technique and is under the short-life and highly complicated and behavior modeling technique on the analysis of users. The competitive business environments [2] [3]. To reach the protection of privacy has been an argued issue along with goal, the system needs thoroughly understand user’s the application of above technique. current interests, establish proper system architecture, and This paper presents a smart targeting system with high protect the user’s privacy at the same time. efficiency, effectiveness and protection of privacy. It uses Some scholars carry out a large number of quantitative Web content mining and Web usage mining techniques to track and mine the user behaviors hiding in the historical and modeling researches on online advertising targeting and current user sessions, and designs interfaces for system. Aggarwal C.C. (1998)pointed out that most of the maintaining the advertising rules, which can control the online advertising system used user-based targeting targeting system to automatically deliver the personalized method for the banner advertisements, and introduced a advertisements. All the related knowledge and rules are number of statistics, optimization and scheduling model integrated by one unified vector space model. The process of [4]. Mobasher B. (2001) proposed the use of associated targeting doesn’t require any sensitive data input by users, mining rules based on user behaviors to provide an so their privacy is protected. effective and extendable customized webpage technology [5]. Milani A. (2004, 2005) presented the use of fuzzy Index Terms—online advertising, web advertising, personalization, behavioral targeting, web mining similarity algorithm [6] [7] and gave a general adaptive online service architecture based on evolutionary genetic algorithm [8]. Scholars in the research of personalized I. INTRODUCTION network services argued that it is necessary to have targeting system from content targeting, frequency The emergence and development of online targeting, time targeting and geographical targeting to the advertisements are subjected to three direct influences, extended targeting based on user’s characteristics [9]-[12]. such as the quantity of websites, the quantity of online Recently, scholars have expressed their concern about customer services and the promotion of broadband. At user’s privacy issues in personalized recommendation present, these three conditions have become mature with system [13]-[16]. Kazienko P.and Adamski M.(2004) the rapid application of Internet. According to introduced AD ROSA system for integrating user iResearch’s prediction, for example, China's online behavior and content mining technologies to reduce the advertising market will have 30% compound growth in user's input and protect the privacy[16]. the next few years, and reach 21.6 billion in 2009[1]. Therefore, our paper is aimed to address the further With the growing proportion of online advertisements in research on ST (Smart Targeting) system to meet the marketing, as well as the user's attention resources are current development trend of online advertisements, becoming scarce, people are eager to rely on the displaying the advertisements to the right audience and advanced technology to solve the problems that can’t be protecting their privacies efficiently and effectively. It uses Web content mining and Web usage mining Manuscript received Feb 20, 2009; revised April 30, 2009. This techniques to track and mine the user behaviors hiding in research was supported by National High-tech R & D Program (863 the historical and current user sessions, and designs Program) of China (No.2008AA04Z127), National Natural Science Foundation of China (No.70401010),and Shanghai Leading Academic Discipline Project (No.B210). © 2009 ACADEMY PUBLISHER JOURNAL OF COMPUTERS, VOL. 4, NO. 8, AUGUST 2009 779 interfaces for maintaining the advertising rules, which After user’s first request for a webpage, start a new can control the targeting system to automatically deliver session and then the system will collect every step of the the personalized advertisements. All the related user’s assess behavior including URL, time and behavior knowledge and rules are integrated by one unified vector of clicks. We can know the content that he/she interested space model. The process of targeting doesn’t require any in and the access mode he/she prefer. These factors can sensitive data input by users, so their privacy is protected. provide the parameters when we want set the right advertisements to the user. II. TARGETING SYSTEM FOR ONLINE Fig.2 shows the operation of ST system. We have ADVERTISING following steps to reach the goal: 1) Data Collection A. Concept and Operation Through the use of Web data mining technology to In order to realize targeting for advertising, the smart extract webpage content and the history of the user targeting (ST) system we presented here must have some session data, as well as track and dig the current user’s basic functions: webpage and the user related data behaviors, we can get the user’s interest and preferences. collection, web content modeling and model updating, 2) Pattern Recognition user modeling and model updating, and targeting [25]. We can get the typical behavior of a group of users by We can see the concept of ST system in Fig.1. clustering the keywords and user session after keyword extraction. 3) User Modeling For each group of users who have similar access behaviors, we establish their short-run and long-run user model. 4) User Matching For each user, we match him/her to the proper user model based on the user's current interest or long-run interest. 5) Targeting We set different advertisement options for the users by different user model with appropriate advertisement rules Fig.1 Concept of ST System to achieve the purposes of targeting. Fig.2 Operation of ST System © 2009 ACADEMY PUBLISHER 780 JOURNAL OF COMPUTERS, VOL. 4, NO. 8, AUGUST 2009 importance coefficients in title, description and the whole B. Data Collection t j ST system uses the web log format that is compatible keywords. n is the quantity of webpage that include with W3C [17]. The format is as follows: the keyword. A method named HACM can be used to find the Jaccard coefficient which can help us to get the group. Jaccard coefficient tp tp L w ∗w ∑k =1 ik jk sim(t p,t p) = i j tp tp tp tp ST system uses sid (SessionID) to record and L (w )2 + L (w )2 − L w ∗w ∑k =1 ik ∑k =1 jk ∑k =1 ik jk recognize the user’s session [18] , uid to record and recognize user’s unique ID. Referer is the coming path of i, j∈[1,L]. current page. We can know the time the user spended at The whole keywords become K groups and the mean different pages with referrer and uri. The other 1 keyword web vector ctp = ∑ n t p can parameters please refer to the document [17] k i =k 1 ik nk The display and click of the advertisements have the same log format: represents the feature of group k. nk is the number of keywords of group k. Similarly, we can get mean keyword advertisement 1 ta ta ta vector n . t a={w ,w , ,w } is the ctak = ∑i k= 1tika j j1 j2 L jM nk Type is used to distinguish display from click. Time, c- advertisement vector of the keyword i belong to group k. ip and referrer can be judged by the advertisement engine Similarly, we can have the users’ sessions clustered. while other parameters are transferred by links. Referer u = {u ,u , , u } represents all the users and represents the parent link, advid represents the ID of 1 2 L N advertisement and camid represents the ID of sa sa sa sa s a = {w , w ,L, w } ( wij represents the number advertisement activity. i i1 i2 iM that advertisement j is being requested during the session C. Pattern Recognition sa of use i. wij can be normalized to[0,1])is all user’s At this stage, we should get all the keywords and then cluster them. Here we use word segmentation method to session vector. get the keywords. Firstly, we can get the segmentation After clustering with HACM, we can get the Jaccard result from both sides based on dictionary processing. coefficient, group mean user’s session vector and mean Secondly, some rules are used to disambiguate based on advertisement access vector as follows: sp sp statistics processing. Finally save the preliminary L w ∗w ∑k =1 ik jk keywords d = {t , t , , t } from the web space sim(s p,s p) = , 1 2 L Q i j sp sp sp sp L (w )2 + L (w )2 − L w ∗w ∑k =1 ik ∑k =1 jk ∑k =1 ik jk w ={p1, p2,L, pL} to the database.