Online Forgetting Process for Linear Regression Models

Yuantong Li, Chi-Hua Wang, Guang Cheng
Department of Statistics, Purdue University

Abstract

Motivated by the EU's "Right To Be Forgotten" regulation, we initiate a study of statistical data deletion problems in which users' data are accessible only for a limited period of time. This setting is formulated as an online learning task with a constant memory limit. We propose a deletion-aware algorithm, FIFD-OLS, for the low-dimensional case, and witness a catastrophic rank swinging phenomenon due to the data deletion operation, which leads to statistical inefficiency. As a remedy, we propose the FIFD-Adaptive Ridge algorithm with a novel online regularization scheme that effectively offsets the uncertainty from deletion. In theory, we provide cumulative regret upper bounds for both online forgetting algorithms. In experiments, we show that FIFD-Adaptive Ridge outperforms ridge regression with a fixed regularization level, and we hope this sheds some light on more complex statistical models.

1 Introduction

Today many internet companies and organizations face the situation that certain individuals' data can no longer be used to train their models due to legal requirements. Such circumstances force companies and organizations to delete data from their databases on demand from users who wish to be forgotten. On the ground of the "Right to be Forgotten", regulations have been established in the laws of many countries and states, including the EU's General Data Protection Regulation (GDPR) (Council of European Union, 2016) and the recent California Consumer Privacy Act (CCPA) (State of California Department of Justice, 2018), which stipulate that users may require companies and organizations such as Google, Facebook, and Twitter to forget and delete their personal data in order to protect their privacy. Users also have the right to request that a platform delete their obsolete data at any time, or to authorize a platform to hold personal information such as photos and emails only for a limited period. Unfortunately, since these data are typically collected incrementally online, it is very difficult for a machine learning model to forget them in chronological order. This challenge motivates the design and analysis of deletion-aware online methods.

In this paper, we propose and investigate a class of online learning procedures, termed the online forgetting process, that adapts to users' requests to delete their data before a specific time bar. To proceed with the discussion, we consider a special deletion practice, termed First In First Delete (FIFD), to address the scenario in which users only authorize their data for a limited period. (See Figure 1 for an illustration of the online forgetting process with constant memory limit s.) In FIFD, the agent is required to delete the oldest data as soon as it receives the latest data, so as to meet a constant memory limit. The FIFD deletion practice is inspired by the situation in which a system may only use data from the past three months to train its machine learning model to offer service for new customers (Jon Porter, 2019; Google, 2020). The proposed online forgetting process is an online extension of recent works that consider offline data deletion (Izzo et al., 2020; Ginart et al., 2019; Bourtoule et al., 2019) or detect whether data have been forgotten (Liu and Tsaftaris, 2020).

In such a "machine forgetting" setting, we aim to determine its difference from standard statistical machine learning methods via an online regression framework. To accommodate such limited authority, we provide solutions for designing deletion-aware online linear regression algorithms and discuss the harm caused by the "constant memory limit" setting.

Figure 1: Online Forgetting Process with constant memory limit $s$ (the data stream $x_1 y_1, x_2 y_2, \dots, x_t y_t$ and the sliding-window estimators $\hat{\theta}_{[1,s]}, \hat{\theta}_{[2,s+1]}, \dots, \hat{\theta}_{[t-s,t-1]}$).

Such a setting is challenging for a general online statistical learning task since the "sample size" never grows to infinity but stays at a constant size along the whole learning process. As a consequence, statistical efficiency never improves as the time step grows, due to the constant memory limit.

Our Contribution. We first investigate the online forgetting process in ordinary linear regression. We find a new phenomenon, which we call the Rank Swinging Phenomenon, that exists in the online forgetting process. If the deleted data can be fully represented by the data memory, the deletion introduces no extra regret; otherwise, it introduces extra regret that makes the online forgetting task not online learnable. The rank swinging phenomenon indirectly represents the dissimilarity between the deleted data and the new data and thereby affects the instantaneous regret. Besides, if the gram matrix does not have full rank, the adaptive constant $\zeta$ becomes unstable and the confidence ellipsoid becomes wider. Taking both effects into consideration, the order of the FIFD-OLS regret becomes linear in the time horizon $T$.

The rank swinging phenomenon affects the regret scale and destabilizes the FIFD-OLS algorithm. To remedy this problem, we propose FIFD-Adaptive Ridge to offset this phenomenon: once the regularization parameter is added, the gram matrix has full rank. Different from using a fixed regularization parameter as in the standard online learning model, we use a martingale technique to adaptively select the regularization parameter over time.

Notation. Throughout this paper, $[T]$ denotes the set $\{1, 2, \dots, T\}$, and $|S|$ denotes the number of elements of any collection $S$. We use $\|x\|_p$ to denote the $p$-norm of a vector $x \in \mathbb{R}^d$ and $\|x\|_\infty = \sup_i |x_i|$. For any vector $v \in \mathbb{R}^d$, the notation $\mathcal{P}(v) \equiv \{i \mid v_i > 0\}$ denotes the indexes of the positive coordinates of $v$ and $\mathcal{N}(v) \equiv \{i \mid v_i < 0\}$ denotes the indexes of the negative coordinates of $v$; $\mathcal{P}_{\min}(v) \equiv \min\{v_i \mid i \in \mathcal{P}(v)\}$ denotes the minimum value of $v_i$ over the positive coordinates, and $\mathcal{N}_{\max}(v) \equiv \max\{v_i \mid i \in \mathcal{N}(v)\}$ denotes the maximum value of $v_i$ over the negative coordinates. For a positive semi-definite matrix $\Sigma \in \mathbb{R}^{d \times d}$, $\lambda_{\min}(\Sigma)$ denotes the minimum eigenvalue of $\Sigma$. We denote $\Sigma^{-}$ as a generalized inverse of $\Sigma$ if it satisfies $\Sigma \Sigma^{-} \Sigma = \Sigma$. The weighted 2-norm of a vector $x \in \mathbb{R}^d$ with respect to a positive definite matrix $\Sigma$ is defined by $\|x\|_\Sigma = \sqrt{x^\top \Sigma x}$. The inner product is denoted by $\langle \cdot, \cdot \rangle$, and the weighted inner product by $x^\top \Sigma y = \langle x, y \rangle_\Sigma$. For any sequence $\{x_t\}_{t \ge 0}$, we denote the matrix $x_{[a,b]} = \{x_a, x_{a+1}, \dots, x_b\}$ and $\|x_{[a,b]}\|_\infty = \sup_{i,j} |x_{[a,b]}(i,j)|$. For any matrix $x_{[a,b]}$, the notation $\Sigma_{[a,b]} = \sum_{t=a}^{b} x_t x_t^\top$ represents a gram matrix with constant memory limit $s = b - a + 1$, and $\Sigma_{\lambda,[a,b]} = \sum_{t=a}^{b} x_t x_t^\top + \lambda I_{d \times d}$ represents a gram matrix with ridge hyperparameter $\lambda$.

2 Statistical Data Deletion Problem

At each time step $t \in [T]$, where $T$ is a finite time horizon, the learner receives a context-response pair $z_t = (x_t, y_t)$, where $x_t \in \mathbb{R}^d$ is a $d$-dimensional context and $y_t \in \mathbb{R}$ is the response. The observed sequence of contexts $\{x_t\}_{t \ge 1}$ is drawn i.i.d. from a distribution $\mathcal{P}_X$ with bounded support $\mathcal{X} \subset \mathbb{R}^d$ and $\|x_t\|_2 \le L$. Let $D_{[t-s:t-1]} = \{z_i\}_{i=t-s}^{t-1}$ denote the data collected over $[t-s, t-1]$ following the FIFD scheme.

We assume that for all $t \in [T]$, the response $y_t$ is a linear combination of the context $x_t$; formally,

$$y_t = \langle x_t, \theta_\star \rangle + \epsilon_t, \qquad (1)$$

where $\theta_\star \in \mathbb{R}^d$ is the target parameter that summarizes the relation between the context $x_t$ and the response $y_t$. The noise terms $\epsilon_t$ are drawn independently from a $\sigma$-subgaussian distribution; that is, for every $\alpha \in \mathbb{R}$, $\mathbb{E}[\exp(\alpha \epsilon_t)] \le \exp(\alpha^2 \sigma^2 / 2)$.

Under the proposed FIFD scheme with a constant memory limit $s$, the algorithm $\mathcal{A}$ at time $t$ can only keep the information of the historical data from time step $t-s$ to time step $t-1$ when making its prediction; data from time step $t-s-1$ and earlier are not allowed to be kept and must be deleted or forgotten. The algorithm $\mathcal{A}$ is required to make a total of $T-s$ predictions over the time interval $[s+1, T]$.

To be more precise, the algorithm first receives a context $x_t$ at time step $t$ and makes a prediction based only on the information from the previous $s$ time steps $D_{[t-s:t-1]}$ (with $|D_{[t-s:t-1]}| = s$); hence the agent forgets the information up to time step $t-s-1$. The predicted value $\hat{y}_t$ is computed by the algorithm $\mathcal{A}$ based on the information $D_{[t-s:t-1]}$ and the current context $x_t$, which is denoted by

$$\hat{y}_t = \mathcal{A}(x_t, D_{[t-s:t-1]}). \qquad (2)$$

After the prediction, the algorithm $\mathcal{A}$ receives the true response $y_t$ and suffers a pseudo regret $r(\hat{y}_t, y_t) = (\langle x_t, \theta_\star \rangle - \hat{y}_t)^2$. We define the cumulative (pseudo-)regret of the algorithm up to time horizon $T$ as

$$R_{T,s}(\mathcal{A}) \equiv \sum_{t=s+1}^{T} (\langle x_t, \theta_\star \rangle - \hat{y}_t)^2. \qquad (3)$$

Our theoretical goal is to explore the relationship between the constant memory limit $s$ and the cumulative regret $R_{T,s}(\mathcal{A})$. We propose two algorithms, FIFD-OLS and FIFD-Adaptive Ridge, and study their cumulative regret, respectively. In particular, we discuss the effect of the constant memory limit $s$, the dimension $d$, the subgaussian parameter $\sigma$, and the confidence level $1-\delta$ on the obtained cumulative regret $R_{T,s}(\mathcal{A})$ for these two algorithms. For example, what is the effect of the constant memory limit $s$ on the order of the regret $R_{T,s}(\mathcal{A})$ when the other parameters are held constant? In other words, how much data do we need to keep in order to achieve satisfactory performance under the FIFD scheme? Besides, does any unexpected phenomenon occur, in experiments or in theory, when the agent uses the ordinary least squares method under the FIFD scheme, and how can it be improved?

Remark. The FIFD scheme can also be generalized toward the standard online learning paradigm when we add two data points and delete one, or add more and delete one, at each step. These settings automatically transfer to the standard online learning model when $s$ becomes large. We present simulation results to illustrate this.

3 FIFD-OLS

In this section, we present the FIFD ordinary least squares regression (FIFD-OLS) algorithm under the "First In First Delete" scheme, the corresponding confidence ellipsoid for the FIFD-OLS estimator, and its regret analysis. The rank swinging phenomenon is discussed in Section 3.4.

3.1 FIFD-OLS Algorithm

The FIFD-OLS algorithm uses the least squares estimator based on the constant data memory from the time window $[t-s, t-1]$, defined as $\hat{\theta}_{[t-s,t-1]} = \Sigma_{[t-s,t-1]}^{-1} \big[\sum_{i=t-s}^{t-1} y_i x_i\big]$. An incremental update formula for $\hat{\theta}$ from the time window $[t-s, t-1]$ to $[t-s+1, t]$ is given as follows.

OLS incremental update: At time step $t$, the estimator $\hat{\theta}_{[t-s+1,t]}$ is updated from the previous estimator $\hat{\theta}_{[t-s,t-1]}$ and the new data $z_t$ by

$$\hat{\theta}_{[t-s+1,t]} = f(\Sigma_{[t-s,t-1]}^{-1}, x_{t-s}, x_t) \cdot g(\hat{\theta}_{[t-s,t-1]}, \Sigma_{[t-s,t-1]}, z_{t-s}, z_t), \qquad (4)$$

where, writing $\Phi(\Sigma_{[t-s,t-1]}^{-1}) \equiv \Sigma_{[t-s,t-1]}^{-1} - \big(x_t^\top \Sigma_{[t-s,t-1]}^{-1} x_t + 1\big)^{-1} \big[\Sigma_{[t-s,t-1]}^{-1} x_t x_t^\top \Sigma_{[t-s,t-1]}^{-1}\big]$,

$$f(\Sigma_{[t-s,t-1]}^{-1}, x_{t-s}, x_t) = \Phi(\Sigma_{[t-s,t-1]}^{-1}) - \big(x_{t-s}^\top \Phi(\Sigma_{[t-s,t-1]}^{-1})\, x_{t-s} - 1\big)^{-1} \big[\Phi(\Sigma_{[t-s,t-1]}^{-1})\, x_{t-s} x_{t-s}^\top \Phi(\Sigma_{[t-s,t-1]}^{-1})\big], \qquad (5)$$

and $g(\hat{\theta}_{[t-s,t-1]}, \Sigma_{[t-s,t-1]}, z_{t-s}, z_t) = \Sigma_{[t-s,t-1]} \hat{\theta}_{[t-s,t-1]} + y_t x_t - y_{t-s} x_{t-s}$. The detailed online incremental update is given in Appendix F.

The algorithm has two steps. First, the $f$-step updates the inverse of the gram matrix from $\Sigma_{[t-s,t-1]}^{-1}$ to $\Sigma_{[t-s+1,t]}^{-1}$; the Woodbury formula (Woodbury, 1950) is applied to reduce the time complexity to $\mathcal{O}(s^2 d)$ compared with directly computing the inverse of the gram matrix. Second, the $g$-step is simple algebra that uses the definition of adding and forgetting data to update the least squares estimator. The detailed update procedure of FIFD-OLS is displayed in Algorithm 1.
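As a concrete illustration of the $f$-step/$g$-step update in (4)-(5), the following is a minimal sketch (our own rendering, not the authors' released code) of the FIFD-OLS sliding-window update: the inverse gram matrix is maintained with two Sherman-Morrison rank-one corrections (add $x_t$, then remove $x_{t-s}$), and the moment vector is adjusted accordingly. Function and variable names are ours.

```python
import numpy as np

def sherman_morrison_add(M_inv, x):
    """(M + x x^T)^{-1} from M^{-1} via a rank-one update."""
    Mx = M_inv @ x
    return M_inv - np.outer(Mx, Mx) / (1.0 + x @ Mx)

def sherman_morrison_remove(M_inv, x):
    """(M - x x^T)^{-1} from M^{-1} via a rank-one downdate.

    Requires 1 - x^T M^{-1} x != 0, i.e. the reduced window must stay
    invertible; this is exactly where the rank swinging phenomenon bites.
    """
    Mx = M_inv @ x
    return M_inv + np.outer(Mx, Mx) / (1.0 - x @ Mx)

def fifd_ols_step(Sigma_inv, b, z_new, z_old):
    """One FIFD-OLS step: add (x_t, y_t), forget (x_{t-s}, y_{t-s}).

    Sigma_inv : inverse gram matrix of the current window  (f-step)
    b         : sum of y_i x_i over the current window      (g-step)
    Returns the updated (Sigma_inv, b, theta_hat).
    """
    x_new, y_new = z_new
    x_old, y_old = z_old
    # f-step: update the inverse gram matrix (Woodbury / Sherman-Morrison)
    Sigma_inv = sherman_morrison_add(Sigma_inv, x_new)
    Sigma_inv = sherman_morrison_remove(Sigma_inv, x_old)
    # g-step: update the moment vector and recompute the estimator
    b = b + y_new * x_new - y_old * x_old
    theta_hat = Sigma_inv @ b
    return Sigma_inv, b, theta_hat
```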

3.2 Confidence Ellipsoid for FIFD-OLS

Our first contribution is a confidence ellipsoid for the OLS method based on the sample set collected under the FIFD scheme, shown in Lemma 1.

Lemma 1 (FIFD-OLS Confidence Ellipsoid). For any $\delta > 0$, if the event $\lambda_{\min}(\Sigma_{[t-s,t-1]}/s) > \gamma^2 > 0$ holds, then with probability at least $1-\delta$, for all $t \ge s+1$, $\theta_\star$ lies in the set

$$C_{[t-s,t-1]} = \Big\{ \theta \in \mathbb{R}^d : \big\| \hat{\theta}_{[t-s,t-1]} - \theta \big\|_{\Sigma_{[t-s,t-1]}} \le \sqrt{q_{[t-s,t-1]}\,(2d/s)\log(2d/\delta)} \Big\}, \qquad (6)$$

where $q_{[t-s,t-1]} = \|x_{[t-s,t-1]}\|_\infty^2 / \lambda_{\min}(\Sigma_{[t-s,t-1]})$ is the adaptive constant, and we denote the right-hand-side bound by $\beta_{[t-s,t-1]}(\delta)$.

Algorithm 1: FIFD-OLS

  Given parameters: $s$, $T$, $\delta$
  for $t \in \{s+1, \dots, T\}$ do
    Observe $x_t$
    $\Sigma_{[t-s,t-1]} = x_{[t-s,t-1]}^\top x_{[t-s,t-1]}$
    if $t = s+1$ then
      $\hat{\theta}_{[1,s]} = \Sigma_{[1,s]}^{-1} x_{[1,s]}^\top y_{[1,s]}$
    else
      $\hat{\theta}_{[t-s,t-1]} = f(\Sigma_{[t-s-1,t-2]}^{-1}, x_{t-s-1}, x_{t-1}) \cdot g(\hat{\theta}_{[t-s-1,t-2]}, \Sigma_{[t-s-1,t-2]}, z_{t-s-1}, z_{t-1})$
    end
    Predict $\hat{y}_t = \langle \hat{\theta}_{[t-s,t-1]}, x_t \rangle$ and observe $y_t$
    Compute loss $r_t = |\hat{y}_t - \langle \theta_\star, x_t \rangle|$
    Delete data $z_{t-s} = (x_{t-s}, y_{t-s})$
  end

Proof (of Lemma 1). The key step in obtaining the confidence ellipsoid is the martingale noise technique presented in Bastani and Bayati (2020), which yields an adaptive bound. The detailed proof is in Appendix A. Note that the confidence ellipsoid in (6) requires the minimum eigenvalue of the gram matrix $\Sigma_{[t-s,t-1]}$ and the infinity norm of the observations within the constant data memory $s$; thus it is an adaptive confidence ellipsoid that follows the change of the data.

3.3 FIFD-OLS Regret

The following theorem provides the regret upper bound for the FIFD-OLS algorithm.

Theorem 1 (Regret Upper Bound of the FIFD-OLS Algorithm). Assume that the event of Lemma 1 holds for all $t \in [s+1, T]$. Then with probability at least $1-\delta$, $\delta \in [0,1]$, for all $T > s$ and $s \ge d$, the cumulative regret at time $T$ satisfies

$$R_{T,s}(\mathcal{A}_{\mathrm{OLS}}) \le 2\zeta \sqrt{(d/s)\log(2d/\delta)\,(T-s)\,\big(d\log(sL^2/d) + (T-s)\big)}, \qquad (7)$$

where the adaptive constant $\zeta = \max_{s+1 \le t \le T} q_{[t-s,t-1]}$.

Proof. We provide a roadmap for the proof of Theorem 1. The proof is motivated by Abbasi-Yadkori et al. (2011) and Bastani and Bayati (2020). We first prove Lemma 1 to obtain a confidence ellipsoid holding for the FIFD-OLS estimator. Then we use the confidence ellipsoid to sum up the regret over time in Lemma 2, where a key term called the FRT (defined in equation (9) below) strongly affects the derivation of the regret upper bound. The detailed proof of this theorem can be found in Appendix C.

Remarks. We develop the FIFD-OLS algorithm and prove an upper bound on the cumulative regret of order $\mathcal{O}\big(\zeta[(d/s)\log(2d/\delta)]^{1/2}\, T\big)$. The agent using this algorithm cannot improve its performance over time because only a constant memory of $s$ observations is available to update the estimated parameters. What is more, the adaptive constant $\sqrt{q_{[t-s,t-1]}}$ is unstable because of the forgetting process. The main factor causing the oscillation of the gram matrix $\Sigma_{[t-s,t-1]}$ is that its minimum eigenvalue can be close to zero; in other words, the gram matrix fails to be full rank at some time points. At such time points the adaptive constant $\sqrt{q_{[t-s,t-1]}}$ blows up, so the generalization ability of FIFD-OLS is poor.

Following the derivation of the regret upper bound of FIFD-OLS in Theorem 1, we find an interesting phenomenon, which we call the Rank Swinging Phenomenon (Definition 1 below). This phenomenon destabilizes the FIFD-OLS algorithm and introduces some extremely bad events, which results in a larger regret upper bound for the FIFD-OLS algorithm.

3.4 Rank Swinging Phenomenon

Below we give the definition of the Rank Swinging Phenomenon.

Definition 1 (Rank Swinging Phenomenon). At time $t$, when we delete the data $x_{t-s}$ and add the data $x_t$, the gram matrix switches from $\Sigma_{[t-s,t-1]}$ to $\Sigma_{[t-s+1,t]}$, and its rank can increase or decrease by 1 or stay unchanged. If $\mathrm{Rank}(\Sigma_{[t-s,t-1]}) \le d$,

$$\mathrm{Rank}(\Sigma_{[t-s+1,t]}) = \begin{cases} \mathrm{Rank}(\Sigma_{[t-s,t-1]}) - 1, & \text{Case 1} \\ \mathrm{Rank}(\Sigma_{[t-s,t-1]}) + 1, & \text{Case 2} \\ \mathrm{Rank}(\Sigma_{[t-s,t-1]}), & \text{Case 3} \end{cases}$$

where four examples are illustrated in Appendix D. A real example can be found in Figure 2.

Figure 2: Rank swinging phenomenon with memory limit $s = 20, 60, 100, 150, 500$ and $d = 100$ from left to right, with $X_t \sim \mathrm{Unif}(e_1, e_2, \dots, e_{100})$. The rank of the gram matrix can decrease due to the deletion operation.
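The following short sketch (ours, written under the setting described in the Figure 2 caption) reproduces the flavor of the phenomenon: with one-hot contexts drawn uniformly from $\{e_1, \dots, e_d\}$, the rank of the sliding-window gram matrix both rises and falls as data are added and deleted.

```python
import numpy as np

rng = np.random.default_rng(0)
d, s, T = 100, 60, 500

# One-hot contexts X_t ~ Unif(e_1, ..., e_d), as in Figure 2.
idx = rng.integers(0, d, size=T)
X = np.eye(d)[idx]

ranks = []
for t in range(s, T):
    window = X[t - s:t]                  # constant memory of the last s contexts
    gram = window.T @ window             # Sigma_[t-s, t-1]
    ranks.append(np.linalg.matrix_rank(gram))

# The rank swings: deleting x_{t-s} can remove the only observation
# along some coordinate direction, so the rank may drop by one.
print(min(ranks), max(ranks))
```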

The rank swinging phenomenon results in unstable regret, which is measured by a term we call the "Forgetting Regret Term" (FRT), caused by the deletion at time $t$.

Definition 2 (Forgetting Regret Term).

$$\mathrm{FRT}_{[t-s,t-1]} = \|x_{t-s}\|^2_{\Sigma^{-}_{[t-s,t-1]}} \Big( \|x_t\|^2_{\Sigma^{-}_{[t-s,t-1]}} \sin^2\!\big(\theta, \Sigma^{-}_{[t-s,t-1]}\big) + 1 \Big), \qquad (9)$$

where $\mathrm{FRT}_{[t-s,t-1]} \in [0,2]$. Here $\sin^2(\theta, \Sigma^{-}_{[t-s,t-1]})$ denotes the dissimilarity between $x_{t-s}$ and $x_t$ under $\Sigma^{-}_{[t-s,t-1]}$:

$$\sin^2\!\big(\theta, \Sigma^{-}_{[t-s,t-1]}\big) = 1 - \cos^2\!\big(\theta, \Sigma^{-}_{[t-s,t-1]}\big), \qquad \cos\!\big(\theta, \Sigma^{-}_{[t-s,t-1]}\big) = \frac{\langle x_t, x_{t-s} \rangle_{\Sigma^{-}_{[t-s,t-1]}}}{\|x_t\|_{\Sigma^{-}_{[t-s,t-1]}}\, \|x_{t-s}\|_{\Sigma^{-}_{[t-s,t-1]}}},$$

where $\Sigma^{-}_{[t-s,t-1]}$ is the generalized inverse of $\Sigma_{[t-s,t-1]}$.

Remark. If $\mathrm{FRT}_{[t-s,t-1]} = 0$, no extra regret is introduced that would prevent the task from being online learnable: the deleted data can be fully represented by the rest of the data, and the deletion operation does not sabotage the representation power of the algorithm. If $\mathrm{FRT}_{[t-s,t-1]} \ne 0$, extra regret is introduced during the online forgetting process: the deleted data cannot be fully represented by the rest of the data, and the deletion operation sabotages the representation power of the algorithm.

FRT is determined by two terms, the "deleted term" $\|x_{t-s}\|^2_{\Sigma^{-}_{[t-s,t-1]}}$ and the "dissimilarity term" $\|x_t\|^2_{\Sigma^{-}_{[t-s,t-1]}} \sin^2(\theta, \Sigma^{-}_{[t-s,t-1]})$. If the deleted term is zero, then FRT introduces no extra regret, no matter how large the dissimilarity term is. The larger the dissimilarity term, the more regret is introduced when the deleted term is nonzero, since the new data introduces a new direction in the representation space.
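For intuition, here is a small sketch (our own, not from the paper) of how the FRT in (9) can be evaluated numerically for a given data window, using the Moore-Penrose pseudo-inverse as the generalized inverse $\Sigma^{-}$.

```python
import numpy as np

def forgetting_regret_term(window, x_new):
    """FRT for deleting the oldest context of `window` while adding `x_new`.

    window : (s, d) array of the contexts x_{t-s}, ..., x_{t-1}
    x_new  : (d,) array, the incoming context x_t
    """
    x_old = window[0]
    gram_pinv = np.linalg.pinv(window.T @ window)   # generalized inverse of Sigma

    def sq_norm(v):                                  # ||v||^2 under the pseudo-inverse
        return float(v @ gram_pinv @ v)

    deleted = sq_norm(x_old)
    new = sq_norm(x_new)
    if deleted == 0.0 or new == 0.0:
        return deleted                               # dissimilarity term vanishes
    cos2 = (x_new @ gram_pinv @ x_old) ** 2 / (new * deleted)
    sin2 = 1.0 - cos2
    return deleted * (new * sin2 + 1.0)
```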

Lemma 2. The cumulative regret of FIFD-OLS is partially determined by the FRT at each time step, and the cumulative representation term satisfies

$$\sum_{t=s+1}^{T} \|x_t\|^2_{\Sigma^{-}_{[t-s,t-1]}} \le 2\eta_{\mathrm{OLS}} + \sum_{t=s+1}^{T} \mathrm{FRT}_{[t-s,t-1]}, \qquad (10)$$

where $\eta_{\mathrm{OLS}} = \log\big(\det(\Sigma_{[T-s,T]})\big)$ is a constant based on the last data time window $[T-s, T]$.

Proof. Let us first denote by $r_t$ the instantaneous regret at time $t$ and decompose it as $r_t = |\langle \hat{\theta}_{[t-s,t-1]}, x_t \rangle - \langle \theta_\star, x_t \rangle| \le \beta_{[t-s,t-1]}(\delta)\, \|x_t\|_{\Sigma^{-}_{[t-s,t-1]}}$, where the inequality follows from Lemma 1. Thus, with probability at least $1-\delta$, for all $T > s$,

$$R_{T,s}(\mathcal{A}_{\mathrm{OLS}}) \le \sqrt{(T-s) \sum_{t=s+1}^{T} r_t^2} \le \sqrt{(T-s) \max_{s+1 \le t \le T} \beta^2_{[t-s,t-1]}(\delta) \sum_{t=s+1}^{T} \|x_t\|^2_{\Sigma^{-}_{[t-s,t-1]}}}.$$

4 FIFD-Adaptive Ridge

To remedy the rank swinging phenomenon, we take advantage of ridge regression to prevent it from happening in the online forgetting process. In this section, we present the FIFD adaptive ridge regression (FIFD-Adaptive Ridge) algorithm under the "First In First Delete" scheme, the corresponding confidence ellipsoid for the FIFD-Adaptive Ridge estimator, and its regret analysis.

4.1 FIFD-Adaptive Ridge Algorithm

The FIFD-Adaptive Ridge algorithm uses the ridge estimator $\hat{\theta}_{\lambda,[t-s,t-1]}$ to predict the response; for the time window $[t-s, t-1]$ it is defined as follows.

Adaptive Ridge update: $\hat{\theta}_{\lambda,[t-s,t-1]} = \Sigma_{\lambda,[t-s,t-1]}^{-1} x_{[t-s,t-1]}^\top y_{[t-s,t-1]}$, where $\Sigma_{\lambda,[t-s,t-1]} = x_{[t-s,t-1]}^\top x_{[t-s,t-1]} + \lambda_{[t-s,t-1]} I_{d \times d}$ and the regularization level $\lambda_{[t-s,t-1]}$ is selected adaptively as in Lemma 3 below.

Lemma 3 (Adaptive selection of the regularization level). For any $\delta \in (0,1)$, even when $\mathrm{Rank}(\Sigma_{[t-s,t-1]}) < d$, writing $\epsilon(\delta) = \|x_{[a,b]}\|_\infty \sqrt{(2d/s)\log(2d/\delta)} \,/\, \lambda_{\min}(\Sigma_{\lambda,[a,b]})$,

we have control of the $L_2$ estimation error, $\Pr\big[\|\hat{\theta}_{\lambda,[t-s,t-1]} - \theta_\star\|_2 \le \epsilon(\delta)\big] \ge 1-\delta$. To satisfy this condition, we select the adaptive $\lambda_{[t-s,t-1]}$ for the limited time window $[t-s, t-1]$ as

$$\lambda_{[t-s,t-1]} \le \|x_{[t-s,t-1]}\|_\infty\, \sigma \sqrt{2s\log(2d/\delta)} \,/\, \|\theta_\star\|_1. \qquad (11)$$

The detailed derivation can be found in Appendix E. The detailed update procedure of FIFD-Adaptive Ridge is displayed in Algorithm 2.

Algorithm 2: FIFD-Adaptive Ridge

  Given parameters: $s$, $T$, $\delta$
  for $t \in \{s+1, \dots, T\}$ do
    Observe $x_t$
    $\|x_{[t-s,t-1]}\|_\infty = \max_{1 \le i \le s,\, 1 \le j \le d} |x(i,j)|$
    $\hat{\sigma}_{[t-s,t-1]} = \mathrm{SD}(y_{[t-s,t-1]})$
    $\lambda_{[t-s,t-1]} = \hat{\sigma}_{[t-s,t-1]}\, \|x_{[t-s,t-1]}\|_\infty \sqrt{2s\log(2d/\delta)}$
    $\Sigma_{\lambda,[t-s,t-1]} = x_{[t-s,t-1]}^\top x_{[t-s,t-1]} + \lambda_{[t-s,t-1]} I$
    $\hat{\theta}_{[t-s,t-1]} = \Sigma_{\lambda,[t-s,t-1]}^{-1} x_{[t-s,t-1]}^\top y_{[t-s,t-1]}$
    Predict $\hat{y}_t = \langle \hat{\theta}_{[t-s,t-1]}, x_t \rangle$ and observe $y_t$
    Compute loss $r_t = |\hat{y}_t - \langle \theta_\star, x_t \rangle|$
    Delete data $z_{t-s} = (x_{t-s}, y_{t-s})$
  end
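The following is a compact sketch of one pass of Algorithm 2 (our own rendering; the plug-in noise estimate and the simplification $\|\theta_\star\|_1 = 1$ follow the experimental setup of Section 5, so the $\|\theta_\star\|_1$ factor of (11) is dropped).

```python
import numpy as np

def fifd_adaptive_ridge(X, y, s, delta):
    """Run FIFD-Adaptive Ridge over a stream (X, y) with memory limit s.

    X : (T, d) contexts, y : (T,) responses. Returns per-step predictions.
    The adaptive lambda follows Algorithm 2, with the noise level estimated
    by the sample SD of the responses in the current window.
    """
    T, d = X.shape
    preds = np.full(T, np.nan)
    for t in range(s, T):
        Xw, yw = X[t - s:t], y[t - s:t]          # constant data memory
        sigma_hat = yw.std(ddof=1)               # plug-in noise estimate
        lam = sigma_hat * np.abs(Xw).max() * np.sqrt(2 * s * np.log(2 * d / delta))
        Sigma_lam = Xw.T @ Xw + lam * np.eye(d)  # always full rank: no rank swinging
        theta_hat = np.linalg.solve(Sigma_lam, Xw.T @ yw)
        preds[t] = X[t] @ theta_hat              # the oldest point then drops out
    return preds
```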
4.2 Confidence Ellipsoid for FIFD-Adaptive Ridge

Before moving to the confidence ellipsoid of the adaptive ridge estimator, we define some notions about the true parameter $\theta_\star$: $\mathcal{P}_{\min}(\theta_\star)$ represents the weakest positive signal and $\mathcal{N}_{\max}(\theta_\star)$ serves as the weakest negative signal. To simplify the adaptive ridge confidence ellipsoid, without loss of generality, we make the following assumption.

Assumption 1 (Weakest Positive Signal to Strongest Signal Ratio). We assume that the positive coordinates of $\theta_\star$ dominate the bad events: the ratio $\mathrm{WPSSR} = \mathcal{P}_{\min}(\theta_\star) / \|\theta_\star\|_\infty$ is bounded by a threshold (12) that depends on $s$, $d$, $\delta$, and $|\mathcal{P}(\theta_\star)|$ through $C_3 = \log(6d/\delta)\log(2d/\delta)$. The threshold is monotone increasing in $s$ and $|\mathcal{P}(\theta_\star)|$ and monotone decreasing in $d$, and in most cases it is mild; for example, with $s = 100$, $d = 110$, $\delta = 0.05$, and $|\mathcal{P}(\theta_\star)| = 30$, the required bound on the ratio is about $1.02$, so the assumption is satisfied.

If Assumption 1 holds, a high-probability confidence ellipsoid can be obtained for FIFD-Adaptive Ridge.

Lemma 4 (FIFD-Adaptive Ridge Confidence Ellipsoid). For any $\delta \in [0,1]$, with probability at least $1-\delta$, for all $t \ge s+1$, under Assumption 1 and on the event $\lambda_{\min}(\Sigma_{\lambda,[t-s,t-1]}/s) > \gamma^2 > 0$, $\theta_\star$ lies in the set

$$C_{\lambda,[t-s,t-1]} = \Big\{ \theta \in \mathbb{R}^d : \big\|\hat{\theta}_{\lambda,[t-s,t-1]} - \theta\big\|_{\Sigma_{\lambda,[t-s,t-1]}} \le \nu \sqrt{q_{\lambda,[t-s,t-1]}\, \kappa\, d/(2s)} \Big\}, \qquad (13)$$

where $q_{\lambda,[t-s,t-1]} = \|x_{[t-s,t-1]}\|_\infty^2 / \lambda_{\min}(\Sigma_{\lambda,[t-s,t-1]})$, $\kappa = \log\big(6|\mathcal{P}(\theta_\star)|^2/\delta\big)/\log(2d/\delta)$, and $\nu = \|\theta_\star\|_\infty / \mathcal{P}_{\min}(\theta_\star)$.

Proof. Key steps: the same technique used in Lemma 1 is applied to obtain an adaptive confidence ellipsoid for the ridge estimator, and Assumption 1 is used to simplify the resulting confidence ellipsoid. The detailed proof can be found in Appendix B.

Remark. The order of the constant memory limit $s$ in the confidence ellipsoids of FIFD-OLS and FIFD-Adaptive Ridge is $\mathcal{O}(\sqrt{s})$. The adaptive ridge confidence ellipsoid requires the minimum eigenvalue of the gram matrix $\Sigma_{\lambda,[t-s,t-1]}$ with hyperparameter $\lambda$ and the infinity norm of the data $\|x_{[t-s,t-1]}\|_\infty$, similarly to the FIFD-OLS confidence ellipsoid. The benefit of introducing $\lambda$ is that it avoids the singularity of the gram matrix, which stabilizes the FIFD-Adaptive Ridge algorithm and keeps the confidence ellipsoid narrow most of the time. Moreover, it is again an adaptive confidence ellipsoid.

4.3 FIFD-Adaptive Ridge Regret

In the following theorem, we provide the regret upper bound for the FIFD-Adaptive Ridge method.

Theorem 2 (Regret Upper Bound of the FIFD-Adaptive Ridge Algorithm). Under Assumption 1, with probability at least $1-\delta$, the cumulative regret satisfies

$$R_{T,s}(\mathcal{A}_{\mathrm{Ridge}}) \le \nu \zeta_\lambda \sqrt{(d/s)(T-s)\big[\eta_{\mathrm{Ridge}} + (T-s)\big]}, \qquad (14)$$

where $\zeta_\lambda = \max_{s+1 \le t \le T} \|x_{[t-s,t-1]}\|_\infty^2 / \lambda_{\min}(\Sigma_{\lambda,[t-s,t-1]})$ is the maximum adaptive constant, $\eta_{\mathrm{Ridge}} = d\log\big(sL^2/d + \lambda_{[T-s,T-1]}/d\big) - \log C_2(\lambda)$ is a constant related to the last data memory, $C_2(\lambda) = \prod_{t=s+1}^{T-s} \big(1 + \Delta\lambda_{[t-s+1,t]} / \lambda_{\min}(\Sigma_{\lambda,[t-s+1,t]})\big)$ is a constant close to 1, and $\Delta\lambda_{[t-s+1,t]} = \lambda_{[t-s+1,t]} - \lambda_{[t-s,t-1]}$ represents the change of $\lambda$ over the time steps.

Proof. The key step is to use the same technical lemma as in the OLS setting to show that a confidence ellipsoid holds for the adaptive ridge estimator. Lemma 2 is then used to compute the FRT, and summing the FRTs yields the cumulative regret upper bound. The detailed proof of this theorem can be found in Appendix E.

Remarks (relationship between the regret upper bound and the parameters). We develop the FIFD-Adaptive Ridge algorithm and provide a regret upper bound of order $\mathcal{O}(\zeta_\lambda \sigma L [d/s]^{1/2}\, T)$. This order captures the relationship between the regret and the noise $\sigma$, the dimension $d$, the constant memory limit $s$, the signal level $\nu$, and the confidence level $1-\delta$. Since the agent always keeps only the constant memory limit of $s$ data points to update the estimated parameters, it cannot improve the performance of the algorithm over time; therefore, the regret is $\mathcal{O}(T)$ with respect to the number of decisions. Besides, we find that using the ridge estimator offsets the rank swinging phenomenon, since the ridge gram matrix is always full rank.

5 Simulation Experiments

We compare the FIFD-Adaptive Ridge method, where $\lambda$ is chosen according to Lemma 3, with baselines using pre-tuned $\lambda$. For all experiments, unless otherwise specified, we choose $T = 3000$, $\delta = 0.05$, $d = 100$, and $L = 1$ in all simulation settings. All results are averaged over 100 runs.

Simulation settings. The constant memory limit $s$ ranges over $\{20, 40, 60, 80\}$ and the subgaussian parameter $\sigma$ over $\{1, 2, 3\}$. The contexts $\{x_t\}_{t \in [T]}$ are generated from $N(0_d, I_{d \times d})$ and then normalized. The responses $\{y_t\}_{t \in [T]}$ are generated by $y_t = \langle x_t, \theta_\star \rangle + \epsilon_t$, where $\theta_\star \sim N(0_d, I_{d \times d})$ is normalized so that $\|\theta_\star\|_2 = 1$, and $\epsilon_t$ is drawn i.i.d. from a $\sigma$-subgaussian distribution. The adaptive ridge noise level $\sigma$ in the simulation is estimated by $\hat{\sigma}_{[t-s,t-1]} = \mathrm{sd}(y_{[t-s,t-1]})$, $t \in [s+1, T]$.

Hyperparameter settings for competing methods. The hyperparameters $\lambda$ selected for the $\sigma = 1$ setting are $\{1, 10, 100\}$. Since the hyperparameter $\lambda$ and the noise level $\sigma$ are related in linear order, for the $\sigma = 2$ setting we set $\lambda = \{2, 20, 200\}$, and for the $\sigma = 3$ setting we set $\lambda = \{3, 30, 300\}$. The adaptive ridge hyperparameter is automatically calculated according to Lemma 3, and we assume $\|\theta_\star\|_1 = 1$ since $\|\theta_\star\|_2 = 1$.

Figure 3: Comparison of cumulative regret between the Adaptive Ridge method and the Fixed Ridge methods. The error bars represent the standard error of the mean regret over 100 runs. The blue line represents the Adaptive Ridge; the green line represents the Fixed Ridge method with $\lambda = \{1, 2, 3\}$ for each row; the other lines represent the Fixed Ridge method with $\lambda = \{10, 20, 30\}$ and $\lambda = \{100, 200, 300\}$ for each row, respectively.

Results. In Figure 3, we present the simulation results on the relationship between the cumulative regret $R_{T,s}(\mathcal{A}_{\mathrm{Ridge}})$, the noise level $\sigma$, the hyperparameter $\lambda$, and the constant memory limit $s$. We find that all of the cumulative regrets are linear in the time horizon $T$, only with different constant levels.

Across the three rows, as the memory limit $s$ increases, the cumulative regret of the adaptive ridge (blue line) decreases, consistent with the theory that regret decreases as the sample size increases. For each column, when we fix the memory limit $s$, the regret of FIFD-Adaptive Ridge and of Fixed Ridge with different hyperparameters increases as the noise level $\sigma$ increases.

Overall, we find that the FIFD-Adaptive Ridge method is more robust to changes in the noise level $\sigma$ when Figure 3 is viewed by row. Fixed Ridge methods with different hyperparameters, such as the green and red lines, are sensitive to changes in the noise $\sigma$ and the memory limit $s$, because Fixed Ridge has no prior with which to adapt to the data: although it performs well in the top-left panel, it does not work well in the other settings. The yellow line has relatively comparable performance to the Adaptive Ridge method, since its hyperparameter is determined using knowledge from the Adaptive Ridge. The FIFD-Adaptive Ridge is always the best (2nd and 3rd rows) or close to the best (1st row) choice of $\lambda$ among all of these settings, which means that the Adaptive Ridge method is robust to large noise.

Besides, the Adaptive Ridge method can save computational cost compared with the Fixed Ridge method, which needs cross-validation to select the optimal $\lambda$.
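A condensed sketch (ours) of the simulation protocol above: normalized Gaussian contexts and parameter, Gaussian noise as a $\sigma$-subgaussian instance, and the cumulative pseudo-regret (3) computed for the adaptive-$\lambda$ rule of Algorithm 2 versus a fixed-$\lambda$ baseline. Numbers and variable names are illustrative only.

```python
import numpy as np

rng = np.random.default_rng(1)
T, d, s, sigma, delta = 3000, 100, 40, 1.0, 0.05

# Linear model with normalized contexts and parameter, as in Section 5.
theta_star = rng.normal(size=d)
theta_star /= np.linalg.norm(theta_star)
X = rng.normal(size=(T, d))
X /= np.linalg.norm(X, axis=1, keepdims=True)
y = X @ theta_star + sigma * rng.normal(size=T)

def cumulative_regret(lam_rule):
    """Cumulative pseudo-regret (3) of sliding-window ridge with rule lam_rule."""
    regret = 0.0
    for t in range(s, T):
        Xw, yw = X[t - s:t], y[t - s:t]
        lam = lam_rule(Xw, yw)
        theta_hat = np.linalg.solve(Xw.T @ Xw + lam * np.eye(d), Xw.T @ yw)
        regret += (X[t] @ theta_star - X[t] @ theta_hat) ** 2
    return regret

adaptive = lambda Xw, yw: yw.std(ddof=1) * np.abs(Xw).max() * np.sqrt(2 * s * np.log(2 * d / delta))
fixed_10 = lambda Xw, yw: 10.0

print("adaptive ridge:", cumulative_regret(adaptive))
print("fixed ridge (lambda=10):", cumulative_regret(fixed_10))
```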

Figure 4: Here $\epsilon_t \sim t_{df}$ with degrees of freedom $df \in \{5, 10, 15\}$. This setting investigates the robustness of the proposed online regularization scheme. The line colors are the same as in Figure 3.

Figure 5: Switching from Ridge to OLS & $(+k, -1)$: comparison of cumulative regret between the Switching Adaptive Ridge method (which switches once more than $2d$ data points have been accumulated) and the Fixed Ridge method. The error bars represent the standard error of the mean regret over 100 runs. The cyan line represents the Switching Adaptive Ridge; the other line colors are the same as in Figure 3.

In addition, more results on the choice of the hyperparameter $\lambda$ and on the $L_2$ estimation error of the estimator can be found in Appendix G.

Heavy-Tailed Noise Distribution. Moreover, we also test the robustness of these methods with respect to different noise. Here we assume $\epsilon_t \sim t_{df}$ for all $t \in [T]$, with degrees of freedom $df$ set to 5, 10, and 15. When $df$ is greater than 30, the $t$-distribution behaves like the normal distribution; when $df$ is small, it has much heavier tails. In Figure 4, we find that as $df$ increases the cumulative regret decreases, since the error becomes closer to the normal distribution, which is well captured by our algorithm. However, the Fixed Ridge methods are not robust to the change of noise distribution, especially the green and red lines.

Results for the $(+k, -1)$ addition-deletion operation. Here we test the pattern in which we add more than one data point at each time step, such as $k = 2, 3, 4$, and delete one, denoted as the $(+k, -1)$ addition-deletion pattern. In Figure 5, the 1st, 2nd, and 3rd columns are the cases where one data point is deleted after receiving 2, 3, and 4 data points, respectively; each row has a different noise level $\sigma = 1, 2, 3$. From the figure, we notice that the cumulative regret in each subplot has an increasing sublinear pattern. For the first row, the noise level is the smallest, so the sublinear pattern appears quickly. For the other rows, all subplots show increasing sublinear patterns sooner or later, which matches our anticipation that when adding $k > 1$ data points and deleting one data point at each time step, the process eventually converts to the normal online learning regime. However, the noise level affects the time at which it moves to the sublinear pattern.

Switching from Ridge to OLS. Besides, we also consider the case in which the agent accumulates a larger amount of data, say $n > 2d$, where $n$ is the number of data points the agent holds at some time point $t$, and then switches from the Adaptive Ridge method to the OLS method. This method, called Switching Adaptive Ridge, is represented by the cyan line in Figure 5. We see that when the noise level $\sigma$ is relatively small, it performs well; however, when $\sigma$ is large, it works worse than the green line (Fixed Ridge with prior knowledge about $\sigma$) and the blue line (Adaptive Ridge).

6 Discussion and Conclusion

In this paper, we have proposed two online forgetting algorithms under the FIFD scheme and provided their theoretical regret upper bounds. To our knowledge, this is the first theoretically grounded work on the online forgetting process under the FIFD scheme.

Besides, we identify the existence of the rank swinging phenomenon in online least squares regression and tackle it using online adaptive ridge regression. In the future, we hope to provide lower bounds for these two algorithms and to design other online forgetting algorithms under the FIFD scheme.

References

Abbasi-Yadkori, Y., Pál, D., and Szepesvári, C. (2011). Improved algorithms for linear stochastic bandits. In Advances in Neural Information Processing Systems, pages 2312-2320.

Bastani, H. and Bayati, M. (2020). Online decision making with high-dimensional covariates. Operations Research, 68(1):276-294.

Bourtoule, L., Chandrasekaran, V., Choquette-Choo, C., Jia, H., Travers, A., Zhang, B., Lie, D., and Papernot, N. (2019). Machine unlearning. arXiv preprint arXiv:1912.03817.

Chen, M. X., Lee, B. N., Bansal, G., Cao, Y., Zhang, S., Lu, J., Tsay, J., Wang, Y., Dai, A. M., Chen, Z., et al. (2019). Gmail smart compose: Real-time assisted writing. In Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, pages 2287-2295.

Council of European Union (2012). Council regulation (EU) no 2012/0011. https://eur-lex.europa.eu/legal-content/EN/TXT/?uri=CELEX:52012PC0011.

Council of European Union (2016). Council regulation (EU) no 2016/678. https://eur-lex.europa.eu/eli/reg/2016/679/oj.

Dekel, O., Shalev-Shwartz, S., and Singer, Y. (2006). The forgetron: A kernel-based perceptron on a fixed budget. In Advances in Neural Information Processing Systems, pages 259-266.

Dragomir, S. S. (2015). Reverses of Schwarz inequality in inner product spaces with applications. Mathematische Nachrichten, 288(7):730-742.

Gaillard, P., Gerchinovitz, S., Huard, M., and Stoltz, G. (2018). Uniform regret bounds over R^d for the sequential linear regression problem with the square loss. arXiv preprint arXiv:1805.11386.

Ginart, A., Guan, M., Valiant, G., and Zou, J. Y. (2019). Making AI forget you: Data deletion in machine learning. In Advances in Neural Information Processing Systems, pages 3513-3526.

Google (2020). Clear browsing data. https://support.google.com/chrome/answer/2392709?co=GENIE.Platform.

Hsu, D., Kakade, S., Zhang, T., et al. (2012). A tail inequality for quadratic forms of subgaussian random vectors. Electronic Communications in Probability, 17.

Izzo, Z., Smart, M. A., Chaudhuri, K., and Zou, J. (2020). Approximate data deletion from machine learning models: Algorithms and evaluations. arXiv preprint arXiv:2002.10077.

Jon Porter (2019). Google will let you delete your location tracking data. https://www.theverge.com/2019/5/1/18525384/.

Liu, X. and Tsaftaris, S. A. (2020). Have you forgotten? A method to assess if machine learning models have forgotten data. arXiv preprint arXiv:2004.10129.

Liu, Y., Gadepalli, K., Norouzi, M., Dahl, G. E., Kohlberger, T., Boyko, A., Venugopalan, S., Timofeev, A., Nelson, P. Q., Corrado, G. S., et al. (2017). Detecting cancer metastases on gigapixel pathology images. arXiv preprint arXiv:1703.02442.

State of California Department of Justice (2018). California Consumer Privacy Act (CCPA). https://oag.ca.gov/privacy/ccpa.

Villaronga, E. F., Kieseberg, P., and Li, T. (2018). Humans forget, machines remember: Artificial intelligence and the right to be forgotten. Computer Law & Security Review, 34(2):304-313.

Wainwright, M. J. (2019). High-dimensional statistics: A non-asymptotic viewpoint, volume 48. Cambridge University Press.

Woodbury, M. (1950). Inverting modified matrices. Memorandum Report 42, Statistical Research Group.

Zahavy, T. and Mannor, S. (2019). Deep neural linear bandits: Overcoming catastrophic forgetting through likelihood matching. arXiv preprint arXiv:1901.08612.