Application of the Truncated Triangular and the Trapezoidal Distributions

Section on Survey Research Methods

Jay J. Kim
ORM, NCHS, 3311 Toledo Rd, Hyattsville, MD 20782

Abstract

It has been suggested that the truncated triangular distribution be used for masking microdata. The random variable which follows the truncated triangular distribution serves as a multiplicative noise factor. The natural candidate distribution is the one which is centered at 1, but truncated symmetrically around 1, because it is not desirable to use values which are too close to 1 due to confidentiality concerns. Instead of truncating the middle section of the distribution, we can assign low probabilities to that part of the distribution. This procedure yields some variations of the trapezoidal distribution. Two other approaches are to use the upside-down triangular distribution or a parabola for the middle section of the distribution. In this paper, we will derive the probability density function for each of the four distributions and investigate their properties.

KEY WORDS: confidentiality, masking, truncated triangular distribution

1 Introduction

Since around 1980, the U.S. Energy Information Administration (EIA) has been using multiplicative noise for masking the number of heating and cooling days and marginal rates for utilities, such as electricity and gas, in their public use micro data file from the Residential Energy Consumption Survey. Evans, et al (1998) proposed use of multiplicative noise to mask economic data. They considered noise which follows distributions such as normal and truncated normal distributions. Kim and Winkler (2002, 2003) considered a noise distribution following the truncated normal distribution. Massell and Russell (2006) proposed to use a truncated triangular distribution for masking economic data. In this paper, we will consider noise that follows the truncated triangular, modified trapezoidal, double triangular and parabola-filled truncated triangular distributions. As the above probability density functions are not available at this moment, we will derive them and investigate some of their properties in this paper.

Multiplicative noise has the following form:

$$y_i = x_i e_i, \quad i = 1, 2, \ldots, n,$$

where $y_i$ is the masked variable for the $i$-th unit such as person, household, establishment or governmental unit, $x_i$ is the corresponding unmasked variable and $e_i$ is the noise.

Since noise is generated independently of the original data, $x_i$ and $e_i$ are independent. Thus

$$E(y) = E(x)E(e).$$

Usually E(e) = 1 is used, and thus

$$E(y) = E(x).$$

Also

$$V(y) = V(x)V(e) + \mu_e^2 V(x) + \mu_x^2 V(e).$$

Thus

$$V(x) = \frac{V(y) - \mu_x^2 V(e)}{V(e) + \mu_e^2} \tag{1}$$

Since the data disseminating agencies provide the mean and variance of the noise distribution to the data users, users can estimate the variance of characteristics from the masked data using the formula in equation (1).

2 Noise Distributions

2.1 Triangular Distribution

Since a truncated triangular distribution is a modified form of a triangular distribution, we consider a triangular distribution first.

Figure 1. Triangular Distribution (horizontal axis e, marked at a, m, b)

The triangular distribution of e, as shown above, has the following form.

$$f(e) = \begin{cases} \dfrac{2}{b-a}\,\dfrac{e-a}{m-a}, & a \le e < m \\[2ex] \dfrac{2}{b-a}\,\dfrac{b-e}{b-m}, & m \le e < b \end{cases} \tag{2}$$

Note that m is the mode of this distribution. If the distribution is symmetric around m, m is also the mean and median of the variable e.

If the distribution is symmetric around m, i.e., m − a = b − m, then equation (2) reduces to

$$f(e) = \begin{cases} \dfrac{e-a}{(m-a)^2}, & a \le e < m \\[2ex] \dfrac{b-e}{(b-m)^2}, & m \le e < b \end{cases} \tag{3}$$

The expected value of e is

$$E(e) = \frac{m + a + b}{3} \tag{4}$$

Suppose the triangular distribution is symmetric around m, i.e., m − a = b − m or a + b = 2m, then the above expectation reduces to m.

The variance of e is

$$V(e) = \frac{b^2 - ab + a^2 + m^2 - m(a+b)}{18}. \tag{5}$$

If m − a = b − m or a + b = 2m, then the above variance reduces to

$$V(e) = \frac{(b-m)^2}{6} \tag{6}$$

2.2 Truncated Triangular Distribution

When noise following a triangular distribution is used, using 1 for m is not desirable. This is because multiplication by 1 does not change the original value at all. That is, if the original value is multiplied by 1 for a unit, the unit does not get any protection from disclosure. Even worse, the probability of noise being 1 is the highest in this type of distribution. This implies that the largest number of units do not get protection. Thus Massell and Russell suggested using noise that follows the triangular distribution whose middle portion, or the part near 1, is truncated. The truncated triangular distribution mentioned above is shown in Figure 2.

Figure 2. Truncated Triangular Distribution (horizontal axis e, marked at a, b, m, c, d)

Suppose the distribution is truncated at b and c, c > b, as shown above, which are around m. One of the vertices, denoted by b in equation (2), is now denoted by d. In this case, the probability density function (pdf) of e has the following form:

$$f(e) = \begin{cases} \dfrac{2(e-a)}{k(m-a)(d-a)}, & a \le e < b \\[2ex] \dfrac{2(d-e)}{k(d-m)(d-a)}, & c \le e < d \end{cases} \tag{7}$$

where k needs to be determined.

Now

$$\int_a^b \frac{2(e-a)}{k(m-a)(d-a)}\,de + \int_c^d \frac{2(d-e)}{k(d-m)(d-a)}\,de = \frac{(b-a)^2}{k(m-a)(d-a)} + \frac{(d-c)^2}{k(d-m)(d-a)}. \tag{8}$$

In order for the function in equation (7) to become a pdf, equation (8) needs to be 1. Thus

$$k = \frac{(d-m)(b-a)^2 + (m-a)(d-c)^2}{(d-a)(m-a)(d-m)}$$

Using the above k, the truncated triangular distribution of e, truncated at b and c, is

$$f(e) = \begin{cases} \dfrac{2(d-m)}{(b-a)^2(d-m) + (d-c)^2(m-a)}\,(e-a), & a \le e < b \\[2ex] \dfrac{2(m-a)}{(b-a)^2(d-m) + (d-c)^2(m-a)}\,(d-e), & c \le e < d \end{cases} \tag{9}$$

If the triangular distribution is symmetric around m and the truncation is also symmetric around m, then equation (9) can be simplified as follows:

$$f(e) = \begin{cases} \dfrac{e-a}{(d-c)^2}, & a \le e < b \\[2ex] \dfrac{d-e}{(d-c)^2}, & c \le e < d \end{cases} \tag{10}$$

The expected value of e without assuming symmetry of the distribution and truncation is

$$E(e) = \frac{(d-m)(b-a)^2(2b+a)}{3\left[(b-a)^2(d-m) + (d-c)^2(m-a)\right]} + \frac{(m-a)(d-c)^2(2c+d)}{3\left[(b-a)^2(d-m) + (d-c)^2(m-a)\right]} \tag{11}$$

Suppose again that the triangular distribution is symmetric around m and the mid-section of the distribution is truncated symmetrically around m. Then

$$E(e) = m \tag{12}$$

Thus, the mean of the symmetric triangular distribution whose mid-section is symmetrically truncated is the same as that of the un-truncated triangular distribution. Consequently, the data masked by noise following the symmetrically truncated symmetric triangular distribution with m = 1 will provide an unbiased mean of the original data.

Example 1.

If a = 0.5, d = 1.5, m = 1, b = 0.75 and c = 1.25, then by equation (4), with vertices a, m and d, E(e) without truncation is

$$E(e) = \frac{1 + 0.5 + 1.5}{3} = 1.$$

The expected value of e after the truncation is E(e) = 1, since m = 1.

As shown in the formulas, they are the same.

The variance of e of the truncated triangular distribution is

$$V(e) = \frac{1}{18\left[(b-a)^2(d-m) + (d-c)^2(m-a)\right]^2} \Big\{ (b-a)^6(d-m)^2 + (d-c)^6(m-a)^2 + (b-a)^2(d-m)(d-c)^2(m-a)\left[(3b+a)^2 + (3c+d)^2 + 2(a^2+d^2) - 4(2b+a)(2c+d)\right] \Big\} \tag{13}$$

If the truncation is symmetric around m and the triangular distribution is also symmetric around m, then

$$V(e) = m^2 - \frac{16mc + 8md - 2(d^2 + 2dc + 3c^2)}{12} \tag{14}$$

Example 2. Using the data in Example 1, V(e) without truncation is .04167, but after truncation, it is .1146. The variance of e after truncation is 2.75 times that of the variable e without truncation.

The variance of the truncated triangular distribution is greater than that of the un-truncated distribution, since the values around the mean, which have the highest frequencies, have been removed.

2.3 Trapezoidal Distribution

If we connect the top two points in the truncated part of the truncated triangular distribution, we have a trapezoidal distribution. Instead of not allowing noise (e) to have any positive probability between b and c, we can allow some small (equal) probability. In that case, we have a modified trapezoidal distribution, which is shown in Figure 4. To derive the probability density function for the modified trapezoidal distribution, we start with the trapezoidal distribution in Figure 3.

Figure 3. Trapezoidal Distribution (horizontal axis marked at a, b, m, c, d)

Let the trapezoidal distribution have the following form.

$$f(e) = \begin{cases} \dfrac{2k_1}{(d-a)(b-a)}\,(e-a), & a \le e < b \\[2ex] \dfrac{2k_2}{d-a}, & b \le e < c \\[2ex] \dfrac{2k_3}{(d-a)(d-c)}\,(d-e), & c \le e < d \end{cases} \tag{15}$$

Depending on the values of the k's, the height of the top line changes.

If f(b) = f(c),

$$k_1 = k_2 = k_3 = \frac{d-a}{d+c-b-a},$$

then the distribution has the following form:

$$f(e) = \begin{cases} \dfrac{2}{d+c-b-a}\,\dfrac{e-a}{b-a}, & a \le e < b \\[2ex] \dfrac{2}{d+c-b-a}, & b \le e < c \\[2ex] \dfrac{2}{d+c-b-a}\,\dfrac{d-e}{d-c}, & c \le e < d \end{cases} \tag{16}$$

The above is the same form as given by van Dorp and Kotz.

Suppose the trapezoidal distribution is symmetric around m, then E(e) = m.

2.4 Modified Trapezoidal Distribution

When numbers around 1 are multiplied for masking records, it is better to avoid using 1 or numbers too close to 1. This is the reason why the truncated normal and triangular distributions are considered as candidates for a noise distribution, where truncation is done very near to 1. An alternative approach to the truncated distribution is using a distribution which allows a small probability, instead of a zero probability, in the mid-section of the noise distribution. The shape of the noise distribution will then be a variation of the trapezoidal distribution (see Figure 4 below), where the two highest points are not connected and the mid-section is dropped down.

Figure 4. Modified Trapezoidal Distribution (horizontal axis e, marked at a, b, m, c, d)

Suppose $k_1 = k_3 = k$ and $k_2 = k/q$. Then

$$f(e) = \begin{cases} \dfrac{2k}{d-a}\,\dfrac{e-a}{b-a}, & a \le e < b \\[2ex] \dfrac{2k}{(d-a)q}, & b \le e < c \\[2ex] \dfrac{2k}{d-a}\,\dfrac{d-e}{d-c}, & c \le e < d \end{cases} \tag{17}$$

$$\int_\Omega f(e)\,de = \frac{2k}{(d-a)(b-a)} \int_a^b (e-a)\,de + \frac{2k}{q(d-a)} \int_b^c 1\,de + \frac{2k}{(d-a)(d-c)} \int_c^d (d-e)\,de \tag{18}$$

The equation above needs to sum to 1 in order for the function in equation (17) to be a probability density function.
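The normalization requirement on equation (17) can be sketched numerically. The snippet below is a minimal illustration, not part of the original derivation: the function names, the value q = 10 and the midpoint-rule integrator are assumptions made for this example, and the support points are borrowed from Example 1. Evaluating the three integrals in equation (18) gives k[(b − a) + 2(c − b)/q + (d − c)]/(d − a), and setting this to 1 gives the k used in the code.

```python
def modified_trapezoid_k(a, b, c, d, q):
    """Constant k that makes equation (17) integrate to 1.

    The three integrals in equation (18) evaluate to
    k * [(b - a) + 2 * (c - b) / q + (d - c)] / (d - a); this is set to 1.
    """
    if not (a < b < c < d) or q <= 0:
        raise ValueError("need a < b < c < d and q > 0")
    return (d - a) / ((b - a) + 2.0 * (c - b) / q + (d - c))


def modified_trapezoid_pdf(e, a, b, c, d, q):
    """Density of equation (17), with k1 = k3 = k and k2 = k / q."""
    k = modified_trapezoid_k(a, b, c, d, q)
    if a <= e < b:
        return 2.0 * k * (e - a) / ((d - a) * (b - a))
    if b <= e < c:
        return 2.0 * k / ((d - a) * q)
    if c <= e < d:
        return 2.0 * k * (d - e) / ((d - a) * (d - c))
    return 0.0


def integrate(f, lo, hi, steps=20000):
    """Plain midpoint rule, used here only as a sanity check."""
    h = (hi - lo) / steps
    return h * sum(f(lo + (i + 0.5) * h) for i in range(steps))


# Support points from Example 1; q = 10 is a hypothetical choice.
a, b, c, d, q = 0.5, 0.75, 1.25, 1.5, 10.0

# Integrate piece by piece so each call sees a single branch of the pdf.
total = sum(
    integrate(lambda e: modified_trapezoid_pdf(e, a, b, c, d, q), lo, hi)
    for lo, hi in [(a, b), (b, c), (c, d)]
)
print("k =", modified_trapezoid_k(a, b, c, d, q))
print("integral of f over [a, d) =", total)
```

With q = 1 the constant reduces to (d − a)/(d + c − b − a), the value used in equation (16). For q > 1 the density on [b, c) is 1/q times the height reached at b and c, so q sets how far the mid-section is dropped.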
