Symposium on Engineering and Operation Excellence Through Technology Innovation
May 19, 2017
How did Facebook grow? Did Twitter, LinkedIn, WeChat and others grow in the same way? — A Study of Data and a Universal Growth Law
Prof. Michael Tse
The Hong Kong Polytechnic University

https://www.webpagefx.com/internet-real-time/

The estimated total amount of data in the world was 4.4 zettabytes* (1 zettabyte = 10^21 bytes) in 2013. That is set to rise steeply to 44 zettabytes by 2020.

Every day, we produce 2.5 exabytes (1 exabyte = 10^18 bytes) of data.

*1 zettabyte = 1 trillion gigabytes

http://www.northeastern.edu/levelblog/2016/05/13/how-much-data-produced-every-day/

Every day hundreds of millions of people take photos, make videos and send texts. Across the globe businesses collect data on consumer preferences, purchases and trends.

At 23:55 April 29, 2017
❖ 7,595 Tweets sent in 1 second
❖ 779 Instagram photos uploaded in 1 second
❖ 1,235 Tumblr posts in 1 second
❖ 2,563 Skype calls in 1 second
❖ 44,161 GB of Internet traffic in 1 second
❖ 60,038 Google searches in 1 second
❖ 69,054 YouTube videos viewed in 1 second
❖ 2,584,903 Emails sent in 1 second

http://www.internetlivestats.com/

What are people searching on right now?

“Data by itself is useless. Data is only useful if you apply it.”
– Todd Park, Former Chief Technology Officer of the United States and Technology Advisor for U.S. President Barack Obama
Our purpose

❖ To turn data into information.
❖ To apply the science of networks to help make sense out of the data collected.
❖ Data Analytics [the process of examining datasets in order to draw conclusions about the information they contain, increasingly with the aid of specialized systems and software]
http://www.infinitdatum.com/
A glimpse at our process

[Diagram: Model; Parameter estimation; INFORMATION]

Growth Data

Do you drink coffee?

x = number of coffee drinkers
N = whole population
Everyone is either a prospective coffee drinker OR a coffee drinker.

[Figure: number of coffee drinkers versus time (days), 0 to 200]

Number of monthly active Facebook users worldwide

iPhone 7

Pace of Adoption of Products
Smartphone apps growth

www.6mobiles.com

EV growth in Hong Kong

[Figure: EV growth in Hong Kong, 1/2012 to 1/2016, vertical axis up to 5000]

http://www.charged.hk/node/99?language=zh-hant

What science tells us: Universal Law
x is any growth data you want to study
$$\frac{dx}{dt} = (C_1 x + C_2) \times (N - x)$$

❖ $dx/dt$: how fast x grows
❖ $C_1 x$: effect of human interaction (network) on growth, i.e., word-of-mouth, customer service
❖ $C_2$: effect of personal decision on growth, i.e., promotion, advertisement, branding
❖ $N$: entire population

How do we get this?
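Before turning to the derivation, the shape of the curve this law produces can be explored numerically. The following is a minimal sketch that integrates dx/dt = (C1 x + C2)(N - x) with forward-Euler steps. The function name, the step size and all parameter values are assumptions chosen for illustration; they are not taken from any dataset in this talk.

```python
def simulate_growth(N, C1, C2, x0=0.0, days=200, dt=0.1):
    """Integrate dx/dt = (C1*x + C2) * (N - x) with forward-Euler steps.

    Returns one value of x per day, starting with x0 at day 0.
    """
    steps_per_day = round(1 / dt)
    x = x0
    daily = [x]
    for _ in range(days):
        for _ in range(steps_per_day):
            x += dt * (C1 * x + C2) * (N - x)
        daily.append(x)
    return daily


# Hypothetical values for illustration only
N = 1000.0
curve = simulate_growth(N, C1=1e-4, C2=1e-3)
```

With these values the curve starts slowly (only the C2 term acts while x is near zero), accelerates as the C1 x term takes over, and flattens as the remaining population N - x shrinks.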
The Basis: Network

[Figure: a network of nodes joined by links]

Specific problems can be formulated by proper definitions of nodes and links.

Parameters:
• number of nodes = n
• average number of links per node = k
• average shortest path length between nodes = L
• other properties studied in network science

“Transition” Network
❖ Each node assumes a state.
❖ There is more than one possible state a node can assume.
❖ The state of a node changes, as influenced by others connected to it and/or by other external factors.
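The network parameters named above (n, k and L) can be computed directly from an adjacency list. The sketch below uses breadth-first search for the shortest paths; the 5-node ring is a hypothetical example, and the helper names are introduced here for illustration only.

```python
from collections import deque


def average_degree(adj):
    """Average number of links per node (k)."""
    return sum(len(nbrs) for nbrs in adj.values()) / len(adj)


def average_shortest_path_length(adj):
    """Mean BFS distance over all ordered pairs of distinct, reachable nodes (L)."""
    total, pairs = 0, 0
    for source in adj:
        dist = {source: 0}
        queue = deque([source])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        total += sum(dist.values())
        pairs += len(dist) - 1
    return total / pairs


# Hypothetical 5-node ring: each node is linked to its two neighbours
ring = {i: [(i - 1) % 5, (i + 1) % 5] for i in range(5)}
n = len(ring)
k = average_degree(ring)
L = average_shortest_path_length(ring)
```

A per-node state (for example a dictionary mapping each node to a label) can be attached to the same structure to represent a transition network.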
The problem is similar to the behaviour spreading problem

Key feature: mutual interaction / peer influence / word of mouth

Classic Study: Epidemic Spreading

❖ Red, gray and yellow nodes represent infected (I), susceptible (S) and recovered (R), respectively.
❖ Disease spreads through the links [interactions].

The SIR model

M. Small, C. K. Tse, and D. M. Walker, “Super-spreader and the rate of transmission of the SARS virus”, Physica D, 215 (2006) 146-158.
M. Small, D. M. Walker and C. K. Tse, “Scale-free distribution of avian influenza outbreaks”, PRL 99, 188702 (2007).
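For orientation, the textbook well-mixed form of the SIR model can be sketched in a few lines. This is the standard compartment version, not the network-based models of the papers cited above, and the rates `beta` and `gamma` as well as the initial counts are hypothetical.

```python
def simulate_sir(s0, i0, r0, beta, gamma, days, dt=0.1):
    """Forward-Euler integration of the well-mixed SIR equations:
    dS/dt = -beta*S*I/N, dI/dt = beta*S*I/N - gamma*I, dR/dt = gamma*I.
    """
    N = s0 + i0 + r0
    S, I, R = float(s0), float(i0), float(r0)
    history = [(S, I, R)]
    for _ in range(round(days / dt)):
        new_infections = beta * S * I / N * dt
        new_recoveries = gamma * I * dt
        S -= new_infections
        I += new_infections - new_recoveries
        R += new_recoveries
        history.append((S, I, R))
    return history


# Hypothetical rates (per day) and initial counts, for illustration only
history = simulate_sir(s0=990, i0=10, r0=0, beta=0.3, gamma=0.1, days=160)
```

Here every transition from S to I requires contact with an infected individual; there is no analogue of a spontaneous, self-decided transition.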
For adoption of products and services, however,
personal choice plays an important role in addition to peer influence (word of mouth).
User Growth Model

❖ Consider a social network with N nodes. Each node represents an individual who may assume one of two states: U (user) or P (prospective user).

❖ Links mimic relationships among the individuals, e.g., they are friends, relatives, family members, etc.

❖ Each link connecting a user and a prospective user is a connection along which a node can make the transition from state P to state U.
Transition Channel: Word of Mouth

❖ Transition channel: $T_1 : P + U \xrightarrow{c_1} U + U$

❖ Transition rate $c_1$

Word of mouth: a prospective user is influenced by other users.

[Diagram: a prospective user P linked to a user U becomes a user U, at rate c1]
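Transition Channel: Personal Choice

❖ Transition channel: $T_2 : (P) \xrightarrow{c_2} (U)$

❖ Transition rate $c_2$

Personal choice: a prospective user makes his own decision.

[Diagram: a prospective user P surrounded by other prospective users becomes a user U, at rate c2]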
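The two transition channels can be combined in a small stochastic simulation. The sketch below makes several assumptions that are not in the slides: the network is a ring lattice in which each node has k neighbours, time advances in steps of one day, and a prospective user with m user neighbours becomes a user with probability c1*m + c2 per day. All numerical values are hypothetical.

```python
import random


def simulate_user_growth(n_nodes, k, c1, c2, days, seed=0):
    """Simulate P -> U transitions on a ring lattice with k neighbours per node.

    Returns the number of users at the end of each day (day 0 included).
    """
    rng = random.Random(seed)
    half = k // 2
    neighbours = {
        i: [(i + d) % n_nodes for d in range(-half, half + 1) if d != 0]
        for i in range(n_nodes)
    }
    is_user = [False] * n_nodes          # everyone starts as a prospective user
    counts = [0]
    for _ in range(days):
        converts = []
        for node in range(n_nodes):
            if is_user[node]:
                continue
            user_links = sum(is_user[v] for v in neighbours[node])
            p = c1 * user_links + c2     # word of mouth + personal choice
            if rng.random() < p:
                converts.append(node)
        for node in converts:            # apply all transitions at the end of the day
            is_user[node] = True
        counts.append(sum(is_user))
    return counts


# Hypothetical parameters, for illustration only
counts = simulate_user_growth(n_nodes=500, k=4, c1=0.05, c2=0.002, days=365)
```

With c1 = 0 the first users can still appear through personal choice alone; with c2 = 0 nobody ever becomes a user, because word of mouth needs at least one existing user to start from.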
User Growth Transition

[Figure: a network of users (U) and prospective users (P) joined by connections of mutual influence, showing an influenced transition (word of mouth) and a self transition (personal choice)]

A bit of high-school math (just 2 pages)
❖ Number of users = X at time t
❖ Probability of a person making a transition in the next ∆t time = c1 ∆t or c2 ∆t
❖ The first math question we are asking:
What is the probability that there are n users after the next ∆t, if we know the conditional probability of reaching n users given that there are m users now?

$$P_n(t+\Delta t) = \sum_{m=1}^{n-1} P_m(t) \times P_{n,m}(t,\Delta t) + P_n(t) \times \left(1 - \sum_{m=n+1}^{N} P_{m,n}(t,\Delta t)\right)$$

Change of time scale
❖ Of course, ∆t can be made very small! The smallest is when there is exactly one transition in a ∆t.
❖ So, the probability becomes much simpler:
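$$P_n(t+\Delta t) = P_{n-1}(t) \times P_{n,n-1}(t,\Delta t) + P_n(t) \times \left(1 - P_{n+1,n}(t,\Delta t)\right)$$

❖ In continuous time, it is

$$\frac{dE[X(t)]}{dt} = N\alpha c_1 E[X(t)] - \alpha c_1\left(1 + \frac{\mathrm{var}(X(t))}{(E[X(t)])^2}\right)(E[X(t)])^2 + c_2 N - c_2 E[X(t)]$$

E[X(t)] = expectation of the user population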
Universal Growth Equation

$$\dot{x}(t) = \underbrace{\alpha c_1 N x - \alpha c_1 (1 + \delta(x))\, x^2}_{\text{Word of mouth}} + \underbrace{(c_2 N - c_2 x)}_{\text{Personal choice}}$$

where $\delta(x) = \mathrm{var}[X(t)]/E[X(t)]^2 \ll 1$ for large networks.

$$\dot{x}(t) \approx \underbrace{(\alpha c_1 x + c_2)}_{\text{Purchase incentive}} \times \underbrace{(N - x)}_{\text{Market size}}$$

$c_1$ = word-of-mouth growth rate
$c_2$ = personal-choice growth rate
$\alpha = \langle k \rangle / N$, a network property I did not mention

Data fitting
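The slides do not say which fitting procedure was used, so the following is only one possible sketch. It relies on the fact that the right-hand side of the growth equation is a quadratic in x, namely c2 N + (aN - c2) x - a x^2 with a = αc1, so a least-squares fit of finite-difference growth rates against 1, x and x^2 recovers N, a and c2. The helper names, the synthetic data and the "true" values are all assumptions introduced for illustration.

```python
import math


def solve_linear(A, b):
    """Solve A z = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [rhs] for row, rhs in zip(A, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    z = [0.0] * n
    for r in reversed(range(n)):
        z[r] = (M[r][n] - sum(M[r][c] * z[c] for c in range(r + 1, n))) / M[r][r]
    return z


def fit_growth(xs, dt=1.0):
    """Estimate (N, a, c2) in dx/dt = (a*x + c2)*(N - x) from evenly spaced samples."""
    scale = max(xs)                                   # rescale x for better conditioning
    ys = [x / scale for x in xs[1:-1]]
    vs = [(xs[i + 1] - xs[i - 1]) / (2 * dt) for i in range(1, len(xs) - 1)]
    # Normal equations for the quadratic model v ~ b0 + b1*y + b2*y^2
    A = [[sum(y ** (i + j) for y in ys) for j in range(3)] for i in range(3)]
    rhs = [sum(v * y ** i for v, y in zip(vs, ys)) for i in range(3)]
    b0, b1, b2 = solve_linear(A, rhs)
    A0, A1, A2 = b0, b1 / scale, b2 / scale ** 2      # undo the rescaling
    a = -A2
    N = (A1 + math.sqrt(A1 ** 2 + 4 * a * A0)) / (2 * a)  # positive root of the quadratic
    c2 = A0 / N
    return N, a, c2


def exact_curve(N, a, c2, t):
    """Solution of dx/dt = (a*x + c2)*(N - x) with x(0) = 0."""
    E = math.exp((a * N + c2) * t)
    return N * c2 * (E - 1) / (a * N + c2 * E)


# Synthetic daily data from hypothetical "true" parameters
true_N, true_a, true_c2 = 1000.0, 5e-5, 2e-3
xs = [exact_curve(true_N, true_a, true_c2, t) for t in range(201)]
N_hat, a_hat, c2_hat = fit_growth(xs)
```

Real datasets are noisy and irregularly sampled, so a nonlinear least-squares fit of the solution curve itself would usually be more robust than differencing the data.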
22 Datasets

❖ Facebook: One of the most popular online social networking (OSN) services.

❖ LinkedIn: A social network website for people in professional occupations, focused on work relationships.

❖ Evernote: A suite of software and services designed for notetaking and archiving.

❖ Tencent QQ: Popularly known as QQ, an instant messaging software service developed by Tencent Holdings Limited.

❖ Twitter: An online social networking (OSN) and microblogging service that enables users to send and read “tweets”.

❖ US hospital accounts on YouTube: The number of YouTube accounts held by US hospitals, as counted by researchers.

❖ Line: A Japanese proprietary application for instant messaging on smartphones and PCs.

❖ WeChat: A mobile text and voice messaging communication service developed by Tencent in China.

❖ Kakao Talk: A free mobile messenger application for smartphones with free text and free call features.

❖ World of Warcraft: One of the most popular MMORPGs, created by Blizzard Entertainment.

❖ Sina Weibo: One of the most popular Chinese microblogging websites.

❖ Tencent Weibo: Another Chinese microblogging website, launched by Tencent in April 2010.

❖ AND MORE …

Fitting of datasets
[Figure: data and model fits for Facebook, LinkedIn, QQ and car sales in China, Feb/01 to Jun/14]

[Figure: data and model fits for available iOS apps, downloaded iOS apps and available Android apps, Sep/08 to Jan/14]

[Figure: data and model fits for Kakao Talk, Line, Wechat and plug-in car sales, Jan/10 to Jun/14]

[Figure: data and model fits for PS2, GBA, GameCube and XBOX sales, Mar/01 to Sep/11]

[Figure: data and model fits for Evernote, Sina Weibo and TC Weibo, Feb/08 to Feb/13]

[Figure: data and model fits for Alipay and US hospital Youtube accounts (Hpl. Youtube AC), Jun/05 to Sep/09]

What’s behind the data?
Implications for business

How can we use this model?

Find N, c1 and c2 from historical growth data.

These parameters carry real meaning for business.
Parameter estimation problem

[Diagram: Model; Parameter estimation]

Parameters are INFORMATION

(Excerpt from Network Science, p. 9)

Table 1. Estimated model parameters from fitting of historical datasets. Value is fractional number of transitions to users per day, e.g., c1 = 2.25 × 10^−3 means that 2.25 out of 1000 prospective users become users per day.

Data | Market size (N) | Macro growth rate by word of mouth (c1⟨k⟩/N) | Micro growth rate by word of mouth (c1⟨k⟩) | Growth rate by personal choice (c2) | NRMSE (%)
--- | --- | --- | --- | --- | ---
Facebook | 1.29×10^9 | 1.77×10^−12 | 2.27×10^−3 | 2.25×10^−5 | 5.22
LinkedIn | 4.42×10^8 | 3.63×10^−12 | 1.60×10^−3 | 7.83×10^−6 | 4.42
Tencent QQ | 2.70×10^8 | 5.60×10^−12 | 1.51×10^−3 | 9.33×10^−13 | 11.6
Evernote | 1.35×10^8 | 2.40×10^−11 | 3.23×10^−3 | 2.31×10^−6 | 5.09
Line | 3.87×10^8 | 1.29×10^−11 | 4.99×10^−3 | 1.44×10^−4 | 3.63
Kakao Talk | 3.86×10^8 | 4.03×10^−12 | 1.56×10^−3 | 1.26×10^−4 | 3.27
Playstation 2 | 1.72×10^8 | 2.67×10^−12 | 4.60×10^−4 | 3.19×10^−4 | 3.80
Game Boy Advance SP | 4.38×10^7 | 4.66×10^−11 | 2.04×10^−3 | 8.11×10^−4 | 6.93
XBOX | 2.98×10^7 | 4.52×10^−11 | 1.34×10^−3 | 4.36×10^−4 | 8.18
Nintendo GameCube | 2.22×10^7 | 1.20×10^−10 | 2.66×10^−3 | 4.43×10^−30 | 9.86
Available iOS apps | 1.66×10^6 | 8.67×10^−10 | 1.44×10^−3 | 1.24×10^−4 | 8.72
Downloaded iOS apps | 1.02×10^11 | 2.32×10^−14 | 2.36×10^−3 | 3.48×10^−5 | 7.33
Available Android apps | 1.36×10^6 | 2.14×10^−9 | 2.91×10^−3 | 1.09×10^−4 | 5.50
Sales of US brand cars in China | 3.32×10^7 | 2.27×10^−11 | 7.56×10^−4 | 1.58×10^−5 | 2.48
Sales of plug-in vehicles in US | 6.73×10^6 | 1.63×10^−10 | 1.10×10^−3 | 8.13×10^−5 | 3.84
US hospital accounts on Youtube | 1.39×10^2 | 4.30×10^−5 | 5.93×10^−3 | 1.38×10^−4 | 4.62
Wechat | 6.16×10^8 | 8.99×10^−12 | 5.53×10^−3 | 8.76×10^−5 | 2.54
Sina Weibo | 5.95×10^8 | 7.06×10^−12 | 4.19×10^−3 | 1.23×10^−4 | 9.80
Tencent Weibo | 3.63×10^8 | 2.36×10^−11 | 8.57×10^−3 | 2.37×10^−25 | 12.4
Alipay | 4.24×10^8 | 7.38×10^−12 | 3.12×10^−3 | 3.01×10^−26 | 7.20
Internet users | 4.93×10^9 | 8.91×10^−12 | 4.39×10^−4 | 2.42×10^−5 | 6.47
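The macro and micro word-of-mouth columns of Table 1 are related through the market size: the macro rate is c1⟨k⟩/N and the micro rate is c1⟨k⟩. A short sketch that checks this relation for a few rows copied from the table (the helper name and the 2% tolerance used in the comment are choices made here, not part of the source):

```python
# (market size N, macro rate c1<k>/N, micro rate c1<k>) copied from Table 1
rows = {
    "Facebook": (1.29e9, 1.77e-12, 2.27e-3),
    "LinkedIn": (4.42e8, 3.63e-12, 1.60e-3),
    "Line": (3.87e8, 1.29e-11, 4.99e-3),
    "Wechat": (6.16e8, 8.99e-12, 5.53e-3),
}


def relative_gap(N, macro, micro):
    """Relative difference between micro/N and the tabulated macro rate."""
    return abs(micro / N - macro) / macro


gaps = {name: relative_gap(*values) for name, values in rows.items()}

# Each gap should be small (below 2%), the residue of rounding to three digits
consistent = all(gap < 0.02 for gap in gaps.values())
```

The macro rates differ by orders of magnitude between products mainly because N does; once multiplied by N, the micro rates fall in a much narrower band.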
markets and attracting users from different communities. Some datasets contain over a decade of user growth information. For instance, Tencent, 13 years ago, had only several servers and was valued less than 50,000 US dollars. Now, Tencent’s market value rose to 101 billion dollars. Facebook, as another example, began with only admitting members from the student body of Harvard, and it has now more than 1100 million users worldwide. In the past decade, many factors have contributed to the business growth of online products and services. However, their user growth profiles are all governed by a growth equation. The estimated parameters of the 21 products or services are given in Table 1. Let us take a look at c1⟨k⟩ and c2, which is the micro-level growth rate corresponding to influenced

Implication to business strategy
❖ Customer service → Word of mouth
❖ Promotion and advertisement → Personal choice
❖ Effective market → Market potential

What does the model tell us?
❖ Combined incentive: $C = \alpha c_1 x + c_2$
❖ Effective (remaining) market: $N - x$
❖ Rate of growth of user population:

$$\dot{x}(t) \approx \underbrace{(\alpha c_1 x + c_2)}_{\text{Purchase incentive}} \times \underbrace{(N - x)}_{\text{Market size}}$$
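If the approximation is treated as an equality (that is, assuming δ ≈ 0), the growth equation can be solved by separation of variables. The sketch below writes a = αc1 for brevity and x0 for the initial value x(0); both are shorthands introduced here, not notation from the slides.

```latex
\begin{aligned}
\frac{d}{dt}\,\ln\frac{a x + c_2}{N - x}
  &= \left(\frac{a}{a x + c_2} + \frac{1}{N - x}\right)\dot{x}
   = \frac{a N + c_2}{(a x + c_2)(N - x)}\,\dot{x}
   = a N + c_2, \\
\frac{a x(t) + c_2}{N - x(t)}
  &= \frac{a x_0 + c_2}{N - x_0}\, e^{(a N + c_2)\,t}, \\
x(t) &= \frac{N R(t) - c_2}{a + R(t)},
  \qquad R(t) = \frac{a x_0 + c_2}{N - x_0}\, e^{(a N + c_2)\,t}.
\end{aligned}
```

As t grows, R(t) grows without bound and x(t) approaches N. The characteristic rate is aN + c2 = c1⟨k⟩ + c2, which is why the micro-level rate c1⟨k⟩ and c2 together set the time scale of growth.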
Nintendo Wii is significantly stronger!

We saw tremendous growth in Wii, having sold 1.5 million around Christmas time, while only 1 million for XBOX and PS3.

$C = \alpha c_1 x + c_2$

[Figure, 2010: (a) Console worldwide total cumulative sales in 2010 (sales in million); (b) Combined incentive rate C(t) for XBOX360, Playstation 3 and Nintendo Wii; (c) Console worldwide weekly sales in 2010 (sales in million). Estimated and actual sales of XBOX360, PS3 and Wii, Feb10 to Feb11.]
Wii is still very strong but its growth shrank!

$C = \alpha c_1 x + c_2$

[Figure, 2012: (a) XBOX360 worldwide total cumulative sales in 2012; (b) PS3 worldwide total cumulative sales in 2012; (c) Wii worldwide total cumulative sales in 2012; (d) Combined incentive rate C(t) for XBOX360, Playstation 3 and Nintendo Wii; (e) Console worldwide weekly sales in 2012. Sales in million, estimated and actual, Jan12 to Feb13.]
Why?

In 2012

❖ N for Wii, XBOX and PS3 ≈ 100, 104 and 106 million

❖ N − x for Wii ≈ only 8 million in 2012.

❖ N − x for XBOX and PS3 ≈ 30 million and 36 million, respectively.

Nintendo knew the remaining Wii market was small and made a decision to launch Wii U!
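The arithmetic behind this argument is short enough to write out. The sketch below uses only the approximate figures quoted above (in millions) and computes how much of each console's market was already taken in 2012; the variable names are illustrative.

```python
# (market size N, remaining market N - x), in millions, as quoted above for 2012
market = {"Wii": (100, 8), "XBOX": (104, 30), "PS3": (106, 36)}

# Fraction of each market already converted into users: x / N = 1 - (N - x) / N
saturation = {
    name: 1 - remaining / size for name, (size, remaining) in market.items()
}
```

Since the growth rate is the combined incentive C multiplied by the remaining market N − x, a console with a strong incentive but a nearly exhausted market can still show shrinking sales.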
Application: growth span of products

❖ What is the time needed to reach 95% of the final (saturated) user population?
❖ Can we have some quantitative indicators for success?
❖ Any hints from data?

[Figure: Fig. 4. Lifespan of products versus c1⟨k⟩ ∈ [10^−4, 10^−2] and c2 ∈ [10^−30, 10^−2] (left panel). Lifespan of products for 10^−4 ≤ c1⟨k⟩ ≤ 10^−2 and 10^−4 c1⟨k⟩ ≤ c2 ≤ c1⟨k⟩. Green area is the desirable region for lifespan of 5 to 10 years (right panel). Axes: log10(c1⟨k⟩), log10(c2), settling time (year), lifespan time (year).]

the micro-level word-of-mouth component of the growth rate varies very little, while the personal-choice component varies widely from product to product.

A question arises at this point. Why do the 21 datasets have such consistent (similar) word-of-mouth component of the growth rate c1⟨k⟩ (all being ranged from 10^−2.8 to 10^−4)? Here, we borrow the concept of settling time from control theory, which is used for analysis of the dynamic property of a system (Phillips and Habor, 1995). Specifically, we define the lifespan of a product or service as the time required for its user population to grow from a small initial value x(t0) = αN at t = t0 to reach its final value, namely, x(te) = βN at t = te, within a small tolerance range (e.g., 5%) around x(te) for the first time. The time duration Ts = te − t0 is the lifespan of a product. Thus, the lifespan is basically the settling time commonly used in control theory, and in our case, if the user network is a homogeneous and uncorrelated network, and assuming δ ≈ 0, the lifespan of a product is given by

$$T_s(c_1\langle k\rangle, c_2) \approx \frac{\log(1-\alpha) + \log(\beta c_1\langle k\rangle + c_2) - \log(1-\beta) - \log(\alpha c_1\langle k\rangle + c_2)}{c_1\langle k\rangle + c_2} \qquad (12)$$

The derivation of the above result is given in the Appendix. In this paper, we set α = 0.02 and β = 0.95. Figure 4 shows the lifespan of a product versus c1⟨k⟩ and c2. Note that in reality, a successful product or new industry always takes several years to develop from the introduction stage to the maturity stage. Here, we assume that the lifespan of a product is about 5 to 10 years (i.e., 5 ≤ Ts ≤ 10) with the personal-choice component of the growth rate given as c2 = λc1⟨k⟩, where λ is the ratio of the personal-choice component compared to the word-of-mouth component of the growth rate. Here, we set 10^−4 ≤ λ ≤ 1. In particular, λ = 1 means that personal choice is as important as word of mouth, while λ = 10^−4 means that the impact of personal choice is negligible. Figure 4 shows that with a constant λ, the smaller the value of c2, the longer the lifespan of a product. The green area in the left panel of Figure 4 is the desirable region with lifespan Ts ranging from 5 to 10 years (black segment in the y-axis). Note that under this condition, the suitable range of c1⟨k⟩ is about 10^−3.8 to 10^−2.8 (black segment in the x-axis), which agrees with the result shown in

❖ Growth span of a product or service is given by Equation (12).
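Equation (12) is straightforward to evaluate. The sketch below assumes natural logarithms (which is what integrating the growth equation gives) and that the rates are per day, as stated in the caption of Table 1, so the result is in days. Note that α and β here are the initial and final fractions of N from the excerpt (0.02 and 0.95), not the network property α = ⟨k⟩/N used earlier. The function name is introduced for illustration; the Facebook values are copied from Table 1.

```python
import math


def lifespan_days(micro_rate, c2, alpha=0.02, beta=0.95):
    """Equation (12): time for x to grow from alpha*N to beta*N.

    micro_rate is c1<k>; both rates are per day, so the result is in days.
    """
    numerator = (
        math.log(1 - alpha)
        + math.log(beta * micro_rate + c2)
        - math.log(1 - beta)
        - math.log(alpha * micro_rate + c2)
    )
    return numerator / (micro_rate + c2)


# Facebook row of Table 1: c1<k> = 2.27e-3, c2 = 2.25e-5 (per day)
facebook_years = lifespan_days(2.27e-3, 2.25e-5) / 365
```

For the Facebook parameters the result falls inside the 5 to 10 year band discussed in the excerpt; other rows of Table 1 can be checked the same way.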
Growth span (5-10 years for successful products)

[Figure: Fig. 4, lifespan of products versus c1⟨k⟩ and c2, with the desirable region for a lifespan of 5 to 10 years]

Now check the table again! You will see amazing consistency!

The growth equation is applicable to general growth data!

We perceive applications in
❖ Data-driven business strategy formulation
❖ Population estimation
❖ Study of growth and behaviour spreading
http://www.cio.com/article/2462414/big-data/why-analytics-makes-tesla-better-than-jaguar.html
Applications of data analytics will continue to proliferate. However, real-world data are often deceptive when viewed partially or interpreted with inappropriate tools.
“Partial truth is not truth.” — Prof. Leon Chua
Applying the right scientific methods and using sufficient and right sets of data are often essential for generating information.
M.C. Escher, Dutch graphic artist (1898-1972)