
Symposium on Engineering and Operation Excellence Through Technology Innovation

May 19, 2017

How did grow? Did , LinkedIn, WeChat and others grow in the same way? — A Study of Data and a Universal Growth Law

Prof. Michael Tse
The Hong Kong Polytechnic University

https://www.webpagefx.com/internet-real-time/

The estimated total amount of data in the world was 4.4 zettabytes* (1 zettabyte = 10^21 bytes) in 2013. That is set to rise steeply to 44 zettabytes by 2020.

Every day, we produce 2.5 exabytes (1 exabyte = 10^18 bytes) of data.

Every day hundreds of millions of people take photos, make videos and send texts. Across the globe, data are collected on consumer preferences, purchases and trends.

http://www.northeastern.edu/levelblog/2016/05/13/how-much-data-produced-every-day/

*1 zettabyte = 1 trillion gigabytes, so 44 zettabytes = 44 trillion gigabytes.

At 23:55 April 29, 2017

❖ 7,595 Tweets sent in 1 second
❖ 779 photos uploaded in 1 second
❖ 1,235 posts in 1 second
❖ 2,563 calls in 1 second
❖ 44,161 GB of traffic in 1 second
❖ 60,038 searches in 1 second
❖ 69,054 YouTube videos viewed in 1 second
❖ 2,584,903 sent in 1 second

http://www.internetlivestats.com/

What are people searching on right now?

“Data by itself is useless. Data is only useful if you apply it.”

–Todd Park Former of the and Technology Advisor for U.S. President Barack Obama

Our purpose

❖ To turn data into information.
❖ To apply the science of networks to help make sense out of the data collected.

❖ Data Analytics [the process of examining datasets in order to draw conclusions about the information they contain, increasingly with the aid of specialized systems and software]

http://www.infinitdatum.com/

A glimpse at our process

Model Parameter estimation

INFORMATION

Growth Data

Do you drink coffee?

x = number of coffee drinkers

N = whole population

Everyone is either a prospective coffee drinker OR a coffee drinker.

[Figure: number of coffee drinkers versus time (days), 0 to 200]

Number of monthly active Facebook users worldwide

iPhone 7

Pace of Adoption of Products

Smartphone apps growth

www.6mobiles.com

EV growth in Hong Kong

[Figure: EV growth in Hong Kong, 1/2012 to 1/2016; vertical axis ticks from 1000 to 5000]

http://www.charged.hk/node/99?language=zh-hant

What science tells us: Universal Law

x is any growth data you want to study

$$\frac{dx}{dt} = (C_1 x + C_2) \times (N - x)$$

❖ dx/dt: how fast x grows
❖ C_1: effect of human interaction (network) on growth, i.e., word-of-mouth, customer service, branding
❖ C_2: effect of personal decision on growth, i.e., promotion, advertisement
❖ N: entire population

How do we get this?
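Before the derivation, a quick numerical sketch may help show the kind of S-shaped curve this law produces. The function name `simulate_growth`, the forward-Euler scheme and all parameter values below are illustrative assumptions, not taken from the talk.

```python
def simulate_growth(C1, C2, N, x0=0.0, dt=1.0, steps=200):
    """Forward-Euler integration of dx/dt = (C1*x + C2) * (N - x)."""
    xs = [x0]
    x = x0
    for _ in range(steps):
        x += dt * (C1 * x + C2) * (N - x)
        x = min(x, N)  # guard against Euler overshoot past the population
        xs.append(x)
    return xs


# Made-up values for illustration: 1000 people, time measured in days.
trajectory = simulate_growth(C1=5e-5, C2=1e-3, N=1000.0)
for day in (0, 50, 100, 150, 200):
    print(day, round(trajectory[day], 1))
```

Growth starts slowly (only the C2 term acts when x is near 0), accelerates as the C1 x term builds up, and flattens as the remaining population N − x shrinks.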

The Basis: Network

[Figure: a network, with a node and a link labelled]

A specific problem can be formulated by proper definitions of nodes and links.

Parameters:
• number of nodes = n
• average number of links per node = k
• average shortest length between nodes = L
• other properties studied in network science

“Transition” Network

❖ Each node assumes a state.
❖ There is more than one possible state a node can assume.
❖ The state of a node changes, as influenced by others connected to it and/or by other external factors.

The problem is similar to the behaviour spreading problem

Key feature: mutual interaction / peer influence / word of mouth

Classic Study: Epidemic Spreading

❖ Red, gray and yellow nodes represent infected I, susceptible S and recovered R, respectively.
❖ Disease spreads through the links [interactions].

The SIR model

M. Small, C. K. Tse, and D. M. Walker, “Super-spreader and the rate of transmission of the SARS virus”, Physica D, 215 (2006) 146-158.
M. Small, D. M. Walker, and C. K. Tse, “Scale-free distribution of avian influenza outbreaks”, Physical Review Letters, 99 (2007) 188702.

For adoption of products and services, however, personal choice plays an important role in addition to peer influence (word of mouth).

User Growth Model

❖ Consider a network with N nodes. Each node represents an individual who may assume one of two states: U (user) or P (prospective user).

❖ Links mimic relationships among the individuals, e.g., they are friends, relatives, family members, etc.

❖ Each link connecting a user and a prospective user is a connection along which a node can transit from state P into U.

Transition Channel: Word of Mouth

❖ Transition channel: $T_1 : P + U \xrightarrow{c_1} U + U$

❖ Transition rate $c_1$

Word of mouth: a prospective user is influenced by other users.

[Figure: a prospective user P linked to users U becomes a user U at rate c1]

Transition Channel: Personal Choice

❖ Transition channel: $T_2 : (P) \xrightarrow{c_2} (U)$

❖ Transition rate $c_2$

Personal choice: a prospective user makes his own decision.

[Figure: a prospective user P surrounded by other prospective users becomes a user U at rate c2]
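The two transition channels can be tried out on a small simulated network. The sketch below is only an illustration under stated assumptions: a random graph with roughly k links per node, a per-step transition probability of (c1 × number of user neighbours + c2) × Δt (a small-Δt approximation), and made-up values for every parameter. The name `simulate_network` is hypothetical.

```python
import random


def simulate_network(n=200, k=6, c1=0.02, c2=0.001, dt=1.0, steps=300, seed=1):
    rng = random.Random(seed)
    # Random graph with about k links per node on average (assumed topology).
    neighbours = [set() for _ in range(n)]
    for _ in range(n * k // 2):
        a, b = rng.sample(range(n), 2)
        neighbours[a].add(b)
        neighbours[b].add(a)
    is_user = [False] * n  # everyone starts as a prospective user P
    history = [0]
    for _ in range(steps):
        new_state = list(is_user)
        for node in range(n):
            if is_user[node]:
                continue  # users stay users in this model
            user_links = sum(1 for m in neighbours[node] if is_user[m])
            # Word of mouth acts once per P-U link, personal choice acts alone.
            p = min(1.0, (c1 * user_links + c2) * dt)
            if rng.random() < p:
                new_state[node] = True
        is_user = new_state
        history.append(sum(is_user))
    return history


history = simulate_network()
print(history[::50])  # number of users every 50 time steps
```

Personal choice seeds the first users; once some exist, word of mouth along P-U links takes over.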

User Growth Transition

[Figure: a network of users (U) and prospective users (P) joined by connections of mutual influence; a P node becomes U either by an influenced transition (word of mouth) or by a self transition (personal choice)]

A bit of high-school math (just 2 pages)

❖ Number of users = X at time t

❖ Probability of a person making a transition in the next ∆t time = c1∆t or c2∆t

❖ The first math question we are asking:

What is the probability that there are n users after the next ∆t, given the probability that there are m users now and the conditional transition probabilities?

$$P_n(t+\Delta t) = \sum_{m=1}^{n-1} P_m(t) \times P_{n,m}(t,\Delta t) + P_n(t) \times \left(1 - \sum_{m=n+1}^{N} P_{m,n}(t,\Delta t)\right)$$

Change of time scale

❖ Of course, ∆t can be made very small! The smallest is when there is exactly one transition in a ∆t.

❖ So, the probability becomes much simpler:

$$P_n(t+\Delta t) = P_{n-1}(t) \times P_{n,n-1}(t,\Delta t) + P_n(t) \times \left(1 - P_{n+1,n}(t,\Delta t)\right)$$

❖ In continuous time, it is

$$\frac{dE[X(t)]}{dt} = N\alpha c_1 E[X(t)] - \alpha c_1\left(1 + \frac{\mathrm{var}(X(t))}{(E[X(t)])^2}\right)(E[X(t)])^2 + c_2 N - c_2 E[X(t)]$$

E[X(t)] = expectation of user population

Universal Growth Equation

$$\dot{x}(t) = \underbrace{\alpha c_1 N x - \alpha c_1 (1+\delta(x))\,x^2}_{\text{Word of mouth}} + \underbrace{(c_2 N - c_2 x)}_{\text{Personal choice}}$$

where $\delta(x) = \mathrm{var}[X(t)]/E[X(t)]^2 \ll 1$ for large networks.

$$\dot{x}(t) \approx \underbrace{(\alpha c_1 x + c_2)}_{\text{Purchase incentive}} \times \underbrace{(N - x)}_{\text{Market size}}$$

c1 = word-of-mouth growth rate
c2 = personal-choice growth rate
α = ⟨k⟩/N, a network property

Data fitting
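The approximate growth equation can be integrated in closed form by separating variables, which is convenient when fitting data. The sketch below is one way to write that solution. The function name `users_closed_form` and the parameter values are illustrative assumptions, and the product αc1 is passed as a single number.

```python
import math


def users_closed_form(t, N, alpha_c1, c2, x0=0.0):
    """x(t) solving dx/dt = (alpha_c1*x + c2) * (N - x) with x(0) = x0 < N."""
    a = alpha_c1 * N  # equals c1*<k> when alpha = <k>/N
    y0 = x0 / N
    K = (a * y0 + c2) / (1.0 - y0)
    decay = math.exp(-(a + c2) * t)  # written with exp(-...) to avoid overflow
    return N * (K - c2 * decay) / (a * decay + K)


# Made-up values: N = 1000 people, alpha*c1 = 5e-5, c2 = 1e-3 per day.
for day in (0, 50, 100, 200):
    print(day, round(users_closed_form(day, 1000.0, 5e-5, 1e-3), 1))
```

At t = 0 the expression reduces to x0, and for large t it tends to N, the market size.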

22 Datasets

❖ Facebook: One of the most popular online social networking (OSN) services.

❖ LinkedIn: A social network for people in professional occupations, focused on work relationships.

❖ Evernote: A suite of software and services designed for notetaking and archiving.

❖ Tencent QQ: Popularly known as QQ, an instant messaging software service developed by Tencent Holdings Limited.

❖ Twitter: An online social networking (OSN) service that enables users to send and read “tweets”.

❖ US hospital accounts on YouTube: The number of YouTube accounts held by US hospitals, as counted by researchers.

❖ Line: A Japanese proprietary application for instant messaging on smartphones and PCs.

❖ WeChat: A mobile text and voice messaging communication service developed by Tencent.

❖ Kakao Talk: A free mobile messenger application for smartphones with free text and free call features.

❖ World of Warcraft: One of the most popular MMORPGs, created by Blizzard Entertainment.

❖ Sina Weibo: One of the most popular Chinese microblogging websites.

❖ Tencent Weibo: Another Chinese microblogging website, launched by Tencent in April 2010.

❖ AND MORE …

Fitting of datasets

[Figure: Facebook, LinkedIn, QQ and car sales in China, each with its model fit; Feb/01 to Jun/14]

[Figure: available iOS apps, downloaded iOS apps and available Android apps, each with its model fit; Sep/08 to Jan/14]

[Figure: Kakao Talk, Line, Wechat and plug-in car sales, each with its model fit; Jan/10 to Jun/14]

[Figure: PS2, GBA, GameCube and XBOX sales, each with its model fit; Mar/01 to Sep/11]

[Figure: Evernote, Sina Weibo and TC Weibo, each with its model fit; Feb/08 to Feb/13]

[Figure: Alipay and Hpl. Youtube AC, each with its model fit; Jun/05 to Sep/09]

What’s behind the data?

37 Implication for

How can we use this model?

Finding N, c1, c2 from historical growth data.

These parameters have profound meanings to business.
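As a rough illustration of how N, αc1 and c2 might be recovered from a growth curve, the sketch below uses the fact that the approximate growth equation is quadratic in x: dx/dt = A + Bx + Dx², with A = c2N, B = αc1N − c2 and D = −αc1. It estimates dx/dt by finite differences, fits the quadratic by least squares, and reads N off as the positive root. This is an assumed procedure for illustration, not necessarily the fitting method behind the results in this talk, and the data are synthetic, generated from known made-up parameters.

```python
import math


def closed_form(t, N, alpha_c1, c2):
    """Synthetic data: solution of the growth law with x(0) = 0."""
    a = alpha_c1 * N
    decay = math.exp(-(a + c2) * t)
    return N * (c2 - c2 * decay) / (a * decay + c2)


def solve3(M, v):
    """Solve a 3x3 linear system by Gaussian elimination with pivoting."""
    A = [row[:] + [rhs] for row, rhs in zip(M, v)]
    for col in range(3):
        pivot = max(range(col, 3), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(col + 1, 3):
            f = A[r][col] / A[col][col]
            for c in range(col, 4):
                A[r][c] -= f * A[col][c]
    sol = [0.0] * 3
    for r in (2, 1, 0):
        s = A[r][3] - sum(A[r][c] * sol[c] for c in range(r + 1, 3))
        sol[r] = s / A[r][r]
    return sol


def estimate_parameters(xs, dt=1.0):
    """Fit dx/dt = A + B*y + D*y^2 (y = x/scale), then map to (N, alpha*c1, c2)."""
    scale = max(xs)  # rescale x so the normal equations stay well conditioned
    ys = [x / scale for x in xs[1:-1]]
    rates = [(xs[i + 1] - xs[i - 1]) / (2 * dt) for i in range(1, len(xs) - 1)]
    basis = [[1.0, y, y * y] for y in ys]
    M = [[sum(b[i] * b[j] for b in basis) for j in range(3)] for i in range(3)]
    v = [sum(b[i] * r for b, r in zip(basis, rates)) for i in range(3)]
    A, B, D = solve3(M, v)
    # (alpha*c1*x + c2) * (N - x) = 0 has its positive root at x = N.
    y_root = (-B - math.sqrt(B * B - 4 * A * D)) / (2 * D)
    N = y_root * scale
    return N, -D / scale ** 2, A / N


# Synthetic daily data from made-up "true" parameters.
true_N, true_alpha_c1, true_c2 = 1000.0, 5e-5, 1e-3
data = [closed_form(t, true_N, true_alpha_c1, true_c2) for t in range(301)]
N_hat, alpha_c1_hat, c2_hat = estimate_parameters(data)
print(N_hat, alpha_c1_hat, c2_hat)
```

With noisy real data, differentiating by finite differences amplifies noise, so a direct nonlinear fit of the growth curve would likely be more robust than this sketch.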

Parameter estimation problem

Model Parameter estimation

Parameters are

INFORMATION

ZU064-05-FPR user-growth˙2015˙07˙22˙blinded 23 July 2015 8:47

Network Science 9

Table 1. Estimated model parameters from fitting of historical datasets. Value is fractional number of transitions to users per day, e.g., c1 = 2.25 × 10^−3 means that 2.25 out of 1000 prospective users become users per day.

Columns: market size (N); macro growth rate by word of mouth (c1⟨k⟩/N); micro growth rate by word of mouth (c1⟨k⟩); growth rate by personal choice (c2); NRMSE (%).

Data                               N           c1⟨k⟩/N       c1⟨k⟩        c2            NRMSE (%)
Facebook                           1.29×10^9   1.77×10^−12   2.27×10^−3   2.25×10^−5    5.22
LinkedIn                           4.42×10^8   3.63×10^−12   1.60×10^−3   7.83×10^−6    4.42
Tencent QQ                         2.70×10^8   5.60×10^−12   1.51×10^−3   9.33×10^−13   11.6
Evernote                           1.35×10^8   2.40×10^−11   3.23×10^−3   2.31×10^−6    5.09
Line                               3.87×10^8   1.29×10^−11   4.99×10^−3   1.44×10^−4    3.63
Kakao Talk                         3.86×10^8   4.03×10^−12   1.56×10^−3   1.26×10^−4    3.27
PlayStation 2                      1.72×10^8   2.67×10^−12   4.60×10^−4   3.19×10^−4    3.80
Game Boy Advance SP                4.38×10^7   4.66×10^−11   2.04×10^−3   8.11×10^−4    6.93
XBOX                               2.98×10^7   4.52×10^−11   1.34×10^−3   4.36×10^−4    8.18
Nintendo GameCube                  2.22×10^7   1.20×10^−10   2.66×10^−3   4.43×10^−30   9.86
Available iOS apps                 1.66×10^6   8.67×10^−10   1.44×10^−3   1.24×10^−4    8.72
Downloaded iOS apps                1.02×10^11  2.32×10^−14   2.36×10^−3   3.48×10^−5    7.33
Available Android apps             1.36×10^6   2.14×10^−9    2.91×10^−3   1.09×10^−4    5.50
Sales of US brand cars in China    3.32×10^7   2.27×10^−11   7.56×10^−4   1.58×10^−5    2.48
Sales of plug-in vehicles in US    6.73×10^6   1.63×10^−10   1.10×10^−3   8.13×10^−5    3.84
US hospital accounts on Youtube    1.39×10^2   4.30×10^−5    5.93×10^−3   1.38×10^−4    4.62
Wechat                             6.16×10^8   8.99×10^−12   5.53×10^−3   8.76×10^−5    2.54
Sina Weibo                         5.95×10^8   7.06×10^−12   4.19×10^−3   1.23×10^−4    9.80
Tencent Weibo                      3.63×10^8   2.36×10^−11   8.57×10^−3   2.37×10^−25   12.4
Alipay                             4.24×10^8   7.38×10^−12   3.12×10^−3   3.01×10^−26   7.20
Internet users                     4.93×10^9   8.91×10^−12   4.39×10^−4   2.42×10^−5    6.47
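The macro and micro word-of-mouth columns of Table 1 are linked through the market size: the macro rate is the micro rate c1⟨k⟩ divided by N. A short check on four rows copied from the table illustrates this. The helper name `relative_gap` is made up for this sketch, and small gaps are expected because the table values are rounded to three significant figures.

```python
rows = {
    # name: (N, macro rate c1<k>/N, micro rate c1<k>), copied from Table 1
    "Facebook": (1.29e9, 1.77e-12, 2.27e-3),
    "LinkedIn": (4.42e8, 3.63e-12, 1.60e-3),
    "Line": (3.87e8, 1.29e-11, 4.99e-3),
    "Wechat": (6.16e8, 8.99e-12, 5.53e-3),
}


def relative_gap(N, macro, micro):
    """Relative difference between micro/N and the tabulated macro rate."""
    return abs(micro / N - macro) / macro


gaps = {name: relative_gap(*values) for name, values in rows.items()}
for name, gap in gaps.items():
    print(f"{name}: {gap:.2%}")
```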

markets and attracting users from different communities. Some datasets contain over a decade of user growth information. For instance, Tencent, 13 years ago, had only several servers and was valued at less than 50,000 US dollars. Now, Tencent’s market value has risen to 101 billion dollars. Facebook, as another example, began by admitting only members of the student body of Harvard, and it now has more than 1100 million users worldwide. In the past decade, many factors have contributed to the business growth of online products and services. However, their user growth profiles are all governed by a growth equation. The estimated parameters of the 21 products or services are given in Table 1. Let us take a look at c1⟨k⟩ and c2, which is the micro-level growth rate corresponding to influenced

Implication to business strategy

❖ Customer service ↔ Word of mouth

❖ Promotion and advertisement ↔ Personal choice

❖ Effective market ↔ Market potential

What does the model tell us?

❖ Combined incentive: C = αc1x + c2
❖ Effective (remaining) market: N − x
❖ Rate of growth of user population:

$$\dot{x}(t) \approx \underbrace{(\alpha c_1 x + c_2)}_{\text{Purchase incentive}} \times \underbrace{(N - x)}_{\text{Market size}}$$

We saw tremendous growth in Wii, which sold 1.5 million around Christmas time, compared with only 1 million for XBOX and PS3. Nintendo Wii is significantly stronger!

C = αc1x + c2

[Figure, 2010: (a) Console worldwide total cumulative sales in 2010 (in million); (b) Combined incentive rate C(t) for XBOX360, Playstation 3 and Nintendo Wii; (c) Console worldwide weekly sales in 2010 (in million). Actual and estimated sales are shown for Xbox360, PS3 and Wii, Feb10 to Feb11.]

Wii is still very strong but its growth shrank!

C = αc1x + c2

[Figure, 2012: (a) XBOX360, (b) PS3 and (c) Wii worldwide total cumulative sales in 2012 (in million), actual and estimated; (d) Combined incentive rate C(t) for XBOX360, Playstation 3 and Nintendo Wii; (e) Console worldwide weekly sales in 2012 (in million), actual and estimated.]

Why?

In 2012

❖ N for Wii, XBOX and PS3 ≈ 100, 104 and 106 million

❖ N – x for Wii ≈ only 8 million in 2012.

❖ N – x for XBOX and PS3 ≈ 30 million and 36 million, respectively.

Nintendo knew the Wii market was small and made a decision to launch Wii U!

Application: growth span of products

❖ What is the time needed to reach 95% of the final (saturated) user population?

❖ Can we have some quantitative indicators for success?

❖ Any hints from data?

❖ Growth span of a product or service is given by Eq. (12) in the paper excerpt that follows.

[Fig. 4. Lifespan of products versus c1⟨k⟩ ∈ [10^−4, 10^−2] and c2 ∈ [10^−30, 10^−2] (left panel). Lifespan of products for 10^−4 ≤ c1⟨k⟩ ≤ 10^−2 and 10^−4 c1⟨k⟩ ≤ c2 ≤ c1⟨k⟩. Green area is the desirable region for lifespan of 5 to 10 years (right panel).]

the micro-level word-of-mouth component of the growth rate varies very little, while the personal-choice component varies widely from product to product.

A question arises at this point. Why do the 21 datasets have such consistent (similar) word-of-mouth component of the growth rate c1⟨k⟩ (all ranging from 10^−2.8 to 10^−4)? Here, we borrow the concept of settling time from control theory, which is used for analysis of the dynamic property of a system (Phillips and Habor, 1995). Specifically, we define the lifespan of a product or service as the time required for its user population to grow from a small initial value x(t0) = αN at t = t0 to reach its final value, namely, x(te) = βN at t = te, within a small tolerance range (e.g., 5%) around x(te) for the first time. The time duration Ts = te − t0 is the lifespan of a product. Thus, the lifespan is basically the settling time commonly used in control theory, and in our case, if the user network is a homogeneous and uncorrelated network, and assuming δ ≈ 0, the lifespan of a product is given by

$$T_s(c_1\langle k\rangle, c_2) \approx \frac{\log(1-\alpha) + \log(\beta c_1\langle k\rangle + c_2) - \log(1-\beta) - \log(\alpha c_1\langle k\rangle + c_2)}{c_1\langle k\rangle + c_2} \qquad (12)$$

The derivation of the above result is given in the Appendix. In this paper, we set α = 0.02 and β = 0.95. Figure 4 shows the lifespan of a product versus c1⟨k⟩ and c2. Note that in reality, a successful product or new industry always takes several years to develop from the introduction stage to the maturity stage. Here, we assume that the lifespan of a product is about 5 to 10 years (i.e., 5 ≤ Ts ≤ 10) with the personal-choice component of the growth rate given as c2 = λc1⟨k⟩, where λ is the ratio of the personal-choice component compared to the word-of-mouth component of the growth rate. Here, we set 10^−4 ≤ λ ≤ 1. In particular, λ = 1 means that personal choice is as important as word of mouth, while λ = 10^−4 means that the impact of personal choice is negligible. Figure 4 shows that with a constant λ, the smaller the value of c2, the longer the lifespan of a product. The green area in the left panel of Figure 4 is the desirable region with lifespan Ts ranging from 5 to 10 years (black segment in the y-axis). Note that under this condition, the suitable range of c1⟨k⟩ is about 10^−3.8 to 10^−2.8 (black segment in the x-axis), which agrees with the result shown in …
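As a quick numerical illustration of Eq. (12), the sketch below evaluates the lifespan for one row of Table 1. It assumes that log denotes the natural logarithm (as obtained when the growth equation is integrated by separation of variables), that the rates are per day as stated in the table caption, and that a year has 365 days. The function name `lifespan_days` is made up for this example.

```python
import math


def lifespan_days(c1k, c2, alpha=0.02, beta=0.95):
    """Eq. (12): time for x to grow from alpha*N to beta*N (natural log assumed)."""
    numerator = (math.log(1 - alpha) + math.log(beta * c1k + c2)
                 - math.log(1 - beta) - math.log(alpha * c1k + c2))
    return numerator / (c1k + c2)


# Facebook row of Table 1: c1<k> = 2.27e-3 and c2 = 2.25e-5 per day.
years = lifespan_days(2.27e-3, 2.25e-5) / 365.0
print(round(years, 1))
```

Under these assumptions the Facebook parameters give a lifespan inside the 5 to 10 year band discussed in the excerpt, and lowering c2 with c1⟨k⟩ fixed lengthens it.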

Growth span (5-10 years for successful products)

[Figure: Fig. 4, lifespan of products versus c1⟨k⟩ and c2; the green area is the desirable region for lifespan of 5 to 10 years]

Now check the table again! You will see amazing consistency!

The growth equation is applicable to general growth data!

We perceive applications in

❖ Data-driven business strategy formulation
❖ Population estimation
❖ Study of growth and behaviour spreading

http://www.cio.com/article/2462414/big-data/why-analytics-makes-tesla-better-than-jaguar.html

Applications of data analytics will continue to proliferate. However, real-world data are often deceptive when viewed partially or interpreted with inappropriate tools.

“Partial truth is not truth.” — Prof. Leon Chua

Applying the right scientific methods and using sufficient and appropriate sets of data are often essential for generating information.

M.C. Escher, Dutch graphic artist (1898-1972)