Malaysia Statistics Conference (MyStats 2015) Proceedings

Enriching Statistics in an Inter-connected and Digital World

17 November 2015, Sasana Kijang

Organised by Bank Negara Malaysia, the Department of Statistics Malaysia and Institut Statistik Malaysia

Contents

Foreword...... 5

Conference Summary...... 7

Keynote Address Senator Dato' Sri Abdul Wahid Omar...... 12

Closing Remarks Dr. Mohd Uzir Mahidin...... 17

Advancement of Data Transmission through METS Tan Bee Bee...... 19

Synergising GST Rate with Direct Tax Rate in Sustaining Economic Growth in Malaysia: Is There A Laffer Curve? Sherly George and Y.B. Prof Dr Syed Omar Syed Agil...... 23

Development of ICT in Garnering Statistics Sabri Omar...... 39

DESA: Growing the Digital Economy from a National Perspective Mohd Jalallul @ Jasni Zain Mohd Isa, Syahida Ismail and Nur Asyikin Abdul Najib...... 53

A Fuzzy Approach to Enhance Uncertain Post Flood Damage Assessment for Quality Risk Analysis Dr. Sharifah Sakinah Syed Ahmad and Emaliana Kasmuri ...... 61

Malaysian Household Consumption Expenditure: Rural vs Urban Dr. Wan Zawiah Wan Zin and Siti Fatin Nabilah...... 69

Open Data and Challenges Faced by National Statistics Office Siti Haslinda Mohd Din, Nur Aziha Mansor and Faiza Rusrianti Tajul Arus...... 77

Multivariate Time Series Similarity-Based Complex Network In Stocks Market Analysis: Case of NYSE During Global Crisis 2008 Professor Dr. Maman Abdurachman Djauhari and Gan Siew Lee...... 83

Perceived Happiness and Self-rated Health: The Twins? A Bivariate Ordered Probit Models Analysis Using World Value Survey Koay Ying Yin, Eng Yoke Kee and Wong Chin Yoong...... 89

Prediction of Stock Index Using Autoregressive Integrated Moving Average and Artificial Neural Network Angie Tan Sze Yiing and Chan Kok Thim...... 95

The Cyclical Extraction Method and the Causality Model in Business Cycle Analyses: Do they complement or clash to one another? Abdul Latib Talib...... 103

Inflation of a Type II Error Rate in Three-Arm Non-Inferiority Trials Dr. Nor Afzalina Azmee...... 115

Generalized autoregressive moving average: An application to GDP in Malaysia Dr. Thulasyammal Ramiah Pillai...... 121

A Survey on User's Perceptions of Electric Vehicles for Mobility in a Malaysian University Siti Azahirah Asmal, Safiah Sidek, Sabrina Ahmad, Massila Kamalrudin, Aminah Ahmad and Mohamad Tahkim Salahudin...... 125

Academic Achievement on Student Motivation: Latent Class Analysis Across Gender Group Dr. Nurulkamal Masseran, Zainudin Awang, M.A.M. Asri, Ahmad Nazim Aimran and Hidayah Razali...... 133

Chikungunya Disease Mapping in Malaysia: an Analysis Based on SMR Method, Poisson-gamma Model, SIR-SI Model 1 and SIR-SI Model 2 Dr. Nor Azah Samat and S. H. Mohd Imam Ma’arof...... 143

Defect Identification: Application of Statistical Process Control in Production Line Erni Tanius, Noorul Ameera Adnan, Sharifah Zuraidah Syed Abdul Jalil, Che Manisah Mohd Kasim ...... 149

Capture, Organize and Analyze Big Data for Research Enterprise Rakhisah Mat Zin...... 155

Pattern of Leverage Regions Iffah Atqa, Sonia Hajar Marophahita, Leilya Kartika, Khodija Kamila, Risna Yuliani and Tudzla Hernita...... 161

Foreword

On 17 November 2015, Bank Negara Malaysia, in collaboration with the Department of Statistics, Malaysia and Institut Statistik Malaysia, hosted the Third National Statistics Conference, MyStats 2015, at Sasana Kijang, Bank Negara Malaysia. The theme of the conference was "Enriching Statistics in an Inter-connected and Digital World". MyStats 2015, in comparison with the second conference held in 2013, attracted a larger number and wider range of participants, including statistical compilers, statisticians, analysts, economists, policy makers and academicians in Malaysia.

The key objective of MyStats is to provide a collaborative platform for compilers and users of statistics to share, discuss and highlight issues in statistical analysis and policy formulation as well as challenges in the compilation and communication of statistics. The topics discussed during MyStats 2015 include: (i) New Frontier in Statistics; (ii) Fulfilling Statistical Needs of Digital Citizens; (iii) Evolution in Data and Information Capturing; (iv) Open Data to Maximise Usage and Value of Statistics; (v) Statistical Methodology & Application; and (vi) Big Data Analytics. For the first time, a session with young statisticians was introduced to provide the opportunity for young statisticians, students and practitioners in their early career to discuss challenges, opportunities and the future of statistics.

Twenty-nine papers were presented and about five hundred participants took part in the conference. The Keynote Address was delivered by YB Senator Dato' Sri Abdul Wahid Omar, Minister in the Prime Minister's Department, Malaysia. This conference volume is a collection of the conference summary, Keynote Address, Closing Remarks and the eighteen papers presented during the conference.

Conference Summary

The theme of the Third National Statistics Conference (MyStats 2015), "Enriching Statistics in an Inter-connected and Digital World", is very relevant as the world today is highly inter-connected due to greater financial and economic integration and advancement in information technology. The borderless world, the fast pace of information exchange and the demand for more information have created the need for additional or new statistics for analysis, diagnosis and decision making. The twenty-nine presentations, which were organised into six sessions, one roundtable discussion and a panel session, focused on six main areas:

1. New Frontier in Statistics;
2. Fulfilling Statistical Needs of Digital Citizens;
3. Evolution in Data and Information Capturing;
4. Open Data to Maximise Usage and Value of Statistics;
5. Statistical Methodology and Application; and
6. Big Data Analytics.

Session 1: New Frontier in Statistics

The first session of the conference, entitled "New Frontier in Statistics", served as a prelude to the discussions of the conference theme. This session, with Professor Roberto Rigobon from the MIT Sloan School of Management and Zakiah Jaafar from the Economic Planning Unit as speakers, was chaired by Professor Kamarulzaman Ibrahim, President of Institut Statistik Malaysia (ISM). In light of the significant impact of the data revolution in this digital world, Professor Roberto Rigobon, in his presentation "From Organic to Designed Data: the Billion Prices Project", discussed the use of technology to measure life and economic developments in the context of big data analytics. As an example, technology is used to analyse text messages or the pitch of voice to detect and prevent untoward incidents for persons in distress. In measuring economic conditions, technology could facilitate the collection of data on market prices from a broad base of digital sources and hence help to form a measure of the contemporaneous inflation rate. In this sense, technology offers a means to fill data gaps, particularly in meeting the urge for reliable, up-to-date and high-frequency statistics for a better understanding of economic conditions.

Professor Rigobon argued that there are two main types of data relevant for this purpose, namely Designed Data and Organic Data. The former is compiled by means of administration, surveys or compliance reporting, while the latter is akin to big data, which is further divided into transactional and aspirational sub-categories. Aspirational data are collected using tracking technology from digital social networks, such as Facebook and Twitter accounts, and Professor Rigobon contended that although transactional data are ideal for analysis, they are not readily available.

Professor Rigobon also shared with the audience his involvement in three projects in measuring the inflation rate, namely the "Billion Prices Project", "Online Data on Real Exchange Rate and Purchasing Power Parity", and "Natural Disaster vis-à-vis Economic Demands". The "Billion Prices Project" is an academic initiative that uses prices collected daily from hundreds of online retailers around the world, but the setback is that the data collected are very difficult to digest as these are aspirational data. The Purchasing Power Parity project emphasised a price index that represents the value of comparable goods in different locations; in this case, the famous Big Mac Index is used. The third project infers economic demand from natural disaster data: consumption patterns change when an economy is hit by a natural disaster, and in this case natural disaster data, be they aspirational or transactional, are both useful to gauge and optimise the time to dispatch assistance to affected economies. Professor Rigobon concluded that while technology facilitates broader collection of data, it is equally important to build capacity in deciphering the voluminous data collected meaningfully to answer questions.
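The mechanics behind such an online-price gauge can be sketched briefly. The snippet below is a toy illustration of the general idea only, not the Billion Prices Project's actual methodology; the basket, the prices and the geometric-mean chaining are all assumptions for demonstration.

```python
# Toy daily price index from scraped online prices (hypothetical data).
# Chain the geometric mean of item-level day-on-day price relatives.
import numpy as np

# rows = consecutive days, columns = items in a fixed basket (prices in RM)
prices = np.array([
    [10.00, 4.50, 7.20],
    [10.10, 4.50, 7.25],
    [10.15, 4.55, 7.25],
])

relatives = prices[1:] / prices[:-1]                     # item-level ratios
steps = relatives.prod(axis=1) ** (1 / prices.shape[1])  # geometric mean per day
index = 100 * np.concatenate(([1.0], steps)).cumprod()   # base day = 100
print(index)  # a daily, near-real-time inflation gauge
```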

Acknowledging the impact of the rapidly changing external environment and the need to realign policy priorities, Zakiah Jaafar (Economic Planning Unit) discussed the topic "Changing Needs of Statistics Towards Becoming an Advanced Nation". She noted that major global megatrends, such as the rapid urbanisation rate, accelerating technological change, an ageing world and greater global connections, are key challenges confronting the success of the country in achieving developed nation status by 2020. She reckoned that it is necessary for the compilation of statistics to adapt to global developments, and Big Data is an alternative avenue for this purpose. In relation to this, she highlighted that the country is dealing with these challenges by shifting the national development policy focus from technological and growth generation to that of the people economy. To facilitate this process, national statistical agencies will need to enhance their data compilation to include indicators on productivity and wellbeing as well as social aspects. In addition, she discussed the six strategic thrusts of the Eleventh Malaysia Plan (2016-2020) to address the needs of the people and the six game changers to accelerate Malaysia's development. She also highlighted key challenges ahead in the compilation of statistics to support development planning. These challenges include the need to compile data at the entity level rather than for the aggregated economy as a whole; more environment-related data to measure sustainable development; a data compilation perspective that covers the people economy and not just the capital economy; higher frequency of data in the fast-paced environment; employing the concept of a data warehouse whereby different datasets are integrated for holistic analysis; and, finally, that it is timely to embrace big data analysis. In conclusion, she acknowledged that the official statistical compilers have made substantive progress in meeting users' needs, but there is still room for improvement in the method of statistical compilation, especially in this rapidly changing external environment and realignment with policy priorities.

Session 3C: Roundtable Discussion for Young Statisticians: How Could Young Statisticians Help in Raising Statistical Awareness?

The roundtable discussion on "How Could Young Statisticians Help in Raising Statistical Awareness?" was chaired by Toh Hock Chai (Bank Negara Malaysia) and Dr. Nurulkamal Masseran (Universiti Kebangsaan Malaysia). The session was well represented by a diverse group of young statisticians, comprising Mahdir Bahar (Department of Statistics, Malaysia (DOSM)), K. Megala Kumaran (DOSM), Ewilly J.Y. Liew (Monash University), Dr. Zamira Hasanah Zamzuri (Universiti Kebangsaan Malaysia), Dr. Norhaslinda Ali (Universiti Putra Malaysia), Chia Yi Han (Frost & Sullivan Asia Pacific), Goh Wei Ping (CTOS) and Rayyan Teh Hassan (Bank Negara Malaysia). The session was divided into sub-topics which discussed the statistics programme in universities; the profession as statisticians; contributions and recognition of statisticians; industries and practitioners' networks; and promoting statistics to the young generation and certification of statisticians as professionals.

The first sub-topic talked about how to attract outstanding students to enrol in statistics programmes, and how to make statistics programmes in universities more interesting and aligned with industry needs. It is critical to ensure that a balance is struck between theory and practice in a statistics programme's curriculum to prepare students to meet the demands of the industry. It will also help students to connect the theories learnt and relate them to the real world of statistics. To achieve this, industrial components must be embedded into the syllabus in different forms, such as project papers, industry case studies, knowledge sharing by practitioners, and visits between industry and the university. Internships and fellowships are also ideal ways to gain hands-on experience whilst still in school.

Consequently, a successful industry-university partnership is fundamental. To cope with the dynamic nature of statistics, it was suggested that the curricula be reviewed more frequently. Furthermore, students should also be exposed to raw data and learn how to draw meaningful information from them. DOSM has been signing Memoranda of Understanding with universities to facilitate the sharing of raw and granular data. As statisticians usually work with statistical software, computer programming courses are particularly helpful for students.

The profession of statistician in Malaysia is commonly associated with the national statistics office, and job opportunities for statisticians were not so noticeable to students compared to other popular professions. The profession is in fact spread across many related fields, and statisticians may be known as quantitative analysts, market research analysts, data analysts or data scientists. For Malaysia, the percentage of statisticians as a ratio of the total population is still very low compared to other countries. Global job growth for statisticians is projected to be high and rising as a result of more widespread use of statistical analysis for data-informed decision-making. Additionally, an extensive quantity of data is produced from internet searching, and the use of social media, smartphones and other mobile devices will open up new areas of information for analysis. This will increase industry's need for statisticians to make sense of these unstructured data and to analyse large amounts of data. Job opportunities are anticipated to be encouraging for those with high levels of statistical knowledge, data analysis skills and computer programming skills. The universities and industries need to work together to promote the profession to students. The students in turn also need to promote their skills and potential to find suitable jobs.

Echoing the previous sub-topics, the sub-topic on 'Industries and Practitioners Network' addressed the role of industry leaders and practitioners in supporting statisticians in raising statistical literacy. While it is easy to develop statistics programmes in universities that can shape students to fit the industry, the support of industry leaders and practitioners is essential. The industries are also in need of statisticians, as the use of statistical analysis is becoming more critical to support informed decisions. Industry players, practitioners and universities all have their roles and must work together in promoting statistical awareness and adoption. The industries can also provide scholarships to encourage more students to further their study in statistics.

The last sub-topic talked about how to promote statistics to the young generation, and about certification as professionals, which is necessary to improve the career of statisticians. In order to familiarise the younger generation with statistics, the exposure ought to begin at an early stage. The publicity should start as early as primary or secondary school via activities such as mathematics school camps or mathematics summer camps to get students familiar with numbers. More promotion should be done in collaboration with various parties such as the national statistics office, universities, regulators, policy makers and industry players. For example, students in secondary schools should be introduced to statistics as one of the areas of specialisation offered by the universities. With regard to professional certification, even though there is a body that offers a professional qualification in statistics, namely the Royal Statistical Society in the United Kingdom, awareness of it is still not widespread among statisticians in Malaysia. This professional certification can be an added advantage to practitioners of statistics since it gives formal recognition of a member's statistical qualifications as well as professional training and experience.

The chairs concluded by emphasising the importance of close collaboration between practitioners, industry leaders and universities in raising statistical awareness. More promotion should be done and, in fact, should begin at an early stage. In order to ensure statistics are tailored to the needs of the various users, statisticians must prepare themselves by increasing their knowledge and skills as well as their understanding of industrial statistical requirements. Computer programming skills may provide a competitive advantage to statisticians.

Panel Discussion: Statistics in an Inter-connected and Digital World: Moving Forward

The panel session on "Statistics in an Inter-connected and Digital World: Moving Forward" was chaired by Dr. Mohd Uzir Mahidin (Department of Statistics, Malaysia). Well represented by a diverse group of stakeholders, including researchers, practitioners, industry professionals and policymakers, the session discussed and called for the compilation of relevant and representative new data to meet users' needs for analysis and policy formulation. Nazaria Baharudin (DOSM), as the first panellist, talked about the transformation involved in the statistical business in the digital age. She explained that DOSM has progressively embraced advanced technological infrastructure in statistical activity to reduce the cost of traditional data compilation and meet the increasingly complex demands of users. DOSM is no stranger to handling huge and voluminous data sets to produce trade and finance statistics. However, she believed that DOSM can leverage on the increase in technological capacity in big data and open data by tapping into other potential sources of transactional data to complement the currently available official data sources. In this regard, she stressed the need for efficient data mining to ensure non-violation of security, privacy and intellectual property, as well as reliability and quality of data. In her opinion, sound, sophisticated and up-to-date ICT infrastructure must be in place to enable DOSM to embark into big data. Additionally, the organisation must be equipped with skilled human resources to add value to the mined data. She concluded that big data is a big opportunity for the statistical community.

Dr. Yeah Kim Leng (Malaysia University of Science and Technology) started by acknowledging the involvement of DOSM in handling large and comprehensive data such as Census data sets. Elaborating from the perspective of a user, he viewed that industries, policy-makers and analysts have to be aware of the availability of an abundance of data from various sources and consider the possibility of making use of these data in analysing and making sense of economic, social and political issues occurring around the globe. The rapid growth and development of data processing and analytics can be utilised to tap into these large data sets. In order to ensure that the analysis of the data tapped is reflective of the real-world situation, he cautioned on the importance of ensuring the accurate interpretation of these data. Hence, users need to be properly educated on the various sources of macro-level and micro-level data. Similarly, data providers should also work towards gradually bridging the gaps between these two data sources to better facilitate users. In his opinion, users in this country rely heavily on official statistics, where only so much aggregated-level information can be provided. Nevertheless, market sentiments cannot be captured through aggregated data and should instead be evaluated via other possible sources of big data. Moving forward, Dr. Yeah talked about the need for industry leaders to develop the right tools to leverage on big data. He stated three major areas that require investment to tap the full potential of big data within the industry and business communities: firstly, to establish awareness, understanding and skills to use big data; secondly, mutual data sharing between academia and the public and private sectors, which could help in identifying the full potential of the usage of big data; and finally, utilisation of the abundance of statistics across various disciplines. He concluded that there is a need to move from a data-centric to a knowledge-centric community, with mutual data sharing between the different segments of society.

The third panellist, Professor Dr. Azami Zaharim (ISM), shared the five key words of statistics, namely collection, organisation, presentation, analysis and interpretation. Meanwhile, the five main ideas of the digital world are the electronic spectrum, create, store, synchronise and send. The philosophy of statistical science is that it is a branch of knowledge that is essentially the study of uncertainty. In his opinion, the heterogeneity of course structures and skill sets has led to gaps in understanding in other fields in the application of statistics. One of the roles of a statistician is to assist others who encounter uncertainty in their work. To conclude, he believed that statisticians play the most pivotal role in making use of big data as one of the effective tools to enhance the production, analysis and use of relevant and up-to-date statistics.

The final panellist, Mr. Tan Yaw (Pivotal), provided an industrial perspective on big data. He perceived big data as a by-product of other machinery, such as data from computer-based or computer-powered engines. These data capture the life cycle of the machinery and provide insight into other fields, such as predictive maintenance of machinery and improvements in the efficiency of these machines' processing power and endurance, among others. He stressed the need to be more proactive with the availability of these data to facilitate future production plans, develop new tools and machinery, as well as venture into new business areas. He believed that big data will impact statistical models, i.e. provide new ways of perceiving data and developing the best business models. He concluded that the ability to communicate the data effectively is the most essential element that must be worked on to ensure maximum benefit from mining big data.

Bringing the session to an end, the chair concluded that producers and users of statistics must move forward collectively to leverage on the advancement of technology and the abundance of data around us in turning them into meaningful information.

Keynote Address

Senator Dato’ Sri Abdul Wahid Omar 1 Economic progress and development

Alhamdulillah. Let us together express our utmost gratitude to Allah S.W.T., for it is by His abundant grace and by His leave that we are able to gather this morning for the third Malaysia Statistics Conference, MyStats 2015. May all our efforts receive His blessings and His favour.

First of all, I would like to take this opportunity to thank the organising committee for inviting me to deliver the keynote address and subsequently to officiate MyStats 2015, which is part of the activities celebrating World Statistics Day 2015.

I would like to congratulate the organisers, the Department of Statistics, Malaysia (DOSM), Bank Negara Malaysia and the Institute of Statistics Malaysia (ISM), on their third successful joint effort. This demonstrates good synergy and "breaking-the-silos" in improving inter-agency co-operation to implement high-impact programmes.

The meeting of minds and exchange of ideas through this conference will enable us to synergise collaboration and unleash the full potential of statistical capacity within our agencies and corporations to benefit the nation as a whole. On that score, I would like to record our appreciation to the speakers, panellists and contributors to the event.

My remarks today will touch on three key areas: the country's current economic progress and development; the challenges faced by the statistical community in the inter-connected and digital world; and finally the significance of MyStats 2015.

Economic progress and development

Ladies and Gentlemen,
As an open economy, Malaysia is exposed to the risks of global economic uncertainty. The decline of crude oil prices, the depreciation of the ringgit and the economic slowdown of Malaysia's major trading partners have affected the country's economy.

Despite the global economic uncertainty, Malaysia maintained a positive economic outlook. The following are some statistics to illustrate Malaysia's most recent economic performance:

i. Malaysia's economy grew 4.7 percent in the third quarter of 2015;

ii. The manufacturing sector contributed strong growth of 4.8 percent compared to 4.2 percent in the second quarter of 2015. This encouraging performance was driven by the strengthening of Electrical, Electronic & Optical products, which rose to 10.3 percent from 4.5 percent in the previous quarter. This was supported by the increase in external demand, mainly from China and the USA;

iii. GDP for the first three quarters grew at 5.1 percent. Selected countries demonstrated economic moderation during a similar period: the USA from 2.7 percent to 2.0 percent; Taiwan from 0.5 percent to negative 1.0 percent; and Singapore from 2.0 percent to 1.4 percent;

iv. Gross Fixed Capital Formation (GFCF) grew 4.1 percent for the three quarters of 2015, from 4.9 percent in the same period of the preceding year. The performance of investment grew at a higher rate of 4.2 percent, from 0.5 percent in the second quarter of 2015, stimulated by structures, in line with the strong growth of the construction sector. The growth of private investment rose to 5.5 percent and public investment rebounded to 1.8 percent from a negative growth of 8.0 percent in the previous quarter;

v. The Gross National Income per capita for the third quarter of 2015 was RM36,306, gradually moving towards the minimum World Bank threshold (of USD12,736 as at 2014) for a high-income economy by 2020;

vi. The balance of payments continued to record a current account surplus of RM5.1 billion in the third quarter of 2015, as compared to RM7.6 billion in the previous quarter. During January-September 2015, the current account also recorded a surplus of RM22.6 billion, which equals 2.7 percent of GDP;

vii. For the first nine months of 2015, Foreign Direct Investment recorded a higher net inflow of RM27.1 billion, from RM25.6 billion in the same period last year;

viii. The international reserves of Bank Negara Malaysia stood at USD94.0 billion at the end of September 2015, sufficient to finance 8.7 months of retained imports and 1.2 times the short-term external debt;

ix. Total trade for the first nine months of 2015 was RM1,139.1 billion, while the trade surplus during the same period was RM63.8 billion; and

x. Inflation, as measured by the annual change in the Consumer Price Index (CPI), for the month of September 2015 was 2.6 percent;

xi. The unemployment rate in August 2015 remained at 3.2 percent, similar to the previous month; and

xii. ICT GDP registered double-digit growth of 12.3 percent in 2014, and the share of ICT in the economy rose to 17.0 percent. This shows the importance of ICT to the economy.

This year, Malaysia is once again ranked in 18th place in the Global Competitiveness Report 2015-2016. Malaysia is also placed 18th out of 189 countries in the World Bank's Doing Business 2016 report. These international recognitions demonstrate the government's successful efforts to transform the economy through various policies, programmes and blueprints. Moving forward, the economic fundamentals will be further strengthened through the implementation of the Eleventh Malaysia Plan, with focus on Inclusivity, Social Welfare, Human Capital Development, Green Growth, Infrastructure and Innovation & Productivity.

In the short term, Budget 2016 will focus on maintaining growth as well as expanding inclusiveness and sustainability in order to maintain the performance of the national economy.

The Challenges Faced by the Statistical Community in the Inter-connected and Digital World

Ladies and Gentlemen,
"Enriching Statistics in an Inter-connected and Digital World" is a significant theme that poses new challenges for the production and analysis of good statistics as the tool for evidence-based decision making. I strongly urge the statistical community to harness the potential of the digital revolution and identify opportunities to further strengthen and enhance the national statistical system.

We live in an inter-connected and digital world characterised by rapid economic, social and cultural change. Thanks to new technologies, the volume, level of detail and speed of data available on societies, the economy and the environment are without precedent. This is the "data revolution" and it has started to change our way of life and work.

The changes in our society drive the need for more and quicker statistics. Quality is multi-faceted, with different users placing different emphasis on dimensions such as accuracy and timeliness. The challenge for a statistical organisation is to be sufficiently flexible and agile to provide statistics according to users' needs at acceptable cost.

We have to learn to look at the abundance of data and find the opportunities hidden in there. As Malaysia embarks to meet the demands of new statistical products such as the Sustainable Development Goals (SDGs), Sustainable Consumption & Production Indicators (SCPI) and Green Indicators, the data revolution can be mobilised to monitor progress and foster sustainable development.

I am happy to see the vital role of DOSM as one of the members of the High-Level Group (HLG) for the SDGs, which serves to provide strategic leadership and coordination, as well as to reach out and promote dialogue and partnerships between the statistical community and other stakeholders on the implementation and monitoring of SDG goals and targets.

The internet has become an indispensable infrastructure for society. The number of active internet users in the country has now exceeded 20.1 million, with 16.8 million being active on social media. This demonstrates that Malaysians are heading towards a digital lifestyle.

Ladies and Gentlemen,
The mobile and internet revolutions provide tools and opportunities for the statistical community to review and transform business processes to increase relevancy, efficiency and effectiveness. We are in an age of urgent demands for reliable, up-to-date and high-frequency statistics. However, it is difficult to satisfy these demands with traditional data collection methods due to respondents' reluctance to participate in surveys and limited resources. To better serve these needs, the statistical system and process require transformation and modernisation. I am happy to see that DOSM has taken a leap to establish an integrated systems framework and efficient data management. DOSM has shifted towards a digitalised way of garnering information, such as e-census/surveys, Computer Assisted Personal Interview (CAPI), Computer Assisted Telephone Interview (CATI) and Intelligent Character Recognition (ICR). In addition, the data warehouse system in DOSM has been improved to account for the governance, management and archiving of micro data and statistics.

Statistical offices play a fundamental role in the communication of evidence-based information to the widest possible audience, which is nowadays facilitated by the possibility of using modern visualisation and communication tools. In this sense, DOSM as a national statistical office has worked towards improving statistical communication through its portal to facilitate a wide spectrum of users and practitioners. The portal has been enhanced to include easy and accessible information, user-friendly features, as well as info-graphics. Besides, DOSM has also leveraged on social media, such as Facebook and Twitter, in widening the dissemination of statistics to the community. This transformation has grabbed the media's attention.

Ladies and Gentlemen,
Big data is characterised as data sets of increasing volume, velocity and variety. Big data has the potential to produce more relevant and timely statistics than traditional sources of official statistics. Big Data Analytics (BDA) initiatives play an important role in processing data into meaningful and useful information. The Strategic Intent of the National BDA Initiative has been formulated to encompass governance mechanisms, a communication plan and capacity building.

We are in need of data scientists to analyse and mine data from a range of sources to unlock valuable and predictive insight. I strongly urge the statistical community to create innovations that provide high impact and significant value to the nation.

The use of technology and the digitalisation of our economy and administration have led to an incredible gold mine: Open Data. Open Data can be defined as information that is available for anyone to use for any purpose at no cost.

To keep pace with the digitally driven global economy, we need to use open data to drive efficiency and boost productivity to transform into a knowledge-intensive economy. While it opens up a lot of opportunities, there are many challenges that will come along with it, such as security threats, privacy and data sharing issues. The statistical community should discuss the possibilities of tapping data from administrative sources through data sharing arrangements between agencies. On this note, I urge the private sector and practitioners to play their roles to accelerate the potential of big data and open data.

For the next five years, DOSM will embark on obtaining big-scale data for the purpose of benchmarking economic and social indicators. This will be executed through the Economic Census 2016; the Household Income & Expenditure Survey (HIES) 2016; and the Population & Housing Census 2020. This also entails the co-operation of the private sector and society to ensure these projects are successful with high response rates.

Significance of MyStats 2015

Ladies and Gentlemen,
I was told that there will be a total of 30 presentations by local and international presenters at the plenary and parallel sessions today. I hope that the exchange of ideas, thoughts and experiences in the conference will enable the strengthening of statistical capacity within our agencies and corporations to benefit the nation as a whole.

On a daily basis, we encounter statistical information: from news, advertisements, media reports and even in general conversation. Knowledge of how to use and communicate statistics is therefore necessary for making critical and informed decisions. Thus, I am delighted to see that the organisers have provided some beneficial programmes to increase statistical literacy and awareness, such as short courses in statistics, a mini career fair and an exhibition of statistical products & services.

I was informed that DOSM has initiated efforts to increase data sharing and stimulate research activities through intelligent networking with public and private universities. Selected sets of micro data are shared with these universities to spur statistical capacity building and increase research and development activities within the field of statistical science.

I believe one of the unique and exciting features of MyStats 2015 is the session with young statisticians. This session provides the opportunity for young statisticians, students and practitioners in their early career to discuss challenges, opportunities and the future of statistics. It is hoped that this session will spark more interest in the field of statistical science and the exploration of new statistical frontiers, as well as eliminate the stigma that "statistics is boring".

Ladies and Gentlemen,
In 2019, Malaysia will host the 62nd World Statistics Congress (WSC), a gathering of the statistical community from all over the world to discuss the development, challenges and innovation in statistical science. This prestigious congress will put Malaysia at the centre of attention and propel our country's role in enhancing the development of statistical science. The smart partnership between Bank Negara Malaysia, ISM and DOSM has translated into this great opportunity for Malaysia to host WSC 2019.

Concluding remarks

Ladies and Gentlemen,
Malaysia has achieved remarkable social and economic development progress through a series of five-year development plans and various national blueprints. Essentially, all the measures and initiatives put forward by policy makers must be strongly supported by a broad spectrum of relevant, reliable and up-to-date statistics.

I wish to applaud the statistical community for your excellent contribution to the national statistical system, and to encourage all of you to deepen your efforts. I urge you to open up and share your honest thoughts, ideas, challenges and desires towards achieving digital transformation in the statistical system via public-private partnership.

Finally, I would like to once again congratulate the organisers on making this conference such a successful event.

Ladies and Gentlemen,
In the Name of Allah, The Most Beneficent and The Most Merciful, I declare MyStats 2015 open.

Closing Remarks

Dr. Mohd Uzir Mahidin 2

Bismillahirrahmanirrahim

Distinguished Guests, Ladies and Gentlemen,
Assalamualaikum Warahmatullahi Wabarakatuh. A very good evening, Salam Sejahtera, Salam 1 Malaysia and Salam 1 Statistik.

First of all, thank you to all the participants for attending the Third National Statistics Conference, MyStats 2015. This year's theme, "Enriching Statistics in an Inter-connected and Digital World", is in tandem with Malaysia's shift to a more digitally-connected advanced era to become a digital nation.

This conference is a platform for the statistical community to highlight and discuss Malaysia's changing needs of statistics towards becoming an advanced nation. On this score, it is imperative for all of us to take up this challenge and exuberantly leverage on the advancement of technology and the abundance of data around us in transforming these data into meaningful and valuable information.

As mentioned by the YB Minister this morning, the collaboration of DOSM, BNM and ISM demonstrated good synergy in improving inter-agency cooperation in executing high-impact programmes. In this regard, this event marks a prestigious and renowned smart partnership between the public and private sectors in elevating our statistical system within the ASEAN region as well as at the international level. We are confident this partnership will continue to expand to greater heights for the benefit of the statistical community.

Distinguished Guests, Ladies and Gentlemen,
We have successfully completed eight main sessions throughout the day, consisting of 29 presented papers that conform to the main theme. The presented papers encompassed emerging statistical topics on Open Data; Big Data; evolution in data capturing; the latest statistical methodology; and the new frontiers in statistics.

We are indeed very fortunate to have had the opportunity to listen to two distinguished speakers during the plenary session. The first presenter highlighted the importance of data science. He emphasised the meaningful utilisation of the wealth of data in measuring our wellbeing. The next presenter emphasised the pre-emptive measures taken by the government in the face of global forces. She also highlighted the challenges faced by the statistical community in being sufficiently flexible and agile to provide statistics according to users' needs. Both speakers encouraged the statistical community to be innovative in compiling data and leveraging on technology in view of resource constraints. In-depth efforts need to be geared towards escalating the smart networking between the public and private sectors.

Distinguished Guests, Ladies and Gentlemen,
Besides the paper presentations, for the first time ever this year we had a special slot called the Roundtable Discussion for Young Statisticians, a platform where eight speakers, consisting of practitioners in their early career from the government and private sectors, shared their views and experiences in dealing with statistics.

I am delighted to highlight the tremendous participation, mainly from the universities, in the mini career fairs and exhibition of statistical products. I hope this will enlighten the vast future of statistical science among the young generation that will uphold the success of the nation. I also would like to thank all the exhibitors, who caught the eyes of the participants with their displays of interesting statistical products.

2 Deputy Chief Statistician at Department of Statistics, Malaysia

The finale session of the conference was the Panel Discussion entitled "Statistics in an Inter-connected and Digital World: Moving Forward". This was a concluding session for compilers and users to discuss and call for actions to enhance statistics in an integrated and dynamic digital world. The panel consisted of learned and experienced speakers who profoundly put forward their remarkable views.

Distinguished Guests, Ladies and Gentlemen,
I believe all the audience and participants have had fruitful sessions in the conference. Finally, I would like to express my heartfelt gratitude to all the contributors, especially the chairpersons, speakers and panellists, for making the Third MyStats Conference 2015 a success. Our thanks to the media for the comprehensive coverage and overwhelming support of this conference, and we are happy that this conference was broadcast this morning. This effort has reached a wider community and is a way of promoting statistical literacy.

I also would like to extend my deepest and most sincere appreciation to YBhg. Datuk Muhammad Ibrahim, the Deputy Governor, for his endless support of the organisation of the National Statistics Conference since the beginning of this wonderful journey in 2012. I would like to thank Bank Negara Malaysia and ISM for playing a significant role together with DOSM in making all the statistical events a success. I believe we have become a close-knit family that works together and stays together for the benefit of uplifting Malaysia's statistical system.

Thank you to the committee members and the people behind the scenes for their tireless efforts in ensuring smooth arrangements throughout the event. I believe the achievement of this event will translate into a great success for Malaysia in hosting the World Statistics Congress 2019.

Distinguished Guests, Ladies and Gentlemen,
Once again I would like to applaud all members of the statistical community for the excellent collaboration and teamwork in organising this event and creating a memorable and historic occasion. Lastly, thank you again for your presence, participation and support. I wish all of you the best and a pleasant trip home. Wabillahi Taufik Walhidayah, Wassalaamualaikum Warahmatullahi Wabarakaatuh.

ADVANCEMENT OF DATA TRANSMISSION THROUGH MALAYSIA EXTERNAL TRADE STATISTICS (METS) ONLINE

Tan Bee Bee 1

Abstract

Policy makers, Malaysian businesses and the general public rely on external trade statistics to assist them in making data-driven decisions on trade and the economy. These data are essential to understanding the Malaysian economy with respect to its major trading partners and the impact from global developments. This is especially so in the case of Malaysia, which is a very open economy where in 2014 total trade was 135% of Gross Domestic Product. Understanding the importance of accessible and timely external trade statistics, the Department of Statistics Malaysia (DOSM) has strived towards online dissemination of these statistics. This paper aims to share the effort and initiative of the Department of Statistics Malaysia towards online dissemination of external trade statistics, also known as METS Online. Users can now access timely external trade statistics through METS Online anywhere, anytime and with any device, be it laptop, desktop, tablet or mobile. METS Online is easy to use, is interactive, and provides details of product classification up to 6-digit HS or 5-digit SITC, covering all trading partners with a long time series from 1990. DOSM will continue to improve METS Online with more coverage and scope, in line with international developments, bringing external trade statistics to the doorstep of users.

1. Introduction

External Trade Statistics (ETS), or International Merchandise Trade Statistics, refers to the provision of data on the movement of goods between countries and areas. In Malaysia, ETS has been compiled since the formation of the Department of Statistics, Malaysia (DOSM) in 1949. At that time the Department was known as the Bureau of Statistics, and the only statistics produced then were external trade and estate agriculture. The source documents for ETS, from then till now, come from the Royal Malaysian Customs Department (Customs). Being an open economy, external or international trade is very important to the economy of Malaysia: in 2014, total trade was RM1.5 trillion, or 135% of Gross Domestic Product. International merchandise trade also plays a crucial role in economic development because the process binds producers and consumers located in various locations and different countries into a global economic system. Thus the production and dissemination of the data have to be efficient and timely. This is also in line with some of the objectives of DOSM, which are to improve and strengthen statistical services and the delivery system as well as to be highly responsive to customer needs in a dynamic and challenging environment.

DOSM has always been concerned with the need for reliable, up-to-date and easily accessible external trade statistics among stakeholders and the public. Due to the limited use of ICT until the 1990s, data dissemination was done manually through publications via conventional (snail) mail and fax. The increase in the usage of ICT has slowly transformed data dissemination in stages, from hardcopies to softcopies and then to online data.

Users of trade data include policy makers, Malaysian businesses, embassies, the general public, as well as companies overseas with an interest in trading with Malaysia. Internally, trade data is an important input to the compilation of Balance of Payments Statistics and Gross Domestic Product. Recognising the importance of meeting users' needs for timely ETS with easy access, DOSM embarked to facilitate the matter with the first online dissemination in 2010.
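For concreteness, the two 2014 figures quoted above jointly pin down the implied size of the economy:

$$\text{GDP}_{2014} \approx \frac{\text{total trade}}{1.35} = \frac{\text{RM }1.5\text{ trillion}}{1.35} \approx \text{RM }1.1\text{ trillion}.$$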

1 Department of Statistics, Malaysia

2. Background

2.1 Processing

Malaysia's ETS is compiled by DOSM on a monthly basis from the manual or electronic forms or declarations submitted to Customs by importers and exporters or their agents. Types of Customs declarations include Customs 1 for imports, Customs 2 for exports and Customs 3 for both imports and exports. Other sources of data include the Free Zone Authorities, for which the source documents are ZB1 (imports) and ZB2 (exports). Currently, the total number of transactions is about 3.2 million a month, and more than 97% of the declarations are received electronically. Electronic forms have helped to significantly reduce human error and the time needed for data processing. ETS is now made available to the public about 5 to 6 weeks after the reference month, down from 7 to 8 weeks previously. Besides receiving the forms electronically, the shorter processing time is also attributed to the modernisation of work processes in DOSM, where processing is now done on PCs instead of a mainframe.

The processing of Malaysia's ETS follows the guidelines in the International Merchandise Trade Statistics 2010 manual. DOSM and Customs work closely together in order to improve the quality, timeliness and reliability of the external trade statistics, especially in terms of quality reporting, and at the same time DOSM will continue to comply with international standards in the compilation of international merchandise trade statistics.

2.2 Classification

The classification reported in the source documents is the Harmonised Commodity Description and Coding System (HS), managed by the World Customs Organisation. The HS classification facilitates tariff collection and is internationally comparable up to 6 digits. HS is regularly updated to keep up with changes in technology and patterns of international trade. Malaysia currently adopts the latest HS classification, which is HS2012. However, there are two classification systems at the detailed tariff level: for intra-ASEAN trade the AHTN, which is a 10-digit code, is adopted, while the 9-digit HS is applied for extra-ASEAN trade. Currently there are about 12,324 AHTN codes and 9,450 HS codes.

For publication and dissemination purposes, the Standard International Trade Classification (SITC) is used. This is in line with the recommendation of UNSD and with international practice, which views the SITC as an analytical tool. SITC is under the purview of UNSD and is comparable on a worldwide basis up to 5 digits, referred to as the basic heading; this classification facilitates economic analysis. The detailed 9-digit code caters to national needs, and there are about 16,345 codes at the moment.
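To make the nesting of these classifications concrete, the sketch below shows how a single detailed tariff line rolls up to the internationally comparable levels and maps across to SITC. This is an illustrative sketch, not DOSM production code: the example code and the one-entry HS-to-SITC dictionary are hypothetical stand-ins for the full UNSD correspondence tables.

```python
# Illustrative sketch: break a detailed commodity code into the
# classification levels described above. Codes shown are hypothetical.

def hs_levels(code: str) -> dict:
    """Split a detailed national HS/AHTN code into its nested levels.

    The first 6 digits are internationally comparable (WCO HS); digits
    beyond that are national/ASEAN detail (9-digit HS or 10-digit AHTN).
    """
    return {
        "hs2_chapter": code[:2],
        "hs4_heading": code[:4],
        "hs6_subheading": code[:6],     # limit of international comparability
        "national_tariff_line": code,   # 9-digit HS or 10-digit AHTN
    }

# Hypothetical 10-digit AHTN code on an intra-ASEAN declaration.
record = hs_levels("4015190000")
print(record["hs6_subheading"])   # -> '401519'

# For dissemination, the HS line would be mapped to SITC (up to the
# 5-digit basic heading); this single entry is purely illustrative.
HS6_TO_SITC = {"401519": "62992"}  # hypothetical mapping entry
print(HS6_TO_SITC.get(record["hs6_subheading"], "unmapped"))
```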

3. Importance and Users of Malaysia's ETS

External trade is important to the economy of Malaysia. In 2014, Malaysia was ranked 23rd out of more than 200 countries for exports and 26th for imports. Also, total trade was RM1.5 trillion, or 135% of GDP. These indicators reaffirm the importance of external trade and the openness of the Malaysian economy. In the face of data-driven decisions, import/export data is much sought after for trade promotion and negotiation, for analysing sectoral performance, as well as for trade by partner country. The objective of these activities is to improve the economy of the country and enhance its competitiveness. Globalisation has also led to complex data needs in a dynamic and borderless environment. Thus improvements in the delivery system must always be the top priority. Trade data should be made accessible anywhere, anytime and on any device to meet user needs. This can be done as technological advancement has erased many barriers to communication, e.g. with the introduction of the World Wide Web in the 1990s, the enhancement of storage capacity and the onset of mobile devices.

Users of METS include the Government, businesses, embassies and researchers. There were 25,193 users of METS Online from January to September 2015. Besides that, the number of free downloads of the monthly external trade statistics publication was 11,379 for the same period. DOSM also handles 187 monthly subscriber files per month, while ad-hoc requests averaged about 80-100 per month. These are some of the statistics that show the importance of trade data.

4. Objective of METS Online

Recognising the importance of ETS and the need for timely online data amidst the scenario of globalisation, IT advancement and sophisticated users, METS Online, which is the online access to trade data, was developed:

i. To improve DOSM's delivery system by providing timely merchandise trade information online, in line also with open data initiatives;

ii. To enable merchandise trade information to be accessible anywhere, anytime and on any device, to facilitate users and thus serve them in the most effective way possible; and

iii. As part of the innovation activities of DOSM to increase productivity. Prior to METS Online, ETS was requested via mail, email, fax and purchase at DOSM, where service delivery is constrained to office hours.

5. Development of METS Online

The first version of METS Online was developed in-house by DOSM in 2010. This version was launched on the DOSM website in the same year and enabled users to obtain trade statistics between Malaysia and its trading partners in the form of a fixed template, covering 2006 to the latest month (until the launch of Version 2), with details up to HS 4-digit / SITC 3-digit. Version 2, an enhanced and improved version, was also developed in-house by DOSM and was launched in September 2014. It offers more details (up to HS 6-digit / SITC 5-digit) and its time series starts from 2000. It is also more interactive and has a search function. The increase in detail saw HS product codes rise from about 1,200 to 5,000, while for the SITC classification the increase was from 262 product codes to almost 3,000. For the development of this version, benchmark reviews were done on the USA, Japan, Indonesia, Thailand and UN Comtrade. This version was mostly benchmarked on UN Comtrade.

The development specification on the IT side, in terms of hardware, was the need for two servers, functioning as the application server and the database server. As for software, the system uses Linux, Apache, Java and Microsoft SQL. The total cost involved was estimated at RM200,000.

6. Accessing METS Online

METS Online is accessed via the DOSM portal (www.statistics.gov.my) under online services. It can be accessed via Internet Explorer 8.0+, Mozilla Firefox 3.6+ or Google Chrome with a resolution of at least 1024 by 768, and on any device, be it laptop, desktop or mobile. Thus users can have immediate and direct access to trade data at any time and any place, and not only during office hours or at counters.

7. Features of METS Online

To meet user needs, METS Online offers 5 main modules, with details as follows:

7.1 Overall

There are two outputs for this module. For the first output, users can obtain the time series from 1990 to the current available month for total exports, imports, total trade and trade balance. The second output has a similar time series but portrays import and export data at SITC 1-digit. This is to assist users who need to look at more details of the import/export composition. Data for all the modules can be exported to Excel format.

7.2 Search by Product Code

This module allows users to search for products at 2-, 4- or 6-digit HS codes as well as 1- to 5-digit SITC codes. There are about 5,000 6-digit HS codes and 3,000 5-digit SITC codes, of which about 80% are active. The search can also be combined with country details, whether a single country or multiple countries. The module also provides a search function by keyword, and users can combine it with a specific country; e.g., if you search for 'gloves', all codes related to gloves will be displayed. This module covers data from 2000 to the latest available month or year. The same period is also covered by all the other modules except for the Overall module, which has a longer reference period from 1990.
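Because every module's output can be exported to Excel, downstream analysis needs nothing more than a spreadsheet reader. Below is a minimal sketch of what a user might do with an Overall-module download; the file name (mets_overall.xlsx) and the column labels are assumptions about the exported layout, not a documented METS Online interface.

```python
# Minimal sketch of post-download analysis; file name and column names
# are assumptions about the exported layout, not a documented API.
import pandas as pd

df = pd.read_excel("mets_overall.xlsx")  # hypothetical Overall-module export
# assumed columns: year, month, exports, imports (RM)

# Derive the two headline aggregates the module also reports directly.
df["total_trade"] = df["exports"] + df["imports"]
df["trade_balance"] = df["exports"] - df["imports"]

# Annual totals from the monthly series (time series runs from 1990).
annual = df.groupby("year")[["exports", "imports", "trade_balance"]].sum()
print(annual.tail())
```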

7.3 Search by Partner Country

Under this module, users are able to search for the partner countries they are interested in. They are also able to match up the details they want, be it 3-digit SITC or 4-digit HS. Another feature is that they can also select multiple countries and choose whether the data is needed at the monthly or annual level. This is useful for users who need to know the major products traded with a respective country.

7.4 Search by Geographical Grouping

Geographical grouping refers to area location, e.g. North America, South Asia, North Africa, etc. There are 13 selections available, and users can also opt to select all 13 groupings, either at the total level or by HS/SITC breakdown.

7.5 Search by Economic Grouping

This module facilitates searching by economic grouping, such as the ASEAN Free Trade Area (AFTA), the North American Free Trade Agreement (NAFTA), the European Union (EU), etc. Users who would like to analyse from the perspective of economic groupings will find this module useful. There are 7 selections of economic grouping for this module. As with the three modules above, this selection can be combined with product details to get a more in-depth output. The layout screens for all the modules are in the Appendix.

8. Way Forward

Disseminating data online is a vast improvement over what used to be conventional methods of dissemination, like hardcopies and softcopies. However, in the era of globalisation and technological advancement in IT, continuous improvement in data dissemination and information sharing must always be a key and important consideration as users become more and more sophisticated. Official statistics must bring its audience what they are looking for. The need for easy accessibility to timely, comprehensive data is of utmost importance to assist better decision making and policy making in a timely manner. The effort of improving data transmission through METS is a continuous process. Thus, with timeliness, accessibility and comprehensiveness in mind, METS Online has continued to improve, offering more details from its first launch in 2010 to its second launch in 2014 and also spanning a longer time series. But the journey does not end there. Among the proposals for improvement are:

8.1 To enhance the scope and comprehensiveness of dissemination, such as increasing to more product codes or disseminating based on the tariff lines, which are the 9-digit HS, 10-digit AHTN and 9-digit SITC, as well as major products;

8.2 Increasing the use of data visualisation for better data presentation and communication (a small sketch of this idea follows Section 9); and

8.3 To continuously move in line with global developments on online databases and open data policy.

9. Conclusion

Making timely, comprehensive data easily accessible and available to users, including stakeholders and the general public, continues to be a very important agenda for DOSM. The Department recognises the importance of timely and accessible data for better decision making and will continue to strive to improve its data dissemination activities.
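For proposal 8.2, the kind of lightweight visualisation that could sit on top of a downloaded series might look like the sketch below; all figures are synthetic placeholders rather than official METS data.

```python
# Illustrative chart for proposal 8.2: bar the gross flows, line the balance.
# All values below are synthetic placeholders (RM billion), for demo only.
import matplotlib.pyplot as plt
import pandas as pd

annual = pd.DataFrame(
    {"exports": [700.0, 720.0, 765.0], "imports": [600.0, 650.0, 680.0]},
    index=[2012, 2013, 2014],
)
annual["trade_balance"] = annual["exports"] - annual["imports"]

ax = annual[["exports", "imports"]].plot(kind="bar", figsize=(9, 4))
# Bar positions are 0..n-1, so align the line series on the same positions.
annual["trade_balance"].reset_index(drop=True).plot(
    ax=ax, color="black", marker="o", linewidth=1.5, label="trade balance"
)
ax.set_ylabel("RM billion")
ax.set_title("External trade, annual (synthetic demo data)")
ax.legend()
plt.tight_layout()
plt.show()
```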


SYNERGISING GST RATE WITH DIRECT TAX RATE IN SUSTAINING ECONOMIC GROWTH IN MALAYSIA: IS THERE A LAFFER CURVE? Sherly George 1, Prof Dr Syed Omar Syed Agil 2

Abstract

The current individual and corporate tax base rates imposed in Malaysia do not seem to generate the best possible tax revenue at their maximum point, thus affecting economic growth indirectly. The importance of generating higher tax revenue is to finance government expenditures over the years. Insufficient tax revenue will lead to government borrowings and severe government debts over the years if this issue is not addressed immediately. With the appropriate tax rates for individuals and corporates, the GST rate should then be synergized to ensure an increased economic growth rate in Malaysia.

The main objective of this research is to determine the optimum tax rate appropriate for both individual and corporate tax. With the optimum tax rate obtained, the Malaysian government is able to generate maximum tax revenue for both individuals and corporates respectively. From these rates, the GST rate is also determined.

Optimum Tax Theory models using the Laffer curve concept are used to estimate the tax rates for individuals and corporates at which tax revenue is at its maximum point, thus contributing to economic growth. Data on individual and corporate tax rates and tax revenues were gathered over 34 years (1980-2013) from Data Stream, the Department of Statistics, Bank Negara Malaysia and the World Bank. The data will be analyzed using Ms-Excel and EViews 8.

This paper derives the optimum tax rates for both individuals and corporates, denoted as the Malaysia Optimum Individual Tax Rate (MOITR) and the Malaysia Optimum Corporate Tax Rate (MOCTR), and the GST rate in generating maximum individual and corporate tax revenue.

1 INTRODUCTION

The increasing indebtedness of financing government expenditures is a burden to many economies globally. One way to reduce the problem is to fine-tune the taxation system with rates that generate the greatest revenue without jeopardising the effect on the economy. The level of income tax and the overall effectiveness of the system have been a major debate in scientific and social circles (Karas M, 2012). Ibn Khaldun pondered the idea already in the 14th century, and so did John Maynard Keynes and others (Lévy-Garboua, Masclet, Montmarquette, 2007) in the twentieth century. The matter got more attention in the 1970s, when the prevailing Keynesian economics could not explain the phenomenon of stagflation and its methods were unable to deal with it. Advocates of the supply-side theory of economics came up with the argument of excessive taxation articulated by Adam Smith in his Inquiry into the Nature and Causes of the Wealth of Nations (Hsing, 1996). The correlation between the tax rate and the tax revenue came to be known as the Laffer curve (Laffer, 1981). The supply-side economists noted that the reason for stagflation is an excessive tax burden and an economy over-regulated by the government (Van Dujin, 1982). In order to

1, 2 Universiti Tun Abdul Razak Malaysia

solve the stagflation problem, the policy of a tax reduction and a deregulation of the economy must be revisited (Burfa, Wyplosz, 1993). Supply-side is an idea initiated by the Laffer curve, which shows the relationship between tax revenue and tax rate.

1.1 Laffer Curve and Models

The Laffer curve, introduced by Arthur Laffer in 1974, states that there is a parabolic relationship between tax revenue and tax rates. In 1986, Arthur Laffer noted there were always two tax rates that yield the same revenues. When the tax rate is altered, taxable income also changes, but there is a point where the reduction in taxable income from a higher tax is enough to completely offset the higher tax rate. This point is called the "revenue maximizing" point. There are two types of models discussed with regards to the illustration of the Laffer curve. The static scoring model (refer to Figure 1) denotes that the higher the tax, the higher the tax revenue collected by the government. Figure 2 shows the dynamic scoring model, which explains the illustration of Laffer curve theory. At a 0% tax rate, the government will collect no tax revenue. However, when the tax rate is 100%, people will not work and the tax revenue collected will be zero. This leads us to the notion that the Laffer curve is indeed a polynomial curve, as shown in Figure 2.

Laffer noted there were two main points, that is, the growth maximising point and the revenue maximising point. The question is which point will benefit the country. There were critical opinions from a few economists who argue that the dynamic scoring conclusion is overstated by the Congressional Budget Office (CBO) because of the inclusion of some dynamic scoring elements; including more elements will result in the politicisation of the department. Mitchell (2009) stressed that optimum tax revenue is not good for the economy as a whole. He said the ideal policy is at the optimum growth rate, where the point is situated on the upward sloping curve and revenue is at an increasing level at that point. There is no necessity for tax revenue to ensure market function because the maximising growth rate is above zero. He added that society needs to think on security, safety and honesty in courts. Becsi (2000) summarised by saying that if government tax is reduced while there is an increase in government expenditure and a decrease in public investments, there is a chance of losing optimal tax revenue, as shown in Figure 3. Henderson (1998) had a different opinion with regards to the Laffer curve. He concluded that people will not work hard even if there is a tax cut, thus creating a complicated Laffer curve as shown in Figure 4 below. People will tend to spend more leisure time rather than go to work as a result of a cut in the tax rate, moving from Point A to Point B, thus resulting in lower tax revenue. On another note, Henderson (1998) discussed that a tax cut can increase inflation, which is also another tax, known as the inflation tax. Should the tax cut not result in an immediate increase in tax revenue while government does not decrease government expenditure, a budget deficit will definitely appear. Therefore, in countries which have issues with the strength of their currency, inflation would increase. He also added that an increase in tax revenues could be due to population growth.

Another simple model was developed by Feige, Edgar L. and Robert McGee (1982), where the model shows that the shape and position of the Laffer curve depend on labour supply and the progressive tax system in Sweden. An empirical study of transfer-adjusted tax rates in OECD countries was done to determine whether the optimal tax rate was applied in those countries. This study was backed by a simple endogenous growth model designed by Jonas Agell and Mats Persson (2000). Jesus Alfonso Novales and Ruiz (2002) concluded that tax cuts on labour and capital income have a positive effect on the growth rate in an economy. They managed to study how to manage the deficit by substituting government debt with taxes.

The main objective of this research is to determine the optimum tax rate for both individual and corporate tax rates in order to achieve higher individual and corporate tax revenue.

Figure 1 : Static Scoring Model

Figure 2 : Dynamic Scoring Model

Figure 3 : Tax Revenue with or without Government Expenditure

Figure 4 : Complex Form of Laffer Curve
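To make the dynamic scoring idea of Figure 2 concrete, the following R sketch (not from the paper; the scale factor k and the symmetric quadratic form are illustrative assumptions) shows a stylised revenue curve that is zero at both 0% and 100% and yields the same revenue at two different rates:

# Stylised dynamic-scoring Laffer curve: zero revenue at 0% and 100%.
k <- 10                                   # arbitrary scale factor (assumption)
revenue <- function(t) k * t * (100 - t)  # t is the tax rate in percent

revenue(20) == revenue(80)                # TRUE: two rates, same revenue
optimise(revenue, c(0, 100), maximum = TRUE)$maximum  # peak at 50%

For this symmetric toy curve the peak sits at 50%; the quadratics fitted later in the paper are not symmetric, so their optima differ.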

1.2 Laffer Curve and Models

In order to obtain the optimum point for both individual and corporate tax rates, the Laffer curve model is used. The Laffer curve, introduced by Arthur Laffer (1974), states that there is a parabolic relationship between tax revenue and tax rates. In 1986, Arthur Laffer noted there were always two tax rates that yield the same revenues. When the tax rate is altered, taxable income also changes, but there is a point where the reduction in taxable income from a higher tax is enough to completely offset the higher tax rate. This point is called the "revenue maximizing" point. There are two types of models discussed with regards to the illustration of the Laffer curve. The static scoring model (refer to Figure 1) denotes that the higher the tax, the higher the tax revenue collected by the government. Figure 2 shows the dynamic scoring model, which explains the illustration of Laffer curve theory. At a 0% tax rate, the government will collect no tax revenue. However, when the tax rate is 100%, people will not work and the tax revenue collected will be zero. This leads us to the notion that the Laffer curve is indeed a polynomial curve, as shown in Figure 2. Mitchell (2009) stressed that optimum tax revenue is not good for the economy as a whole. He said the ideal policy is at the optimum growth rate, where the point is situated on the upward sloping curve and revenue is at an increasing level at that point. There is no necessity for tax revenue to ensure market function because the maximising growth rate is above zero. He added that society needs to think on security, safety and honesty in courts.

2. LITERATURE REVIEW

Generally, many authors have highlighted the relationship between tax rates and tax revenue, which is uniquely discussed. It is intriguing to know that the basic underlying theory between tax rates and tax revenue was introduced by the Muslim philosopher Ibn Khaldun in the 14th century in The Muqaddimah: "It should be known that at the beginning of the dynasty, taxation yields a large revenue from small assessments. At the end of the dynasty, taxation yields small revenue from large assessments." His contribution was vital towards future analysis of how tax revenue and tax rates are related and how these variables play an important role in financing government expenditure over the years. Without doubt, tax is an important instrument used by government to gain a country's national income. The income gained by a government through taxation is recognised as tax revenue, and the rates imposed by a government on both individuals and corporates are called tax rates. Several authors have conducted research on tax rates and tax revenues in various aspects. Different authors have different opinions on which macroeconomic indicators have a significant impact on tax rates and tax revenues.

A study conducted by Arthur B. Laffer (2004) on tax rates and tax revenues led to the creation of a new theory and model illustrated as the Laffer curve. The Laffer curve shows that tax rates have two effects on tax revenues:

(a) Arithmetic Effect (AE): AE indicates that the relationship between tax rates and tax revenues is positive. It can be said that when tax rates are lower, tax revenue will be lowered by the amount of the decrease in the constant rate.

(b) Economic Effect (EE): EE recognizes the positive impact that lower tax rates have on work, output, employment and the tax base by providing incentives to increase these activities.

This research encompasses both the Arithmetic and Economic Effects, justifying the 1st and 3rd research objectives and hypotheses. The first hypothesis and research question is to find and justify the relationship between tax rates and tax revenue in Malaysia, which is the arithmetic effect. The third objective and hypothesis is to investigate and justify the determinants that affect tax revenue in Malaysia, which is the economic effect.

The arithmetic effect always works in the opposite direction of the economic effect. When the arithmetic effect and the economic effect are combined, the effect of a change in tax rates on tax revenues is no longer obvious. The initial study of an inverse relationship

between tax rates and revenue was conducted by Adam Smith in his book The Wealth of Nations (1776), stating the point:

"High taxes, sometimes by diminishing the consumption of the taxed commodities, and sometimes by encouraging smuggling, frequently afford smaller revenue to government than what might be drawn from more moderate taxes" (Book V, Chapter II).

Research by Caves and Jones (1973) has proven the existence of a revenue-maximizing tariff, which had a hump-shaped tariff revenue curve identical to Figure 5.

Figure 5 : "Humped Shape" (Don Fullerton, April 1980)

Jules Dupuit (1844) states: "By thus gradually increasing the tax it will reach a level at which the yield is at a maximum… beyond [this] the yield of tax diminishes… lastly a tax (which is prohibitive) will yield nothing."

Without doubt, there were several debates by politicians and economists over unsupported claims and opinions on the supposed range of tax rates which will generate high tax revenue. The notion that "all is well" with the current tax range, even under the prohibitive area, does not seem to be the key issue in Malaysia despite knowing the importance of tax as the main tool for fiscal policy. Simple theoretical models depicting the prohibitive range indeed do exist. This research uses the data on tax rates and tax revenues to identify the current range in which the Malaysian tax base is situated and how it affects the tax revenue of Malaysia.

The quality of debate on high taxes generating lower tax revenues deteriorated gradually after the introduction of the Laffer curve and the writings on the Smith-Dupuit curve in 1974. Jude Wanniski (1978) showered insights that every fiscal mishap from the fall of the Roman Empire to the Great Depression was related to high tax rates. Grieson et al. (1970) deduced in his research the possibility of an inverse relationship between tax rates and revenue for local government in New York. He quoted: "The inclusion of state taxes lost when economic activity leaves both the city and the state would… raise the possibility of a net revenue loss as a result of an increase in business income taxes." From his research, it was concluded that the non-manufacturing sector has fewer location choices in New York and should be taxed heavily compared to the manufacturing sector, where the responsiveness towards tax is elastic. In Philadelphia, the case was the reverse for both sectors, as non-manufacturing is under greater competitive pressure. Philadelphia was close to the revenue optimisation point before the income tax increase, which resulted in exceeding the socially optimum point.

In Sweden, higher tax rates led to a barter system and passive market activity, resulting in the economy being situated in the prohibitive range. This was based on Charles Stuart's (1979) research, where a two-sector model was used to determine that the 80% marginal tax wedge in Sweden exceeds the revenue-maximizing rate by 10%. This research will focus on determining the relationship between individual and corporate tax revenue and their respective tax rates, based on linear and quadratic equations, using simple linear regression and quadratic regression respectively. Upon determining the relationship between tax revenue and tax rates, the next step is to determine the optimum individual and corporate tax rates that will maximize tax revenue. The next section highlights the literature review on optimum tax rates conducted by distinguished authors.

2.1 Optimum Tax Point

Optimum point merely means identifying one distinct point or range that would maximise the revenue or growth of a country. Known as optimum tax theory or the theory of optimal taxation, this theory is used in this research to identify a revenue maximising point for tax rates for both individual and corporate tax.

Implementing this tax rate will reduce the distortions caused by taxation in the market under various economic conditions. According to Bruce, Donald; John Deskins and William Fox (2005), using optimal taxation theory to reduce inefficiency and distortion in the market through Pareto optimal moves is debated constantly. It is important for the government to reduce inequality and inefficiency in the economic market in order to increase tax revenue. Therefore the most vital purpose of the tax system is to generate sufficient revenue to finance government expenditures. The limelight of this research is to determine whether an optimum tax point can be determined in the Malaysian taxation scenario using the Laffer curve concept, which leads us to the next section on detailed literature on the Laffer curve.

2.2 The Historical Origins and the Theory of the Laffer Curve

Laffer contends that the theory expressed and experimented by him was first propounded by Ibn Khaldun, a 14th century Muslim philosopher. This was followed by the opinion of John Maynard Keynes, who stated that "reduction of taxation will run a better chance than an increase in balancing the budget." In the words of Laffer: "The basic idea behind the relationship between tax rates and tax revenues is that changes in tax rates have two effects on revenues: the arithmetic effect and the economic effect. The arithmetic effect is simply that if tax rates are lowered, tax revenues (per dollar of tax base) will be lowered by the amount of the decrease in the rate. The reverse is true for an increase in tax rates. The economic effect, however, recognizes the positive impact that lower tax rates have on work, output, and employment -- and thereby the tax base -- by providing incentives to increase these activities. Raising tax rates has the opposite economic effect by penalising participation in the taxed activities. The arithmetic effect always works in the opposite direction from the economic effect. Therefore, when the economic and the arithmetic effects of tax-rate changes are combined, the consequences of the change in tax rates on total tax revenues are no longer quite so obvious." (Laffer, 1 June 2004). The researchers' view of the curve is that there is an optimum tax rate that would fetch maximum revenue; any rise from that would reduce tax revenue. Figure 6 shows this principle. It is a graphic illustration of the concept of the Laffer curve. Figure 6 shows that the government would not collect any tax revenue when the tax rate is at 0%. The government would also collect no tax revenue when the tax rate is at 100%, because there would be no one willing to work to earn an income. The Laffer curve neatly illustrates that there is an optimum point of tax rate beyond which total tax revenues decline. There is one point between 0% and 100% where a tax rate will maximize tax revenue; this optimal rate would lie between any percentage greater than 0% and less than 100%. Lowering tax rates through tax cuts will increase revenue and at the same time stimulate the economy. This action leads to an increase in output, employment and production, which is a positive economic indicator. Arthur B. Laffer stated that lower unemployment and higher income are the signs of rapid economic growth.

Figure 6 : The Laffer Curve

3. METHODOLOGY

The methodology of this research is a single research method of study rather than the usual mixed method. This research encompasses solely the quantitative method. Data collected are mainly secondary data from reliable sources. The primary data is mainly discussion with tax officials with regards to this research study, which seemed unique and challenging for the officials. This research study

is divided into two main sections. This research uses the Laffer curve concepts consisting of two main variables, that is, tax revenue and tax rates for both individual and corporate tax, to obtain the optimum point. The common methodology used in the USA to indicate the optimum point of tax rates based on tax revenues is the Laffer curve methodology. Laffer's curve is basically a non-linear equation that indicates the relationship between individual/corporate revenue and individual/corporate tax rates, which produces an inverse "U" shape to denote the optimum tax rate that will generate higher government revenue. This research uses a different approach in detecting the optimum tax rates for Malaysia for individual and corporate tax respectively, using Ms Excel and EViews 8. Tax rates of 34 years (1980-2013) were gathered from Data Stream, the Department of Statistics, Bank Negara Malaysia and the World Bank, then analysed and plotted using a polynomial trend line. From the plotted curve, the Malaysia optimum points for individual and corporate tax rates can be determined. The optimum points for individuals and corporates are denoted as the Malaysia Optimum Individual Tax Rate (MOITR) and the Malaysia Optimum Corporate Tax Rate (MOCTR).

4. EMPIRICAL ANALYSIS

Over the years the validity of the optimum tax rates was discussed by many political economists. Several concepts and variables were used to explain the optimum tax theory. In this research, the optimum tax rate is obtained for both individual and corporate tax rates using R programming and Ms Excel software to obtain the graphical presentation of the Laffer curve concept. Determining an optimum tax rate here defines clearly the tax rate that should be charged to both individuals and corporates in order to maximise the generation of tax revenue.

4.1 Model for Individual/Corporate Tax Revenue and Individual/Corporate Tax Rate

The vital first step in finding the optimum individual tax rate was to find the relationship between the individual tax rate and individual tax revenue. Based on the Laffer curve theory, the inverse "U" shaped theory does not portray a linear relationship but rather a quadratic (polynomial) relationship between individual tax rates and individual tax revenue. Data was collected on individual tax rates and individual tax revenue from the year 1980 until 2013. The individual tax rates ranged from approximately 35% to 25%, indicating a decrease in individual tax rates over the years. However, each 1% decrease in the individual tax rate is sustained for as long as 5 to 6 years. The individual tax rate from 1980 to 1992 (12 years) was 35%; in 1993 and 1994 it was 34% and 32% respectively. From 1995 to 1999 the individual tax rate was 30%, and it stood at 29% for the years 2000 and 2001. From 2002 to 2006, the individual tax rate was 28%. In 2007, the individual tax rate was 27%, followed by a decrease to 26% in 2008 and 2009. From 2010 to 2011 the rate dropped a further 1%, reaching 25% in 2012 and 2013. The relationship between individual tax revenue and the individual tax rate is denoted by the polynomial quadratic model shown below:

ITRevₜ = β₁ + β₂ ITRateₜ + β₃ ITRateₜ² + εₜ    (1)

The relationship between corporate tax revenue and the corporate tax rate is denoted by a polynomial quadratic model of the same form:

CTRevₜ = β₁ + β₂ CTRateₜ + β₃ CTRateₜ² + εₜ    (2)

Data was collected on corporate tax rates and corporate tax revenue from the year 1980 until 2013. The corporate tax rates ranged from approximately 35% to 25%, indicating a decrease in corporate tax rates over the years. However, each 1% decrease in the corporate tax rate is sustained for as long as 5 to 6 years. The corporate tax rate from 1980 to 1992 (12 years) was 35%; in 1993 and 1994 it was 34% and 32% respectively. From 1995 to 1997 the corporate tax rate was 30%, and it stood at 28% from 1998 to 2007. In 2008, the corporate tax rate was 27%, and in 2009 it was 26%, followed by a decrease to 25% from 2010 to date. In the recent Malaysian Budget 2014, it was announced that the corporate tax rate will drop another 1% in 2014 and an additional 1% in 2015 with the introduction of the Goods and Services Tax (GST) in April 2015. With the incremental decrease of 1% to 2% for a period

of a few years, this has no doubt resulted in an increase in corporate tax revenue over the years. However, corporate tax revenue fluctuates and is not at its optimal point. This research is to determine the corporate tax rate that can generate the maximum corporate tax revenue for the Malaysian government.

This equation is produced based on the inverted "U" shape of the Laffer curve concept, which shows a polynomial relationship. The approximate sample size of the data obtained is 34. The objective of collecting individual tax revenue and individual tax rates is to determine the optimum point of the individual tax rate that the government should charge individuals in order to gain the highest individual tax revenue. EViews 8 and Ms Excel are applied for the analysis of the 34 data points collected to further enhance the research.

4.2 Empirical Test and Analysis – Optimum Tax Rate

This section discusses the trend of the data for both individual/corporate revenue and individual/corporate tax rates. It also explains the regression line of each individual/corporate tax revenue against the individual/corporate tax rate, as shown below. Figure 7 depicts that both individual and corporate tax rates exhibit a downward trend from 1980 to 2013, while over the same period both GDP and individual and corporate tax revenues show an upward trend. The scatter plot shows an inverse relationship between tax revenue and tax rates; in other words, tax revenue increases as the tax rate decreases.

4.3 Measures of Central Tendency

The summary statistics of individual and corporate tax rates, tax revenues and GDP are shown in Table 1. The mean individual tax rate and corporate tax rate are 29% and 29.5% respectively. Average individual and corporate tax revenues are RM8,179.84 million and RM17,120.87 million, respectively. The standard deviation for corporate tax revenue is larger than the standard deviation for individual tax revenue, indicating that there is a larger variation in corporate tax revenue.

Figure 7: Individual/Corporate Tax Rates vs Individual/Corporate Tax Revenues

Table 2 shows the correlation between the dependent variables (individual and corporate tax revenues) and the independent variables (individual/corporate tax rates). The correlation between individual tax revenue and tax rates is inverse, with a value of 0.16170, insignificant at the 1% and 5% confidence levels. These results indicate there is no significant relationship between individual/corporate tax rates and individual/corporate tax revenues. The values depict a weak relationship between tax rates and tax revenue. One possible reason for this could be that the results depict the gradient of one half of the curve, which is downward sloping, indicating a negative relationship. The coefficient values are low because the range of individual/corporate tax rates charged is between 25% and 35% over 30 years, and the data is clustered on the right side of the Laffer curve.

However, the correlation between individual/corporate tax revenues and GDP indicates a positive and strong relationship. The relationship between individual/corporate tax revenue and individual/corporate tax rates is fitted using the quadratic model (1). Table 3 shows the empirical analysis between both tax revenues and tax rates. The R-squared value for the individual model is 0.7160, indicating the model explains 71.6% of the variability of individual tax revenues. For corporate tax, 78.16% of the variability of corporate tax revenues is explained, from the R-squared value of 0.7816. The F-statistics for both models are significant at the 5% and 1% levels, indicating that the models are appropriate to represent the relationship between the tax revenues and tax rates. The Akaike Information Criterion (AIC) is defined as the log-likelihood term penalized by the number of model parameters. From the table, the individual and corporate tax revenues have AIC values of 518.095 and 639.051, respectively; the authors take these values to indicate how well the models estimate the residual values. Hence the polynomial model proposed is a good model to find the optimum tax rate for both individual and corporate tax.

4.4 Malaysian Optimum Individual Tax Rate (MOITR) and Malaysian Optimum Corporate Tax Rate (MOCTR)

With the empirical analysis given in this section, we can conclude that it is appropriate to determine both the optimum individual and corporate tax rates. This is important for the Malaysian government because at this optimum point the government is able to generate the maximum tax revenue, which is vital to assist government expenditures and to reduce government debt. The generic relationship between tax revenue and tax rates can be expressed as a negative quadratic relationship. By using R programming, the best-fit model from the empirical analysis for the individual tax is:

ITRevₜ = 1136.2 + 1857.3 ITRateₜ + β₃ ITRateₜ², with β₃ < 0.

From the above quadratic equation, β₁ is the intercept: when ITRate is zero, ITRev will equal RM1,136.2 million. The coefficient β₂ of ITRate is the slope of the line that is tangent to the parabola as it crosses the Y-axis. Since β₂ > 0, i.e., 1857.3 > 0, the parabola is upward sloping at ITRateₜ = 0. The slope of the tangent line at an arbitrary ITRateₜ value equals (β₂ + 2β₃ ITRateₜ); that is, as ITRateₜ increases, this slope changes linearly. When the slope is zero, the relationship changes direction from positive to negative or from negative to positive; this point is ITRateₜ = -β₂/(2β₃). In this case, the optimum point is ITRateₜ = -β₂/(2β₃) = 17.78%. This is the point at which the mean of ITRevₜ is at its maximum: revenue is maximised if the individual tax rate is 17.78%. From the curve plotted, the scatter plot points show that as the government decreases the individual tax incrementally by small amounts over the years, individual tax revenue increases. However, the optimum point of the tax rate has not been reached, so individual tax revenue is not yet maximised.
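The fitting and vertex computation can be sketched in R in a few lines. The series below is simulated for illustration (the actual 1980-2013 data and the paper's β₃ estimate are not reproduced in the text; β₃ = -52.2 is an assumed value chosen so the vertex lands near the reported 17.78%):

# Fit the quadratic Laffer model ITRev = b1 + b2*ITRate + b3*ITRate^2
# and locate the revenue-maximising rate at -b2/(2*b3).
set.seed(1)
ITRate <- seq(25, 35, length.out = 34)                  # tax rate, %
ITRev  <- 1136.2 + 1857.3 * ITRate - 52.2 * ITRate^2 +  # RM million
          rnorm(34, sd = 500)                           # illustrative noise

fit <- lm(ITRev ~ ITRate + I(ITRate^2))
b   <- coef(fit)

# Vertex of the fitted parabola: the optimum tax rate.
optimum <- -b["ITRate"] / (2 * b["I(ITRate^2)"])
optimum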

Figure 8 : MOITR

Figure 9 : MOCTR

Figure 10 : Singapore GDP, Tax Revenue, Tax Rates and GST

more to spending, thus increasing the consumption level in the economy. This in turn will increase the GDP of Malaysia. In conclusion, the suggested Malaysia Optimum Individual Tax Rate will have to be 17.58%.

The generic relationship between corporate tax revenue and tax rates can likewise be expressed as a negative quadratic relationship, given by the equation below:

CTRevₜ = 1824.2 + 4749.1 CTRateₜ - 136.85 CTRateₜ²

When CTRateₜ is zero, CTRev will equal RM1,824.2 million. The coefficient β₂ of CTRate is the slope of the line that is tangent to the parabola as it crosses the Y-axis. Since β₂ > 0, i.e., 4749.1 > 0, the parabola is upward sloping at CTRateₜ = 0. The slope of the tangent line at an arbitrary CTRateₜ value equals (β₂ + 2β₃ CTRateₜ); that is, as CTRateₜ increases, this slope changes linearly. When the slope is zero, the relationship changes direction from positive to negative or from negative to positive; this point is CTRateₜ = -β₂/(2β₃). In this case, the optimum point is CTRateₜ = -4749.1/(2(-136.85)) = 17.35%. This is the point at which the mean of CTRevₜ takes its maximum: revenue is maximised when the Malaysian optimum corporate tax rate is 17.35%.

The Laffer curve is then plotted for both individual and corporate tax rates, and the curves are as shown in Figures 8 and 9 respectively. From the curves plotted, the scatter plot points show that both individual and corporate tax rates are in the prohibitive range, which does not generate maximum government revenue. Therefore it is important that the government decreases both individual and corporate tax rates in order to generate higher tax revenue. The curves clearly indicate the existence of the Laffer curve in Malaysia.

4.5 Optimum GST Rate (MOGST)

Upon obtaining the MOITR and MOCTR, the next stage is to obtain the appropriate GST rate for Malaysia, which will be denoted as MGST, to ensure the economy is sustained annually. Due to insufficient data for Malaysia, data for Singapore, Thailand and Indonesia is used to estimate the appropriate GST for each country.

4.5.1 Singapore

Singapore has implemented a 3% GST since 1994, with an individual tax rate of 30% and a corporate tax rate of 27%. Economic growth decreased from 11.54% yearly to 10.93% in 1994, recorded 7.0-7.50% in 1996/97, and peaked at 8.29% in 1997 before hitting rock bottom in 1998 at -2.23% due to the Asian Financial Crisis. Dropping the individual and corporate tax rates to 22% revived economic growth to 6.1%, and it has been fluctuating since. What was evident was that when the GST was increased to 7%, economic growth dropped to 1.79% from 9.11% in 2008, which of course could be due to the Global Financial Crisis. However, with the individual tax rate dropping to 20% and the corporate tax rate to 17%, the Singapore economy recovered slightly with the GST rate of 7%.

Therefore, it was evident that the individual/corporate tax rates and GST were not synergised to ensure the sustainability of Singapore's economic growth rate. From the data obtained, the Laffer curve concept was executed and the results are as shown in Figure 10. Using the Laffer curve concept, it is predicted that in order for Singapore to achieve 8% GDP growth annually, the individual tax rate should be 34%, the corporate tax rate 25% and the GST rate 5%.
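Returning to the corporate quadratic above, the reported optimum can be verified with a one-line arithmetic check (a verification sketch using the stated coefficients, not part of the paper's own code):

# Vertex of the fitted corporate model CTRev = 1824.2 + 4749.1*t - 136.85*t^2
b2 <- 4749.1
b3 <- -136.85
-b2 / (2 * b3)   # = 17.35 (%), the Malaysian optimum corporate tax rate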

BIBLIOGRAPHY

Anthony Barnes Atkinson (1996) The Economics of the Welfare State.. Journal Title: American Economist. Volume: 40. Issue: 2. Publication Year: 1996. Page Number: 5+. COPYRIGHT 1996 Omicron Delta Epsilon; COPYRIGHT 2002 Gale Group

http://www.questia.com/read/5001641605?title=The%20Economics%20of%20the%20Welfare%20 State

Arthur B. Laffer (2004) The Laffer Curve: Past, Present, and Future Backgrounder #1765 June 1, 2004 http://www.heritage.org/Research/Taxes/bg1765.cfm [06-11-2009]


Arvind Panagariya (1994) India: A New Tiger on the Block. Contributors: Arvind Panagariya (1994) Journal Title: Journal of International Affairs. Volume: 48. Issue: 1. Publication Year: 1994. Page Number: 193-221. COPYRIGHT 1994 Columbia University School of International Public Affairs; COPYRIGHT 2002 Gale Group

http://www.questia.com/read/5001708989?title=India%3a%20A%20New%20Tiger%20on%20the%20 Block Bartelsman, E.J., Beetsma, M.J. (2003). Why pay more? Corporate tax avoidance through transfer pricing in OECD countries. Journal of Public Economics 87 (9–10), 2225–2252.

Becsi, Z. (2000). The Shifty Laffer Curve. Federal Reserve Bank of Atlanta, Economic Review, 53-64.

Boskin, M., Gale, W. (1987). New results on the effects of tax policy on the international location of investment. In: Feldstein, M. (Ed.), The Effects of Taxation on Capital Accumulation. University Press, Chicago.

Busato, F., Chiarini, B., 2009.Steady State Laffer Curve with the Underground Economy,LUISS Lab of European Economics Working Document No. 85.

Charles L Ballard, Don Fullerton, John B.Shoven, John Willey, 2009, The Relationship between Tax Rates and Government Revenue, Economic Review, 56-63.

Chris Atkins July 9, 2007 In OECD Comparison of Wage Taxes, U.S. Ranking Would Slip Badly if 2001 Tax Cuts Expired Tax Foundation Fiscal Fact, No. 89 Fiscal Fact No. 89, PDF, 33.6 KB

http://www.taxfoundation.org/files/ff89.pdf Chris Edwards (2003) Replacing the Corporate Income Tax with a Cash-Flow Tax. Contributors: Journal Title: The Cato Journal. Volume: 23. Issue: 2. Publication Year: 2003. Page Number: 291+. COPYRIGHT 2003 Cato Institute; COPYRIGHT 2005 Gale Group

http://www.questia.com/read/97876452?title=Federal%20Tax%20Treatment%20of%20Foreign%20 Income

34 MyStats 2015 Proceedings Clausing, K. A. (2003). T ax-motivated transfer pricing and US intrafirm trade prices. Journal of Public Economics 87 (2003) 2207–2223

Conefrey, T. and Gerald, J.D.F. (2011). The macro-economic impact of changing the rate of corporation tax. Economic Modelling 28 (2011) 991–999

David Malin Roodman (1995) Public Money and Human Purpose: The Future of Taxes. Magazine Title: World Watch. Volume: 8. Issue: 5. Publication Date: September-October 1995. Page Number: 10+. COPYRIGHT 1995 Worldwatch Institute; COPYRIGHT 2002 Gale Group http://www.questia.com/read/5000352222?title=Public%20Money%20and%20Human%20 Purpose%3a%20The%20Future%20of%20Taxes

Dan Mitchell, 2009, Obama’s Class-Warfare Tax Policy Threatens America’s Economy, http:// danieljmitchell.wordpress.com/2009/06/15/obamas-tax-policy-threatens-americas-economy/, Retrieved 31st January 2014

Dan Mitchell, 19th May 2012, American Politicians Should Learn Some Policy Lessons from Hong Kong and Singapore, International Liberty, http://danieljmitchell.wordpress.com/2012/05/19/american- politicians-should-learn-some-policy-lessons-from-hong-kong-and-singapore/, Retrieved 31st January 2014

De Mooij, R. A., Ederveen, S. (2001). Taxation and Foreign Direct Investment: A Elitzur, R and Mintz, J. (1996). Transfer pricing rules and corporate tax competition. Journal of Public Economics 60 (1996) 401-422.

Ernesto Screpanti, Stefano Zamagni (2005) An Outline of the History of Economic Thought. Publisher: Oxford University Press. Place of Publication: Oxford, England. Page Number: 32. http://www.questia.com/read/108793096?title=A%20History%20of%20Japanese%20Economic%20 Thought

Francesco Busato, Bruno Chiarin (2009) Steady state Laffer curve with the underground economy. Department of Economic Studies University of Naples Discussion Paper No.2/2009 http://74.125.155.132/search?q=cache:8VPAOLByMVgJ:economia.uniparthenope.it/ise/sito/DP/ DP_2_2009.pdf+The+Laffer%E2%80%99s+Curve&cd=8&hl=en&ct=clnk&gl=in

Frank Ackerman et al (2004) The Flawed Foundations of General Equilibrium Theory: Critical Essays on Economic Theory. Routledge. New York. Publication Year: 2004. Page Number: 150. Authors Frank Ackerman, Alejandro Nadal, Carlo Benetti, Kevin P. Gallagher, Carlos. http://www.questia.com/read/107921724?title=The%20Flawed%20Foundations%20of%20 General%20Equilibrium%20Theory%3a%20%20Critical%20Essays%20on%20Economic%20Theory Frank J. Fabozzi (2002) The handbook of financial instruments Edition: illustrated Published by John Wiley and Sons, ISBN 0471220922, 9780471220923 pages 9,31 http://books.google.co.in/books?hl=en&lr=&id=R7YGTqsLVCYC&oi=fnd&pg=PR9&dq=procedure,+me rits+and+disadvantages+of+fair+valuation+of+financial+and+derivative+instruments.&ots=MffdaMs WGp&sig=0MXpsf5Jv9LzLypgZFuTGKDL3qc#PPA51,M1

Fullerton, Don (1980). On The Possibility On An Inverse Relationship Between Tax Rates And Government Revenues. NBER Working Paper 467 (April 1980)

MyStats 2015 Proceedings 35 Gary M. Woller, Kerk Phillips (1996) Commercial Banks and LDC Debt Reduction. Journal Title: Contemporary Economic Policy. Volume: 14. Issue: 2. Publication Year: 1996. Page Number: 107. http://www.questia.com/read/96396378?title=Commercial%20Banks%20and%20LDC%20Debt%20 Reduction

George, S. (2013). The Effects of Tax Rates on Its Revenue and Growth: A Case Study of Malaysia. Journal of Global Business and Economics (2013) Volume 6. Number 1.

Grupert, H., Mutti, J. (1991). Taxes, tariffs and transfer pricing in multinational corporate decision making. The Review of Economics and Statistics 73 (2), 285–293. Hartman, D.G. (1984). Tax policy and foreign direct investment in the United States. National Tax Journal 37, 475–488.

Heijman,W.J.M. and van Ophem, J.A.C. (2005). Willingness to pay tax: The Laffer curve revisited for 12 OECD countries. The Journal of Socio-Economics 34 (2005) 714–723.

Henry J. Aaron, William G. Gale(1996) . Economic Effects of Fundamental Tax Reform. Brookings Institution Press. Place of Publication: Washington, DC. Publication Year: 1996. Page Number: 30. http://www.questia.com/read/35267189?title=Economic%20Effects%20of%20Fundamental%20 Tax%20Reform

Hsing, Yu (1996). Estimating the Laffer Curve and Policy implications. Journal of Socio-Economics, Volume 25, No. 3, pp. 395-401

James M. Buchanan, Yong J. Yoon (1995) Rational Majoritarian Taxation of the Rich: With Increasing Returns and Capital Accumulation.. Journal Title: Southern Economic Journal. Volume: 61. Issue: 4. Publication Year: 1995. Page Number: 923+. Southern Economic Association; COPYRIGHT 2002 Gale Group

http://www.questia.com/read/5001648314?title=Rational%20Majoritarian%20Taxation%20of%20 the%20Rich%3a%20With%20Increasing%20Returns%20and%20Capital%20Accumulation John Creedy (2001) Tax Modelling Journal Title: Economic Record. Volume: 77. Issue: 237. Publication Year: 2001. Page Number: 189. COPYRIGHT 2001 Economic Society of Australia; COPYRIGHT 2002 Gale Group

http://www.questia.com/read/5001038009?title=Tax%20Modelling

Karas, M. (2012). Tax Rate to Maximize the Revenue: Laffer Curve for the Czech Republic. Acta univ. agric. et silvic. Mendel. Brun., 2012, LX, No. 4, pp. 189–194

Kimberly Amadeo, 2010, Obama Tax Cuts, Unemployment Benefits, College Tax Credits Extended, US Economy, About.com,

http://useconomy.about.com/od/usfederaltaxesandtax/tp/Obama-Tax-Cuts.htm, Retrieved 31st January 2014

Laffer, Arthur (26th Oct 2013), Arthur Laffer: cuts succeeded where stimulus failed High government spending increases unemployment and slows economic recovery, The Spectator, http://www.spectator.co.uk/features/9063861/austerity-works/, Retrieved 31st January 2014

Laffer, A. B., Moore, S., & Tanous, P. J (2008). The end of prosperity: How higher taxes will doom the economy—If we let it happen. New York, NY: Threshold editions.

36 MyStats 2015 Proceedings Lee, Y., Gordon, R. (2005). Tax structure and economic growth. Journal of Public Economics 89, 1027–1043.

Leland G. Neuberg (1997) Doing Economic Research: Essays on the Applied Methodology of Economics. Journal Title: Eastern Economic Journal. Volume: 23. Issue: 4. Publication Year: 1997. Page Number: 498+. © 1997 Eastern Economic Association. Provided by ProQuest LLC. All Rights Reserved. http://www.questia.com/read/5036020695?title=Doing%20Economic%20Research%3a%20 Essays%20on%20the%20Applied%20Methodology%20of%20Economics

Teng, L. J. (17th Feb 2014). Govt offers tax perks to spur SME mergers. Malaysia Sun Newspaper, Sunbiz. Lindsey, L.B. (1986). Individual Taxpayer Response To Tax Cuts in 1982-1984 With Implications For The Revenue Maximizing Tax Rate. NBER Working Paper No. 2069.

Loganathan, Nanthakumar, Taha, Roshaiza (August, 2007), “Have Taxes Led Government Expenditure In Malaysia”. Retrieved from website: http://www.jimsjournal.org/11%20Loganathan.pdf

Lorraine Eden (1991) Retrospectives on Public Finance. Publisher: Duke University Press. Place of Publication: Durham, NC. Publication Year: 1991. Page Number: 306.,346 http://www.questia.com/read/107549005?title=Fiscal%20Policy%20Convergence%20from%20 Reagan%20to%20Blair%3a%20The%20Left%20Veers%20Right

Mashkoor, M., Yahya, S. and Ali, S.A. (2010). Tax Revenue and Economic Growth: An Empirical Analysis for Pakistan. World Applied Sciences Journal 10(11): 1283-1289, 2010.

Mason Gaffney (2006) A Simple General Test for Tax Bias.. Journal Title: The American Journal of Economics and Sociology. Volume: 65. Issue: 3. Publication Year: 2006. Page Number: 733+. COPYRIGHT 2006 American Journal of Economics and Sociology, Inc.; COPYRIGHT 2008 Gale, Cengage Learning http://www.questia.com/read/5028550233?title=A%20Simple%20General%20Test%20for%20Tax%20 Bias

Michael Morrison, April 2012, More Jobs through Lower Tax Rates? A Look at the Evidence, http://www.decisionsonevidence.com/2012/04/more-jobs-through-lower-tax-rates-a-look-at-the-evidence/, Retrieved 22nd April 2013

Milagros Palacios, Kumi Harischandra (2008), "The Impact of Taxes on Economic Behavior". 17-19. Retrieved from website: http://pirate.shu.edu/~rotthoku/Prague/ImpactofTaxesonEconomicbehavior.pdf

Milton Friedman, Walter W. Heller.(1969) Monetary vs. Fiscal Policy. Publisher: W.W. Norton. Place of Publication: New York. Publication Year: 1969. Page Number: 50. http://www.questia.com/read/98848905?title=Monetary%20vs.%20Fiscal%20Policy Mirrlees, J.A., (1971). An Exploration in the Theory of Optimum Income Taxation. The Review of Economic Studies, Vol. 38, No. 2, (Apr., 1971), pp. 175-208

Vito Tanzi (1995) Taxation in an Integrating World.. Publisher: Brookings Institution. Place of Publication: Washington, DC. Publication Year: 1995. Page Number: 11. http://www.questia.com/read/105579611?title=Tax%20Systems%20and%20Tax%20Reforms%20 in%20Europe

MyStats 2015 Proceedings 37 Vito Tanzi, Howell Zee 2001 Tax Policy for Developing Countries International Monetary Fund March 2001http://www.imf.org/external/pubs/ft/issues/issues27/index.htm

William H. Oakland, William A. Testa (1996) State-Local Business Taxation and the Benefits Principle.. Journal Title: Economic Perspectives. Volume: 20. Issue: 1. Publication Year: 1996. Page Number: 2+. COPYRIGHT 1996 Federal Reserve Bank of Chicago; COPYRIGHT 2002 Gale Group http://www.questia. com/read/5000313523?title=State-local%20Business%20Taxation%20and%20the%20Benefits%20 Principle

Yee Wing Ping, Pre-Budget 2013 ; Caught in the Middle, The Edge: Your Window to Malaysia, 25th September 2012, http://www.theedgemalaysia.com/first/221121-pre-budget-2012-caught-in-the- middle.html, Retrieve 18th March 2013

Young Lee, Roger H. Gordon (July 15, 2004), “Tax Structure and Economic Growth”. Retrieved from website: http://www.aiecon.org/advanced/suggestedreadings/PDF/sug334.pdf

Zsolt Becsi, 2000, The Shifty Laffer Curve, Federal Reserve Bank of Atlanta, Economic Review, Volume 3, http://www.frbatlanta.org/filelegacydocs/becsi.pdf, Retrieved 21st January 2014

Author’s Bibliography

Sherly George obtained her BEconomics and MEconomics from University Malaya, Kuala Lumpur. Currently she is pursuing her Doctor of Philosophy majoring in Economics (viva presentation stage) under the supervision of Y.B. Professor Dr Syed Omar Syed Agil from Universiti Tun Abdul Razak. Her major research is in microeconomics, macroeconomics, monetary economics, taxation, economic development and government policies.

DEVELOPMENT OF ICT IN GARNERING STATISTICS Sabri Omar 1

Abstract

Statistical organisations all over the world are facing increased demand to raise quality, often through innovative approaches towards modernising the office environment and operations. Technological advancement in ICT has provided the required platform for the Department of Statistics Malaysia (DOSM) to move forward. The Generic Statistical Business Process Model (GSBPM) adopted by DOSM has always been the reference in modernising our activities through ICT. The development of the National Enterprise-Wide Statistical System (NEWSS) has created better governance and housekeeping of the enterprise and establishment frame as well as the household and housing frame. NEWSS has also enabled DOSM to monitor the progress of field work activities and the entry of collected information into the system. Besides that, intelligent character recognition, online data entry and e-Survey initiatives have improved the method of data capture. Current efforts on leveraging computer assisted telephone interviewing in conducting household surveys and computer assisted personal interviewing using computer tablets during field work are showing some benefits and are expanding. The development of StatsDW data warehousing, StatsBDA big data analytics, open data and mobile applications are all geared towards new innovative initiatives leveraging on ICT technology.

1. Introduction

This paper describes the development of ICT initiatives in the Department of Statistics Malaysia (DOSM) in garnering statistics. DOSM embarked on the use of ICT in its business functions as early as 1962 [3][4]. The inception of the ICL mainframe computer system considerably enhanced the processing of data collected during field work. Since then, ICT became an integral part of the core functions of DOSM.

2. Generic Statistical Business Process Model

The Generic Statistical Business Process Model (GSBPM) [5] adopted by DOSM has always been the reference in modernising the core functions through ICT. The 9-step GSBPM is shown in Table-1. These ICT initiatives are further categorised as follows:

a. Established ICT initiatives
b. Rolling out ICT initiatives and
c. Way forward ICT initiatives

ESTABLISHED ICT INITIATIVES

3. National Enterprise-Wide Statistical System

The National Enterprise-Wide Statistical System (NEWSS) was developed using the Fujitsu Framework introduced by Fujitsu (M) Private Limited in 2008 [8]. This Java framework comprises the Integrated Statistical System Framework and Information System Support. The modules of NEWSS are shown in Figure-1.

4. Intelligent Character Recognition

The Intelligent Character Recognition (ICR) machine and software have been used intensively since the 2005 Agriculture Census. They have helped in capturing data from physical paper-based questionnaires into a Microsoft SQL Server database through scanning, transfer and verification of the scanned images. This tremendously reduced the time in data capture compared to manual key-in by clerical staff. The current model of the scanner is the Kodak i780 and the software is ReadSoft version SP5-2.

1 Department of Statistics, Malaysia

Table-1 : Generic Statistical Business Process Model (GSBPM)


Mapping of ICT initiatives against the GSBPM [11] is depicted in Table-2.

Table-2 : GSBPM and ICT Initiatives


Figure-1 : Modules of NEWSS

Figure-2 : ICR Machine

Simple calculations to show the capability of the ICR machine are as follows:

The recommended daily volume is 130,000 pages per day, and the throughput speed is 130 pages per minute. Based on the throughput speed, the number of pages scanned per hour is 130 ppm x 60 minutes, which equals 7,800 pages. If the machine scans for 8 hours per day, the number of pages scanned per day is 7,800 pages x 8 hours, which equals 62,400 pages per day. Assuming a set of questionnaires has 12 printed pages, the number of sets of questionnaires scanned per day is 62,400 / 12, which equals 5,200 sets. If there are 80,000 sets of questionnaires, the number of days required to scan is 80,000 / 5,200, which equals 15.4 days. Based on these calculations, the ICR machine is able to speed up data entry considerably.

5. Online eSurvey

Modernising the data collection method has been an important agenda in the ICT initiative programme. Online eSurvey has been implemented since 2011; currently there are six online eSurveys, and their performance status is shown in Table-3. The combination of responses via eSurvey and other electronic forms shows the potential acceptance by respondents of eSurvey. However, substantial effort is needed to encourage respondents to fully use eSurvey.

6. e-Services Portal

e-Services is an online system provided for the convenience of users to get the latest DOSM products and services. e-Services offers facilities such as user registration, free download of publications, the Advance Release Calendar, subscription to notification emails for the latest publications, purchase or subscription of publications, data requests for unpublished data, review of transaction status, online payment via credit card, feedback and selected DOSM e-surveys [6].

- Seamless accessibility, available 24 hours, 7 days a week
- Personalised services that facilitate users
- Hassle-free services
- Free downloads of statistical products
- Cashless transactions

Figure 3 shows the e-Services login page.

7. Geographical Information System

A geographical information system (GIS) is a system designed to capture, store, manipulate, analyze, manage, and present all types of spatial or geographical data. In DOSM, NEWSS GIS is used for the delineation and digitising of Enumeration Block (EB) boundaries. NEWSS GIS consists of two sub-modules: check-in check-out (CICO), which involves the updating of EBs by state offices and an approval process by Headquarters; and the GIS Portal, a platform to display data in the form of thematic maps. It also provides a search facility for households and establishments based on the selected area, which is generated directly from the NEWSS household and establishment frames.

Besides that, DOSM also published the Census e-Atlas 2010 for the public [6]. The Census e-Atlas 2010 was developed for the first time using data obtained from the Population and Housing Census of Malaysia 2010. It is a mechanism that aims to show the main themes of the Census in the form of thematic maps at the national and state levels. The themes covered are the distribution of population, ethnicity, religion, age structure and marital status.

ROLLING OUT ICT INITIATIVES

8. StatsDW Data Warehouse

The StatsDW data warehouse is a relational database that is designed for query and analysis rather than for transaction systems. It usually contains historical data derived from transaction data but can include data from other sources. It separates the analysis workload from the transaction workload and enables the organisation to consolidate data from several sources.
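A minimal sketch of the consolidate-and-analyse pattern described here, in R. The file names and column names are assumptions for illustration; this is not the actual StatsDW tooling.

# Extract: read two hypothetical source extracts (file names are assumptions).
sales_a <- read.csv("branch_a.csv")   # columns assumed: year, value
sales_b <- read.csv("branch_b.csv")

# Transform: combine the sources, then aggregate by year.
combined <- rbind(sales_a, sales_b)
yearly   <- aggregate(value ~ year, data = combined, FUN = sum)

# Load: write the consolidated series to the analysis store (a CSV here).
write.csv(yearly, "warehouse_yearly.csv", row.names = FALSE)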

Figure-3: e-Services Login Page

Figure-4: Free download publications

Figure-5: Online Payment via Credit Card

Figure-6: NEWSS GIS

Figure-7: Census e-Atlas 2010

Figure-8 : StatsDW Data Warehouse Model

In addition to a relational database, a data warehouse environment includes an extraction, transportation, transformation, and loading (ETL) solution, an online analytical processing (OLAP) engine, client analysis tools, and other applications that manage the process of gathering data and delivering it to business users [1]. Figure-8 shows the StatsDW data warehouse model.

The modules that are developed under StatsDW are as follows [13,14]:

a. Visualization
b. Analytics
c. Location Intelligence
d. Time Series
e. StatsDW Laboratory
f. Data Bank
g. Mobile Applications

9. Computer Assisted Telephone Interviewing

Another modernisation initiative is the use of the computer assisted telephone interviewing (CATI) technique, in which the interviewer follows a script provided by a software application. It is a structured system of microdata collection by telephone that speeds up the collection and editing of microdata and also permits the interviewer to educate the respondents on the importance of timely and accurate data [2].

Currently, DOSM has set up three CATI centres: at the headquarters; at the Federal Territory Kuala Lumpur Department of Statistics, which serves as the centre for the central zone covering respondents in Kuala Lumpur and surrounding areas; and at the Department of Statistics Melaka, which serves the southern zone covering Melaka and nearby states.

Based on a comparative study by the Department of Statistics Federal Territory Kuala Lumpur, there is a cost saving in the use of CATI compared to the normal face-to-face interview with respondents. Table-4 shows the cost saving of CATI for the Labour Force Survey.

Table-4 : Cost Saving of CATI for Labour Force Survey

Assumptions:
No. of cases : 4,416 cases @ 552 EBs
No. of staff : 39 staff (frame, field work, processing)
KPI for field work : 2 cases per day per staff
KPI for CATI : 7 cases per day per staff
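Before turning to DOSM's reported figures, the interviewing workload implied by these assumptions can be sketched in R. The ringgit figures below come from DOSM's own costing, which is not reproduced in the text; this sketch only derives the person-day ratio.

# Rough person-day comparison implied by the Table-4 assumptions.
cases         <- 4416
field_per_day <- 2                    # KPI: face-to-face cases/day/staff
cati_per_day  <- 7                    # KPI: CATI cases/day/staff

field_days <- cases / field_per_day   # 2,208 person-days
cati_days  <- cases / cati_per_day    # ~631 person-days
1 - cati_days / field_days            # ~71% fewer interviewing person-days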

Based on the above findings, the cost savings are as follows:

Cost saving for field work is 49.7%
Cost saving for processing is 25%
Total cost saving is RM266,504.

The benefits gained in the use of CATI have encouraged the set-up of new centres for the northern zone in Penang, the eastern zone, the Sabah zone and the Sarawak zone.

10. Computer-Assisted Personal Interviewing

Computer-assisted personal interviewing (CAPI) is an interviewing technique in which the respondent or interviewer uses a computer to answer the questions. It is similar to computer-assisted telephone interviewing, except that the interview takes place in person instead of over the telephone. It has been classified as a personal interviewing technique because an interviewer is usually present to serve as a host and to guide the respondent [16]. For CAPI, DOSM has introduced the use of tablets for Consumer Price Index (CPI) data collection in five states, that is, Federal Territory Kuala Lumpur, Selangor, Johor, Perak and Penang. The tablet replaces the manual book which the field enumerator had to fill in during field work and re-key into the CPI system for validation on return to the office. Instead, data are keyed directly into the tablet during field work and uploaded into the CPI system on return to the office; re-keying of the data is no longer necessary. Table-5 shows the benefits of using CAPI.

11. Mobile Application

The wide use of smart phones and tablets has triggered the development of mobile applications, commonly known as mobile apps. The price check mobile application allows the public to search and compare the monthly average price of products by category and location. It provides the average price of a product for the state, the highest and lowest prices for the current month, and a comparison of the average price between the current and previous months. More mobile apps are to be developed, including a search engine for the public to search statistical transactions.

WAY FORWARD ICT INITIATIVES

12. Open Data

By definition, open data means data that anyone can access, use or share. Simple as that. When big companies or governments release non-personal data, it enables small businesses, citizens and medical researchers to develop resources which make crucial improvements to their communities [10]. For Malaysia, the open data initiative through the portal data.gov.my is spearheaded by the Multimedia Development Corporation (MDeC) and the Malaysia Administrative Modernisation and Management Planning Unit (MAMPU). This open data initiative is still new, and serious thought and planning are ongoing with many government agencies, including DOSM. The ultimate open data format is the comma-separated values (CSV) file, which stores tabular data (numbers and text) in plain text. Each line of the file is a data record. Each record consists of one or more fields, separated by commas. The use of the comma as a field separator is the source of the name for this file format [15].

13. Big Data Analytics

Big data is being generated by everything around us all the time. Every digital process and social media exchange produces it. Systems, sensors and mobile devices transmit it. Big data is arriving from multiple sources at an alarming velocity, volume and variety. To extract meaningful value from big data, you need optimal processing power, analytics capabilities and skills [7]. DOSM has embarked on a journey of big data by devising its big data model, named the StatsBDA model. Figure-9 depicts the StatsBDA model. Phase 1 of the StatsBDA implementation will involve the time series data of Malaysian external trade, the Malaysia statistical business register, and the Malaysia statistical address register.
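A minimal R sketch of the CSV mechanics described in Section 12; the file name and columns are hypothetical, not an actual data.gov.my dataset:

# Write and inspect a small CSV: one record per line, comma-separated fields.
df <- data.frame(year = 2010:2014, value = c(100, 110, 120, 130, 140))
write.csv(df, "sample.csv", row.names = FALSE)
readLines("sample.csv")   # header line, then plain-text records such as "2010,100"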

14. Development of Application Framework

The current system development process uses the conventional system development life cycle (SDLC), which has seven stages, namely project planning and feasibility study; system analysis and requirements definition; system design; implementation; integration and testing; acceptance, installation and deployment; and maintenance [12].

However, with the high demand and expectations from users, the stages of the SDLC need to be innovated by adopting a system development framework. Furthermore, the development of Web 2.0 applications requires Yii as the high-performance PHP framework [9].

15. Conclusion

In this paper, the development of ICT initiatives has covered the three categories of established, rolling-out and way-forward ICT initiatives. These initiatives have significantly helped DOSM to modernise its core functions to produce statistical data and publications. The ICT initiatives have shown benefits in terms of monetary value and time. However, some initiatives need further study on benefit realisation in order to measure the outcomes.

16. Acknowledgements

I would like to express my gratitude to DOSM Federal Territory Kuala Lumpur - Ms. Rozita Talha and Miss Sa'adiah Ahmad for input on CATI; DOSM Selangor - Mr. Ibrahim Jantan for input on CAPI; and Mr. Mior Norazman Mior Daud for input on GIS and e-Services. My gratitude also goes to Miss Nazlah Mustaffa for the infographics and Miss Nazirah Nazarudin for the typesetting of this paper.


Tables and Diagrams

Table 1: Generic Statistical Business Process Model
Table 2: GSBPM and ICT Initiatives
Table 3: Online e-Survey
Table 4: Cost Saving of CATI for Labour Force Survey
Table 5: Benefits of using CAPI for CPI
Figure 1: Modules of NEWSS
Figure 2: ICR Machine
Figure 3: e-Services Login Page
Figure 4: Free download publications
Figure 5: Online Payment via Credit Card
Figure 6: NEWSS GIS
Figure 7: Census e-Atlas 2010
Figure 8: StatsDW Data Warehouse Model

Bibliography

1. William Inmon, Building the Data Warehouse, John Wiley and Sons, 1996
2. CATI: Glossary, U.S. Bureau of Labor Statistics, Division of Information Services
3. DOSM 60 Year Book, 2009
4. DOSM Annual Report, 2014
5. DOSM, General Statistical Business Process Model, 2011
6. DOSM Portal, https://www.statistics.gov.my, 2015
7. IBM, What is Big Data?, 2015
8. Fujitsu Framework, Development Guidelines & Standard, 2008
9. Mark Safronov and Jeffrey Winesett, Web Application Development with Yii 2 and PHP, 2014
10. Open Data Institute, United Kingdom, 2015
11. Presentation slides in Statistical Process Modernisation Committee, 2015
12. Russell Kay, System Development Life Cycle, Computerworld, 2002
13. StatsDW DOSM, Functional Design Specifications, 2014
14. StatsDW DOSM, User Requirement Specifications, 2014
15. Wikipedia, Comma-separated values, 2015
16. Wikipedia, Computer-assisted personal interviewing, 2015

DESA: GROWING THE DIGITAL ECONOMY FROM A NATIONAL PERSPECTIVE
Mohd Jalallul Alam @ Jasni Zain 1, Syahida Ismail, Nur Asyikin Abdul Najib

Abstract

The aim of this paper is to share the journey and the birth of a unique national account specific for ICT, the Digital Economy Satellite Account (DESA). It explains the intent of the satellite account and covers the extensive use of data from DESA in national planning activities, advocacy roles in both local and international platforms, as well as the benchmarking of the nation's digital maturity against other nations. The paper also illustrates how the ICT industry can kick-start and further escalate the nation's effort towards open data initiatives. The paper suggests the way forward to ensure future challenges can be minimised, and highlights the immediate need for an improved, coordinated effort towards sustainable Digital Economy growth.

Intent of satellite account

The Malaysian Digital Economy is a significant part of the overall national economy. Based on the latest data sourced from the preliminary ICTSA 2010-2014, the Digital Economy contributed RM188.3B to the national economy, or a 17% contribution 2. Figure 1 shows the time-series of GDP growth for both the national economy and the Digital Economy from 2010 to 2014. Within the ICT industries are five major ICT industry sub-sectors contributing a combined Gross Value Added of RM132.5B. E-Commerce plays a major role as well, contributing RM63.8B to the economy, or a 5.8% contribution. Productivity of the ICT industry remains higher than the national average, at 1.6 times. Figure 2 shows that the ICT industry's average gross wages per employee are 1.7 times the national average and fairly consistent over the time-series. Total trade of ICT shows positive net exports of RM102.2B, and in 2014 is bucking a four-year declining trend.

The Digital Economy, if it were a vertical, would be of the same size as the construction and mining & quarrying sectors combined 3. The Digital Economy has a more productive work force, better paid workers as compared to the national average, and a positive Balance of Payments. The Digital Economy is catalytic as well; growth of the ICT industry drives the demand for robust digital infrastructure, fundamental to Malaysia's competitiveness. In addition, accessibility and affordability are equally critical in uplifting the economy and narrowing the socio-economic gap through the provision of digital opportunities to the have-nots 4. Hence the importance of this economic sector and its continued growth.

In 1996, ICT in Malaysia was at a nascent stage. The Multimedia Super Corridor (MSC) was established by the Government of Malaysia to grow the ICT industry 5. The evolution of this idea led to the approach of ICT as an enabler and as the building blocks of a key services-based sector. For the large part, the scope of what was referred to as ICT in Malaysia revolved around the MSC initiative under the purview of the Multimedia Development Corporation (MDeC).

1 Multimedia Development Corporation (MDeC) Sdn Bhd
2 DOSM, ICT Satellite Account, 2010-2014
3 MDeC Analysis based on Annual National Accounts, Gross Domestic Product, 2010-2014, DOSM
4 ICT Satellite Account, 2010-2015, DOSM
5 ICT Industry Blueprint, 1996

By 2011, the size of the ICT industry was estimated using MDeC figures from 2009 6 to 2011, which reported the GDP contribution of MSC Malaysia companies to be growing by 38.6%, from RM5 billion to RM9.6 billion 7. While the figures were highly accurate, the MSC calculations represented an industry cluster, diverse as it is with its mix of technology, creative multimedia and high-tech shared services firms, rather than an entire industry sector. In addition, they represent the supply-side impact of the Digital Economy, without factoring in national demand for digital products and services.

To address this gap in the collation of private sector data, a singular account to capture data and statistics on ICT and e-Commerce activities across all sectors in the country had to be developed 8. To complement the perspective, other proxies were used, including:

1. The intensity of technology, including the e-Intensity Index by the Boston Consulting Group 9. In this index, Malaysia was ranked within a cluster of nations with a similar technology profile of internet infrastructure, expenditure and degree of involvement by government, businesses and citizens in the internet.

2. The index of digital components of supply, demand, institutions & innovation of 50 countries tracked since 2008 by the Fletcher School, under Tufts University. This index, named the Digital Evolution Index, saw Malaysia as a fast-rising nation and keyed on innovation 10.

Figure 1: Digital Economy vs. the National Economy
Figure 2: ICT industry average CoE per employee vs national

6 MSC Malaysia Industry Report 2009, MDeC
7 MSC Malaysia Industry Report 2011, MDeC
8 Digital Malaysia Progress Report 2012, MDeC
9 The Connected World: The Internet Economy in the G-20, Boston Consulting Group
10 Digital Evolution Index, 2008-2015, Fletcher School

3. An EPU estimate based on internal research, largely driven by comparison to the E&E and telco sectors, of an amalgamated ICT industry size of 9.8% 11 contribution to GDP in 2010. This total projected number was established as a measure of influence of the ICT industry in the Tenth Malaysia Plan.

These proxy figures were more holistic than the MSC view, capturing both the demand and supply characteristics of a Digital Economy. However, these proxies were also limited in their use, as they were either one or a combination of several factors, including one-off figures with no historical context, insights that were more qualitative than quantitative, and analyses that more often than not required repeated primary research, market interpretation and further analysis to obtain a time-series figure. Often, these data-points incurred a large cost in their outlay, both in terms of funds and resources, and there were questions raised regarding sampling accuracy.

In 2011, MDeC launched the Digital Malaysia Initiative with the goal of national digital transformation. One key effort within Digital Malaysia was the establishment of the Digital Economy Satellite Account (DESA), which consists of an ICT Satellite Account (ICTSA) and other indicators to build the complete picture of the impact of digital transformation. The ICTSA was developed based on the national accounts framework to present a picture of the value of transactions in ICT products 12 within the frame of the ICT industry sub-sectors of ICT services, ICT trade, ICT manufacturing, e-commerce, and content & media.

Digital Malaysia was the driver of the DESA, but it was DOSM who ultimately conceptualised the structure and compilation method in accordance with SNA 2008 13. These compilation efforts were placed under the governance structure of EPU, MOF, MCMC, MDeC, PIKOM and several other key actors. This satellite account was envisioned to capture the supply and demand aspects of the Digital Economy to build a cohesive picture of the Malaysian Digital Economy 14. In 2012, the first pilot ICTSA was established to enable the supply and use of ICT products to be analysed from the economic perspective.

2. Application of DESA & ICTSA in national planning activities

The DESA and its chief core component, the ICTSA, also have a unique role at the highest levels of governance. Since 2012, the DESA has been a key input to the Implementation Council Meeting & International Advisory Panel that oversee the effective national-level strategy. These platforms, both of which are chaired by the Prime Minister, have members that include cabinet ministers, senior civil servants, heads of government agencies and senior international advisors that are directly involved in the charter of MSC and Digital Malaysia.

Key outcomes from these meetings include projects that extend the benefits of the Digital Economy to the Bottom 40% of the income pyramid (B40) and that deepen the economic impact of niche segments, such as key levers within ICT services: a National Big Data Initiative, a consolidated approach towards e-Commerce, multi-agency programmes in the Internet of Things (IOT) sector and more. The fundamental issues and data-points stem from the DESA analysis and deep-dives of the ICTSA.

Other major uses for the satellite account in policy making are diverse in both range and perspective. A key example was the earliest ICTSA figures used by EPU to further enhance the collaboration required by the public sector on a common frame of data at NDC 2013. EPU used the satellite account information for the first time to revise their original RMK-10 estimate to the more accurate figure provided by DOSM.

11 Tenth Malaysia Plan, EPU
12 Digital Malaysia Progress Report 2012, MDeC
13 Transcending the Traditional Approach Through Satellite Accounts, 2013, DOSM
14 Eleventh Malaysia Plan, Chapter 7, Strategy Paper 15, EPU

From 2012 to 2013, MDeC initiated research into the Digital Malaysia Aspirational Goals, a far-reaching target to stimulate growth in Malaysia's digital landscape: to increase technology & internet accessibility with relevant adoption of digital content, and wider usage of digital technology by government, businesses and communities 15. The central data-set employed was the ICTSA, enhanced with further research on the global indices from WEF and IMD. Overarching was the Digital Economy's 18.2% contribution to the national GDP by 2020.

In 2014, MATRADE also cited a key interest in the ICTSA, as it seeks to understand the ICT sector from an exports perspective. With a long history in the analysis of E&E manufacturing data, intra-agency research was conducted between MDeC and MATRADE to inspect the comparison between ICT goods exports and E&E manufacturing exports. Code-matching of the Standard International Trade Classification (SITC) codes between the two segments has led to a greater understanding of the scope of ICT goods and E&E, and the discovery of an estimated 90% overlap between the two sectors. It has also facilitated the necessary planning of future programmes by MATRADE and intervention efforts by MDeC, particularly in the areas of IOT.

Other on-going initiatives include the use of the DESA to gauge the size and growth of ICT services exports under the stewardship of the National Export Council. This activity is still on-going, with MDeC and MATRADE working closely with the Ministry of Communications & Multimedia (KKMM) in bringing out the details of sub-sector growth and providing a close watch on both the domestic output and the talent supply needed. Another intra-agency activity is the research into labour productivity estimates for the ICT sector, currently coordinated by the Malaysian Productivity Council (MPC) and MDeC.

The ICTSA is a unique experiment on the scale and reach of the ICT industry with respect to the national frame of context. There have been a handful of nations that have attempted the ICTSA compilation, Australia in particular having the model that framed the earliest ICT Satellite Account 16. However, Malaysia is unique in compiling this account on an annual and sustained basis. Malaysia's current ICTSA compilation covers 2010-2015 and is still actively pursued for the immediate future. Thus there is a peculiar challenge for planning & strategy entities to comprehensively benchmark other nations within the ICTSA framework, owing to Malaysia being the first mover and clear leader in this field. It is hoped that other nations can replicate this template and structure, improve on it where necessary, and compile their own satellite accounts in this field.

3. DESA in the Open Data Environment

In deepening the usage of the DESA, it becomes apparent that certain challenges emerge from two distinct angles.

Fundamentally, macro-economic planning is a challenge owing to the very nature of national accounts. Macro aggregate demand and supply are composed of fundamentally heterogeneous items, whose precise magnitudes can never be accurately predicted 17. This challenge is especially heightened by the fact that the ICTSA is a challenging compilation to begin with, and the observation of economic impact is cross-industry, as per the dual nature of ICT as both an industry and an enabler. Thus, as policy-makers dive down into the critical factors contributing to the trends, to determine areas of interest & intervention, a deeper understanding of the aggregate components is necessary.

The second challenge is in the granularity of data needed for further investigation and research. As the use of the ICTSA has spread beyond macro-planning efforts, further demands have been placed on it, including breakdowns from the national ICT figures to the industry sub-sector level and now to the MSIC activity level. This is a repeated theme encountered with most stakeholders, partners and research efforts into the ICTSA: the need for increasing detail to plot the key levers of policy and

15 DM Progress Report 2013, MDeC
16 ICT Satellite Account, 2010, Australia

industry intervention is very high.

Before proceeding further, it must be noted that DOSM has produced a tremendous effort so far, starting with the pilot ICTSA 2005 & 2010, published in 2012, wherein ICT industry GDP contributions were first established and breakdowns of the ICT industry sub-sectors were first compiled. In 2013, the ICTSA 2005-2012 was released, introducing the seven-year time series and refinements to the MSIC demarcations. A year later saw the introduction of an e-Commerce GDP calculation. 2014 also marked a landmark year for the ICTSA, as it was published to the public for the first time, having been an experimental satellite account in the years prior. In 2015, the latest publication of the ICTSA will see a rebase to the 2010 reference, further highlighting the sustained DOSM effort in keeping the ICTSA relevant for policy makers and industry analysts.

However, much effort is still needed to address the pressing need for more granular data. The most likely remedy for probing into the three-digit and five-digit MSIC of the ICTSA will be a longer time-series for increased statistical accuracy, in particular the accuracy of the forecasts 18. However, the time pressure on delivering a cohesive and sustainable plan for growing the Digital Economy is very high, as Malaysia is in a crucial growth stage. Thus, this paper suggests the use of proxy data to ameliorate the dearth of granular data in the ICTSA. We postulate that this gap can be served by the rich public sector information data-set similar to, or at least largely synonymous with, Open Data.

Figure 3 - Open Data topology 19

Open datasets are not a new phenomenon, with the earliest concept as we know it today first appearing in 2006. Open data has been used to such an extent by industry and governments to produce meaningful economic insights that it has also held an economic value in and of itself 20. Research has shown that the use of data is "non-rivalrous": the fact that governments (or others) have used the data for the purpose for which it was originally collected does not prevent that data being used for other purposes by others or, indeed, by other parts of the government itself 21. Economic theory suggests that benefits are maximised when access to the information is priced at the marginal cost of distribution, and the internet has made the marginal cost of distribution of digitized data by download from the web effectively zero.

Several national governments have created platforms to distribute a portion of the data they collect. It is a concept for a collaborative project in government to create and organise a culture for open data, or open government data. A list of over 200 local, regional and national open data catalogues is available on the open source Data Portals project, which aims to be

17 The Meaning and the Implications of Heterogeneity for Social Science Research
18 The Relationship Between The Length of the Base Period and Population Forecast Errors, Stanley Smith, Terry Sincich, 1990, Journal of the American Statistical Association
19 The Aid Management Journey to Transparency and Open Data, http://www.developmentgateway.org/
20 Open Data - Unlocking Innovation and Performance with Liquid Information, 2013, McKinsey Insights
21 Open Data for Economic Growth, 2014, World Bank

a comprehensive list of data catalogues from around the world 22.

So pervasive is the use of open datasets that it has been cited as a potential threat to NSOs, owing to the volume, variety and, in some cases, the velocity of the data. It is only the veracity of this open data that is still suspect, and that is a big reason why the stringent NSO function of data validation is one that can never truly be replaced. This is especially true in the Digital Economy, where many national statistics offices (NSOs) face a significant challenge in developing indicators on the very fast developments in ICTs and Internet adoption. Traditional statistics, when available, generally take several years to prepare, and thus could rarely capture current developments, which however were of high interest to ICT and Internet policy makers 23.

On a global level, several initiatives are spearheading the open data movement. One example is the World Bank's Open Data Initiative, launched in April 2010, which provides free, open and easy access to development data, and challenges the global community to use the data to create new solutions to eradicate poverty 24. Today, the World Bank's Open Data Catalog includes over 8,000 development indicators, of which 1,400 are for 252 countries and 36 aggregate groupings, going back over 50 years, in 50 languages, and it is continuously expanding 25.

More locally, Malaysia is currently ranked 41 among 86 countries according to the Open Data Barometer, a United Nations-led common assessment method for Open Data that analyses the readiness, implementation and impact of Open Data initiatives around the world 26. MAMPU is empowered as the custodian of the Malaysian Open Data initiative embodied within the portal data.gov.my. At present, 373 datasets from 17 ministries are available, with the Ministry of Agriculture as the biggest contributor with a total of 108 datasets, followed by the Ministry of Natural Resources and Environment with 66 datasets 27.

The mining of publicly available government data was first brought up in 2012, with the introduction of the DESA Secondary Indicators to complement the ICTSA. This was a non-exhaustive list of potential indicators that, in concert with the six primary indicators of the ICTSA, could bring about further insights and granular details with regards to the Digital Economy.

To automate the process, the DESA System was envisioned to bring the initial active list of 16 supply and demand indicators under one common platform, with ultimate expansion to the wish-list of 132 indicators. Several challenges, including logistical, resource and budget changes, led to an 18-month delay. However, in 2014, the DESA Steering Committee undertook an inter-agency workgroup to refresh the current list and determine the business needs of the relevant ministries and industry associations. Not only were the existing 132 indicators validated, but an addition of another 16 indicators was requested from EPU, MOF, MOSTI and PIKOM to understand the Digital Economy in further detail.

At its essence, the DESA System will assist in the compilation of the necessary information to bring in secondary components to support the ICTSA and enrich the entire DESA itself. Figure 4 exemplifies the entirety of the DESA.

22 Building the Digital Enterprise: A Guide to Constructing Monetization Models Using Digital Technologies (Business in the Digital Economy), 2015, Mark Skilton
23 The Proliferation of "Big Data" and Implications for Official Statistics and Statistical Agencies, 2015, OECD, Christian Reimsbach-Kounatze
24 Open Data in Development - What and How, 2014, World Bank
25 Ibid.
26 Open Data Barometer, 2nd Edition, 2015, World Bank
27 27th Implementation Council Meeting, 2015, MDeC

The DOSM Household Income Survey 2012 puts households in various monthly household income brackets. It starts with households that earn less than RM1,000 a month, then those that earn more than RM1,000 but less than RM2,000, all the way to those that earn RM10,000 a month or more. It is apparent that the median income is much lower than the average income and that the dispersion of household income is unusually high. This signals an unequal distribution of income, which may hinder growth drivers by depriving lower-income households of the ability to accumulate wealth 30.
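The median-versus-mean comparison is the key diagnostic here; a toy illustration with made-up incomes shows how a right-skewed distribution pulls the average well above the median:

```python
import numpy as np

# Made-up monthly incomes (RM): a few high earners skew the distribution.
income = np.array([800, 1200, 1800, 2300, 3000, 4500, 25000])
print(np.mean(income))    # ~5514, pulled up by the top earner
print(np.median(income))  # 2300, the typical household
```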

Figure 4 - DESA anchored by ICTSA and supported by sub-indicators

Research on the DESA System project continues at present, through 2016 and beyond. The potential findings of the DESA Secondary Indicators are promising; some examples surrounding the theme of wealth and income could include the following:

Compensation of employees in the ICT sector: Low compensation of employees is a national issue. The GDP share of compensation of employees in Malaysia was 33.2% during the 2010-2015 period, lower than that of high-income and middle-income economies like Australia (47.8%), South Korea (43.2%) and South Africa (45.9%) 28. Early analysis indicates that the ICT industry sector has a better distributive share.

Income inequality is a potential area of research, as there are a number of reasons why such inequality may harm a country's economic performance. At a microeconomic level, income inequality may correlate with health spending that leads to health issues and reduces the educational performance of the poor 29. These two factors lead to a reduction in the productive potential of the work force. At a macroeconomic level, inequality can be a brake on growth and can lead to instability.

4. Conclusion & Moving Forward

There are three primary items that will need further effort:

a. Economic research into the existing ICTSA compilation by all parties that look into the policy of the supply and demand of ICT products and services, to determine potential findings and insight. This would include public sector players (e.g. EPU, MOF, DOSM, MCMC, KKMM, MDeC), private sector industry associations (e.g. PIKOM), academia and research institutions.

b. Utilising Open Data to fill in the necessary granular details at the sub-sector level. By implementing platforms such as the DESA System, continued research into a non-exhaustive list of economic, industry and social indicators is needed to provide the necessary granularity for effective decision making by policy-makers, economists and industry analysts.

c. Most crucial moving forward is the continued effort by DOSM to sustain the compilation of the ICTSA. As the maturity of the ecosystem further develops, it is crucial that DOSM maintains this compilation effort, owing to the national data-points needed for effective policy-making

27 27th Implementation Council Meeting, 2015, MDeC
28 11th Malaysia Plan, Economics Research Malaysia, 2015, IB
29 International Monetary Fund's Causes and Consequences of Income Inequality: A Global Perspective, 2015
30 State of Households Report, 2014, Khazanah Research Institute

and insight. By extension, this also entails the ... areas, increasing the periodicity, etc.

The rise of open data (linked, as we have seen, to wider shifts towards openness and the development of data-processing technologies) has introduced a new set of challenges in increasing the effectiveness and equitability of development through research production and communication. Active and engaged data curation, making connections between qualitative and quantitative resources, ensuring the context of data is accessible to re-users, bridging data across linguistic and cultural divides, and attentively intervening in open data ecosystems is likely to be an important future role 31.

31 Emerging Implications of Open and Linked Data for Knowledge Sharing in Development, Tim Davies and Duncan Edwards, 2012, IDS Bulletin

A FUZZY APPROACH TO ENHANCE POST FLOOD DAMAGE ASSESSMENT FOR QUALITY RISK ANALYSIS
Syarifah Sakinah Ahmad 1 and Emaliana Kasmuri 2

Abstract

Floods put people, infrastructure, buildings and other things at risk; they can be destructive and thus bring serious losses to the affected victims. Post flood damage assessment requires crucial information for decision support, and it has gained more importance with the evolving context of flood risk management. However, due to the various uncertainties that originate from the data collection, the damage figures and the damage functions, it is still difficult to obtain an accurate flood damage estimation within the required lead time. The objective of this research is to model post flood damage assessment using an artificial intelligence approach called a fuzzy system. Uncertain parameters, which include water depth, water velocity, type of debris and duration of inundation, are identified and constructed as the main determinants of the damage assessment result. The fuzzy approach is used in the damage assessment model due to the characteristics of the parameters, which are identified as uncertain. This improves the current assessment model and gives a better result.

1. Introduction

A flood becomes destructive when the rainfall is heavy and the water overflows from the river, lake or sea onto dry land. It puts people, infrastructure, buildings and other things at risk, which can bring serious losses to the affected victims. Assessment of flood damage is a challenge for the flood victims, as it is only possible to assess it after the water has subsided and they have returned to their belongings. Information is crucial to assess damage at post flood. The required information is collected from the flood victims and reported back to the flood relief center through the channels established by the center. This method is prone to error when the data is collected and recorded manually by the center, which would affect the damage estimation and assessment and the amount of aid to be given to the victims.

There are many research efforts that contribute to mitigating and managing floods, such as flood early warning systems (Melnikova, Jordan, and Krzhizhanovskaya 2015; Pyayt et al. 2015; Sättele, Bründl, and Straub 2015), flood forecasting (Fuchs et al. 2013; Gaudiani et al. 2014; Wang et al. 2013; Zhou and Chen 2013), flood monitoring (Ancona et al. 2014; Bayraktar and Bayram 2009; Long et al. 2014; Memon et al. 2015) and flood prevention (Kalyuzhnaya and Boukhanovsky 2015). However, very little effort is found on managing post flood (Parsons et al. 2015).

Studies have divided flood damage into four types: direct tangible, direct intangible, indirect tangible and indirect intangible. Direct flood damage measures the severity of damage due to an item's contact with flood water. Indirect flood damage measures the effect of direct damage on tangible and intangible items (Dutta, Herath, and Musiake 2001). This paper measures the estimation of direct flood damage to a house and its contents at post flood. Estimation of flood damage is a complex process that uses a huge volume of hydrologic data with consideration of socioeconomic factors (Jongman et al. 2012). Other models complement the damage assessment model with supporting factors like water velocity, flooding duration, water contamination, precaution and warning time.

This paper uses a fuzzy approach as the solution to post flood damage assessment. It models flood damage assessment, and the fuzzy-based decision

1, 2 Fakulti Teknologi Maklumat dan Komunikasi, Universiti Teknikal Malaysia Melaka

techniques incorporate the inherent imprecision, uncertainties and subjectivity of the available data. These attributes are propagated throughout the model for more realistic results. Fuzzy modeling techniques can also be used in post flood damage assessment to assess the severity of damage in cases where the experts do not have enough reliable data to apply a statistical approach.

2. Framework for Post Flood Damage Assessment

The governance of flood management in Malaysia is divided into two areas: flood risk management, which is concerned with the decision-making process, and flood management, which is related to the life cycle of managing floods (Maidin et al. 2014). Recovery and development are the phases at post flood, the issue addressed in this paper. The goal of post flood is to restore the lives of flood victims to normal. This paper has designed a framework to assess damage at post flood using uncertain parameters, that is, the flood data and socio-economic data retrieved from their repositories, as shown in Figure 1.

The rule-based repositories store the damage assessment rules defined for the socio-economic data. The rules and data are retrieved into the inference and query engine to produce the intermediate assessment variables. The architecture of the inference engine is shown in Figure 2 and its details are discussed in the next section. The fuzzy damage assessment engine uses the value computed from the inference engine to find the monetary losses. The value will be displayed in a form that can be understood by the user, assisting them in the further steps of decision making.

The paper uses house and household item data published in the Household Income and Basic Amenities Survey Report 2012 as socio-economic data. The data is selected because the major damage comes from direct water contact with the house and its contents, compared to other damages (Gasim et al. 2014).

3. Fuzzy Rule-Based System for Post Flood Damage Assessment

The fuzzy logic model is designed with several inputs and one output. The number of outputs corresponds to the linguistic variables (indicators) which describe the flood (Zlateva, Pashova, and Stoyanov 2011). The output represents a complex post flood damage assessment. The post flood inference engine consists of three models, as shown in Figure 2.

Each model receives many inputs from input indicators. The input indicators are the input parameters of the designed fuzzy system. These parameters correspond to the flood and to the damage to the house and the household items in the house.

The proposed fuzzy logic model is designed with the previously defined input parameters. Every subsystem gives an intermediate output variable. The output from each subsystem is defined as Intermediate Variable 1 "House Damage Factor", Intermediate Variable 2 "Flood Damage Factor" and Intermediate Variable 3 "Appliances Damage Factor". These intermediate output variables will be processed by the Fuzzy Inference System (in Figure 2), which will produce the complex post flood damage assessment value. The value is a criterion for final decision-making about the degree of damage for a particular area. A higher value corresponds to more severe post flood damage.

4. Design of Fuzzy Logic Model

Linguistic variables are quantitative values that correspond to qualitative features (Zlateva, Pashova, and Stoyanov 2011). These variables are information and decisions that are closely linked to making a decision from imperfect information using different methods. The possible types of cases and the damage assessment on living property are defined by the expert and depend on the quality and uncertainty of the available information from various sources.

In the fuzzy logic house condition subsystem, the input linguistic variables for Input 1 and Input 2 are represented by the membership functions {"Very Small", "Small", "Medium", "Big", "Very Big"} and {"Bad", "Medium", "Good"}. The input variables are assessed in the intervals [0,1] and [0,100]. The fuzzy logic system output (house damage factor) is described as {"Good", "Fair", "Risky", "Very Risky"}.
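As a rough illustration of how such a subsystem could turn a fired rule into a crisp damage value, the sketch below clips an assumed consequent set at the rule's firing strength and takes the centroid; all membership degrees and breakpoints here are made up for illustration, not the paper's calibrated ones.

```python
import numpy as np

# Assumed membership degrees for one record (illustrative only).
mu_house_bad = 0.7   # degree to which house condition is "Bad"
mu_flood_high = 0.4  # degree to which flood level is "High"

# Rule: IF house is Bad AND flood is High THEN damage is "Very Risky".
# AND is modelled with min, so the rule fires at strength 0.4.
firing = min(mu_house_bad, mu_flood_high)

# Defuzzify over the damage universe [0, 100]: clip the consequent set
# at the firing strength and take the centroid of the clipped area.
damage = np.linspace(0, 100, 101)
mu_very_risky = np.clip((damage - 60.0) / 30.0, 0.0, 1.0)  # assumed ramp
clipped = np.minimum(mu_very_risky, firing)
print(round((damage * clipped).sum() / clipped.sum(), 1))
```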

Figure 1: Framework for Fuzzy Rule-Based System for Uncertain Post Flood

Figure 2: Fuzzy models for Post Flood

Figure 3: Membership functions of the input indicators for Fuzzy Logic House Condition Subsystem

Figure 4: Membership functions of the output indicators for Fuzzy Logic House Condition Subsystem

The membership functions for the fuzzy logic flood condition subsystem are {"Low", "Medium", "High", "Very High"}, {"Short", "Medium", "Long"} and {"Low", "Medium", "High"} for Input 3, Input 4 and Input 5. These inputs are assessed on the intervals [0,5], [0,10] and [0,100] using trapezoid membership functions. The fuzzy logic system output (flood damage factor) is described as {"Good", "Fair", "Risky", "Very Risky"}. The post flood damage assessment is assessed on the interval [0,100] using triangular membership functions. The inference surfaces in 3D for the three fuzzy logic subsystems are given in Figure 5.

5. Fuzzy Rule Based Model Application

Table 1, Table 2 and Table 3 summarize the results of the data set assessment using the proposed fuzzy logic model and the characteristics of the inputs. In Table 1, it can be deduced that the house conditions "Bad" and "Medium" contribute the most to the house damage factor, while the damage factor for house condition "Good" remains reasonable.

The minimum and maximum water depth values in Table 2 indicate, respectively, that most houses will stay dry and it is possible to walk through the water, and that both the first floor and the roof will be covered by the water, following the water depth classification suggested by the Japanese Flood Fighting Act 2001. It is shown that the type of debris contributes strongly to the damage condition, moving it from "Fair" to "Very Risky".
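A minimal sketch of a trapezoidal membership function of the kind used for these inputs; the breakpoints below are assumed for illustration, not taken from the paper's calibrated sets.

```python
import numpy as np

def trapmf(x, a, b, c, d):
    """Trapezoid: rises on [a, b], equals 1 on [b, c], falls on [c, d]."""
    return np.clip(np.minimum((x - a) / (b - a), (d - x) / (d - c)), 0.0, 1.0)

# Hypothetical "High" water-depth set on the [0, 5] metre universe.
depths = np.array([0.5, 2.0, 3.0, 4.8])
print(trapmf(depths, 1.5, 2.5, 3.5, 4.5))  # membership degree of each depth
```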

Table 3 summarizes the severity of damage to household items for the entertainment and electronic categories. The inputs are classified as low and high percentages; the percentage for entertainment indicates the general number of entertainment appliances in the house. The results show that the damage factor increases with a higher percentage of entertainment and electronic appliances.

6. Conclusion

A fuzzy logic model for post flood damage assessment is proposed. The study covers the house, the flood condition and the household items. From analysis of the obtained results, it can be concluded that the developed fuzzy logic system can successfully evaluate post flood damage. An advantage of the fuzzy logic system is that the uncertainty criteria affect all parts of the system. The model is adjustable, although databases built from the various elements are required, and it is easy to incorporate knowledge dealing with post flood. The designed fuzzy system is part of a post flood integrated information system, which will be developed.

Figure 5: Surfaces of the fuzzy logic subsystems

Table 1: Fuzzy inference result for House Condition Subsystem

House Condition:  Bad                     Medium                  Good
Size (%):         10    30    60    90    10    30    60    90    10    30    60    90
Damage factor:    35    48    69    81.3  21.9  35    43.2  80.4  6.08  22    38.3  48.3

Table 2: Fuzzy inference result for Flood Condition Subsystem

Water Depth (meters):    3     3      3           0.5   0.5   0.5
Flood Duration (days):   3     3      3           1     1     1
Type of Debris (%):      10    50     80          10    50    80
Damage condition:        Fair  Risky  Very Risky  Good  Good  Fair

Table 3: Fuzzy inference result for Household Item Subsystem

Entertainment (%):  10 (Low)                  60 (High)
Electronic (%):     10    30    60    90      10    30    60    90
Damage Factor:      14.8  34.0  43.3  65.9    43.3  56.1  65.9  87.8

7. References

Ancona, M., N. Corradi, A. Dellacasa, G. Delzanno, J.-L. Dugelay, B. Federici, P. Gourbesville, et al. 2014. “On the Design of an Intelligent Sensor Network for Flash Flood Monitoring, Diagnosis and Management in Urban Areas Position Paper.” Procedia Computer Science 32: 941–946.

Bayraktar, H., and B. Bayram. 2009. “Fuzzy Logic Analysis of Flood Disaster Monitoring and Assessment of Damage in SE Anatolia Turkey.” 2009 4th International Conference on Recent Advances in Space Technologies: 13–17.

Dutta, Sushmanta, Shrikanth Herath, and Katumi Musiake. 2001. “Direct Flood Damage Modeling Towards Urban Flood Risk Management.” Joint Workshop on Urban Safety Engineering: 127–143.

Gasim, Muhammad Barzani, Sani G. Diya, Mohd Ekhwan Toriman, and Musa G. Abdullahi. 2014. "Floods in Malaysia: Historical Reviews, Causes, Effects and Mitigations Approach." International Journal of Interdisciplinary Research and Innovations 2 (4): 59–65.

Gaudiani, Adriana, Emilio Luque, Pablo García, Mariano Re, Marcelo Naiouf, and Armando Di Giusti. 2014. “Computing, a Powerful Tool for Improving the Parameters Simulation Quality in Flood Prediction.” Procedia Computer Science 29: 299–309.

Jongman, B., H. Kreibich, H. Apel, J. I. Barredo, P. D. Bates, L. Feyen, A. Gericke, J. Neal, J. C. J. H. Aerts, and P. J. Ward. 2012. "Comparative Flood Damage Model Assessment: Towards a European Approach." Nat. Hazards Earth Syst. Sci. 12 (12) (December 19): 3733–3752.

Kalyuzhnaya, Anna V., and Alexander V. Boukhanovsky. 2015. "Computational Uncertainty Management for Coastal Flood Prevention System." Procedia Computer Science 51: 2317–2326. http://www.sciencedirect.com/science/article/pii/S1877050915012053.

Long, Di, Yanjun Shen, Alexander Sun, Yang Hong, Laurent Longuevergne, Yuting Yang, Bin Li, and Lu Chen. 2014. “Drought and Flood Monitoring for a Large Karst Plateau in Southwest China Using Extended GRACE Data.” Remote Sensing of Environment 155 (December): 145–160.

Maidin, Siti Sarah, Marini Othman, Mohammad Nazir Ahmad, and Noor Habibah Arshad. 2014. “Managing Information and Information-Related Technology: Enabling Decision-Making in Flood Management.” International Journal of Digital Content Technology and Its Applications 8 (2): 13.

Melnikova, N.B., D. Jordan, and V.V. Krzhizhanovskaya. 2015. “Experience of Using FEM for Real- Time Flood Early Warning Systems: Monitoring and Modeling Boston Levee Instability.” Journal of Computational Science 10 (September): 13–25.

Memon, Akhtar Ali, Sher Muhammad, Said Rahman, and Mateeul Haq. 2015. “Flood Monitoring and Damage Assessment Using Water Indices: A Case Study of Pakistan Flood-2012.” The Egyptian Journal of Remote Sensing and Space Science 18 (1) (June): 99–106.

Parsons, Sophie, Peter M. Atkinson, Elena Simperl, and Mark Weal. 2015. “Thematically Analysing Social Network Content During Disasters Through the Lens of the Disaster Management Lifecycle” (May 18): 1221–1226.

Pyayt, A.L., D.V. Shevchenko, A.P. Kozionov, I.I. Mokhov, B. Lang, V.V. Krzhizhanovskaya, and P.M.A. Sloot. 2015. “Combining Data-Driven Methods with Finite Element Analysis for Flood Early Warning Systems.” Procedia Computer Science 51: 2347–2356.

Sättele, Martina, Michael Bründl, and Daniel Straub. 2015. “Reliability and Effectiveness of Early Warning Systems for Natural Hazards: Concept and Application to Debris Flow Warning.” Reliability Engineering & System Safety 142 (October): 192–202.

Wang, Dawei, Wei Ding, Kui Yu, Xindong Wu, Ping Chen, David L. Small, and Shafiqul Islam. 2013. “Towards Long-Lead Forecasting of Extreme Flood Events.” In Proceedings of the 19th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining - KDD ’13, 1285. New York, New York, USA: ACM Press.

Zhou, Xiangmin, and Lei Chen. 2013. “Event Detection over Twitter Social Media Streams.” The VLDB Journal 23 (3) (July 19): 381–400.

Zlateva, P., L. Pashova, and K. Stoyanov. 2011. “Fuzzy Logic Model for Natural Risk Assessment in SW Bulgaria.” In 2nd International Conference on Education and Management Technology. Singapore: IACSIT Press.

MALAYSIA HOUSEHOLD CONSUMPTION EXPENDITURE: RURAL VS URBAN
Wan Zawiah Wan Zin 1, Siti Fatin Nabilah

Abstract

The objective of this study is to identify the determinants of household consumption expenditure in both the urban and rural areas of Malaysia. Data from the Household Expenditure Surveys (HES) conducted by the Department of Statistics Malaysia, collected in the 1998/1999, 2004/2005 and 2009/2010 time frames, are used in this analysis. Seven factors which may affect household expenditure patterns are considered: age, gender, ethnicity, marital status, education level, work status and household size. Ordinary Least Squares and Quantile regression methods are used to analyze the difference between the urban and rural distributions of household consumption expenditure at various levels of expenditure. The findings show that education level increases consumption expenditure in the urban area significantly, whereas in the rural area, household size and the work status of the family head are the more important determinants of household spending in the 2004/5 and 2009/10 time frames, respectively.

1. Introduction

Malaysia is a country experiencing a rapid urbanization process, in accordance with its status as a developing country. This has stimulated the process of urbanization, as many rural communities try to seize the opportunities that exist in urban areas to improve their economic status. Although various efforts have been made, there is still a socio-economic gap between rural and urban areas. This, in turn, affects the way people in rural and urban areas spend. Thus, the factors affecting the expenditure pattern between residents in rural and urban areas may be investigated.

Household expenditure consists of the expenditures incurred by a family to meet daily needs, whether the needs of an individual or the needs of the whole family. Examples of household expenditure are meals, clothing, utility bills, health, transportation, education, etc. Factors affecting household expenditure include age, gender, ethnicity, marital status, strata, state, occupation, household size and religion.

Several studies have been carried out in various parts of the world to identify what could possibly be the factors influencing household expenditure patterns. For example, in Turkey, a study by Ebru Caglayan (2012) found that age increases consumption expenditure in the general and urban estimations, while it decreases consumption expenditure in the rural estimations; in the rural estimates, only age, income, marital status, insurance and the size of the household are significant. Another study, by Nguyen et al (2006) in Vietnam based on 1993 and 1998 data, found differences between the urban and rural distributions due to factors such as education, ethnicity and age; however, in the latter survey analysis, they showed that the factors identified earlier hold only for the lowest quantiles. A study on the behaviour of households' food expenditure by Massimo et al (2009) showed that households headed by elderly, unemployed and low-education individuals spent a relatively higher level of food expenditure, based on the 2000 and 2006 Consumption Expenditure Surveys at household level implemented by the Italian National Statistical Institute (ISTAT). In most cases, it can be found that the factors

1 Universiti Kebangsaan Malaysia

influencing expenditure for both urban and rural areas vary between countries, depending on the covariates of interest.

The ordinary least squares (OLS) method and the quantile regression (QR) method are used to model the relationship between expenditure and the factors affecting it. The OLS method assumes that total expenditure is a linear function of a set of socioeconomic household characteristics and proceeds to minimize the sum of squared residuals from the mean. On the other hand, QR, introduced by Koenker and Bassett (1978), is a method to estimate the conditional quantiles of a variable. This regression has the potential of generating different responses in the dependent variable at different quantiles. According to Koenker and Bassett (1978), QR can be considered an extension of the conditional mean model when compared with the OLS model. In other words, QR substitutes the different quantile values for the mean and proceeds to minimize the weighted sum of the absolute residuals; in this view, the median regression estimator can be considered a central special case (Koenker and Hallock, 2001).

This study aims to investigate the main factors affecting household consumption expenditure in urban and rural areas in Malaysia, using the Household Expenditure Surveys (HES) data collected in 1998/1999, 2004/2005 and 2009/2010, based on the Ordinary Least Squares (OLS) and Quantile Regression (QR) methods across the three census periods. The usage of QR enables the identification of factors which may differ across the several quantile levels considered, so that the differences in the factors affecting expenditure in urban and rural areas may be identified. Seven factors which may affect household expenditure were considered, namely factors concerning the head of the household, which are age, gender, ethnicity, marital status, academic status and employment status, as well as household size.

2. Methodology

Two statistical modelling methods are used in this study: the Ordinary Least Squares (OLS) and Quantile Regression (QR) models.

The Ordinary Least Squares (OLS) regression is a statistical technique used to find the closest estimate to the actual values of the data, usually termed the best fit line. It assumes that the relationship between the dependent variable $Y$ and the independent variables $X$ is linear, with the following equation:

$$Y = \alpha + \beta_i X_i + \varepsilon_i \qquad (1)$$

where $Y$ is the dependent variable, $\alpha$ is the intercept on the y-axis, $\beta_i$ is the $i$th regression coefficient, $X_i$ is the $i$th independent variable and $\varepsilon_i$ is the $i$th error.

Based on equation (1), $\alpha$ is the value of the dependent variable $Y$ when $x$ is zero. The $\beta$ coefficient describes the change in the value of $Y$ with every unit change of $x$, and provides information on the average estimate of the model.

To get the best fit line, the fitted line must be the one that minimizes the total distance between the actual values of the data and the expected values based on the fitted line. The difference between these two values is called the residual, or error, of the model. The best fitted line is the line that minimizes the total sum of squares between the fitted line and the actual data, that is,

$$\sum_i e_i^2 = \sum_i \left( Y_i - \hat{Y}_i \right)^2$$

where $e_i$ is the error at the $i$th observation, $Y_i$ is the actual value of $Y$ at the $i$th observation and $\hat{Y}_i$ is the estimated value of $Y$ at the $i$th observation.

The OLS method assumes that total household expenditure is a linear function of a set of other household characteristics and proceeds to minimize the sum of squared residuals from the mean.

Quantile regression, introduced by Koenker and Bassett (1978), is a method to estimate and draw inferences about conditional quantile functions. As opposed to OLS, which estimates the conditional mean of the response variable given certain values of the predictor variables, QR aims at estimating either the conditional median or other quantiles of the response variable. In other words, this regression has the potential of generating different responses in the dependent variable at different quantiles. These different responses may be interpreted as differences in the response of the dependent variable to changes in the regressors at various points in the conditional distribution of the dependent variable.

Figure 1: Example of OLS-QR graph with indicators
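Underneath, QR minimizes an asymmetrically weighted absolute-residual objective, formalized in the next paragraph; a small sketch of that check loss on made-up data:

```python
import numpy as np

def check_loss(beta, y, X, theta):
    """Koenker-Bassett objective: theta-weighted absolute residuals."""
    r = y - X @ beta
    return np.sum(np.where(r >= 0, theta * r, (theta - 1.0) * r))

# Evaluate the objective at a candidate beta for the 25th percentile.
X = np.array([[1.0, 2.0], [1.0, 3.0], [1.0, 5.0]])
y = np.array([4.0, 7.0, 12.0])
print(check_loss(np.array([0.5, 2.0]), y, X, theta=0.25))
```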

Quantile regression models assume that the conditional quantile of a random variable $Y$ is linear in the regressors $X$:

$$Y_i = \alpha + \beta_\theta X_i + \varepsilon_{\theta i} \quad \text{with} \quad \mathrm{Quant}_\theta(Y_i \mid X_i) = \beta_\theta X_i$$

where $X_i$ is the vector of independent variables and $\beta_\theta$ is the vector of parameters; $\mathrm{Quant}_\theta(Y \mid X)$ is the $\theta$th conditional quantile of $Y$ given $X$. Estimation of the quantile parameters is performed as the solution to

$$\min_{\beta \in \mathbb{R}^k} \left( \sum_{i:\, Y_i \ge X_i \beta} \theta \, \lvert Y_i - X_i \beta \rvert + \sum_{i:\, Y_i < X_i \beta} (1 - \theta) \, \lvert Y_i - X_i \beta \rvert \right)$$

The standard errors for the vector of parameters are obtainable by using the bootstrap method described in Buchinsky (1998). As QR looks into different quantiles of the model, it can provide a more complete description of the underlying conditional distribution compared to mean-based estimators such as OLS.

To determine whether a factor is significant, the confidence interval for QR is computed; if the coefficient of a factor calculated based on OLS does not fall inside the QR confidence interval, this implies that the factor is significant at that quantile. A factor is considered significant based on OLS using the ANOVA test, that is, by comparing the sum of squared errors for that factor against the overall model total sum of squares.

3. Data

This study aims to determine the factors affecting household expenditure for rural and urban households in Malaysia. The data used in this study is part of the household expenditure data collected during the Household Expenditure Survey (HES) in 1998/1999, 2004/2005 and 2009/2010, covering both the rural and urban areas. The data constitutes 30% of the total data collected by the Department of Statistics Malaysia (DOSM), and the number of households, n, used in this study is given in Table 1 below.

Table 1: Number of households, n, involved in the HES for the 3 periods considered

Year         Rural   Urban
1998/1999    1191    1570
2004/2005    1385    2838
2009/2010    1997    4498

The dependent variable in this study is the total household expenditure per month, with seven independent variables recorded. Details of these independent variables can be found in Table 2.
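A hedged sketch of how both fits could be run in practice, on synthetic stand-in data (the HES microdata itself is not reproduced here); the variable names are illustrative:

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(1)
n = 1000
age = rng.integers(20, 70, n)
size = rng.integers(1, 9, n)
spend = 400 + 10 * age + 180 * size + rng.gamma(2.0, 150.0, n)

X = sm.add_constant(np.column_stack([age, size]))
ols = sm.OLS(spend, X).fit()                       # conditional mean
quantiles = {q: sm.QuantReg(spend, X).fit(q=q).params
             for q in (0.25, 0.50, 0.75)}          # low/medium/high levels

print(ols.params)
for q, params in quantiles.items():
    print(q, params)
```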

4. Results

Both the OLS and QR models were fitted to the data and the best fitted line for each type of model was derived. For the QR model, three quantiles, namely the 25th, 50th and 75th quantiles, were considered; these imply the low, medium and high expenditure levels.

Table 3 shows the results of the analysis based on data from urban areas for the three periods considered. Based on these results, in urban areas the highest level of education (EDU3) is the factor that contributes most to expenditure, regardless of whether the analysis was based on the mean or on the low, middle and high income ranges. For OLS, a large household size is found to be highly significant for expenditure; however, for the QR, for all quantiles considered, EDU2 is a very important factor. For all the regression models considered, BUMIPUTERA shows a negative relationship with household expenditure. In the 2004/2005 data, household size is found not to have any relationship with expenditure; however, in the 2010/11 survey data this is not true. In the 2010/2011 time frame, the status of the head of household as an employer has a significant positive relationship with expenditure for all models considered. The age of the head of household and the work status 'Worker' play an important role for low and medium household expenditure in the 1998/99 time frame.

The results on factors that significantly affect household expenditure are slightly different for rural areas. Based on OLS, the highest education level (EDU3) is the most significant factor, influencing household expenditure in a positive direction. However, detailed analysis showed that this factor does not have the most significant effect for all quantiles considered, apart from the low and middle expenditure levels in the 1998/99 time frame. For the high expenditure level in the same time frame, BUMIPUTERA is the most significant factor, with a negative relationship with expenditure. For the rural areas, being married is a significant factor influencing expenditure in a positive direction. Similarly, the size of the family affects household expenditure significantly, whereby a larger household size translates into higher expenditure, especially in all quantiles considered in 2004/5, as opposed to other factors. It can also be seen that in the 2009/10 time frame, age, BUMIPUTERA and being single have significant effects compared to the low and middle expenditure. Table 4 shows the summary of results for the rural area.

5. Conclusion & Discussion

This study manages to identify the factors that affected household expenditure across the three time frames in which the HES were conducted, using the OLS and QR techniques. It can be seen that in the urban areas, the education level of the household head plays a very important role in determining household expenditure. However, this factor, although significant, is not the main factor that determines household expenditure in the rural areas, where it can be seen, when the data is analysed using the quantile regression method, that although the highest level of education plays a very significant role for the low and middle expenditure in 1998/99, a head of household in the BUMIPUTERA category nevertheless records a decrease in household expenditure at the high expenditure category. In the 2004/2005 time frame, large family size affects expenditure most in all categories of expenditure, whereas when the head of the household is an employer, this results in higher expenditure for all quantiles considered. It can also be seen from the results that the latest time frame (2009/10) showed that, for the low expenditure group in the urban area, being single and being married are two significant factors that affect expenditure, as opposed to earlier years. Similarly, in the rural areas, these two factors are also significant, but at the high expenditure level for the same year. Age also affects expenditure, but at the low expenditure level in the urban areas during the 1998/99 and 2004/5 time frames, which implies that the older the head of household is, the more expenditure is incurred.

The results of this study may be used as a guideline to differentiate the factors that influence household expenditure between urban and rural areas. However, there may be other factors which may influence expenditure that were not available in this data set, such as the income of the head of household or total household income. Similarly, future studies can also be done on the effect of the independent variables on different types of household consumption, such as medical, insurance, food, entertainment, etc. It

72 MyStats 2015 Proceedings is hoped that the availability of more data with other possible influencing variables in the future will ensure a more valid study to be conducted.
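The mean-versus-quantile comparison described above can be reproduced in outline as follows. This is a minimal sketch, not the authors' code; the file name is a hypothetical stand-in for a HES extract, and the regressors follow the dummy coding listed in the variable table further below.

```python
# Minimal sketch of the OLS vs quantile regression (QR) comparison above;
# `hes_urban.csv` and its columns are hypothetical stand-ins for HES data.
import pandas as pd
import statsmodels.formula.api as smf

hes = pd.read_csv("hes_urban.csv")

formula = ("expenditure ~ AGE + MALE + BUMIPUTERA + SINGLE + MARRIED "
           "+ EDU1 + EDU2 + EDU3 + EMPLOYER + WORKER + SIZE2 + SIZE3")

ols_fit = smf.ols(formula, data=hes).fit()           # mean-based (OLS) model

# 25th, 50th and 75th quantiles: low, medium and high expenditure levels
qr_fits = {q: smf.quantreg(formula, data=hes).fit(q=q)
           for q in (0.25, 0.50, 0.75)}

for q, fit in qr_fits.items():
    # e.g. how the effect of highest education (EDU3) moves across quantiles
    print(q, round(fit.params["EDU3"], 4), round(fit.pvalues["EDU3"], 4))
```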

Acknowledgements

The authors are indebted to the Department of Statistics Malaysia for providing the valuable data to make this paper possible.

References

Nguyen, B. T., Albrecht, J. W., Vroman, S. B., & Westbrook, M. D. (2006). A quantile regression decomposition of urban-rural inequality in Vietnam. 31.

Buchinsky, M. (1998). Recent advances in quantile regression models: A practical guideline for empirical research. The Journal of Human Resources, 33, 88-126.

Caglayan, E. (2012). A microeconometric analysis of household consumption expenditure determinants for both rural and urban areas in Turkey. February, No. 2:8.

Koenker, R., & Bassett, G. (1978). Regression quantiles. Econometrica, 46(1), 33-50.

Koenker, R. and Hallock, K. F. (2001), Quantile regression, Journal of Economic Perspectives, 15(4), 143-156.

Bagarani, M., Forleo, M., & Zampino, S. (2009). Households food expenditures behaviours and socioeconomic welfare in Italy: A microeconometric analysis. 1-16.

Independent Variable | Term | Description | Dummy Variable
Age | AGE | Age of head of household | NIL
Gender | MALE | Gender of head of household | MALE = 1 for male, 0 otherwise
Ethnic | BUMIPUTERA | Ethnicity of head of household | BUMIPUTERA = 1 for Bumiputera, 0 otherwise
Marital Status | MS | Marital status of head of household | SINGLE = 1; MARRIED = 1; SEPARATED = 1
Education Level | EDU | Highest education level of head of household | EDU1 = 1 if low level (PMR); EDU2 = 1 if medium level (SPM/SPMV/diploma/certificate); EDU3 = 1 if high (university); EDU4 = 1 if no formal education
Work Status | WS | Work status of the head of household | EMPLOYER = 1; WORKER = 1; OTHERS = 1 if not working
Household Size | SIZE | Number of household members supported by head of household | SIZE1 = 1-3 people; SIZE2 = 4-7 people; SIZE3 ≥ 8 people
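Constructing the dummy coding listed above from raw survey fields can be sketched as follows; the raw column names are hypothetical, not the actual HES field names.

```python
# Hypothetical construction of the dummy variables above with pandas.
import pandas as pd

raw = pd.read_csv("hes_raw.csv")  # hypothetical raw HES extract

df = pd.DataFrame({"AGE": raw["age"]})
df["MALE"] = (raw["sex"] == "male").astype(int)
df["BUMIPUTERA"] = (raw["ethnic"] == "bumiputera").astype(int)
for status in ("single", "married", "separated"):
    df[status.upper()] = (raw["marital"] == status).astype(int)
df["SIZE1"] = raw["hh_size"].between(1, 3).astype(int)   # 1-3 people
df["SIZE2"] = raw["hh_size"].between(4, 7).astype(int)   # 4-7 people
df["SIZE3"] = (raw["hh_size"] >= 8).astype(int)          # 8 or more
```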

Table 3: Analysis results for Urban Area for the three periods considered.

*A blue box indicates a significant positive relationship and a pink box a significant negative relationship; numbers inside the boxes (1-5) indicate ranking by significance, with 1 the most significant. A green box marks the factor that is most significant across all models considered, with the same direction of effect as that factor in the same year.

Table 4: Analysis results for Rural Area for the three periods considered.

*A blue box indicates a significant positive relationship and a pink box a significant negative relationship; numbers inside the boxes (1-5) indicate ranking by significance, with 1 the most significant. A green box marks the factor that is most significant across all models considered, with the same direction of effect as that factor in the same year.

OPEN DATA AND CHALLENGES FACED BY NATIONAL STATISTICS OFFICES
Siti Haslinda Mohd Din 1, Nur Aziha Mansor, Faiza Rusrianti Tajul Arus

Abstract

Remarkable growth of interest in open data worldwide over the past half-dozen years has driven tremendous open data initiatives (ODI). Open data is defined as data that can be freely used, shared and built on by anyone, anywhere, for any purpose. The availability of open data can benefit many groups, including government. Open data creates value for the government: it can improve the measurement of policies, bring better government efficiency, deeper analytical insights and greater citizen participation, and boost local companies by way of products and services that use government data. However, in order to gain these huge potential benefits, the questions of what is hindering the success of open data and what the real challenges are need to be worked on. The challenges that need to be considered include the standard guides and definitions of open data; the suitable technologies for opening up data; technological barriers for governments; the lack of a structured approach to openness; too little research on the impact of openness; privacy issues; and the mosaic effect.

1. INTRODUCTION

Information is a tool for supporting development, knowledge sharing and social initiatives. Vast quantities of data are now available, and these data can be transformed to enhance the quality of living. The open data definition makes precise the meaning of "open" in the term "open data" and thereby ensures quality and encourages compatibility between different pools of open material. Open data can be freely used, modified, and shared by anyone for any purpose.

Open data has the potential to create tremendous value and has started to be used on a wider scale. New products and business models are emerging off the back of the open data movement. App developers, for instance, use weather data to warn people of pollution in specific areas. Traffic data is being used for real-time traffic reporting to ease congestion in urban areas. Government data is being utilised to track how tax income is being spent. Open data is helping people improve their household energy efficiency and linking property owners with construction companies that can make it happen. There are many great examples of how open data is already saving lives and changing the way we live and work.

"Open Data" is generally understood to mean data that are made available to the public free of charge, without registration or restrictive licenses, for any purpose whatsoever (including commercial purposes), in electronic, machine-readable formats that are easy to find, download and use. As applied to public institutions such as governments and intergovernmental organizations, Open Data is grounded in the recognition that government data is produced with public funds and so, with few exceptions, should be treated as a public good.

Data that are publicly available do not necessarily meet the definition of "Open Data." For example, data from NSOs may be publicly available, but only to certain qualified or registered users, or with narrow restrictions on how the data can be used. Data may also be publicly available but only in proprietary formats that are difficult to access or manipulate (such as PDF), or even non-electronic formats. Nonetheless, these data are often characterized as "public data." It is important for NSOs to clearly differentiate their data products and corresponding policies.

1 Department of Statistics, Malaysia

2. CHARACTERISTICS OF OPEN DATA

A key characteristic of Open Data is the potential for reusability, both by data experts and the public at large. Reusability is the key to creating new opportunities and benefits from government data, as detailed later in this working paper. For Open Data to be reusable it must generally meet two basic criteria. First, the data must be legally open, meaning that it is placed in the public domain or under liberal terms of use with minimal restrictions. This ensures that government policies do not create barriers or ambiguities concerning how the data may be used. Second, the data must be technically open, meaning that it is published in electronic formats that are machine-readable and preferably non-proprietary. This ensures that ordinary citizens can access and utilize the data at little or no cost using common software tools.

The open data definition summarises the characteristics of open data as follows:
• Availability and Access: the data must be available at no more than a reasonable reproduction cost, preferably by downloading over the internet. The data must also be available in a convenient and modifiable form.
• Universal Participation: everyone must be able to use, re-use and redistribute the data; there should be no discrimination against fields of endeavour or against persons or groups. For example, 'non-commercial' restrictions that would prevent 'commercial' use, or restrictions of use for certain purposes (e.g. only in education), are not allowed.
• Re-use and Redistribution: the data must be provided under terms that permit re-use and redistribution, including intermixing with other datasets.

Government data shall be considered open if it is made public in a way that complies with the principles below:

1. Complete - All public data is made available. Public data is data that is not subject to valid privacy, security or privilege limitations.

2. Primary - Data is as collected at the source, with the highest possible level of granularity, not in aggregate or modified forms.

3. Timely - Data is made available as quickly as necessary to preserve its value.

4. Accessible - Data is available to the widest range of users for the widest range of purposes.

5. Machine-processable - Data is reasonably structured to allow automated processing.

6. Non-discriminatory - Data is available to anyone, with no requirement of registration.

7. Non-proprietary - Data is available in a format over which no entity has exclusive control.

8. License-free - Data is not subject to any copyright, patent, trademark or trade secret regulation. Reasonable privacy, security and privilege restrictions may be allowed.

3. OPEN DATA INITIATIVES BY NSOs

Open Data initiatives are transforming how governments and other public institutions interact and provide services to their constituents. They increase transparency and value to citizens, reduce inefficiencies and barriers to information, enable data-driven applications that improve public service delivery, and provide public data that can stimulate innovative business opportunities.

National Statistics Offices (NSOs) produce many datasets that could typically comprise the foundation of an Open Data program. With relationships with other data-producing agencies and expertise in dealing with technical and data quality issues, NSOs are extremely well placed to make a valuable contribution to Open Data initiatives.

Despite these advantages, NSOs do not always feature conspicuously in government-sponsored Open Data programmes and may be missing an important opportunity to expand the use and re-use of their data.

3.1 Open Data initiatives have significant implications for NSO operations

NSOs manage many surveys and censuses, such as economic surveys and the household census, to gather economic statistics as well as demographic data series, which are considered essential, high-value datasets in Open Data programs; hence, NSO products and expertise will be in high demand. The underlying principles of Open Data (to make government data as accessible and useful as possible) are clearly aligned with the NSOs' core mission. Some requirements of Open Data are likely to be easily achievable by NSOs, such as opening data that is already public, and allowing data that is already available in non-open formats to be downloaded in bulk. Other desirable features of a good Open Data initiative, such as the transparent publishing of relevant policies, metadata and training materials, will likely take more effort.

3.2 NSOs have strong roles to play in Open Data initiatives

While NSOs are not usually well positioned to lead a government-wide Open Data initiative, they can nonetheless be a vital component of it. NSOs produce high-demand official statistics, and can be instrumental in ensuring that the Open Data initiative is properly aligned with the wider National Statistical System. NSOs have extensive experience in data selection, the application of standards, and the provision of metadata; hence, NSOs have a clear role to play in providing guidance to other agencies in publishing their own data. Finally, as members of the international statistics community, NSOs are in a good position to make sure that the Open Data initiative is well aligned with the efforts of other countries and international organisations, and in line with internationally agreed standards.

3.3 Open Data initiatives can greatly benefit NSOs

Data collection is a costly activity, financed by public spending and other investments. Open Data can relieve pressure on NSO operating budgets by reducing the demand for custom tabulations and other data requests. Open Data will also likely raise the profile of NSOs within the government, with key agencies and with the public. A greater profile will strengthen the NSO's reputation and opportunities.

Data quality is often cited as a concern preventing NSOs from adopting Open Data policies. Statistics are subject to various quality factors which are inherent to standard statistical methods. The important thing for NSOs is to provide adequate context and information about the uncertainties and limitations of any particular dataset. Doing so would lead to increased transparency and greater public trust in both the data and the NSO itself. Furthermore, NSOs can use feedback mechanisms available through Open Data to improve data quality.

3.4 Open Data raises potential challenges for NSOs

NSOs have the responsibility, often established in legislation, to protect the confidentiality and privacy of their data providers, who may be individuals, households or businesses. Confidentiality issues do not apply equally to all types of data; many types of data, such as aggregated statistics, can typically be published and opened without breaching confidentiality, as standard anonymization techniques are applied. In the more sensitive realm of survey microdata, each NSO must ultimately make the determination about whether and how to make these data public and which techniques to use, but experience to date includes several cases where microdata have been opened without compromising confidentiality.

Open Data best practices require that data producers provide clear terms of use with minimal restrictions on how data can be used. This may prove challenging for NSOs that manage different classifications of products, some of which may be restricted on grounds of confidentiality or even national security. However, NSOs can take advantage of standard international licenses, and there are several case studies for managing data under multiple access policies.

NSOs may also be concerned about the resources and capacity required to implement Open Data. Experience to date suggests that the additional resources are not substantial, and may be at least partly offset by cost savings and greater efficiencies. But it is true that Open Data may represent an opportunity cost to some NSOs, to the extent that they derive some revenue from data sales.
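As an illustration of the kind of standard anonymization mentioned above, the following sketch (not any NSO's actual procedure) publishes an aggregate with small-cell suppression, so that individual data providers cannot be singled out. The data are invented.

```python
# Aggregation with small-cell suppression: cells based on fewer than a
# threshold number of respondents are withheld before publication.
import pandas as pd

microdata = pd.DataFrame({
    "district": ["A", "A", "A", "B", "B", "C"],
    "income":   [2100, 1800, 2500, 3200, 2900, 4100],
})

table = microdata.groupby("district")["income"].agg(n="count", mean="mean")
table.loc[table["n"] < 3, "mean"] = None   # suppress cells with n < 3
print(table)
```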

NSOs may also face challenges in engaging a larger user ecosystem. Since the goal of Open Data is to increase the use of data, it is almost inevitable that the NSO will be interacting with user communities which are new and possibly unanticipated. Again, this can be an opportunity for the NSO as much as a challenge. Many public agencies that have adopted Open Data policies have experimented with workshops, toolkits, competitions, social media and other approaches to successfully engage new data users. Ultimately, what may be required is for NSOs simply to be open to change, in order to reap the benefits that Open Data can provide them.

4. CHALLENGES FACED BY NSOs

As we know, open data represents a new era in data exploration and utilization, but it can be fraught with challenges. The factors hindering the success of open data are the real challenges that need to be worked on. Governments as well as NSOs truly believe in the opportunities of open data, but its use is accompanied by many impediments. This paper highlights several challenges that need to be identified by NSOs in the utilization of open data.

4.1 Standard guides and definitions of open data

Government and NSOs should establish standard guides and definitions of what data should be public and how to make data public. Data openness means that NSOs will make public information available proactively and put the information within the reach of the public (online) without barriers to its reuse and consumption. Strong open data policies should be built upon the principles embodied by existing laws and policies that defend and establish public access, often defining standards for information quality, disclosure and publishing (Sunlight Foundation, 2014).

4.2 Suitable technologies

It can happen that some local governments or NSOs cannot identify the relevant data for opening up, and therefore they fail to implement proper technologies; they are unable to ascertain relevant applications for their data. Besides that, the data should be available in a machine-readable standard format, which means it can be retrieved and meaningfully processed by a computer application. According to Emily Shaw, the national policy manager of the Sunlight Foundation, if the intended users are developers and programmers, the data should be presented within an application programming interface (API); if they are researchers in academia, data might be structured as a bulk download; and if the data is aimed at the average citizen, it should be available without requiring software purchases (Shuen, 2014); a short sketch follows after Section 4.3.

4.3 Technological barriers

Basic technological components for open data are still not easily accessible to smaller governments or NSOs. NSOs might have limited independence and unstable budgets, as well as a lack of local technical skills. As discussed in Istanbul by 15 open data and transparency leaders from across Eastern Europe and Central Asia, organized by UNDP and the Partnership for Open Data (the World Bank, the Open Data Institute and Open Knowledge) in September 2014, in order to know how much a government should expect to spend on an open data initiative, cost estimates are typically required for:
- Allocating budget(s) for the central team and within individual ministries, departments or agencies.
- Budgets that reflect the additional and specific resources required for a project, when further capacity is required.
- A business case for the open data initiative.
Besides that, communication with developers and technical people on open data should also be part of the open data strategy, in order to reduce technological barriers.
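To make the point about machine-readable formats and APIs in Section 4.2 concrete, the sketch below shows the same statistic consumed as a bulk CSV download and through a JSON API. Both URLs are placeholders, not endpoints of any real statistics office.

```python
# Consuming open data in machine-readable form; URLs are hypothetical.
import pandas as pd
import requests

# Bulk download: one file, suited to researchers
cpi = pd.read_csv("https://data.example.gov.my/cpi/cpi_monthly.csv")

# API access: filtered and programmatic, suited to developers
resp = requests.get("https://api.example.gov.my/v1/cpi",
                    params={"from": "2014-01", "to": "2014-12"})
records = resp.json()["data"]
```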

4.4 Structured approach to openness

In the age of open data, all public data should be made available where the data is not subject to valid privacy, security or privilege limitations. The data should be as collected at the source, with the highest possible level of granularity, not in aggregate or modified forms, and it should be reasonably structured to allow automated processing. The ability for data to be widely used requires that the data be properly encoded: free-form text is not a substitute for tabular and normalized records, and images of text are not a substitute for the text itself. Sufficient documentation on the data format and the meanings of normalized data items must be available to the users of the data. This principle was agreed by 30 open government advocates during the meeting in Sebastopol, California, in December 2007 (Tauberer, 2007).

4.5 Research on the impact of openness

There are few studies done by academics (instead of consultants) on the impact of data openness. The studies often take a macroeconomic perspective and do not assess micro levels of impact. In general, the public has mixed views as to whether trust can be built by opening up data, or whether trust can be undermined if confusion is caused by multiple interpretations of the same data. Thus, research in this area should be intensified to prove that making information accessible to the public can improve public service delivery and lower the cost of getting information.

4.6 Privacy issues

In reality, government agencies collect huge amounts of data about businesses and households. Technological advances have made those data easy to access. Therefore, governments as well as NSOs should be alert to the extent to which privacy concerns should be handled as technology advances. The key concern is the sensitivity of the data, through which privacy rights might be damaged. However, for transparency and open data supporters, 'data protection' or 'protecting privacy' might be used as excuses not to release data, or to only release data in aggregated forms that don't permit detailed analysis of what government is doing (Open Data Research Network, 2013).

4.7 Mosaic effect

"According to Marion Royal, program director of U.S. Data.gov, when very large data sets, even those with completely unclassified information, are combined, people can mix that data, reassembling it in unforeseen ways, like a mosaic puzzle. In a worst case scenario, security can be compromised by those with ill intent" (Breeden, 2014). This is the phenomenon whereby non-personally identifiable information can be combined with other available information in such a way as to pose a risk of identifying an individual.

5. CONCLUSION

The use of open data is a recent phenomenon but, as with many technological advances, it is growing in relevance and prevalence to become "normal." Although clearly a good thing in theory, the risks and challenges of open data must be given equal concern. Governments need to protect individuals and organizations from the risks of open data while at the same time advancing open data's potential value. The risk in the utilization of open data remains unless it is approached right. With overloaded information, NSOs need to find better ways to compile, process, store and share the "sea of information" without compromising quality and confidentiality. Statistics can typically be published and opened without breaching confidentiality so long as standard anonymization techniques are applied. A new data policy that opens public datasets to all users free of charge, in other words increasing access to information, can be an opportunity for NSOs to improve data quality. It aligns with "good government" efforts, in which publicly available information is seen as essential to transparency, public trust, and improving public services. NSOs should transform the challenges they face into opportunities and try to take advantage of open data. The new abilities brought by open data can eliminate loose procedures in data collection, compilation and processing, in order to enhance the quality of data. In a nutshell, open data gives NSOs the impetus to make changes.

6. REFERENCES

10 challenges for open data. (2015, August). Retrieved from www.openstate.eu/en/2015/08/english-10-challenges-for-open-data/

Breeden, J. (2014, March). Worried about security? Beware the mosaic effect. GCN. Retrieved from https://gcn.com/articles/2014/05/14/fose-mosaic-effect.aspx

Guidelines for open data policies. (2014, March). Sunlight Foundation, 3. Retrieved from http://assets.sunlightfoundation.com/policy/Open%20Data%20Policy%20Guidelines/OpenDataGuidelines_v3.pdf

How to plan and budget an open data initiative. (2014). Retrieved from https://theodi.org/guides/how-to-plan-and-budget-an-open-data-initiative

Open data & privacy. (2013, August). Retrieved from http://www.opendataresearch.org/content/2013/501/open-data-privacy-discussion-notes

Parker, B.J. & Jain, K. (2015, April). The challenges of open data and privacy issues. Western City. Retrieved from http://www.westerncity.com/Western-City/April-2015/The-Challenges-of-Open-Data-and-Privacy-Issues/

Shuen, J. (2014, March). Open data: what is it and why should you care? Retrieved from http://www.govtech.com/data/Got-Data-Make-it-Open-Data-with-These-Tips.html

Tauberer, J. (2007). 8 principles of open government data. Retrieved from https://opengovdata.org/

Janssen, M., Charalabidis, Y., & Zuiderwijk, A. (2012). Benefits, adoption barriers and myths of open data and open government. Information Systems Management, 29(4), 258-268.

MULTIVARIATE TIME SERIES SIMILARITY-BASED COMPLEX NETWORK IN STOCKS MARKET ANALYSIS: CASE OF NYSE DURING GLOBAL CRISIS 2008
Maman Abdurachman Djauhari 1, Gan Siew Lee 2

1 Institute for Mathematical Research (INSPEM), Universiti Putra Malaysia
2 Department of Mathematical Sciences, Universiti Teknologi Malaysia

Abstract

Long before we started the 21st millennium, Stephen Hawking saw the current millennium as the millennium of complex systems. Until now he has been proven right, due to the fast-growing computer technology. Nowadays, in the era of the digital world where big data is our daily menu, we cannot escape from complex systems. As big data is characterized by the "4V" (Variety, Velocity, Veracity and Volume), statistics as practiced in the traditional way is not enough, and sometimes not apt, to understand the most important information contained in big data. What people now call data analytics needs to be used as a complement. It is mathematically dominated by multivariate data analysis (MVDA) in the French way. Traditional statistics, which is based on mathematical statistics, is for confirmatory analysis, while data analytics is for exploratory analysis. The former does hypothesis testing (micro analysis) and the latter hypothesis generation (macro analysis). Macro analysis is more appropriate for dealing with big data. The principal mathematical tool for macro analysis is MVDA in the French way, where big data is considered as a complex system. In this regard, the main problem is to define the similarity among the objects of study, such as stocks, economic sectors, currencies, and other commodities in the financial industry, which are statistically multivariate time series. Furthermore, the principal tools to filter the important information contained in a complex system are complex network and social network analysis. To demonstrate the advantages of the complex network approach in stocks market analysis, in this paper the behaviour of economic sectors in NYSE during the global crisis in 2008 will be presented and discussed. By nature, all stocks are multivariate time series. Therefore, in that example, we show that the use of the Pearson correlation coefficient is useless to define the similarity among them. We use Escoufier's vector correlation coefficient instead.

Keywords: minimal spanning tree, network centrality, stocks network, stock's prices, vector correlation

1. Introduction

The inter-relationships among stocks in a given portfolio represent a complex system of correlation structure. It is numerically represented in the form of a correlation matrix. In practice, it is then a complex network whose level of complexity grows with the number of stock pairs, n(n-1)/2, where n is the number of stocks. As a consequence, when the stocks market becomes a big dataset in terms of n, the correlation structure becomes harder and harder to understand. How can we transform such a complex network into important economic information? To answer this question, the tools developed in the field of econophysics, see Mantegna and Stanley (2000), are very powerful.

In the current practice, a stock is usually represented by its closing price. It is then a univariate (UV) time series which is customarily assumed to be governed by the geometric Brownian motion law. This means that the price returns are independent and identically log-normally distributed; in other words, the log returns are independent and identically normally distributed (i.i.n.d.). This fundamental assumption is the theoretical basis in stocks analysis. Under this assumption, the similarity among stocks can be measured in terms of the Pearson correlation coefficient (PCC) of log returns. Based on PCC, the stocks network is constructed as a dissimilarities network and the important economic information is filtered by using the minimal spanning tree (MST).

However, in daily market activities, a stock is represented by its opening, highest, lowest and closing (OHLC) prices. This means that a stock is a multivariate (MV) time series of those prices. In this case, since PCC is no longer apt to measure the similarity among stocks, we show that the use of the Escoufier vector correlation (EVC) is more advantageous. It generalizes PCC from the bivariate into the multivariate case. EVC, originally introduced by Escoufier (1973), quantifies the linear relationship of two random vectors, while PCC is about two random variables. Nowadays, its application can be found in many areas of statistics. However, to the best of our knowledge, its use in stocks network analysis is still at the beginning, as can be seen in Kazemilari and Djauhari (2015) and Djauhari and Gan (2015). Those authors have shown that MV could describe well the real market situation, for example, (i) in terms of the number of worst performance stocks (leaves), and (ii) the phenomenon of social embeddedness, which cannot be detected by using the closing price only. This phenomenon, see Halinen and Tornroos (1998), is very important in the study of stocks behaviour under similar management.

Since highest and lowest prices can take place at any time during the trading day, to synchronize the effect of OHLC prices it is customary to use weekly data, as can be seen, for example, in Lo and MacKinlay (1990). Furthermore, as in the UV case, the MST is used to filter the topological structure of the stocks network based on OHLC prices. To construct the MST from the dissimilarities network, since Kruskal's algorithm or Prim's is computationally slow for large n, we use the algorithm proposed in Djauhari and Gan (2013). To show the advantages of the MV approach compared to the standard UV approach, a case study on NYSE data during the global crisis in 2008 has been conducted. We compare the MSTs issued from those approaches in terms of the degree centrality measure, since it represents the market index as remarked, for example, in Kenett et al. (2013).

The rest of the paper is organized as follows. We start by introducing in Section 2 the notion of similarity among stocks, each of which is defined by its OHLC prices, and the stocks network is constructed. Section 3 presents the MSTs of NYSE one year before, during, and one year after the global crisis. Some evidence from the MST, the Jaccard index, as well as degree centrality will be reported to show the advantages of the MV approach. Concluding remarks are highlighted in the last section.

2. Stocks network in multivariate setting

Let p_i(t,1), p_i(t,2), p_i(t,3) and p_i(t,4) denote the opening, highest, lowest and closing prices of stock i; i = 1, 2, ..., n. We write

r_i(t,m) = ln p_i(t,m) - ln p_i(t-1,m)    (1)

the log return of the m-th price of stock i at time t; m = 1, 2, 3, 4. Under the assumption that each price is a GBM process for all stocks, r_i(t,m) are i.i.n.d. for all m. Therefore, r_i(t,m) in (1) can be viewed as the m-th component of a random vector. This viewpoint leads us to consider a stock as a multivariate entity.

2.1. Similarity among stocks

Let X and Y be two random vectors representing two different stocks, each of dimension p = 4 and q = 4. We denote by r_X(t,m) and r_Y(t,m) the m-th components of X and Y; m = 1, 2, 3, 4, and by T the length of the time support for the four prices. Let S_XX, S_YY and S_XY be the sample covariance matrices of X, of Y, and between X and Y, respectively. In this circumstance, by using EVC, the correlation between the random vectors X and Y is

RV_XY = Tr(S_XY S_YX) / √[Tr(S_XX²) Tr(S_YY²)]    (2)

Like PCC, this coefficient is the cosine of an angle spanned by two stocks. It satisfies:
(i) 0 ≤ RV_XY ≤ 1. It is 1 if X and Y have the same correlation structure, and it is 0 if each price of one stock is uncorrelated with all prices of the other stock.
(ii) In the bivariate case, RV_XY is the square of PCC.

In terms of these properties, therefore, EVC defines the similarity among stocks in the MV setting.
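A minimal numerical sketch of the RV coefficient in (2), not the authors' code, is given below; it also checks property (ii), that RV reduces to the squared PCC in the univariate case.

```python
# Escoufier's RV coefficient (2) for two stocks given their (T, 4) arrays
# of OHLC log returns.
import numpy as np

def rv_coefficient(rx, ry):
    rx = rx - rx.mean(axis=0)
    ry = ry - ry.mean(axis=0)
    t = len(rx)
    sxy = rx.T @ ry / t                 # S_XY
    sxx = rx.T @ rx / t                 # S_XX
    syy = ry.T @ ry / t                 # S_YY
    return np.trace(sxy @ sxy.T) / np.sqrt(np.trace(sxx @ sxx) *
                                           np.trace(syy @ syy))

# Property (ii): with univariate X and Y, RV equals the squared PCC.
x, y = np.random.randn(200, 1), np.random.randn(200, 1)
assert np.isclose(rv_coefficient(x, y),
                  np.corrcoef(x[:, 0], y[:, 0])[0, 1] ** 2)
```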

2.2. Stocks network

Let S_ij be the covariance matrix between stocks i and j. According to (2), the RV coefficient of these stocks is RV_ij, obtained by substituting X = i and Y = j. If we consider a matrix of size (n×n) with RV_ij as its i-th row and j-th column element, this matrix is symmetric with all diagonal elements equal to 1 and off-diagonal elements between 0 and 1. It then represents the similarities network among stocks in the MV setting of OHLC prices. This generalizes the notion of correlations network, such as presented in Mantegna and Stanley (2000), Bonanno et al. (2003), and Galazka (2011), into the notion of vector-correlations network. By using the idea in Mantegna and Stanley (2000), to analyse that network we define the distance between two stocks i and j,

δ_ij = √[2(1 - RV_ij)]    (3)

If we denote by D the matrix of size (n×n) with δ_ij in (3) as the element of its i-th row and j-th column, then D represents the dissimilarities network among stocks that we require. To filter the information contained in D, as in the UV case, the method of MST is used. For that purpose, due to computational complexity, we use the algorithm developed in Djauhari and Gan (2013) instead of Kruskal's algorithm or Prim's. Furthermore, the topological properties of the MST are analysed in terms of the market index.
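The pipeline of this section can be sketched as follows; scipy's MST routine stands in for the faster algorithm of Djauhari and Gan (2013), rv_coefficient is the function sketched in Section 2.1, and the distance follows (3) as reconstructed above.

```python
# Dissimilarities network D from (3) and its MST, for n stocks whose OHLC
# log returns are stacked in an (n, T, 4) array. Sketch only.
import numpy as np
from scipy.sparse.csgraph import minimum_spanning_tree

def mst_edges(returns):
    n = returns.shape[0]
    d = np.zeros((n, n))
    for i in range(n):
        for j in range(i + 1, n):
            rv = rv_coefficient(returns[i], returns[j])
            d[i, j] = np.sqrt(2.0 * (1.0 - rv))     # delta_ij in (3)
    mst = minimum_spanning_tree(d + d.T)            # keeps n - 1 links
    return np.argwhere(mst.toarray() > 0)           # edge list (i, j)
```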

3. Results on NYSE

The NYSE 100 most capitalized stocks are analysed one year before, during, and one year after the global crisis in 2008. Data were downloaded from this link: http://www.nyse.com/about/listed/nyid_components.shtml. Four stocks were removed from the analysis because of the incompleteness of data. Therefore, only 96 stocks were analysed.

3.1. Evidence from MST

Based on weekly data of OHLC prices, the network of the NYSE 96 most capitalized stocks is constructed. In Figure 1, the dynamics of the half-yearly MSTs is presented, from Jan-Jun 2007 to Jul-Dec 2009.

Figure 1. Dynamics of MSTs in Jan-Jun 2007 (a), Jul-Dec 2007 (b), Jan-Jun 2008 (c), Jul-Dec 2008 (d), Jan-Jun 2009 (e), and Jul-Dec 2009 (f). Node colours denote economic sectors (the colour legend is not reproduced here).

The colour of the node (stock) refers to the economic sector to which it belongs. At a glance, we see in Figure 1 that one year before the crisis the centre of the network is dominated by Financials; the performance of this sector was dominant in NYSE. However, this was no longer so during the crisis: in this period all colours are distributed more randomly. The situation became severe in the second half of 2008, when Financials moved to the periphery. When the crisis was over, this sector strengthened again, but in the second half of 2009 it worsened again, perhaps due to the Greek debt crisis. This result is totally different from that given by Djauhari and Gan (2014), who use the closing price only. To show other advantages of the MV approach, in the next sub-sections the two results will be compared in terms of the Jaccard index and degree centrality.

3.2. Evidence from Jaccard index

The Jaccard index is used to measure the similarity among MSTs. Here, for each period, we compare the MST issued from the MV approach with the one given by the UV approach. This index reflects the discrepancy in correlation structure. For period i = 1, 2, ..., 6, see Djauhari and Gan (2014), it is defined by

I_i = |MST_UV,i ∩ MST_MV,i| / |MST_UV,i ∪ MST_MV,i|

Here, MST_UV,i and MST_MV,i are the MSTs of the i-th period issued from the UV and MV approaches, respectively. The computed index values are low in all periods (e.g., 0.2338 in Jan-Jun 2009). This indicates that the MST issued from the UV approach is really different from that given by the MV approach in all periods. This is the advantage of the MV approach based on OHLC prices in stocks network analysis.
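Computing the index is straightforward once each MST is treated as a set of undirected edges; a small self-contained sketch with invented edge lists:

```python
# Jaccard index between two MSTs represented as edge lists.
def jaccard_index(mst_uv, mst_mv):
    a = {frozenset(e) for e in mst_uv}   # undirected edges
    b = {frozenset(e) for e in mst_mv}
    return len(a & b) / len(a | b)

print(jaccard_index([("S1", "S2"), ("S2", "S3")],
                    [("S1", "S2"), ("S1", "S3")]))  # 1 shared of 3 -> 0.333...
```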

3.3. Evidence from degree centrality

In this sub-section, the results of the MV and UV approaches will be compared in terms of the number of leaves, the diameter of the MST, and degree centrality.

3.3.1. Number of leaves and diameter of MST

A leaf is a stock of degree one. It represents a worst-performance stock in the market in terms of the number of other stocks it is directly linked with. On the other hand, the diameter of the MST represents the longest path used to propagate the influence of a particular worst stock (called a pole) to another pole. The NYSE data show that the number of worst stocks in the MV setting is always less than in UV, and the diameter of the MST under MV is generally greater than under UV. This is a natural consequence of the fact that, unlike the UV setting which focuses on the closing price, each stock in MV brings all the information about OHLC prices.

3.3.2. Diameter between and within economic sectors in the MST

Other advantages are revealed in the study of: (i) the diameter between sectors, which measures the closeness of a particular sector A to all other sectors in the MST in terms of maximum linkage; in other words, it is the sum of the diameters of the union A ∪ B over all other sectors B in the MST; and (ii) the cohesiveness among stocks in a given sector in terms of the diameter within the sector.

The results show that Financials and Industrials are the two most influential sectors before, during, and after the crisis. However, during the second half of 2008, the leadership of Financials was taken over by Industrials. Furthermore, severe turbulence occurs among stocks in Financials, which is not the case in the Industrials sector. By using the same method, the UV approach cannot describe the above market situation. Under that approach, the leader in all periods is Industrials and not Financials, and turbulence among stocks also occurs but is not as severe as under MV. This certainly does not reflect the real market situation.

3.3.3. Degree centrality of economic sector

The degree centrality of an economic sector is defined as the average degree of all stocks in that sector. By using the same method described in the previous paragraph, we find that the MV approach based on OHLC prices presents more dynamic degree centrality of economic sectors compared to the case where only the closing price is considered.

4. Concluding remarks

This paper shows that stocks network analysis based on OHLC prices, where the similarity among stocks is defined using EVC, is more advantageous than analysis based on the closing price only. In the case of the NYSE data, we show:
(i) The evidence from the Jaccard index in each period: the results issued from the MV and UV approaches are totally different.
(ii) The evidence from the number of worst stocks and the diameter of the MST: the MV setting reflects the real market situation.
(iii) The dynamics of each sector in terms of degree centrality during the six periods of study can be seen more clearly under the OHLC prices-based approach than under the closing price-based one.
(iv) In the real situation, the Financials sector is at the centre (the leader) of the NYSE. This can be described nicely by using the MV approach but not UV.

Acknowledgement

The first author is very grateful to the Institute for Mathematical Research (INSPEM), Universiti Putra Malaysia, for providing research facilities during his service as Research Fellow. The second author thanks Universiti Teknologi Malaysia for the facilities supporting her PhD program.

References

Bonanno, G., Caldarelli, G., Lillo, F. and Mantegna, R.N. (2003). “Topology of correlation-based minimal spanning trees in real and model markets”, Physical Review E, 68(4), p. 046130-046133.

Djauhari, M.A. and Gan, S.L. (2013). “Minimal spanning tree problem in stock network analysis: An efficient algorithm”, Physica A, 392(9), p. 2226-2234.

Djauhari, M.A. and Gan, S.L. (2014). “Dynamics of correlation structure in stock market”, Entropy, 16(1), p. 455-470.

Djauhari, M.A. and Gan, S.L. (2015). “Bursa Malaysia stocks market analysis: A review”, Journal of Science – Academy of Science Malaysia, 8(2), p. 2226-2234.

Escoufier, Y. (1973). "Le traitement des variables vectorielles," Biometrics, 29(4), p. 751-760.

Galazka, M. (2011). "Characteristics of Polish stock market correlations," International Review of Financial Analysis, 20, p. 1-5.

Halinen, A. and Tornroos, J. (1998). “The role of embeddedness in the evolution of business network”, Scandinavian Journal of Management, 14(3), p. 187–205.

Kazemilari, M. and Djauhari, M.A. (2015). “Correlation network analysis for multi-dimensional data in stocks market”, Physica A, 429, p. 62-75.

Kenett, D. Y., Ben-Jacob, E., Stanley, H. E. and Gur-Gershgoren, G. (2013). “How high-frequency trading affects a market index”, Nature Scientific Reports, 3, 2110(1)–(8).

Lo, A. W. and MacKinlay, A. C. (1990). “An econometric analysis of nonsynchronous trading”, Journal of Econometrics, 45, 181-212.

Mantegna, R.N. and Stanley, H.E. (2000). An introduction to econophysics: Correlations and complexity in finance, Cambridge University Press, Cambridge, UK.

PERCEIVED HAPPINESS AND SELF-RATED HEALTH: THE TWINS? A BIVARIATE ORDERED PROBIT MODELS ANALYSIS USING WORLD VALUE SURVEY
Ying-Yin Koay 1, Yoke-Kee Eng 2

Abstract

Studies regarding the resources of happiness have reached a consensus that health is one of the most crucial inputs to individual happiness. Nevertheless, happiness also seems to have an influential impact on health. As such, do happiness and health co-exist? If yes, is the co-existing relationship due to joint determinants such as socioeconomic status (SES), or due to the endogeneity of happiness to health? This study aims to reveal the co-existing relationship between happiness and health in Malaysia using subjective indicators from the SES perspective. Based on a sample of 1300 Malaysian respondents from Wave 6 of the World Value Survey (WVS), we first construct a simultaneous system, the Bivariate Ordered Probit Model, to investigate the nexus between perceived happiness and self-rated health. Then, we test whether perceived happiness is endogenous to self-rated health by the Likelihood Ratio and Wald tests. The results show that perceived happiness and self-rated health are significantly and positively related to each other. This implies that Malaysians' welfare can be improved by providing better health care services and insurance. This cross-sectional empirical study also reveals that money (income) can buy Malaysians' happiness but not their health. Malaysians should strike a balance in their earning life by not over-loading themselves, as good health is hard to earn.

1. Introduction

Among the resources of happiness, health has globally and statistically been proven as one of the most crucial inputs to individual happiness. In turn, happiness may have an influential impact on health, especially in modern life today, which is full of social competition and thus creates stress. Someone who is happy would be more optimistic in handling their stressful life compared to those who are unhappy; thus, the probability of getting stress-related diseases such as depression and hypertension will be lower. In view of this, both happiness and health may influence each other at the same time. Nevertheless, most happiness-health studies have been done in a unidirectional manner: either they emphasize the influence of health on happiness, or vice versa.

Further investigation is needed to reveal the simultaneous relationship between happiness and health: whether the relationship is formed by the same explanatory variables (such as socioeconomic status), or because of the endogeneity of happiness to health. In order to fill this research gap, this study intends to examine the simultaneous relationship between happiness and health in Malaysia using subjective indicators: perceived happiness and self-rated health. Both indicators are obtained from single items, namely "Taking all things together, would you say you are very happy, rather happy, not very happy or not at all happy?" and "All in all, how would you describe your state of health these days: very good, good, fair or poor?", respectively.

2. Joint determinants of happiness and health

Socioeconomic status (SES) is a multifaceted social economic variable, usually measured by education, income and employment status. Studies have found that more educated people are happier; this may be because their job security is more guaranteed and hence the standard of living is

1, 2 Department of Economics, Faculty of Business and Finance, Universiti Tunku Abdul Rahman

MyStats 2015 Proceedings 89 more ideal, hence they are happier (Stutzer & (1) Frey, 2008; Diener, 2000). However, Clark and Oswald (1994) obtained a negative relationship (2) between education and wellbeing. They justified their findings by the explanation that happiness and self-report health. Assume the educated people would expect higher income; latent variables of perceived happiness (PH) * * the unmet of the expected income would create and self-rated health (SH) are PH i and SH i , disappointment and unhappiness. respectively as such: Many happiness economists have placed the focus lens on the relationship between income where Equation (1) and (2) are jointly and happiness. It is a norm that higher income determined, hence, they are recognised as a

can support better life style and thus happiness simultaneous system. β1 and β2 are vectors of is more guaranteed. This practice holds in unknown parameters, γ is an unknown scalar,

the cross sectional empirical results (Ferrer-i- ε1 and ε2 are the error terms, and i shows an Carbonell, 2005; Diener et al, 1999) but not in individual observation. The regressors in the the time series studies (Easterlin, 2001; Frey & models fulfill the exogeneity assumption that

Stutzer, 2000). This has been further explained E(X’1i ε1i) = 0 and E(X’2i ε2i) = 0.

by the adaptation to income where people X’1 is a vector of exogenous variables for have used to the high income level and they estimating the latent variable of perceived do not feel happier if the income increases (Di happiness, which include socioeconomic Tella et al, 2010). On the other hand, the studies status which are education (education), income have shown that the contributions of income (income) and employment status which has have outweighed its costs on health (Fichera & been categories into full-time (fulltime), part- Savage, 2015; Oshio & Kobayashi, 2010). Higher time (partime) and self-employed (semployed). income can afford a higher quality of living such Additionally, the model also includes the as living at a more peaceful residential area needs for basic needs (basic), safety (safety), (Subramanian et al, 2005), health care (Cutler belongingness (belongingness) and self- & Lleras-Muney, 2006), and balanced nutrition esteem (selfesteem), financial satisfaction (Lynch et al, 2000). (FS), life satisfaction (life) where they are served as instrumental variable to avoid the Unemployment has been found as one of problem of exogeneity. Furthermore, they are the crucial predictors for both unhappiness assumed not to correlate with the self-reported and poor health (Pierewan & Tampubolon, health. Lastly, we also consider age (age) and

2015). Previous studies have revealed a strong gender (gender). X’2 is a vector of exogenous negative relationship between unemployment variables for estimating the latent variable of and happiness (Di Tella et al, 2001). Being self-rated health. They are income, education, unemployed are not just about the loss of employment status, age and gender to avoid income, yet it brings significant non-pecuniary the identification problem of bivariate ordered impacts such as the loss of self-esteem, lack of probit modelling where at least one element

self-confidence, being pessimistic and doubtful of X’1 should not be presented in X’2. After the on the meaning of life (McKee-Ryan et al, 2005). exogenous variables are chosen, Equations (1) On the other hand, studies also presented a and (2) will be jointly determined by Bivariate positive nexus between unemployment and Ordered modelling. adverse health outcomes (McKee Ryan et al, 2005). Unemployment may tighten the financial Equations (1) and (2) will be estimated by the constraints, depress social status and promote method of full-information maximum likelihood. unhealthy behaviours that would put someone Compared to the two-steps estimation, FIML has in a stressful condition (Luo et al, 2010). statistically been proven to be more efficient and unbiased if (i) the error terms are bivariate 3. Methodology normally distributed, (ii) the absolute value of endogenous dummy coefficient, |ρ|, is high or We are using the Wave 6 of WVS for to reveal the (iii) the sample size is small (Sajaia, 2008). If the

co-existed relationship between perceived ρ = 0, PHi and ε2i are uncorrelated and PHi is

90 MyStats 2015 Proceedings exogenous for Equation (2). In contrast, ρ ≠ 0 level. This implies that income can motivate implies that PHi is correlated with ε2i and hence happiness while it diminishes health. endogenous. Thus, we use the likelihood ratio Besides income, the fulfillment of basic, safety test and Wald test to examine the exogeneity and belongingness needs, financial satisfaction in the bivariate (perceived happiness and and life satisfaction can contribute to the self-rated health) ordered probit model. If the improvement of Malaysians’ happiness at likelihood ratio / Wald test is greater than the 1% significance level. However, the need of critical values, we reject the null hypothesis self-esteem is negatively related to happiness that ρ≠ 0. Hence, we should regress Equations at 1%significance level while the likelihood of (1) and (2) simultaneously with bivariate being happy makes no difference across the ordered probit specification. In contrast, if age and gender. On the other hand, age and the rejection of null hypothesis is failed, we gender do significantly influence the likelihood should regress the two equations separately as of obtaining higher self-rated health at 1% univariate ordered probit specification. In order significance level. The elderly Malaysians would to show the robustness results, this study also rate their health poorer than those younger analyze the data with the seemingly unrelated due to their degeneration of physical functions. bivariate ordered probit with robust standard Additionally, females perceived themselves error models and ordered probit models as healthier than males in terms of their shown in the Table 1. healthiness in physical, mental and lifestyle.
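The simultaneous system itself was estimated with the Stata module bioprobit cited in the references; the univariate ordered probit benchmark in Table 1 can be sketched in Python as follows, with a hypothetical WVS extract and the variable names defined above.

```python
# Univariate ordered probit for perceived happiness (PH); a sketch of the
# benchmark specification in Table 1, not the bivariate FIML estimator.
import pandas as pd
from statsmodels.miscmodels.ordinal_model import OrderedModel

wvs = pd.read_csv("wvs_malaysia_wave6.csv")  # hypothetical extract

exog = wvs[["income", "education", "fulltime", "partime", "semployed",
            "basic", "safety", "belongingness", "selfesteem",
            "FS", "life", "age", "gender"]]

ph_fit = OrderedModel(wvs["PH"], exog, distr="probit").fit(method="bfgs")
print(ph_fit.summary())
```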

4. Results

Table 1 shows the statistical results of this study. Among the specifications, the simultaneous bivariate ordered probit models are the most appropriate in explaining the linkage between perceived happiness and self-rated health in Malaysia. The likelihood ratio (LR) test shows a simultaneous relationship between happiness and health, where happiness is endogenous to the health model at the 1% significance level. The results from this simultaneous system display that, among the socioeconomic status variables, only income (but not education and employment status) significantly influences both perceived happiness and self-rated health. However, income positively influences happiness while it is negatively related to health at the 5% significance level. This implies that income can motivate happiness while it diminishes health.

Besides income, the fulfillment of basic, safety and belongingness needs, financial satisfaction and life satisfaction contribute to the improvement of Malaysians' happiness at the 1% significance level. However, the need for self-esteem is negatively related to happiness at the 1% significance level, while the likelihood of being happy makes no difference across age and gender. On the other hand, age and gender do significantly influence the likelihood of obtaining higher self-rated health at the 1% significance level. Elderly Malaysians rate their health poorer than the young, due to the degeneration of physical functions. Additionally, females perceive themselves as healthier than males in terms of physical, mental and lifestyle health.

5. Conclusion

From the statistical results, we can conclude that Malaysians' happiness and health are positively related to each other simultaneously, not because of having the same determinants. This may provide some insights to the Government and policy makers that, whether they are wellbeing policies or health policies, each of them can help to improve both happiness and health benefits for Malaysians. As such, the Government may consider providing health insurance or a medical card for those from low-income groups. This cross-sectional empirical study also reveals that money (income) can buy Malaysians' happiness but not their health. Malaysians should strike a balance in their earning life by not over-loading themselves, as good health is hard to earn.

Table 1: Results of the relationship between perceived happiness and self-rated health from different models.

                     Simultaneous bivariate | Seemingly unrelated bivariate      | Ordered
                     ordered probit         | ordered probit (robust std. error) | probit
PH equation
  income             0.4941**               | 0.0731***                          | 0.0578***
  education          -0.0335*               | -0.0325                            | -0.0360*
  fulltime           0.0401                 | 0.0393                             | 0.0713
  partime            0.0297                 | 0.0057                             | 0.0703
  semployed          0.0572                 | 0.0619                             | 0.0645
  basic              0.1147***              | 0.0913***                          | 0.0992***
  safety             0.1664***              | 0.1402***                          | 0.1509***
  belongingness      0.1762***              | 0.1488***                          | 0.1596***
  selfesteem         -0.2182***             | -0.2027***                         | -0.2192***
  FS                 0.1116***              | 0.0708***                          | 0.0755***
  life               0.1276***              | 0.1058***                          | 0.1130***
  age                -0.0029                | -0.0024                            | 0.0047
  gender             -0.0583                | -0.0578                            | -0.1037
  SH                 0.5538***              |                                    |
SH equation
  income             -0.0469**              | 0.0554***                          | 0.0093
  education          0.0260                 | 0.0014                             | 0.0131
  fulltime           -0.0945                | -0.1026                            | -0.1011
  partime            -0.1959                | -0.2023                            | -0.2145
  semployed          -0.0224                | -0.0041                            | -0.0118
  age                -0.0204***             | -0.0205***                         | -0.0213***
  gender             0.1903***              | 0.1337**                           | 0.1670**
  PH                 0.7485***              |                                    |
athrho _cons         -0.3176***             | 0.4308***                          |
gamma _cons          0.7175***              |                                    |
rho                  -0.3073                | 0.4060                             |
LR test              111.39***              |                                    | 13.90 (PH model); 39.04*** (SH model)
Wald test            96.78***               |                                    |

Notes: The asterisks represent the significance level: * p < 0.10, ** p < 0.05, and *** p < 0.01. The LR test in the simultaneous bivariate ordered probit model shows the endogeneity of happiness to health at the 1% significance level. The LR test in the PH ordered probit model indicates that the model fulfills the equality assumption of coefficients across response categories, while the SH ordered probit does not fulfill this assumption. The Wald test displays that the PH and SH models should be regressed simultaneously.

References

Clark, A.E., & Oswald, A. J. (1994) “Unhappiness and unemployment”, The Economic Journal, 104(424), 648-659.

Cutler, D. M., & Lleras-Muney, A. (2006). Education and health: evaluating theories and evidence. NBER Working paper 12352. Retrieved from http://www.nber.org/papers/w12352

Diener, E. (2000). Money and happiness: Income and subjective well-being across nations. In E. Diener & E. M. Suh (Eds), Culture and subjective well-being. Massachusetts: The MIT Press.

Di Tella, R., MacCulloch, R.J., & Oswald, A.J. (2001) “Preferences over Inflation and Unemployment: Evidence from Surveys of Happiness”, The American Economic Review, 91(1), 335-341.

Diener, E., Suh, E.M., Lucas, R.E., & Smith, H.L. (1999) “Subjective well-being: Three decades of progress”, Psychological Bulletin, 125, 276-302.

Easterlin, R.A. (2001) “Income and happiness: Towards a unified theory”, The Economic Journal, 111, 465-484.

Ferrer-i-Carbonell, A. (2005) “Income and well-being: An empirical analysis of the comparison income effect”, Journal of Public Economics, 89, 997-1019.

Fichera, E., & Savage, D. (2015) "Income and health in Tanzania. An instrumental variable approach", World Development, 66, 500-515.

Frey, B. S., & Stutzer, A. (2002) “What can economists learn from happiness research?”, Journal of Economic Literature, 40(2), 402-435.

Luo, J., Qu, Z., Rockett, I., & Zhang, X. (2010) “Employment status and self-rated health in north-western China”, Public Health, 124, 174-179.

Lynch, J. W., Smith, G. D., Kaplan, G. A., & House, J. S. (2000) "Income inequality and mortality: Importance to health of individual income, psychosocial environment, or material conditions", British Medical Journal, 320(7243), 1200-1204.

McKee-Ryan, F., Song, Z., Wanberg, G.R., & Kinicki, A.J. (2005) “Psychological and physical well-being during unemployment: A meta-analytical study”, Journal of Applied Psychology, 90, 53-76.

Oshio, T., & Kobayashi, M. (2010) "Income inequality, perceived happiness, and self-rated health: Evidence from nationwide surveys in Japan", Social Science and Medicine, 70, 1358-1366.

Pierewan, A. C., & Tampubolon, G. (2015) "Happiness and health in Europe: A multivariate multilevel model", Applied Research Quality Life, 10(2), 237-252.

Sajaia, Z. (2008). BIOPROBIT: Stata module for bivariate ordered probit regression. Retrieved from http://www.adeptanalytics.org/download/ado/bioprobit/bioprobit.pdf

Stutzer, A., & Frey, B. S. (2008) “Stress that doesn’t pay: The commuting paradox”, The Scandinavian Journal of Economics, 110(2), 339-366.

Subramanian, S. V., Kim, D., & Kawachi, I. (2005) “Covariation in the socioeconomic determinant of self-rated health and happiness: Multivariate multilevel analysis of individuals and communities in the USA”, Journal of Epidemiology Community Health, 59, 664-669.

Veenhoven, R. (2008) “Healthy happiness: Effects of happiness on physical health and the consequences for preventive health care”, Journal of Happiness Studies, 9, 449-469.

Winkleby, M.A., Jatulis, D.E., Frank, E., & Fortmann, S.P. (1992) “Socioeconomic status and health: how education, income, and occupation contribute to risk factors for cardiovascular disease”, American Journal of Public Health, 82(6), 816-820.

PREDICTION OF BURSA MALAYSIA STOCK INDEX USING AUTOREGRESSIVE INTEGRATED MOVING AVERAGE AND ARTIFICIAL NEURAL NETWORK

Angie Tan Sze Yiing, Chan Kok Thim 1

Abstract

The FTSE Bursa Malaysia Kuala Lumpur Composite Index (KLCI), which comprises the 30 largest listed companies, is often used as the benchmark for the overall market performance of Malaysian stocks. Despite claims that the stock market is in fact an efficient market that follows a random walk process and is therefore not predictable, there is evidence from previous empirical studies arguing that the stock market can be predictable. This study examines the ability of time series analysis and an artificial intelligence system to predict the movement of stock prices. We use daily historical data from 3 January 2012 to 31 March 2014 as the base, whereas the daily forecasts are generated for the period from 1 April 2014 to 31 March 2015 using EViews 7 and MATLAB 8.5 with Neural Network Toolbox version 8.3. The predictability of the stock market is examined using two different approaches, stochastic time series analysis and an artificial intelligence system, namely the Autoregressive Integrated Moving Average (ARIMA) and Artificial Neural Network (ANN) methods, respectively. The findings reveal that the ANN (5-6-1) model outperformed the ARIMA (1,1,0) model, with the lowest error recorded. Thus, it is concluded that the ANN model is a superior forecasting model compared to the ARIMA model in forecasting the stock price index. Investors and financial analysts would be able to adopt the best technique to forecast accurately and gain insights into the Malaysian stock market. The knowledge and skill gained in performing forecasts using the right technique would give users an added advantage, as they would obtain more accurate forecasts and thus make better decisions in terms of risk and investment management.

1. Introduction

Trading in stock market indices has become remarkably popular in the major financial markets around the world (Patel and Marwala, 2006). Lawrence (1997) states that the basic motivation in predicting stock market prices is, undoubtedly, financial gain. Hence, investors have tried multiple approaches to forecast stock market prices and their future trends so as to profit from the stock market (Lee, 2010). Still, due to the noisy, non-stationary and dynamic nature of stock prices, it is undeniably challenging to forecast stock market prices accurately (Wang, Wang, Zhang and Guo, 2011).

The predictability of the stock market has been a long debate among market participants since the earlier years (Lee, 2010). A group of researchers claimed that the stock market is an efficient market and that it follows a random walk process, thereby indicating that the stock market movement is unpredictable.

1, 2 Faculty of Management, Multimedia University

This group that favors the random walk process includes Malkiel (1973), Fama (1965) and Kendall and Hill (1953). However, another group of researchers opposes the random walk hypothesis, arguing that the stock market is indeed predictable (Moreno and Olmeda, 2007), as can be seen in studies conducted by Keim and Stambaugh (1986), Pesaran and Timmermann (1995) and Lewellen (2004).

Many studies in forecasting stock prices have been conducted over the years (Wang, Wang, Zhang and Guo, 2012). Among the most prominent forecasting techniques are statistical time series models such as the Autoregressive Integrated Moving Average (ARIMA), and artificial intelligence (AI) models including artificial neural networks (ANN), fuzzy logic and expert systems.

McMillan (2001) has pointed out that forecasting stock market returns is possible using a range of financial and macroeconomic variables. This statement is supported by several pieces of empirical evidence in studies by Keim and Stambaugh (1986), Fama and French (1988), Pesaran and Timmermann (1995), Kim, Shamsuddin and Lim (2011) and Lewellen (2004). In fact, there are essentially two important hypotheses closely related to the possibility of forecasting, namely the random walk hypothesis and the efficient market hypothesis (EMH).

Introduced by Malkiel (1973), the random walk theory states that the future values or directions of stock prices cannot be predicted from past behavior because changes in stock prices are independent of each other. On the other hand, the EMH states that an efficient market incorporates all freely available information and that stock prices adjust to new information immediately (Fama, 1970). This instantaneous adjustment of stock prices to new information is usually unpredictable, thereby leading changes in stock prices to be essentially random (Malkiel, 2003). Hence, the random walk theory is closely related to the EMH. Fama (1970) classified market efficiency into three types, namely the weak form, semi-strong form and strong form EMH, based on different sets of information: historical prices, public information and private information, respectively. It is concluded in the studies of Nassir, Ariff and Mohamad (1993) and Lim, Liew and Wong (2004) that Bursa Malaysia is essentially a stock market with weak-form efficiency.

The ARIMA model, better known as the Box-Jenkins methodology, is a popular approach widely used in analysis and forecasting, especially for time series, as it is deemed to be among the most efficient forecasting techniques in the social sciences (Adebiyi, Adewumi and Ayo, 2014). Mathew, Sola, Oladiran and Amos (2013) used ARIMA models to predict the Nokia and Zenith stock index and Nigerian Breweries Plc stock prices, respectively. The ARIMA methodology has also proved worthy in forecasting economic and social variables such as the unemployment rate, interest rates and failure rates, as shown by Dobre and Alexandru (2008), Ho and Xie (1998) and Hassan (2014).

Similarly, the ANN model has been a popular forecasting technique that is frequently analyzed and applied in time series forecasting in recent years (Zhang, 2003). The ANN approach has been proven capable of modeling complex non-linear problems in which no prior assumption of the relationship is needed (BuHamra, Smaoui and Gabr, 2003).

The main reason for its popularity is the fact that an ANN is able to learn patterns from data and deduce appropriate solutions (Adebiyi et al., 2014). In essence, stock market values are well modeled by expert systems with ANN, given that this approach does not impose standard formulas and adapts easily to changes in the market (Guresen, Kayakutlu, and Daim, 2011). For instance, Kuan and Liu (1995), Jiang (2003), Devadoss and Ligori (2013), Wang et al. (2011) and Yildirim, Ozsahin and Akyuz (2011) demonstrated the application of ANN models in forecasting stock prices.

2. Results

2.1 ARIMA Model

Since the ADF unit root test gives a test statistic of -2.055, which is larger than the critical value of -2.865 at the 5% level of significance, and given the correlogram plotted on the KLCI closing price, the time series is shown to be non-stationary at level. Hence, it is differenced once to obtain stationarity. The statistical results for different ARIMA parameters for the KLCI are obtained from the estimation output generated using ordinary least squares (OLS). ARIMA (1,1,0) has relatively the lowest SIC, AIC and standard error of regression at 7.1219, 7.1062 and 8.4348 respectively, the relatively highest adjusted R² of 0.99, a significant p-value of 0.0113 that is lower than the 5% significance level, and a residual Q-statistic p-value at lag 36 of 0.106 that is higher than the 5% significance level, indicating that the residuals are white noise; it is therefore the best model relative to the other models tested. The performance measures of accuracy of ARIMA (1,1,0) are evaluated based on MSE, RMSE, MAE and MAPE, as depicted in Table 2.1.

Table 2.1: Performance measures of accuracy of all the selected models.

| | ARIMA (1,1,0) |
| MSE | 83.4905 |
| RMSE | 9.1373 |
| MAE | 6.6522 |
| MAPE | 0.3675 |

2.2 ANN Model

The data series of KLCI closing prices is normalized to the range of -1 to 1 before being trained, validated and tested.

2.2.1 ANN Models Without Iterations

Table 2.2 presents the MSE recorded for each neural network experiment done without iteration at 1,000, 2,000 and 3,000 epochs respectively. From the table, it is observed that the 5-6-1 neural network (5 input neurons, 6 hidden neurons and 1 output neuron) is selected as the predictive model with the lowest MSE at all epochs compared to the other models.

2.2.2 ANN Models With Iterations

A series of neural network experiments was also tested with 10 iterations at 50, 100 and 200 epochs respectively. The idea behind this is to attempt to improve the performance of the model. The result of the test is shown in Table 2.2. The ANN model that gives the lowest MSE of all models after 10 iterations is the 5-4-1 neural network (5 input neurons, 4 hidden neurons and 1 output neuron).

Table 2.2: Performance measures of accuracy of all the selected models.

| | ANN (5-6-1) | ANN (5-4-1) |
| MSE | 34.7891 | 37.1658 |
| RMSE | 5.8982 | 6.0964 |
| MAE | 3.2906 | 3.3457 |
| MAPE | 0.1759 | 0.1784 |

Since the ANN (5-6-1) model has lower values of MSE, RMSE, MAE and MAPE, as shown in Table 2.2, we can conclude that the ANN (5-6-1) model is a better predictive model that produces more accurate daily forecasts than the ANN (5-4-1) model.

From Figure 2.1 below, it is observed that both the ARIMA and ANN models deviated from the actual KLCI close index, with the ANN model deviating only at the beginning of the forecast period. However, towards the end of the forecast period the ANN plot hovers at or directly above the actual KLCI close, whereas the ARIMA plot deviates from the actual KLCI close index. Thus, the forecasts produced by the ANN (5-6-1) model are technically nearly the same as the actual closing price index of the KLCI, whereas the results of the ARIMA (1,1,0) model deviate from the actual values with an observable difference.

3. Conclusion

This study focuses on the predictability of two forecasting approaches, namely the ARIMA and ANN models, in predicting the daily closing price of the FTSE Bursa Malaysia KLCI over the period from 3 January 2012 to 31 March 2015. The outputs provided an interesting account of the closeness between the actual and the forecasted values. Nonetheless, the accuracy of the forecast results can be further improved through the enhancement of the forecasting tools, such as the formation of a hybrid model combining the ARIMA and ANN models, or even other approaches including expert systems or the fuzzy logic technique. Overall, both the ARIMA and ANN approaches are good predictive models capable of predicting the Malaysian stock market, especially in the short run, and thus can be effectively engaged in risk and portfolio management.
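As a rough sketch of the ARIMA workflow in Section 2.1 (the authors used EViews; the series name klci, the train/test split and the use of Python's statsmodels here are assumptions made purely for illustration):

    # Minimal sketch: ADF check, ARIMA(1,1,0) fit, and the four error
    # measures, assuming `klci` is a pandas Series of daily KLCI closes.
    import numpy as np
    from statsmodels.tsa.stattools import adfuller
    from statsmodels.tsa.arima.model import ARIMA

    train = klci["2012-01-03":"2014-03-31"]
    test = klci["2014-04-01":"2015-03-31"]

    print("ADF p-value at level:", adfuller(train)[1])  # non-stationary

    res = ARIMA(train, order=(1, 1, 0)).fit()           # d = 1 differencing
    pred = res.forecast(steps=len(test))

    err = test.to_numpy() - pred.to_numpy()
    mse = np.mean(err ** 2)
    print("MSE :", mse)
    print("RMSE:", np.sqrt(mse))
    print("MAE :", np.mean(np.abs(err)))
    print("MAPE:", 100 * np.mean(np.abs(err) / test.to_numpy()))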

Figure 2.1: Graphical representation of the actual stock price index of KLCI against the forecasted stock index using ARIMA (1,1,0) and ANN (5-6-1) from 1 April 2014 to 31 March 2015
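On the ANN side, the 5-6-1 architecture amounts to regressing each closing price on its five predecessors through one hidden layer of six neurons. The sketch below uses scikit-learn as a stand-in for the MATLAB Neural Network Toolbox used in the paper; the array name series (assumed already scaled to [-1, 1], as in Section 2.2) and the 80/20 split are assumptions:

    # Sketch of the 5-6-1 network: 5 lagged inputs, 6 hidden neurons,
    # 1 output, with tanh activation to suit the [-1, 1] scaling.
    import numpy as np
    from sklearn.neural_network import MLPRegressor

    def make_lagged(series, n_lags=5):
        # Each row: [y(t-5), ..., y(t-1)] with target y(t)
        X = np.column_stack([series[i:len(series) - n_lags + i]
                             for i in range(n_lags)])
        y = series[n_lags:]
        return X, y

    X, y = make_lagged(series)
    split = int(0.8 * len(y))
    net = MLPRegressor(hidden_layer_sizes=(6,), activation="tanh",
                       max_iter=3000, random_state=0)
    net.fit(X[:split], y[:split])
    pred = net.predict(X[split:])
    print("Test MSE:", np.mean((y[split:] - pred) ** 2))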

References

Adebiyi, A. A., Adewumi, A. O., & Ayo, C. K. (2014). Comparison of ARIMA and Artificial Neural Networks Models for Stock Price Prediction. Journal of Applied Mathematics, 1-7.

BuHamra, S., Smaoui, N., & Gabr, M. (2003). The Box-Jenkins Analysis and Neural Networks:Prediction and Time Series Modelling. Applied Mathematical Modelling, 27, 805-815.

Devadoss, A. V., & Ligori, T. A. (2013). Forecasting of Stock Prices Using Multi Layer Perceptron. International Journal of Computing Algorithm, 2, 440-449.

Dobre, I., & Alexandru, A. A. (2008). Modelling Unemployment Rate Using Box- Jenkins Procedure. Journal of Applied Quantitative Methods, 3 (2), 156-166.

Fama, E. F. (1965). Random Walks in Stock Market Prices. Financial Analysts Journal, 51 (1), 75-80.

Fama, E. F. (1970). Efficient Capital Markets: A Review of Theory and Empirical Work. The Journal of Finance, 25 (2), 383-417.

Fama, E. F., & French, K. R. (1988). Permanent and Temporary Components of Stock Prices. The Journal of Political Economy, 96 (2), 246-273.

Guresen, E., Kayakutlu, G., & Daim, T. U. (2011). Using Artificial Neural Network Models in Stock Market Index Prediction. Expert Systems with Applications, 38 (8), 10389-10397.

Hassan, J. (2014). ARIMA and Regression Models for Prediction of Daily and Monthly Clearness Index. Renewable Energy, 68, 421-427.

Ho, S. L., & Xie, M. (1998). The Use of ARIMA Models for Reliability Forecasting and Analysis. 23rd International Conference on Computers and Industrial Engineering, 35 (1), 213-216.

Isenah, G. M., & Olubusoye, O. E. (2014). Forecasting Nigerian Stock Market Returns Using ARIMA and Artificial Neural Network Models. CBN Journal of Applied Statistics, 5 (2), 25-48.

Jiang, H. (2003). Forecasting Stock Market with Neural Networks. UMI Microform.

Keim, D. B., & Stambaugh, R. F. (1986). Predicting Returns in the Stock and Bond Markets. Journal of Financial Economics, 17, 357-390.

Kendall, M. G., & Hill, A. B. (1953). The Analysis of Economic Time-Series-Part I: Prices. Journal of the Royal Statistical Society. Series A (General), 116 (1), 11-34.

Kim, J. H., Shamsuddin, A., & Lim, K.-P. (2011). Stock Return Predictability and the Adaptive Markets Hypothesis: Evidence from Century-Long U.S. Data. Journal of Empirical Finance, 18 (5), 868-879.

Kuan, C.-M., & Liu, T. (1995). Forecasting Exchange Rates Using Feedforward and Recurrent Neural Networks. Journal of Applied Econometrics, 10 (4), 347-364.

Lawrence, R. (1997, December 12). Using Neural Networks to Forecast Stock Market Prices. Department of Computer Science.

Lee, A. (2010, June). A Stock Forecasting Framework for Building and Evaluating Stock Forecasting Models. Thesis of Faculty of Graduate Studies and Research.

Lewellen, J. (2004). Predicting Returns with Financial Ratios. Journal of Financial Economics, 74 (2), 209-235.

Lim, K.-P., Liew, V. K.-S., & Wong, H.-T. (2004). Weak-Form Efficient Market Hypothesis, Behavioral Finance and Episodic

Malkiel, B. G. (1973). A Random Walk Down Wall Street. New York, United States of America: W. W. Norton & Company, Inc.

Malkiel, B. G. (2003). The Efficient Market Hypothesis and Its Critics. Journal of Economic Perspectives, 17 (1), 59-82.

Mathew, O., Sola, A. F., Oladiran, B. H., & Amos, A. A. (2013). Prediction of Stock Price Using Autoregressive Integrated Moving Average Filter (ARIMA (P,D,Q)). Global Journal of Science Frontier Research Mathematics and Decision Sciences, 13 (8), 79-88.

McMillan, D. G. (2001). Nonlinear Predictability of Stock Market Returns: Evidence from Nonparametric and Threshold Models. International Review of Economics and Finance, 10 (4), 353-368.

Moreno, D., & Olmeda, I. (2007). Is the Predictability of Emerging and Developed Stock Markets Really Exploitable? European Journal of Operational Research, 182 (1), 436-454.

Napitupulu, T. A., & Wijaya, Y. B. (2013). Prediction of Stock Price Using Artificial Neural Network: A Case of Indonesia. Journal of Theoretical and Applied Information Technology, 54 (1), 104-109.

Nassir, A. M., Ariff, M., & Mohamad, S. (1993). Weak-Form Efficiency of the Kuala Lumpur Stock Exchange: An Application of Unit Root Analysis. Pertanika Journal of Social Sciences and Humanities, 1 (1), 57-62.

Patel, P. B., & Marwala, T. (2006). Forecasting Closing Price Indices Using Neural Networks. IEEE Conference on Systems, Man, and Cybernetics, 2351-2356.

Pesaran, M. H., & Timmermann, A. (1995). Predictability of Stock Returns: Robustness and Economic Significance. The Journal of Finance, 50 (4), 1201-1228.

Wang, J.-Z., Wang, J.-J., Zhang, Z.-G., & Guo, S.-P. (2011). Forecasting Stock Indices with Back Propagation Neural Network. Expert Systems with Applications, 38 (11), 14346-14355.

Yildirim, I., Ozsahin, S., & Akyuz, K. C. (2011). Prediction of the Financial Return of the Paper Sector with Artificial Neural Networks. BioResources, 6 (4), 4076-4091.

Zhang, G. P. (2003). Time Series Forecasting Using a Hybrid ARIMA and Neural Network Model. Neurocomputing, 50, 159-175.

THE CYCLICAL EXTRACTION (CE) AND THE CAUSALITY TEST (CT) IN BUSINESS CYCLE ANALYSES: DO THEY COMPLEMENT OR CONTRADICT ONE ANOTHER? *

Abd Latib Talib 1

Abstract

In business cycle analysis, the CE and the CT models are used during the process of selecting indicators. However, the results from the two models are sometimes inconsistent. The aim of this paper is to examine which model is best for business cycle analysis. The IPI is used as the benchmark indicator, while the money supply series M1, M2 and M3 are the tested indicators. Overall, the findings suggest that the outcomes of the CE and CT models are very different, especially in classifying the indicators as "leading", "coincident" or "lagging" series. The findings also suggest that the CE and CT models can be used simultaneously in business cycle analysis. The CT helps to identify the relationship between the time series, while the CE provides extra information on the "conformity" criterion as well as the magnitude of the predictive measure.

Introduction

The cyclical extraction (CE) and the causality test (CT) are among the models used in business cycle analysis, especially for Business Cycle Indicator (BCI) selection. However, both models have limitations. For example, the CT is unable to capture the property of "cyclical conformity" in the time series under study. Conformity is one of the main criteria in BCI selection (The Conference Board, 2000). It measures the tendency of an indicator to exhibit upswings and downswings in accordance with past business cycles (OECD, 2005). By comparison, the CE model is relatively time consuming compared with the CT model, because researchers need to decompose the time series step by step to eliminate its seasonal, trend and cyclical components using different types of statistical software and methodologies.

In analyzing the "cyclical conformity" of a time series, the CE model, sometimes also called the growth cycle (GC) model, is the appropriate one. The reason is that the turning points of the series can be closely compared with the turning points of general business cycles; thus, it is easier to classify whether the series under study is "leading", "coincident" or "lagging". However, the weakness of the CE model is that the decision is based only on judgement, without any statistical test, whereas the CT model adopts a statistical test in the process of classifying the series under study. The CT model, on the other hand, neglects the "cyclical conformity" criterion.

Both models have strengths and weaknesses. In principle, they are supposed to support one another, because the weakness of one model is offset by the strength of the other.

* The views and opinions expressed here reflect the author's point of view and not necessarily those of the Department of Statistics, Malaysia. 1 Department of Statistics, Malaysia

The question is whether these two models provide the same conclusion and support one another, or whether they sometimes clash, implying that their results are very different.

The aim of this paper is to evaluate whether the CE or the CT is the appropriate tool for BCI selection. For this purpose, this paper examines both models and compares their findings, with the money supply used as the candidate BCI and the Industrial Production Index (IPI) representing the general business cycle fluctuation.

The organisation of the paper is as follows. The following section reviews the literature related to the tools for business cycle analysis as well as the time series used for such analysis. It is followed by the methodologies used for the analysis. The findings and the concluding remarks are at the end of the paper.

2. The Literature Review

Financial development is one of the important components in determining economic growth. It is defined as a process that marks improvement in the quantity, quality and efficiency of financial intermediary services. This process involves the interaction of economic activities and is possibly associated with economic growth. There is no specific indicator to measure financial development. However, many researchers assume that the best measurement of financial development is via financial indicators. Some use the money stock, usually M2, as an indicator of financial development. Mohamad Yazis Ali Basah et al. (2007) and Eatzaz Ahmad & Aisha Malik (2009) used the ratio of credit by financial intermediaries to the private sector to GDP, and the ratio of commercial bank assets to commercial bank assets plus central bank assets, as financial development indicators.

Edison (2000) used the M2 multiplier, the ratio of domestic credit to nominal GDP, the real interest rate on deposits, the ratio of lending-to-deposit interest rates, excess real M1 balances, and commercial bank deposits as financial indicators. Abd. Latib Talib & Asmaddy Harris (2011) examined eighteen financial indicators, including the money supply, and found that the money supply is one of the best indicators with the potential to predict the Malaysian 1997/98 financial crisis in advance. Faiz Masnan et al. (2013) examined the relationship between inflation, money supply and economic growth in Malaysia, Singapore and Indonesia. They found that in Malaysia the money supply does not Granger-cause economic growth; economic growth does Granger-cause inflation in Malaysia, Indonesia and Singapore; and causality runs from economic growth to the money supply only in Malaysia. Chimobi (2010) examined the relationship and causality between trade openness, financial development and growth in Nigeria. His findings suggest that trade openness and financial development have a causal impact on economic growth.

This paper aims to examine the relationship between economic growth (the industrial production index) and the money supply M1, M2 and M3 using the CE and the CT models. The CE model, or the growth cycle model, is a decomposition method to extract the cyclical component of a time series from its long-term trend.

It is sometimes also called the deviation-from-trend model. The long-term trend is estimated using the Phase Average Trend (PAT) 2 and has later been improved, for example by using filtering models such as the Hodrick-Prescott (HP) filter, the Christiano-Fitzgerald (CF) filter or the Band-Pass (BP) filter. In comparing which is the best model, Nilsson and Gyomai (2011) found that the HP-filter and the CF-filter perform better than the PAT. The choice between the HP-filter and the CF-filter depends on the objective of the analysis: if the objective is to have early, clear and steady turning point signals, then the HP-filter is the choice, while the CF-filter is used if the analysis is sensitive to cumulative revisions. The CT model examines the relationship between the series in the model. Details of the two models are described in the methodology section.

3. The Methodologies

As mentioned earlier, this paper evaluates the performance of the two models for BCI selection by applying them to the same time series. The first method is a decomposition method in which the time series is decomposed into its main components: seasonal, trend-cycle and irregular. The trend-cycle components are extracted using the cyclical extraction method. The second method is the causality test method, which examines the relationship between the series in the model.

3.1 Method 1: Cyclical Extraction Method (CE)

The CE involves three main steps: the first step is to decompose the economic time series into its main components 3, the second step is the cyclical extraction, and the third step is to establish the reference cycle period and compare the turning points of the series with the general business cycle turning points.

a) Step 1: The Trend Estimation

Assume that the time series under study has already been seasonally adjusted. Then, the remaining components in the series are the trend-cycle and irregular components. Trend is defined as the upward or downward movement observed in the data over several decades. This component represents smooth, gradual variations over a long period of time. In the CE method it is assumed that the trend-cycle components can be isolated. The issue is how the trend is estimated.

There are a number of methods for estimating the trend of an economic time series. In growth cycle (GC) analysis, the long-term trend is estimated using the Phase Average Trend (PAT) 4 method. This method was widely used prior to the 1990s, for example by the Center for International Business Cycle Research (CIBCR) as well as the Organization for Economic Cooperation and Development (OECD). In recent developments, filtering models such as the Hodrick-Prescott (HP) filter or the Christiano-Fitzgerald (CF) filter are used as alternative models to estimate the long-term trend of the series.

2 Trend is estimated from a centred 75-month (25-quarter) moving average, with the beginning and end of the series being extrapolated. 3 An economic time series is assumed to have four main components: trend (T), cycle (C), seasonal (S) and irregular (I), combined in the multiplicative form Yt = Tt × Ct × St × It. 4 PAT consists of deviations from a centred 75-month (25-quarter) moving average and from its extrapolations at the beginning and end of the series.

Nilsson and Gyomai (2011) compared the three methods, the PAT, the HP-filter and the CF-filter, and found that the HP-filter and the CF-filter perform better than the PAT. The choice between the HP-filter and the CF-filter depends on the objective of the analysis: they propose that researchers use the HP-filter if early, clear and steady turning point signals are the priority, and the CF-filter if the analysis is sensitive to cumulative revisions.

The main objective of the CE model in this paper is to have clear and steady turning points. Thus, as recommended by Nilsson and Gyomai (2011), the HP-filter is the appropriate model to estimate the trend and cycle components. The HP filter decomposes a time series (yt) into a non-stationary trend (gt) and a stationary residual, or cyclical, component (ct) as follows:

yt = gt + ct, for t = 1, ..., T    (1a)

The growth component should be smooth, so the procedure recommended by Hodrick and Prescott (1997) is to minimize

Σt ct² + λ Σt [(gt − gt−1) − (gt−1 − gt−2)]²    (1b)

where the parameter λ is positive. If λ → 0, the trend approximates the actual series yt, and if λ → ∞ the trend becomes linear. According to Zarnowitz and Ozyildirim (2002), the estimate with larger λ is quite similar to the PAT, and increasing λ by 7.5-fold improves the new HP estimate of the "cyclical component". Although the recommended λ for monthly time series is 14,400, for the purpose of this paper the size of λ is increased by 7.5-fold, as recommended by Zarnowitz and Ozyildirim.

b) Step 2: The Cyclical Extraction

The cyclical component is estimated by dividing the series in equation (1c) by the trend component (T) estimated in equations (1a) and (1b):

Yt = Tt × Ct × It    (1c)

Thus, the remaining components of the respective time series are the cyclical (C) and true irregular (I') components, as in equation (1d):

Yt = Ct × It    (1d)

The true irregular (I'), by definition, should not be removed from the time series. However, for the purpose of analyzing the cyclical turning points, equation (1d) is smoothed by applying the HP filter with a very small lambda (λ). Thus, the smoothed cyclical component is as in equation (1e):

Zt = Ct × I't    (1e)

where I't is the smoothed true irregular component.

c) Step 3: Establish the Growth Cycle Reference Period (GCRP)

The purpose of the GCRP is to check the conformity criterion and also to establish the lead-lag analysis of each time series under study. The GCRP is estimated based on the cyclical component of the Industrial Production Index, which is derived from equations (1a) to (1e). The highest and lowest points in the cyclical component of the IPI respectively represent the "peak" and "trough" of the series, or the GCRP.

3.2 Method 2: Causality Test Model (CT)

The cyclical extraction method is relatively time consuming: researchers need to decompose the time series under study step by step to eliminate its seasonal, trend-cycle and irregular components using different types of statistical software and methodologies. As an alternative, researchers apply causality test models such as the unrestricted Vector Autoregressive (VAR) model, the restricted VAR or Vector Error Correction Model (VECM), or the Toda-Yamamoto model for business cycle indicator selection. 4 This paper applies a simple Granger causality test of two variables and their lags, which is here called the CT method. The reason for choosing this method is to compare its findings with the results of the CE model.

Subject to the stationarity test results, 5 which determine whether the VAR or the VECM is to be applied, the general causality model between the IPI and MS is as follows:

IPIt = α0 + Σi βi IPIt−i + Σj τj MSt−j + ut, with i = 1, ..., n and j = 1, ..., q    (2a)

MSt = φ0 + Σi δi MSt−i + Σj ϖj IPIt−j + vt, with i = 1, ..., n and j = 1, ..., q    (2b)

MS comprises three sets of time series, as explained in Section 2. Based on the estimated OLS coefficients for equations (2a) and (2b), four different hypotheses about the relationship between the IPI and MS can be formulated:

i) Unidirectional Granger causality from MS to IPI. In this case the money supply improves the prediction of economic growth (represented by the IPI) but not vice versa. Thus, Σj τj ≠ 0 and Σj ϖj = 0.

ii) Unidirectional Granger causality from IPI to MS. In this case economic growth improves the prediction of the money supply but not vice versa. Thus, Σj τj = 0 and Σj ϖj ≠ 0.

iii) Bidirectional (or feedback) causality. In this case Σj τj ≠ 0 and Σj ϖj ≠ 0, so economic growth improves the prediction of the money supply and vice versa.

iv) Independence between the IPI and MS. In this case there is no Granger causality in any direction; thus Σj τj = 0 and Σj ϖj = 0.

4 The choice of the appropriate model to be applied depends on the stationarity test of the time series. 5 The ADF test and the AIC are applied in this paper.

4. Findings

4.1 Findings from the CE Method

a) The Growth Cycle Reference Period

Chart 1 below presents the dates of the cyclical turning points of the IPI, estimated from equations (1a) to (1e). For the purpose of analysis, the cyclical "peaks" and "troughs" of the IPI are hereafter called the growth cycle reference period (GCRP).

Chart 1: Growth Cycle of the Industrial Production Index, Jan 1990 - Dec 2013

Table 1: Growth Cycle Reference Period, Jan 1990 to Dec 2013

It is observed that there were seven cyclical downturns from January 1990 to December 2013. The full cycle (from peak to peak or from trough to trough) was estimated at about thirty-eight months, as in Table 1. It was also observed that in recent years the duration of the full cycle has become much shorter compared with the 1990s. The significant drop was in the expansion period, which fell from 46 months in the 1990s to only 16 months and nine months respectively for the fifth and sixth cycles, lower than the average expansion (24 months).
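A compact sketch of Steps 1 to 3 follows. The series name ipi, the use of logs to make the multiplicative decomposition additive, and statsmodels' HP filter are illustrative assumptions; the paper's results were produced with other statistical packages:

    # Minimal sketch of the cyclical extraction in equations (1a)-(1e),
    # assuming `ipi` is a monthly, seasonally adjusted pandas Series.
    import numpy as np
    from statsmodels.tsa.filters.hp_filter import hpfilter

    lam = 14400 * 7.5                       # monthly lambda raised 7.5-fold
    cycle, trend = hpfilter(np.log(ipi), lamb=lam)   # (1a)-(1b) on logs

    ratio = np.exp(cycle)                   # detrended series: C * I'  (1d)

    # (1e): remove the irregular with a second HP pass, very small lambda
    irregular, z = hpfilter(ratio, lamb=1)  # z is the smoothed cyclical Z_t

    # Step 3: local maxima/minima of z mark the "peaks" and "troughs"
    peaks = z[(z > z.shift(1)) & (z > z.shift(-1))]
    troughs = z[(z < z.shift(1)) & (z < z.shift(-1))]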

b) Lead-lag Tables Analysis

Table 2: Lead and Lag Table of Money Supply M1, M2 and M3, Jan 1990-Dec 2013

Table 2 shows the lead-lag analysis of turning points for the money supply M1, M2 and M3 compared with the turning points of the GCRP. The dates for the GCRP were taken from Chart 1, while the turning point dates of M1, M2 and M3 were taken from Chart 2, Chart 3 and Chart 4 respectively, as in Appendix 1. From the perspective of "conformity", M3 is better than M1 or M2: M3 conforms with all turning points of the GCRP, while M1 matches 91.7 per cent and M2 only 66.7 per cent. In terms of the predictive measure, M1 is better than M2 or M3, since M1 leads the overall turning points of the GCRP by three months, compared with only two months for M2. M3, however, turns out to be a lagging series. Based on these analyses, M1 is recommended as a potential BCI because it fulfils both the "conformity" and the "leading" criteria.

4.2 Findings of the CT Model

a) The Stationarity Test Results

The results of the stationarity tests showed that all series were stationary at the first difference (Table 3). The findings recommend applying the Johansen cointegration test to examine the relationship between the IPI and the money supply.

Table 3: The Stationarity Test Results

| No | Series Name | Level: Constant | Level: Constant and Trend | First Difference: Constant | First Difference: Constant and Trend |
| 1 | Industrial Production Index | 0.7416 | 0.4275 | 0.0000*** (2) | 0.0000*** (2) |
| 2 | Money Supply M1 | 1.0000 | 1.0000 | 0.1794 | 0.0032** (12) |
| 3 | Money Supply M2 | 1.0000 | 0.9988 | 0.0570* (7) | 0.0013*** (9) |
| 4 | Money Supply M3 | 1.0000 | 1.0000 | 0.0385** (7) | 0.0000*** (3) |

Note: ***, ** and * denote significance at the 1%, 5% and 10% levels respectively. The number in brackets is the optimal lag based on the Akaike Information Criterion (AIC).
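The CT side of the analysis can be sketched as follows. The series names ipi and m1 are assumptions, the lag of 12 mirrors the optimal lag reported for M1 in Table 3, and statsmodels is used here purely for illustration:

    # Sketch: ADF tests (Table 3 analogue), Johansen cointegration and
    # pairwise Granger causality (Table 4 analogue) on IPI and M1.
    import pandas as pd
    from statsmodels.tsa.stattools import adfuller, grangercausalitytests
    from statsmodels.tsa.vector_ar.vecm import coint_johansen

    df = pd.concat([ipi, m1], axis=1, keys=["IPI", "M1"]).dropna()

    for col in df:
        print(col, "ADF p-value (level):", adfuller(df[col])[1])
        print(col, "ADF p-value (diff): ",
              adfuller(df[col].diff().dropna())[1])

    # Johansen test for a long-run relationship
    jo = coint_johansen(df, det_order=0, k_ar_diff=12)
    print("trace stats:", jo.lr1)
    print("95% critical values:", jo.cvt[:, 1])

    # Granger causality on first differences, up to 12 lags
    grangercausalitytests(df.diff().dropna()[["IPI", "M1"]], maxlag=12)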

b) The CT Findings

Table 4: The Causality Test Results of the IPI, M1, M2 and M3

Based on the Johansen cointegration test, it can be concluded that the IPI has a long-run relationship with the money supply M1, M2 and M3 (Table 4). However, the causality test showed that the IPI has no relationship with M2 and M3 in the short run. The findings suggest that the IPI and M1 have a bi-directional relationship, which implies that M1 is a candidate for BCI as a "coincident series", contradicting the findings obtained using the CE model.

4.3 Comparison of the CE and CT Findings

The findings of the CE and the CT models described in Sections 4.1 and 4.2 are summarised in Table 5.

Table 5: Summary of the CT and CE Findings for IPI, M1, M2

To simplify the findings: the CT model suggests that the money supply M1 is a "coincident series", while the CE model recommends it as a "leading series", a contradiction between them. The conformity of the turning points of M1 is 91.7 per cent, which also suggests M1 as the best potential component of the BCI.

The CE model recommends M2 as a leading series. However, the conformity criterion indicates that this indicator should not be selected as a candidate for the BCI, as M2 conforms with only 66.7 per cent of the GCRP. The CT model also rejects M2, as the result indicates that no causality exists between the IPI and M2.

The CE model recommends M3 as a lagging series. However, the CT model does not recommend M3 for selection as a candidate for the BCI; the CT model indicates that no causality exists between the IPI and M3.

The above findings suggest that the CE model is better than the CT model. However, both models can be used simultaneously, especially when examining a bulk of time series. As mentioned earlier, the CE model is time consuming, since it applies various statistical packages to decompose the time series step by step. If the CT model is applied prior to the CE model, some of the "unnecessary" time series can be discarded; thus, the CE model only examines the time series "selected" by the CT model.

Conclusion

The main objective of this paper is to evaluate the outcomes of the CE and CT models, especially in determining the appropriate indicators for the BCI. The IPI is used as the benchmark indicator, while the money supply M1, M2 and M3 are the tested indicators.

Overall, the findings suggest that the outcomes of the CE and CT models were very different, especially in classifying the indicators as "leading", "coincident" or "lagging" series. The CE model recommended M1 as "leading", while the CT suggests M1 as a "coincident" series. For M2, the CE recommended the series as "leading"; however, the conformity criterion does not suggest that M2 be selected, and the CT model also rejects M2, as the result indicates that no causality exists between the IPI and M2. For M3, the CE model recommends it as a "lagging" series, whereas the CT indicates that no causality exists between the IPI and M3.

The findings also suggest that the CE and CT models can be used simultaneously in business cycle analysis. The CT helps to identify the relationship between the time series, while the CE provides extra information on the "conformity" criterion as well as the magnitude of the predictive measure.

Appendix 1

Chart 2: Growth Cycle of Money Supply (M1), Jan 1990 - Dec 2013

Shaded areas are the growth cycle recessions

Chart 3: Growth Cycle of Money Supply (M2), Jan 1990 - Dec 2013

Shaded areas are the growth cycle recessions

Chart 4: Growth Cycle of Money Supply (M3), Jan 1990 - Dec 2013

Shaded areas are the growth cycle recessions

References

Abd. Latib Talib and Asmaddy Harris. 2011. Cyclical Development of Malaysia’s Financial Time Series Data, The 4th. Islamic Economic System Conference 2011, Pan Pacific KLIA, Malaysia.

Chimobi O.P. 2010. The Causal Relationship among Financial Development, Trade Openness and Economic Growth in Nigeria, International Journal of Economics and Finance, Vol. 2, No. 2.

Eatzaz Ahmad and Aisha Malik. 2009. Financial Sector Development and Economic Growth: An Empirical Analysis of Developing Countries, Journal of Economic Cooperation and Development, 30, 1.

Edison H. J. 2000. Do Indicators of Financial Crises Work? An Evaluation of an Early Warning System. International Finance Discussion Papers Number 675, Board of Governors of the Federal Reserve System

Faiz Misnan, Mohd Shahidan Shaari and Nor Ermawati Hussain. 2013. Relationship among Money Supply, Economic Growth and Inflation: Empirical Evidence from Three Southeast Asian Countries, International Journal of Information, Business and Management, Vol. 5, No.3.

Mohamad Yazis Ali Basah, Mazlynda Md Yusuf and Hisham Sabri. 2007. Financial Development and Economic Growth: Evidence from Malaysia, The Journal of Muamalat and Islamic Finance Research, Vol. 4 No. 1.

Nilsson R and G Gyomai, 2011. Cycle Extraction: A comparison of the Phase-Average Trend method, the Hodrick-Prescott and Christiano-Fitzgerald filters. OECD Statistics Working Papers 2011/4, OECD Publishing. http://www.oecd.org/dataoecd/32/13/41520591.pdf

OECD, 2005. Glossary of Statistical Terms, https://stats.oecd.org/glossary/detail.asp?ID=6697

The Conference Board. 2000, Business Cycles Indicators Handbook, New York

Zarnowitz, V. and A. Ozyildirim. 2002. Time Series Decomposition and Measurement of Business Cycles, Trends and Growth Cycles NBER Working Paper No. 8736.

INFLATION OF A TYPE II ERROR RATE IN THREE-ARM NON-INFERIORITY TRIALS

Nor Afzalina Azmee 1

Abstract

The aim of a non-inferiority trial is to demonstrate that the new experimental treatment is not worse than the reference treatment by more than a pre-defined margin. Assuming that the inclusion of a placebo arm is properly justified, the three-arm non-inferiority trial is termed the gold standard design and should be used whenever possible. This study focuses on the problem of sample size determination in three-arm non-inferiority trials, a crucial matter that needs to be addressed in the early stage of a clinical trial. The current two-stage procedure involved in the analysis of three-arm non-inferiority trials presents a problem known as the inflation of the type II error rate. In other words, the sample size obtained does not ensure enough power to reject the null hypothesis when it is false. This paper illustrates the problem via a simulation study and proposes an alternative solution using assurance.

1. Introduction

The aim of a non-inferiority trial is to demonstrate that the new experimental treatment is not worse than the reference treatment by more than a pre-defined margin. Although the fundamental idea of testing non-inferiority of a new treatment was given in the early 1980s, it was not until the early 2000s that considerable interest was shown among medical practitioners and academicians. Assay sensitivity is a problem widely recognized as unavoidable when aiming to establish non-inferiority of the new treatment with respect to the reference treatment (see Temple and Ellenberg (2000) and Pigeot et al. (2003)). For this reason, whenever and wherever possibly justified, the inclusion of an extra placebo arm is seen as a solution to the assay sensitivity problem. This type of clinical trial design is known as the three-arm non-inferiority trial and is termed the gold standard design.

Determining a sample size prior to executing a clinical trial is crucial and is given the utmost attention by practitioners (see Pigeot et al. (2003), Julious (2004), Brasher and Brant (2007)). In practice, the calculation of the sample size should be made transparent in the protocol, as this is partly the basis for granting the approval to run a clinical trial. The ethical committee will be concerned if the sample size is too large or too small. It is deemed unethical to enrol patients in a small trial, as the desired treatment effect may not be demonstrated and patients may be exposed to unnecessary risk. Similarly, a sample size that is excessively large is frowned upon, as the results from having additional patients are not beneficial and will slow down the process of marketing an effective drug to the public.

A required sample size can be determined either by employing the frequentist method or the Bayesian method, although the latter approach, more often than not, will be subjected to further debate among the members of the ethical committee. Nevertheless, the Bayesian approach offers a natural solution in quantifying the prior information about the unknown parameters. The approach highlighted in this paper adopts the Bayesian point of view. The method is known as assurance, defined as the unconditional probability that the trial will successfully reject the null hypothesis. This idea was described earlier in O'Hagan and Stevens (2001) and O'Hagan et al. (2005), but is not directly applicable in the setting of three-arm non-inferiority trials.

1 Faculty of Science and Mathematics, Universiti Pendidikan Sultan Idris

Recently, Azmee et al. (2013) developed the assurance formula in the three-arm non-inferiority trial, based on the ratio of means. Although the authors acknowledged the problem of the inflation of the type II error rate seen in the simulation results, a solution was not addressed. Therefore, the object of this paper is to illustrate a solution for tackling the inflation of the type II error rate, using assurance.

2. Inflation of Type II Error Rate

The common statistical procedure for three-arm non-inferiority trials, described in Pigeot et al. (2003), is a two-stage testing, which begins with establishing superiority of reference over placebo. If only the first stage is successful, the testing proceeds to the second stage, aiming to establish non-inferiority of the experimental treatment with respect to the reference treatment. Assuming that the outcome variables are normally distributed with common but unknown variance, and that higher values correspond to better efficacy, the hypothesis statements in the two stages are written as follows:

H0: µR − µP ≤ 0 against H1: µR − µP > 0    (1)

H0: µE − θµR − (1 − θ)µP ≤ 0 against H1: µE − θµR − (1 − θ)µP > 0    (2)

where µ represents the population mean; E, R and P denote the experimental, reference and placebo groups respectively; and θ is the non-inferiority margin. The value of θ is positive, usually ranging from 0.5 to 0.8. Considering a linear contrast, the null hypothesis in (2) can be rejected if the following test statistic, T, is shown to be greater than t(1−α, df), where t(1−α, df) refers to the (1 − α) quantile of the central t-distribution, with significance level α and degrees of freedom df = nE + nR + nP − 3:

T = (X̄E − θX̄R − (1 − θ)X̄P) / (σ̂ √(1/nE + θ²/nR + (1 − θ)²/nP))    (3)

Note that X̄ and n denote the sample mean and the sample size, respectively, and σ̂ is the unbiased estimator of the common but unknown σ. A rejection of the null hypothesis in (2) will imply that the experimental treatment is at least θ × 100 percent as good as the reference treatment.

The procedure, as noted earlier in Pigeot et al. (2003), suffers an inflation of the type II error rate as the ratio ρ = (µE − µP)/(µR − µP) increases. This is because the necessary sample size derived to achieve the main objective (which is non-inferiority) fails to maintain the type II error rate at the desired level. A frequentist solution provided by Pigeot et al. (2003), which is power adjustment for different values of ρ, can only be applied if the optimal allocation of sample size is used. This paper offers a Bayesian solution, which can be applied regardless of whether a balanced or unbalanced design is adopted by the practitioners. Details are given in the next section.

3. Assurance in Three-Arm Non-Inferiority Trials

This section demonstrates the implementation of assurance to find the required sample size, via simulation. As illustrated in O'Hagan et al. (2005) and Azmee et al. (2013), a Bayesian Clinical Trial Simulation (BCTS) is useful in avoiding complex integration and is applied in this work. Assume that the interest is to establish non-inferiority of the experimental treatment in a three-arm trial; then a sample size which tackles the inflation of the type II error rate can be determined by defining assurance as the probability of demonstrating both superiority of reference over placebo and non-inferiority of experimental relative to reference.

To illustrate the above idea, consider the following sampling distributions for the sample means, given the population parameters: X̄E ~ N(µE, σ²/nE), X̄R ~ N(µR, σ²/nR) and X̄P ~ N(µP, σ²/nP).

Suppose normal priors are assigned to the unknown parameters µE, µR and µP, with means mE, mR and mP and variances vE, vR and vP. Suppose σ² has a log-normal prior with mean a and variance b, that is, ln σ² ~ N(a, b). Thus, assurance via BCTS can be implemented by following the steps below.

i. Define the counters I = 0 and S = 0, where I corresponds to the number of repetitions and S corresponds to the number of successful events.

ii. Define the number of repetitions, J. For example, J = 1000.

iii. Define the non-inferiority margin, θ. In this example, θ = 0.8.

iv. Define the number of subjects in the placebo arm, nP, and the allocation of sample size in the ratio cE : cR : 1, where cE and cR are the sample sizes in the experimental and reference groups relative to the placebo group. In this example, the sample size allocation is 5:4:1 for the experimental, reference and placebo groups, respectively.

v. Sample µE, µR and µP from the prior distributions specified. In this particular example, the prior distributions are specified as µE ~ N(mE, vE), µR ~ N(mR, vR), µP ~ N(mP, vP).

vi. Sample σ² from the prior distribution specified, which is ln σ² ~ N(a, b).

vii. Using the results in (iv), (v) and (vi), sample the sample means X̄E, X̄R and X̄P from the sampling distributions given above.

viii. Using the results in (iv) and (vi), σ̂² can be obtained by sampling from the chi-square distribution with degrees of freedom nE + nR + nP − 3.

ix. Using the results in (iii), (vii) and (viii), calculate the T statistic given in Equation (3).

x. Using the results in (iv), (vii) and (viii), calculate the statistic U = (X̄R − X̄P) / (σ̂ √(1/nR + 1/nP)).

xi. If both statistics T and U are greater than t(1−α, df), where α = 0.025 and df = nE + nR + nP − 3, update S = S + 1.

xii. Update I = I + 1.

xiii. While I ≤ J, repeat steps (v) to (xii).

xiv. Calculate the assurance, A. Given the sample size in the placebo arm as nP, or a total sample size of cE nP + cR nP + nP, the assurance is A = S/J.

For illustration, consider plotting the assurance curves given the following specifications: mE = 4.2, mR = 3.8, mP = 3.0. Suppose the uncertainty regarding these values is represented with vE, vR and vP set to 0.04, although it is also possible to specify unequal values. This arrangement reflects the belief that the new treatment is thought to be better than the reference, with the ratio of the differences in means, ρ = (µE − µP)/(µR − µP), possibly centred around 1.5 with some slight variation. Furthermore, suppose the uncertainty regarding σ² can be represented by setting a = 0 and b = 0.0625.

Figure 1 shows the assurance curves based on Event 1 and Event 2. Event 1 is defined as "successfully establishing non-inferiority of the experimental treatment with respect to reference", whereas Event 2 is defined as "successfully establishing superiority of reference over placebo and establishing non-inferiority of the experimental treatment with respect to reference". Suppose the sample size in the placebo arm is taken to be 17, which reflects a total sample size of N = 170, where patients are randomly allocated across the experimental, reference and placebo groups in a ratio of 5:4:1. Based on Figure 1, that particular sample size will yield an assurance of 58 percent when considering Event 2. On the other hand, the same 58 percent assurance can already be achieved at a smaller sample size if one considers Event 1, which is optimistically misleading.
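The steps above translate almost line-for-line into code. The sketch below is a minimal Python rendition under the stated priors; the function and variable names are my own, not the author's, and it uses the T and U statistics from Section 2:

    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(0)

    def assurance(n_P, c_E=5, c_R=4, theta=0.8, J=10000,
                  m=(4.2, 3.8, 3.0), v=(0.04, 0.04, 0.04),
                  a=0.0, b=0.0625, alpha=0.025):
        """Monte Carlo assurance for Event 2 (both stages succeed)."""
        n_E, n_R = c_E * n_P, c_R * n_P
        df = n_E + n_R + n_P - 3
        t_crit = stats.t.ppf(1 - alpha, df)
        S = 0
        for _ in range(J):
            # v-vi: draw the unknown parameters from their priors
            mu_E, mu_R, mu_P = rng.normal(m, np.sqrt(v))
            sigma2 = np.exp(rng.normal(a, np.sqrt(b)))  # ln s2 ~ N(a, b)
            # vii-viii: simulate sample means and the variance estimate
            xE = rng.normal(mu_E, np.sqrt(sigma2 / n_E))
            xR = rng.normal(mu_R, np.sqrt(sigma2 / n_R))
            xP = rng.normal(mu_P, np.sqrt(sigma2 / n_P))
            s2 = sigma2 * rng.chisquare(df) / df
            # ix-x: non-inferiority contrast T and superiority statistic U
            T = (xE - theta * xR - (1 - theta) * xP) / np.sqrt(
                s2 * (1 / n_E + theta**2 / n_R + (1 - theta)**2 / n_P))
            U = (xR - xP) / np.sqrt(s2 * (1 / n_R + 1 / n_P))
            # xi: count the trial as successful if both stages reject
            if T > t_crit and U > t_crit:
                S += 1
        return S / J

    print(assurance(n_P=17))   # roughly 0.58 under these priors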

Figure 1: Comparison of assurance curves based on Event 1 and Event 2

Note that when the values vE, vR, vP and b are set to be very small, the assurance curve (see Event 1 in Figure 2) will match the power curve, due to the adoption of strong priors. Based on Figure 2, a power of 80 percent is achieved when the sample size in the placebo arm is set to 11, which brings a total sample size of 110, in the ratio of 5:4:1 across the experimental, reference and placebo groups. The sample size derived based on the main objective, however, is too small to power a superiority trial of reference against placebo, conducted in the first stage. This is the reason why the two-stage testing in three-arm non-inferiority trials suffers an inflation of the type II error rate.

The inflation of the type II error rate can be tackled by considering assurance as the probability of establishing superiority of reference against placebo and establishing non-inferiority of the experimental treatment (see Event 2 in Figure 2). For example, when the sample size in the placebo arm is 17, that determined sample size is able to ensure that the desired 80 percent power is achieved. The result is in line with Pigeot et al. (2003), but the approach taken is different, as theirs requires adjusting the type II error rate according to the ratio ρ, which works only for the optimal allocation of sample size.

4. Conclusions

To conclude, the choice of a sample size based on the assurance concept is a subjective matter. Unlike power, it is not possible to fix an assurance of, say, γ for all situations, as different priors will lead to different values of assurance. This can be seen clearly in the simple example provided in Section 3. However, this approach is attractive, as uncertainty is easier to represent using a prior distribution than a point estimate. In particular, this paper has demonstrated that the common practice of determining the sample size based on the main objective (i.e. non-inferiority) is not sufficient to detect the desired effect of non-inferiority of the experimental treatment with respect to reference. Since the analysis of a three-arm trial involves two-stage testing, the sample size calculation must take into account both objectives; that is, establishing superiority of reference against placebo and establishing non-inferiority of the experimental treatment.

Figure 2: Comparison of assurance curves based on Event 1 and Event 2, using strong priors.

References

Azmee, N.A., Mohamed, Z. and Ahmad, A. (2013) “Determination of the required sample size with assurance for three-arm non-inferiority trials”, Jurnal Teknologi, 63, 89-93.

Brasher, P.M.A and Brant, R.F. (2007) “Sample size calculations in randomized trials: common pitfalls”, Canadian Journal of Anesthesia, 54, 103-106.

Julious, S.A. (2004) “Sample sizes for clinical trials with normal data”, Statistics in Medicine, 23, 1921-1986.

O’Hagan, A. and Stevens, J.W. (2001) “Bayesian assessment of sample size for clinical trials of cost-effectiveness”, Medical Decision Making, 21, 219-230.

O’Hagan, A., Stevens, J.W. and Campbell, M.J. (2005) “Assurance in clinical trial design”, Biometrical Journal, 48, 559-564.

Pigeot, I., Schafer, J., Rohmel, J. and Hauschke, D. (2003) “Assessing non-inferiority of a new treatment in a three-arm clinical trial including a placebo”, Statistics in Medicine, 22, 883-899.

Temple, R. and Ellenberg, S.S. (2000) “Placebo-controlled trials and active-control trials in the evaluation of new treatments”, Annals of Internal Medicine, 133, 455-463.

GENERALIZED AUTOREGRESSIVE MOVING AVERAGE MODELS: AN APPLICATION TO GDP IN MALAYSIA

Thulasyammal Ramiah Pillai 1

Abstract

Gross Domestic Product (GDP) per capita is often used as an indicator of the standard of living in an economy. GDP per capita observed over the years can be modelled using time series models. A new class of Generalized Autoregressive Moving Average (GARMA) models, namely GARMA (1, 2; δ, 1), has been introduced in the time series literature to reveal some hidden features in time series. In this paper, the GARMA (1, 2; δ, 1) model and the ARMA (1, 1) model are fitted to the GDP growth data of Malaysia observed from 1955 to 2009. The parameter estimation methods considered include the Hannan-Rissanen Algorithm (HRA), Whittle Estimation (WE) and Maximum Likelihood Estimation (MLE). Point forecasts have also been produced, and the performance of GARMA (1, 2; δ, 1) and ARMA (1, 1) and the estimation methods are discussed.

1. Introduction

A time series is a set of well-defined data items collected at successive points at uniform time intervals (Prapanna et al. (2014)). The goal of time series analysis is to predict a series that contains a random component. If this random component is stationary, then we can develop powerful techniques to forecast its future values (Brockwell and Davis (2002)). Forecasting is important in fields like finance, meteorology and industry (Chen et al. (2014)). It is known that the modelling of time series with changing frequency components is important in many applications, especially for financial data. These types of time series cannot be identified using the existing standard time series techniques. However, one may propose the same classical model for all these cases, which may produce poor forecast values (Peiris et al. (2004)). Due to that, Peiris introduced a new class of Autoregressive Moving Average (ARMA) type models with indices, called Generalized ARMA (GARMA), to describe data with different frequency components (Peiris (2003)).

Firstly, Peiris introduced the Generalised Autoregressive (GAR(1)) model, followed by the Generalised Moving Average (GMA(1)) model (Peiris (2003), Peiris et al. (2004)). More recently, the GARMA (1, 1; 1, δ) model has been considered (Pillai et al. (2009)). In addition, Shitan and Peiris studied the behaviour of the GARMA (1, 1; δ, 1) process (Shitan and Peiris (2011)). The GARMA (1, 1; 1, δ) and GARMA (1, 1; δ, 1) models can be further generalised as GARMA (1, 1; δ1, δ2), and some properties of this model have been established (Pillai et al. (2012)). All these models have been shown to be useful in modelling time series data. It is interesting to note that the GARMA model can be further expanded to GARMA (1, 2; δ, 1).

These GARMA models give a better forecast compared to traditional ARMA models. This will be supported by the modelling of the Gross Domestic Product per capita of Malaysia. The Gross Domestic Product (GDP), the Gross National Product (GNP) and the Net National Income (NNI) are all indicators of a country's economic power. Nevertheless, in almost all countries, GDP per capita is used as a benchmark for measuring the nation's economic progress. GDP is the measure of the market value of all goods and services produced within a country during a specified period. GDP per capita is the share of individual members of the population in the annual GDP. It is calculated by dividing real or nominal GDP by the population for the year. GDP per capita is an indicator of the average standard of living of individual members of the population.

1 School of Computing and Information Technology, Taylor's University

GDP per capita is an indicator of the average standard of living of individual members of the population. An increase in the GDP per capita signifies national economic growth. The GDP per capita observed over the years can be modelled using time series models.

The objective of this paper is to compare the performance of the ARMA(1, 1) model and the GARMA(1, 2; δ, 1) model, besides comparing the three estimation methods. In Section 2, we illustrate the application of ARMA(1, 1) and GARMA(1, 2; δ, 1) modelling to the GDP data set. Finally, conclusions are drawn in Section 3.

2. Application of GARMA Modelling to the GDP Data Set

In this section, the ARMA(1, 1) and GARMA(1, 2; δ, 1) modelling are given.

2.1 Stationary Data

The GDP of Malaysia was obtained from the official website of the Department of Statistics Malaysia, National Accounts, consisting of yearly observations from 1955 to 2009. Figure 1 shows the time series plot of the GDP of Malaysia from 1955 to 2009, and it is quite apparent that it is a nonstationary time series. Many observed time series, however, are not stationary. In particular, most economic and business series exhibit time-changing levels and/or variances (Abraham and Ledolter (1983)).

Figure 1: GDP per capita of Malaysia from 1955 to 2009

In order to achieve stationarity, the data set was twice-differenced at lag 1 and mean corrected; a plot of this is depicted in Figure 2. The transformation can be written as

Yt = (1 − B)(1 − B)(Xt − 52.7925).

Then computer programs were written to model the GDP per capita of Malaysia using the ARMA(1, 1) and GARMA(1, 2; δ, 1) models.

2.2 ARMA(1, 1)

The objective of this section is to illustrate the modelling of the GDP data of Malaysia using ARMA(1, 1). The preliminary estimation of the parameters of this model was done using the Hannan-Rissanen Algorithm estimator. We fit a standard ARMA(1, 1) model to the differenced data and the following results were obtained. The Hannan-Rissanen Algorithm estimate for the ARMA(1, 1) model gives the fitted model

(1 − 0.2091B)Yt = (1 − 0.4938B)Zt, where Zt ~ WN(0, 884910).

On the other hand, the ARMA(1, 1) fitted model is

(1 − 0.2923B)Yt = (1 − 0.3488B)Zt, where Zt ~ WN(0, 66132),

by the Whittle estimation method, and

(1 − 0.9830B)Yt = (1 + 0.0103B)Zt, where Zt ~ WN(0, 884910),

by the Maximum Likelihood Estimation method.

Using the above fitted models, point forecasts for the GDP data set for the next six time periods are shown in Table 1. The point forecasts obtained from the MLE method give the best answer.
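The fitting pipeline described in Sections 2.1 and 2.2 can be reproduced in outline with standard software. Below is a minimal sketch in Python; the CSV file name and column names are hypothetical, and statsmodels estimates the ARMA coefficients by maximum likelihood only, so it parallels just the MLE column of Table 1.

    import pandas as pd
    from statsmodels.tsa.arima.model import ARIMA

    # Yearly GDP per capita of Malaysia, 1955-2009 (hypothetical file/columns)
    gdp = pd.read_csv("gdp_malaysia.csv", index_col="year")["gdp_per_capita"]

    # Mean-correct, then twice-difference at lag 1: Y_t = (1 - B)^2 (X_t - mean)
    y = (gdp - gdp.mean()).diff().diff().dropna()

    # Fit a standard ARMA(1, 1) model to the differenced series (no trend term)
    result = ARIMA(y, order=(1, 0, 1), trend="n").fit()
    print(result.params)             # AR/MA coefficients and innovation variance
    print(result.forecast(steps=6))  # point forecasts for the next six periods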

Table 1: Actual and forecast values for GDP data using the ARMA(1, 1) model

Step   Actual   Forecast (HRA)   Forecast (WE)   Forecast (MLE)
1      18531    4446             1214            16448
2      19996    4940             1334            18238
3      21563    5320             1446            19675
4      23544    5738             1560            21217
5      26639    6270             1700            23169
6      23826    7111             1907            26222

2.3 GARMA(1, 2; δ, 1)
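Written out, the general form of the model in this section, inferred from the fitted equations reported below (a sketch in the backshift notation used throughout this paper, with generic parameters φ, θ1, θ2 and index δ), is

    (1 − φB)^δ Yt = (1 − θ1 B − θ2 B²) Zt, where Zt ~ WN(0, σ²),

so the first-order autoregressive polynomial carries the fractional index δ while the moving average part is an ordinary second-order polynomial.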

Figure 2: GDP per capita of Malaysia, twice differenced at lag 1 and mean corrected

The GARMA(1, 2; δ, 1) model was fitted to the GDP data set that had been differenced and mean corrected. The Hannan-Rissanen Algorithm estimate for the GARMA(1, 2; δ, 1) model gives the fitted model

(1 − 0.9237B)^0.9237 Yt = (1 − 0.1443B + 0.9521B²)Zt, where Zt ~ WN(0, 1827225).

On the other hand, the GARMA(1, 2; δ, 1) fitted model by the Whittle estimation method is

(1 − 0.9920B)^0.2468 Yt = (1 − 0.3845B − 0.0604B²)Zt, where Zt ~ WN(0, 12943),

and by the Maximum Likelihood Estimation method

(1 − 0.8665B)^0.9982 Yt = (1 + 0.0012B + 0.0004B²)Zt, where Zt ~ WN(0, 1827225).

Using the above fitted models, point forecasts for the GDP data set for the next six time periods are shown in Table 2. It can be seen from Table 2 that the point forecasts from the HRA and MLE estimates give closer readings to the actual values than the WE estimates. The GARMA(1, 2; δ, 1) results are closer to the true values than those of the traditional ARMA(1, 1) model.

Table 2: Actual and forecast values for GDP data using the GARMA(1, 2; δ, 1) model

Step   Actual   Forecast (HRA)   Forecast (WE)   Forecast (MLE)
1      18531    16538            14342           16553
2      19996    18238            15368           18264
3      21563    19738            16502           19762
4      23544    21295            17741           21320
5      26639    23214            19162           23244
6      23826    26136            20993           26181
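To make the comparison between Tables 1 and 2 concrete, the forecast accuracy of each method can be summarised with, for example, the mean absolute percentage error (MAPE) over the six steps. A minimal sketch in Python using the values reported in the two tables:

    # Actual values and forecasts copied from Tables 1 and 2
    actual = [18531, 19996, 21563, 23544, 26639, 23826]
    forecasts = {
        "ARMA-HRA":  [4446, 4940, 5320, 5738, 6270, 7111],
        "ARMA-WE":   [1214, 1334, 1446, 1560, 1700, 1907],
        "ARMA-MLE":  [16448, 18238, 19675, 21217, 23169, 26222],
        "GARMA-HRA": [16538, 18238, 19738, 21295, 23214, 26136],
        "GARMA-WE":  [14342, 15368, 16502, 17741, 19162, 20993],
        "GARMA-MLE": [16553, 18264, 19762, 21320, 23244, 26181],
    }

    def mape(y, yhat):
        # Mean absolute percentage error, in percent
        return 100 * sum(abs(a - f) / a for a, f in zip(y, yhat)) / len(y)

    for name, f in forecasts.items():
        print(f"{name}: MAPE = {mape(actual, f):.1f}%")

Comparing method by method on these numbers, the GARMA(1, 2; δ, 1) forecasts have the lower MAPE in each column, consistent with the conclusions drawn below.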

3. Conclusion

The objective of our study was to compare the performance of the ARMA(1, 1) and GARMA(1, 2; δ, 1) models. In addition, we have evaluated the performance of the three estimators based on HRA, WE and MLE. It appears from this study that the MLE estimation procedure is relatively good for ARMA(1, 1). The HRA and MLE estimates give closer readings to the actual values than the WE estimates for the GARMA(1, 2; δ, 1) model. GARMA(1, 2; δ, 1) performs better than ARMA(1, 1) for all the estimation methods. We have successfully illustrated the superiority, usefulness and applicability of the GARMA(1, 2; δ, 1) model using the GDP data set.

References

[1] Prapanna, M., Labani, S., Saptarsi, G. (2014) Study of effectiveness of time series modeling (ARIMA) in forecasting stock prices, International Journal of Computer Science, Engineering and Applications (IJCSEA) 4(2):13-29.

[2] Brockwell, P. J., Davis, R. A. (2002) Introduction to Time Series and Forecasting, 2nd Edition. New York: Springer.

[3] Chen, S., Lan, X., Hu, Y., Liu, Q., Deng, Y. (2014) The time series forecasting: from the aspect of network, arXiv preprint arXiv:1403.1713.

[4] Brockwell, P. J., Davis, R. A. (1991) Time Series: Theory and Methods, New York: Springer-Verlag.

[5] Peiris, M. S. (2003) Improving the Quality of Forecasting using Generalized AR Models: An Application to Statistical Quality Control, Statistical Methods 5(2):156-171.

[6] Peiris, S., Allen, D., Thavaneswaran, A. (2004) An Introduction to Generalized Moving Average Models and Applications, Journal of Applied Statistical Science 13(3):251-267.

[7] Priestley, M. B. (1981) Spectral Analysis and Time Series, New York: Academic Press.

A SURVEY ON USERS' PERCEPTIONS OF ELECTRIC VEHICLES FOR MOBILITY IN A MALAYSIAN UNIVERSITY

Siti Azirah Asmal 1, Safiah Sidek, Sabrina Ahmad, Massila Kamalrudin, Aminah Ahmad and Mohamad Tahkim Salahudin

Abstract

In line with the increased awareness of a green environment and the efficient use of energy, there is a need to find an alternative to the use of conventional vehicles for mobility within a university. In this respect, a university in Malaysia has initiated a project that investigates Electric Vehicles (EV) as a possible means to deliver green vehicles for mobility system solutions. Subsequently, a survey on users' perceptions of electric vehicles has been conducted at the university to ensure the success of the project development. Specifically, the survey focused on four aspects related to the usage of electric vehicles, namely safety, recharging method, speed limitation and return on investment. The pattern of users' perceptions based on level of education and employment status was also investigated. A total of 482 staff and students from the university responded to either the online or the offline questionnaire. The data were analyzed statistically using open source software, i.e. RStudio. We found that users expressed their highest interest in an electric vehicle that does not stop for recharging during its operation, in comparison to their interest in electric vehicle safety, speed limitation and concern about return on investment. Furthermore, it was found that there is a strong relationship between the level of education and both the speed limitation and the concern about return on investment. The findings of the study provide valuable insights into early perceptions in developing and adopting electric vehicles for mobility in a university.

I. INTRODUCTION

In line with the increased awareness of a green environment and the efficient use of energy, there is a need to find an alternative to the use of conventional vehicles for mobility. There has been worldwide agreement that electric vehicles can replace conventional vehicles. Many scholars, namely [1][2][3][4], have claimed that the use of electric vehicles can address the environmental issues associated with climate change and air quality, high dependency on oil supply, and growth in travel demand. It also has the potential to make the transport system more sustainable [5]. However, these benefits will not be realized if users do not want to use the new technology.

Moving towards achieving a low carbon economy, Malaysia has taken the initiative to develop an infrastructure roadmap for the use of electric vehicles in Malaysia. In relation to this, the National Electric Mobility Blueprint has recently been launched [6]. The blueprint aims to position Malaysia as the electric mobility marketplace. The first aim is to fast-track Malaysia's transformation into a global electric marketplace, and the second aim is to propel Malaysia forward in both sustainable practices and economic development. While the use of electric vehicles is well supported by government policies, many have reported slow uptake of electric vehicles by the public [1][2][3][4][5]. Sang and Bekhet [1] argued that the

1 Universiti Teknikal Malaysia Melaka

public acceptance and diffusion of electric vehicles is relatively new and unknown in Malaysia, although they are considered a solution to replace conventional transportation.

Consistent with the initiatives of the Malaysian government to encourage the use of electric vehicles, the researchers have initiated a project to implement the use of electric vehicles for short distance mobility within the university. The university has several faculties which are quite distant from each other. At present, the university is using conventional bus transportation as the means of short distance mobility within the university community. Thus, this paper aims to report a survey study that investigates user acceptance of the use of electric vehicles at the university. This survey is necessary, as the acceptance of users is an important prerequisite for determining the usability and practicality of a successful implementation.

II. LITERATURE REVIEW

Electric vehicles are a high technology innovation and involve high cost investment. As such, it is crucial to investigate their acceptance to ensure successful implementation. The potential of electric vehicles has been studied from technical, economic and environmental points of view [4]. Further, much of the research related to electric vehicles is contextualized within advanced automotive industries such as the UK, Germany and the USA. However, there have been very limited studies of electric vehicles in Malaysia.

A study conducted by Sang and Bekhet [1] determines the key predictors affecting the acceptance of electric vehicles in Malaysia. Empirical data were collected using a survey questionnaire distributed to 1000 private vehicle drivers in Malaysia. The results demonstrated that electric vehicle acceptance in Malaysia can be explained as being significantly related to social influences, performance attributes, financial benefits, environmental concerns, demographics, infrastructure readiness and government interventions.

Aiming to predict the potential buyers of electric vehicles, Plotz et al. [2] investigated the characteristics of people who use electric vehicles. Referring to them as the early adopters, they found that the most likely group of private electric vehicle buyers in Germany are middle-aged men with technical professions living in rural or suburban multi-person households. Further, higher socio-economic status allows them to purchase electric vehicles. They also found that inhabitants of major cities are less likely to buy electric vehicles, since they form a small group of car owners.

Focusing on identifying the potential buyers of electric cars, [3] administered a survey to 1000 US residents to understand the factors influencing the potential for market penetration of plug-in hybrid electric cars. They found that financial and battery-related concerns remain major obstacles to widespread PHEV market penetration.

To gain an understanding of the key tools and strategies that might enable the successful introduction of new technologies and innovation, Steinhilber et al. [7] explored the key barriers to electric vehicles encountered in two countries (the UK and Germany) where the automobile industry has been historically significant. The study evaluates stakeholders' opinions on relevant regulation, infrastructure investment, R&D incentives and consumer incentives. Several barriers that inhibit larger market penetration were identified.

To understand the relationship between knowledge and attitudes in relation to user acceptance of electric vehicles, Tarigan et al. [8] conducted a survey of 1000 electric vehicle users in Germany. The study found that age, education duration and gender influence the likelihood of accepting electric vehicles. The study also indicated that the higher the knowledge of the sustainable environment, the stronger the attitude of support for electric vehicles.

Wu et al. [9] investigated customers' perceptions that electric vehicles are more expensive than conventional cars due to their higher capital cost, focusing on the total cost of ownership to evaluate the complete cost to customers across individual vehicle classes. The study found that the comparative cost efficiency of electric cars increases with the consumer's driving distance. Further, they argued that total cost of ownership does not reflect how consumers

make their purchasing decisions for electric cars. They suggested further investigation of the improvement of charging infrastructure and decreasing battery cost.

Ziefle et al. [4] aimed to identify the factors influencing the acceptance of electric vehicles. Using a questionnaire survey, they found that the conventional car is still perceived as much more comfortable and receives high trustfulness in comparison to electric cars. They argue that age and gender differences seem to produce different opinions. Female users, but also older persons, show a higher level of acceptance, which might be due to their higher environmental consciousness in contrast to male and younger participants. Further, different domain knowledge did not show a large influence on the level of acceptance.

Most of the studies related to user acceptance tend to focus on users of electric vehicles in established electric vehicle markets such as the UK, the USA and Germany. Only one study was found that investigates the acceptance of users of electric vehicles in Malaysia. Perhaps this is due to the relatively recent introduction of electric vehicles in Malaysia. Further, most of these studies tend to focus on the use of electric cars purchased by the users. So far, studies of the use of electric vehicles for short distance mobility within a university compound are non-existent. However, the studies discussed above can be used as references to investigate users' acceptance of electric vehicles in a university. Considering the high cost of investment in the adoption of electric vehicles, it is necessary to identify user acceptance of electric vehicles to ensure that the project is successfully implemented.

III. THE RESEARCH METHOD

We have conducted a survey to investigate users' perceptions towards the usage of electric vehicles in a university. To do this, Universiti Teknikal Malaysia Melaka (UTeM) was chosen as the sample for the study. In order to derive the results of the survey, we adopted the data science process [10], as shown in Figure 1.

Figure 1: Data Science Process

A. Sampling

The sampling frame of this study is the UTeM main campus community, who responded to the survey conducted through both online and offline questionnaires. The targeted population are students and staff, inclusive of administration, academic staff and university higher management. The population covers both genders almost evenly, with ages ranging from 17 to more than 60 years old and with different education levels.

The respondents total 482 staff and students. They provided different views regarding the need for electric vehicles as transportation on campus. It was found that the purpose and frequency of commuting on campus vary based on employment status.

B. Data pre-processing

The data received through the online and offline channels are recognised as raw data, as the data have not yet been analysed. Therefore, data pre-processing plays an important role in processing and recoding the received data for better data analysis. It also involves a complex data cleaning process to deal with missing data and with answers to the open-ended questions.

C. The Analysis

The analysis was divided into two sections. The first section is an exploratory analysis that captured the respondents' demographic information, and the second section captured the respondents' preferences and perceptions.
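As an illustration of the pre-processing stage, the merging, cleaning and recoding just described might look as follows. This is a sketch in Python; the study itself used RStudio, and all file and column names here are hypothetical.

    import pandas as pd

    # Combine the online and offline questionnaire responses
    online = pd.read_csv("responses_online.csv")
    offline = pd.read_csv("responses_offline.csv")
    raw = pd.concat([online, offline], ignore_index=True)

    # Drop records with missing answers on the core perception items
    core = ["safer", "recharge", "speed_limit", "roi"]
    clean = raw.dropna(subset=core).copy()

    # Recode the number of suggested EV destinations into the two
    # categories used later in Table II: exactly one place vs. more than one
    clean["ev_destination"] = clean["n_destinations"].apply(
        lambda n: "1" if n == 1 else ">1"
    )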

1) Respondents' Demographic Information

The respondents' demographic information focuses on determining the range of respondents through three factors. The first factor is age, followed by gender, to understand whether males and females have differing preferences for commuting by EV on campus. The third factor is employment status, to recognize whether employment status influences the willingness of the campus community to commute by EV.

2) Respondents' Preferences and Perceptions

The respondents' preference information mainly captures the campus community's willingness to commute by EV if the facility is available on campus. The follow-up questions were designed to further understand the preferred type of EV that they would ride, the destinations to go to, and the preferred regularity of the EV travelling between destination points on campus. In addition, other user perception questions were also asked: whether the respondents think that the EV is safer than normal cars, whether recharging and speed are issues, and whether the EV is a good return on investment.

D. Data Visualization

In this phase, visualization of the analysis was used to describe the data in order to display information and gain knowledge. There are various graphic tools for displaying information. This study uses the series of graphic tools from the ggplot2 package in RStudio to generate versatile graphs that show the way the data are distributed, the frequency of the variables and possible relationships between variables.

IV. RESULTS

This section presents the results of the conducted analysis.

A. Demographic

Table I shows the demographic variables of the respondents surveyed. Most of them are aged between 17 and 34 (82.3%), followed by those aged between 35 and 52 (15.2%) and those aged above 53 years (2.5%). The demographics also show an approximately equal proportion of respondents by gender. The majority of respondents are students (72.5%), with staff making up only 27.5%.

Table I. Demographic profile of respondents

Factor               Category   Percentage
Age                  17-34      82.3
                     35-52      15.2
                     53+         2.5
Gender               Female     48.2
                     Male       51.8
Employment Status    Staff      27.5
                     Students   72.5

From the demographic information in Table I, two demographic variables have been identified as the condition factors, i.e. Gender and Employment Status. The condition factors are used to investigate relationships with the user perception variables on EV technology.

B. Respondents' Preference

The analysis recodes the EV Destination according to the number of places. Although respondents were allowed to suggest the location, the preference was that EVs should be able to travel to more than one destination point, as shown in Table II. Therefore, it can be concluded that most of the respondents would like an EV that can navigate to more than one place in the university area.

Table II. EV Destination point number(s)

EV Destination   Percentage
1                10.2
>1               89.8

Meanwhile, Figure 2 shows that nearly half of the respondents choose and suggest that the EV should travel back to the same point in less than 1 hour.

Figure 2. Travel Time Preference

Next, Figure 3 illustrates the respondents' preferred type of electric vehicle for the university. It shows that most of the respondents prefer to use a tram (61.9%), followed by a buggy (35.9%), with a minority of 2.2% preferring other types of EV.

Figure 3. Preferred type of EV

Figure 4 reveals the frequencies of respondents' observed perceptions of some characteristics of an EV, such as safety, recharge capability, speed limitation and return on investment.

Figure 4. User's Perception
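Continuing the hypothetical pre-processing sketch above, the percentage breakdowns reported in Tables I and II are plain frequency summaries of the cleaned data (column names hypothetical):

    # Percentage distribution of each demographic factor (cf. Table I)
    for factor in ["age_group", "gender", "employment_status"]:
        print(clean[factor].value_counts(normalize=True).mul(100).round(1))

    # Share preferring one vs. more than one destination point (cf. Table II)
    print(clean["ev_destination"].value_counts(normalize=True).mul(100).round(1))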

C. Chi Square

This study also investigates the influence of the observed factors, i.e. Gender and Employment Status, on the user perception variables. Table III shows the relationships of the user perceptions with gender and employment status.

Table III. Chi Square Test

                     Gender     Employment Status
Safer        Chi^2   7.52508    6.96907
             d.f.    2          2
             p       0.02322    0.03067
Recharge     Chi^2   4.47585    2.73247
             d.f.    1          1
             p       0.034377   0.09832
Speed Limit  Chi^2   1.62209    4.06950
             d.f.    1          1
             p       0.000651   0.04366
ROI          Chi^2   10.88549   0.69863
             d.f.    2          2

The results in Table III show the Pearson chi-square statistics, which indicate that there is no significant relationship between employment status and either the preference for an EV that never stops for recharging or the perception that electric cars are a good return on investment. As shown in Table III, the Pearson chi-square results indicate that there is a significant relationship between employment status and the perception of whether EVs are safer than normal cars, χ²(2) = 6.9691, p < 0.05, and the perception of whether speed is the major limitation of an EV, χ²(1) = 4.0695, p < 0.05. Moreover, there is a significant relationship between gender and the perception of whether EVs are safer than normal cars, χ²(2) = 7.5251, p < 0.05; the preference for an EV that never stops for recharging, χ²(1) = 4.4759, p < 0.05; the perception of whether speed is the major limitation of an EV, χ²(1) = 1.6621, p < 0.05; and the perception of whether an EV is a good return on investment, χ²(2) = 10.8855, p < 0.05.
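The figures in Table III are Pearson chi-square tests of independence on the cross-tabulation of each perception item against gender or employment status. A minimal sketch of the mechanics in Python (the cell counts below are hypothetical, not the study's data):

    from scipy.stats import chi2_contingency

    # Hypothetical 2x3 cross-tabulation: gender (rows) against the
    # "EVs are safer than normal cars" item (yes / no / don't know)
    observed = [
        [60, 95, 77],   # female
        [90, 80, 80],   # male
    ]

    chi2, p, dof, expected = chi2_contingency(observed)
    print(f"chi^2({dof}) = {chi2:.4f}, p = {p:.5f}")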


Figure 5. User's Perception

Figures 5 and 6 show mosaic plots, a graphical tool in RStudio that shows the data distribution based on the relationship between variables. Interestingly, Figure 5 reveals that the majority of staff feel that EVs are safer than normal cars, compared to students. However, the result also shows that a significant proportion of university staff are not aware of the safety of EVs compared to normal cars. Figure 5 also reveals the perception of whether speed is the major limitation. Most of the students believe that speed is the major limitation of EVs. Besides, this mosaic plot also shows significantly equal proportions of staff who believe and who doubt that speed is a major limitation of EVs.

Comparing gender and user perceptions, Figure 6 shows that a different analysis emerges. Gender shows significant proportions for some expectations on each observed perception variable. From Figure 6, there is a significant proportion of both female and male respondents who do not know whether EVs are safer than normal cars. Even though most of both female and male respondents acknowledge that EVs will at some point stop for recharging, this is not significant. When respondents were asked about speed as the major limitation of EVs, female university members significantly believe that speed is the major limitation of EVs compared to male members. Again, there is a significant proportion of both female and male university members who do not know whether electric cars are a good return on investment.

Figure 6. User's Perception

V. CONCLUSION

As the implementation of EVs is complex and high cost, it is necessary to conduct a holistic data analysis of user preferences and perceptions of the technology. This study employs the data science process to derive information and gain knowledge on the perceptions of university members towards the implementation of EVs within the university. The findings from the data analysis lead to the preferred EV types for the university. Moreover, the study also confirms the preference for a sequential route of EVs to accommodate the needs of university members travelling to many destination nodes. In addition, the findings recommend that much effort is still required to educate university members on EV technology. The findings also provide original new knowledge, through statistical analysis, to help manufacturers develop EVs that can be implemented and accepted by the university community. Thus, this initial study indirectly portrays the relevance of EVs for the education industry, as well as supporting the government's aim of empowering green technology for better living. In future, we would like to further study the acceptance of EVs in other universities in Malaysia and finally develop a sustainable model or guideline that could be used by manufacturers to promote EVs in the education industry.

ACKNOWLEDGMENT

The authors would like to acknowledge the Faculty of Information and Communication Technology, Universiti Teknikal Malaysia Melaka, for its contribution to this study.

REFERENCES

[1] Sang, Y-N. & Bekhet, H.A. (2015), Modelling electric vehicle usage intentions: an empirical study in Malaysia, Journal of Cleaner Production, Vol. 92, pp.75-83.

[2] Plotz, P., Schneider, U., Globisch, J. & Dutschke, E. (2014). Who will buy electric cars? Identifying early adopters in Germany, Transportation Research Part A: Policy and Practice, Vol. 67, pp. 96 – 109.

[3] Krupa, J.S., Rizzo, D.M., Eppstein, M.J., Lanute, D.B., Gaalema, D.E., Lakkaraju, K., Warrender, C.E. (2014). Analysis of consumer survey on plug-in hybrid electric vehicles, Transportation Research Part A: Policy and Practice, Vol. 64, pp. 14-31.

[4] Ziefle, M., Beul-Leusmann, S., Kasugi, K., and Schwalm, M. (2014). Public perception and acceptance of electric vehicles: Exploring users' perceived benefits and drawbacks, In Marcus, A (ed.) DUXU, Part III, LNCS 8519, pp. 628-639.

[5] Globisch, J., Schneider, U., and Dutschke, E. (2013). Acceptance of electric vehicles by commercial users in the electric mobility pilot regions in Germany, ECEE Summer Study Proceedings, pp. 973-983.

[6] Azrin, M. (2015). National Electric Mobility Blueprint, retrieved http://transportandclimatechange. org/wp-content/uploads/2015/05/05-Transport-NAMA-Development-Malaysia-AZRIN.pdf

[7] Steinhilber, S., Wells, P. & Thankappan, S. (2013). Socio-technical inertia: understanding the barriers to electric vehicles, Energy Policy, 531-539.

[8] Egbue, O. & Long, S. (2012). Barriers to widespread adoption of electric vehicles: An analysis of consumer attitudes and perceptions, Energy policy, Vol. 48, pp. 717 -729.

[9] Wu, G., Inderbitzin, A. & Bening, C. (2015). Total cost of ownership of electric vehicles compared to conventional vehicles: A probabilistic analysis and projection across market segment, Energy policy, Vol. 80, pp. 196- 214.

[10] Schutt, R., & O'Neil, C. (2014). Doing Data Science, Sebastopol, CA: O'Reilly.

ACADEMIC ACHIEVEMENT ON STUDENT MOTIVATION: LATENT CLASS ANALYSIS ACROSS GENDER GROUP

Asyraf Afthanorhan 1, Zainudin Awang 2, M.A.M. Asri 3, Ahmad Nazim Aimran 4, Hidayah Razali 5

Abstract

The study of 'Academic Achievement on Student Motivation' overcomes limitations in previous research by including simultaneous consideration of the mediating effect, the longitudinal relationship and the gender factor. Western researchers have found that students have no direction in life after graduation. However, it remains to be seen whether this finding occurs among students in Malaysia, since no research has been carried out regarding this matter. A longitudinal study was conducted to assess students' motivation at various time points. Final year students were targeted to answer a questionnaire that had been approved by experts in this particular field. The aims of this study are a) to identify the best fitting unconditional latent growth model for student motivation across gender groups, and b) to develop a conditional model to test the potential direct effect of academic achievement on student motivation. The results indicate that a) student motivation was indeed growing linearly every year across gender, and b) there is a significant direct effect of academic achievement on the intercept of student motivation but a non-significant effect on the slope of student motivation. Implications and recommendations for further research are discussed.

1.0 Introduction

The Student Motivation construct is not new in the academic field; yet, this construct is always considered as a research outcome. For instance, much Western research on students' motivation is central to psychological and educational research (Pintrich, 2003). Basically, students' motivation research focuses more on the primary and secondary school levels, and thus the learning and teaching context is always prioritized (Murphy & Alexander, 2000). The true nature of students' motivation is very complex and needs in-depth investigation to be comprehended. Currently, four theories are prominent in educational psychology: self-efficacy theory, attribution theory, self-worth theory and achievement goal theory (Seifert, 2004). Among these established theories, attributional theory is perceived as better able to link to students' motivation in Malaysia. However, the discussion of students' motivation is not emphasized by researchers in Malaysia; thus we attempt to extend the existing attributional theory. To the best of our knowledge, no research investigating students' motivation has been conducted by either government or private institutions in Malaysia so far. In fact, Western researchers have noted that the study of students' motivation as a research outcome is always debated.

Therefore, the aim of this paper is to identify the best fitting unconditional latent growth curve model for students' motivation across gender. The method used was chosen for its advantage in estimating the growth rate of students' motivation with the passage of time. Further, it will enable us to identify which period should be prioritized for improvement. In doing so, Structural Equation Modeling (SEM), the second generation statistical analysis, was selected. However, the procedure for handling this method is not as convenient as first generation statistical analysis methods such

1,2 Faculty of Economics and Management Sciences, Universiti Sultan Zainal Abidin
3,4 School of Informatics and Applied Mathematics, Universiti Malaysia Terengganu
5 Department of Computer and Mathematical Science, Universiti Teknologi MARA, Seremban, Malaysia

as repeated measures One-Way Analysis of Variance (ANOVA), because SEM requires normal data with a sufficient sample size and a minimum of three time points for a longitudinal model. Hence, all requirements were first ensured in order to use SEM. Later on, the causal effects between the exogenous and endogenous constructs were tested based on the hypotheses proposed.

2. Hypotheses Development

Pintrich and Schunk (2002) put forth that motivational theories focus on the processes that explain goal-directed activity. In educational research, motivation is most often used to explain students' activity choice and performance in learning activities. Hence, motivation is frequently used as a measure in the education system (Roeser & Eccles, 1998). Motivation theory enables us to identify the goals individuals pursue in achievement situations (Meece, Anderman, & Anderman, 2006). However, the study of students' motivation may not be sufficient to explain the actual situation in education, especially from a gender perspective. To date, numerous studies involving the gender perspective have revealed it to be important in estimating students' motivation, so that accurate estimates can be disclosed (Meece, Glienke, & Burg, 2006). It has also been revealed that student motivation can fluctuate with the passage of time (Gottfried et al., 2001). Basically, academic achievement is determined when related to achievement goal theory, which has emerged as one of the prominent theories of achievement motivation across many ages (Meece, Anderman, & Anderman, 2006). As this research intends to explore students' motivation, academic achievement is pre-determined. It is believed that achievement is the most important factor in assessing students' motivation in learning activities. To bolster the evidence, McCombs & Whisler (1997) and Pintrich (2003) discussed similar discoveries. Considering this factor, it is not surprising that institutional support may be one of the influential factors. Studies focusing on achievement have also emphasized the importance of establishing supportive educators. There is little information concerning the role of educators' support for students in institutions. An increase in educators' support will intentionally increase achievement performance.

Parental involvement has been defined in numerous ways and has recently been suggested to be a second order construct that manages to explain the attribution of parental involvement from different perspectives. For example, some researchers have proposed specific dimensions of parental involvement (Grolnick & Slowiaczek, 1994; Keith et al., 1993; Sui-Chu & Willms, 1996; Hong & Ho, 2005; Fan & Chen, 2001). Yet, this study's interest is not in parental involvement solely; instead, we intend to highlight the impact of students' motivation in government and private institutions. So far, most of the established theories center on parental involvement being tied only to primary and secondary school students. Therefore, there is a need to review parental involvement for higher institution level students in order to understand the true nature of this specific construct, and whether it is a worthy or unworthy construct in determining the relationship with students' motivation. Indeed, parental involvement may depend on the regional culture itself in influencing students' motivation at the higher institution level. For instance, the culture in Malaysia and other countries might be slightly or far too different in terms of learning activities. Hence, results revealed in other regions may not be applicable to Malaysian society. Finally, it is important to develop the hypotheses based on the review discussed earlier:

Hypothesis 1: Institutional Support has a positive effect on Academic Achievement
Hypothesis 2: Parental Involvement has a positive effect on Academic Achievement
Hypothesis 3: Academic Achievement has a positive effect on the Intercept of Student Motivation
Hypothesis 4: Academic Achievement has a positive effect on the Slope of Student Motivation

3. Measures

The study used a questionnaire (Likert scale measures) that includes items on parental involvement, the academic achievement mediating construct, institutional support and students' motivation.

Each construct consists of 10 items. The content of each item was validated by experts experienced in psychological study. The minimum sample size for this study is 400, as recommended by Hair et al. (2006). They stated that the sample size should range between 5 and 10 times the number of variables (40 items x 10 = 400 samples). The questionnaire was distributed to seven (7) institutions in Terengganu state. Later on, Structural Equation Modeling (SEM) was performed.

Structural Equation Modeling (SEM) is one of the second generation statistical analysis methods, with the ability to perform Confirmatory Factor Analysis (CFA) and path analysis with multiple variables simultaneously (Afthanorhan & Ahmad, 2014). It also has the advantage of handling the unconditional Latent Growth Curve model, replacing the outdated methods. Subsequently, the conditional model is conducted once the best fitting unconditional model has been identified based on global fitness indexes. Global fitness should be achieved in the initial phase so that the estimates obtained are trustworthy.

4. UNCONDITIONAL MODEL

Firstly, we developed an unconditional latent growth curve model for students' motivation across gender (Male and Female), which is compatible with the first research objective. To do so, model fit should be established to evaluate the parameter estimates, standard errors and measures of explained variance (Antonakis et al., 2010; Hayduk et al., 2007; McIntosh, 2007; McIntosh, Edwards and Antonakis, 2014). A good fitting model reflects good model quality for the subsequent analysis. The analysis must be executed separately for each group so that the estimates and fitness indexes for each model can be reported. Using the well-known Maximum Likelihood Estimator (MLE), we found that the fitness indexes for the unconditional model are acceptable across gender: Chi-square normalized by degrees of freedom (< 3.0); CFI, TLI and IFI (ranging from 0.931 to 0.976); and RMSEA (ranging from 0.075 to 0.115). To bolster the explanation of fitness, Zainudin (2015) and Holmes-Smith et al. (2006) contend that researchers can choose any index to assess the measurement model.

In terms of the parameter estimates exhibited in Figure 1, male (0.00, 0.18, 0.34, 0.49) and female (0.00, 0.17, 0.34, 0.49) motivation was indeed growing linearly during the process of learning across gender, which supports the research question. Once the unconditional model is validated, the conditional model is performed as the final stage of the analysis, based on the research objective.
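For reference, the linear unconditional latent growth curve specification, sketched here in standard notation (not quoted from the paper), relates the motivation score y_it of student i at wave t = 1, ..., 4 to two latent growth factors:

    y_it  = η_0i + λ_t · η_1i + ε_it
    η_0i  = μ_0 + ζ_0i    (latent intercept: initial motivation level)
    η_1i  = μ_1 + ζ_1i    (latent slope: growth rate over the four years)

Here the λ_t are fixed time scores (e.g., 0, 1, 2, 3 for a linear trajectory), and ε_it, ζ_0i and ζ_1i are normally distributed residuals. The estimates quoted above (0.00, 0.18, 0.34, 0.49 for males) can be read as the estimated trajectory values across the four waves.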

Figure 1: Unconditional Model

Conditional Model

Figure 2: Unstandardized Conditional Model

Figure 3: Standardized Conditional Model
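In the same notation, the conditional model depicted in Figures 2 and 3 adds the exogenous constructs as predictors of the growth factors. The following is a sketch consistent with Hypotheses 1-4; the symbols are illustrative, not taken from the paper:

    AA_i  = β_1 · PI_i + β_2 · IS_i + δ_i    (H1, H2: Parental Involvement and
                                              Institutional Support predict
                                              Academic Achievement)
    η_0i  = μ_0 + γ_0 · AA_i + ζ_0i          (H3: effect on the intercept)
    η_1i  = μ_1 + γ_1 · AA_i + ζ_1i          (H4: effect on the slope)

Under this reading, the path estimates reported below correspond to β_1 = 0.226, β_2 = 0.102, γ_0 = 4.467 (significant) and γ_1 = 0.264 (non-significant).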

After the best fitting unconditional latent growth curve model for student motivation is identified, we developed a conditional model to test the potential relationships of Academic Achievement, Parental Involvement and Institutional Support with Student Motivation. The conditional model is a model extended from the unconditional latent growth model. Therefore, the best fitting unconditional model is necessary to ensure that the fitness level of student motivation is achieved, and thus it has the potential to approximate accurate estimates for decision making. In doing so, there are three main constructs, Academic Achievement, Parental Involvement and Institutional Support, which are regarded as exogenous constructs influencing Student Motivation (the endogenous construct). Among the exogenous constructs, Academic Achievement acts as a mediator construct, which carries a double interpretation, from either an exogenous or an endogenous perspective. As mentioned earlier, student motivation alone explains only the goal-directed process that is limited to motivation theory. Because this study attempts to extend the existing attributional theory, we attempt to integrate the motivation factor in this situation. Turning now to the model estimation, the path estimates of each hypothesized relationship in the causal model and the squared multiple correlation (R2) of the dependent construct were examined as reported in AMOS 21.0. We find that there are three significant causal effects: a) Parental Involvement has a positive impact on Academic Achievement (β = 0.226, CR = 2.328, P = 0.020); b) Institutional Support has a positive impact on Academic Achievement (β = 0.102, CR = 2.406, P = 0.016); and c) Academic Achievement has a positive impact on the Intercept of Motivation (β = 4.467, CR = 2.859, P = 0.004). Meanwhile, a non-significant effect occurs in the relationship between Academic Achievement and the Slope of Motivation (β = 0.264, CR = 0.554, P = 0.580). Subsequently, it is found that the total variation in the endogenous construct explained by the proposed model decreases from 64.9% to 2.2%. Therefore, it can be concluded that Student Motivation at the institution level is worrisome, since students fail to maintain good motivation. Hence, the best solution is to address this situation by providing more career intervention between students at Year 3 and Year 4 (further implicit explanation in the discussion section). This is typically more effective than no intervention, as recommended by previous findings (Ryan, 1999; Whiston, Sexton, & Lasoff, 1998).

5. DISCUSSION AND FUTURE RESEARCH

Numerous Western studies have found that the programme was essential in influencing student motivation, especially after graduation, and they have proposed explanations for this discovery. However, there are few studies on whether these research findings and interpretations are applicable to Malaysian society. Thus, this study requires further investigation and exploration of the Student Motivation issue, which is important in the education sector. In order to explore the main issue, the study used four waves of follow-up survey data, supplemented with a latent growth curve model in an attempt to analyze the growth rate of respondent motivation, and a conditional model to detect potential relationships with relevant factors.

It was found that Student Motivation grows positively across gender, which indicates that motivation is gradually increasing with the passage of time. That means students would initially have low initial scores and later would have a greater motivation growth rate. In other words, students with low initial motivation would accelerate their motivation improvement in the future. To understand the nature of the low initial score, we read and reviewed a great number of journals, books and discussions, and we find that Parent Education and Institutional Support can be a great contribution to improvement.

Today, numerous programmes are offered in the education syllabus by government and private institutions. Some students will further their study to undergraduate level via a Diploma programme or a matriculation programme. At undergraduate level, the initial score of student motivation was positive, connoting that the students have a high determination to move forward. They believe that if they attain excellent achievement during the undergraduate programme, they will have the opportunity to get a better job as expected. However, their motivation to be more progressive in the future does not persist.

MyStats 2015 Proceedings 135 can be proven by the finding revealed using that implemented in conditional model is Structural Equation Modeling (SEM). At the suggested to be a mediator construct which intercept construct, the program was positive means has a potential to play as exogenous significant relationship on Student Motivation and endogenous construct at the same time. but later on this the result show non-significant An analysis of SEM with AMOS can be handled relationship on Student Motivation. By simultaneously which is more convenient than inspecting through the passage of time, we find SPSS package. out that the growth rate of Student Motivation between Year 3 and Year 4 (0.16 to 0.23) is Indeed, there are many studies investigate lower than Year 1 and Year 2 (0.00 to 0.08) and the relationship of Parental Involvement and Year 2 and Year 3 (0.00 to 0.16). As the growth Student Learning (Jeynes, 2005) but generally rate is slower in the range between Year 3 and speaking that the result revealed is not clear Year 4, one can be conclude that the Student and inconsistent. This study intends to extend Motivation was dropped in that time. the conditional model by including of Student The high initial score is maybe the influence Motivation as an outcome research so that of Parent Education and Institution Support the best finding can be identified for decision were performed better to educate the student making. In the spirit of Jeynes (2005), this article to be more determined during the first year has been examined Parental Involvement and study. That is, the involving of these constructs Academic Achievement and its current being in the conditional effect are relevant for modified to relate the Student Motivation investigation on Student Motivation. However, and Institution Support. Because Parental many researchers from previous study attribute Involvement is seem has a positive impact on the increasing of parental involvement will Academic Achievement and thus induces us to contribute a great effect on student learning form the Academic Achievement as mediator (Epstein, 1991) that is yielded hundreds of construct (Christenson, Rounds, & Gorney, 1992; results. Further, most of the previous study did Epstein, 1991; Singh et al., 1995). not notice Student Motivation as an outcome variable. Instead, they attribute the Student Additionally, the Institution Support Motivation as a predictor construct along with construct was significant impact on Academic parental involvement when related to the Achievement as disclosed in this study. With academic achievement (Jeynes, 2010; Fan & this consistent finding, Sanders (1998) and Chen, 2001; Jeynes, 2003; Hill & Taylor, 2004; Fan, DeBerard, Spielmans, & Julka (2004) also put 2001). forth the same thing on this effect. Thus, it is inevitable to connect the relationship To ensure our findings is not improper, we find between Institutional Support and Academic out that our finding is conformity with some of Achievement since it is really sensible. the previous research that is defined Student Therefore, we proposed that the conditional Motivation as an outcome construct (Gonzalez, model that including of Parental Involvement, Alyssa & Willems, 2005; Walker & Greene, 2009; Institution Support, Academic Achievement Sung & Padilla, 1998; Fan & Williams, 2010). and Student Motivation are relevant and it is So, this study is absolute relevant to employ confirm suit for Malaysia study. 
Parental Involvement as a predictor on Student From technical perspectives, this model has Motivation as an outcome subject. Parents several advantages over the western studies. is that one of the important individual to First, it adopts a longitudinal effect of Student shape their children to be more productive, Motivation to estimate the growth rate of educated, determined and motivated person. student every year (Year 1, Year 2, Year 3, and To be that person, parents should ensure their Year 4) in Malaysia institution. Thus, this address children activities are always monitored since the enduring issue of lack of a study to handle childhood so that their good attitude are Student Motivation issue with Longitudinal maintained. Then, the parents will motivate Growth Curve Model (LGCM). Using this their children to choose a better programme procedure, the best fitting of unconditional that is guarantee to provide a better job after model can be identified and thus sensible to graduated. Thus, the Academic Achievement further the strentgh of this model. Second, the

Second, a study without the LGC model would not produce as great an impact on the research, because this method lets researchers identify in which period the drop occurs when dealing with one construct. In this case, we find that the growth rate of Student Motivation drops between Year 3 and Year 4. Hence, future research should focus on this phase in order to ensure that Student Motivation is maintained. Third, the proposed model is justified as suited to Malaysian culture, so this model can be adopted in future research for improvement when dealing with Student Motivation using the LGC method. Lastly, the LGC model of the Student Motivation construct can serve as a pioneer for furthering technical extensions of social science study in Malaysia. In sum, empirically the proposed model performs better in identifying the relationships between exogenous and endogenous constructs and in detecting which period needs focus, while technically it is well-founded in a social science study.

REFERENCES

Afthanorhan, W. M. A. B. W., Ahmad, S., & Mamat, I. (2014). Pooled Confirmatory Factor Analysis (PCFA) Using Structural Equation Modeling on Volunteerism Program: A Step by Step Approach. International Journal of Asian Social Science, 4(5), 642-653.

Antonakis, J., Bendahan, S., Jacquart, P., & Lalive, R. (2010). On making causal claims: A review and recommendations. The Leadership Quarterly,21(6), 1086-1120.

Christenson, S. L., Rounds, T., & Gorney, D. (1992). Family factors and student achievement: An avenue to increase students' success. School Psychology Quarterly, 7(3), 178.

DeBerard, M. S., Spielmans, G., & Julka, D. (2004). Predictors of academic achievement and retention among college freshmen: A longitudinal study.College student journal, 38(1), 66-80.

Epstein, J. L. (1991). Effects on student achievement of teachers' practices of parent involvement. In Annual Meeting of the American Educational Research Association., 1984. Elsevier Science/JAI Press.

Fan, W., & Williams, C. M. (2010). The effects of parental involvement on students' academic self-efficacy, engagement and intrinsic motivation. Educational Psychology, 30(1), 53-74.

Fan, X., & Chen, M. (2001). Parental involvement and students' academic achievement: A meta-analysis. Educational psychology review, 13(1), 1-22.

Gonzalez-DeHass, A. R., Willems, P. P., & Holbein, M. F. D. (2005). Examining the relationship between parental involvement and student motivation. Educational psychology review, 17(2), 99-123.

Gottfried, A. E., Fleming, J. S., & Gottfried, A. W. (2001). Continuity of academic intrinsic motivation from childhood through late adolescence: A longitudinal study. Journal of Educational Psychology, 93(1), 3.

Grolnick, W. S., & Slowiaczek, M. L. (1994). Parents' involvement in children's schooling: A multidimensional conceptualization and motivational model. Child development, 65(1), 237-252.

Hair, J. F., Black, W. C., Babin, B. J., Anderson, R. E., & Tatham, R. L. (2006). Multivariate data analysis (Vol. 6). Upper Saddle River, NJ: Pearson Prentice Hall.

Hayduk, L., Cummings, G., Boadu, K., Pazderka-Robinson, H., & Boulianne, S. (2007). Testing! testing! one, two, three–Testing the theory in structural equation models!. Personality and Individual Differences, 42(5), 841-850.

Hill, N. E., & Taylor, L. C. (2004). Parental school involvement and children's academic achievement pragmatics and issues. Current directions in psychological science, 13(4), 161-164.

Holmes-Smith, P., Coote, L., & Cunningham, E. (2006). Structural equation modeling: From the fundamentals to advanced topics. Melbourne: SREAMS.

Hong, S., & Ho, H. Z. (2005). Direct and Indirect Longitudinal Effects of Parental Involvement on Student Achievement: Second-Order Latent Growth Modeling Across Ethnic Groups. Journal of Educational Psychology, 97(1), 32.

Jeynes, W. (2010). The salience of the subtle aspects of parental involvement and encouraging that involvement: Implications for school-based programs. The Teachers College Record, 112(3).

Jeynes, W. H. (2003). A meta-analysis the effects of parental involvement on minority children’s academic achievement. Education and Urban Society, 35(2), 202-218.

Jeynes, W. H. (2005). The effects of parental involvement on the academic achievement of African American youth. The Journal of Negro Education, 260-274.

Keith, T. A., & Bader, R. F. (1993). Calculation of magnetic response properties using a continuous set of gauge transformations. Chemical physics letters,210(1), 223-231.

McCombs, B. L., & Whisler, J. S. (1997). The Learner-Centered Classroom and School: Strategies for Increasing Student Motivation and Achievement. The Jossey-Bass Education Series. Jossey-Bass Inc., Publishers, 350 Sansome St., San Francisco, CA 94104.

McIntosh, C. N. (2007). Rethinking fit assessment in structural equation modelling: A commentary and elaboration on Barrett (2007). Personality and Individual Differences, 42(5), 859-867.

McIntosh, C. N., Edwards, J. R., & Antonakis, J. (2014). Reflections on partial least squares path modeling. Organizational Research Methods, 1094428114529165.

Meece, J. L., Anderman, E. M., & Anderman, L. H. (2006). Classroom goal structure, student motivation, and academic achievement. Annu. Rev. Psychol., 57, 487-503.

Meece, J. L., Glienke, B. B., & Burg, S. (2006). Gender and motivation. Journal of school psychology, 44(5), 351-373.

Murphy, P. K., & Alexander, P. A. (2000). A motivated exploration of motivation terminology. Contemporary educational psychology, 25(1), 3-53.

Pintrich, P. R. (2003). A motivational science perspective on the role of student motivation in learning and teaching contexts. Journal of educational Psychology, 95(4), 667.

Pintrich, P. R., & Schunk, D. H. (2002). Motivation in education: Theory, research, and applications. Prentice Hall.

Roeser, R. W., & Eccles, J. S. (1998). Adolescents' perceptions of middle school: Relation to longitudinal changes in academic and psychological adjustment. Journal of Research on Adolescence, 8(1), 123- 158.

Ryan, N. E. (1999). Career counseling and career choice goal attainment: A meta-analytically derived model for career counseling practice (Doctoral dissertation, Loyola University of Chicago).

Sanders, M. G. (1998). The effects of school, family, and community support on the academic achievement of African American adolescents. Urban Education, 33(3), 385-409.

Seifert, T. (2004). Understanding student motivation. Educational research,46(2), 137-149.

Singh, K., Bickley, P. G., Trivette, P., & Keith, T. Z. (1995). The effects of four components of parental involvement on eighth-grade student achievement: Structural analysis of NELS-88 data. School psychology review.

Sui-Chu, E. H., & Willms, J. D. (1996). Effects of parental involvement on eighth-grade achievement. Sociology of education, 126-141.

Sung, H., & Padilla, A. M. (1998). Student motivation, parental attitudes, and involvement in the learning of Asian languages in elementary and secondary schools. The Modern Language Journal, 82(2), 205-216.

Walker, C. O., & Greene, B. A. (2009). The relations between student motivational beliefs and cognitive engagement in high school. The Journal of Educational Research, 102(6), 463-472.

Whiston, S. C., Sexton, T. L., & Lasoff, D. L. (1998). Career-intervention outcome: A replication and extension of Oliver and Spokane (1988). Journal of Counseling Psychology, 45(2), 150.

Zainudin Awang (2015). SEM Made Simple. MPWS Publisher.

CHIKUNGUNYA DISEASE MAPPING IN MALAYSIA: AN ANALYSIS BASED ON SMR METHOD, POISSON-GAMMA MODEL, SIR-SI MODEL 1 AND SIR-SI MODEL 2

Nor Azah Samat 1, S. H. Mohd Imam Ma’arof 2

Abstract

Disease mapping is a method that can be used to show the geographical distribution of disease occurrence. It presents the incidence of a specified disease in areas of interest, which involves the usage and interpretation of coloured or shaded maps. The focus of analysis in disease mapping studies is to estimate the true relative risk. A better statistical method for estimating the relative risk will subsequently give a better representation of the risks on maps. Consequently, the maps might be used by the authorities to identify areas that deserve closer scrutiny or more attention, as well as for resource allocation. Therefore, the aim of this paper is to compare the estimated relative risks for chikungunya disease mapping using four different methods. These include the analysis of relative risk estimation based on the Standardized Morbidity Ratio (SMR), the Poisson-gamma model, the discrete time-space stochastic SIR-SI model 1 and the discrete time-space stochastic SIR-SI model 2 for vector-borne infectious disease transmission. The SMR is the most common statistic used in disease mapping. However, the use of the SMR in disease mapping has several disadvantages. Many other methods have been developed to overcome the drawbacks of the SMR, including the earliest example of Bayesian disease mapping using the Poisson-gamma model. However, covariate adjustment in this model is difficult and there is no possibility of allowing spatial correlation between risks in adjacent areas. Therefore, a new approach to estimating relative risk based on the discrete time-space stochastic SIR-SI model 1 was introduced, and the results of the analysis show that this new approach can overcome the problem of relative risk estimation based on the SMR and the Poisson-gamma model. However, this model is only suitable for non-rare diseases: for rare diseases, the relative risk cannot be estimated due to zero expected case values. Therefore, an improved method is proposed to estimate the relative risk based on the discrete time-space stochastic SIR-SI model 2. The results of the analysis show that this new method offers an improved methodology for estimating the relative risk compared to the other three models. This is because this method offers a more detailed description of the biological process, which takes into account the transmission of the disease. This method also considers the total posterior mean infectives in the denominator of the relative risk equation.

1. Introduction and Research Aim

Chikungunya is a mosquito-borne viral disease in which the virus is transmitted to humans by infected female mosquitoes. These mosquitoes are known as Aedes aegypti and Aedes albopictus, which can also transmit other mosquito-borne viruses, namely the DEN-1, DEN-2, DEN-3 and DEN-4 viruses of dengue disease. However, chikungunya is rarely fatal. Since there is no cure for this disease, treatment is focused on relieving the symptoms, while prevention is based on mosquito surveillance and control measures. The disease map has been identified as an important tool for controlling the disease (see, for example, Rajabi et al., 2013; Nakapan et al., 2012). The production of a good risk map relies on the statistical models used to estimate the risk.

Hence, the main aim of this paper is to discuss

1, 2 Department of Mathematics, Faculty of Science and Mathematics, Universiti Pendidikan Sultan Idris

and compare the relative risk estimation for chikungunya disease mapping based on four different methods. These involve the analysis of relative risk estimation based on the SMR method, the Poisson-Gamma model, Stochastic SIR-SI Model 1 and Stochastic SIR-SI Model 2, and the application of these methods to observed chikungunya data from Malaysia.

2. Methodology

The most common statistic used in the study of disease mapping is the Standardized Morbidity Ratio (SMR) method. The SMR equation can be written as

SMR_i = O_i / e_i     (1)

Here, the estimation for i regions is based on the ratio estimator of observed cases, O_i, divided by the expected cases, e_i, which has several drawbacks. The most obvious drawback is that the SMR will be zero when there is no observed case in certain regions. This drawback of the SMR method has led many researchers to explore other methods to estimate the relative risk of a disease (Lawson et al., 2003). This includes the use of Bayesian methods.

The Poisson-Gamma model is one of the earliest examples of Bayesian mapping. In this model, the numbers of new infectives, y_i, are assumed to follow a Poisson distribution within a given period of time. This can be written as

y_i ~ Poisson(e_i θ_i)     (2)

Many studies have demonstrated that this Poisson-Gamma model can overcome the problem of the SMR when there is no observed case in a region. However, the Poisson-Gamma model has the problem that covariate adjustment is difficult and there is no possibility of allowing spatial correlation between risks in adjacent areas.

Following this drawback of the Poisson-Gamma model, Samat and Percy (2012) proposed an alternative method of relative risk estimation. Their study considers the transmission of the disease as well as covariate adjustment between adjacent areas. The relative risk estimation method used in their study is based on the stochastic SIR-SI model; in this paper we call it stochastic SIR-SI model 1. Here, the relative risk r_ij of the disease for study regions i and time periods j is equal to the posterior expected mean number of infective cases, λ_ij, divided by the expected number of new infectives, where the denominator of the relative risk equation considers the total observed cases. This can be written as

r_ij = λ_ij / e_ij, with e_ij based on the total observed cases     (3)

Furthermore, an improved method of relative risk estimation by Samat and Mohd Imam Ma'arof (2015) has been introduced. In this relative risk estimation method, the calculation of the denominator considers the total posterior expected new infective cases instead of the total observed cases. This can be written as

r_ij = λ_ij / ê_ij, with ê_ij based on the total posterior expected new infective cases     (4)

Detailed explanations of the third and fourth models can be found in Samat and Percy (2012) and Samat and Mohd Imam Ma'arof (2015), respectively.
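To make the first two estimators concrete, the following sketch computes equation (1) and the Poisson-Gamma posterior mean by conjugacy. This is a minimal illustration only: the counts, expected cases and the Gamma hyperparameters a and b are hypothetical values, and the paper's own analysis is carried out in WinBUGS.

```python
import numpy as np

# Hypothetical observed counts O_i (the y_i of equation (2)) and expected
# counts e_i for five regions; illustrative values only, not the paper's data.
O = np.array([12.0, 0.0, 7.0, 3.0, 25.0])
e = np.array([8.0, 4.5, 6.0, 3.2, 15.0])

# Equation (1): SMR_i = O_i / e_i -- exactly zero whenever O_i = 0,
# the drawback noted in the text.
smr = O / e

# Equation (2): O_i ~ Poisson(e_i * theta_i) with a Gamma(a, b) prior on
# theta_i. Conjugacy gives theta_i | O_i ~ Gamma(a + O_i, b + e_i), so the
# posterior mean relative risk is (a + O_i) / (b + e_i), never exactly zero.
a, b = 1.0, 1.0  # assumed hyperparameters, for illustration only
theta_post_mean = (a + O) / (b + e)

for i, (s, t) in enumerate(zip(smr, theta_post_mean), start=1):
    print(f"region {i}: SMR = {s:.3f}, Poisson-Gamma posterior mean = {t:.3f}")
```

Note how the region with zero observed cases gets an SMR of exactly zero but a positive smoothed estimate under the Poisson-Gamma model, which is the drawback and remedy described above.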

3. Application of the Relative Risk Estimation for Chikungunya Disease

This section displays the results of relative risk estimation based on the SMR method, the Poisson-Gamma model, Stochastic SIR-SI Model 1 and Stochastic SIR-SI Model 2 using observed chikungunya data from Malaysia. All the data are analyzed using WinBUGS software, which is suitable for carrying out a wide variety of Bayesian models. Firstly, the data set used in this study is discussed. Then, the results of the relative risk estimation using the four different methods are presented in graphs. Finally, the risks are displayed on maps to show the high and low risk areas of chikungunya disease occurrences. In this study, a relative risk above 1 means that the people within the region are more likely to contract the disease compared to the people in the overall population. In contrast, a relative risk below 1 shows that the people within the region are less likely to contract the disease compared to the people in the overall population. Meanwhile, if the relative risk is close to 1, it means that there is no real difference in the likelihood that people become infected with the chikungunya virus within the state and within the whole population.

3.1 The Data Set

The data applied in this study are in the form of counts of chikungunya cases from epidemiology week 1 to epidemiology week 52 for the year 2013, for all sixteen states in Malaysia, which include Perlis, Kedah, Pulau Pinang, Perak, Kelantan, Terengganu, Pahang, Selangor, Kuala Lumpur, Putrajaya, Negeri Sembilan, Melaka, Johor, Sarawak, Labuan and Sabah.

3.2 The Results

The outcomes of relative risk estimation based on the four different models in all 16 states of Malaysia are displayed in this section.

Figure 2(a) depicts the relative risk estimation for chikungunya cases based on the SMR method. The states of Perak, Selangor, Sabah and Sarawak show relative risks above 2 for certain epidemiology weeks. The analysis shows that people within these states have a high possibility of getting infected with chikungunya disease compared to people in the whole population.

Figure 2(b) shows the relative risk estimation for chikungunya cases based on the Poisson-Gamma model. From the figure, the estimated relative risks are close to 1 for most of the epidemiology weeks. The highest relative risk is at Sabah with 1.3280, while the lowest estimated relative risk is in the state of Selangor with 0.8908. Overall, the result from the analysis indicates that people in Malaysia are categorized as having low risk of contracting chikungunya disease, since the estimated relative risk is close to one for most of the epidemiology weeks. There is no zero value of relative risk estimation when using the Poisson-Gamma model. Thus, it shows that this model can overcome the drawback of the SMR method, especially when there is no observed chikungunya case in certain regions.

For the relative risk estimation of chikungunya cases based on the stochastic SIR-SI model 1, the relative risk becomes difficult to estimate. This problem happens when there is no observed chikungunya case for all epidemiology weeks, which subsequently gives a zero value of expected cases in the relative risk equation. Hence, when the denominator of the relative risk equation is zero, the relative risk is difficult to estimate. Therefore, the improved method of relative risk estimation based on the SIR-SI Model 2 is introduced.

Figure 2(c) displays the relative risk estimation of chikungunya cases based on the stochastic SIR-SI model 2. From the figure, the highest estimated relative risk is at Perak during epidemiology week 50 with 25.860, while there is a similar pattern at the end of the year in the states of Perlis, Labuan and Putrajaya. This could have happened due to the existence of chikungunya cases in states with small populations compared to the whole population in Malaysia.

The overall comparison between Figures 2(a), 2(b) and 2(c) gives the conclusion that different methods used to estimate the relative risk will subsequently give a different appearance of disease risks on the map. Therefore, it is very important to identify a better statistical method so that it depicts the real situation.

Finally, the relative risks are displayed on maps to show the high and low risk areas of chikungunya disease occurrences. In this study, ArcGIS software is used to create the maps, and the disease risk map for epidemiology week 10 is chosen for example and demonstration purposes only.

Figure 3 represents the chikungunya risk maps at epidemiology week 10 based on the different models that have been discussed. Unfortunately, there is no risk map for the stochastic SIR-SI model 1 because the relative risk is difficult to estimate due to the zero value of the denominator. From Figure 3(a), the chikungunya risk map based on the SMR method depicts that people in all states are at very low risk of getting infected with the chikungunya disease. Meanwhile, the chikungunya risk map based on the Poisson-Gamma model displayed in Figure 3(b) depicts that people in all states are at low risk of getting infected with the chikungunya disease compared to the people in the whole population. In contrast, Figure 3(c), the chikungunya risk map based on the stochastic SIR-SI model 2, shows that the states of Sabah and Sarawak have the darkest coloured regions on the map. This means that people within both states have a very high risk of being infected with the chikungunya virus compared to people in the whole population of Malaysia. This is followed by the states of Perak and Perlis, which have estimated relative risks of more than one. This means that people in these states are at high risk and are categorized as more likely to get infected with the chikungunya disease compared to the people in the overall population of Malaysia.

Figure 2: Relative Risk Estimation based on the SMR Method, Poisson-Gamma Model and the Stochastic SIR-SI Model 2

Figure 3: The Chikungunya Risk Maps: (a) SMR Method; (b) Poisson-Gamma Model; (c) Stochastic SIR-SI Model 2

4. Conclusion

Results of the analysis show that the Stochastic SIR-SI Model 2 offers an improved methodology for estimating the relative risk compared to the other three methods. This is because the improved model offers a more detailed description of the biological process, taking into account the transmission of the disease while enabling covariate adjustment and allowing for spatial correlation between risks in adjacent areas. Furthermore, this new approach has been demonstrated to be better suited to rare diseases such as chikungunya.

5. References

Lawson, A.B., Browne, W. J., & Vidal Rodeiro, C.L. (2003). Disease mapping with WinBUGS and MLwiN, England: John Wiley & Sons.

Nakapan, S., Tripathi, N.K., Tipdecho, T., & Souris, M. (2012). Spatial diffusion of influenza outbreak-related climate factors in Chiang Mai Province, Thailand. International Journal of Environmental Research and Public Health, 9(11), pp. 3824-3842.

Rajabi, M., Mansourian, A., & Bazmani, A. (2013). Susceptibility mapping of visceral leishmaniasis based on fuzzy modelling and group decision-making methods. Geospatial Health, 7 (1), pp. 37-50.

Samat, N. A., & Percy, D.F. (2012). Vector-borne infectious disease mapping with stochastic difference equations: An analysis of dengue disease in Malaysia. Journal of Applied Statistics, 39(9), pp. 2029-2046.

Samat, N. A., & Mohd Imam Ma’arof, S.H. (2015). New Approach to Calculate the Denominator of the Relative Risk Equation. Sains Malaysiana, (in press).

DEFECT IDENTIFICATION: APPLICATION OF STATISTICAL PROCESS CONTROL IN PRODUCTION LINE

Erni Tanius 1 , Noorul Ameera Adnan 2, Sharifah Zuraidah Syed Abdul Jalil 3, Che Manisah Mohd Kasim 4

Abstract

A control chart is a statistical process control (SPC) tool used to monitor the quality characteristics of processes to ensure the required quality level. The purpose of this study is to observe the pattern of process variability based on data taken from production lines in the months of July and August. Forty product samples from the production lines have been selected. The tools used to analyse the defect sample data are Microsoft Excel and Minitab16 Statistical Software. Next, control chart tests were done on the data. The results show that, after the control charts were able to detect defects on the microprocessors being produced, they allowed a revised centre line and control limits for changes in production. In conclusion, control charts offer a reliable solution that is particularly suited to line production and testing operations such as those found in an automated manufacturing environment.

1. Introduction

In today's fiercely competitive markets, product-quality control has become a vital task for manufacturers to sustain their mass production of products with low production cost and high quality. The most useful technique to maintain quality and reduce the variability of products is using statistical process control (SPC) in line production. Goetsch & Davis (2003) define SPC as a statistical method of separating variation resulting from special causes from natural variation, in order to establish and maintain consistency in the process, enabling process improvement. SPC is able to reduce the variability of the key quality characteristic, achieving process stability and improving productivity, as well as being able to detect defect samples and identify abnormal conditions so as to prevent defects, scrap and rework of final products. Orhan Engin, Ahmet Çelik and İhsan Kaya (2008) and Lee-Ing Tong (1998) proposed the use of a control chart approach in an engine valve manufacturing process.

The basic idea in SPC is to take random samples of products from a production line and examine the products to ensure that certain criteria of quality are satisfied. If the products sampled are found to be of inferior quality, then the manufacturing process is checked to seek out assignable causes of inferior quality and bring the process back to control.

SPC is used to monitor standards, making measurements and taking corrective action on the product being produced or service delivered. The process applies mathematics and statistics in the evaluation of process performance. These methods have been widely used in production for abnormality detection; SPC is an important technique for measuring the value of the quality characteristic and helps to identify a change or variation in some quality of the product. SPC has also become the backbone of modern quality control in both theory and practice. It is also useful for managing the production floor to ensure the quality, safety and reliability of the output.

This study is conducted in a manufacturing company that produces microprocessors. Top plates are used by this company to detect defects on the production line, and Tim Adhesive is used to

1, 2, 3, 4 University Selangor, Selangor, Malaysia

correct it if a defect is found. Even though top plates are used, some defects on the microprocessors are still being identified. Therefore, this study investigates the process variability of these outputs based on the data sample. It focuses on identifying the pattern of process variability based on the data sample and on determining whether there is a significant difference in the process mean of the data sample.

1.1 Preliminary of the control chart

In this study the control chart is used to identify defect samples by monitoring the activities of a production line. It is widely used to help manufacturers take decisions such as the need for machine or technology replacement to meet quality standards. It has attracted the attention of many researchers: James D.T. Tannock (1997) used it for an economic comparison between variables control charting and other inspection for various scenarios. Meanwhile, Joekes and Barbosa (2013) introduced adjusted control limits for the p-chart. Simionescu (2015) developed it to predict the national unemployment rate using unemployment rates at regional level. Gilenson and Hassoun (2014) formulated a trade-off between the expected values of the cycle time (CT) and the die yield, and found that a control chart model enables decision makers to knowingly sacrifice yield to shorten CT and vice versa. Wenbin Wang and Wenjuan Zhang (2008) found that control charts are also not very sensitive to small casual changes in the data. Finally, Orhan Engin, Ahmet Çelik and İhsan Kaya (2008) proposed the use of a control chart approach in an engine valve manufacturing process. It can be concluded that the application of control charts is widespread across industries from different sectors in producing quality products.

The control chart consists of three lines: an upper control limit (UCL), a lower control limit (LCL) and a centre line (CL). The upper and lower control limits are the maximum and minimum values for a process characteristic to be considered in control, while the centre line is the mean value for the process. These limits are conventionally established at three standard deviations above and below the centre line.

2. Methodology

The variables used in this study are defect samples on the production line. The analysis is done by using control charts and MiniTab16. The defect data are based on inspection done by an operator or engineer on several lots, inspected one by one using a special top plate. The table of data was designed to obtain important information about the collected data, such as lot number, model of product, number of products per lot, number of defects and total defects. The data are collected from forty (40) samples.

2.1 Type of Attributes Data

For this study, the type of data that has been collected is determined and, based on the characteristics of the data type, is called an attribute. The u-chart is chosen for further analysis since the sample sizes are unequal. For this study, two types of u-chart will be used for the analysis; they are:

(a) u-chart using variable-width control limits:

control limits = ū ± 3 √(ū / n_i), where ū = Σ_{i=1}^{m} c_i / Σ_{i=1}^{m} n_i

(b) u-chart based on average sample size:

control limits = ū ± 3 √(ū / n̄), where ū = Σ_{i=1}^{m} c_i / Σ_{i=1}^{m} n_i and n̄ = Σ_{i=1}^{m} n_i / m

Analysis of data

After the data are collected, they are analyzed. In this study, MiniTab16 is used for the u-chart output. Figure 2.0 illustrates the steps involved in the analysis for this study.

Figure 2.0: Steps involved in the analysis (flowchart; first step: Determine Measurement Method)
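To illustrate formula (a), the sketch below computes variable-width u-chart limits and flags out-of-control lots. This is a sketch only: the defect counts and lot sizes are hypothetical, and the study itself performs this step in Microsoft Excel and MiniTab16.

```python
import numpy as np

# Hypothetical per-lot defect counts c_i and lot sizes n_i (illustrative
# only; the study inspects 40 lots of roughly 797-800 microprocessors).
c = np.array([14, 18, 11, 35, 16, 13, 30, 15])
n = np.array([797, 798, 800, 799, 797, 800, 798, 799])

# (a) Variable-width limits: u_bar +/- 3*sqrt(u_bar / n_i),
# with u_bar = sum(c_i) / sum(n_i).
u_bar = c.sum() / n.sum()
ucl = u_bar + 3 * np.sqrt(u_bar / n)
lcl = np.maximum(u_bar - 3 * np.sqrt(u_bar / n), 0.0)  # floor LCL at zero

u = c / n  # defects per unit for each lot
for i in np.flatnonzero((u > ucl) | (u < lcl)):
    print(f"lot {i + 1} out of control: u = {u[i]:.4f}, "
          f"limits = [{lcl[i]:.4f}, {ucl[i]:.4f}]")

# (b) Limits based on the average sample size n_bar = sum(n_i) / m give a
# single pair of limits for every lot.
n_bar = n.mean()
half_width = 3 * np.sqrt(u_bar / n_bar)
print(f"average-size limits: [{u_bar - half_width:.4f}, {u_bar + half_width:.4f}]")
```

With nearly equal lot sizes, as here, the two variants give almost identical limits; the variable-width form is preferred when lot sizes differ materially.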

3. Result

3.1 Control Chart before running at Tim and Adhesive line

Microsoft Excel 2007 is used to calculate the control chart values: the centre line (CL), Upper Control Limit (UCL) and Lower Control Limit (LCL). The formula is as below:

u_i = c_i / n_i;  ū = Σ c_i / Σ n_i = 514 / 31971 = 0.0161

Since the sample size is not constant, the control limits have been calculated per sample size using ū ± 3 √(ū / n_i), with CL = 0.0161:

n_i = 797: UCL = 0.029563; LCL = 0.002616
n_i = 798: UCL = 0.029576; LCL = 0.002625
n_i = 799: UCL = 0.029566; LCL = 0.002633
n_i = 800: UCL = 0.029558; LCL = 0.002641

Next, the result is plotted on a u-chart by using MiniTab16.

Figure 3.1: Control chart before going through the Tim and Adhesive line

From Figure 3.1, it is identified that all samples of microprocessors are within the control limits and present a reasonably random pattern around ū. Hence the process is said to be in control. Next, the same samples go through the Tim and Adhesive line.

3.2 After going through the Tim and Adhesive line

The same data were observed by using a special top plate to identify defects. The result shows that the number of defects increased. Since there are defects, the data are tested again using MiniTab16, and CL, UCL and LCL are calculated per sample size using ū ± 3 √(ū / n_i), with CL = 0.0208. The result is below:

n_i = 797: UCL = 0.036125; LCL = 0.005474
n_i = 798: UCL = 0.036116; LCL = 0.005483
n_i = 799: UCL = 0.036106; LCL = 0.005493
n_i = 800: UCL = 0.036097; LCL = 0.005503

Based on the above result, the sample fraction of nonconformities from each preliminary sample is plotted on the chart. It is noted that two points, those from samples 10 and 30, plot above the UCL, so the process is not in control. These points must be investigated to see whether an assignable cause can be determined.

Figure 3.2: Control chart after going through the Tim and Adhesive line

Assuming an assignable cause, the control limits are revised. Samples 10 and 30 are eliminated from the control limits calculation, and the new centre line and revised control limits are calculated.

3.3 Calculation of revised centre line and control limits

The revised CL can be obtained by using the following formulas:

n̄ = Σ n_i / k = (31971 − 800 − 800) / (40 − 2) = 30371 / 38 = 799.236

ū = Σ c_i / Σ n_i = (664 − 30 − 29) / (31971 − 800 − 800) = 0.0199

CL = 0.0199; UCL = 0.0349; LCL = 0.0049

The revised centre line and control limits are shown on the control chart in Figure 3.3. Annotation of the control chart to indicate unusual points, process adjustments, or the type of investigation made at a particular point in time forms a useful record for future process analysis and should become standard practice in control chart usage.

[U chart after revision: Sample Count Per Unit against Sample; UCL = 0.0349, ū = 0.0199, LCL = 0.0049; tests performed with unequal sample sizes]

Figure 3.3: Control chart after revision
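As a quick check on the arithmetic above, the following sketch recomputes the revised centre line and limits from the totals quoted in the formulas (assuming, per the text, 664 total defects over 31,971 units in 40 lots, with samples 10 and 30 contributing 30 and 29 defects in lots of 800 each):

```python
import math

# Totals from Section 3.3: 664 defects over 31,971 units in 40 lots;
# samples 10 and 30 (30 and 29 defects, 800 units each) are eliminated.
n_bar = (31971 - 800 - 800) / (40 - 2)          # 799.236...
u_bar = (664 - 30 - 29) / (31971 - 800 - 800)   # 0.0199...
half_width = 3 * math.sqrt(u_bar / n_bar)
print(f"CL  = {u_bar:.4f}")                     # 0.0199
print(f"UCL = {u_bar + half_width:.4f}")        # 0.0349
print(f"LCL = {u_bar - half_width:.4f}")        # 0.0049
```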

Therefore, it can be concluded that these new control limits in Figure 3.3 can be used for future samples. The control limits [0.0049, 0.0349] are adopted as trial control limits for use in the following month, where monitoring of future production is of interest. The centre line of 0.0199 is considered as the mean of the process.

4. Conclusion

Based on the results, it can be concluded that the overall performance of the samples can be monitored before and after the Tim and Adhesive line process by using a control chart. It showed that the limits can be revised for data that were out of control. This is done in order to ensure that the centre line and control limits of the same sample can be used for the following months. In conclusion, the control chart is an essential tool for continuous quality control. The control chart shows how the process is performing and how the process is affected by changes to the process. It is very useful for SMEs in order to improve their product quality. The study suggests that future research is needed, especially in identifying the possible reasons for defects, and that the process may be simplified. For example, companies can come up with a system that is more user friendly for those who do not have a statistics background.

REFERENCES

Gilenson, M., & Hassoun, M. (2014). Setting Defect Charts Control Limits to Balance Cycle Time and Yield for a Tandem Production Line. Computer and operational journal.

James D.T. Tannock (1997). "An economic comparison of inspection and control charting using simulation", International Journal of Quality & Reliability Management, Vol. 14, Iss. 7, pp. 687-699.

Joekes, S., & Barbosa, E. P. (2013). An improved attribute control chart for monitoring non-conforming proportion in high quality processes. Control Engineering Practice, Volume 21, Issue 4, April 2013, Pages 407-412.

Lee-Ing Tong (1998). "Modified process control chart in IC fabrication using clustering analysis", International Journal of Quality & Reliability Management, Vol. 15, Iss. 6, pp. 582-598.

Orhan Engin, Ahmet Çelik, İhsan Kaya (2008), A fuzzy approach to define sample size for attributes control chart in multistage processes: An application in engine valve manufacturing process, Applied Soft Computing, Volume 8, Issue 4, September 2008, Pages 1654–1663

Simionescu M. (2015), Predicting the National Unemployment Rate in Romania Using a Spatial Auto- regressive Model that Includes Random Effects, Procedia Economics and Finance, Volume 22, 2015, Pages 663–671, 2nd International Conference 'Economic Scientific Research - Theoretical, Empirical and Practical Approaches', ESPERA 2014, 13-14 November 2014, Bucharest, Romania

Wenbin Wang, Wenjuan Zhang (2008). "Early defect identification: application of statistical process control methods", Journal of Quality in Maintenance Engineering, Vol. 14, Iss. 3, pp. 225-236.

CAPTURE, ORGANIZE AND ANALYZE BIG DATA FOR RESEARCH ENTERPRISE

Rakhisah Mat Zin 1

Abstract

Today the term 'big data' draws a lot of attention, but behind the hype—especially for research organizations—there's a simple story. For decades, traditional research organizations have been collecting and working with data on a daily basis. As computing technology has evolved, so has the ability to gather, aggregate, analyze, and store increasing volumes of data. But even many of the most forward-thinking research technologists underestimated how fast these volumes of data would grow. The challenge is how to prepare a framework for managing and processing this big data with fewer difficulties, so that the organization can focus on the research or analysis itself.

Introduction

In large part, the data deluge—or big data phenomenon—in research has been fueled by the proliferation of unstructured, non-traditional data generated through collaboration tools and social networking sources, as well as the global sharing among researchers of observational data, simulation models, and experimental data. In addition, libraries have continued to digitize huge volumes of archived bodies of research that once were available only to a handful of researchers—or not at all.

In addition, the cost of storage, compute power, and capacity has decreased, making it more affordable to aggregate, share, and analyze data in ways that may not have been feasible for many research organizations just a few years ago. And the proliferation of smart phones, GPS and other mobile devices has supported the immediacy of capturing observation and location research data—including large multimedia files.

Oracle Solutions for Big Data Research

Oracle offers the broadest and most integrated portfolio of products to help research organizations acquire, aggregate, organize and analyze data from diverse sources. Oracle is the first vendor to address the full spectrum of research enterprise big data requirements and is uniquely qualified to combine everything needed to meet your big data challenges—including software and hardware—into one engineered system.

Key Components of the Research Infrastructure to Support Big Data

The requirements in a big data infrastructure span data acquisition, data organization and analysis. Oracle's solutions to support these areas are outlined below. For more details, download Oracle's whitepaper on Big Data or visit the information about Oracle Big Data on Oracle.com.

Acquire: Making the most of big data means quickly capturing high volumes of data generated in many different formats. Oracle offers a range of solutions, which includes the Big Data Appliance bundled with software to support big data processing. As the leader in database technologies, Oracle is developed to handle very high transaction volumes in a distributed environment and to support the research community's need for flexible, dynamic data structures.

Organize: A big data research platform needs to process massive quantities of data—filtering, transforming and sorting it before loading it

1 Oracle Corporation, Malaysia

into a data warehouse. Oracle offers a choice of solutions for organizing data, including Oracle Data Integration together with Big Data Connectors. In addition, Oracle enables end-to-end control of structured and unstructured content, allowing you to manage all your data from application to archive efficiently, securely, and cost effectively with the Oracle content management and tiered storage solution designed specifically for research organizations.

Analyze: The infrastructure required for analyzing big data must be able to support deeper analytics such as statistical analysis and data mining on a wider variety of data types stored in diverse systems; scale to extreme data volumes; deliver faster response times; and automate decisions based on analytical models. Oracle offers a portfolio of tools for statistical and advanced analysis. These solutions include Oracle Exadata and Data Warehousing, as well as developer tools for the application layer such as Oracle Application Express, which allows your researchers to more easily access and analyze data from within their applications.

Having been in the data management business for more than 30 years, Oracle's objective is to reduce data movement to make it easier for customers to process and analyze the data. Oracle Database 12c running on an engineered system such as Oracle Exadata allows customers to use in-database implementations of advanced analytics features like data mining and statistical analysis without moving the data out of Oracle Database. This is provided through the use of Oracle Advanced Analytics functions, which include Oracle Data Mining and Oracle R Enterprise. With these technologies, research enterprises will enjoy the benefit of having predictive analytics, data mining, text mining, statistical analysis, advanced numerical computations and interactive graphics inside the database.

Another challenge in implementing big data projects is a lack of expertise in using the new languages that come with new technologies in the Hadoop world to access and analyze the data. Oracle has taken one step further in delivering Big Data SQL, bringing the knowledge of SQL to bear on structured data from the relational database together with unstructured data from the Hadoop world and also NoSQL data. Customers can use their existing SQL expertise to manage all data sets. The key benefit of using these technologies is to reduce data movement, preserve data security and maximize performance.

Decide: Many analytical and visualization tools have surfaced in recent years. It is important to understand the similarities and differences of the key capabilities to help select the right tool for the right analysis and user base.

Oracle Business Intelligence and Information Discovery solve different problems and create different types of value. Business Intelligence provides proven answers to known questions; Information Discovery provides fast answers to new questions. The key performance indicators (KPIs), reports, and dashboards produced by the Business Intelligence tools drive the need for exploration and discovery using the Information Discovery Platform. Visualization tools that will help users make decisions can be either Oracle Business Intelligence or Oracle Big Data Discovery, to support information discovery needs.

The majority of us are familiar with Business Intelligence, since we have been using these tools as part of analytics or data warehouse projects before. These tools normally work with structured data, where the data model is already prepared. However, when we move to the Big Data world, where the data can be structured, semi-structured and also unstructured, we need the capability of analyzing the relationships in this data without creating a specific data model, and of discovering new patterns which can help organizations to create new KPIs.

CASE STUDIES

FINANCIAL SECTOR: CAIXABANK – BARCELONA, SPAIN

CaixaBank Maximizes Big Data Business Value and Improves Analytics, Agility, Service, and Organizational Efficiency

Background CaixaBank is a leading Spanish retail bank and insurer with more than 13.8 million customers, 9,700 ATMs, and 5,300 branches. Centering on delivering customer service through innovation and technology, CaixaBank is one of Europe’s leading banks with 4.2 million active internet banking users and 2.6 million mobile bank clients. The bank received the Innovative Spirit Award at the 2014 Global Banking Innovation Awards, selected by Bank Administration Institute and Finacle. CaixaBank was also named Best Retail Bank for Technology Innovation by Euromoney magazine in 2013 and 2014.

Challenges
• Integrate data from bank branches, ATMs, and internet and mobile banking to gain a complete understanding of customers and offer personalized banking solutions—gaining in customer loyalty and business competitiveness
• Improve messaging to reach customers more effectively, better informing them of new bank services and products to boost sales
• Implement a flexible, adaptable, scalable, and future-proof solution to ensure the bank's competitiveness and ability to adapt to changes in business strategy or banking sector regulations
• Input, manage, and analyze massive amounts of diverse data from both external and internal sources, efficiently and effectively extending traditional analysis with the latest technologies to discover information, trends, and patterns, regardless of data source or format
• Make a complex and expensive data extraction and transformation process agile and flexible, improving time-to-market and enabling the bank to quickly analyze, decide, adapt, and act as the business, its sector, and customers become data-oriented and mobile
• Provide employees with access to non-sensitive information to promote creativity and innovation and improve productivity

Solutions
• Consolidated data marts into data pools with Oracle Big Data Appliance, Oracle Exadata, and Oracle Big Data Connectors—integrating massive data from all points of sale and customers' online and mobile profiles—enabling the bank to understand customers, their preferences, and their mood to quickly and flexibly offer them tailored solutions using Oracle Real-Time Decisions, Oracle Advanced Analytics, Oracle R Enterprise, and Oracle Business Intelligence on Oracle Exalytics
• Improved click-through rate by 39% and real sales by 50% in less than a month using Oracle Real-Time Decisions, deploying personalized, targeted messages at the right moment by leveraging customer data and habits
• Gained the ability to find new data patterns, correlations, uses, and transformations using Oracle Advanced Analytics, boosting the bank's agility, flexibility, and time-to-market and enabling it to quickly make decisions and implement changes to respond to business strategies and adapt to evolving banking regulations
• Provided a complete, unified view of all internal and external data relevant to the bank's business needs—including institutional reports, information sources, quality assurance data, security data, standardization data, IT applications, business structure data, and analysis results—enabling it to optimize resources, saving time and costs
• Enabled authorized users to easily share and access information, providing a structured platform that enables bank employees to submit suggestions and creative solutions and their superiors to follow up and implement changes

Customer Quote: "Oracle Big Data Appliance, Oracle Advanced Analytics, and Oracle Real-Time Decisions enable us to quickly find patterns and correlations in our customers' online interaction with us. We've improved our business agility and flexibility as well as our ability to know and serve our customers, ultimately focusing on creating value for them rather than solving IT issues." – Luis Esteban Grifoll, Chief Data Officer, CaixaBank

Case Study URL: http://www.oracle.com/us/corporate/customers/customersearch/caixabank-1-big-data-2648217.html

EDUCATION SECTOR: VALDOSTA STATE UNIVERSITY, USA

Valdosta State University Identifies Key Correlations in Student Data to Improve Student Retention, Progression, and Graduation Rates

Background Established in 1906, Valdosta State University (VSU) is an American public university and is one of the two regional universities in the University System of Georgia. Valdosta State includes five colleges offering 56 undergraduate degree programs and more than 40 graduate programs and degrees.

Challenges
• Address a 67% one-year student retention rate, which costs the university US$6.5 million in lost revenue per year
• Collect and analyze structured and unstructured student data from multiple sources to more quickly and effectively identify at-risk students, engage appropriate faculty and staff, and develop targeted programs to improve student retention
• Enhance administrative productivity by accelerating response times to data requests from faculty and staff

Solutions
• Used Oracle Business Intelligence Enterprise Edition's interactive dashboards to improve administrative productivity by more than 500% by enabling administrative staff to fulfill 90% of all data requests from faculty, department heads, deans, and Office of Financial Aid grants staff within 24 hours—compared to the previous average response time of one to three weeks
• Implemented Oracle Endeca Information Discovery to combine structured and unstructured student data—like student surveys and ID card usage data—in one system for analysis
• Enabled the university to examine granular data to draw previously unknown correlations to make more informed decisions—for example, determining that students who eat breakfast on campus have a 10% higher retention rate
• Used Oracle Endeca Information Discovery to discover that freshmen who work on campus have an 85% retention rate, compared to the general freshman population, which has a 55% retention rate—helping spur a decision to invest US$200,000 in student jobs on campus, which will likely save the university US$2 million in retention costs over two years

MyStats 2015 Proceedings 157 • Streamlined IT responsibilities and cut database administration time in half, enabling database administrators and programmers to focus on core tasks, such as implementing new web applications, exploring new technologies, and building new reports

Customer Quote: "With Oracle Business Intelligence and Oracle Endeca Information Discovery, Valdosta State University has achieved long-term goals in a very short period of time. We have much better insight into student data, helping us to identify at-risk students, promote student success, and improve graduation and retention rates." – Brian Haugabrook, CIO, Valdosta State University

Case Study URL: http://www.oracle.com/us/corporate/customers/customersearch/vsu-1-endeca-ss-2156270.html

Summary

No matter what type of data your researchers need to capture, access, analyze and decide on, Oracle's industry-leading solutions deliver the capacity, security, and processing speed researchers want. Oracle has all the infrastructure components needed to support big data sharing and interoperability for global, heterogeneous, and increasingly multidisciplinary research environments and ecosystems.

SPATIAL PATTERN OF LEVERAGE REGIONS

Iffah Atqa1, Sonia Marophahita2, Leilya Kartika3, Khodija Kamila4, Risna Yuliani5, Tudzla Hernita6

Abstract

Presidential Regulation number 96 year 2012 stated that the allocation fund is weighted, among others, by the Human Development Index (HDI), Gross Domestic Product (GDP), and the Index of Physical Construction Cost (IKK). Law 33 year 2004 part 03 chapter 06 paragraph 32 article 03 allows certain regions not to receive the allocation fund, so the nation's capital is separated from the 497 regions. All data are available in the public domain for the whole universe of regions, so sampling of regions is not necessary. Weights affect the allocation fund, so any relatively distant weight deserves attention. Actual GDP data can be used to reduce certain regional allocation funds more than the application of weights. Similarly, actual IKK data can be used to supplement certain regional allocation funds on top of the usual application of weights. The affected regional allocation fund is hopefully based on actual bivariate data of HDI and GDP as well as HDI and IKK. Other weights can also be taken into account if necessary. The remaining 491 regions are subject to leverage region [point] detection by the median absolute deviation [MAD] of GDP. There are 28 leverage regions [points], consisting only of big cities and districts surrounding a big city. The allocation fund is, among other things, supposed to raise HDI, and these 28 highest-GDP regions are expected to show high HDI. It turns out that three leverage regions of highest GDP have low HDI; these are Karawang, Indramayu and Malang. These three GDP leverage regions having low HDI, however, are geographically surrounded by some higher-HDI regions of lower GDP. Next, the 34 highest IKK leverage regions are separated, that is, regions where it is expensive to build physical infrastructure such as bridges, hospitals, traditional low-rise markets, schools, offices, pedestrian roads and playground parks. These 34 IKK leverage regions are mostly located in Papua and Western Papua of Papua Island. For visibility, a bivariate minimum ellipse area is computed for the 34 IKK leverage regions with regard to HDI, and a bivariate minimum ellipse area is computed for the 39 GDP leverage regions with regard to HDI. Further, these IKK leverage regions are linearly correlated with HDI. The value of the linear correlation excluding Puncak Jaya district is -0.68; that is, expensive regions are less developed in human measurement. Hopefully, an additional allocation fund for IKK leverage regions can simultaneously help human development and physical infrastructure development. Despite low HDI, the allocation fund for GDP leverage regions does not need any change.

1. Introduction

There are 28 GDP leverage regions [points], consisting only of big cities and districts surrounding a big city. For visibility, Surabaya big city is excluded from Figure 1. Indramayu is highly visible in the leftmost corner, which means low HDI.

2.1 GDP Results

Indramayu is a district surrounded by Cirebon, Majalengka, Subang, and Sumedang. Although Indramayu has the highest GDP, its HDI is low among those four neighboring regions. This shows that its GDP is not enough for funding better quality components of HDI, such as education facilities and health facilities.

1, 3 STIS 54 Economic Division, Sekolah Tinggi Ilmu Statistik, Jakarta, Indonesia 2, 4, 5, 6 STIS 53 Computer Division, Sekolah Tinggi Ilmu Statistik, Jakarta, Indonesia

Figure 1. Spatial Pattern of Leverage Regions (HDI and GDP Constant Price)
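The leverage regions shown in Figure 1 are flagged by the median absolute deviation of GDP, as the abstract explains. Here is a minimal sketch of that detection step; the GDP values below are simulated, and the cutoff of 3 robust z-units is an assumption, since the paper does not state its exact cutoff.

```python
import numpy as np

# MAD-based leverage detection sketch. Simulated regional GDP stands in for
# the actual data on 491 regions used in the paper.
rng = np.random.default_rng(0)
gdp = rng.lognormal(mean=10.0, sigma=0.6, size=491)

median = np.median(gdp)
mad = np.median(np.abs(gdp - median))
robust_z = (gdp - median) / (1.4826 * mad)  # 1.4826*MAD estimates sigma for normal data

leverage = np.flatnonzero(robust_z > 3)  # unusually high-GDP regions
print(f"{leverage.size} leverage regions detected out of {gdp.size}")
```

The same procedure applied to IKK instead of GDP would yield the IKK leverage regions discussed in Section 2.2.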

Figure 2. West Java Region

Figure 3. High IKK Regions

Indramayu is in second place for pupil-to-teacher ratio among those five neighboring regions. The pupil-to-teacher ratio is the number of pupils enrolled in primary school divided by the number of primary school teachers. The expected ratio is 22, which means that a teacher should teach 22 pupils.

2.2 IKK Results

The Median Absolute Deviation [MAD] is applied to find leverage regions in terms of IKK. The 34 highest IKK leverage regions are separated out of 463 regions, that is, regions where it is expensive to build physical infrastructure such as bridges, hospitals, traditional low-rise markets, schools, offices, pedestrian roads and playground parks. These 34 IKK leverage regions are mostly located in Papua and Western Papua of Papua Island. For visibility, a bivariate minimum ellipse area is computed for the 34 IKK leverage regions with regard to HDI, and a bivariate minimum ellipse area is computed for the 39 GDP leverage regions with regard to HDI (see Figure 3).

Further, these IKK leverage regions are linearly correlated with HDI. The value of the linear correlation excluding Puncak Jaya district is -0.68; that is, expensive regions are less developed in HDI measurement. Hopefully, an additional allocation fund for IKK leverage regions can simultaneously help human development and physical infrastructure development.

3. Conclusion

Weights affect the allocation fund, so any relatively distant region in terms of weight deserves attention. Actual GDP data can be used to reduce certain regional allocation funds more than the application of weights. Similarly, actual IKK data can be used to supplement certain regional allocation funds on top of the usual application of weights. The affected regional allocation fund is hopefully based on actual bivariate data of HDI and GDP as well as HDI and IKK. Other weights can also be taken into account if necessary.

Despite low HDI, the allocation fund for GDP leverage regions does not need any change. GDP leverage regions are supposed to take care of low HDI on their own. IKK leverage regions need more allocation fund.

4. End Note

The author and co-authors declare there is not any potential conflict of interest with respect to the research, authorship, and/or publication of this article. The views expressed here are those of the individual author and co-authors and not necessarily those of STIS or its board, officers, or staff.

5. References

Rahmatullah Imon & Ali S. Hadi (2013). Identification of multiple high leverage points in logistic regression. Journal of Applied Statistics, Volume 40, Issue 12, July 2013.

Dini, Aulia and Idaman (2014). "Temporary Redistribution of Allocation Fund," ISIRSC 2014 Abstract Book, pp. 115-116.

National Statistics Conference (MyStats 2015) Proceedings

Papers

Advancement of Data Transmission through METS Tan Bee Bee 19

Synergising GST Rate with Direct Tax Rate in Sustaining Economic Growth in Malaysia: Is There A Laffer Curve? Sherly George, Y.B.Prof Dr Syed Omar Syed Agil 23

Development of ICT in Garnering Statistics Sabri Omar 39

DESA: Growing the Digital Economy from a National Perspective Mohd Jalallul @ Jasni Zain Mohd Isa, Syahida Ismail, Nur Asyikin Abdul Najib 53

A Fuzzy Approach to Enhance Uncertain Post Flood Damage Assessment for Quality Risk Analysis Dr. Sharifah Sakinah Syed Ahmad, Emaliana Kasmuri 61

Malaysian Household Consumption Expenditure: Rural vs Urban Dr. Wan Zawiah Wan Zin, Siti Fatin Nabilah 69

Open Data and Challenges Faced by National Statistics Office Siti Haslinda Mohd Din, Nur Aziha Mansor, Faiza Rusrianti Tajul Arus 77

Multivariate Time Series Similarity-Based Complex Network In Stocks Market Analysis: Case of NYSE During Global Crisis 2008 Professor Dr. Maman Abdurachman Djauhari, Gan Siew Lee 83

Perceived Happiness and Self-rated Health: The Twins? A Bivariate Ordered Probit Models Analysis Using World Value Survey Koay Ying Yin, Eng Yoke Kee, Wong Chin Yoong 89

The Cyclical Extraction Method and the Causality Model in Business Cycle Analyses: Do they complement or clash to one another? Abdul Latib Talib 95

Inflation of a Type II Error Rate in Three-Arm Non-Inferiority Trials Dr. Nor Afzalina Azmee 115

Generalized autoregressive moving average: An application to GDP in Malaysia Dr. Thulasyammal Ramiah Pillai 121

Academic Achievement on Student Motivation: Latent Class Analysis Across Gender Group Dr. Nurulkamal Masseran, Zainudin Awang, M.A.M. Asri, Ahmad Nazim Aimran, Hidayah Razali 133

Chikungunya Disease Mapping in Malaysia: an Analysis Based on SMR Method, Poisson-gamma Model, SIR-SI Model 1 and SIR-SI Model 2 Dr. Nor Azah Samat, S. H. Mohd Imam Ma’arof 143

Defect Identification: Application of Statistical Process Control in Production Line Erni Tanius, Noorul Ameera Adnan, Sharifah Zuraidah Syed Abdul Jalil, Che Manisah Mohd Kasim 149

Capture, Organize and Analyze Big Data for Research Enterprise Rakhisah Mat Zin 155

Pattern of Leverage Regions Iffah Atqa, Sonia Hajar Marophahita, Leilya Kartika, Khodija Kamila, Risna Yuliani, Tudzla Hernita 161
