Using Data Analytics to Extract Knowledge from Middle-Of-Life Product Data


Using Data Analytics To Extract Knowledge From Middle-Of-Life Product Data

International Journal of Advanced Research and Publications, ISSN: 2456-9992, Volume 4, Issue 1, January 2020, www.ijarp.org

Fatima-Zahra Abou Eddahab, Imre Horváth
Delft University of Technology, Industrial Design Engineering, Landbergstraat 15, 2628 CE Delft, Netherlands, [email protected], [email protected]

Abstract: Data analytics needs dedicated tools, which are becoming increasingly complex. In this paper, we summarize the results of our literature research, conducted with special attention to existing and potential future tools. In the literature, attention is paid mainly to processing big data rather than to effective semantic processing of middle-of-life data (MoLD). The issue of efficiently handling big MoLD with its specific characteristics has not yet been addressed by commercialized tools and methods. Another major limitation of current tools is that they are typically not capable of adapting themselves to designers' needs or of supporting the reuse of knowledge and experience across multiple design tasks. MoLD require quasi-real-time handling due to their nature and their direct feedback relationships to the operation process and the environment of products. Nowadays, products are equipped with smart capabilities, which offers new opportunities for exploiting MoLD. The knowledge aggregated in this study will be used in the development of a toolbox that (i) integrates various tools under a unified interface, (ii) implements various smart and semantics-oriented functions, and (iii) facilitates data transformations in context by the practicing designers themselves.

Keywords: Data analytics, middle-of-life data, analytics tools, application practices, smart analytics toolbox.

1. Introduction

1.1. Setting the stage
The rapid rise of emerging economies and the deployment of information technologies have led to remarkable changes in the lifecycle of products and services. Because of the rapid development of hardware and software, a focus on the exploration of data has become ubiquitous [1]. Companies need to adjust their operations to the influences of rapidly evolving markets and to better manage the lifecycle of their products. Towards these goals, they have to combine (i) static process information with dynamic process information, (ii) product information with process and resource information, and (iii) human-aspect information with business information throughout the entire product lifecycle [2]. However, as reflected by the related literature, most past efforts were dedicated to the methodological and computational support of begin-of-life and end-of-life models and activities. Fewer efforts were made to exploit middle-of-life data (MoLD) and to create value and knowledge based on this kind of data. Thanks to new information technologies such as sensors, RFID and smart tags, the chunks of information conveyed by the middle-of-life (MoL) phase of product usage can now be identified, tracked and collected [3]. Our background research was stimulated by the observation that there seems to be a lack of tools to support decision-making in design and servicing using MoLD. This observation was placed in the position of a concrete research phenomenon to be studied [4]. The decision was underpinned by the following argumentation. Effective statistical and semantic processing of MoLD is not only an academic challenge, but also a useful asset for industry [5], since logistics, operation and maintenance activities are located in the MoL phase [6]. It is important for product developers and production companies to see how their products are used by different customers, in different application contexts and under different circumstances. This may provide insights concerning how to transform use patterns into specific product enhancements, and how to avoid deficiencies that may occur under circumstances that were not completely known or specified in the development phase of their products.

MoLD can be obtained in multiple ways. They can be aggregated through field observations and interrogations of users, by studying failure log files and maintenance reports, or from relevant web resources such as social media and user forums. Alternatively, they can be elicited directly from products by sensors or self-registration. The last-mentioned approaches are becoming more popular as products advance from traditional self-standing products, to network-linked advanced products, to awareness- and reasoning-enabled smart products [7]. However, due to the dynamic change of sensor data, the large volumes of data aggregated over time, and the unknown nature of data patterns, it is unfortunately not straightforward to perform effective data analysis using the existing traditional techniques [8]. Feeding structured MoLD and use patterns back to product designers is an insufficiently addressed issue [9]. The key challenge is to find ways to effectively use data analytics techniques in purposeful combinations, depending on the application contexts and the specific objectives of product designers [10].

1.2. Reasoning model and contents
We completed a comprehensive literature study in two phases. The first phase, referred to as 'shallow exploration', was conducted to identify the most relevant domains of knowledge for the study. Based on a wide range of keywords, we tried to develop a topographic landscape of the related publications. This topographic landscape was supposed to show not only the distribution (the clusters of the keyword-related publications), but also the peaks and plains of the clusters. Figure 1 shows the clustering of the outcome of the keyword-based mapping of the related literature. The graphical image was built with VOSviewer based on the concerned articles. Figure 1 shows not only the neighboring (semantically related) keywords, but also the distances between them as they appear in the literature. The different colors indicate the frequency of occurrence of keywords (i.e., the formation of peaks): the most frequently occurring keywords are shown in red (and dark orange) and the less frequent ones in green (and light blue). The visual representation generated by the software let us see that four major clusters of papers could be identified. In a kind of transitive ordering, these are: (i) changes in the nature of data, (ii) approaches to the transformation of data, (iii) tools and packages for data analytics, and (iv) design applications of data analytics.

Figure 1: Citation of the chosen family of keywords in the literature

These clusters were used as descriptors of the main domains of interest in the detailed literature study, in addition to the key phrase 'evolution of data analytics support of product design'. In the second phase of our study, called 'deep exploration', various sources such as subscription-based and open-access journals, conference proceedings, web repositories, and professional publications were used for collecting several hundred relevant publications. The findings made it possible to define further relevant key terms on a third level (not shown in Figure 1).

The second phase was also used to provide a quantitative characterization of the interrelationships among the key terms belonging to the same cluster. Figure 2 shows the interrelationships found. If two terms are used in the same document, there is a line between them, and the thickness of the line indicates how frequently they co-occur. In other words, thick lines refer to combinations of terms that appear in multiple papers, whereas thin ones refer to combinations that rarely appear in the studied publications. It can be observed in the connectivity diagram shown in Figure 2 that the thickest lines are between the above four key terms, a fact that underlines their significance and relatedness.

Figure 2: Connectivity graph of the chosen family of keywords

The above pieces of information obtained from the quantitative part of the literature study were used to develop a reasoning model for the qualitative part of the study. This part focused on the interpretation of the findings and the disclosed semantic relationships. The reasoning model, which was also used in structuring the contents of this paper, is shown in Figure 3. Only the first- and second-level key terms are indicated, while, as mentioned above, the study was actually done with key terms of the third decomposition level.

Figure 3: Reasoning model of the literature research

The considered papers were published in different periods of time, ranging from the mid-fifties until today. An important observation was that the concepts identified by the first-level key terms are in an implicative relationship with each other: if the nature of data changes, this entails a change in the approaches to data transformation, which in turn implies the need for different data transformation methods and tools. These enablers can provide support for a broader range of existing applications and can facilitate new data analytics applications with regard to enabling product enhancement and innovation by design. The investigation concerning the changes in the nature of data focused mainly on product-related use, maintenance and service data, and on data describing the conditions and behavior of products.

1.3. Organization of the paper
In the
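The co-occurrence counting that underlies a connectivity graph such as Figure 2 is simple to illustrate. The sketch below is a minimal example, not the authors' VOSviewer workflow, and the per-document keyword sets are invented; it only shows how the pairwise counts that such maps render as line thickness can be derived:

```python
# Minimal sketch: count keyword co-occurrence across documents.
# The keyword sets below are hypothetical examples.
from itertools import combinations
from collections import Counter

documents = [
    {"data analytics", "middle-of-life data", "product design"},
    {"data analytics", "tools", "big data"},
    {"middle-of-life data", "product design", "tools"},
]

cooccurrence = Counter()
for keywords in documents:
    # every unordered pair of keywords appearing in the same document
    for pair in combinations(sorted(keywords), 2):
        cooccurrence[pair] += 1

# pairs with the highest counts correspond to the thickest lines in the map
for pair, count in cooccurrence.most_common(5):
    print(pair, count)
```

Tools like VOSviewer additionally normalize such raw counts and compute a two-dimensional layout before rendering the map.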
Recommended publications
  • Towards a Fully Automated Extraction and Interpretation of Tabular Data Using Machine Learning
    UPTEC F 19050, Examensarbete 30 hp, August 2019. Towards a fully automated extraction and interpretation of tabular data using machine learning. Per Hedbrant. Master Thesis in Engineering Physics, Department of Engineering Sciences, Uppsala University, Sweden.

    Abstract. Motivation: A challenge for researchers at CBCS is the ability to efficiently manage the different data formats that are frequently changed. A significant amount of time is spent on manual pre-processing, converting from one format to another. There are currently no solutions that use pattern recognition to locate and automatically recognise data structures in a spreadsheet.

    Problem definition: The desired solution is a self-learning Software-as-a-Service (SaaS) for automated recognition and loading of data stored in arbitrary formats. The aim of this study is three-fold: A) investigate whether unsupervised machine learning methods can be used to label different types of cells in spreadsheets; B) investigate whether a hypothesis-generating algorithm can be used to label different types of cells in spreadsheets; C) advise on choices of architecture and technologies for the SaaS solution.

    Method: A pre-processing framework is built that can read and pre-process any type of spreadsheet into a feature matrix. Different datasets are read and clustered. An investigation into the usefulness of reducing the dimensionality is also done.
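    As a rough sketch of the cell-labeling idea described in the abstract (the file name, the feature choices, and the cluster count are illustrative assumptions, not the thesis's actual design):

    ```python
    # Sketch: turn each spreadsheet cell into a small feature vector and
    # cluster the cells without labels. "example.xlsx" is a hypothetical file.
    import numpy as np
    from openpyxl import load_workbook
    from sklearn.cluster import KMeans

    wb = load_workbook("example.xlsx")
    ws = wb.active

    cells, features = [], []
    for row in ws.iter_rows():
        for cell in row:
            if cell.value is None:
                continue
            text = str(cell.value)
            features.append([
                float(isinstance(cell.value, (int, float))),  # numeric cell?
                float(text.isupper()),                        # header-like text?
                len(text),                                    # text length
            ])
            cells.append(cell.coordinate)

    X = np.asarray(features)
    # three clusters as a stand-in for e.g. header / data / other cells
    labels = KMeans(n_clusters=3, n_init=10).fit_predict(X)
    for coord, lab in zip(cells[:10], labels[:10]):
        print(coord, lab)
    ```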
  • Advanced Data Mining with Weka (Class 5)
    Advanced Data Mining with Weka, Class 5, Lesson 1: Invoking Python from Weka. Peter Reutemann, Department of Computer Science, University of Waikato, New Zealand. weka.waikato.ac.nz

    Course outline: Class 1, Time series forecasting; Class 2, Data stream mining in Weka and MOA; Class 3, Interfacing to R and other data mining packages; Class 4, Distributed processing with Apache Spark; Class 5, Scripting Weka in Python. Lessons of Class 5: 5.1 Invoking Python from Weka; 5.2 Building models; 5.3 Visualization; 5.4 Invoking Weka from Python; 5.5 A challenge, and some Groovy; 5.6 Course summary.

    Lesson 5.1: Invoking Python from Weka. Pros of scripting: a script captures preprocessing, modeling, evaluation, etc.; you write the script once and run it multiple times; it is easy to create variants to test theories; and no compilation is involved, unlike with Java. Cons: programming is involved; you need to familiarize yourself with the APIs of libraries; and writing code is slower than clicking in the GUI.

    Scripting languages. Jython (https://docs.python.org/2/tutorial/): a pure-Java implementation of Python 2.7; runs in the JVM; has access to all Java libraries on the CLASSPATH; only pure-Python libraries can be used. Python: invoking Weka from Python 2.7; access to the full Python library ecosystem. Groovy, briefly (http://www.groovy-lang.org/documentation.html): Java-like syntax; runs in the JVM; access to all Java libraries on the CLASSPATH.

    Java vs Python. Java example (output: "1: Hello WekaMOOC!"): public class Blah { public static void main(String[]
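    The Java listing above is truncated in this extract. As a hedged sketch of the lesson's Jython route (the dataset path and the choice of the J48 learner are illustrative assumptions, not the slides' example), a few lines suffice to drive Weka's Java API in Python syntax:

    ```python
    # Jython sketch: Weka's Java classes are imported directly from the CLASSPATH.
    from weka.core.converters import ConverterUtils
    from weka.classifiers.trees import J48

    source = ConverterUtils.DataSource("data/iris.arff")  # assumed dataset path
    data = source.getDataSet()
    data.setClassIndex(data.numAttributes() - 1)  # last attribute is the class

    cls = J48()
    cls.buildClassifier(data)
    print(cls)  # textual form of the learned decision tree
    ```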
  • Introduction to Weka
    Introduction to Weka

    Overview: What is Weka? Where to find Weka? Command line vs GUI. Datasets in Weka. ARFF files. Classifiers in Weka. Filters.

    What is Weka? Weka is a collection of machine learning algorithms for data mining tasks. The algorithms can either be applied directly to a dataset or called from your own Java code. Weka contains tools for data pre-processing, classification, regression, clustering, association rules, and visualization. It is also well suited for developing new machine learning schemes.

    Where to find Weka: the Weka website (latest version 3.6) at http://www.cs.waikato.ac.nz/ml/weka/ and the Weka manual at http://transact.dl.sourceforge.net/sourceforge/weka/WekaManual-3.6.0.pdf

    CLI vs GUI: the command line is recommended for in-depth usage and offers some functionality not available via the GUI; the GUI comprises the Explorer, the Experimenter and the Knowledge Flow.

    Datasets in Weka: each entry in a dataset is an instance of the Java class weka.core.Instance, and each instance consists of a number of attributes. Attributes can be nominal (one of a predefined list of values, e.g. red, green, blue), numeric (a real or integer number), string (enclosed in "double quotes"), date, or relational.

    ARFF files: the external representation of an Instances class. An ARFF file consists of a header, which describes the attribute types (including the target/class variable), and a data section, which is a comma-separated list of data values. Example ARFF files (credit-g, heart-c, hepatitis, vowel, zoo) are available at http://www.cs.auckland.ac.nz/~pat/weka/. Basic statistics and validation can be obtained by running: java weka.core.Instances data/soybean.arff

    Classifiers in Weka: learning algorithms in Weka are derived from the abstract class weka.classifiers.Classifier. A simple classifier is ZeroR, which just predicts the most common class (or the median, in the case of numeric values); it tests how well the class can be predicted without considering other attributes and can be used as a lower bound on performance.
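    A minimal, hypothetical ARFF file illustrating the layout described above (the relation and attributes are invented for this example):

    ```
    % A minimal, hypothetical ARFF file.
    % Header: dataset name and attribute declarations; the last attribute
    % is conventionally the target / class variable.
    @relation weather

    @attribute outlook {sunny, overcast, rainy}
    @attribute temperature numeric
    @attribute play {yes, no}

    % Data section: one comma-separated instance per line.
    @data
    sunny, 85, no
    overcast, 83, yes
    rainy, 65, yes
    ```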
  • Mathematica Document
    Mathematica Project: Exploratory Data Analysis on 'Data Scientists'. A big-picture view of the state of data scientists and machine learning engineers.

    In this Mathematica project, we will explore the capabilities of Mathematica to better understand the state of data science enthusiasts. The dataset, consisting of more than 10,000 rows, is obtained from Kaggle and is the result of the 'Kaggle Survey 2017'. We will explore various capabilities of Mathematica in data analysis and data visualization. Further, we will utilize machine learning techniques to train models and classify features with several algorithms, such as Nearest Neighbors and Random Forest. Dataset: https://www.kaggle.com/kaggle/kaggle
  • WEKA: the Waikato Environment for Knowledge Analysis
    WEKA: The Waikato Environment for Knowledge Analysis. Stephen R. Garner, Department of Computer Science, University of Waikato, Hamilton.

    ABSTRACT: WEKA is a workbench designed to aid in the application of machine learning technology to real-world data sets, in particular data sets from New Zealand's agricultural sector. In order to do this, a range of machine learning techniques are presented to the user in such a way as to hide the idiosyncrasies of input and output formats, as well as to allow an exploratory approach in applying the technology. The system presented is component based and also has applications in machine learning research and education.

    1. Introduction. The WEKA machine learning workbench has grown out of the need to apply machine learning to real-world data sets in a way that promotes a "what if? ..." or exploratory approach. Each machine learning algorithm implementation requires the data to be present in its own format, and has its own way of specifying parameters and output. The WEKA system was designed to bring a range of machine learning techniques or schemes under a common interface so that they can be easily applied to data in a consistent manner. This interface should be flexible enough to encourage the addition of new schemes, and simple enough that users need only concern themselves with selecting the features in the data for analysis and with what the output means, rather than with how to use a machine learning scheme. The WEKA system has been used over the past year to work with a variety of agricultural data sets in order to try to answer a range of questions.
  • Artificial Intelligence for the Prediction of Exhaust Back Pressure Effect on the Performance of Diesel Engines
    Applied Sciences, Article: Artificial Intelligence for the Prediction of Exhaust Back Pressure Effect on the Performance of Diesel Engines. Vlad Fernoaga (1), Venetia Sandu (2,*) and Titus Balan (1). (1) Faculty of Electrical Engineering and Computer Science, Transilvania University, Brasov 500036, Romania; [email protected] (V.F.); [email protected] (T.B.). (2) Faculty of Mechanical Engineering, Transilvania University, Brasov 500036, Romania. * Correspondence: [email protected]; Tel.: +40-268-474761. Received: 11 September 2020; Accepted: 15 October 2020; Published: 21 October 2020.

    Featured application: This paper presents solutions for smart devices with embedded artificial intelligence dedicated to vehicular applications. Virtual sensors based on neural networks or regressors provide real-time predictions of diesel engine power loss caused by increased exhaust back pressure.

    Abstract: The trade-off between engine emissions and performance requires detailed investigations into exhaust system configurations. Correlations among engine data acquired by sensors are amenable to artificial intelligence (AI)-driven performance assessment. The influence of exhaust back pressure (EBP) on engine performance, mainly on effective power, was investigated on a turbocharged diesel engine tested on an instrumented dynamometric test bench. The EBP was externally applied at steady-state operation modes defined by speed and load. A complete dataset was collected to supply the statistical analysis and machine learning phases, namely the training and testing of all the AI solutions developed to predict the effective power. By extending the cloud/edge computing model with the cloud-AI/edge-AI paradigm, comprehensive research was conducted on the algorithms and software frameworks best suited to vehicular smart devices.
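    As a hedged sketch of the virtual-sensor idea (synthetic data and an arbitrary network size, not the paper's actual models or measurements), one can regress effective power on speed, load and exhaust back pressure:

    ```python
    # Sketch of a "virtual sensor": predict effective power from operating
    # variables including exhaust back pressure (EBP). All data is synthetic.
    import numpy as np
    from sklearn.model_selection import train_test_split
    from sklearn.neural_network import MLPRegressor
    from sklearn.pipeline import make_pipeline
    from sklearn.preprocessing import StandardScaler

    rng = np.random.default_rng(0)
    n = 500
    speed = rng.uniform(1000, 4000, n)   # engine speed, rpm
    load = rng.uniform(10, 100, n)       # load, percent
    ebp = rng.uniform(5, 60, n)          # exhaust back pressure, kPa
    # invented ground truth: power rises with speed and load, falls with EBP
    power = 0.02 * speed + 0.5 * load - 0.3 * ebp + rng.normal(0.0, 2.0, n)

    X = np.column_stack([speed, load, ebp])
    X_tr, X_te, y_tr, y_te = train_test_split(X, power, random_state=0)

    # scale the inputs, then fit a small neural-network regressor
    model = make_pipeline(StandardScaler(),
                          MLPRegressor(hidden_layer_sizes=(32, 32),
                                       max_iter=2000, random_state=0))
    model.fit(X_tr, y_tr)
    print("held-out R^2:", model.score(X_te, y_te))
    ```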
  • Statistical Software
    Statistical Software. A. Grant Schissler, Hung Nguyen, Tin Nguyen, Juli Petereit, Vincent Gardeux.

    Keywords: statistical software, open source, Big Data, visualization, parallel computing, machine learning, Bayesian modeling.

    Abstract: This article discusses selected statistical software, aiming to help readers find the right tool for their needs (not to provide an exhaustive list); the authors acknowledge that their experience biases the discussion towards software employed in scholarly work. Software is categorized into three classes: (1) Statistical Packages, (2) Analysis Packages with Statistical Libraries, and (3) General Programming Languages with Statistical Libraries. Statistical and analysis packages often provide interactive, easy-to-use workflows, while general programming languages are built for speed and optimization. The article emphasizes each software's defining characteristics and trends in popularity, and throughout stresses the software's capacity to analyze large, complex data sets ("Big Data"). The concluding sections muse on the future of statistical software, including the impact of Big Data and the advantages of open-source languages.
  • MLJ: a Julia Package for Composable Machine Learning
    MLJ: A Julia package for composable machine learning. Anthony D. Blaom (1,2,3), Franz Kiraly (3,4), Thibaut Lienart (3), Yiannis Simillides (7), Diego Arenas (6), and Sebastian J. Vollmer (3,5). (1) University of Auckland, New Zealand; (2) New Zealand eScience Infrastructure, New Zealand; (3) Alan Turing Institute, London, United Kingdom; (4) University College London, United Kingdom; (5) University of Warwick, United Kingdom; (6) University of St Andrews, St Andrews, United Kingdom; (7) Imperial College London, United Kingdom. DOI: 10.21105/joss.02704. Submitted: 21 July 2020; published: 07 November 2020.

    Introduction. Statistical modeling, and the building of complex modeling pipelines, is a cornerstone of modern data science. Most experienced data scientists rely on high-level open-source modeling toolboxes, such as scikit-learn (Buitinck et al., 2013; Pedregosa et al., 2011) (Python), Weka (Holmes et al., 1994) (Java), and mlr (Bischl et al., 2016) and caret (Kuhn, 2008) (R), for quick blueprinting, testing, and creation of deployment-ready models. They do this by providing a common interface to atomic components from an ever-growing model zoo, and by providing the means to incorporate these into complex workflows. Practitioners want to build increasingly sophisticated composite models, as exemplified in the strategies of top contestants in machine learning competitions such as Kaggle. MLJ (Machine Learning in Julia) (A. Blaom, 2020b) is a toolbox written in Julia that provides a common interface and meta-algorithms for selecting, tuning, evaluating, composing and comparing machine learning model implementations written in Julia and other languages.
  • A Formal Methodology for the Comparison of Results from Different Software Packages
    A formal methodology for the comparison of results from different software packages. Alexander Zlotnik, Department of Electronic Engineering, Technical University of Madrid, and Ramón y Cajal University Hospital.

    Structure of the presentation: Why? Is the output really different? Is the algorithm documented? Are the same numerical methods used? Closed source vs open source. An example. Summary. Multi-core performance.

    Why? Multidisciplinary and multi-location research is becoming the norm for high-impact scientific publications. Sometimes one part of your team uses one software package (R, Python, MATLAB, Mathematica, etc.) and the other part uses a different one (Stata, SAS, SPSS, etc.). After a lot of frustrating testing, you realize that you are getting different results with the same operations and the same datasets. In other situations, you simply want to develop your own functionality for an existing software package, or a fully independent software package with some statistical functionality. Since you know Stata well, you decide to compare the output of your program with the one from Stata, and guess what ...? This seems doable, but there are likely to be many hidden surprises when comparing results for both implementations.

    Is the output really different? Are you using the same dataset? Sometimes someone changes the dataset but doesn't rename it. Is the data converted in the same way? Are there any issues with numerical precision?
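    A minimal sketch of the numerical-precision point (the two arrays below are stand-ins for the outputs of two packages): compare results with an explicit tolerance rather than exact equality:

    ```python
    # Sketch: tolerance-based comparison of results from two implementations.
    import numpy as np

    result_package_a = np.array([1.00000000, 2.49999999, 3.14159265])
    result_package_b = np.array([1.00000001, 2.50000000, 3.14159266])

    # exact comparison fails on floating-point noise
    print(np.array_equal(result_package_a, result_package_b))  # False

    # tolerance-based comparison reflects numerical-precision reality
    print(np.allclose(result_package_a, result_package_b,
                      rtol=1e-7, atol=1e-9))                   # True
    ```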
  • Statistický Software (Statistical Software)
    Statistical software

    Software: AcaStat, ADaMSoft, Analyse-it, ASReml, Auguri, BioStat, BrightStat, Dataplot, EasyReg, EpiInfo, EViews, Excel, GAUSS, GenStat, GoldenHelix, gretl, JMP, MacAnova, Mathematica, Matlab, MedCalc, Minitab, MRDCL, NCSS, OpenEpi, Origin, the Ox programming language, OxMetrics, Partek, Primer, PSPP, R, R Commander, RATS, RKWard, SalStat, SAS, SOCR, SPlus, SPSS, Stata, Statgraphics, STATISTICA, StatIt, StatPlus, StatsDirect, Statistix, SYSTAT, The Unscrambler, UNISTAT, VisualStat, Winpepi, WinSPC, XLStat, XploRe.

    Data mining software: roughly 20 to 30 vendors. The main players on the market are Clementine (IBM SPSS Modeler, formerly PASW Modeler), IBM's Intelligent Miner, SGI's MineSet, and SAS's Enterprise Miner. There is also a range of embedded products, e.g. for fraud detection, electronic commerce applications, health care, and customer relationship management.

    SAS (www.sas.com): the SAS Institute was founded in 1976 in a university environment. Today it is the largest privately held software company in the world (more than 11,000 employees), with over 45,000 installations and roughly 9 million users in 118 countries; in the USA it has around 1,000 academic customers (SAS is used by most colleges, universities and research institutions).

    SAS statistical analysis: descriptive statistics; analysis of contingency (frequency) tables; regression, correlation and covariance analysis; logistic regression; analysis of variance; hypothesis testing; discriminant analysis; cluster analysis; survival analysis; and more.

    SAS time series analysis: regression models; models with seasonal factors; autoregressive models; and more.
  • The R Project for Statistical Computing: A Free Software Environment for Statistical Computing and Graphics
    The R Project for Statistical Computing: a free software environment for statistical computing and graphics that runs on a wide variety of UNIX platforms, Windows and MacOS.

    OpenStat: a general-purpose statistics package that you can download and install for free. It was originally written as an aid in the teaching of statistics to students enrolled in a social science program, and has been expanded to provide procedures useful in a wide variety of disciplines. It has an interface similar to SPSS.

    SOFA: a basic, user-friendly, open-source statistics, analysis, and reporting package.

    PSPP: a program for statistical analysis of sampled data. It is a free replacement for the proprietary program SPSS, and appears very similar to it, with a few exceptions.

    TANAGRA: a free, open-source, easy-to-use data-mining package.

    PAST: a package created with the palaeontologist in mind but adopted by users in other disciplines. It is easy to use and includes a large selection of common statistical, plotting and modelling functions.

    AnSWR: a software system for coordinating and conducting large-scale, team-based analysis projects that integrate qualitative and quantitative techniques.

    MIX: an Excel-based tool for meta-analysis.

    Free Statistical Software (StatPages.org): a page linking to free software packages that you can download and install on your computer.

    Free Statistical Software (freestatistics.info): a page linking to free software packages that you can download and install on your computer.

    Free Software: information and links from the Resources for Methods in Evaluation and Social Research site.
  • Cumulation of Poverty Measures: the Theory Beyond It, Possible Applications and Software Developed
    Cumulation of poverty measures: the theory beyond it, possible applications and software developed (Francesca Gagliardi and Giulio Tarditi). Siena, October 6th, 2010.

    1. Context and scope. Reliable indicators of poverty and social exclusion are an essential monitoring tool. In the EU-wide context, these indicators are most useful when they are comparable across countries and over time. Furthermore, policy research and application require statistics disaggregated to increasingly lower levels and smaller subpopulations. Direct, one-time estimates from surveys designed primarily to meet national needs tend to be insufficiently precise for meeting these new policy needs. This is particularly true in the domain of poverty and social exclusion, the monitoring of which requires complex distributional statistics, necessarily based on intensive and relatively small-scale surveys of households and persons. This work addresses some statistical aspects relating to improving the sampling precision of such indicators in EU countries, in particular through the cumulation of data over rounds of regularly repeated national surveys.

    2. EU-SILC. The reference data for this purpose are the EU Statistics on Income and Living Conditions, the major source of comparative statistics on income and living conditions in Europe. A standard integrated design has been adopted by nearly all EU countries. It involves a rotational panel, with a new sample of households and persons introduced each year to replace one-fourth of the existing sample. Persons enumerated in each new sample are followed up in the survey for four years. The design yields each year a cross-sectional sample, as well as longitudinal samples of 2, 3 and 4 year duration.
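    As a hedged illustration of why cumulation improves precision (the numbers are invented, and this is not the actual EU-SILC estimation procedure, which must also account for rotational-panel overlap between rounds), an inverse-variance weighted pooling of per-round estimates reduces the variance of the combined estimate:

    ```python
    # Sketch: pool per-round poverty-rate estimates by inverse-variance
    # weighting. Validity of the pooled variance assumes independent rounds.
    rates = [0.172, 0.168, 0.175]         # estimated poverty rate per round
    variances = [1.6e-5, 1.5e-5, 1.7e-5]  # estimated sampling variance per round

    weights = [1.0 / v for v in variances]
    pooled_rate = sum(w * r for w, r in zip(weights, rates)) / sum(weights)
    pooled_variance = 1.0 / sum(weights)  # smaller than any single round's variance

    print(f"pooled rate: {pooled_rate:.4f}")
    print(f"pooled variance: {pooled_variance:.2e}")
    ```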