Result Analysis: Statistical Distributions: LRE Algorithm and Binomial Proportion CIs
Result Confidence Assessment and Trial Planning
Maciej Muehleisen, Ericsson (ERI)
29th of March, 5GCroCo Lunchtime Web-Seminar (Hosted by 5G-PPP)
5G Cross Border Control, Innovation Action H2020-ICT-18-2018, Contract 825050
Cooperative, Connected and Autonomous Mobility (CCAM), a 5G-PPP Phase III Project

Before We Start…
• This presentation is being recorded, and the recording will be shared
• Slides will be shared
• MATLAB scripts for the presented methods are listed at the end

Outline
• About Me
• Background: 5GCroCo Deliverable D4.2
• Thoughts about Terms: PoC vs. Demo vs. Test vs. Trial
• Use Case: Anticipated Cooperative Collision Avoidance (ACCA)
• Trial Execution
  • How to determine if KPI requirements are achieved?
  • How to assure “identical” experiments
• Result Analysis
  • Plausibility Checks
  • Student-t Confidence Intervals (CIs) using the Batch Means Method
  • Statistical Distributions: Limited Relative Error (LRE) Algorithm and Binomial Proportion CIs
• Summary and Conclusion

About Me
Key research interest: modelling, design, evaluation, and certification of highly reliable / safety-critical communication systems
• 2008 – 2012 ComNets - RWTH Aachen University
  • Leading the Communication Networks I exercise
  • 4G IMT-Advanced Evaluation
  • Open Wireless Network Simulator developer
  • PhD research on “VoIP Performance of LTE Networks: VoLTE versus OTT” (2015)
• 2012 – 2016 ComNets - Hamburg University of Technology (TUHH)
  • Lecturer Communication Networks I
  • Group leader “Mobile & Vehicular Communication” (focus on aviation, maritime)
  • Sometimes acting group leader for “Sensor Networks and IoT” & “Future Internet and Network Planning”
• Since 2017 Ericsson Research Germany
  • Research Area “Networks” – Master Researcher - Industry Verticals Coordination (focus on automotive)
  • Coordination of technical work in external associations (5GAA, AECC, ETSI-ITS) and projects (5GCroCo, 5GMOBIX, 5G-ROUTES, ART-04 SHOW)
  • Deputy Technical Coordinator 5GCroCo

5GCroCo Deliverable D4.2
• For the first version (v1.0) of 5GCroCo result Deliverable D4.2, many experiments did not go as expected
  • Many trials started late due to COVID, leaving no time to repeat them
  • Often too few samples were collected and/or measurement equipment failed (e.g. clock drift)
• v2.0 end of March; v3.0 June
• Second trial round starting in summer and lasting until end of December

Thoughts about Terms
Proof of Concept (PoC) and demo are usually the same, but sometimes the following applies:
• Demo: physically shows the actual service/product; shortcuts can exist, e.g. Ethernet instead of 4G/5G
• PoC: can be the same as a demo; in the ICT context it often requires that the actual communication system is used, to prove that it is capable of delivering the service
In 5GCroCo we strictly distinguish between tests and trials:
• Test: one or more components, or even all components, are evaluated to check whether they behave as expected; can be by inspection or quantitative (measurement); typically done before setting up a demo, PoC or trial
• Trial: quantitative evaluation of the system by measuring previously defined KPIs in defined scenarios
  • KPIs are usually on application/service level
  • Further measurements (“PIs”) can be collected to better understand/explain the measured KPIs
  • Quantitative test results (see the Test bullet above) can also be used to better understand the measured KPIs

Use Case: Anticipated Cooperative Collision Avoidance (ACCA)
• User Story 1: a “stationary vehicle” has broken down and sends a “Hazard Report” to the backend
• User Story 2: no “Hazard Report”, but the backend analyzes Cooperative Awareness Messages (CAMs) to detect the “stationary vehicle” and sends “Hazard Notifications”
• User Story 3: the backend analyzes CAMs to detect a traffic jam and sends “Hazard Notifications”
• Round 2 (second half 2021): detection by vehicle sensors
Link to video: https://www.youtube.com/watch?v=jehWj4sq9Zc

Trial Execution: How to Determine if KPI Requirements are Achieved?
• Application Level Reliability: ≥99%
• Too late ➔ lost (1 s delay allowed)
• Counting intended receivers, e.g. 3:
  • FailureCount += 3 if the Hazard Report is lost
  • FailureCount += 1 for every lost Hazard Notification
[Diagram: Hazard Report to the ACCA Backend (JSON over MQTT for PSA; Renault uses ETSI-ITS DENM over UDP); Hazard Notifications from the ACCA Backend to the receivers (DENM over UDP for Renault)]
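The receiver-weighted FailureCount rule above can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the project's evaluation script: the log record HazardEvent, its fields, and the helper names are hypothetical; each hazard is assumed to have a fixed number of intended receivers (3, as in the slide example); the 1 s budget is assumed to apply end-to-end; and reliability is assumed to be 1 − FailureCount / (hazards × receivers), a denominator the slides do not state explicitly.

```python
from dataclasses import dataclass, field
from typing import List, Optional

DEADLINE_S = 1.0  # "Too late => lost": 1 s delay allowed (assumed end-to-end)


@dataclass
class HazardEvent:
    """Hypothetical log record for one hazard (None = never received)."""
    report_latency_s: Optional[float]
    # Assumed: one end-to-end latency entry per intended receiver.
    notification_latencies_s: List[Optional[float]] = field(default_factory=list)


def is_lost(latency_s: Optional[float]) -> bool:
    """A message counts as lost if it never arrived or arrived after the deadline."""
    return latency_s is None or latency_s > DEADLINE_S


def count_failures(events: List[HazardEvent], receivers: int = 3) -> int:
    failure_count = 0
    for ev in events:
        if is_lost(ev.report_latency_s):
            # Lost Hazard Report: none of the intended receivers can be notified.
            failure_count += receivers
        else:
            # One failure per lost or too-late Hazard Notification.
            failure_count += sum(is_lost(t) for t in ev.notification_latencies_s)
    return failure_count


def application_reliability(events: List[HazardEvent], receivers: int = 3) -> float:
    # Assumption: every hazard offers `receivers` delivery opportunities.
    return 1.0 - count_failures(events, receivers) / (len(events) * receivers)
```

For example, 100 hazards with one lost Hazard Report and two lost or late notifications give FailureCount = 3 + 2 = 5 out of 300 opportunities, i.e. about 98.3 %, which misses the ≥99% requirement.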
Trial Execution: How to Determine if KPI Requirements are Achieved?
• Application Level Reliability: ≥99%
• How many “Hazard Reports” do we need to confidently determine the KPI?
  ● 100 to 1000 (failure) “events” as a rule of thumb: 1 failure in 100 = 99% ➔ (100 to 1000) failures in (10000 to 100000) = 99%
• How long will it take?
  ● 10000 to 100000 “hazards” if one vehicle receives
• Can we speed it up?
  ● Yes, but watch out for pitfalls

Trial Execution: How to Determine if KPI Requirements are Achieved?
Tips and precautions when “tricking time”:
[Figure: Influence of power saving (DRX)]
• The exponential distribution models “random occurrence” well
• It prevents unintended correlations with periodicities, e.g.:
  • Transmission Time Interval (TTI) slot boundaries
  • Time Division Duplex (TDD) frame durations
• Gradually decrease the mean interarrival time to check whether the results remain the same
  • Control Plane time-outs
  • Network overload

How to Assure “Identical” Experiments
• In simulation everything can be the same except for the random number initialization
• In real-world trials you can only do your best, especially on public roads
  • Static test in perfect radio conditions before drive testing
  • Ping and Iperf tests for preparation
  • Repeat the experiments with changing parameters (e.g. 4G vs. 5G) directly one after another
  • Keep the antenna placement identical
  • Adjust trial duration and path to the “largest time-scale effect” (usually radio channel quality depending on distance)
• Think about what impacts the KPIs and create corresponding scenarios (see 5GCroCo Deliverable D4.1 Section 6.3 and the D4.2 “Influence on KPIs” sections for each use case)

Result Analysis: Plausibility Checks
• Check for known maxima and minima:
  • Maximum throughput according to iperf/nuttcp and/or known spectral efficiency
  • Minimum latency / round trip time according to the Medium Access Control (MAC) protocol and RAN configuration
• First check the time series (raw samples) before doing statistical analysis; consider publishing them
• Check that the timestamps of all nodes increase monotonically
• Exclude erroneous samples from the analysis, but explain why and how you censored them
• Or just let go of it and repeat the trial
• “Anything that can go wrong, will go wrong”, Edward A. Murphy, Aerospace Engineer

User Story | Test Case | # Samples | # Samples Considered | Mean [ms] | # Confidence Interval (CI) Batches | CI [ms] | CI / Mean
1          | PSA➔PSA   | 444       | 230                  | 638.0     | 5                                  | ± 12.6  | 2.0 %
1          | PSA➔RSA   | 130       | 115                  | 633.1     | 5                                  | ± 6.9   | 1.1 %
1          | RSA➔RSA   | 167       | 155                  | 625.7     | 5                                  | ± 4.3   | 0.7 %
1          | RSA➔PSA   | 204       | 95                   | 608.9     | 5                                  | ± 13.8  | 2.3 %
3          | TMS➔PSA   | 894       | 655                  | 474.6     | 10                                 | ± 5.7   | 1.2 %
3          | TMS➔RSA   | 243       | 195                  | 41.8      | 10                                 | ± 1.7   | 4.1 %
D4.2 v1.0 Table 3-26: Application Level Latency

➔ Determine the required time for trials and multiply it by 3:
• Try 1: get familiar with each other and the equipment
• Try 2: good results, but some failed experiments

User Story | Test Case | # Samples | # Samples Considered | Mean [ms] | # Confidence Interval (CI) Batches | CI [ms] | CI / Mean | Max. LRE Confidence with Rel. Error Below 5% [ms] / Percentile
1          | PSA1➔PSA1 | 4445      | Due to several problems, summarized in Section 4.3 together with solutions that are being applied, these results cannot be processed or filtered to allow a sensible analysis
1          | PSA2➔PSA2 | 444       | (same as above)
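The “tricking time” tip (exponentially distributed interarrival times, then gradually decreasing the mean) could be scripted roughly as below. This is a hedged sketch: the function name hazard_trigger_times and the example means are hypothetical, and only Python's standard random module is used. Exponential gaps form a Poisson process, which is memoryless, so trigger instants do not lock onto periodic structures such as TTI slot boundaries or TDD frames.

```python
import random
from typing import List, Optional


def hazard_trigger_times(mean_interarrival_s: float, duration_s: float,
                         seed: Optional[int] = None) -> List[float]:
    """Trigger instants in [0, duration_s) with exponentially distributed gaps.

    Hypothetical helper: the instants would drive when a test vehicle
    sends its Hazard Report.
    """
    rng = random.Random(seed)           # fixed seed => reproducible schedule
    rate = 1.0 / mean_interarrival_s    # expovariate() expects the rate, not the mean
    times: List[float] = []
    t = rng.expovariate(rate)
    while t < duration_s:
        times.append(t)
        t += rng.expovariate(rate)
    return times


# Gradually decrease the mean interarrival time; the KPIs measured at each
# setting should stay the same (example means are illustrative only).
for mean_s in (30.0, 10.0, 3.0):
    schedule = hazard_trigger_times(mean_s, duration_s=3600.0, seed=42)
    print(f"mean {mean_s:5.1f} s -> {len(schedule)} hazards in one hour")
```

If results change as the mean shrinks, effects such as DRX power saving, Control Plane time-outs, or network overload are likely starting to dominate.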
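The timestamp plausibility check can be automated with a few lines. This is a sketch, not the 5GCroCo tooling: it assumes each node's log has already been parsed into a list of timestamps in logging order, and the node names are hypothetical. A backwards step usually points to a clock correction or clock drift, which the D4.2 v1.0 experience lists among the failure causes.

```python
from typing import Dict, List


def non_monotonic(timestamps_by_node: Dict[str, List[float]],
                  strict: bool = True) -> Dict[str, List[int]]:
    """Per node, indices i where timestamp i does not increase over timestamp i-1.

    strict=True also flags repeated values; use strict=False if the clock
    resolution can legitimately produce equal consecutive timestamps.
    """
    result: Dict[str, List[int]] = {}
    for node, ts in timestamps_by_node.items():
        bad = [i for i in range(1, len(ts))
               if (ts[i] <= ts[i - 1] if strict else ts[i] < ts[i - 1])]
        if bad:
            result[node] = bad
    return result
```

Samples around the flagged indices are candidates for censoring, which, per the slide, should then be explained in the analysis.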
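The “# Confidence Interval (CI) Batches”, “CI [ms]” and “CI / Mean” columns in Table 3-26 come from Student-t CIs computed with the batch means method. A minimal sketch follows, assuming a 95 % confidence level (the slides do not state the level), consecutive non-overlapping batches, and batch means that are approximately independent and normal. The t quantiles are hard-coded to stay within the standard library; with SciPy, scipy.stats.t.ppf(0.975, df) gives the same values. The function name is hypothetical and this is not necessarily how the MATLAB scripts mentioned at the start compute it.

```python
import math
import statistics
from typing import Sequence, Tuple

# Two-sided 95 % Student-t quantiles t_{0.975, df}, df = number of batches - 1.
T_975 = {1: 12.706, 2: 4.303, 3: 3.182, 4: 2.776, 5: 2.571, 6: 2.447, 7: 2.365,
         8: 2.306, 9: 2.262, 10: 2.228, 11: 2.201, 12: 2.179, 13: 2.160,
         14: 2.145, 15: 2.131, 16: 2.120, 17: 2.110, 18: 2.101, 19: 2.093,
         20: 2.086}


def batch_means_ci(samples: Sequence[float], n_batches: int) -> Tuple[float, float]:
    """Return (mean, half_width) of a 95 % Student-t CI via batch means."""
    if n_batches - 1 not in T_975:
        raise ValueError("n_batches must be between 2 and 21")
    batch_size = len(samples) // n_batches
    if batch_size < 1:
        raise ValueError("fewer samples than batches")
    # Consecutive, non-overlapping batches; leftover samples are dropped.
    means = [statistics.fmean(samples[i * batch_size:(i + 1) * batch_size])
             for i in range(n_batches)]
    grand_mean = statistics.fmean(means)
    s = statistics.stdev(means)  # sample standard deviation of the batch means
    half_width = T_975[n_batches - 1] * s / math.sqrt(n_batches)
    return grand_mean, half_width
```

Applied to the considered latency samples of one test case with 5 or 10 batches, half_width corresponds to the “± CI [ms]” column and half_width / mean to “CI / Mean”. Batching reduces the correlation between consecutive samples, which a plain per-sample CI would ignore.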
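The “100 to 1000 events” rule of thumb for the reliability KPI can be checked with a binomial proportion CI. The sketch below uses the Wilson score interval, one of several binomial proportion CIs (Clopper-Pearson is a common exact alternative); whether the presented MATLAB scripts use Wilson is not stated, so treat this as a swapped-in illustration. It covers only binomial proportion CIs, not the LRE algorithm. Only the standard library is used (statistics.NormalDist for the normal quantile), and the function name is hypothetical.

```python
import math
from statistics import NormalDist
from typing import Tuple


def wilson_ci(failures: int, trials: int, confidence: float = 0.95) -> Tuple[float, float]:
    """Wilson score interval for a binomial proportion (here: the failure ratio)."""
    if trials <= 0 or not 0 <= failures <= trials:
        raise ValueError("need trials > 0 and 0 <= failures <= trials")
    z = NormalDist().inv_cdf(0.5 + confidence / 2)  # e.g. ~1.96 for 95 %
    p_hat = failures / trials
    denom = 1 + z * z / trials
    center = (p_hat + z * z / (2 * trials)) / denom
    half = z * math.sqrt(p_hat * (1 - p_hat) / trials
                         + z * z / (4 * trials * trials)) / denom
    return max(0.0, center - half), min(1.0, center + half)


# Same 1 % failure ratio, increasing number of observed failure events.
for failures, trials in ((1, 100), (100, 10000), (1000, 100000)):
    lo, hi = wilson_ci(failures, trials)
    print(f"{failures}/{trials}: reliability in [{1 - hi:.4f}, {1 - lo:.4f}]")
```

With 1 failure in 100 trials the 95 % interval for the failure ratio spans roughly 0.2 % to 5.4 %, so nothing can be concluded about a 99% target; with 100 failures in 10000 trials it narrows to roughly 0.8 % to 1.2 %, which illustrates why about 100 or more failure events are needed.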