Result Analysis: Statistical Distributions: LRE Algorithm and Binomial Proportion CIs

Result Confidence Assessment and Trial Planning
Maciej Muehleisen, Ericsson (ERI)
29th of March, 5GCroCo Lunchtime Web-Seminar 1 (Hosted by 5G-PPP)

5G Cross Border Control
Innovation Action H2020-ICT-18-2018, Contract 825050
Cooperative, Connected and Autonomous Mobility (CCAM), a 5G-PPP Phase III Project

Before we Start…
• This presentation is being recorded and the recording will be shared
• Slides will be shared
• MATLAB scripts for presented methods are listed at the end

Outline
• About Me
• Background: 5GCroCo Deliverable D4.2
• Thoughts about Terms: PoC vs. Demo vs. Test vs. Trial
• Use Case: Anticipated Cooperative Collision Avoidance (ACCA)
• Trial Execution
  • How to determine if KPI requirements are achieved?
  • How to assure “identical” experiments
• Result analysis
  • Plausibility Checks
  • Student-t Confidence Intervals (CIs) using Batch Means Method
  • Statistical Distributions: Limited Relative Error (LRE) Algorithm and Binomial Proportion CIs
• Summary and Conclusion

About Me
Key research interest: Modelling, design, evaluation, and certification of highly reliable / safety critical communication systems
• 2008 – 2012 ComNets - RWTH Aachen University
  • Leading Communication Networks I Exercise
  • 4G IMT-Advanced Evaluation
  • Open Wireless Network Simulator developer
  • PhD research on “VoIP Performance of LTE Networks: VoLTE versus OTT” (2015)
• 2012 – 2016 ComNets - Hamburg University of Technology (TUHH)
  • Lecturer Communication Networks I
  • Group leader “Mobile & Vehicular Communication” (focus on aviation, maritime)
  • Sometimes acting group leader for “Sensor Networks and IoT” & “Future Internet and Network Planning”
• Since 2017 Ericsson Research Germany
  • Research Area “Networks” – Master Researcher - Industry Verticals Coordination (focus on automotive)
  • Coordination of tech. work in external associations (5GAA, AECC, ETSI-ITS) and projects (5GCroCo, 5GMOBIX, 5G-ROUTES, ART-04 SHOW)
  • Deputy Technical Coordinator 5GCroCo

5GCroCo Deliverable D4.2
• For the first version (v1.0) of 5GCroCo result Deliverable D4.2, many experiments did not go as expected
  • Many trials started later due to COVID, leaving no time to repeat them
  • Often too few samples were collected and/or measurement equipment failed (e.g. clock drift)
• v2.0 end of March; v3.0 June
• Second trial round starting in summer and lasting until end of December

Thoughts about Terms
Proof of Concept (PoC) and demo are usually the same, but sometimes the following applies:
• Demo: physically show the actual service/product; shortcuts can exist, e.g. Ethernet instead of 4G/5G
• PoC: can be the same as a demo; in an ICT context it often requires that the actual communication system is used, to prove it is capable of delivering the service
In 5GCroCo we strictly distinguish between tests and trials:
• Test: one or more components, or even all components, are checked for whether they behave as expected; can be by inspection or quantitative (measurement); typically done before setting up a demo, PoC or trial
• Trial: quantitative evaluation of the system by measuring previously defined KPIs in defined scenarios
  • KPIs are usually on application/service level
  • Further measurements can be collected to better understand/explain measured KPIs (“PIs”)
  • Quantitative test results (see bullet above) can also be used for better understanding of measured KPIs

Use Case: Anticipated Cooperative Collision Avoidance (ACCA)
• User Story 1: “stationary vehicle” broke down and sends a “Hazard Report” to the backend
• User Story 2: no “Hazard Report”, but the backend analyzes Cooperative Awareness Messages (CAMs) to detect the “stationary vehicle” and sends “Hazard Notifications”
• User Story 3: Backend analyzes CAMs to detect a traffic jam and sends “Hazard Notifications”
• Round 2 (second half 2021): Detection by vehicle sensors
Link to video: https://www.youtube.com/watch?v=jehWj4sq9Zc

Trial Execution: How to Determine if KPI Requirements are Achieved?
• Application Level Reliability: ≥99%
• Too late ➔ lost (1 s delay allowed)
• Counting intended receivers, e.g.:
  • FailureCount += 3 if Hazard Report lost
  • FailureCount += 1 for every lost Hazard Notification
[Figure: message flow via the ACCA Backend. Hazard Report (Renault uses ETSI-ITS DENM over UDP); Hazard Notification as JSON over MQTT for PSA and as DENM over UDP for Renault]
• How many “Hazard Reports” do we need to confidently determine the KPI?
  • 100 – 1000 “events” as a rule of thumb: a loss ratio of 1 / 100 ➔ 99% reliability, so (100 to 1000) / (10000 to 100000) ➔ 99%
• How long will it take?
  • 10000 to 100000 “hazards” if one vehicle receives
• Can we speed it up?
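
The rule of thumb above can be turned into a quick planning calculation. The following is a minimal sketch in Python (not one of the MATLAB scripts mentioned in the talk). The function names, the `receivers_per_hazard` parameter, and the 10 s pacing are illustrative assumptions, not values from the slides.

```python
import math


def required_trials(target_reliability: float, failure_events: int) -> int:
    """Messages to send so that `failure_events` losses are expected
    when the system only just meets `target_reliability`."""
    loss_ratio = 1.0 - target_reliability
    # round() guards against floating-point noise such as 9999.999999999991
    return math.ceil(round(failure_events / loss_ratio, 6))


def trial_duration_s(trials: int, mean_interarrival_s: float,
                     receivers_per_hazard: int = 1) -> float:
    """Rough trial duration in seconds.

    Assumption: every intended receiver of a hazard contributes one sample,
    so more receivers per hazard means fewer hazards are needed.
    """
    hazards = math.ceil(trials / receivers_per_hazard)
    return hazards * mean_interarrival_s


low = required_trials(0.99, 100)    # 100 loss events at 99% reliability
high = required_trials(0.99, 1000)  # 1000 loss events at 99% reliability
print(low, high)  # 10000 100000

# Hypothetical pacing: one hazard every 10 s, a single receiving vehicle
print(trial_duration_s(low, mean_interarrival_s=10.0) / 3600.0, "hours")
```

Treating several receivers of the same hazard as separate samples is a strong simplification, since their losses may be correlated; the sketch is only meant for a first estimate of trial length.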
Trial Execution: How to Determine if KPI Requirements are Achieved?
• Can we speed it up? Yes, but watch out for pitfalls
Tips and precautions when “tricking time”:
• Influence of power saving (DRX)
• The exponential distribution models “random occurrence” well
• It prevents unintended correlations from periodicities, e.g.:
  • Transmission Time Interval (TTI) slot boundaries
  • Time Division Duplex (TDD) frame durations
• Gradually decrease the mean interarrival time to check if results remain the same
• Control Plane timeouts
• Network overload

How to Assure “Identical” Experiments
• In simulation everything can be the same except for random number initialization
• In real-world trials you can just try your best, esp. on public roads
• Static test in perfect radio conditions before drive testing
• Ping and iperf tests for preparation
• Repeat the experiments with changing parameters (e.g. 4G vs. 5G) directly one after another
• Keep antenna placement identical
• Adjust trial duration and path to the “largest time-scale effect” (usually radio channel quality from distance)
• Think about what impacts the KPIs and create scenarios accordingly (see 5GCroCo Deliverable D4.1 Section 6.3 and D4.2 “Influence on KPIs” sections for each use case)

Result Analysis: Plausibility Checks
• Check for known maxima and minima:
  • Maximum throughput according to iperf/nuttcp and/or known Spectral Efficiency
  • Minimum latency / round trip time according to Medium Access Control (MAC) protocol and RAN config
• First check time series (raw samples) before doing statistical analysis; consider publishing them
• Check timestamps of all nodes for monotonic increase
• Do not consider erroneous samples in analysis but explain why/how you censored
• Or just let go of it and repeat the trial
• “Anything that can go wrong, will go wrong”, Edward A. Murphy, Aerospace Engineer

D4.2 v1.0 Table 3-26: Application Level Latency

User Story  Test Case  # Samples  # Samples Considered  Mean [ms]  # CI Batches  CI [ms]  CI / Mean
1           PSA➔PSA    444        230                   638.0      5             ± 12.6   2.0 %
1           PSA➔RSA    130        115                   633.1      5             ± 6.9    1.1 %
1           RSA➔RSA    167        155                   625.7      5             ± 4.3    0.7 %
1           RSA➔PSA    204        95                    608.9      5             ± 13.8   2.3 %
3           TMS➔PSA    894        655                   474.6      10            ± 5.7    1.2 %
3           TMS➔RSA    243        195                   41.8       10            ± 1.7    4.1 %

➔ Determine required time for trials and take it times 3
• Try 1: get familiar with each other and the equipment
• Try 2: Good results, but some failed experiments

[Table from D4.2. Columns: User Story, Test Case, # Samples, # Samples Considered, Mean [ms], # Confidence Interval (CI) Batches, CI [ms], CI / Mean, Max. LRE Confidence with Rel. Error Below 5% [ms] / Percentile. Rows for User Story 1: PSA1➔PSA1 with 4445 samples and PSA2➔PSA2 with 444 samples. All remaining cells are replaced by the note: “Due to several problems, summarized in Section 4.3 together with solutions that are being applied, these results cannot be processed or filtered to allow a sensible analysis”]
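
The CI columns of Table 3-26 correspond to the Student-t batch means method named in the outline. A minimal Python sketch of that method is given below. It assumes a 95% confidence level (the slides do not state the level) and equally sized consecutive batches, and it hard-codes only the two Student-t quantiles needed for 5 and 10 batches. The function name and the synthetic input are illustrative, not trial data.

```python
import random
import statistics

# Two-sided 95% Student-t quantiles t(0.975, df) for df = n_batches - 1
T_975 = {4: 2.776, 9: 2.262}


def batch_means_ci(samples, n_batches):
    """Return (mean, half_width) of a 95% CI using the batch means method."""
    batch_size = len(samples) // n_batches
    if batch_size < 1:
        raise ValueError("need at least one sample per batch")
    # Consecutive, non-overlapping batches; leftover samples at the end are dropped
    means = [
        statistics.fmean(samples[i * batch_size:(i + 1) * batch_size])
        for i in range(n_batches)
    ]
    grand_mean = statistics.fmean(means)
    # Standard error of the grand mean, estimated from the spread of batch means
    std_err = statistics.stdev(means) / n_batches ** 0.5
    return grand_mean, T_975[n_batches - 1] * std_err


# Synthetic latencies in ms, for illustration only
rng = random.Random(1)
latencies = [rng.gauss(630.0, 40.0) for _ in range(230)]
mean, half = batch_means_ci(latencies, n_batches=5)
print(f"mean = {mean:.1f} ms, CI = ±{half:.1f} ms, CI/mean = {100 * half / mean:.1f} %")
```

Averaging within batches is what makes the Student-t interval defensible for correlated raw samples: the batch means are closer to independent and normally distributed than the individual latencies.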

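
The earlier advice on “tricking time” (exponentially distributed gaps, then a gradually reduced mean interarrival time) can be sketched as a send schedule. This is an illustrative Python sketch; the function name, the event count, and the mean gaps of 10 s, 5 s and 1 s are assumptions and do not come from the slides.

```python
import random


def exponential_schedule(n_events, mean_interarrival_s, seed=None):
    """Return n_events send times in seconds with exponential interarrival gaps."""
    rng = random.Random(seed)
    t = 0.0
    times = []
    for _ in range(n_events):
        # expovariate takes the rate (1 / mean), not the mean
        t += rng.expovariate(1.0 / mean_interarrival_s)
        times.append(t)
    return times


# Gradually shorten the mean gap and compare the measured KPI between runs
for mean_gap in (10.0, 5.0, 1.0):  # hypothetical values in seconds
    schedule = exponential_schedule(1000, mean_gap, seed=42)
    print(mean_gap, round(schedule[-1] / len(schedule), 2))
```

Because the gaps are random, send times do not lock onto TTI or TDD frame boundaries the way a fixed period could.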