Definition and Schedule of MB-NG Tasks

MB-NG Project Document

Document title: MB-NG Technical Goals (“Experiments”) for QoS and Managed Bandwidth

Relevant TASK (if any): TASK-1

Document number: MB-NG-Doc-T1-1.3-TechnicalGoals

Date: 30-July-2002

Status: Version 1.3 FINAL

Description: This document defines the high level technical goals of the project in the QoS and Managed Bandwidth sector.

The purpose is to be specific and complete enough to allow the detailed work of all other tasks (TASK-2 onward) to be carried out.

It is the “output” of Task-1 as defined in the project task definition document.

A similar addendum document will be prepared for the high throughput programme.

Co-ordinating Editors: P.Clarke, J.Sharp

Change Record:

- 17-6-02: Revise Draft in light of comments from J.Sharp, J.Crowcroft, R.Tasker

- 05-07-02: F. Saka, A Di Donato, Y. Li

- 06-07-02: P.Clarke re-vamps

- 08-07-02: Modifications by R.Tasker

- 30-07-02: Final consolidation of additions from J.Sharp

1 Introduction

Task 1. Specific technical goals (MB-NG-Doc-T1-1.1-TechnicalGoals.doc)

This document forms the output of Task-1.

Its purpose is to describe a baseline set of experiments which MB-NG will perform. These descriptions should be specific enough to allow subsequent tasks to identify all necessary “components” and define and implement technical solutions for those “components”. The descriptions should not be so specific as to unduly constrain or pre-empt technical choices (the distinction is of course a bit arbitrary, and will therefore inevitably lead to some ambiguity). A good gauge is that these descriptions should be at about the level one might include in some future high profile presentation about what the project has done and why it has been a success.

With this in mind the document contains two main sections:

- Section 2 contains pre-amble and generalities.

- Section 3 contains the main thrust of the document, attempting to describe the “experiments” as envisioned.

2 Pre-Amble

This section contains some (pretty obvious) pre-amble, just to remind us all of what was written in the original proposal and choices made since.

2.1 Reminder of the “High Level Goals”

The following is based upon the description of high level goals listed in the original proposal. It is somewhat expanded to provide a little more detail to correspond to the position we are starting from now.

The “mission statement” of MB-NG is:

1. To demonstrate e2e managed bandwidth services in a multi-domain environment, in the context of Grid project requirements. Also, implicitly, to determine how well QoS works and to develop suitable policies and potentially separate service level specifications for specific applications.

2. To investigate and develop high performance data transport mechanisms for Grid data transfer across heterogeneous networks.

Both the e2e and multi-domain aspects are crucial – whatever managed bandwidth model is used, it has to be shown to work between components of Grid application software, and to address the issues which would arise if different administrative domains attempted to configure inter-domain services according to SLSs. Managed bandwidth in this context is the provision of protected bandwidth channels between defined endpoints.

In more detail the principal goals are:

- to demonstrate configuration of managed bandwidth (MB) services separately within each of several independent administrative domains;

- to demonstrate interoperation of MB between multiple administrative domains simulating a Core network and local site networks;

- to consider the policy issues associated with the provision of MB and in particular define a Service Level Specification (SLS) for MB within each domain and an end-to-end SLA within the JANET context. It is envisaged that the SLSs and the SLA will be defined at the start of the experimentation and will be revised as appropriate following practical experience;

- to demonstrate use of MB services by Grid applications traffic, i.e. using real Grid high and low priority traffic classes;

- to interface MB services to Grid middleware APIs in some way to be specified (i.e. some notification from API to at least ingress);

- to provide a vehicle for UKERNA to gain experience of the use of MPLS for traffic engineering within SuperJANET;

- to allow UKERNA to implement a mechanism to effectively replace the previous ATM managed bandwidth service;

- where possible to demonstrate end-to-end network services to the USA in collaboration with US Grid groups;

- where possible to demonstrate end-to-end network services to CERN as part of our EU-DataGrid commitments.

The actual “deliverables” derived from the original proposal are: UCL

For end-to-end QoS and MPLS configuration in the Core:

- Month 3: Procurement and installation of equipment.

- Month 12: Initial demonstration of end-to-end guaranteed bandwidth and QoS. Interim report. Presentation of results at network venues.

- Month 12: Final report on use of MPLS in the core.

- Month 18: Advanced demonstration including use of other QoS techniques.

- Month 24: Final report. Presentation of results at networking venues.

For UK <=> US and UK <=> CERN tests:

- Month 3: Investigation into partnerships.

- Month 12: Interim report on progress and tests made to date.

- Month 24: Final report.

For high throughput:

- Month 9: Demonstration of reliable transport at > 100 Mbit/s over WAN.

- Month 18: Demonstration of reliable transport at > 1 Gbit/s.

We may wish to vary some of these deliverables, with justification, in the light of the detailed planning we are now doing. Otherwise we should assume that we will stick to them in general terms insofar as possible, and so task leaders should explicitly factor these deliverables into the work which is about to take place to define tasks in detail.

Figure 1: The MB-NG Infrastructure

2.2 Infrastructure and basic technology choices

The MB-NG infrastructure is shown in figure 1. It shows a “Core” network and three “Access” networks. These are labelled as AD(C) – for Administrative domain-Core and AD(M) for Administrative domain–MAN.

Having said that this document shouldn’t constrain detailed technology choices, we have nevertheless already de-facto chosen to base MB-NG upon the use of some underlying technologies. These are listed here just to set the context for the main section:

- Diffserv based QoS, and in particular the use of EF PHB as a baseline (AF may be used if we so decide later).

- QBSS “Scavenger service” less than best efforts PHB.

- The use of MPLS in the core network where appropriate, and in the Access networks if time permits.

- The TCP/IP protocol suite.

- The capabilities of the CISCO routing equipment which has been specified.

2.3 General approach to all experiments

We must clearly illustrate the problems we are aiming to solve, i.e. illustrate the performance of our chosen “applications” in a best effort environment. The experimental programme will be designed to provide further insight and hopefully solutions to these problems.

With respect to Table 1, this baseline corresponds to no QoS being selected in either the access or the core networks. These experiments must be performed with varying background traffic (load, number of streams, traffic pattern). The definition of background traffic should be made in Task 2.

Given that the mechanisms for QoS provisioning we will be using are predefined and, more importantly, widely accepted, the focus of our work will be an assessment of how well these mechanisms work and the discovery of efficient configurations (congestion control, queue management techniques, scheduling techniques, and monitoring and policing) for deployment. On the safe assumption that QoS and traffic engineering mechanisms do what they are designed to do, our goals will have been achieved when we know the conditions under which QoS and traffic engineering work effectively from end to end, and how they help.

In general the approach will be to:

- Reveal the problem with best effort applied to all traffic classes using simple synthesised (background and emulated application) traffic. What are the best and worst case conditions?

- Demonstrate with the use of simple synthesised traffic that there is an unambiguous improvement in the QoS enabled network over the best effort network.

- Demonstrate with realistically synthesised background and probe traffic.

- Demonstrate with real application traffic. Contrast and compare with the best effort model. (A description of possible types of HEP traffic is given in an appendix.)

The experiments should be carried out using different policy configurations and background traffic, and should be run between different end-points in different combinations.
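To make the intended contrast between the best effort baseline and a QoS enabled network concrete, the following is a minimal sketch of a toy fluid model. It is an illustration only and not a model of any router in the testbed: the proportional-sharing behaviour under best effort, the strict-priority behaviour under QoS, the class names and all numbers are assumptions made for the example.

```python
def best_effort_share(capacity, demands):
    """Toy best-effort model (assumption): when the link is overloaded,
    every class receives capacity in proportion to its offered load."""
    total = sum(demands.values())
    if total <= capacity:
        return dict(demands)
    return {name: capacity * load / total for name, load in demands.items()}


def strict_priority_share(capacity, demands, order):
    """Toy QoS model (assumption): classes are served in priority order,
    each taking what it needs from whatever capacity is left."""
    remaining = capacity
    shares = {}
    for name in order:
        shares[name] = min(demands[name], remaining)
        remaining -= shares[name]
    return shares


# Hypothetical offered loads in Mbit/s on a hypothetical 1000 Mbit/s link.
demands = {"EF": 100.0, "BE": 800.0, "LBE": 600.0}
capacity = 1000.0

baseline = best_effort_share(capacity, demands)
with_qos = strict_priority_share(capacity, demands, ["EF", "BE", "LBE"])

for name in demands:
    print(f"{name}: best effort {baseline[name]:.1f}, with QoS {with_qos[name]:.1f} Mbit/s")
```

In this sketch the low loss class is squeezed in the overloaded best effort case but receives its full offered load once it is prioritised, which is the kind of unambiguous improvement the real measurements are meant to show or refute.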

As an example, Table 1 illustrates the parts of the network and the type of QoS/traffic engineering combinations we could decide to run.

Phase   End host   Access Network   Core Network
1       BE         BE               BE
2       BE         IP QoS           BE
3       BE         IP QoS           IP QoS
4       BE         IP QoS           MPLS QoS
6       BE         MPLS QoS         MPLS QoS

Table 1: The different QoS enabling techniques and where in the network they could be implemented.

It is left to the detailed planning of each task to justify which combinations are sensible and worth investigating.

2.4 “Traffic classes”

Specific traffic classes which we can foresee using in this project depend upon the application. They are also enumerated in general terms in the QoS think tank report (www.ja.net/development/qos/), which maps traffic classes into service classes.

Within MB-NG we will perform most experiments using concurrent traffic classes which are appropriate for:

- a low loss and delay service,

- a best efforts service, and

- a less than best efforts service.

For experiments using only simulated traffic, one of the first technical tasks will be to define simulated traffic which is appropriate to test these three services.

For experiments with real applications we will need to select application traffic classes suitable for each service. Some possible real HEP traffic classes are shown in the appendix.

In both cases we will be looking for traffic classes with different requirements upon:

- Delay: Some applications may be tolerant to delays and others may have strict bounds on delays.

- Jitter: Some applications have strict requirements for low jitter. Jitter is generally alleviated by buffering.

- Loss: Some applications may be tolerant to loss and others may use recovery mechanisms to help deal with any losses by the network. Other applications may be simply intolerant to any loss (e.g. signalling and network configuration data).

- Throughput: Applications differ widely in their requirements of end-to-end throughput. These range from short bursts of low throughput traffic to long and sustained bursts of traffic.

3 Experiments

This section is meant to be the main focus of the document, listing the series of experiments MB-NG will perform.

The intention is to be specific enough to guide detailed work in other tasks, but not be so specific as to unduly constrain implementation/technology.

The purpose of trying to write this all down here is so that we all have some common (agreed) idea of where we are heading over the next two years.

This should be regarded as the baseline. We might decide to vary these later if we have good reason, but such should only be done at a formal meeting, and then recorded appropriately.

Hence until such time the following experiments define the MB-NG project.

3.1 Experiment 0: Component Characterisation

Experiment 0 is perhaps obvious, but it is nevertheless felt to be useful to write it down explicitly to emphasise the importance.

We should perform a systematic study of each hardware and software component as we build up from the very simplest configuration (i.e. back to back PCs) to the full infrastructure with e2e best efforts capability.

We will measure, calibrate and understand:

a. the baseline performance of each end node.

b. the baseline performance of each switch/router within the system.

c. the collective effect observed as components are strung together

3.2 Experiment 1: Basic e2e QoS using simulated traffic

Description:

This experiment will use simulated traffic to demonstrate the effect of QoS mechanisms configured upon the full MB-NG infrastructure. Simulated application traffic will be produced by PCs or specialist equipment connected at the edges of the Access networks. Simulated background traffic may be injected at all sensible points. Various combinations of concurrent traffic classes, end points and QoS configurations will be used in a series of tests. The effects on each traffic class will be measured and hence the effect of QoS demonstrated. The measurements should include both network level metrics (i.e. low level measurements made at nodes) and e2e application level metrics (i.e. some measures of the effect which would be observed by an end user).

In more detail the experiment will include the following steps:

- Understand some real application traffic profile requirements.

- Define a set of traffic profiles/classes which are suitable to simulate real applications and background flows, and which can be assigned to the different prioritised services.

- Define and implement a classification policy mechanism at ingress, and policies etc. between the domains. Note: measurement should be deployed as viewed from both of the following levels, including qualitative visual measures:
  o Application level
  o Network level

- Develop generation and measurement mechanisms for all traffic.

- Run a baseline “problem” scenario, i.e. with only best efforts configured, choose a traffic generation scheme which induces a “problem” which can be observed and quantified.

- Run all sensible application traffic combinations with different conditions and observe effects. Our parameter space consists of:
  o End nodes (MAN, RAL, UCL)
  o Background traffic
  o QoS configuration
    . Congestion control
    . Queue management techniques
    . Scheduling
    . Policing
    . Monitoring
  o Symmetric and asymmetric QoS

- Observe the interplay between traffic flows assigned into different priority classes.

Metrics and quantities to be measured to illustrate the achieved QoS will be defined in Task 2. These may include low level network metrics such as packet loss, RTT and re-transmits, but MUST also include metrics more related to the user perception of performance (e.g. time for job to run, some “qualitative” measure of VC performance...).

Results should be expressed quantitatively as well as visually (for presentation and report purposes).
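The parameter space described in the steps above grows quickly, so it may help to enumerate it explicitly before deciding which combinations are sensible. The following is a minimal sketch: only the end node names come from this document, while the background load levels, the QoS configuration labels and the function name are hypothetical placeholders that Task 2 would replace.

```python
from itertools import combinations, product

END_NODES = ["MAN", "RAL", "UCL"]                  # from this document
BACKGROUND_LOAD = ["none", "moderate", "heavy"]    # hypothetical levels
QOS_CONFIG = ["BE only", "IP QoS", "MPLS QoS"]     # hypothetical labels
SYMMETRY = ["symmetric", "asymmetric"]


def build_run_matrix():
    """Return one dictionary per candidate test run (full cross product)."""
    paths = combinations(END_NODES, 2)  # unordered pairs of end nodes
    return [
        {"src": a, "dst": b, "background": bg, "qos": qos, "symmetry": sym}
        for (a, b), bg, qos, sym in product(paths, BACKGROUND_LOAD, QOS_CONFIG, SYMMETRY)
    ]


runs = build_run_matrix()
print(len(runs), "candidate runs; first:", runs[0])
```

Even with these few assumed values the full cross product is already several dozen runs, which is one reason the choice of sensible combinations is left to detailed task planning.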

3.3 Experiment 2: Basic e2e QoS using a HEP application

Description

This experiment is identical to experiment-1, except that we use real HEP application traffic instead of simulated traffic.

The details are the same as for experiment-1 with the addition of

- Identify suitable HEP application traffic for assignment to the different prioritised services. See Appendix A.

- Identify where these will be sourced from (presumably existing application specific machines in various places) and perform the necessary negotiation and engineering to connect these sources to the MB-NG infrastructure for scheduled periods.

3.4 Experiment 3: Access to QoS from applications

Description:

This experiment builds upon experiments 1 and 2.

The aim of this experiment is to demonstrate that suitable access to QoS services can be made through middleware, and hence be available at the application level. Exactly what form of QoS services are accessed, and how, cannot be specified until some research is done, and therefore this experiment is cast in general terms. The following is therefore a suggested baseline set of objectives:

We will identify (or write) a suitable API. This may be very simple, e.g. it may simply give access to some portion of a resource at the edges, assuming the Access network and Core network are already configured (i.e. perhaps as simple as setting some TOS bits). This will be installed on the end nodes and will be tested initially using simulated traffic as per experiment-1. Assuming this is successful we will identify and equip one or more applications to use it and demonstrate it working as per experiment-2.

This experiment may change its form substantially as the relevant task is executed.

In more detail steps are:

- Review existing QoS APIs and determine suitability. Some leads are GARA (already known to CISCO – Volker Sander), AQUILLA (EU industrial project) , the GGF WG and associated document being published, others ???.

- Review our applications and form some initial working decisions on what class of applications should get API access to QoS (i.e. at which layer) and the level of indirection, i.e. should such applications deal directly with the network (probably not) or through some authorised agent.

- Determine whether any existing software is suitable for an MB-NG demonstration. If not consider whether to develop our own simple code (DataTAG effort can be shared on this – as this has to be done for that project.)

- Deploy and validate one or more systems on the testbed (i.e. just looking at bits set using measuring equipment).

- Using the simulated traffic flows selected for experiment-1, adapt a traffic generator to use the API and demonstrate the expected behaviour.

- Identify a suitable application to modify to use the API.
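As an indication of how simple the baseline API could be, the following is a minimal sketch of an end node marking its own traffic through the standard socket interface. It is not a proposal for the eventual API. The function names are hypothetical, the codepoints are our understanding of the usual values (EF as recommended in RFC 3246, and 001000 for the QBSS scavenger service), and whether the network honours or re-marks them depends entirely on the ingress policy.

```python
import socket

# Diffserv codepoints (6-bit DSCP values); assumptions as stated above.
DSCP_EF = 46           # 101110, Expedited Forwarding
DSCP_SCAVENGER = 8     # 001000, QBSS less than best efforts
DSCP_BEST_EFFORT = 0   # default PHB


def dscp_to_tos(dscp):
    """Place a 6-bit DSCP in the upper six bits of the old TOS octet."""
    if not 0 <= dscp <= 63:
        raise ValueError("DSCP must be in the range 0..63")
    return dscp << 2


def open_marked_udp_socket(dscp):
    """Hypothetical helper: open a UDP socket whose packets carry the DSCP."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.setsockopt(socket.IPPROTO_IP, socket.IP_TOS, dscp_to_tos(dscp))
    return sock


if __name__ == "__main__":
    sock = open_marked_udp_socket(DSCP_EF)
    print("TOS octet requested for EF:", hex(dscp_to_tos(DSCP_EF)))
    sock.close()
```

Validation on the testbed would then consist of checking, with measuring equipment, that packets leaving the end node actually carry the requested bits.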

3.5 Experiment 4: Managed Bandwidth

Description:

This experiment will demonstrate a simple managed bandwidth service (MBS). This is to be understood as developing and demonstrating the mechanisms needed to allow a user to request and be granted access to some specified service through some clear interface. Determining sensible services to be offered is part of the dedicated MBS task, but to be definite the baseline is what it sounds like: something like a protected fixed bandwidth allocation (e.g. “100 Mbit/s with no loss”, à la the old ATM based pilot service). This is distinguished from demonstration of the underlying QoS services mainly by the interface presented to the user, whereby the fact that the service may be provisioned internally through aggregate diffserv means is hidden. The experiment will show a user successfully obtaining the allocated service in the presence of other traffic with which it would otherwise compete. This experiment adds value to the previous ones by addressing reservation and allocation mechanisms.

In more detail the experiment includes the following steps:

- Define sensible services to offer users. The default baseline will be a service similar to the old ATM pilot service, i.e. N Mbit/s zero loss.

- Develop a suitable and simple user interface for requesting a service reservation.

- Develop a “process” for sending and receiving the request and causing all necessary negotiations to take place, and then grant permission. [Note: to start with this could be a simple pro-forma email to some designated point, and may notionally involve lightweight human intervention with a turnaround time of a few days. In the longer term the experiment should aim to move towards an automated control plane implementation. If it is feasible to implement an API interface then this should be considered].

- Investigate and implement methods of classifying and authenticating individual users’ traffic at the point of access to the service (interfaces or IP addresses are unlikely to be very useful in practice – user authenticated mechanisms should be investigated).

- Develop a workable procedure to allocate resources as necessary within each domain. (this will presumably mean allocation of aggregate resources within the Core and between domains, and only flow specific at the ingress and egress points of the Access networks).

- Run a baseline “problem” scenario, i.e. with only best efforts configured, choose a traffic scheme which induces a “problem” which can be observed and quantified.

- Enable the MBS provision and demonstrate and quantify the user successfully obtaining the contracted service.

Note: Leads to follow are GARA, AQUILLA,GGF working group and document ???
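The reservation and allocation steps above can be sketched as a simple admission check across the domains on a path. This is a minimal illustration under stated assumptions, not the design of the MBS: the class and function names, the capacities and the all-or-nothing rule are all hypothetical, and the real procedure may begin as a manual one.

```python
from dataclasses import dataclass


@dataclass
class Domain:
    """One administrative domain with capacity set aside for MB (assumed)."""
    name: str
    mb_capacity_mbps: float
    reserved_mbps: float = 0.0

    def can_admit(self, rate_mbps):
        return self.reserved_mbps + rate_mbps <= self.mb_capacity_mbps


def request_reservation(path, rate_mbps):
    """Grant the request only if every domain on the path can carry it.

    The decision is all-or-nothing: nothing is reserved unless all domains agree.
    """
    if rate_mbps <= 0:
        raise ValueError("requested rate must be positive")
    if not all(domain.can_admit(rate_mbps) for domain in path):
        return False
    for domain in path:
        domain.reserved_mbps += rate_mbps
    return True


# Hypothetical path: access domain, core, access domain.
path = [Domain("AD(M)-UCL", 200.0), Domain("AD(C)", 150.0), Domain("AD(M)-MAN", 200.0)]
first = request_reservation(path, 100.0)    # fits in every domain
second = request_reservation(path, 100.0)   # would exceed the core allocation
print(first, second, [d.reserved_mbps for d in path])
```

The point of the sketch is the end-to-end nature of the decision: a request that fits in the access networks is still refused if the aggregate allocation in any one domain would be exceeded.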

3.6 Experiment 5: e2e QoS to end points in the US and EU

Description:

This experiment is to investigate and configure end to end capability across multiple domains outside of our control. If connectivity can be achieved we then run a sub-set of the tests outlined earlier.

A fairly simple demonstration will achieve the desired goal; anything more than this will be icing on the cake.

3.7 Experiment 6: Demonstrate use of QoS by a non HEP Application

Description:

This is the same as experiment-2, but using a different non-HEP application. We will seek to identify such an application and negotiate use of its traffic at some scheduled time.

Possibilities are (but not discussed yet with relevant people):

- AccessGrid

- Visualisation

- Reality Grid

Appendix A: Example HEP applications

“Real” HEP GRID traffic classes and characteristics are typified by:

BULK: Bulk data replication (large raw datasets)

- Up to 100 TBytes (far in future)

- Require reliable completion in ~ 1 week (needs check-pointing)

- Often would benefit from multicast based transport application

- No other special characteristics on latency or delay.

SDA: Small data set access

- Could be up to 100 GBytes

- Require reliable completion in ~ 8 hours

- No other special requirements on latency or delay

- Perhaps benefit from TCP enhancements.

CALIB: Calibration file access

- Small calibration data sets, perhaps a few kBytes.

- Needed during job execution. Perhaps many such accesses needed during entire job.

- Sensitive to cumulative delay caused by TCP backoff if congestion => perhaps requirements on packet loss but not delay.

GCNTL: Grid Control traffic??

- Typically small LDAP queries, or other small messages.

- Perhaps sensitive to cumulative effect if complete query/response suffers from unreliability, but not in context of a many hour job. See below though.

INT: Interactive

We will construct Grid portals to give truly interactive access to “the Grid”. This means a person sitting in front of a GUI.

- Interactive access to databases, catalogues,...

- Interactive access to Grid query services such as resource information or replica information

- Sensitive to “annoying” delays in query/response over WAN

- Small data sets for immediate processing. Note this may have overlap with BioGrid applications.

- Requirements on packet loss – or more correctly reliable communications without large delays in case of errors (and perhaps total delay, but unlikely).

VIDEO: Video/VoIP

- Nothing new here: real time requirements for collaboration meetings in parallel with normal Grid operations.
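The sizes and completion times quoted for BULK and SDA imply minimum sustained throughputs, which can be worked out directly. The following sketch does the arithmetic; it assumes decimal units (1 TByte = 10^12 bytes, 1 GByte = 10^9 bytes), takes the upper-bound sizes quoted above, and ignores protocol overhead and check-pointing, so the results are lower bounds only.

```python
TBYTE = 10**12   # bytes, decimal units assumed
GBYTE = 10**9    # bytes, decimal units assumed


def required_throughput_mbps(size_bytes, deadline_seconds):
    """Minimum average rate in Mbit/s to move size_bytes within the deadline."""
    return size_bytes * 8 / deadline_seconds / 1e6


bulk_mbps = required_throughput_mbps(100 * TBYTE, 7 * 24 * 3600)  # 100 TBytes in 1 week
sda_mbps = required_throughput_mbps(100 * GBYTE, 8 * 3600)        # 100 GBytes in 8 hours

print(f"BULK: {bulk_mbps:.0f} Mbit/s sustained")
print(f"SDA:  {sda_mbps:.1f} Mbit/s sustained")
```

Under these assumptions the largest BULK case needs a sustained rate of roughly 1.3 Gbit/s for a full week, while the largest SDA case needs under 30 Mbit/s.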
