See discussions, stats, and author profiles for this publication at: https://www.researchgate.net/publication/228773987

Explaining Structured Errors in Gigabit Ethernet

ARTICLE · APRIL 2005

READS 13

4 AUTHORS, INCLUDING:

Andrew W. Moore University of Cambridge

105 PUBLICATIONS 2,202 CITATIONS

SEE PROFILE

All in-text references underlined in blue are linked to publications on ResearchGate, Available from: Andrew W. Moore letting you access and read them immediately. Retrieved on: 22 January 2016

Explaining Structured Errors in

Gigabit Ethernet

Andrew W. Moore, Laura B. James, Richard Plumb and Madeleine Glick

IRC-TR-05-032

March 2005

INFORMATION IN THIS DOCUMENT IS PROVIDED IN CONNECTION WITH INTEL® PRODUCTS. NO LICENSE, EXPRESS OR IMPLIED, BY ESTOPPEL OR OTHERWISE, TO ANY INTELLECTUAL PROPERTY RIGHTS IS GRANTED BY THIS DOCUMENT. EXCEPT AS PROVIDED IN INTEL'S TERMS AND CONDITIONS OF SALE FOR SUCH PRODUCTS, INTEL ASSUMES NO LIABILITY WHATSOEVER, AND INTEL DISCLAIMS ANY EXPRESS OR IMPLIED WARRANTY, RELATING TO SALE AND/OR USE OF INTEL PRODUCTS INCLUDING LIABILITY OR WARRANTIES RELATING TO FITNESS FOR A PARTICULAR PURPOSE, MERCHANTABILITY, OR INFRINGEMENT OF ANY PATENT, COPYRIGHT OR OTHER INTELLECTUAL PROPERTY RIGHT. Intel products are not intended for use in medical, life saving, life sustaining applications. Intel may make changes to specifications and product descriptions at any time, without notice.

Copyright © Intel Corporation 2003 * Other names and brands may be claimed as the property of others.

1

Explaining Structured Errors in Gigabit Ethernet

¡ ¡ ¢

Andrew W. Moore , Laura B. James , Richard Plumb and Madeleine Glick

University of Cambridge, Computer Laboratory

[email protected] ¡

University of Cambridge, Department of Engineering, Centre for Photonic Systems £

lbj20,rgp1000 ¤ @eng.cam.ac.uk ¢ Intel Research, Cambridge [email protected]

Abstract— A physical layer coding scheme is designed to I. INTRODUCTION make optimal use of the available physical link, providing Many modern networks are constructed as a series of functionality to higher components in the network stack. layers. The use of layered design allows for the modular This paper presents results of an exploration of the errors construction of protocols, each providing a different observed when an optical Gigabit Ethernet link is subject service, with all the inherent advantages of a module- to attenuation. In the unexpected results, highlighted by some data payloads suffering from far higher probability based design. Network design decisions are often based of error than others, an analysis of the coding scheme on assumptions about the nature of the underlying layers. combined with an analysis of the physical light path leads For example, design of an error-detecting algorithm, such us to conclude that the cause is an interaction between the as a packet checksum, will be based upon a number of physical layer and the 8B/10B block coding scheme. We premises about the nature of the data over which it is to consider the physics of a common type of laser used in work and assumptions about the fundamental properties optical communications, and how the frequency charac- of the underlying communications channel over which it teristics of data may affect performance. A conjecture is is to provide protection. made that there is a need to build converged systems, with Yet the nature of the modular, layered design of the combinations of physical, data-link, and network layers optimised to interact correctly. In the mean time, what will network stacks has caused this approach to work against become increasingly necessary is both an identification of the architects, implementers and users. There exists a the potential for failure and the need to plan around it. tension between the desire to place functionality in the

most appropriate sub-system, ideally optimised for each

incarnation of the system, and the practicalities of mod-

ular design intended to allow independent developers to

Topic Keywords: Optical Networks, Network construct components that will inter-operate with each

Architectures, System design. other through well-defined interfaces. Past experience has lead to assumptions being made in the construction not explicitly taken into account by existing layers (such or operation of one layer’s design that can lead to as networks, transport-systems or applications). In fact, incorrect behaviour when combined with another layer. quite the opposite situation occurs where existing layers

There are numerous examples describing the problems are assumed to operate normally. caused when layers do not behave as the architects Outline of certain system parts expected. One might be the Section II describes our motivations for this work previous vendor recommendation (and common practice) including new optical networking technologies which for disabling UDP checksums, because the underlying change previous assumptions about the physical layer Ethernet FRC was considered sufficiently strong, led and the implications of the limits on the quantity of to data corruptions for UDP-based NFS and network power useable in optical networks. name (DNS) services [1]. Another example is the re- use of the 7- scrambler for data payloads [2], [3]. We present a study of the 8B/10B block-coding system

The 7-bit scrambling of certain data payloads (inputs) of Gigabit Ethernet [6], the interaction between an (N,K) results in data that is (mis)identified by the SONET [4] block code, an optical physical layer, and data trans- system as belonging to the control channel rather than the ported using that block code in Section III. In Section IV information channel. Also, Tennenhouse [5] noted how we document our findings on the reasons behind the layered multiplexing had a significant negative impact interactions observed. upon an application’s jitter. Further to our experiments with Gigabit Ethernet, in Section V we illustrate how the issues we identify It is our conjecture that this na¨ıve layering, while often have ramifications for systems with increasing physical considered a laudable property in computer communica- complexity. Alongside conclusions, Section VI states a tions networks, can lead to irreconcilable faults due to number of the directions that further work may take. differences in such fundamental measures as the number and nature of errors in a channel, a misunderstanding of II. MOTIVATIONS the underlying properties or needs of an overlaid layer. A. Optical Networks

We concentrate upon the na¨ıve layering present when Current work in all areas of networking has led assumptions are made about the properties of protocol to increasingly complex architectures: our interest is layers below or above the one being designed. This is focused upon the field of optical networking, but this further exacerbated by layers evolving as new technolo- is also true in the wireless domain. Our exploration gies drive speed, availability, etc. Often such evolution is of the robustness of network systems is motivated by the increased demands of these new optical systems. Additionally, we are increasingly impacted by operator

Wavelength Division Multiplexing (WDM), capable for practice. For example, researchers have observed that example of 160 wavelengths each operating at 10 Gbps up to 60% of faults in an ISP-grade network are due per channel, able to operate at over 5,000 km without to optical events [13]: defined as ones where it was regeneration [7], is a core technology in the current com- assumed errors results directly from operational faults munications network. However the timescale of change of in-service equipment. While the majority of these will for such long haul networks is very long: the same be catastrophic events (e.g., cable breaks), a discussion scale as deployment, involving from months to years of with the authors allows us to speculate that a non-trivial planning, installation and commissioning. To take advan- percentage of these events will be due to the issues of tage of capacity developments at the shorter timescales layer-interaction discussed in this paper. relevant to the local area network, as well as system B. The Power Problem and storage area networks, packet switching and burst switching techniques have seen significant investigation If all other variables are held constant an increase

(e.g. [8], [9]). Optical systems such as [10] and [11] have in data rate will require a proportional increase in lead us to recognise that the need for higher data-rates transmitter power. A certain number of photons per bit and designs with larger numbers of optical components must be received to guarantee any given bit error rate are forcing us towards what traditionally have been (BER), even if no thermal noise is present. The arrival technical limits. of photons at the receiver is a Poisson process. If 20

photons must arrive per bit to ensure a BER of ¥§¦©¨

Further to the optical-switching domain, there have for a given receiver, doubling the bit rate means that been changes in the construction and needs of fibre- the time in which the 20 photons must arrive is halved. based computer networks. In deployments containing The number of photons sent per unit time must thus be longer runs of fibre using large numbers of splitters for doubled (doubling the transmission power) to maintain measurement and monitoring and active optical devices, the BER. the overall system loss may be greater than in today’s The ramifications of this are that a future system point-to-point links and the receivers may have to cope operating at higher rates will either require twice the with much-lower optical powers. Increased fibre lengths power (a 3dB increase), be able to operate with 3dB used to deliver Ethernet services, e.g., Ethernet in the less power for the given information rate (equivalent

first mile [12], along with a new generation of switched to the channel being noisier by that proportion) or a optical networks, are examples of this trend. compromise between these. Fibre nonlinearities impose limitations on the maximum optical power able to be for the electrical signals of the upcoming PCI Express used in an optical network. Subsequently, we maintain standard [17]. that a greater understanding of the low-power behaviour The 8B/10B codec defines encodings for data octets of coding schemes will provide invaluable insight for and control codes which are used to delimit the data future systems. sections and maintain the link. Individual codes or For practical reasons including availability of equip- combinations of codes are defined for Start of Packet, ment, its wide deployment, a tractability of the problem- End of Packet, line Configuration, and so on. Also, Idle space and and well documented behaviour we concen- codes are transmitted when there is no data to be sent trate upon the 8B/10B codec. to keep the transceiver optics and electronics active. The

C. 8B/10B Block Coding Physical Coding Sublayer (PCS) of the Gigabit Ethernet specification [6] defines how these various codes are The 8B/10B codec, originally described by Widmer & used. Franaszek [14], is widely used. This scheme converts

8 of data for transmission (ideal for any - Individual ten-bit code-groups are constructed from orientated system) into a 10 bit line code. Although the groups generated by 5B/6B and 3B/4B coding on the this adds a 25% overhead, 8B/10B has many valuable first five and last three bits of a data octet respectively. properties; a transition density of at least 3 per 10 bit During this process the bits are re-ordered, such that code group and a maximum run length of 5 bits for clock the last bits of the octet for transmission are encoded at recovery, along with virtually no DC spectral component. the start of the 10-bit group. This is because the last

These characteristics also reduce the possible signal 5 bits of the octet are encoded first, into the first 6 damage due to jitter, which is particularly critical in bits of code, and then the first 3 bits of the octet are optical systems, and can also reduce multimodal noise encoded to the final 4 transmitted bits. Some examples in multimode fibre connections. are given in Table I; the running disparity is the sign

This coding scheme is royalty-free, well understood, of the running sum of the code bits, where a one is and sees current use in a wide range of applications. counted as 1 and a zero as -1. During an Idle sequence

In addition to being the standard Physical Coding Sub- between packet transmissions, the running disparity is layer (PCS) for Gigabit Ethernet [6], it is used in the changed (if necessary) to -1 and then maintained at

Fibre Channel system [15]. This codec is also used for that value. Both control and data codes may change the 800Mbps extensions to the IEEE 1394 / Firewire the running disparity or may preserve its existing value; standard [16], and 8B/10B will be the basis of coding examples of both types are shown in Table I. The code- TABLE I

EXAMPLES OF 8B/10B CONTROL AND DATA CODES

Type Octet Octet bits Current RD - Current RD + Note

data 0x00 000 00000 100111 0100 011000 1011 preserves RD value data 0xf2 111 10010 010011 0111 010011 0001 swaps RD value control K27.7 111 11011 110110 1000 001001 0111 preserves RD value control K28.5 101 11100 001111 1010 110000 0101 swaps RD value

group used for the transmission of an octet depends (also negative disparity); these decode to give with upon the existing running disparity – hence the two 4 bits of difference. In addition, the running disparity alternative codes given in the table. A received code- after the code-group may be miscalculated, potentially group is compared against the set of valid code-groups leading to future errors. There are other similar examples for the current-receiver running disparity, and decoded in [6]. to the corresponding octet if it is found. If the received III. BIT ERROR RATE VERSUS PACKET ERROR code is not found in that set, the specification states that the group is deemed invalid. In either case, the As illustration of the discontinuity of measurements received code-group is used to calculate a new value for made by engineers of the physical layer and those the running disparity. A code-group received containing working in the network layers, we compare bit error rate errors may thus be decoded and considered valid. It (BER) measurements versus measurements of packet- is also possible for an earlier error to throw off the loss. This Section contains an extended form of the running disparity calculation causing a later code-group results first presented in James et al. [18]. may be deemed invalid because the running disparity A. Experimental Method at the receiver is no longer correct. This can amplify We investigate Gigabit Ethernet on optical fibre, the effect of a single bit error at the physical layer. (1000BASE-X [6]) under conditions where the received Line coding schemes, although they handle many of the power is sufficiently low as to induce errors in the physical layer constraints, can introduce problems. In the Ethernet frames. We assume that while the Functional case of 8B/10B coding, a single bit error on the line can Redundancy Check (FRC) mechanism within Ethernet is lead to multiple bit errors in the received data . For sufficiently strong to catch the errors, the dropped frames example, with one bit error the code-group D0.1 (current and resulting packet loss will result in a significantly running disparity negative) becomes the code-group D9.1 higher probability of packet errors than the norm for both original and errored frames are stored for later

analysis.

1) Octet Analysis: Each octet for transmission is

coded by the Physical Coding Sublayer of Gigabit

Ethernet using 8B/10B into a 10 bit code-group or Fig. 1. Main test environment symbol, and we analyze these for frames which are

received in error at the octet level. By comparing the certain hosts, applications and perhaps users. two possible transmitted symbols for each octet in the In our main test environment an optical attenuator is original frame to the two possible symbols corresponding placed in one direction of a Gigabit Ethernet link. A to the received octet we can deduce the bit errors which traffic generator feeds a Fast Ethernet link to an Ethernet occurred in the symbol at the physical layer. In order switch, and a Gigabit Ethernet link is connected between to infer which symbol was sent and which received, this switch and a traffic sink and tester (Figure 1). The we assume that the combination giving the minimum variable optical attenuator and an optical isolator are number of bit errors on the line is most likely to have placed in the fibre in the direction from the switch to occurred. This allows us to determine the line errors the sink. We had previously noted interference due to which most probably occurred. reflection and took the precaution to remove this aspect from the results. Various types of symbol damage may be observed. One of these is the single-bit error caused by the low A packet capture and measurement system is im- signal to noise ratio at the receiver. A second form of plemented within the traffic sink using an enhanced error results from a loss of bit clock causing smeared driver for the SysKonnect SK-9844 network interface bits: where a subsequent bit is read as having the value card (NIC). Among a number of additional features, of the previous bit. A final example results from the the modified driver allows application processes to re- loss of symbol clock synchronization. This can lead ceive error-containing frames that would normally be to the symbol boundaries being misplaced, so that a discarded. As well as purpose-built code for the receiving sequence of several symbols, and thus several octets, will system we use a special-purpose traffic generator and be incorrectly recovered. comparator. Pre-constructed test data in tcpdump-format is transmitted from one or more traffic generators using 2) Real Traffic: Some results presented here are an adapted version of tcpfire [19]. Transmitted frames are conducted with real network traffic referred to as the compared to their received versions and if they differ, day-trace. This network traffic was captured from the 1 interconnect between a large research institution and 0.9 0.8 the Internet over the course of two working days. We 0.7 0.6 0.5 consider it to contain a representative sample of network 0.4 0.3 traffic for an academic/research organisation of approx- 0.2 0.1 0 imately 150 users. 0 256 512 768 1024 1280 1536 Offset of octet within frame

Structured test data Pseudo-random data Other traffic tested included pseudo-random data, consisting of a sequence of frames of the same number Fig. 2. Cumulative distribution of errors versus position in frame and size as the day-trace data, although each is filled transmitted repeatedly and the error count examined. with a stream of octets whose values were drawn from a For the BER measurements presented here, a directly pseudo-random number generator. Structured test data modulated 1548nm laser was used. The optical signal consists of a single frame containing repeated octets: was then subjected to variable attenuation before return- 0x00–0xff, to make a frame 1500 octets long. The low ing via an Agilent Lightwave (11982A) receiver unit error testframe consists of 1500 octets of 0xCC data into the BERT (Agilent parts 70841B and 70842B). The (selected for a low symbol error rate); the high error BERT was programmed with a series of bit sequences, testframe is 1500 octets of 0x34 data (which displays a each corresponding to a frame of Gigabit Ethernet data high symbol error rate). encoded as it would be for the line in 8B/10B. Purpose-

3) Bit Error Rate Measurements: Optical systems built code is used to convert a frame of known data experiments commonly use a Bit Error Rate Test kit into the bit-sequence suitable for the BERT. The bit

(BERT) to assess the behaviour of the system [20]. error rates for these packet bit sequences were measured

This comprises both a data source and a receiving at a range of attenuation values, using identical BERT unit and compares the incoming signal with the known settings for all frames (e.g. 0/1 thresholding value). transmitted one. Individual bit errors are counted both B. Initial Results during short sampling periods and over a fixed period

(defined in time, bits, or errors). The output data can be We illustrate how errors are position independent but used to modulate a laser and a photodiode can be placed dependent upon the encoded data. Figure 2 contrasts the before the BERT receiver to set up an optical link; optical position of errors in the pseudo-random and structured system components can then be placed between the two frames. The day-trace results are not shown because parts of the BERT for testing. Usually, a pseudo-random this data, consisting of packets of varying length, is not bit sequence is used but any defined bit sequence can be directly comparable. The positions of the symbols most 1 and bit error with an Agilent Lightwave receiver, each 0.9 have different sensitivities. We see that these different 0.8

0.7 testframes lead to substantially different BER perfor-

0.6 mance. Importantly the relationship between the test data 0.5

0.4 and BER results has little relationship with the packet- 5 10 15 20 25 30 35 40 45 50 Symbols in error per Frame error rates for the same test data. While not presented Structured test data Low error testframe High error testframe here, the results of Figure 4 have been validated for an

Fig. 3. Cumulative distribution of symbol errors per frame optical power-range of 4dB.

What we can illustrate here is that a uniformly- subject to error can be observed. For the random data the distributed set of random data, once encoded with position distribution is uniform — illustrating that there 8B/10B will not suffer code-errors with the same uni- is no frame-position dependence in the errors — but the formity. We considered that the Gigabit Ethernet PCS, structured data clearly shows that certain payload-values 8B/10B, was actually the cause of this non-uniformity. display significantly higher error-rates. While not illustrated here, sets of wide-ranging ex- Figure 3 indicates the number of incorrectly received periments allowed us to conclude that Ethernet-frames symbols per frame. The low error testframe generated containing a given octet of certain value were up to no symbols in error and thus no frames in error. The 100 times more likely to be received in error (and thus high error testframe results show an increase in errored dropped), when compared with a similar-sized packet symbols per frame. It is important to note that the that did not contain such octets. errors occur uniformly across the whole packet and that IV. CAUSE AND EFFECT there are no correlations evident between the positions of errors within the frame. We interpret this result as A. Effects on data sequences confirming that errors are highly localised within a frame We have found that individual errored octets do not and from this we are able to assume that the error- appear to be clustered within frames but are independent inducing events occur over small (bit-time) time scales. of each other. However, we are interested in whether

Figures 4(b) and 4(a) show packet error rate and bit earlier transmitted octets have an effect on the likelihood error rate, respectively, for a range of received optical of a subsequent octet being received in error. We had power. The powers in these two figures differ due to the anticipated that the use of running-disparity in 8B/10B different experimental setup; packet error is measured would present itself as correlation between errors in using the SysKonnect 1000BASE-X NIC as a receiver, current codes and the value of previous codes. 1e-05 0.1

1e-06 0.01

1e-07 0.001 1e-08 0.0001 Bit error rate 1e-09 Packet error rate

1e-05 1e-10

1e-11 1e-06 -8.6 -8.4 -8.2 -8 -7.8 -7.6 -23.4 -23.2 -23 -22.8 -22.6 -22.4 -22.2 -22 Received power / dBm Received power / dBm Structured test data Low error testframe Structured test data Low error testframe High error testframe High error testframe

(a) Bit error rate versus received power (b) Packet error rate versus received power

Fig. 4. Contrasting packet-error and bit-error rates versus received power

We collect statistics on how many times each trans- in Figure 5(a) are indicative of an octet that is error- mitted octet value is received in error, and also store prone independently of the value of the previous octet. the sequence of octets transmitted preceding this. The In contrast, horizontal bands indicate a correlation of error counts are stored in 2D matrices (or histograms) errors with the value of the previous octet.

of size     , representing each pair of octets in It can be seen from Figure 5 that while correlation the sequence leading up to the errored octet: one for between errors and the value in error, or the immediately

the errored octet and its immediate predecessor, one previous value, are significant, beyond this there is no 

for the predecessor and the octet before that, and so apparent correlation. The equivalent plot for 

¨  ¨  on. We normalise the error counts for each of these produces a featureless white square. histograms by dividing by the matrix representing the B. 8B/10B code-group frequency components and their frequency of occurrence of this octet sequence in the effects original transmitted data. We then scale each histogram It is illustrative to consider the octets which are most matrix so that the sum of all entries in each matrix is 1. subject to error, and the 8B/10B codes used to represent

Figure 5(a) shows the error frequencies for the “cur- them. In the psuedo-random data, the following ten

rent octet”  (the correct transmitted value of octets octets give the highest error probabilities (independent received in error), on the x-axis, versus the octet which of the preceding octet value): 0x43, 0x8A, 0x4A, 0xCA,

was transmitted before each specific errored octet,  , 0x6A, 0x0A, 0x6F, 0xEA, 0x59, 0x2A. It can be seen ¨

on the y-axis. Figure 5(b) shows the preceding octet that these commonly end in A, and this causes the first 

and the octet before that:  vs . Vertical lines 5 bits of the code-group to be 01010. The octets not

¨ ¨  1.0 1.0 0xFF

0.5 0.5 i - 1 x 0 0 0x7F 0 200 400 625 0 200 400 625 Frequency / MHz Frequency / MHz

(a) FFT of code-group (b) FFT of code-group

0x00 for high error octet for high error octet 0x00 0x7F 0xFF 0x4A 0x0A xi

1.0 1.0 " $#&% (a) Error counts for ! vs.

0.5 0.5

0xFF

0 0 0 200 400 625 0 200 400 625 Frequency / MHz Frequency / MHz i - 2 x

0x7F (c) FFT of code-group (d) FFT of code-group for low error octet for low error octet 0x9D 0xAD

Fig. 6. Contrasting FFTs for a selection of code-groups 0x00 0x00 0x7F 0xFF xi - 1

code-groups of 8B/10B. Examining the FFTs of the " $#' (b) Error counts for ! $#&% vs. code-groups for the high error octets, Figures 6(a) Fig. 5. Error counts for pseudo-random data octets and 6(b), for example, the peak corresponding to the base

frequency (625MHz, half the line rate) is pronounced in beginning with this sequence in general contain at least 4 most cases, although there is no such feature in the FFTs alternating bits. Of the ten octets giving the lowest error of the code-groups of the low error octets (Figures 6(c) probabilities (independent of previous octet), which are and 6(d)).

0xAD, 0xED, 0x9D, 0xDD, 0x7D, 0x6D, 0xFD, 0x2D, The pairs of preceding and current octets leading

0x3D and 0x8D, the concluding D causes the code- to the greatest error (which are most easily observed groups to start with 0011. in Figure 5) give much higher error probabilities than

Fast Fourier Transforms (FFTs) were generated for the individual octets. The noted high error octets (eg. data sequences consisting of repeated instances of the 0x8A) do occur in the top ten high error octet pairs and traces from "good" data blocks C. Laser Physics 0.5 traces from "bad" 10101010 block 0.06

0.4 0.05 An eye diagram is commonly used to assess the

0.3 0.04 460 480 500 520 physical layer of a communications system, where the 0.2 effects of noise, distortion and jitter can be seen in

Receiver output amplitude 0.1

0 0 200 400 600 800 1000 a single diagram [20]. Received analogue waveforms time in ps (before any decision circuit) for each bit of a pseudo-

Fig. 8. Eye diagram: DFB laser at 1.25Gbps NRZ random sequences are superimposed as in Figure 8.

The central open part of the eye represents the margin

between “1”s and “0”s, where both the height (repre- normally follow an octet giving a code-group ending in senting amplitude margin) and the width (representing 10101 or 0101, such as 0x58, which serves to further timing margin) are significant. The diagram shown is empasise that frequency component. computed and includes statistical jitter and amplitude

The 8B/10B codec defines both data and control noise from the laser source but not thermal noise in the encodings, and these are represented on a 1024x1024 receiver, the inclusion of which would have broadened space in Figure 7(a), which shows valid combinations the lines significantly and makes the whole diagram less clear. Electrically, semiconductor lasers are just simple of the current code-group ( (! ) and the preceding one diodes, but the interaction between electron and photon

( () ). The regions of valid and invalid code-groups are ¨ defined by the codec’s use of 3B/4B and 5B/6B blocks populations within the device makes the modulation

(see II-C). response complex. A first-order representation of the laser and driver may be obtained via a pair of rate In Figure 7(a) the octet errors found in the day-trace equations, one each for electrons (N) and photons (P):

have been displayed on this codespace, showing the

021 0>

5

1JI 1 C 1 1 1BADCFE

 

?HG 043 0&1@?

( *

9

regions of high error concentration for real Internet data. 687:9<;8=

0&E E

It can be seen that these tend to be clustered and that 0>

1 1BADCFEMI 1



?

023 0&1

*,L L+N

;8= 9 K 9 the clusters correspond to certain features of the code- O§P groups. Two groups of clusters have been ringed, those However, DFB lasers at frequencies above 1 Gbit/s

that are indicated as (!+*,¦¦©¥¥.-§-§- represent those codes really need multiple coupled equations of this sort in

with a low-error suffix. In contrast the ringed values order to account for spatial variations within the laser,

 */¦©¥§¦©¥§¦©¥.-§-§- indicated as ( indicates the error-prone as described in Carroll et al. [21]. A significant range symbols with a suffix of 0xA. of behaviour is possible as bias, drive conditions, and 0x400

C = 0011... C = 010101... 0x400 i i

C i - 1

0x200 i - 1 C

0x200

0x00 0x00 0x200 0x400 Ci control, data control, data, 0x00 or Invalid 0x400 control data 0x00 0x200

data, control Ci

QSR Q (a) Valid $#&%UTWV pairs (b) Errors using day-trace as a function of code-groups

Fig. 7. The codebook for 8B/10B represented on a 1024x1024 space physical structure vary. In general, jitter becomes much ability of error. In addition to increasing the chances worse if laser bias is reduced below threshold, and ampli- of frame-discard due to data-contents, the occurrence tude noise in the “0” levels becomes a problem if bias is of such hot-spots also has implications for higher level increased. With ideal bias, just at threshold, some lasers network protocols. Frame-integrity checks, such as a have sufficient “memory” to react to the high frequency cyclic redundancy check, assume that there will be a energy in 10101010 strings, and slight eye closure may uniformity of errors within the frame, justifying detec- result, as modeled and shown in Figure 8. The effect is tion of single-bit errors with a given precision. While small, but enough to increase the probability of error Jain [22] demonstrates that the FRC as used in Ethernet for such a data block. In addition, laser drive control is sufficiently strong as to detect all 1, 2 and 3 bit errors loops, receiver timing loops, and the more sophisticated for frames up to 8 KBytes in length, problems may be bandwidth limiting filters in receivers, could in principle encountered for certain combinations of errors above be disturbed slightly by particular bit sequences, and this. Recall that in Section II-C we noted that many hence give increased error rates for those sequences. single-bit errors on the physical layer will translate into

multi-bit errors following decoding by the PCS. V. IMPLICATIONS

In Section III we documented the occurrence of error hot-spots: data and data-sequences with a higher prob- Network/Transport Layer Issues question our assumption that only increased packet-loss

will be the result of the error hot-spots. Instead of just Up until now we have concentrated upon the interac- lost packets, Stone et al. noted certain “unlucky” data tion between the physical layer and the data-link layer, would rarely have errors detected. A further example of such as that embodied in 1000BASE-X. We briefly note related work is Jain [22] which illustrates how errors the interaction that data-link layer effects may have with will impact the PCS of FDDI and require specific error the network and transport layer. detection/recovery in the data-link layer. In our original paper on this work (James et al. [18]) we highlighted the non-uniform distribution of packet er- VI. CONCLUSIONS rors that result from an interaction between the physical In this paper we have illustrated how there is scope coding conditions, the physical coding sub-layer of Giga- for problems to develop in the design, development and bit Ethernet, and particular data to be transported through evolution of the physical components, algorithms, proto- the network. That work identified that certain data-values cols and applications that constitute our networks. These had a substantially higher probability of being received problems are caused by the undesirable interactions in error, which resulted in packets with those payloads between network modules. They may be slow to evolve, being discarded with a higher-than-normal probability. such as the inevitable frequency problems that have

An analysis of the contents of day-trace data along arisen with the limited scrambler and SONET; alterna- with other data derived as part of our network-monitoring tively they may be more abrupt, such as the undesirable work allows us to conclude that in addition to (user) interactions between ATM and TCP/IP documented by data-payloads the error-concentrating effects will cause a Stone et al. [23]. significant level of loss due to the network and transport- We define na¨ıve layering: the development of protocol layer header contents. In one hypothetical case, if a user layers beyond the scope of the original specification. was on a machine with an IP address that consisted Often this layer evolution is combined with assumptions of several high-error-rate octets their data will be at by a developer about the layers below and above. Best a proportionally higher risk of being corrupted and described by example, na¨ıve layering can result in the discarded at the Gigabit Ethernet layer. construction or specification of a network module with-

Further, the occurrence of error hot-spots has ram- out sufficient understanding of the layers with which it ifications quite independent of the Gigabit Ethernet must inter-operate. standard. Stone et al. [23] discuss the impact this has Examining the 8B/10B code used by Gigabit Ethernet, for the checksum of TCP. These results may call into we have documented the form and cause of failures that occur in the low-power regime, inducing, at best, poor different ways. performance and, at worst, undetected errors that may Acknowledgements focus upon specific networks, applications and users.

The errors observed in Gigabit Ethernet in a low-power Many thanks to Ian White, Richard Penty, Derek regime are not uniform as may be assumed. Some McAuley, Bradley Booth, Jon Crowcroft, David G. packets will suffer greater loss rates than the norm. This Cunningham, Eric Jacobsen, Peter Kirkpatrick, Barry content-specific effect is particularly insidious because it O’Mahony, Ralphe Neill, Ian Pratt, Adrian P. Stephens, occurs without a total failure of the network. Kevin Williams, and Adrian Wonfor. Additionally, An-

We applied a scrambler as a form of noise-whitener drew Moore acknowledges the Intel Corporation’s gener- and were able to successfully illustrate that its impact ous support of his research fellowship and Laura James removed the error hot-spotting in the data-space. would like to thank Marconi for its support of her PhD

We have seen how future optical networks will consist research. of an increasingly large number of diverse elements REFERENCES and will have very limited optical power budgets. This will cause a fundamental change in the physical optical [1] J. Stone and C. Partridge, “When the CRC and TCP checksum disagree,” in Proceedings of ACM SIGCOMM 2000. ACM layer, which will also impact the construction of these Press, Aug. 2000. networks. Alongside this we would maintain that these [2] W. Simpson, “PPP over SONET/SDH,” IETF, RFC 1619, May changes will lead to occurrences of na¨ıve layering be- 1994. [3] A. Malis and W. Simpson, “PPP over SONET/SDH,” IETF, coming more prevalent. RFC 2615, June 1999.

Future Work [4] ANSI, “T1.105-1988, Synchronous Optical Network (SONET) — Digital Hierarchy: Optical Interface Rates and Formats We anticipate further study of the effects of scramblers Specification,” 1988. on Gigabit Ethernet, and by extension, on other N,K [5] D. L. Tennenhouse, “Layered multiplexing considered harm- ful,” in Protocols for High-Speed Networks. North Holland, block coded systems. In particular we would like to Amsterdam, May 1989. fully understand the interactions between block coding [6] IEEE, “IEEE 802.3z — Gigabit Ethernet,” 1998, standard. schemes and scrambling. We are also interested in the [7] A. R. Pratt et al., “5745 km DWDM transcontinental field trial effects of jitter in the interactions between the layers of using 10Gbit/s dispersion managed solitons and dynamic gain equalization,” in Proceedings of OFC-2003, 2003, post-deadline an optical network, as the jitter requirements for various paper PD26. network specifications (e.g. Gigabit Ethernet, SONET) [8] J. S. Turner, “ burst switching,” J. High Speed Netw., are not just different in quantity, but measured in quite vol. 8, no. 1, pp. 3–16, 1999. [9] P. Gambini et al., “Transparent optical packet switching: net- face (FDDI),” IEEE Transactions on Communications, vol. 38, work architecture and demonstrators in the KEOPS project,” no. 8, pp. 1244–1252, 1990. IEEE Journal on Selected Areas in Communications, vol. 16, [23] J. Stone, M. Greenwald, C. Partridge, and J. Hughes, “Perfor- no. 7, pp. 1245–1259, Sept. 1998. mance of Checksums and CRCs over Real Data,” in Proceed-

[10] D. McAuley, “Optical Local Area Network,” in Computer ings of ACM SIGCOMM 2000, Stockholm, Sweden, Aug. 2000. Systems: Theory, Technology and Applications, A. Herbert and K. Sparck-Jones,¨ Eds. Springer-Verlag, Feb 2003.

[11] L. James, G. Roberts, M. Glick, D. McAuley, K. Williams, R. Penty, and I. White, “Wavelength Striped Semi-synchronous Optical Local Area Networks,” in London Communications Symposium (LCS 2003), Sept. 2003.

[12] IEEE, “IEEE 802.3ah — Ethernet in the First Mile,” 2004, standard.

[13] A. Markopoulou, G. Iannaccone, S. Bhattacharyya, C.-N. Chuah, and C. Diot, “Characterization of failures in an IP backbone,” in Proceedings of IEEE INFOCOMM 2004, Hong Kong, Mar. 2004, Sprint ATL Research Report.

[14] A. X. Widmer and P. A. Franaszek, “A DC-Balanced, Partitioned-Block, 8B/10B Transmission Code,” IBM Journal of Research and Development, vol. 27, no. 5, pp. 440–451, Sept. 1983.

[15] The Fibre Channel Association, Fibre Channel Storage Area Networks. Eagle Rock, VA: LLH Technology Publishing, 2001.

[16] IEEE, “IEEE 1394b — High-Performance Serial Bus,” 2002, standard.

[17] E. Solari and B. Congdon, The Complete PCIExpress Reference. Hillsboro, OR: Intel Press, 2003.

[18] L. B. James, A. W. Moore, and M. Glick, “Structured Errors in Optical Gigabit Ethernet,” in Passive and Active Measurement Workshop (PAM 2004), Apr. 2004.

[19] “tcpfire,” 2003, http://www.nprobe.org/tools/.

[20] R. Ramaswami and K. N. Sivarajan, Optical Networks. Morgan Kaufmann, 2002, pp. 258–263.

[21] J. E. Carroll, J. Whiteaway, and R. Plumb, Distributed Feedback Semiconductor Lasers, ser. IEE Circuits, Devices & Systems Series. Co-published by the IEE and SPIE Press, 1998, no. 10.

[22] R. Jain, “Error Characteristics of Fiber Distributed Data Inter-