Connectivity in the LAC Region in 2020

Author: Agustín Formoso
Coordination/Revision: Guillermo Cicileo
Edition and Design: Maria Gayo, Carolina Badano, Martín Mañana
Project: Strengthening Regional Infrastructure
Department: Internet Infrastructure R&D

We often talk about connectivity, but what exactly are we talking about? Ever since networks were first connected to each other, operators have been working to improve the connectivity between them. In this article, we present a connectivity study that explains how we measure connectivity in Latin America and the Caribbean and how it has evolved in recent years.

Introduction

The term connectivity is often used in the Internet industry, yet its meaning may vary depending on the context. Connectivity can be measured based on bandwidth capacity, number of hops, or, in the case of this article, latency. In this sense, when we say that two locations are well connected, it means that the latency between them is low, i.e., the time it takes for a message to travel from source to destination is short.

We at LACNIC wish to understand in greater detail the characteristics of network interconnection in Latin America and the Caribbean so that operators will have access to information they can leverage when designing their growth strategies. With this in mind, it is very useful to perform connectivity measurements that cover the whole region, including the entire Caribbean region and not just the LACNIC service region. Connectivity measurements are typically performed between one origin and one destination, or between a few origins and a few destinations. Measurements are generally initiated from nodes in our own infrastructure, or towards our own infrastructure. However, in order to obtain connectivity measurements that cover the entire region, it is necessary to initiate measurements from third-party networks, a challenge that requires the collaboration of multiple actors. Below is a list of platforms that offer the possibility of initiating measurements from third-party networks, along with a brief explanation of their characteristics.

- RIPE Atlas is a well-known example: it encourages users who wish to collaborate to host a probe (hardware or software) and keep it connected to the Internet, so that other users of the platform can use these probes to initiate measurements. While the number of RIPE Atlas probes in LAC has been increasing over time, their presence is still insufficient to conduct studies covering the entire region.
- CAIDA Ark (Archipelago) is a platform similar to RIPE Atlas whose probes are based on second-generation Raspberry Pi devices. Unfortunately, there are only 11 CAIDA Ark probes in 10 autonomous systems in the LAC region.
- M-Lab (Measurement Lab) is a platform that offers a series of tools to measure various network parameters. Unfortunately, these tools do not focus on latency measurements between third-party networks, but on bandwidth measurements between the client and M-Lab servers (not many of which are located in the LAC region) using the Network Diagnostic Tool (NDT).
- Speedchecker, on the other hand, is a platform with similar functionality that offers better coverage in the LAC region. Unlike RIPE Atlas, it uses only software probes, which makes it less stable than the others. However, since we had used this platform in prior studies, we decided to use it once again so that we could compare results with greater confidence.

Methodology and Prior Studies

LACNIC has conducted similar studies in previous years (the studies conducted in 2016 and 2017 are available in Spanish). Generally speaking, these studies provide an overview of the region each year and show significant improvements in the measurements year after year.

The measurements were performed using the following methodology:

1. In every country of the LAC region, measurements were scheduled to run every hour.
2. These measurements were ICMP pings, with 10 packets sent per measurement.
3. The packets' destination was a random IP address selected from a pool of known IP addresses.
   a. This pool comprised all the Speedtest servers in the region. This provided a large number of vantage points located in multiple networks across the region, with reasonably good uptime (it was highly probable that the IP address would be reachable at the time of the measurement).
4. The results were collected and later processed.
   a. Geolocation information was obtained from MaxMind.
   b. Routing information (autonomous system, announced prefix) was obtained from the RIPE Routing Information Service (RIPE RIS).
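The scheduling and processing steps above can be sketched in Python. This is a minimal, hypothetical illustration: the `Measurement` record, the `schedule_hourly_round` helper and the use of the median to summarize the 10 replies of a ping are assumptions made for the example (the article does not say how each ping was summarized), and the real probes were driven through Speedchecker, whose interface is not shown here. The destination addresses are documentation addresses (192.0.2.0/24), not actual Speedtest servers.

```python
import random
import statistics
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Measurement:
    """One ping measurement (10 ICMP packets in the study)."""
    origin_country: str
    destination_ip: str
    rtts_ms: List[float] = field(default_factory=list)  # one entry per reply received

    def summary_rtt(self) -> Optional[float]:
        """Hypothetical summary: median of the replies; None if every packet was lost."""
        return statistics.median(self.rtts_ms) if self.rtts_ms else None


def schedule_hourly_round(countries, server_pool, rng):
    """One hourly round: every origin country gets one measurement whose
    destination is drawn uniformly at random from the pool of known servers."""
    return [Measurement(country, rng.choice(server_pool)) for country in countries]


if __name__ == "__main__":
    pool = ["192.0.2.10", "192.0.2.20", "192.0.2.30"]  # placeholder addresses
    rng = random.Random(42)
    for m in schedule_hourly_round(["AR", "BR", "CL"], pool, rng):
        print(m.origin_country, "->", m.destination_ip)
```

Seeding the random generator keeps a simulated round reproducible; in a live campaign the draw would simply be repeated every hour.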


Results of the 2020 Study

For this edition of the study, we ran a measurement campaign from early September to early November. During this period, 13,000 measurements were initiated from 26 countries towards destinations in 25 countries. These measurements originated in 332 different autonomous systems.

Results

Once the data was obtained, measurements were grouped considering three different categories:

1. Country of origin: This category contained all the measurements obtained with probes located in the country, excluding those pointing to the same country. In other words, these were only outgoing measurements.

2. Destination country: This category contained all the measurements obtained with vantage points (servers) located in the country, excluding those originating in the same country. In other words, these were only incoming measurements.

3. Both country of origin and destination: This category contained all the measurements for which the country of origin was the same as the destination country, i.e., measurements performed with probes and vantage points located in the same country. In other words, these measurements are internal to the country.
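A minimal sketch of this three-way grouping, assuming each measurement has already been reduced to a tuple (origin country, destination country, latency in ms); the function name and input shape are hypothetical. Each country's chart value is then the median of its group.

```python
from collections import defaultdict
from statistics import median


def group_by_country(measurements):
    """Split (origin_cc, destination_cc, latency_ms) tuples into the three
    categories and return, for each, a dict country -> median latency."""
    outgoing = defaultdict(list)  # 1. origin is the country, destination elsewhere
    incoming = defaultdict(list)  # 2. destination is the country, origin elsewhere
    internal = defaultdict(list)  # 3. origin and destination in the same country
    for origin, destination, latency in measurements:
        if origin == destination:
            internal[origin].append(latency)
        else:
            outgoing[origin].append(latency)
            incoming[destination].append(latency)

    def medians(groups):
        return {country: median(values) for country, values in groups.items()}

    return medians(outgoing), medians(incoming), medians(internal)
```

Note that every cross-border measurement is counted twice: once as outgoing for its origin country and once as incoming for its destination country, while same-country measurements only feed the internal category.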

The results can be seen in the three charts below, which show the median of the results obtained for each country.


[Figures: median latency per country for (1) measurements originating in the country, (2) measurements with their destination in the country, and (3) measurements with their origin and destination in the country]

At first glance, both the outgoing latency (1) and the incoming latency (2) are between 150 and 200 milliseconds. The charts also show that, naturally, the internal latency (3) is lower than the outgoing and incoming latencies.

Another aspect evidenced by this campaign is that the number of countries with active probes is much higher than the number of countries with test servers (37 and 26, respectively). This difference is the reason why the first chart has more entries than charts 2 and 3. In addition, not all combinations have been covered by this measurement campaign, at least not at this time.

Some countries show poor latencies in the previous charts. They are worth noting because these values are mainly the result of a bias in the measurement methodology. As mentioned earlier, the destinations of the measurements are drawn from a pool of servers in the LAC region. Because these servers are not equally distributed across the region (larger countries tend to have more servers), a newly scheduled measurement is more likely to target one of these larger countries. In fact, at the time of writing this report, the countries that appear in figure 1 with the highest latencies, namely Cuba (CU), Turks and Caicos Islands (TC), French Guiana (GF), Suriname (SR), Guyana (GY), and Venezuela (VE), have results exclusively towards those larger countries. As time goes by and more measurements are scheduled to more countries, this bias will decrease, and measurements will better reflect the reality of regional connectivity.
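The effect of an uneven server pool can be illustrated with a small simulation. The server counts used below are invented for the example; the point is only that drawing a server uniformly from the pool is the same as drawing a country in proportion to its number of servers, so a country with few servers may receive no measurements at all early in a campaign.

```python
import random
from collections import Counter


def expected_shares(servers_per_country):
    """Probability that a uniformly drawn server lies in each country."""
    total = sum(servers_per_country.values())
    return {cc: n / total for cc, n in servers_per_country.items()}


def simulated_shares(servers_per_country, n_draws, seed=0):
    """Draw n_draws servers uniformly from the pooled list and return the
    observed fraction of draws that landed in each country."""
    pool = [cc for cc, n in servers_per_country.items() for _ in range(n)]
    rng = random.Random(seed)
    counts = Counter(rng.choice(pool) for _ in range(n_draws))
    return {cc: counts[cc] / n_draws for cc in servers_per_country}


def prob_never_drawn(share, n_draws):
    """Chance that a country holding `share` of the pool gets none of n_draws measurements."""
    return (1 - share) ** n_draws
```

With a hypothetical pool of 90 servers in one country and 10 in another, the smaller country holds 10% of the pool, and `prob_never_drawn(0.1, 20)` is about 0.12: roughly a one-in-eight chance that 20 measurements all go to the larger country.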

The information in the charts above can also be represented on a map:

[Map: latency of measurements originating in the country]

[Map: latency of measurements with their destination in the country]

[Map: latency within a country]


2020 versus 2017

We already mentioned that we conducted a similar study in 2017. So, how do the results obtained in 2020 compare with those obtained in 2017? As with the results presented in the previous section, the measurements can be grouped and compared. The following chart represents the measurements that originated in each country.

The 2017 measurement campaign shows that latencies in certain countries were much higher than those measured in 2020, particularly in Uruguay (UY), Paraguay (PY), Chile (CL) and Bolivia (BO), as can be seen in the graph on the left. We verified that this was not due to a small number of samples, as these countries had between 100 and 1,000 samples each. The only country with a measurement bias is Cuba (CU), which had only 1 sample in 2017.

Nevertheless, a more detailed analysis of the data shows that most countries had worse latency values towards the region than in 2017: of the 37 countries included in the chart, 28 (76%) had higher values. But what is the reason for this increase? To answer this question, we had to perform a more in-depth analysis of the data.

We asked ourselves the following question: Is it possible that connectivity is now worse in these countries? We analyzed this possibility and proposed the following hypothesis: Could it be that the measurements we performed in 2020 covered longer distances than those covered in 2017? This is a possibility, particularly considering that:

1. The vantage points of 2020 were not the same as those of 2017: Speedtest nodes are enabled and disabled daily!
2. Every measurement we schedule randomly selects a vantage point, so there is a strong random element at the beginning of the experiment, when we do not yet have a statistically significant number of samples. Could it be that the points selected at the start of this experiment were farther away?


To answer this question, we used a geolocation database [1] to determine the origin and destination city of each measurement. Then, using geodesics [2], we determined the approximate distance traveled by each measurement. If we assign each measurement the distance it traveled and plot these distances for 2017 and 2020, we obtain a plot like the one below, where lines located farther to the left represent shorter distances. This is known as a cumulative distribution function (CDF) [3]; in this case, it represents the fraction of measurements below a certain distance. The x-axis represents the distances traveled by the measurements; the y-axis, the fraction of distances that are below the value of x.
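The article computes distances along geodesics. A common, simpler approximation is the great-circle (haversine) distance on a spherical Earth, which typically differs from the ellipsoidal geodesic by less than 0.5%. The sketch below uses that approximation; it is not necessarily what the authors' analysis code does, and the input coordinates would come from the geolocation lookup.

```python
import math

EARTH_MEAN_RADIUS_KM = 6371.0


def great_circle_km(lat1, lon1, lat2, lon2):
    """Haversine distance in km between two (latitude, longitude) points in degrees,
    assuming a spherical Earth (an approximation of the true geodesic)."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    d_phi = phi2 - phi1
    d_lambda = math.radians(lon2 - lon1)
    a = (math.sin(d_phi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(d_lambda / 2) ** 2)
    # min() guards against rounding pushing the argument of asin slightly above 1
    return 2 * EARTH_MEAN_RADIUS_KM * math.asin(min(1.0, math.sqrt(a)))
```

For example, `great_circle_km(-34.90, -56.16, -34.60, -58.38)` (approximate coordinates of Montevideo and Buenos Aires) gives about 200 km.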

The fact that the red line lies to the left of the other means that the 2017 dataset contains shorter distances, i.e., points that are closer to each other, which in turn leads to lower absolute latencies. The graph shows that, in 2017, half of the measurements covered a distance of 3,535 km or less (the median), while in 2020 the median distance was 4,170 km.
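The empirical CDF behind such a plot, and the median read off it, can be computed as follows. The distance lists in the usage note are hypothetical placeholders, not the study's data; what matters is the comparison logic: the curve that reaches 0.5 at a smaller x lies farther to the left.

```python
from bisect import bisect_right
from statistics import median


def ecdf(samples):
    """Empirical CDF: returns F with F(x) = fraction of samples <= x."""
    ordered = sorted(samples)
    n = len(ordered)
    if n == 0:
        raise ValueError("need at least one sample")

    def F(x):
        # bisect_right counts how many sorted samples are <= x
        return bisect_right(ordered, x) / n

    return F
```

With invented distances `d17 = [500, 1500, 3000, 4000, 6000]` and `d20 = [800, 2500, 4200, 5000, 7000]`, `ecdf(d17)(3000)` is 0.6 while `ecdf(d20)(3000)` is 0.4, and `median(d17) < median(d20)`: the first curve lies to the left.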

Once the distances had been compared, we added the latency component. For example, for a distance of 1,000 km, a 10 ms ping is not the same as a 20 ms ping: we consider that the 10 ms ping travels faster, i.e., it has greater speed. The latency component was added by analyzing the ping speed, that is, the distance-time relationship of the measurement compared with a reference speed of c/3, one third of the speed of light in vacuum. Light travels through optical fiber at roughly two thirds of c, and a ping's time covers the round trip, so the one-way distance divided by the ping time cannot exceed roughly c/3. The result is a fraction, where a value close to 1 means that the ping travels as fast in practice as in theory, and values close to 0 mean suboptimal speeds (resulting from paths that are longer than a direct line, intermediate hops, and other potential sources of delay). In this case, unlike latency measurements, higher values are better.

[1] MaxMind, versions of June 2017 and November 2020.
[2] More about geodesics on Wikipedia.
[3] More about CDFs on Wikipedia.


Once the correction was introduced, we calculated the new metric (ping speed) and performed the same comparison once again. As we can see, 31 out of 35 countries (89%) show improvements in 2020 compared to 2017, and only 4 do not, among them Bolivia, Paraguay and the Turks and Caicos Islands. This new metric normalizes different datasets under a single measure, which allows us to compare them and thus measure an improvement in regional connectivity.

Ping Speed

If we define speed as distance/time, we can say that a ping covers a distance d in a time t and therefore travels at a speed of d/t. Distances and ping times are usually expressed in kilometers (km) and milliseconds (ms), but the resulting unit, km/ms, is not very intuitive. This is why, to define ping speed, we compare d/t with a reference speed, c/3: light propagates through optical fiber at roughly 2c/3, and because t is a round-trip time, the one-way distance d divided by t is bounded by roughly c/3.

The comparison is therefore:

$$\text{ping speed} = \frac{d/t}{c/3}$$

The result is a dimensionless fraction that tells us how close our ping is to the theoretical speed of propagation. Values close to 1 represent pings that propagate as fast as theory would indicate, while values close to 0 represent slow pings.
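The ping-speed ratio is a one-line computation. The sketch below takes c = 299,792.458 km/s, i.e. about 299.79 km/ms, so the c/3 reference is roughly 99.93 km/ms; the function name is illustrative. It reproduces the earlier example of a 1,000 km path.

```python
C_KM_PER_MS = 299_792.458 / 1000        # speed of light in vacuum, km per ms
REFERENCE_KM_PER_MS = C_KM_PER_MS / 3   # c/3 reference used for ping speed


def ping_speed(distance_km, ping_ms):
    """(d / t) / (c / 3): about 1 means the ping is as fast as theory allows;
    values near 0 point to detours, extra hops or other delays."""
    if ping_ms <= 0:
        raise ValueError("ping time must be positive")
    return (distance_km / ping_ms) / REFERENCE_KM_PER_MS


for rtt in (10, 20):
    print(f"1,000 km in {rtt} ms -> ping speed {ping_speed(1000, rtt):.2f}")
```

A 10 ms ping over 1,000 km scores about 1.00 and a 20 ms ping about 0.50, so doubling the time for the same distance halves the score. Because geolocation only approximates where the endpoints are, individual values can land slightly above 1.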


A Comparison of Aggregate Latency

The measurements can also be represented in an aggregate latency graph. As with the graphs above, we divided the measurements into two groups: those performed between points located in the same country and those performed between points in different countries. The following graph shows latencies for 2017 and 2020.

Here, we can compare the curves plotted from the data obtained in the 2017 and 2020 studies. The dotted lines represent latencies in 2017, while the solid lines show latencies in 2020. The graph shows both intra-country measurements (i.e., between points in the same country) and inter-country measurements (i.e., between points in different countries).

[Figure: Intra-country / inter-country latency by year]

The graph shows a clear improvement in the times for internal connections within each country. If we focus on the lines to the left (blue), we can see that the 2020 line (solid) is entirely to the left of the 2017 line; in other words, it represents lower latencies. For 2020, 50% of the measurements (the median) are below 39 ms, compared to 52 ms in 2017. An analysis of the slower measurements shows that 95% of the measurements are below 166 ms, compared to 150 ms in 2017.

However, for inter-country connectivity (red lines), i.e., the latency between points located in different countries, the situation is not so clear. Some of the measurements have improved compared to 2017: for lower latency values, the 2020 curve is faster; however, this changes at the 25th percentile (122 ms on the x-axis), where the 2017 curve becomes faster. Above the 72nd percentile (199 ms), the situation changes once again and the 2020 curve is again the faster of the two. So what is the reason for this behavior? Is this another case of measurement bias?

First, let's analyze the components that remain constant between the two datasets. For example, we can analyze the latency between the autonomous systems that were part of both the 2017 and the 2020 campaigns. It is important to remember that each measurement has an origin autonomous system and a destination autonomous system, and that for a proper comparison we must keep the complete tuple (origin AS, destination AS). On the one hand, keeping the complete tuple involves aggressive filtering, leaving only a small subset of the original dataset. On the other, it has the benefit of producing measurements that are comparable with each other and that allow us to determine whether there has been an actual change over the three years. Of a total of 10,750 combinations in the 2020 dataset, we kept 373 samples. The CDF for this subset is plotted below.
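The tuple-based filtering can be sketched as a set intersection over (origin ASN, destination ASN) pairs. The measurement shape and the function name are assumptions for illustration; the ASNs used below come from the range reserved for documentation (RFC 5398), not from the study.

```python
def keep_common_as_pairs(measurements_2017, measurements_2020):
    """Each measurement is (origin_asn, destination_asn, latency_ms).
    Return the set of (origin, destination) pairs seen in both campaigns and,
    for each year, only the measurements whose pair is in that set."""
    pairs_2017 = {(origin, dest) for origin, dest, _ in measurements_2017}
    pairs_2020 = {(origin, dest) for origin, dest, _ in measurements_2020}
    shared = pairs_2017 & pairs_2020

    def keep(measurements):
        return [m for m in measurements if (m[0], m[1]) in shared]

    return shared, keep(measurements_2017), keep(measurements_2020)


m2017 = [(64496, 64497, 80.0), (64498, 64499, 120.0)]
m2020 = [(64496, 64497, 60.0), (64500, 64501, 90.0)]
shared, kept_2017, kept_2020 = keep_common_as_pairs(m2017, m2020)
print(shared)  # {(64496, 64497)}
```

The pair is ordered, so A to B and B to A count as different combinations, matching the origin/destination tuple described above.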

We can see that, for this subset of measurements, the intra-country latency behaves the same way (2020 shows improvements compared to 2017), while the inter-country latency shows a somewhat clearer picture: the 2020 values are lower across almost the entire latency spectrum, particularly above the 29th percentile.

An analysis of this subset of measurements allows us to conclude that latency has improved for the autonomous systems for which we had measurements and vantage points both in 2017 and 2020.

[Figure: Intra-country / inter-country latency by year (subset)]

To better understand the behavior of the original CDF, we repeated the method described earlier: we considered the geodesic distance between origin and destination and compared ping speeds with their theoretical value. Recall that this comparison yields a value between 0 and 1, where low values indicate inefficiencies in ping propagation. As routing improves, actual conditions approach theory and this value approaches 1. In this case, we considered the entire dataset for 2020 and 2017. The following figure shows the CDF of the ping speed.


[Figure: Latency by year]

The graph shows that the ping speed for 2020 is consistently better than for 2017 (the 2020 curve lies to the right), which indicates that regional connectivity has improved over the last 3 years. From 2017 (dotted line) to 2020, the curve shifts approximately 20% to the right. Furthermore, this new metric allows us to conclude that the mixed behavior of latencies for inter-country measurements is due to differences between the samples collected in each year. Once the samples are normalized by geographical distance, the mixed behavior disappears.


Latency Clusters

Based on the latency times between different countries, and as in the studies carried out in previous years, we grouped together the countries that are closer to each other (in terms of latency) than to the rest of the region.

The unsupervised cluster analysis returned a total of four clusters, two of which had fewer than 20 measurements and therefore required special treatment. After the data was filtered, we obtained the two clusters shown in the maps below, each in a different color.

We can see that there is a group of countries in the southern part of the American continent, which includes Argentina, Uruguay, Paraguay, Brazil, and Chile, among others. This is cluster #1, the one with the best interconnection, with an internal latency of 68 ms (median). These countries are interconnected via fiber-optic cables deployed by different carriers, which allows a high degree of integration. This is why latency times between them are low.
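Given a country-to-cluster assignment produced by the clustering step, the internal latency of each cluster and the small-cluster filter can be sketched as below. The clustering algorithm itself is not described in the article and is not reproduced here. The 20-measurement threshold comes from the text; the input shapes, the function name and the handling of same-country pairs are assumptions (same-country pairs are excluded by default, following the article's note that cluster latencies are for measurements between countries of the same cluster).

```python
from collections import defaultdict
from statistics import median

MIN_MEASUREMENTS = 20  # clusters with fewer measurements are set aside


def intra_cluster_medians(measurements, cluster_of,
                          min_measurements=MIN_MEASUREMENTS,
                          include_same_country=False):
    """measurements: (origin_cc, destination_cc, latency_ms) tuples.
    cluster_of: dict mapping country code -> cluster label.
    Returns cluster label -> median latency between members of that cluster,
    omitting clusters with fewer than min_measurements samples."""
    per_cluster = defaultdict(list)
    for origin, destination, latency in measurements:
        if origin == destination and not include_same_country:
            continue
        cluster = cluster_of.get(origin)
        if cluster is not None and cluster == cluster_of.get(destination):
            per_cluster[cluster].append(latency)
    return {label: median(values) for label, values in per_cluster.items()
            if len(values) >= min_measurements}
```

Measurements whose endpoints fall in different clusters, or in countries with no cluster label, are simply ignored by this sketch.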


Cluster #0 comprises mostly countries of Central America, some countries of South America, and some countries of the Caribbean. It has an internal latency of 129 ms (about 1.9 times that of cluster #1). The image on the right shows the definition of these two clusters.

[Figure: Definition of Clusters 0 & 1]

Comparison with 2017

The same study was conducted in 2017, which resulted in four clusters. The following two figures allow us to compare the latency curves in a CDF, as we did earlier. In this case, the clusters of 2017 and 2020 are compared. Latencies are for measurements between countries of the same cluster.

A first observation is that, while in 2017 there were four clusters, in 2020 there were only two. An analysis of the 2017 data shows a group of countries (cluster 3) with low latency: less than 74 ms in 30% of cases. Latencies then increased, but 63% of the measurements in this group were below 125 ms. This group included countries such as Chile, Argentina, Uruguay, Brazil and Paraguay (see cluster 3 in the graph).

There was a second group of countries where 73% of the measurements were also below 125 ms, and in many cases it performed better than cluster 3. This group included countries such as Honduras, Guatemala, Venezuela and El Salvador (see cluster 1). As in the previous case, the remaining 27% of the measurements exceeded 125 ms, some reaching values as high as 300 ms.


A third group of countries comprising Mexico, the Dominican Republic, Cuba, Belize and Trinidad and Tobago had a more uniform behavior: although below 100 ms latencies were slightly higher than in clusters 1 and 3, above 100 ms the curve behaved similarly to cluster 3. In this case, however, we can observe that more than 90% of the measurements were below 175 ms (see cluster 0).

Finally, there was a fourth group of countries (cluster 2) that included Colombia, Panama, Costa Rica, Nicaragua and Bolivia. In this case, latencies were higher than in the other clusters and exceeded 100 ms in 70% of cases.

With respect to 2020, the first thing that stands out is an improvement in latency times in the group of countries in cluster 1. Countries that were in cluster 3 in 2017 (Chile, Argentina, Brazil and Paraguay) are now in cluster 1. This cluster now includes a larger number of countries, mostly in the Caribbean, for example, Jamaica, the Bahamas, and the Virgin Islands, among others.

Two things should be noted: from 2017 to 2020, both the set of countries included in the cluster (Caribbean countries were added) and the data changed. Therefore, we can analyze the evolution of latency in the cluster comprising Brazil, Argentina, Chile and other countries based on two criteria:

1. Analyzing cluster 3 as defined in 2017, comparing the 2017 data with the 2020 data.
2. Analyzing cluster 1 as defined in 2020, comparing the 2017 data with the 2020 data.


To do so, we grouped the data under different curves in the CDF above.

We can see that in both cases there were improvements in the cluster's connectivity:

1. If we consider the 2017 definition (solid blue vs. dotted blue line), we can see a clear improvement in latencies below the 75th percentile: the 40th percentile shifts from 112 ms to 63 ms, an improvement of 49 ms.
2. If we consider the 2020 definition (solid red vs. dotted red line), we can see that the curves behave very similarly and shift slightly to the left. For example, the 50th percentile shifts from 138 ms to 103 ms, an improvement of 35 ms.
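Percentile shifts like these can be read off two samples with a nearest-rank percentile. The article does not state which percentile definition was used, so the method here is an assumption, and the latency lists are made up for illustration.

```python
import math


def nearest_rank_percentile(samples, p):
    """p-th percentile (0 < p <= 100) using the nearest-rank method."""
    if not samples or not 0 < p <= 100:
        raise ValueError("need samples and 0 < p <= 100")
    ordered = sorted(samples)
    rank = math.ceil(p * len(ordered) / 100)  # 1-based rank
    return ordered[rank - 1]


def percentile_shift_ms(before, after, p):
    """Milliseconds the p-th percentile moved to the left (positive = improvement)."""
    return nearest_rank_percentile(before, p) - nearest_rank_percentile(after, p)


latencies_2017 = [40, 60, 80, 100, 138, 150, 180, 210, 260, 320]  # hypothetical
latencies_2020 = [30, 45, 60, 80, 103, 120, 140, 190, 240, 310]   # hypothetical
print(percentile_shift_ms(latencies_2017, latencies_2020, 50))  # 35
```

Other definitions (for example, linear interpolation between ranks) can give slightly different values on small samples, so the same definition should be applied to both years.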

This shows a significant improvement in the interconnection between this group of countries, probably due to increased peering and transit relationships between operators. However, just as in 2017, the highest 20% of latencies are above 200 ms and some are even much higher (300 ms or more). This may be due to measurements from autonomous systems with poor connectivity at the regional level which end up exchanging traffic in the United States or Europe.

Likewise, we repeated the comparison for cluster 0 (2020 definition, 2020 data versus 2017 data). This comparison does not allow us to conclude that there were improvements in the past few years. However, it is important to bear in mind that cluster 0 is not actually a cluster but a group of countries that were not placed into a specific cluster. As more measurements are obtained, it is highly likely that other clusters with better internal latency will appear.


Conclusions

In summary, there is an important group of countries (including some of the countries with the largest populations and the most developed Internet in the region) which, in 2020, exhibits a significant improvement compared to 2017: latency at the 40th percentile improved by 49 ms.

Improvements can be observed when considering the subset of networks (autonomous systems) that were measured in both campaigns. Furthermore, improvements can also be observed when latencies are normalized by geographic distance (ping speed).

Another aspect worth noting is that this campaign left latency measurements running permanently. This will make it possible to use the same methodology to compare future regional connectivity measurements. In addition, traceroutes began to be collected during this campaign using the same methodology.

A future study could complement this analysis with BGP-table-level interconnection data for networks in this group of countries.

Final Notes

The analysis supporting this study is available at the LACNIC Labs GitHub.

The measurements supporting this study are part of LACNIC's SIMON Project and are available for download.
