
An Overview of Low Latency for Wireless Communications: An Evolutionary Perspective

Xin Fan, Yan Huo
School of Electronics and Information Engineering, Beijing Jiaotong University, Beijing, China
E-mail: {fanxin, yhuo}@bjtu.edu.cn

arXiv:2107.03484v1 [cs.NI] 7 Jul 2021

Abstract: Ultra-low latency supported by the fifth generation (5G) gives impetus to the prosperity of many wireless network applications, such as autonomous driving, robotics, telepresence, and virtual reality. Ultra-low latency is not achieved in a moment, but requires long-term evolution of the network structure and of key enabling communication technologies. In this paper, we provide an evolutionary overview of low latency in mobile communication systems from two perspectives: 1) network architecture; 2) physical layer air interface technologies. We first describe in detail the evolution of communication network architecture from the second generation (2G) to 5G, highlighting the key points that reduce latency. Moreover, we review the evolution of key enabling technologies in the physical layer from 2G to 5G, which is also aimed at reducing latency. We also discuss the challenges and future research directions for low latency in network architecture and the physical layer.

Index Terms: Low latency, physical layer, network architecture, evolution.

I. INTRODUCTION

With the development of mobile communication technologies, user requirements are constantly increasing. Once current requirements are met by communication technologies, new needs arise and new technologies are then expected. In this way, through the mutual promotion of requirements and communication technologies, we have entered the fifth generation (5G) era.

5G is committed to creating a single platform to provide a wide range of services that are classified by the International Telecommunication Union (ITU) into three categories: enhanced mobile broadband (eMBB), massive machine-type communication (mMTC), and ultra-reliable and low latency communications (URLLC) [1]. To support these services, diverse sets of key performance indicators (KPIs) need to be achieved, which is a challenge for communication technologies. Among these KPIs, low latency, namely an end-to-end (E2E) latency of 1 ms or less, is perhaps the most challenging, because most latency-sensitive services need to simultaneously meet other KPIs, such as reliability as high as 99.9999% [2].

As mentioned above, the requirement for low latency is becoming more stringent step by step. The requirement for an E2E latency of 1 ms in 5G mobile systems builds on the fact that fourth generation (4G) mobile communications can achieve an E2E latency of 30 to 100 ms [3]. The E2E latency of 4G is in turn a significant improvement over the several hundred milliseconds of third generation (3G) mobile communications, which themselves developed from the previous second generation (2G). The current requirement for low latency cannot be satisfied by legacy mobile communication systems because of their outdated network architecture and communication techniques. To meet the low-latency requirement, 5G needs a significant further evolution of the network architecture, while its key enabling technologies require breakthrough innovation with respect to previous generations. Therefore, it is important to trace back the evolution of the previous and current network architectures and communication technologies.

A. Related Work

The existing literature contains several overviews of 5G networks, covering network architecture [4]–[6] and physical layer technologies [6], [7]. In addition, there are overviews of low latency for 5G networks in the Internet [8], cloud computing [9], and Internet-of-Things (IoT) applications [10], and even a comprehensive survey of latency reduction solutions in cellular networks towards 5G [2]. However, to the best of our knowledge, these overviews only cover 5G mobile communication systems, and there is no clear evolutionary route showing how latency has been reduced step by step. In other words, horizontal overviews of low latency can be found, but longitudinal ones are missing.

B. Contribution and Motivation

In this paper, we provide an overview of latency reduction for mobile communication systems from two different evolutionary perspectives. Firstly, we discuss how latency has been reduced by changing the network structure, following the changes in the radio access network (RAN), core network, and bearer network in each generation from 2G to 5G. Further, focusing on the physical layer, we present the communication technologies involved in each generation of mobile communication systems for attaining low latency, including packet size, frame structure, minimum transmission interval, modulation schemes, coding schemes and so on. Last but not least, we point out the major current challenges in reducing latency and possible future research directions in terms of network architecture and physical layer technologies.

Our motivation is to present a longitudinal and evolutionary perspective on the development of low-latency schemes, with a view to finding room for further improvement by reflecting on history. It should be noted that although we give several evolutionary routes of network architecture and physical layer technologies, a detailed comparison between the solutions involved in each evolutionary route is beyond the scope of this work.

The rest of this paper is organized as follows. Section II describes the components of latency, including the overall latency and the physical layer latency. In Section III, we discuss the changes in network architecture from 2G to 5G, including the radio access network, bearer network and core network. We then present the changes in physical layer technologies for low latency in Section IV. Following that, we point out the current challenges and future research directions for reducing latency in Section V. Finally, the paper is summarized in Section VI. For convenience, a list of major abbreviations is presented in Table I.

TABLE I: List of the major abbreviations

Abbreviation   Definition
RAN            Radio access network
E2E            End-to-end
GSM            Global System for Mobile Communications
GPRS           General Packet Radio Service
UMTS           Universal Mobile Telecommunications System
BTS            Base transceiver station
BSC            Base station controller
Cs             Circuit switching
Ps             Packet switching
RNC            Radio network controller
SGW            Serving gateway
PGW            Packet data network gateway
SGSN           Serving GPRS support node
GGSN           Gateway GPRS support node
C-RAN          Centralized, Cooperative, Cloud and Clean RAN
BBU            Base Band Unit
RRU            Remote Radio Unit
NR             New radio
SA             Standalone
NSA            Non-Standalone
CU             Centralized unit
AAU            Active antenna unit
DU             Distributed unit
NFV            Network function virtualization
SDN            Software defined network
D2D            Device-to-Device
MSC            Mobile switching center
GMSC           Gateway mobile switching center
MGW            Media Gateway
MME            Mobility management entity
SBA            Service Based architecture
NF             Network function
MEC            Mobile edge computing
PDH            Plesiochronous digital hierarchy
SDH            Synchronous digital hierarchy
MSTP           Multi-service transmission platform
PTN            Packet transport network
OTN            Optical transport network
PON            Passive optical network
NP             Network processor
TM             Traffic manager
TTI            Transmission time interval
TDMA           Time division multiple access
RLC            Radio Link Control
EDGE           Enhanced data rates for GSM evolution
CDMA           Code division multiple access
LTE            Long Term Evolution
CP             Cyclic prefix
HSDPA          High speed downlink packet access
OFDM           Orthogonal frequency division multiplexing
3GPP           The 3rd Generation Partnership Project
LDPC           Low-density parity check
IDMA           Interleave division multiple access
SCMA           Sparse code multiple access
NOMA           Non orthogonal multiple access
FBMC           Filter bank multi-carrier
UFMC           Universal filtered multi-carrier
GFDM           Generalized frequency division multiplexing
AUSF           Authentication Server Function
AMF            Core Access and Mobility Management Function
SMF            Session Management Function
UPF            User Plane Function
NEF            Network Exposure Function
NRF            NF Repository Function
PCF            Policy Control Function
UDM            Unified Data Management
AF             Application Function
NSSF           Network Slice Selection Function

II. COMPONENTS OF LATENCY

To discuss latency reduction, it is important to understand how latency is generated and what it is composed of. In general, cellular network latency can be divided into two aspects: 1) control plane latency; 2) user plane latency. The control plane latency generally refers to the time required for a terminal to switch from the idle state to the connected state, and the user plane latency refers to the time required for an IP message (ping packet) to be sent from a terminal to the application server and then returned to the terminal. Since the users' experience of network services mainly depends on the user plane latency (the control plane latency mainly affects network switching), low-latency communication focuses more on the user plane.

In terms of network architecture, the user plane latency consists of several components, including the air interface, the bearer network, the core network and the public data network (PDN)/Internet. As shown in Fig. 1, the total unidirectional transmission latency can be expressed as:

$$T = T_{Radio} + T_{Bearing} + T_{Core} + T_{PDN} \qquad (1)$$

where
• $T_{Radio}$ is the latency from the user terminal to the radio access network. This part of the latency is also known as the air interface latency, which is mainly affected by the physical layer transmission.
• $T_{Bearing}$ is the latency for transmission on the bearer/backhaul network that carries the connection between the radio access network and the core network, and between the core network and the PDN.
• $T_{Core}$ is the processing latency inside the core network. The processing includes mobility management, users' IP address allocation, security management, bearer control, etc.
• $T_{PDN}$ is the latency of content delivery for the PDN to process requests and establish default bearers.

Since a request must travel to the server and the response must travel back, the E2E latency is approximately twice the above latency, i.e., $2 \times T$. As the main part of $T_{Radio}$, the physical layer transmission latency can be divided into five distinct components as follows:

$$T_{PL} = T_{que} + T_{ttt} + T_{proc} + T_{prop} + T_{retr} \qquad (2)$$

where
• $T_{que}$ is the queuing latency, i.e., the time the current packet must wait for the transmission of the previous packets to complete. The queuing latency of a particular packet depends on the number of packets that arrived earlier and are waiting for transmission on the link. If the queue is empty and no other data packets are currently being transmitted, the queuing latency of the data packet is 0.
• $T_{ttt}$ is the time-to-transmission latency, i.e., the time required to push all the bits of a data packet onto the link (from the first bit of the transmitted data packet to the last bit of the packet).
• $T_{proc}$ is the processing latency, including encoding and decoding, modulation and demodulation, channel interleaving, channel estimation, rate matching, layer mapping, scrambling, data and control multiplexing, etc. These depend not only on physical layer technologies, but also on the processing capacity of user terminals and base stations.
• $T_{prop}$ is the propagation latency, i.e., the time it takes for an electromagnetic wave to propagate a certain distance in the channel.
• $T_{retr}$ is the latency of retransmission. Low link reliability can easily result in packet loss, which requires retransmission.

E2E latency is the sum of the latencies on multi-segment paths. The extreme latency requirement of 1 ms cannot be satisfied by optimizing only one local latency. Therefore, the implementation of 5G ultra-low latency requires a series of organically combined technologies. On the one hand, evolutionary changes in network architecture are needed to flatten the network structure and sink content providers toward the users. On the other hand, the air interface must be reconstructed to greatly reduce the transmission latency of the physical layer. The vision of reducing latency cannot be achieved overnight, but requires long-term efforts.

In the following two sections, we discuss how to reduce latency from the two aspects of network architecture and physical layer air interface technologies, respectively, from an evolutionary perspective.

III. EVOLUTIONARY NETWORK ARCHITECTURE FOR LOW LATENCY

Mobile communication networks have had different architectures in different periods. Every generational change of network architecture is an innovation or evolution, which may be very wide-ranging in response to different requirements. This paper only highlights the significant changes that reduce latency, and therefore does not necessarily provide a comprehensive overview of changes in network architecture. In this section, we describe the evolution of network architecture from 2G to 5G¹ in three parts: radio access network, core network and bearer network, as shown in Fig. 1.

¹ The first generation mobile communication (1G) only provides voice service with analog signals, but no data transmission. Therefore, this paper ignores the discussion of 1G network architecture.

A. Radio Access Network

The earliest 2G network, the Global System for Mobile Communications (GSM) network, also did not support data transmission; it only used digital signals to provide telephone services. With the move from GSM to General Packet Radio Service (GPRS) networks (referred to as 2.5G), packet switching was introduced and data services began to be provided. Therefore, it should be noted that the evolution of network architecture in terms of latency reduction starts at 2.5G.

1) 2G Radio Access Network: The radio access network in 2G networks is composed of the Base Transceiver Station (BTS) and the Base Station Controller (BSC). The BTS receives the wireless signal from the mobile station (MS) through the Um air interface, then transmits it to the BSC through the Abis interface. The BSC is responsible for the management and configuration of wireless resources (such as power control, channel allocation, etc.), and then transmits the received signal to the core network through the A interface.

2) 2G-2.5G Radio Access Network Evolution: The original GSM network is based on circuit switching technology and does not support packet switching services. Therefore, in order to support packet services, several functional entities were added to the original GSM network structure, which is equivalent to adding a small network on top of the original network to form a GPRS network. For the radio access network, a Packet Control Unit (PCU) is added to the BSC to provide packet switching channels. Starting from the GPRS network structure, two concepts are introduced, as follows.
• Circuit switching (Cs) domain: based on circuit switching, mainly covering voice services and also circuit-based data services, the most common of which is the fax service;
• Packet switching (Ps) domain: based on packet switching, mainly for common data services, including streaming media services, voice over IP (VoIP) and so on.

3) 2.5G-3G Radio Access Network Evolution: The 3G mobile cellular system has several standards, among which the Universal Mobile Telecommunications System (UMTS) is currently the most widely used. UMTS, sometimes also referred to as 3GSM, emphasizes the integration of 3G technologies and is a successor to the GSM standard. The packet switching system in UMTS evolved from the GPRS system, so the architectures of the two systems are quite similar but not exactly the same.

Instead of BTS and BSC, the composition of the radio access network is replaced by NodeB and Radio Network

[Figure: network architecture diagram with three columns (Radio Access Networks, Core Networks, External Networks) and one row per generation (2G GSM, 2G GPRS, 3G, 4G, 5G), showing the network entities and interfaces of each generation and distinguishing user plane, control plane, and combined user and control plane links.]

Fig. 1: The network architecture from 2G to 5G
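
The latency decomposition of Eqs. (1) and (2), whose segments correspond to the radio access, bearer, core and external networks of Fig. 1, can be illustrated with a short numerical sketch. This is only a minimal illustration and not the authors' method: the class names, the `meets_budget` and `shares` helpers, and every numeric value are hypothetical and chosen for readability, not measured. For simplicity the example also sets the radio segment equal to the physical layer latency, whereas the text only states that the physical layer latency is the main part of the radio segment.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class PhysicalLayerLatency:
    """Five components of Eq. (2); every field is a latency in milliseconds."""
    t_que: float   # queuing latency
    t_ttt: float   # time-to-transmission latency
    t_proc: float  # processing latency
    t_prop: float  # propagation latency
    t_retr: float  # retransmission latency

    def total(self) -> float:
        """Sum of the five components, as in Eq. (2)."""
        return self.t_que + self.t_ttt + self.t_proc + self.t_prop + self.t_retr


@dataclass(frozen=True)
class UserPlaneLatency:
    """Four segments of Eq. (1); every field is a latency in milliseconds."""
    t_radio: float
    t_bearing: float
    t_core: float
    t_pdn: float

    def one_way(self) -> float:
        """Unidirectional latency T, as in Eq. (1)."""
        return self.t_radio + self.t_bearing + self.t_core + self.t_pdn

    def end_to_end(self) -> float:
        """Approximate E2E latency, taken as 2 x T as stated in the text."""
        return 2.0 * self.one_way()

    def meets_budget(self, budget_ms: float = 1.0) -> bool:
        """Hypothetical helper: compare the E2E latency with a target budget."""
        return self.end_to_end() <= budget_ms

    def shares(self) -> Dict[str, float]:
        """Hypothetical helper: fraction of the one-way latency per segment."""
        total = self.one_way()
        if total <= 0.0:
            raise ValueError("one-way latency must be positive")
        return {
            "radio": self.t_radio / total,
            "bearing": self.t_bearing / total,
            "core": self.t_core / total,
            "pdn": self.t_pdn / total,
        }


if __name__ == "__main__":
    # Illustrative values only (milliseconds); they are not taken from the paper.
    pl = PhysicalLayerLatency(t_que=0.0, t_ttt=0.125, t_proc=0.25,
                              t_prop=0.0625, t_retr=0.0)
    # Simplifying assumption of this sketch: T_Radio is set equal to T_PL.
    up = UserPlaneLatency(t_radio=pl.total(), t_bearing=0.5,
                          t_core=0.25, t_pdn=0.5)
    print("one-way latency (ms):", up.one_way())
    print("E2E latency (ms):", up.end_to_end())
    print("meets 1 ms budget:", up.meets_budget(1.0))
    print("segment shares:", up.shares())
```

With values like these, the per-segment shares make it easy to see that shrinking only one segment is not enough to reach a 1 ms target, which is the argument for treating network architecture and the physical layer together.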

Controller (RNC). The functions of BTS and NodeB are basically the same, but there are still some differences between them. These differences are instrumental in reducing latency, as follows:
• The interface between BTS and BSC is Abis, while the interface between NodeB and RNC is Iub. In physical layer transmission, Abis is a private interface that supports either ATM (E1/T1) or IP, while Iub is an open interface that supports ATM, IP, and Hybrid (ATM and IP). Because these interfaces are logical, many interfaces can be multiplexed and merged into a single transmission line. This reduces the number of interfaces and improves the switching efficiency between lines, which is conducive to the reduction of latency.
• In GSM, the overwhelming majority of control functions are implemented by the BSC. The BTS only completes the coding and transmission of the physical layer according to the instructions of the BSC, and the BTS itself has basically no ability to control and schedule physical resources. However, in order to achieve greater throughput on the air interface, the functions of NodeB are enhanced. The concepts of physical layer retransmission, spreading/despreading, fast resource scheduling and closed (inner) loop power control are introduced at the NodeB level. By placing these functions, which would otherwise be available only at the RNC, on the NodeB closer to the air interface, retransmission and air resource scheduling are accelerated. As a result, by sinking several control functions from RNC to NodeB, lower latency is achieved.

In addition to the changes in NodeB, there are also improvements from BSC to RNC that are beneficial to reducing latency, as follows:
• The PCU entity was removed and its functions were incorporated into the RNC, which helps to reduce latency.
• The interfaces between the radio access network and the core network are the A interface in GSM and the Gb interface in GPRS, respectively, whereas the two interfaces are consolidated into the Iu interface in UMTS. The Iu interface connecting the RNC to the circuit-switched core network is known as Iu-CS and the Iu interface connecting the RNC to the packet-switched core network is known as Iu-PS. The Iu-CS and Iu-PS interfaces share the control plane using the Radio Access Network Application Part (RANAP) protocol and have similar user planes [11]. Therefore, the conversion between interfaces is easier and the latency is reduced.
• In GPRS, there is no connection between BSCs, whereas RNCs can be interconnected through the newly added Iur interface in UMTS. Benefiting from Iur interfaces, RNCs have the ability of macro diversity and can provide soft handover. Meanwhile, the interaction efficiency between entities in the access network is improved, which reduces the latency of resource scheduling.
• In GPRS, some radio-related management functions are controlled in the core network, whereas these functions have been moved from the core network into the radio access network in UMTS [12]. This change makes the radio access network and the core network more independent in function, and reduces the latency of signaling interaction. Because the radio management functional entity is closer to user terminals, the latency is also lower.

Apart from the above changes, the Um air interface in the previous network structure was replaced by the Uu interface. However, without the air interface protocol and technology, the change of interface is of little significance. Thus, we describe the evolution of physical layer air interface technologies in the next section. Here, we emphasize that the Iub, Iur and Iu (including Iu-CS and Iu-PS) interfaces realize the separation of control plane and user plane in the protocol. Control plane latency differs from user plane latency: the amount of control signaling is related to the number of users, while the amount of user data is largely related to new services and applications, as well as the performance of devices. Because the data quantities differ, the latencies naturally differ. Thus, separating the two planes and optimizing them separately is more beneficial to reducing network latency.

4) 3G-4G Radio Access Network Evolution: The evolution of the radio access network architecture from 3G to 4G can be summarized as follows:
• One layer less: The RNC is removed from the 4G radio access network, which means that the whole network structure is reduced by one layer, namely, flattened. A flatter network structure reduces the complexity of the system, and reduces the multi-node overhead of information interaction between base stations and core networks. Therefore, flattening is beneficial to reducing latency. Not only is the user plane latency greatly shortened, but the state transition time is also reduced, because the control plane procedure for moving from the sleep state to the active state is simplified.
• One interface more: The 4G radio access network consists of several Evolved NodeBs (eNodeBs), with an X2 interface added between eNodeBs. In 3G, there were only interfaces (i.e., Iur) between RNCs, and no interfaces between NodeBs. Once the connection between a NodeB and its RNC is interrupted, the NodeB becomes an isolated point. In 4G, however, because one base station is connected to many base stations, a fault between two points can be bypassed through other channels, so a base station does not become an isolated point. The X2 interface between eNodeBs helps the 4G wireless network to coordinate the work between the network elements and enhances the robustness of networks. This efficient and robust network cooperation is conducive to the reduction of latency.
• "Fat" base station: In UMTS, NodeB is responsible for radio frequency processing and baseband processing. RNC is mainly responsible for controlling and coordinating the cooperation between base stations, including system access control, bearer control, mobility management, wireless resource management and other control functions. After the RNC is removed from eUTRAN, its underlying functions are allocated to the eNodeB, and its high-level functions are allocated to the AGW (access gateway, including the serving gateway SGW and the packet data network gateway PGW) of the core network. The functions of the eNodeB are mainly evolved from the functions of NodeB, RNC, the serving GPRS support node (SGSN) and the gateway GPRS support node (GGSN) of 3G. That is to say, some functions of the core network sink into the access network again, which results in lower latency.
• Centralized, Cooperative, Cloud and Clean radio access network (C-RAN): An eNodeB consists mainly of a Base Band Unit (BBU) and a Remote Radio Unit (RRU). A BBU can support multiple RRUs. In C-RAN, RRUs extend to the areas closer to user terminals through optical fiber, while BBUs are centralized to form BBU resource pools. In this way, C-RAN centralizes all or part of the baseband processing resources, and manages and dynamically distributes them in a unified manner. While improving resource utilization and reducing energy consumption, C-RAN improves network performance through effective support for collaborative technologies. Due to the faster resource scheduling capability provided by C-RAN, the processing latency can be reduced.

5) 4G-5G Radio Access Network Evolution: When a 5G network is deployed alongside the existing network, the deployment can take two forms:
• Standalone (SA): refers to building 5G networks anew, including new base stations, backhaul links and core networks. While introducing new network elements and interfaces, SA will also adopt new technologies such as network virtualization and software-defined networks on a large scale, and integrate them with 5G new radio (NR). At the same time, the technical challenges faced by SA in protocol development, network planning, deployment and interoperability will surpass those faced by 3G and 4G systems.
• Non-Standalone (NSA): refers to the deployment of 5G networks using existing 4G infrastructure. The 5G carrier based on the NSA architecture only carries user data, and its control signaling is still transmitted through the 4G network.

In this paper, we only consider the case of SA. In the 5G network, C-RAN still maintains the characteristics of the four "C"s but also evolves significantly for low latency.

Firstly, because the demands in 5G are diversified, the network needs to be diversified; because the network needs to be diversified, it needs to be sliced; because it needs to be sliced, the network elements need to be able to move flexibly; because the network elements need to move flexibly, the connections between the network elements also need to be flexible. As a result, the 5G radio access network is redefined as NG-RAN, in which the base station is no longer the eNodeB but the gNB. The gNB reconstructs BBUs and RRUs into the following three functional entities:
• Centralized Unit (CU): The non-real-time part of the original BBU is separated and redefined as the CU, which is responsible for handling non-real-time protocols and services.
• Active Antenna Unit (AAU): Part of the physical layer processing functions of the original BBU is combined with the original RRU and passive antenna to form the AAU.
• Distributed Unit (DU): The remaining functions of the original BBU are redefined as the DU, which handles physical layer protocols and real-time services.

According to different service requirements and performance indicators, the network is divided into logical combinations of network functional entities, and the sliced network is used to provide specified services for target users and terminals. The three reconstructed functional entities fit well with the implementation of slicing. According to the 5G standard, CU, DU and AAU can be separated or co-located, so there will be a variety of network deployment patterns. In low-latency scenarios, the DU needs to be deployed close to user terminals. Deploying the units that handle real-time services independently and close to user terminals can greatly reduce latency.

Secondly, Network Function Virtualization (NFV) and Software Defined Network (SDN) are introduced into 5G networks. On the one hand, with the introduction of NFV, the 5G network is constructed as a virtualized network environment. After virtualization, differentiated software functions run on the same hardware devices, and different network functions share hardware computing, storage and communication resources. On the other hand, the introduction of SDN improves the network's programmability and separates the data and control aspects of the network. Under the NFV/SDN network architecture, resource pools that are inherited from 4G can be evolved into virtualized cloud resource pools (VCRPs). Resource allocation in VCRPs can maximize the reuse of network resources, and bring more flexible and rapid sharing capability. NFV/SDN packages a series of network functions into a single action to minimize network sessions, which means a reduction in latency.

Thirdly, compared with cloud computing nodes, fog computing nodes are closer to user terminals. Therefore, introducing fog computing into the wireless access network to build a fog RAN (F-RAN) can alleviate the pressure on fronthaul/backhaul links by virtue of the computing and caching potential of users and edge devices. In such an F-RAN network architecture, the distributed computing and caching capabilities of network edges can be effectively integrated through collaboration, which enhances the local real-time processing, transmission and control capabilities. By hosting sunk network functions and edge applications, local information processing and service distribution can be realized, providing lower E2E latency.

Last but not least, Device-to-Device (D2D) communication is not a new technology proposed by 5G, but D2D is destined to develop in 5G. Network participants share part of their hardware resources, including information processing, storage and network connectivity. These shared resources provide services and resources to the network and can be accessed directly by other users without passing through intermediary entities. This kind of direct communication mode will greatly reduce the E2E communication latency.

B. Core Network

The core network is the "management center", which is mainly responsible for managing data, sorting data, and then distributing data. The functions implemented by each generation of core network are slightly different, which also corresponds to the changes of architecture. We enumerate the structural changes for latency reduction, mainly involving the sinking of functions and the separation of control and user planes.

1) 2G-2.5G Core Network: In GSM, the core network is mainly composed of the Mobile Switching Center (MSC), Visitor Location Register (VLR), Home Location Register (HLR), Authentication Center (AUC), Equipment Identity Register (EIR) and other functional entities. The MSC is the core, responsible for dealing with the specific services of users. The VLR and HLR are mainly responsible for mobility management and user database management functions. The AUC and EIR are responsible for security functions. In addition, the Gateway Mobile Switching Center (GMSC) is responsible for providing access to external network interfaces.

In the core network of GPRS, the SGSN and GGSN are added, whose functions are consistent with those of the MSC and GMSC, except that they deal with packet services and external network access to the IP network, respectively.

2) 2.5G-3G Core Network: The most significant change of the 3G core network is the introduction of the softswitch to separate the call control function from the media gateway (transport layer) in the CS domain. The basic call control function is implemented in software, which realizes the separation of call transmission and call control, and establishes a separate plane for control, switching and software programmable functions. Specifically, the bearer and control functions of the MSC are separated and divided into two nodes, the MSC-server and the Media Gateway (MGW). Call control, mobility management, and media control functions are performed on the MSC-server, while the service bearer and media conversion functions are completed on the MGW. The biggest change brought by the separation of bearer and control is that MSC-servers and MGWs can be deployed separately. MSC-servers are usually concentrated in provincial capitals or regional centers. The centralized management of MSC-servers can improve the efficiency of operation and maintenance, while the MGW can be placed at the best service point. This network architecture of centralized management and distributed service delivery is conducive to providing better services (including low latency).

Compared with the CS domain, the PS domain only separates the user plane from the control plane logically, but not physically. The 3G core network in the PS domain does not have separate entities to implement the control plane and the user plane, respectively².

² Direct tunnel technology is an innovative technology in 3G [13], [14], which this paper does not focus on because of its uniqueness. The direct tunnel technology refers to the establishment of a "direct channel" from RNC to GGSN. User plane data is transmitted in the "direct channel" without passing through SGSN to realize the flattening of the network plane.

3) 3G-4G Core Network: The evolution of the 4G core network has two main aspects, as follows:
• The CS domain is removed and the network architecture of a single PS domain is implemented. This single network architecture reduces signaling interaction and thus reduces latency.
• The control plane and the user plane are completely separated, physically and logically. The functions of the control plane and the user plane are assumed by different network entities. The control plane element is the Mobility Management Entity (MME), which is mainly used for user access control and mobility management. The user plane network element is the System Architecture Evolution Gateway (SAE-GW), which comprises the Serving Gateway (S-GW) and the Packet Data Network Gateway (P-GW) and is mainly used to carry data services. The processing efficiency of the core network is improved, so the latency is reduced.

4) 4G-5G Core Network: With the complexity and diversity of services in the 5G era, the integrated network element structure of the 4G core network cannot flexibly cope with the changing service applications. In order to make the network elements more flexible and better able to respond to diversified applications, the 5G core network is evolving to a discrete Service Based architecture (SBA), which has two characteristics:
• Firstly, the separation of network functions absorbs the original design idea of the NFV cloud, aiming to build the network in a software-based, modularized and service-oriented way.
• Secondly, the user plane functions are freed from the "centralization" constraints, so that they can be flexibly deployed not only in the core network, but also in the radio access network.

Both of the above characteristics are beneficial to the reduction of latency. In order to achieve these two characteristics, the 5G core network has the following two evolutions from 4G:
• Traditional network entities are split into multiple network function (NF) modules. In line with the concept of SBA, each NF is independently autonomous, and individual changes do not affect other NFs. The functions of the 4G core network elements can be found in the NFs of the 5G core network, but the architecture has changed from monolithic to micro-service. The most obvious external manifestation of this change is the substantial increase in network elements. Although these elements appear numerous, in fact the hardware is virtualized on the virtualization platform. The purpose of this change is to make the network more flexible, open and scalable, so as to realize network slicing and provide better services.
• The traditional point-to-point communication between network elements is abandoned. The interface of each NF is a service interface. Each NF provides services through its own service-oriented interface, and allows other authorized NFs to access or invoke its services. Because the underlying transport protocols are the same, all service interfaces can be carried on the same bus, that is, a bus communication mode. This bus communication mode can provide higher information transmission efficiency and lower latency.

As the network elements are subdivided, the network elements on the user plane can further sink to mobile edge computing (MEC) nodes. MEC technology enables applications, services and content to be deployed locally, nearby and in a distributed manner by migrating the computing, storage and service capabilities of the core network to the edge of the network. Benefiting from this, the cached service content is close to the user terminal device, thus greatly reducing the service connection and response latency³.

³ After introducing MEC technology, by superimposing MEC servers on the base station side, content extraction and caching can be accomplished directly by MEC servers. In this way, when other terminals within the same base station call the same content, they can obtain it directly from the MEC servers. No duplicate acquisition through the core network is needed, which effectively saves system resources on the core network side. At the same time, due to the sinking of service content, the corresponding service response latency is significantly shortened.

C. Bearer Network

Bearer networks, sometimes referred to as transport networks or backhaul networks, are responsible for bearing and transmitting information. Generally speaking, the bearer network is the connecting part between the access network and the core network. In fact, the connections between the internal nodes of the radio access network and of the core network should also be included. Although the bearer network is not the main target of low latency improvement, it also has to undertake some improvements in low latency. The latency of the bearer network comes from two parts: 1) the time of signal propagation in the medium; 2) the forwarding latency of transmission devices.

We list some macro-evolutionary measures for lower latency as follows.

1) Transmission distance: As mentioned above, the functions of core networks are sinking from 2G to 5G networks.

switching (MPLS)⁴, the processing delay of forwarding between devices is reduced.
• From PTN to OTN, integrating SDH and wavelength
division multiplexing (WDM) technologies, OTN realizes What’s more, the application of MEC enables some low- optical crossover instead of fiber hopping to provide more latency services to be implemented directly in MEC devices flexible scheduling while providing large capacity for without going through the core network. In this way, the long-distance transmission. By enhancing packet process- transmission distance of the bearer network will be reduced, ing and routing forwarding capabilities, OTN can meet which means the transmission latency will be reduced. In the needs of 5G bearer network, such as large bandwidth, addition, the flattening of the network architecture reduces low latency, high reliability, network slicing and so on. the number of entities and forwarding hops, thus reducing the • From OTN to PON, PON replaces electrical devices with forwarding latency. optical devices in order to realize all-optical network, and thus reduces the latency of electrical/optical and 2) Transmission media: In the early stage of communi- optical/electrical conversion between devices. cation development, T1/E1 copper lines (circuit switching) was used in bearer networks. With the rapid increase of In addition to the above mainstream technologies, 5G has mobile devices, the development of 3G technologies have also emerged new bearer network technologies to reduce brought tremendous operational expenditure (OPEX) pressure latency as follows: to bearer networks. Due to its low price and some other advan- • Cut-through switching: The traditional data forwarding tages (larger transmission bandwidth, larger channel capacity, method is that the port checks and forwards after obtain- lower line loss, longer transmission distance, stronger anti- ing a complete data packet, which will introduce partial interference ability, etc.), optical fiber transmission has been latency. Cut-through switching is the fastest forwarding widely used. 
In the 4G network, removing the CS domain mode for a switch. After receiving the destination MAC to promote “All-IP”, optical fiber has become the main force address of a data frame, the switch immediately forwards of bearer networks. However, there are still some electrical data to the destination port. Subsequent data is forwarded nodes in 4G networks, which cause some performance (in- one byte at a time, which greatly reduces the serial cluding latency) losses due to the photoelectric conversion forwarding latency. between optical nodes and electrical nodes. Thus, 5G puts • Flexible ethernet (FlexE): FlexE achieves physical iso- forward the concept of all-optical network (referring to the lation between sub-MACs and guarantees low-latency electrical/optical and optical/electrical conversion of signal service bandwidth. At the same time, the low-latency only when it comes in and out of the network) to improve identifications of FlexE can be passed to network pro- network performance. cessor (NP), traffic manager (TM) so as to achieve E2E On the other hand, with the rise of millimeter wave and low-latency channel. large-scale Multiple-Input Multiple-Output (MIMO) technolo- • Latency-aware priority: Optimizing NP kernel to sense gies, microwave has become a new solution for bearer net- the priority of low-latency services, a dedicated channel works. Generally, has lower latency for low-latency services can be opened up. and lower OPEX than optical transmission [2]. • Priority-based latency scheduling: According to the pri- ority of the delay traffic from NP, TM adopts message- 3) Transmission technologies: The transmission technolo- through scheduling and preemptive scheduling mecha- gies of bearer networks have experienced the evolution of nism to guarantee the requirement of low-latency ser- plesiochronous digital hierarchy (PDH), synchronous digital vices. 
hierarchy (SDH), multi-service transmission platform (MSTP), packet transport network (PTN), optical transport network IV. EVOLUTIONARY PHYSICAL LAYER SOLUTIONSFOR (OTN) and passive optical network (PON). The main purpose LOW LATENCY of these technological evolution is not to lower latency (it pay more attention to capacity, bandwidth and cost), but In order to achieve low latency, it is not only necessary it still have some impact on latency. The impact of these to change the network architecture, but also the wireless air transmission technologies on latency in the evolution process interface technologies. In this section, we mainly focus on the can be summarized as follows: evolution of physical layer technologies to reduce TPL, includ- ing frame structure, scheduling, multiple access, modulation, • From PDH to SDH, the rate standard is standardized, channel coding and signal carrier. The main evolutionary the interface is unified, and the management capability is physical layer solutions for low latency are summarized in enhanced. Table II. • From SDH to MSTP, the ip-based interfaces (IP over SDH) are implemented to enhance the capacity of multi- 4MPLS establishes a label forwarding channel (label switching path, LSP) service bearing and scheduling. for messages through pre-assigned labels. At each device in the channel, only • From MSTP to PTN, by adopting multi-protocol label quick label switching is required (one lookup), thus saving processing time. TABLE II: Summary of evolutionary physical layer technologies for low latency.

| Generation | System | Frame length | Minimum scheduling unit | TTI | Scheduling schemes | Channel coding | Multiple access | Modulation | Typical frequency bands |
|---|---|---|---|---|---|---|---|---|---|
| 2G | GPRS | A TDMA frame: 60 ms | An RLC block | 20 ms | N/A | Convolutional code | TDMA; FDMA | GMSK; QPSK | Uplink (UL): 890-915 MHz; Downlink (DL): 935-960 MHz |
| 2G | E-EDGE | A TDMA frame: 60 ms | An RLC block | 10 ms | N/A | Convolutional code | TDMA; FDMA | GMSK; QPSK | UL: 890-915 MHz; DL: 935-960 MHz |
| 3G | WCDMA | A superframe: 720 ms; a radio frame: 10 ms; a short frame: 2 ms | R99: a radio frame; HSDPA: a short frame | R99: 10 ms; HSDPA: 2 ms | Transferring some radio interface control functions from RNC to base station | Turbo code | CDMA | GMSK; QPSK; 16 QAM | UL: 1940-1955 MHz; DL: 2130-2145 MHz |
| 3G | TD-SCDMA | A superframe: 720 ms; a radio frame: 10 ms; a subframe: 5 ms | A subframe | 5 ms | Transferring some radio interface control functions from RNC to base station | Turbo code | CDMA | GMSK; QPSK; 16 QAM | UL: 1880-1900 MHz; DL: 2010-2025 MHz |
| 3G | CDMA2000 | A superframe: 720 ms; a radio frame: 10 ms; a slot: 1.67 ms | A slot | 1.67 ms | Transferring some radio interface control functions from RNC to base station | Turbo code | CDMA | GMSK; QPSK; 16 QAM | UL: 1920-1935 MHz; DL: 2110-2125 MHz |
| 4G | | A radio frame: 10 ms; a subframe: 1 ms; Δf = 15 kHz | A subframe | 1 ms | Pre-scheduling; semi-static scheduling | | OFDM | GMSK; QPSK; 16 QAM; 64 QAM | UL: 2500-2570 MHz; DL: 2620-2690 MHz |
| 5G | | A radio frame: 10 ms; a subframe: 1 ms; the specific intra-frame structure is shown in Table III | An OFDM symbol (including CP) | See Table III | Allocation by group; priority preemption scheduling | Data: LDPC; Control: Polar | NOMA; IDMA; SCMA; FBMC; UFMC; GFDM | GMSK; QPSK; 16 QAM; 64 QAM; 256 QAM | FR1: 0.45-6 GHz; FR2: 24.25-52.6 GHz |
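
As a rough cross-check of the TTI column in Table II, the following minimal Python sketch reproduces the GPRS and E-EDGE figures from the 52-multiframe arithmetic and derives the 5G NR subcarrier spacing, slot length and useful OFDM symbol length from the numerology index n. The function and constant names are illustrative assumptions, not part of any standard or library, and the cyclic prefix lengths of Table III are not derived here.

```python
# Illustrative sketch only: names below are hypothetical helpers.

GSM_SLOT_MS = 0.577             # slot duration quoted in the text
SLOTS_PER_TDMA_FRAME = 8        # one TDMA frame contains eight time slots
FRAMES_PER_MULTIFRAME = 52      # GPRS 52-multiframe
RLC_BLOCKS_PER_MULTIFRAME = 12  # RLC blocks carried by one 52-multiframe


def gprs_tti_ms() -> float:
    """RLC block period of the 52-multiframe, i.e. the GPRS TTI (about 20 ms)."""
    multiframe_ms = FRAMES_PER_MULTIFRAME * SLOTS_PER_TDMA_FRAME * GSM_SLOT_MS
    return multiframe_ms / RLC_BLOCKS_PER_MULTIFRAME


def e_edge_tti_ms() -> float:
    """E-EDGE sends a block over 2 TDMA frames instead of 4, halving the TTI."""
    return gprs_tti_ms() * 2 / 4


def nr_numerology(n: int) -> dict:
    """Subcarrier spacing, slot length and useful symbol length for numerology n."""
    if n not in range(5):
        raise ValueError("n must be in {0, 1, 2, 3, 4}")
    scs_khz = 15 * 2 ** n
    return {
        "scs_khz": scs_khz,            # delta f = 15 * 2^n kHz
        "slot_us": 1000 / 2 ** n,      # 14 symbols per slot (normal CP)
        "symbol_us": 1000 / scs_khz,   # 1 / delta f, cyclic prefix excluded
    }


if __name__ == "__main__":
    print(f"GPRS TTI   : {gprs_tti_ms():.2f} ms")
    print(f"E-EDGE TTI : {e_edge_tti_ms():.2f} ms")
    for n in range(5):
        print(n, nr_numerology(n))
```

The GPRS value comes out marginally above 20 ms because the quoted slot duration of 0.577 ms is itself rounded.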

A. Frame Structure

In GPRS, the 26-multiframe and 52-multiframe structure used for CS in GSM is replaced by a new 52 time division multiple access (TDMA) frame structure in the PS domain. This new frame structure transmits data in Radio Link Control (RLC) block mode. One RLC block contains 4 TDMA frames, and one TDMA frame contains eight time slots. The 52 TDMA frames constitute 12 RLC blocks and 4 idle frames. Since the duration of each slot is 0.577 milliseconds, the duration of all 52 TDMA frames is $52 \times 0.577 \times 8 \approx 240$ ms. Therefore, each RLC block period is $\frac{240}{12} = 20$ ms. This RLC block period is referred to as the transmission time interval (TTI) in UMTS.

TTI represents the minimum data transmission time, referring to the length of an independently decodable transmission in a wireless link, and is the basic unit of resource scheduling and management. Reducing TTI is equivalent to reducing $T_{ttt}$, $T_{proc}$ and $T_{prop}$. At the same time, a shorter TTI can increase the number of physical layer retransmissions in a given time, thus ensuring link efficiency, i.e., reducing $T_{retr}$. TTI is the main source of data exchange latency, so most of the evolutionary schemes for reducing the latency in the physical layer begin with reducing TTI.

Enhanced data rate for GSM evolution (EDGE) is a direct evolution of GPRS, often referred to as 2.75G⁵. In the evolved version of EDGE (called E-EDGE), the original RLC block with 4 consecutive TDMA frames on a single channel is changed to 2 consecutive TDMA frames on a dual channel. As a result, the TTI is reduced from 20 ms to 10 ms.

⁵ Since EDGE uses the same architecture as GPRS, we did not introduce it in the evolution of network architecture.

There are many formats for 3G networks, the mainstream of which are wideband code division multiple access (WCDMA), time division-synchronous code division multiple access (TD-SCDMA), and code division multiple access 2000 (CDMA2000). The frame structure of WCDMA and TD-SCDMA is composed of superframes and radio frames. A superframe consists of 72 radio frames. Each radio frame lasts for 10 ms. In the 3rd Generation Partnership Project (3GPP) Release 99 version of WCDMA, the minimum unit of resource scheduling is the frame length, i.e., the TTI is 10 ms. In 3GPP Release 5, high speed downlink packet access (HSDPA) is applied to WCDMA, which introduces a short frame structure. The duration of each short frame is 2 ms, which is the minimum unit of resource scheduling, that is, the TTI is reduced to 2 ms. TD-SCDMA introduces subframes as resource scheduling units, each of which has a length of 5 ms, i.e., the TTI is 5 ms. In CDMA2000, the radio frame is composed of 16 slots. The slot length of 1.67 ms is the basic unit of resource scheduling, so the TTI is 1.67 ms. In summary, the minimum TTI that 3G can achieve is 1.67 ms.

In Long Term Evolution (LTE) systems, the radio frame structure is also adopted. The radio frame length is 10 ms and consists of two half-frames with a length of 5 ms. Each half-frame consists of five sub-frames with a length of 1 ms, including four ordinary sub-frames and one special sub-frame. Therefore, the whole frame can also be understood as being divided into 10 sub-frames with a length of 1 ms, which is the unit of data scheduling and transmission (i.e., the TTI).

Following the trend of previous generations of mobile communications, 5G would be expected to adopt a shorter sub-frame structure. Contrary to many people's expectations, however, 5G still adopts the same 1 ms sub-frame as 4G LTE. This is mainly because LTE will exist for a long time, so 5G needs to consider the compatibility of new radio (NR) and LTE. In order to achieve low latency, the compromise is that the number of orthogonal frequency division multiplexing (OFDM) symbols in a sub-frame is no longer always 14. In 5G NR, resources are scheduled in units of OFDM symbols instead of sub-frames. That is, the length of the TTI depends on the length and number of OFDM symbols. 5G NR supports multiple numerologies (including subcarrier spacing and symbol length) for different services. The number of slots per subframe is fixed in LTE, whereas the number of slots contained in an NR subframe depends on the specific numerology. LTE uses a fixed subcarrier spacing of 15 kHz, while the subcarrier spacing in NR is $\Delta f = 15 \times 2^{n}$ kHz, $n \in \{0, 1, 2, 3, 4\}$. The OFDM symbol duration is $\frac{1}{15 \times 2^{n}}$ ms, $n \in \{0, 1, 2, 3, 4\}$. Therefore, a TTI consists of the duration of $m$ OFDM symbols plus the duration of the cyclic prefix (CP). The specific parameter sets (i.e., numerologies) are shown in Table III.

TABLE III: The supported transmission numerologies in 5G.

| Parameter / Numerology (n) | 0 | 1 | 2 | 3 | 4 |
|---|---|---|---|---|---|
| Δf (kHz) | 15 | 30 | 60 | 120 | 240 |
| A slot (µs) | 1000 | 500 | 250 | 125 | 62.5 |
| The number of OFDM symbols per slot (normal CP) | 14 | 14 | 14 | 14 | 14 |
| The effective length of an OFDM symbol (µs) | 66.67 | 33.33 | 16.67 | 8.33 | 4.17 |
| Length of a CP (µs) | 4.69 | 2.34 | 1.17 | 0.57 | 0.29 |
| TTI (µs): the length of an OFDM symbol (including CP) | 71.35 | 35.68 | 17.84 | 8.92 | 4.46 |

In addition, 5G NR uses a more efficient mechanism to achieve low latency, the so-called “mini-slot” transmission mechanism. This mechanism allows only part of a slot to be transmitted at a time. A mini-slot may contain as little as one OFDM symbol. This transmission mechanism can also be used to change the order of data transmission queues, so that “mini-slot” transmission data is inserted immediately in front of the existing conventional slot transmission data sent to a terminal, so as to obtain very low latency.

B. Scheduling Schemes

Resource scheduling latency is also an important component of air interface latency, and a fast scheduling scheme can greatly reduce $T_{que}$, $T_{proc}$ and $T_{retr}$. Fast scheduling was first proposed in 3G HSDPA, which achieves more efficient scheduling and faster retransmission by transferring some radio interface control functions from the RNC to the base station (being closer to the air interface and using a shorter frame length make base station scheduling faster and more efficient).

In 4G LTE before Release 14, equipment manufacturers generally used pre-scheduling to improve latency. The main idea of this method is that base stations periodically allocate wireless resources to terminals, and the terminals can send data directly on the pre-allocated wireless resources when they have data to send. Since there is no need to request resources from the network side, the time of the whole resource request process is saved. But this method has some disadvantages: pre-scheduled wireless resources are always allocated to terminals, whether or not the users use them. After receiving the wireless resource scheduling, if there is no data to transmit, the terminal will always upload padding data using the allocated wireless resources. This results in the waste of precious wireless resources, extra power consumption of equipment and an increased noise level.

In view of this, in 2016, 3GPP proposed semi-static scheduling in Release 14 to improve pre-scheduling. In semi-static scheduling, even if terminals are allocated wireless resources, they do not need to send padding data.

In 4G, semi-static scheduling resources are generally allocated to each user individually. Therefore, when there are many users in the network, the waste will be very large, because a terminal does not necessarily use the reserved wireless resources. In 5G, reserved resources can be allocated to a group of users, and a collision resolution mechanism is designed for the case where multiple users collide on the same wireless resources at the same time. In this way, the utilization of precious wireless resources is guaranteed while reducing the latency. In addition, priority preemption scheduling can be used in 5G networks to ensure the latency requirement of low-latency services. If idle resources in the time and frequency domains are available, a user B with high priority will be given priority in scheduling the idle resources. Without idle resources available, user B will preempt the resources of other users (e.g., user A), even if user A has originally been scheduled in the corresponding slot.

C. Channel Coding Schemes

In 1949, R. Hamming and M. Golay proposed the first practical error control coding scheme, i.e., the Hamming code. Subsequently, with the help of cyclic shift, the cyclic redundancy check (CRC) code was proposed, which greatly simplifies the coding and decoding structure and reduces the coding and decoding latency.

However, Hamming and CRC coding schemes are both based on block codes. There are two main shortcomings of block codes: one is that the decoding process must wait for the whole codeword to be received before it can begin; the other is that accurate frame synchronization is needed, which leads to large latency and large gain loss.

In 1955, Elias proposed convolutional coding, which makes full use of the correlation among information blocks. In the decoding process of convolutional codes, decoding information is extracted not only from the current code, but also from the codes received before and after it. The decoding is carried out continuously, which ensures that the decoding latency of convolutional codes is relatively low. However, convolutional codes also have the problem of computational complexity, and there is always a gap of 2-3 dB between their gain and Shannon's theoretical limit.

Combining convolutional codes with interleavers, the parallel concatenated convolutional code, namely the Turbo code, was proposed in 1993. By exchanging reliability information iteratively to improve its decoding results, the Turbo code achieves performance close to the Shannon limit. But the Turbo code does not solve the problem of complexity, and its complexity increases with the interleaving depth.

Under high real-time requirements, the Turbo code encounters bottlenecks with respect to the 5G demand for ultra-high speed and ultra-low delay. Therefore, in the 5G era, there is a dispute between the Polar code and the low-density parity check (LDPC) code. The LDPC code was proposed by MIT professor Robert Gallager in 1962, and was the first proposed channel code approaching the Shannon limit. LDPC is based on an efficient parallel decoding architecture, and its decoder is superior to that of Turbo codes in terms of hardware implementation complexity and power consumption. The Polar code was proposed by Professor E. Arikan of Bilkent University in Turkey in 2007. It is a coding scheme that has been proved theoretically to reach the Shannon limit. Polar codes have lower coding and decoding complexity, and there is no error floor phenomenon. Their frame error rate (FER) is much lower than that of Turbo codes. Polar codes also support flexible encoding lengths and rates, and have proven to be better than Turbo codes in many aspects.

Finally, 3GPP abandoned the Turbo code in the 5G era and chose LDPC as the data channel coding scheme and Polar as the broadcast and control channel coding scheme. Due to the different advantages and disadvantages of the various coding schemes, hardware implementation complexity, power consumption, flexibility and maturity should be comprehensively considered.

D. Multiple Access and Modulation

The development of modulation technology and multiple access technology is mainly aimed at improving the efficiency of spectrum utilization, that is, at increasing the amount of data transmitted under the same spectrum bandwidth per unit time. When the amount of data and the number of users are large, high spectrum utilization efficiency can greatly reduce the queuing latency $T_{que}$, and thus the overall latency. It is noteworthy that high-order modulation inevitably leads to high complexity and thus increases processing latency. Therefore, to reduce the delay, the choice of an appropriate modulation mode needs to weigh $T_{que}$ against $T_{proc}$ according to the application scenario.

In addition, some new low-latency multiple access technologies have emerged in 5G, such as interleave division multiple access (IDMA), sparse code multiple access (SCMA), non-orthogonal multiple access (NOMA), filter bank multi-carrier (FBMC), universal filtered multi-carrier (UFMC) and generalized frequency division multiplexing (GFDM). These multiple access technologies have the following characteristics in reducing latency:
• IDMA simplifies multi-user detection (MUD) and does not need a complex transmission scheduling strategy, and thus reduces $T_{proc}$ and $T_{que}$.
• By combining symbol mapping and spreading and introducing a non-orthogonal sparse code domain, SCMA achieves three times the number of connections. At the same time, because SCMA allows users to have certain conflicts, the application of the dispatch-free technology in SCMA can significantly reduce data transmission latency.
• The NOMA receiving algorithm has a low synchronization requirement for different signal arrival times, which allows terminals to send data directly without waiting for the base station to allocate dedicated uplink resources. Compared with traditional scheduling-based resource allocation, NOMA saves one cycle of scheduling request and scheduling authorization, which saves time and network resources.
• FBMC, UFMC and GFDM are all based on filters, and can all improve spectral efficiency and reduce latency, mainly by shortening the CP and reducing the dependence on synchronization. Although the CP is shortened, the long transmission impulse response leads to a long frame length in FBMC. In addition, the computational complexity of FBMC is much higher than that of OFDM, which makes it unsuitable for low-latency communication. Thus, UFMC improves on FBMC by filtering over a set of contiguous subcarriers. GFDM replaces linear convolution with cyclic convolution, which reduces computational complexity and processing latency.

E. Signal Carrier

With the development of mobile communication technology, carrier frequencies are increasing (from 800-900 MHz in 2G to millimeter-wave bands in 5G). With the increase of frequency, the number of sinusoidal waves per unit bandwidth will increase, i.e., the amount of information carried by unit bandwidth will increase. Therefore, $T_{que}$ can be reduced when the amount of data and the number of users are large.

In addition, the coverage of base stations becomes smaller at the same power due to the increase of frequency. As a result, the cell size is shrinking, which reduces the propagation latency $T_{prop}$.

TABLE IV: Summary of low latency evolution schemes.

Network architecture (RAN and core network, by generation):

| Generation | RAN | Core network |
|---|---|---|
| 3G | 1. Replacing private interface Abis with open interface Iub; 2. Sinking several control functions from RNC to NodeB; 3. Removing the PCU entity; 4. The interfaces A and Gb are consolidated into the interface Iu; 5. The interface Iur between RNCs is added; 6. Some radio-related management functions are moved from core network into RAN; 7. All interfaces realize the separation of control plane and user plane at protocol level. | 1. The complete separation of bearer and control in CS domain, logically and physically. |
| 4G | 1. Removing RNCs; 2. The interface X2 between eNodeBs is added; 3. Sinking several control functions from RNC and core networks to eNodeB; 4. The concept of C-RAN was introduced to improve resource scheduling capability. | 1. Removing the CS domain; 2. The control plane and user plane are completely separated, physically and logically. |
| 5G | 1. The network elements are able to move more flexibly; 2. The reconstructed three functional entities promote RAN network slicing; 3. NFV and SDN build VCRPs to speed up network sessions; 4. Introducing fog computing into RAN to build F-RAN; 5. D2D communication directly. | 1. Traditional network entities are split into multiple virtualized NFs by NFV and SDN; 2. The traditional point-to-point communication between network elements is replaced by bus communication mode; 3. The user plane functions can further sink to MEC nodes. |

Bearer network and physical layer technologies (common to 3G, 4G and 5G):

| Category | Measures |
|---|---|
| Network architecture: bearer network | 1. The transmission distance is continuously shortening; 2. The transmission medium has evolved from copper wire to optical cable and then to microwave transmission; 3. The transmission technologies have experienced the evolution of PDH, SDH, MSTP, PTN, OTN and PON; 4. All-IP; 5. All-optical; 6. Cut-through switching; 7. FlexE; 8. Latency-aware priority; 9. Priority-based latency scheduling. |
| Physical layer technologies | 1. Shorter TTI; 2. Faster scheduling; 3. Coding with lower complexity and higher efficiency; 4. Multiple access technology with higher spectrum utilization efficiency; 5. Higher-order modulation; 6. Higher frequency carrier. |
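
To make the “cut-through switching” entry of Table IV concrete, the following minimal Python sketch compares how long a switch must wait before it can start forwarding a frame in store-and-forward mode (the whole frame) and in cut-through mode (only the destination MAC address). The frame size, link rate and function names are illustrative assumptions rather than values from this paper, and the Ethernet preamble, lookup time and queuing are ignored.

```python
# Illustrative sketch only: all names and default values are assumptions.

DEST_MAC_BYTES = 6  # the destination MAC address is the first field of the frame


def serialization_delay_us(num_bytes: int, link_rate_bps: float) -> float:
    """Time needed to receive num_bytes on a link of the given bit rate."""
    return num_bytes * 8 / link_rate_bps * 1e6


def store_and_forward_wait_us(frame_bytes: int, link_rate_bps: float) -> float:
    """The switch waits for the complete frame before forwarding it."""
    return serialization_delay_us(frame_bytes, link_rate_bps)


def cut_through_wait_us(link_rate_bps: float) -> float:
    """The switch starts forwarding once the destination MAC has arrived."""
    return serialization_delay_us(DEST_MAC_BYTES, link_rate_bps)


def path_saving_us(frame_bytes: int, link_rate_bps: float, hops: int) -> float:
    """Waiting time saved over a path of `hops` switches by cut-through mode."""
    per_hop = (store_and_forward_wait_us(frame_bytes, link_rate_bps)
               - cut_through_wait_us(link_rate_bps))
    return hops * per_hop


if __name__ == "__main__":
    rate = 1e9  # assumed 1 Gbit/s link
    print(f"store-and-forward wait: {store_and_forward_wait_us(1500, rate):.3f} us")
    print(f"cut-through wait      : {cut_through_wait_us(rate):.3f} us")
    print(f"saving over 5 hops    : {path_saving_us(1500, rate, 5):.3f} us")
```

Under these assumptions the per-hop saving grows with the frame size and shrinks as the link rate increases, so the benefit is largest for long frames on slower links and long forwarding paths.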

V. EVOLUTIONARY MEDIUM ACCESS CONTROL LAYER FOR LOW LATENCY

The medium access control (MAC) layer is responsible for resource scheduling, multiple access, mobility management, interference management, rate adaptation, and synchronization. A good design of MAC layer technologies can greatly reduce network latency. In this section, we review the evolution of resource scheduling schemes, multiple access technologies, mobility management and caching technologies to help reduce the latency.

A. Scheduling Schemes

Resource scheduling latency is also an important component of air interface latency, and a fast scheduling scheme can greatly reduce $T_{que}$, $T_{proc}$ and $T_{retr}$. Fast scheduling was first proposed in 3G HSDPA, which achieves more efficient scheduling and faster retransmission by transferring some radio interface control functions from the RNC to the base station (being closer to the air interface and using a shorter frame length make base station scheduling faster and more efficient).

In 4G LTE before Release 14, equipment manufacturers generally used pre-scheduling to improve latency. The main idea of this method is that base stations periodically allocate wireless resources to terminals, and the terminals can send data directly on the pre-allocated wireless resources when they have data to send. Since there is no need to request resources from the network side, the time of the whole resource request process is saved. But this method has some disadvantages: pre-scheduled wireless resources are always allocated to terminals, whether or not the users use them. After receiving the wireless resource scheduling, if there is no data to transmit, the terminal will always upload padding data using the allocated wireless resources. This results in the waste of precious wireless resources, extra power consumption of equipment and an increased noise level.

In view of this, in 2016, 3GPP proposed semi-static scheduling in Release 14 to improve pre-scheduling. In semi-static scheduling, even if terminals are allocated wireless resources, they do not need to send padding data.

In 4G, semi-static scheduling resources are generally allocated to each user individually. Therefore, when there are many users in the network, the waste will be very large, because a terminal does not necessarily use the reserved wireless resources. In 5G, reserved resources can be allocated to a group of users, and a collision resolution mechanism is designed for the case where multiple users collide on the same wireless resources at the same time. In this way, the utilization of precious wireless resources is guaranteed while reducing the latency. In addition, priority preemption scheduling can be used in 5G networks to ensure the latency requirement of low-latency services. If idle resources in the time and frequency domains are available, a user B with high priority will be given priority in scheduling the idle resources. Without idle resources available, user B will preempt the resources of other users (e.g., user A), even if user A has originally been scheduled in the corresponding slot.

B. Multiple Access and Modulation

The purpose of multiple access technologies is to enable multiple users to access the base station at the same time and enjoy the communication services provided by the base station, while ensuring that the signals of different users do not interfere with each other. Each generation of communication system has its own unique multiple access technology. The techniques can be divided into two categories [15]: contention-free MAC protocols and contention-based MAC protocols.

1) Contention-free MAC Protocols: FDMA, TDMA, CDMA and OFDM are employed by 1G, 2G, 3G and 4G, respectively. The original intention of these multiple access technologies is not to reduce the latency, but to improve the system capacity and accommodate more users. In terms of latency, FDMA has more advantages, because of its low cost of continuous transmission, no need for complex framing and synchronization, no need for channel equalization and so on. On the contrary, TDMA, CDMA and OFDM have more advantages than FDMA in system capacity and access.

In the age of 5G, latency becomes an important consideration. Thus, some new low-latency multiple access technologies have emerged in 5G, such as interleave division multiple access (IDMA), sparse code multiple access (SCMA), non-orthogonal multiple access (NOMA), filter bank multi-carrier (FBMC), universal filtered multi-carrier (UFMC) and generalized frequency division multiplexing (GFDM). These multiple access technologies have the following characteristics in reducing latency:
• IDMA simplifies multi-user detection (MUD) and does not need a complex transmission scheduling strategy, and thus reduces $T_{proc}$ and $T_{que}$.
• By combining symbol mapping and spreading and introducing a non-orthogonal sparse code domain, SCMA achieves three times the number of connections. At the same time, because SCMA allows users to have certain conflicts, the application of the dispatch-free technology in SCMA can significantly reduce data transmission latency.
• The NOMA receiving algorithm has a low synchronization requirement for different signal arrival times, which allows terminals to send data directly without waiting for the base station to allocate dedicated uplink resources. Compared with traditional scheduling-based resource allocation, NOMA saves one cycle of scheduling request and scheduling authorization, which saves time and network resources.
• FBMC, UFMC and GFDM are all based on filters, and can all improve spectral efficiency and reduce latency, mainly by shortening the CP and reducing the dependence on synchronization. Although the CP is shortened, the long transmission impulse response leads to a long frame length in FBMC. In addition, the computational complexity of FBMC is much higher than that of OFDM, which makes it unsuitable for low-latency communication. Thus, UFMC improves on FBMC by filtering over a set of contiguous subcarriers. GFDM replaces linear convolution with cyclic convolution, which reduces computational complexity and processing latency.

VI. OPEN ISSUES OF NETWORK ARCHITECTURE AND PHYSICAL LAYER FOR LOW LATENCY

Although there are many schemes to reduce latency, there are still some research directions and challenges waiting for researchers to explore and solve. In this section, we discuss some open issues and challenges for future research to further reduce latency.

A. Network Architecture Issues

To achieve low latency, it is often necessary to make tremendous changes to the existing network architecture. In the process of network reconfiguration, many challenges need to be overcome, which can be summarized as follows:
• For communication networks, practicality is the key. In order to truly apply a latency-reducing network architecture in real life, the cost problem must be considered. How to achieve low latency through the change and deployment of network architecture at low cost is worth studying. One of the best ways is to take advantage of the existing architecture.
• From 2G to 5G, the current mobile communication network is a heterogeneous network constructed from a variety of network systems. How to manage heterogeneous networks in order to achieve efficient utilization of resources while maintaining low latency is also a research area.
• From the previous description, we can conclude that the network is becoming software-based and virtualized. The new virtual entities implemented by SDN and NFV technology are totally different from legacy networks, and they do not yet have unified standards. Extensive research is needed on how to standardize and unify them.
• Unmanned aerial vehicles (UAVs) and satellites can be integrated into traditional cellular networks to reduce latency. However, resource management and interference control need to be addressed.
• Future networks will become ultra-dense cellular networks with numerous small cells, which makes millimeter-wave wireless bearers possible. The challenge is to design new bearer networks and protocols to overcome interference and collision. In addition, cooperative co-transmission between small cells can also be a research direction to reduce latency. There are many issues worth studying in this direction, such as how to achieve dynamic cooperation and how to reduce the overhead caused by inter-cell interaction.

B. Physical Layer Technique Issues

Most physical layer technologies are not designed to reduce the latency. Thus, these original physical layer technologies must be redesigned to reduce the latency.

that is, software; 4) functional entities continue to sink closer to terminals; 5) network bearer gradually unified (all-optical and all-IP). On the other hand, we can draw the following
Although conclusions from the evolution of physical layer technologies: many promising solutions have been proposed to date, we 1) TTI is decreasing continuously to achieve low latency in a believe that the following issues about low latency at physical smaller packet transmission mode; 2) scheduling algorithm is layer that deserve further exploration: becoming faster and faster, even dispatch-free transmission; 3) coding mode is more flexible and changeable, no longer one- • Reducing latency may cause other performance degra- size-fits-all; 4) Multiple access technologies and modulation dation. For example, low latency is related to control methods tend to have more latitudes and higher orders to overhead (including CP, pilot, etc). Short TTI means that achieve huge connections and thus reduce queue latency; 5) the control overhead proportion increases, resulting in the Carrier frequency is higher and higher, and cell size is smaller waste of radio frequency resources. Therefore, various and smaller. Although we can see many solutions to reduce trade-offs need to be investigated, including spectrum latency through our review, there are still many difficulties and efficiency versus latency, energy efficiency versus latency, challenges to be solved in practical application. Thus, we also and throughput versus latency. give some challenges and future research topics, hoping this • Fast channel estimation algorithm and efficient symbol can be served to reduce latency for the next generation mobile detection are conducive to reducing latency. However, communication system. The low latency evolution schemes with the emergence of multi-antenna, high-order mod- involved in this paper are summarized in Table IV. ulation and new multiple access technologies, these al- gorithms for scheduling also need to be redesigned to ACKNOWLEDGMENTS reduce latency. 
This work was partly supported by the National Natural • With the advent of 5G, almost no single physical layer Science Foundation of China (Grant Nos. 61871023 and technology can meet the requirements of all service sce- 61931001), Beijing Natural Science Foundation (Grant No. narios. How to dynamically, efficiently and intelligently 4202054), and the Fundamental Research Funds for the Cen- coordinate various physical layer technologies to meet the tral Universities (Grant No. 2019YJS010). needs of different service scenarios is a potential research direction. Therefore, the application of machine learning REFERENCES in the physical layer has also attracted wide attention. [1] M. Series, “Imt vision–framework and overall objectives of the future • Millimeter wave is one of the promising technologies development of imt for 2020 and beyond,” Recommendation ITU, pp. of 5G, but many of its channel characteristics have 2083–0, 2015. [2] I. Parvez, A. Rahmati, I. Guvenc, A. I. Sarwat, and H. Dai, “A survey not been fully understood. Deep understanding of the on low latency towards 5g: Ran, core network and caching solutions,” characteristics of attenuation, angular spread, reflection, IEEE Communications Surveys & Tutorials, vol. PP, no. 99, pp. 1–1, Doppler effect and atmospheric absorption is helpful to 2018. [3] H. Ji, S. Park, J. Yeo, Y. Kim, J. Lee, and B. Shim, “Ultra-reliable and the design of appropriate physical layer technologies. low-latency communications in 5g downlink: Physical layer aspects,” • The propagation characteristics of short packet trans- IEEE Wireless Communications, vol. 25, no. 3, pp. 124–130, 2018. mission (with shorter TTI) and traditional packet trans- [4] M. A. Habibi, M. Nasimi, B. Han, and H. D. Schotten, “A comprehensive survey of ran architectures towards 5g mobile communication system,” mission in channel are different. In the case of large IEEE Access, 2019. packet transmission, distortion and thermal noise can [5] M. Agiwal, A. 
Roy, and N. Saxena, “Next generation 5g wireless be averaged. In the case of large packet transmission, networks: A comprehensive survey,” IEEE Communications Surveys & Tutorials, vol. 18, no. 3, pp. 1617–1655, 2016. thermal noise can be averaged, but small packet trans- [6] A. Gupta and R. K. Jha, “A survey of 5g network: Architecture and mission is not feasible. Therefore, channel modeling and emerging technologies,” IEEE access, vol. 3, pp. 1206–1232, 2015. experiment under short packet transmission need to be [7] Y. YUAN and X. WANG, “5g new radio: Physical layer overview g new radio: Physical layer overview,” ZTE COMMUNICATIONS, vol. 15, further explored. no. S1, 2017. [8] B. Briscoe, A. Brunstrom, A. Petlund, D. Hayes, D. Ros, I. J. Tsang, VII.CONCLUSION S. Gjessing, G. Fairhurst, C. Griwodz, and M. Welzl, “Reducing internet latency: A survey of techniques and their merits,” IEEE Communications In this paper, we review the measures to reduce the la- Surveys & Tutorials, vol. 18, no. 3, pp. 2149–2196, 2016. tency of 2G to 5G communication from an [9] S. Srivastava and S. P. Singh, “A survey on latency reduction approaches for performance optimization in cloud computing,” in 2016 Second evolutionary perspective, including network architecture and International Conference on Computational Intelligence & Com- physical layer technologies. In order to achieve low latency, munication Technology (CICT). IEEE, 2016, pp. 111–115. on the one hand, we can draw the following conclusions [10] P. Schulz, M. Matthe, H. Klessig, M. Simsek, G. Fettweis, J. Ansari, S. A. Ashraf, B. Almeroth, J. Voigt, I. Riedel et al., “Latency critical in the evolution of network architecture: 1) fewer network iot applications in 5g: Perspective on the design of radio interface and units, that is, the Architecture tends to be flat; 2) network network architecture,” IEEE Communications Magazine, vol. 55, no. 2, elements are gradually transformed from entity to virtual pp. 70–78, 2017. [11] F. 
Muller, J. Sorelius, and D. Turina, “Further evolution of the /edge unit, that is virtualization; 3) data and signaling transmission radio access network,” REV(ENGL ED), vol. 78, no. 3, pp. control is gradually transformed from hardware to software, 116–123, 2001. [12] Y. B. Lin, Y. R. Haung, Y. K. Chen, and I. Chlamtac, “Mobility management: from gprs to ,” Wireless Communications & Mobile Computing, vol. 1, no. 4, pp. 339–359, 2010. [13] K. M. Shaheen, “Method and apparatus for supporting handoff and serving radio network subsystem relocation procedures in a single tunnel gprs-based wireless communication system,” Sep. 13 2007, uS Patent App. 11/626,538. [14] “The direct tunnel technology deployment in 3g network,” Information & Communications Technologies, 2009. [15] A. Tootoonchian, S. Gorbunov, Y. Ganjali, M. Casado, and R. Sherwood, “On controller performance in software-defined networks,” in Presented as part of the 2nd {USENIX} Workshop on Hot Topics in Management of Internet, Cloud, and Enterprise Networks and Services, 2012.
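
The priority preemption scheduling described for 5G (idle resources are used first; otherwise a high-priority user B takes over resources already scheduled for a user such as A) can be illustrated with a small sketch. This is only a toy model under stated assumptions, not a standardized procedure: the `User` class, the integer priority convention (larger means more urgent), the slot list and the `allocate` function are all hypothetical names introduced for illustration.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class User:
    name: str
    priority: int  # assumption: a larger value means a more urgent service


def allocate(slots: List[Optional[User]], user: User) -> Tuple[bool, Optional[User]]:
    """Try to give `user` one resource in `slots`.

    `slots` models the resources of one scheduling interval: None marks an
    idle resource, otherwise the entry is the user already scheduled on it.
    Returns (scheduled, preempted_user).
    """
    # Case 1: an idle resource is available, so the user simply takes it.
    for i, owner in enumerate(slots):
        if owner is None:
            slots[i] = user
            return True, None

    if not slots:
        return False, None

    # Case 2: no idle resource. Look for the lowest-priority scheduled user.
    victim_idx = min(range(len(slots)), key=lambda i: slots[i].priority)
    victim = slots[victim_idx]
    if victim.priority < user.priority:
        slots[victim_idx] = user  # preempt the already scheduled user
        return True, victim

    # Nobody scheduled is less urgent than the newcomer: no preemption.
    return False, None


if __name__ == "__main__":
    user_a = User("A", priority=1)
    user_b = User("B", priority=5)
    slots: List[Optional[User]] = [user_a]  # A already holds the only resource
    print(allocate(slots, user_b))  # B preempts A
```

In this toy model a preempted user is simply returned to the caller; how that user is rescheduled or notified is outside the scope of the sketch.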