arXiv:1408.4751v3 [cs.CR] 23 Aug 2014 inl lutaiey epooeta u ytmmyb us implementing be to of may existing directly system means t sent our an alternative access that an be have propose in must not we receiver Illustratively, it need the signal. sent though hiding carrier be person, receiving The by to the message. message individual code covert receiving Morse a allows a system to Our using its areas EchoLink. remote delivers and geographically that to and and economically characteristics sage code Morse modulates ua eoig u ytmhsavral aart,and data-rate, variable a one has manu for delivered system suitable rate is Our a message at decoding. time, coded human stations a at numbers the number) of (letter, as symbol so rate bandwidth low, data the and presumably The to is stations. CW equal to numbers uses approximately by CW or used channel or than Our less human), occupies code. or AM Morse (synthesized Modulation communicate voice Amplitude communicate or SSB Traditional to utilize (AM). Modulation stations Amplitude numbers Single- or to (SSB) compared when sideband requirements bandwidth preva communication, low HF its is range and which long noise, in unpredictable against commu- somewhat range resilience and long its for to bands due HF This nication on code). we frequently (Morse CW used format is is communication channel format covert carrier our for fo The selected criteria have these stations. of level numbers the in exceed or equal lev should data-rate, curity) (bandwidth, criteria channel our stations, remai may simplex) anonymous. generally receiver very are the channels stations, (such channel numbers remote our with and broad As to areas. access covert geographical applicati easy sys from communication our benefit covert serve, would in to that utility thought find widely still are may they tem role stati th the numbers doing serve and not false, of do are way presumptions alternative, our an If least thing. correct, at same is or assumptio stations improved, numbers the an of human make suggest purpose we covert presumed paper It with the this that code. securely In [8]. Morse communicate government assets in to by intelligence or them operated are voice allow systems to in these random letters, that seemingly speculated or areas. a is numbers broad of consists across of stations and allow series these distances band of long content this over The of operate characteristics to propagation them the rad (HF) as frequency high band, the throughout Numbe found stations. be can numbers stations to alternative economical and cure Abstract ic u oli orpaetefntoaiyo numbers of functionality the replace to is goal our Since se- a is derive soon will we system the for use potential A rpsdSse o oetCmuiainto Communication Covert for System Proposed A Acvr omncto ytmi eeoe that developed is system communication covert —A itn n ra egahclAreas Geographical Broad and Distant .I I. NTRODUCTION ubr stations numbers [email protected] . lo se- of el carrier ohaDavis Joshua radio das ed the o mes- und lent ons ons of io al rs d n n e s s - stedsrdcngrto.Frhr u ytmobfuscate system the our th degree Further, if some stations configuration. fashion, to numbers desired this to equal in the security utilized is of be message level and may a receiver system providing the Our both that anonymous. ensure stations are to Numbers a radio pads, range behavior. one-time long communication, statistical (one-way) about code simplex information utilize Morse provided when user’s user, a human a emulate can .MreCode Morse do. A. to known not are stations numbers that something r vial traoal rcs n hogotteglob the throughout and prices, reasonable at available int a are the is bands without themselves, to, HF participating listen the of to through things tunes interesting for who looking One being example). Service World prominent BBC by a (the done agencies news sometimes and is government broadcasting Shortwave broadcasting. letter short the by example, air For the over represented are tolerant. noise is bandwidt and little (CW) [6], relatively Wave speeds requires Continuous typical uniform the code called represent of Morse is to This operation. carrier off symbols. and a on code usage, turned Morse is typical frequency and In another amplitude distances. one amateu with by long communicate used to over bands widely HF is on operators It radio numbers). letters, (e.g. symbols fH rpgto r mcbet uhuae Fi some- is HF usage. as propertie such to the to referred amicable as dis- times are areas, propagation long geographic HF over large of across communicate and to tances, governments and operators Communication HF B. [3]. set symbol u code operators Morse usage, betwe typical established shared In is receiver. set and communication, via symbol the the the like that to and they’d radio, access message over has generally any receiver the transmit Using assuming units, long. may CW, units time one time one three seven code, are last last Morse words symbols between letter) between voids between (e.g. the voids voids and symbol The The one units. unit. in one time time lasting dashes) three as (dots, defined last elements is Dashes dot measure unit. A proportional generally (WPM). are time is Minute dashes which per transmission, and Words the dots in of of speed duration the to The dash. a by os oecnit fasadrie e fsmos which symbols, of set standardized a of consists code Morse transmitting of means durable and simple a is code Morse Frdocmuiaini sdb ohaaerradio amateur both by used is communication radio HF existence shortwave A srpeetdb o,followed dot, a by represented is ftecvr communication; covert the of seilyi h otx of context the in especially , hrwv listener shortwave dots n longer and Fradios HF . dashes ention ethe se at h nd en e, is d s s r . 2 so shortwave listening is a fairly popular hobby. A numbers to use HF, and so one may conceive that the advantages of station recipient may easily obtain a shortwave receiver from HF propagation might be used in conjunction with EchoLink a local electronics market, and the possession of one is not or a similar system [4] to allow the user to areas both far and itself so unusual as to arouse suspicion. broad. Governments use HF for three primary purposes. First, HF is used to link facilities and stations together. As HF is II. OUR CHANNEL generally not as fast or reliable as other forms of communica- tion (e.g. the Internet, satellite communications), such systems With the above tools to some degree understood, we com- are often used as backup or emergency systems. Embassies bine them to create a system that is part covert channel, part often have large, very visible HF antennas on their roofs. numbers station-style secured broadcast system. That is, we Second, governments use HF for propaganda. HF broadcasting devise a method to communicate over broad geographical allows the transmitter to communicate with a distant and broad areas, with the message obscured. Even if the existence of the geographical area with a single antenna, which maybe located covert message is identified, it cannot be easily deciphered. on friendly soil. Those with shortwave radios may listen to Our system is proof of concept: our channel has been devel- uncensored news and propaganda anonymously and often in oped and tested, but no significant mathematical analysis or their native language. Finally, HF is thought to be used to derivation has been attempted, and legal restrictions limit our communicate with undercover human intelligence assets, via ability to test over channels [2]. systems such as numbers stations. An agent with a shortwave Our channel as described here makes use of technologies radio copies down a code at a predetermined time and day, such as EchoLink, VHF/UHF radio, and perhaps HF. The and decodes the information using a pre-shared key (e.g. one- concept is flexible enough that it may be used over existing time pad). When a one-time pad is used, the communication is operator infrastructure (e.g. the HF radio assets governments thought to be entirely secure, as long as an adversary does not own and may already be using for traditional numbers sta- obtain access to the message recipient or the one-time-pad. tions), or other long-range radio networks such as the Internet The quality of HF communication is dependent on several Radio Linking Project (IRLP) [4]. We use EchoLink here factors, some being the weather and the time of day. To main- because it provides easy and economical access to many tain relatively high levels of reliability over long distances, diverse geographical areas. We would sometimes like to use careful planning, powerful , and large antennas are HF over EchoLink, and would if such stations where available, sometimes necessary. So, while HF serves a definite purpose, though they seem to be rare or nonexistent at this time. it is not always an optimal solution. Our covert system may Our channel modulates CW, providing in effect Morse use HF, as in traditional numbers stations, but this need not be code over . We do this by varying the statistical the case. If HF resources are not accessible or optimal for the properties of the overt (carrier) Morse code communication. transmitting individual, a system such as EchoLink can be used As microcontroller assisted keying has become more common, to deliver the covert and encoded message to the recipient. the statistical properties of Morse code have become more uniform. In particular, the standard deviation of carrier keying times have been lowered. In our channel we are forced to C. EchoLink make one more assumption: our carrier communication is Echolink [5] is a service that registered amateur radio presumably formed by a traditional electromechanical key, operators can use for free. EchoLink allows part (or all) where the tone length (and associated statistics) are directly of the communication path between amateur operators to controlled by the time the key is depressed, which is directly occur over the Internet, using Voice over Internet Protocol controlled by the very imperfect (in terms of milliseconds) (VoIP) protocols. For example, an amateur operator, having human operator. registered with the service, may communicate with another Since our channel is Morse code modulated on Morse code, ham across the globe, using their relatively short ranged Ultra and we send an element (dot, dash) with each element of the / Very High Frequency (UHF/VHF) hand-held carrier message, our channel speed is about the same as the radio, computer, or cell phone [1]. An EchoLink station will carrier keying speed, measured in words per minute. At this receive their radio communication, translate it to VoIP, and time, we have already seen the following desirable properties send it to the recipient EchoLink station, which will translate of our covert channel: the communication back to Radio Frequency (RF) and send it • Our channel is to some degree covert. Observers see the over the air to the recipient. carrier Morse code communication, whose contents are There are thousands of EchoLink nodes currently deployed innocuous. The carrier message can be formed naturally around the world, giving the user radio transmit capability by communicating with an amateur radio operator, and by to many geographic areas from the comfort of their living answering his questions or inquiring about his activities. room (or military base, or secret facility). The number of This other radio operator is unaware that our responses stations in a given area depends on the population density and and inquiries carry covert data: to all receivers but our standard of living in the area, as well as the amicability of the own decoding software, the carrier message is not un- area’s governing entity to amateur radio activity. The EchoLink usual. website [5] allows anyone to locate nodes in a given area. As • Our channel is sufficiently fast that it can be used for of this writing, most nodes use VHF/UHF, though a few report the purpose of sending coded characters “to the field”. 3

That is, it operates at CW and its associated speeds, as shifting the pseudo-random symbol time up, perhaps by one do many numbers stations. standard deviation, to represent a covert dot, and down to • Our channel is noise resilient. Morse code is used by represent a covert dash. To represent the end of a letter, symbol amateur radio operators wanting to communicate over time is left at its PRNG generated time. As the averages and large distances in part because of this property. Also, our standard deviations are taken from real human CW messages, channel relatively little bandwidth [6]. as a PRNG is used to generate the particular symbol times, • Finally, our covert channel carrier is quite innocuous. and as the offsets are constrained to some maximum time (e.g. One may find many Morse code conversations over the one standard deviation), the channel’s symbol times appear amateur radio bands at any given time. Adding our needle innocuous to those who may happen to be quantifying the to this haystack won’t raise any eyebrows, assuming we carrier message’s statistics. That is, observers do not have the don’t behave aberrantly on our carrier. PRNG seed, and they cannot determine whether the symbol The channel transmitter and receiver share three things: times have gone up or down from the shared reference. knowledge of the time and location of transmission, a seed or use with a Pseudo-random Number Generator (PRNG), and III. SOME CONSIDERATIONS the transmitter’s base keying statistics. The time and location of transmission is knowledge also required by recipients of The covert channel recipient must have decoding software. traditional numbers stations; one may not decode what they If they are familiar with the implementation specifics, coding cannot find. Numbers stations also require the receiver to have such a receiver is not prohibitively difficult. It is speculated the key, or one-time pad, available for decoding the message. that poly-tone numbers stations exist [7]. Such stations have We suggest that our PRNG seed may double as a one-time the potential to transmit information at a higher rate than pad, with arbitrary long pads derived by inputting the PRNG stations based on voice or CW. That they are not used exclu- into a message digest algorithm. It is conceivable that using sively is probably due to their increased bandwidth and lower the PRNG seed as a key may compromise the anonymity of noise resilience compared to voice and CW transmissions, the message, a consideration that must be weighed during the and the necessity that the receiver posses specialized decoding design of a particular channel implementation. software. These previous two items, the time and location of trans- Like numbers stations, our channel operates in simplex mission, and PRNG (one-time pad), are pre-shared both in (one-way) mode, and the recipient probably cannot be identi- our system, as well as in traditional numbers stations. Our fied as the carrier traffic covers a broad area, which probably channel also requires the receiver to know some information contains many individuals. Security of the covert transmission about the transmitter’s keying statistics. This information can is dependent on the strength of the encryption. It is said that be pre-shared, or it can be derived during a pre-arranged one-time pads provide perfect anonymity of a message, though training period, during which the transmitter communicates this may be to some degree compromised if the PRNG doubles unmodulated messages to unwary amateur operators, and the as the pad. The existence of the covert communication is receiver unobtrusively listens and derives statistics from the secure to the degree that realistic user behavior is emulated. transmission. The required statistics are the dot and dash For manual electromechanical keying, the standard deviation time averages, and their standard deviations. Note that in of symbols should be high enough that we may modulate our example implementation, our deterministic transmitter (a symbol time without arising suspicion, when a PRNG and Python script) emulates human statistical behavior by taking as normal distribution are used as previously mentioned. input the averages and standard deviations of a human user’s EchoLink requires registration. This suggests that an indi- keying times. vidual may be associated with the transmission. Though this With the transmitter and receiver aware of the element may seem dangerous, there are mitigating factors. First, the statistics of the pending conversation, and in possession of the covert message is difficult to identify, and so the transmitting seed, covert transmission may begin. If EchoLink is used to individual should incur no suspicion. Second, if the channel economically carry the message, the transmitting individual is used in the context of traditional numbers stations, which selects an EchoLink node in the vicinity of the recipient. are presumably operated by governments, obtaining a false The transmitter then calls CQ, and talks to whomever will amateur radio license should not be a problem. Finally, if answer. The receiver records the transmitter’s conversations, it is determined that EchoLink is too risky for a particular and when they have ended, decodes them. For each element transmission, other assets or networks may be used to transmit (dot, dash) of the overt transmit message, an element duration the covert message. Other assets would also be used when is taken from a normal distribution with the element average there is not an EchoLink node in range of the recipient. time as its mean. As the transmitter and receiver PRNGs share Multipath propagation may affect the duration of the signal the same seed, they both know what duration each element at the receiver and undermine the accuracy of the channel. To should have. Deviation from the pre-shared durations convey compensate, the code deviation time may be increased where the covert message. The use of a shared PRNG seed and multipath will cause problems. One should examine to what statistical distributions to aid in covert communication was extent increasing the code deviation undermines the statistical discussed in my previous work on web read-time modulation correctness, and thus covertness, of the channel. Fortunately [9]. the statistics in question vary from individual to individual, so The symbols for the covert message are conveyed by one may simply emulate a slower keyer when addressing this 4 issue. As standard deviations may be in multiple milliseconds, length was 180ms. The trailing noise (characters that occur multipath effects may presumably be overcome without much after the message) may be removed by arranging for a higher, effort. When digital systems such as EchoLink exist in the singular high code deviation to indicate end of message, communication path, sampling also compromises the accuracy though the presence of noise after the message deviations of the channel. The duration of an element is rounded to have ceased may indicate that the standard deviation was not conform with sampling. In the case of 8kHz sampling, the high enough for the particular test. In the demonstration test time slots are 125µs long, and standard deviations in the order we show the receiver first decoding the carrier Morse code of milliseconds should overcome this. The digital filter used message, then show it also decoding the covert message. Our in the receiver software may also have an effect on recovered software, which can be found at http://covert.codes, is still symbol duration at the receiver. under development and so replication of this experiment may Our implementation uses a sliding window in the decoder to yield different results. overcome attenuation and noise in the radio path. For example, Comprehensive radio tests were not undertaken due to a signal will be thought to exist throughout a window if [2], and we request the aid of anyone who can legally test any point in that window rises above a threshold. For noisy our system over radio paths. Until radio testing has been signals this window size is increased, for example to three undertaken, and the software adjusted in accordance with carrier frequency cycles in width. An increased window size subsequent findings, this covert channel remains a work in decreases the accuracy of the recovered element durations, as progress. the entire window is deduced to be on, or off. Again, this effect is not major, though when combined with the considerations VI. CONCLUSION above a significant deviation may exist. One may increase We have theorized a covert channel that may be able to element time standard deviation to overcome this, though as convey its covert payload anonymously and quite securely to previously mentioned this may undermine the security of the distant and broad geographical areas. The channel is flexible, channel. Further, at a certain point the standard deviation may in that it may utilize existing long-range (e.g. HF radio) become so high that the use of a PRNG and normal distribution communication systems, or free systems such as EchoLink becomes irrelevant, as the derived element times are often to carry its message economically. Should EchoLink not either very high or very low with respect to an average or be an appropriate vehicle for a given implementation, other even reasonable value. systems such as IRLP exist that may be able to fill the role. While HF, EchoLink, and numbers stations are used here Other vehicles for the channel (satellite, web streaming) are to illustrate the channel, they certainly do not comprise the speculated to be useful as well. The channel is shown to work necessary structure or limitations of our channel. Alternatives in software, where the output of the transmitter is fed directly include the use of amateur radio satellites, and reception into the receiver, but meaningful radio tests have yet to be by web streaming (many repeaters have an online listening conducted due to [2]. feature.) Numbers stations are a fine illustrative vehicle for this channel, though use for it may certainly be found elsewhere. REFERENCES [1] Echolink android application. bit.ly/1vikIJY. Accessed 2014-08-5. IV. IMPLEMENTATION [2] Fcc part 97. 1.usa.gov/1q0SIbW. Accessed 2014-08-16. Our proof of concept implementation was first written as a [3] International morse code. http://www.w1wc.com/morse code/. Accessed 2014-08-16. general Morse code encoder and decoder, and later extended [4] Internet radio linking project. http://irlp.net/. Accessed 2014-08-6. to include the covert channel described here. It was written [5] Introducing echolink. http://www.echolink.org. Accessed 2014-08-5. in Python, and is available from http://covert.codes. Our im- [6] An intuitive explanation of cw bandwidth. http://www.w8ji.com/cw bandwidth described.htm. Accessed 2014-08- plementation modifies Morse code symbol time as described 5. above, with some particulars. When a symbol time is selected [7] Japanese slot machine (xsl). bit.ly/XAMYMY. Accessed 2014-08-16. from a normal distribution with the pseudo-random duration as [8] The spooky world of the ’numbers stations’. http://www.bbc.com/news/magazine-24910397. Accessed 2014-07- its mean, the time is wrapped around via a modulo operation so 22. that it does not fall outside one standard deviation. A standard [9] Joshua Davis. A covert channel based on web read-time modulation. deviation above the derived duration conveys a dot, and a 2014. Available on http://covert.codes. standard deviation below conveys a dash. No change conveys letter demarcation, and durations of two standard deviations from the mean are used to signal the end of a word.

V. EXPERIMENT Several local experiments were run at the development site to ensure that the transmitter and receiver were able to communicate covert data over a seemingly innocuous carrier message. One of these tests is shown after the conclusion. In this experiment, both dot and dash standard deviations were set to be 10ms, dot average length was 60ms, and dash average 5

$ ./cwtx.py −o call.wav −m ”cq cq cq calling cq this is XXXXXX testing a radio system. forgive any interruption. have a good day.” −s statfile −c ”Mr. Watson come here, I want to see you.” −k ”secret” ∗∗ Saving output to call.wav ∗∗ Using coding frequency 900 Hz ∗∗ Using sampling frequency 8000 Hz ∗∗ Getting statistics from statfile ∗∗ Generating audio ∗∗ Finished.

$ ./cwstats.py −i call.wav

∗∗ Read 491101 samples from call.wav with sample frequency 8000 Hz and encoding pcm16 ∗∗ Signal average: 0.000234056981508 ∗∗ Using dominant frequency: 898 Hz ∗∗ Filtering to between 852 and 942 Hz ∗∗ Decoding with tolerance: 0.300

Message: cq cq cq calling cq this is XXXXXX testing a radio system. forgive any interruption. have a good day.

∗∗ Finished

$ ./cwstats.py −i call.wav −c statfile −k ”secret” ∗∗ Read statistics for covert demodulation from statfile

∗∗ Read 489533 samples from call.wav with sample frequency 8000 Hz and encoding pcm16 ∗∗ Signal average: 0.000224008610111 ∗∗ Using dominant frequency: 898 Hz ∗∗ Filtering to between 852 and 942 Hz ∗∗ Decoding with tolerance: 0.300

Message: cq cq cq calling cq this is XXXXXX testing a radio system. forgive any interruption. have a good day.

Decoded: mr. watson come here, i want to see you. a neee emz myitene nrtw ik

∗∗ Finished