Trans. Japan Soc. Aero. Space Sci. Vol. 55, No. 3, pp. 166–174, 2012

A Strategy to Determine Whether to Use GPU for a Mission Scheduling Algorithm

By Soojeon LEE, Byoung-Sun LEE and Jaehoon KIM

Satellite System Research Team, Electronics and Telecommunications Research Institute, Daejeon,

(Received June 9th, 2011)

As the first Korean multi-mission geostationary satellite, Chollian was launched on June 27, 2010. Chollian is being successfully controlled using a satellite ground control system (SGCS) developed by ETRI. A mission planning subsystem (MPS) in SGCS gathers mission requests from users, performs complex mission scheduling, and generates a conflict-free mission schedule. In this paper, we provide an overview of the current mission scheduling algorithms of the Chollian satellite, select three representative constraint checking schemes among these algorithms, and implement new graphics processing unit (GPU)-based constraint checking schemes for the three representative schemes. We compare the performance of the GPU-based and CPU-based constraint checking schemes based on the size of the problem set and the time complexity of the problem. Finally, we suggest a strategy to determine whether or not to adopt GPU for a satellite mission scheduling algorithm.

Key Words: Mission Scheduling Algorithm, Mission Planning System, COMS, Chollian Satellite, CUDA, GPU

1. Introduction

As a multi-mission geostationary satellite, Chollian, also called the Communication, Ocean, and Meteorological Satellite (COMS), was launched on June 27, 2010. The Chollian satellite is located at 128.2 degrees East longitude and 36,000 km from the Earth. This makes Korea the tenth country in the world to develop a geostationary communications satellite, which will operate over the next seven years. The Chollian satellite has three different payloads for three different purposes: satellite communications, ocean observations and meteorological observations. For satellite broadcasting and telecommunications in particular, various research1–3) has been conducted, and ETRI developed the Ka-band communications payload for the Chollian satellite.

For the operation of the Chollian satellite, several ground segments cooperate as shown in Fig. 1. Users from the Communications Test Earth Station (CTES), the Korea Ocean Satellite Center (KOSC), and the Meteorological Satellite Center (MSC) submit mission requests to the satellite ground control system (SGCS). The image data acquisition and control systems (IDACSs) in KOSC and MSC receive raw data from the satellite and perform image preprocessing to generate Level 1B data.

Even if each user provides a perfect conflict-free mission request for his/her own organization, conflicts can frequently occur after the requests received from different organizations are mixed. For example, if meteorological imaging and oceanic imaging are executed at the same time, the quality of the meteorological image may be degraded. To prevent these problems, Chollian-specific mission scheduling algorithms4–6) are required.

Like the mission scheduling algorithms of other Korean satellites, Arirang-2, Arirang-3 and Arirang-5,7–9) as well as most of the general mission scheduling algorithms,10–13) the mission scheduling algorithms used for the Chollian satellite are based on a central processing unit (CPU). In this work, we implement graphics processing unit (GPU)-based Chollian mission scheduling algorithms and compare their performance with that of the CPU-based ones. Even though a GPU is basically used for graphics computations, general-purpose computation on a GPU (GPGPU)14) has become a reality over the last few years. A GPU enables fast parallel processing of massive data by allowing thread-level parallelism on hundreds of cores.

Among the various steps used in the Chollian satellite’s mission scheduling algorithms, this paper focuses on the constraint checking schemes, which can also be applied to other satellites’ mission scheduling. The constraint checking schemes used for the Chollian satellite’s mission scheduling algorithms are categorized into three groups. For a representative scheme in each category, we implement a GPU-based version and compare its performance with that of the CPU-based one. Finally, we suggest a simple but efficient strategy to determine whether or not to use GPU for a satellite mission scheduling algorithm.

Fig. 1. Chollian ground segment architecture.

© 2012 The Japan Society for Aeronautical and Space Sciences

May 2012 S. LEE et al.: A Strategy to Determine Whether to Use GPU for a Satellite Mission Scheduling Algorithm 167

2. Mission Planning Overview

The SGCS of the Chollian satellite enables the satellite operator to execute the satellite missions14) and control the satellite. The SGCS consists of five subsystems: the telemetry, tracking and command (TTC) subsystem, the real-time operations subsystem (ROS), the mission planning subsystem (MPS), the flight dynamics subsystem (FDS) and the COMS simulator subsystem (CSS), as shown in Fig. 2. The mission scheduling algorithms in this paper are implemented in the MPS.

Fig. 2. Functional block diagram of SGCS.

2.1. Meteorological and oceanic missions

2.1.1. Missions via meteorological imager

There are three meteorological imager (MI) observation modes: global, regional and local. The global mode includes full disk (FD) imaging that covers the entire Earth. FD imaging normally takes less than 1,620 s. The regional mode includes the Asia-Pacific Northern Hemisphere (APNH), the Extended Northern Hemisphere (ENH), and the Limited Southern Hemisphere (LSH) areas. The local mode includes Local Area (LA) imaging, which covers an arbitrarily selectable area within the FD boundary.

2.1.2. Missions via geostationary ocean color imager

A geostationary ocean color imager (GOCI) takes oceanic images around the Korean peninsula. A GOCI image can include a maximum of 16 slots, and contiguous slots have overlapping areas.

2.2. Mission scheduling steps

2.2.1. MI and GOCI algorithms

For MI mission requests received from the MSC, image duration calculation, scan coordinate conversion, and proportional command generation are performed by the MI algorithm. For GOCI mission requests received from KOSC, the displacement angle of the mirror pointing mechanism is calculated by the GOCI algorithm.

2.2.2. Constraint check

Constraints are checked using pre-defined Chollian-specific relation rules such as exclusion, inclusion and predecessor-successor relationships among missions, including event information and maneuver requests.

2.2.3. Priority check

If missions that have an exclusion relation and different priorities overlap with each other, the mission having the lower priority is always discarded based on the priority rules.

3. Analysis of the Constraint Checking Schemes

3.1. Categories

The constraint checking schemes of the Chollian satellite can be categorized as shown in Table 1.

Table 1. Categories of the constraint checking schemes.

Category                           Constraint checking schemes
MI image properties                CheckMIMaxDuration
                                   CheckMIImageBoundary
Overlap of missions                CheckExclusion
                                   CheckInclusion
Predecessor-successor relations    CheckSequence
                                   CheckNonSequence
                                   CheckMaxTimeGap
                                   CheckMinTimeGap

3.1.1. MI image properties

There are two regulations regarding the MI imaging itself. First, the imaging duration of each observation area must be smaller than its maximum duration limit. Second, the imaging boundary of each observation area must be within its maximum boundary limit. These two regulation schemes are called CheckMIMaxDuration and CheckMIImageBoundary.

3.1.2. Overlap of missions

There are two general rules dealing with the overlapping of missions: exclusion and inclusion. The former states that missions that have an exclusion relation must not be executed simultaneously, while the latter states that mission A must be executed within mission B’s time window if there is an inclusion relationship, A ⊂ B, between them. These two regulation schemes are called CheckExclusion and CheckInclusion.

3.1.3. Predecessor-successor relations

Four rules exist depending on the predecessor-successor relations. The first rule is about the sequence of missions. If there is a sequence rule A → B → C, mission A must be followed by mission B, and mission B must be followed by mission C. There must not be any missions between them. The second rule states that some specific sequences must not be established. This is the opposite of the first rule. For example, the rule ¬(A → B → C) means that the sequence A → B → C must not be allowed. The third and fourth rules are about the time gap between two missions. If mission A precedes mission B, the gap between the end time of mission A and the start time of mission B must be smaller than its allowable maximum limit under the third rule, and larger than its allowable minimum limit under the fourth rule. The four regulation schemes are called CheckSequence, CheckNonSequence, CheckMaxTimeGap and CheckMinTimeGap.

3.2. Selection of representative constraint checking schemes

Among the constraint checking schemes, we select one representative scheme in each category: CheckMIMaxDuration, CheckExclusion and CheckNonSequence. These are chosen because they are the most frequently used schemes in their respective categories and generate most of the conflict messages during normal mission planning scenarios.

For the three selected schemes, which are CPU-based and are currently used for the Chollian satellite mission planning, we implement corresponding GPU-based schemes. We then compare the performance of the CPU- and GPU-based schemes.

3.3. Analysis of the CPU-based constraint checking schemes

3.3.1. CheckMIMaxDuration

CheckMIMaxDuration can be described as shown in Fig. 3. MissionRequest and mi indicate a mission request and an MI mission, respectively. For all missions in a mission request, the scheme checks whether the duration of an MI mission exceeds its maximum limit. If it does, the scheme sets the mission’s conflict flag to true. Let N be the number of missions in the mission request. The time complexity of this scheme is then O(N).

Fig. 3. Pseudo code of CheckMIMaxDuration.

3.3.2. CheckExclusion

Figure 4 shows the pseudo code of CheckExclusion. An exclusion rule consists of a pair of mission IDs (i.e., source ID and destination ID). For each of the exclusion rules, the scheme loops over all missions in a mission request. It then checks if the rule’s source ID is equal to a mission’s ID. If it is, that mission is called src. The scheme then obtains all of the missions that overlap src and, by looping over these overlapping missions, searches for those having an ID equal to the exclusion rule’s destination ID. If one is found, the mission is called dest, and the conflict flags of src and dest are set to true.

Let R be the number of exclusion rules and N be the number of missions in the mission request. In Fig. 4, GetOverlapMissions( ) consumes O(N) time, and thus the time complexity of this scheme is O(RN^2). Note that as of May 2011 there are 48 exclusion rules in the Chollian MPS.

Fig. 4. Pseudo code of CheckExclusion.

3.3.3. CheckNonSequence

Figure 5 shows the pseudo code of CheckNonSequence. A non-sequence rule consists of a pair or triplet of mission IDs (i.e., (first ID, second ID) or (first ID, second ID, third ID)). For each of the non-sequence rules, the scheme loops over all of the missions in the mission request. It checks if the rule’s first ID is equal to a mission’s ID. If it is, that mission is called first. Next, it obtains the next mission of first, which is called second. If second is null, first is the last mission in the mission request, so the scheme breaks the loop. Otherwise, it compares the second ID of the rule and the ID of second. If they are the same, it checks the length of the rule. If the length of the rule is two, the conflict flags of first and second are set to true and the scheme breaks the loop. If the length of the rule is three, it obtains the next mission of second, which is called third. If third is null, second is the last mission in the mission request, so the scheme breaks the loop. Otherwise, it compares the third ID of the rule and the ID of third. If they are the same, the conflict flags of first, second and third are set to true and the scheme breaks the loop.

Fig. 5. Pseudo code of CheckNonSequence.
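The three CPU-based schemes of Section 3.3 can be sketched in Python as follows. This is only an illustrative reconstruction from the prose description, not the MPS code: the `Mission` class, its field names, and the helper `overlapping` are hypothetical, the duration limits are the values quoted in Section 5.1, and missions are assumed to be stored in chronological order. Unlike the pseudo code, which breaks out of a loop after a hit, this sketch flags every occurrence of a forbidden sequence.

```python
from dataclasses import dataclass

# Maximum MI imaging durations in seconds (values quoted in Section 5.1).
MAX_DURATION = {"FD": 1620, "APNH": 243, "ENH": 742, "LSH": 396, "LA": 60}


@dataclass
class Mission:
    mission_id: str          # hypothetical field names, not taken from the MPS
    start: float             # start time in seconds
    end: float               # end time in seconds
    conflict: bool = False


def check_mi_max_duration(missions):
    """O(N): flag every MI mission whose duration exceeds its limit."""
    for m in missions:
        limit = MAX_DURATION.get(m.mission_id)   # None for non-MI missions
        if limit is not None and m.end - m.start > limit:
            m.conflict = True


def overlapping(src, missions):
    """O(N): all other missions whose time window intersects that of src."""
    return [m for m in missions
            if m is not src and m.start < src.end and src.start < m.end]


def check_exclusion(missions, rules):
    """O(R * N^2): rules are (source_id, destination_id) pairs."""
    for src_id, dest_id in rules:
        for src in missions:
            if src.mission_id != src_id:
                continue
            for dest in overlapping(src, missions):   # inner O(N) loop
                if dest.mission_id == dest_id:
                    src.conflict = dest.conflict = True


def check_non_sequence(missions, rules):
    """O(R * N): rules are forbidden ID pairs or triplets."""
    for rule in rules:
        for j in range(len(missions)):
            window = missions[j:j + len(rule)]
            if len(window) < len(rule):
                break                                  # ran past the last mission
            if tuple(m.mission_id for m in window) == tuple(rule):
                for m in window:
                    m.conflict = True
```

The nesting depth of the loops mirrors the stated complexities: one loop for CheckMIMaxDuration, three for CheckExclusion (rules, missions, overlapping missions) and two for CheckNonSequence, since looking up the next mission is an O(1) slice of fixed length.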

Let R be the number of non-sequence rules and N be the number of missions in the mission request. In Fig. 5, GetNextMission( ) consumes O(1) time, and thus the time complexity of this scheme is O(RN). Note that as of May 2011, there are 11 non-sequence rules in the Chollian MPS.

4. Analysis of the GPU-based Constraint Checking Schemes

To utilize the power of the GPU, we adopt Nvidia’s Compute Unified Device Architecture (CUDA).15) Figure 6 shows a simplified architecture of the Nvidia GeForce GTS 250. There are 16 multiprocessors and each multiprocessor contains eight cores. Thus, 128 cores in total can operate at the same time. In a multiprocessor, the instruction unit makes each core execute the same instruction, and CUDA threads running on the cores can share data through a shared memory.

Fig. 6. Architecture of Nvidia GeForce GTS 250.

To compare against the performance of the CPU-based conflict checking schemes, we implement the GPU versions cudaCheckMIMaxDuration, cudaCheckExclusion and cudaCheckNonSequence.

The Chollian satellite’s MPS was developed in the C# language on the .NET framework. However, the CUDA programming model basically supports the C language, even though there have been other efforts, e.g., GPU.NET16) or CUDA.NET,17) to enable CUDA to run on the .NET framework. These efforts are mostly commercial and/or have not been well proven in the industry thus far. Therefore, we compile C-style CUDA code with the Nvidia C compiler (NVCC) and generate .dll files. The compiled .dll files can be called and used in the MPS via the C# attribute DllImport.

For simplicity, we divide the GPU-based constraint checking schemes into three parts: C# caller, CUDA host and CUDA device. The C# caller calls the CUDA host with its parameters through a CUDA wrapper. Then, the CUDA host allocates device memory, copies the necessary data from the host memory to the device memory, and invokes the CUDA kernel in the CUDA device part. The CUDA kernel runs with the data in the device memory, and after the CUDA kernel finishes its execution, the necessary data are copied back from the device memory to the host memory. Finally, the C# caller can refer to the data in the host memory through the CUDA wrapper.

4.1. cudaCheckMIMaxDuration

cudaCheckMIMaxDuration can be described as shown in Fig. 7. As in Fig. 7(a), the durations and IDs of all missions in the mission request are stored into dur_array and ID_array. Using parameters such as array_count and conflict_flag_array, the C# caller calls the CUDA host through the CUDA wrapper. The arrays are copied into the device memory and, as in Fig. 7(b), the CUDA kernel CUDACheckMIMaxDuration is invoked. The thread block size threadsPerBlock is set to 256 because it is a common choice, as proposed by Ref. 18). In a kernel invocation, (threadsPerBlock × blocksPerGrid) CUDA threads are executed. As shown in Fig. 7(c), each CUDA thread checks whether the mission with index i has a conflict, and sets its conflict flag to true if it does.

Fig. 7. Pseudo code of cudaCheckMIMaxDuration: (a) C# caller, (b) CUDA host, (c) CUDA device.

4.2. cudaCheckExclusion

cudaCheckExclusion can be described as in Fig. 8. As shown in Fig. 8(a), the start times, end times and IDs of all missions in the mission request are stored into start_time_array, end_time_array and ID_array. Likewise, the source IDs and destination IDs of all exclusion rules are stored into corresponding arrays. Then, the C# caller calls the CUDA host through the CUDA wrapper. The arrays are copied into the device memory as shown in Fig. 8(b).

Compared to cudaCheckMIMaxDuration, cudaCheckExclusion requires many more CUDA threads and blocks to run, as the time complexity of this problem when using a CPU is basically O(RN^2), as shown in Fig. 4. Even though, theoretically, 4,294,836,225 (65,535 × 65,535) blocks can be used by a kernel, we cannot use them maximally due to several constraints, such as the total amount of available memory on a device.

To schedule a normal Chollian satellite mission request, the number of CUDA threads or the number of blocks frequently exceeds the above system limits, which leads to a program crash. To prevent this abnormal situation, we divide one large kernel invocation into pieces. Thus, the kernel CUDACheckExclusion is invoked multiple times in a loop, depending on the amount of calculation. We set the maximum number of blocks to 65,535, and the maximum number of threads to 16,776,960 (65,535 × 256), for a kernel invocation, even though we can obtain a higher performance using more than 65,535 blocks. loop_cnt represents the number of necessary kernel invocations. In Fig. 8(c), rule index i, mission index j, and mission index k in the mission request are acquired, and the conflict flags of mission j and mission k are set to true if a conflict exists between them.

Fig. 8. Pseudo code of cudaCheckExclusion: (a) C# caller, (b) CUDA host, (c) CUDA device.

4.3. cudaCheckNonSequence

cudaCheckNonSequence can be described as shown in Fig. 9. As shown in Fig. 9(a), the IDs of all missions in the mission request are stored into the ID array. Likewise, all first, second and third IDs of the non-sequence rules are stored into corresponding rule arrays. Note that the third ID can be null if a rule is not a triplet but a pair. Then, the C# caller calls the CUDA host through the CUDA wrapper. The arrays are copied into the device memory and cudaCheckNonSequence is invoked as shown in Fig. 9(b). In Fig. 9(c), rule index i and mission index j in the mission request are acquired, and each CUDA thread checks whether a conflict exists. If mission j and mission j + 1 have a conflict, the conflict flags of the two missions are set to true. If mission j, mission j + 1 and mission j + 2 have a conflict, the conflict flags of the three missions are set to true.

Fig. 9. Pseudo code of cudaCheckNonSequence: (a) C# caller, (b) CUDA host, (c) CUDA device.

5. Performance Evaluation

We compare the execution times of the CPU-based conflict checking schemes with those of the GPU-based ones. Table 2 shows the experimental environment in terms of hardware and software.
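Before turning to the measurements, the thread-level decomposition described for cudaCheckNonSequence in Section 4.3 can be emulated serially in Python to make the index arithmetic concrete. This is a sketch under stated assumptions rather than the kernel itself: the function names are hypothetical, the mapping of a flat thread index to (rule i, mission j) via `divmod` is one plausible layout that the paper does not confirm, and a plain loop stands in for the CUDA grid.

```python
def non_sequence_thread(tid, ids, rules, flags):
    """Work done by one hypothetical CUDA thread with flat index tid."""
    n = len(ids)
    i, j = divmod(tid, n)            # assumed layout: rule index i, mission index j
    if i >= len(rules):
        return                       # surplus thread in the last block
    rule = [r for r in rules[i] if r is not None]   # a pair has a null third ID
    if j + len(rule) > n:
        return                       # not enough successor missions left
    if ids[j:j + len(rule)] == rule:
        for k in range(j, j + len(rule)):
            flags[k] = True          # all writers store True, so ordering is irrelevant


def cuda_check_non_sequence(ids, rules, threads_per_block=256):
    """Serial stand-in for a single kernel invocation over R * N threads."""
    flags = [False] * len(ids)
    total = len(rules) * len(ids)
    blocks = (total + threads_per_block - 1) // threads_per_block
    for tid in range(blocks * threads_per_block):
        non_sequence_thread(tid, ids, rules, flags)
    return flags
```

Because every (rule, mission) pair is examined independently, the R × N checks need no loop on the device; the conditional returns inside each thread are the kind of branching that limits parallel execution within a warp.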

Table 2. Experimental environment (H/W and S/W).

CPU               Intel® Core™ i7, [email protected] GHz
Memory            12.0 GB
Display adaptor   Nvidia GeForce GTS 250
                  - CUDA compute capability = 1.1
                  - Number of multiprocessors = 16
                  - Number of CUDA cores = 128
OS                Windows Vista Business K, 64 bit, SP 1
CUDA S/W          CUDA Driver v3.2, CUDA Toolkit v3.2,
                  Nvidia GPU Computing SDK 3.2

Fig. 11. Execution times of CheckMIMaxDuration and cudaCheckMIMaxDuration.

There are a number of general ways17) to further increase the performance of a GPU-based approach: using shared memory instead of global memory, avoiding bank conflicts in shared memory, coalescing global memory accesses, using fewer conditional branches, and so on.

The nature of the Chollian conflict-checking schemes, however, requires storing the conflict results within multiple if-statements, which prevents the parallel execution of threads in a warp and causes a huge delay. Figure 10 describes the execution flow of CUDA threads in a multiprocessor and explains why conditional branches cause a delay. Here, the 32 CUDA threads that make up a warp should execute the same instruction. A multiprocessor has only eight cores, so the 32 CUDA threads of a warp cannot execute simultaneously. Thus, units of eight CUDA threads execute serially within a warp. For instance, if eight CUDA threads (T1–T8) in Warp 1 have conditional branches, it is possible that one CUDA thread (T1) goes into the if-clause, while the others (T2–T8) go into the else-clause. In this case, all the others (T2–T8) in the else-clause should wait until the one (T1) in the if-clause finishes, because the cores should execute the same instruction. Thus, parallel execution of threads in a warp cannot be accomplished.

Fig. 10. Execution of CUDA threads in a multiprocessor.

Despite this fundamental disadvantage, GPU-based conflict-checking schemes tend to show a better performance if the size of the problem set becomes larger, or the time complexity of the CPU-based schemes is higher.

5.1. CheckMIMaxDuration vs. cudaCheckMIMaxDuration

The maximum duration limits of the FD, APNH, ENH, LSH and LA are 1,620, 243, 742, 396 and 60 s, respectively. If the duration of an MI mission is larger than its limit, an alarm must be sent to the mission operator.

Only MI missions are affected by this constraint checking scheme. Thus, we use only MI mission requests to compare the performance. The MI mission request used, which was actually used during the Chollian satellite’s in-orbit test (IOT) phase, is for the day of Sept. 21, 2010, and consists of 302 missions in total, i.e., 49 MI sequences, 49 block body calibrations and 204 MI missions (7 FDs, 37 APNHs, 37 ENHs, 86 LAs, and 37 LSHs).19)

By repeating the above one-day MI mission request, we generate multiple-day MI mission requests, i.e., for 2, 4, 8, 16, 32, 64, 128, 256, 512 and 1,024 days. For instance, the 1,024-day MI mission request contains 309,248 (1,024 × 302) missions and covers the period from Sept. 21, 2010 to July 10, 2013.

There are three reasons why we generate a multiple-day mission request by repeating a one-day mission request instead of using a real multiple-day request. First, a mission request during an IOT is not always similar to that during normal operation. Various experimental tests are performed during an IOT, and thus a mission request on a particular day can be quite uncommon and different from that of another day. This can cause a bias in the experiment. Second, the mission request for the day of Sept. 21, 2010 was quite close to a normal operation scenario. Thus, we can acquire a near-real multiple-day mission request by repeating it. Third, a sufficient number of real multiple-day mission requests have not been accumulated as of May 2011, and thus we cannot obtain real large multiple-day (512 or 1,024 days) mission requests.

Figure 11 shows the execution times of CheckMIMaxDuration and cudaCheckMIMaxDuration. When the size of a mission request is large, e.g., from 512 to 1,024 days, cudaCheckMIMaxDuration outperforms CheckMIMaxDuration due to its parallelism; 128 CUDA cores simultaneously execute the device code described in Fig. 7(c). However, when the size of a mission request is relatively small, e.g., from 1 to 128 days, CheckMIMaxDuration shows a shorter execution time than cudaCheckMIMaxDuration. For mission requests in this size range, cudaCheckMIMaxDuration shows rather constant execution times. Why is there no advantage of parallelism in this case? It is because of the default CUDA setup overhead. In a CUDA kernel invocation, CUDA setup is necessary to initialize the CUDA context on the GPU, allocate memory and release the CUDA context. Even when a CUDA kernel has nothing to do, about 60–70 ms of CUDA setup overhead exists in this case. This means that a CUDA kernel invocation for a small calculation is not an efficient way of solving a problem.

5.2. CheckExclusion vs. cudaCheckExclusion

To the MI mission request described in section 5.1, we add other types of mission requests. They were also used during the Chollian satellite’s IOT phase for Sept. 21, 2010. Thus, the mission request we use consists of 346 missions in total, i.e., 49 MI sequences, 49 block body calibrations, 204 MI missions (7 FDs, 37 APNHs, 37 ENHs, 86 LAs, and 37 LSHs), 8 GOCI sequences, 8 GOCI shutter opens, 8 GOCI shutter closes, 8 GOCI missions, 9 events (1 apogee crossing, 1 perigee crossing, 2 nodal crossings, 1 eclipse by the Earth, 2 sensor intrusions by the Sun, 2 sensor intrusions by the Moon), 1 North-South station keeping, and 2 wheel-offloadings.14)

By repeating the above one-day mission request, we generate multiple-day mission requests, i.e., for 2, 4, 8, 16, 32, 64, 128, 256, 512 and 1,024 days, as described in section 5.1.

Figure 12(a) shows the execution times of CheckExclusion and cudaCheckExclusion. cudaCheckExclusion shows a shorter execution time than CheckExclusion. For example, when the size of a mission request is eight days, the execution times of CheckExclusion and cudaCheckExclusion are 7,285 and 904 s, respectively; cudaCheckExclusion is nearly 8.1-times faster than CheckExclusion.

Figure 12(b) shows the execution time ratio of CheckExclusion to cudaCheckExclusion. In each case, cudaCheckExclusion outperforms CheckExclusion. When the size of a mission request is one day, cudaCheckExclusion is only 1.8-times faster than CheckExclusion, and the ratio converges to 4.5 for a 1,024-day mission request.

Figure 12(c) shows the correlation between the number of kernel invocations and the execution time of cudaCheckExclusion per kernel invocation. The former and latter are expressed with red bars and blue dots, respectively. As the number of missions N in a mission request becomes larger, the number of kernel invocations increases in proportion to N^2. The execution time per kernel invocation converges to a certain level, i.e., 85 ms.

Fig. 12. Execution times of CheckExclusion and cudaCheckExclusion: (a) comparison of execution time, (b) execution time ratio, (c) execution time per kernel invocation.

5.3. CheckNonSequence vs. cudaCheckNonSequence

We experimented on the same mission requests described in section 5.2. Figure 13 compares the execution times of CheckNonSequence and cudaCheckNonSequence.

When the size of a mission request is less than or equal to 32 days, CheckNonSequence shows a shorter execution time than cudaCheckNonSequence. However, if the size of a mission request is larger than 32 days, cudaCheckNonSequence is faster than CheckNonSequence. This trend is similar to that described in section 5.1. In both cases, as shown in Figs. 11 and 13, the GPU-based schemes show rather constant execution times with small problem sets, and outperform the CPU-based schemes with large problem sets. This is because the time complexity of the problem itself is basically the same, O(N), for both schemes.

Fig. 13. Execution times of CheckNonSequence and cudaCheckNonSequence.

In summary, Fig. 14 suggests a strategy to determine whether or not to adopt GPU for a satellite mission scheduling algorithm. If the time complexity of a problem is high, using GPU is likely to be a good choice. Otherwise, consider the size of the problem set; use GPU only if the problem set is large. This strategy can be an efficient guide when deciding on the adoption of GPU in satellite mission scheduling.
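The strategy summarized above can be written as a small decision function. The sketch below is only one way to encode it: the function name, the use of a polynomial exponent k (for O(N^k)) as the measure of "high" complexity, and the caller-supplied `large_threshold` are assumptions, since the paper gives no numeric cut-offs beyond the measured crossover points.

```python
def should_use_gpu(complexity_exponent, n_missions, large_threshold):
    """Encode the two questions of the strategy as a boolean decision.

    complexity_exponent: k in O(N^k) for the CPU-based scheme (assumed measure)
    n_missions:          size of the problem set
    large_threshold:     problem size above which the set counts as "large";
                         must be calibrated by measurement on the target system
    """
    if complexity_exponent > 1:          # "Is the time complexity high?"
        return True                      # adopt GPU
    return n_missions > large_threshold  # "Is the problem set large?"
```

As an illustration, the crossover reported in Section 5.3 (requests longer than 32 days of 346 missions each) would correspond to `large_threshold = 32 * 346` for CheckNonSequence; the threshold for another scheme or another GPU would have to be measured separately, because the fixed CUDA setup overhead dominates small problems.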

Fig. 14. A strategy determining whether or not to adopt GPU for a mission scheduling algorithm. [Flowchart: Start; analyze the time complexity; if the time complexity is high, adopt GPU; otherwise analyze the size of the problem set; if the problem set is large, adopt GPU, else refuse GPU; Stop.]

6. Conclusion

We implemented GPU-based conflict checking schemes and compared their execution times with those of the CPU-based schemes used for the Chollian satellite. A performance evaluation showed that execution time can be reduced tremendously by using a GPU in some cases. Even though the mandatory use of multiple conditional branches in GPU-based conflict checking schemes caused a significant delay, one of the GPU-based schemes was at most 8.1-times faster than its corresponding CPU-based version. In general, if the time complexity of a problem is less than or equal to O(N), the benefit of a GPU-based scheme over a CPU-based scheme increases as the size of the problem set increases. If the time complexity of a problem is larger than O(N), the benefit of a GPU-based scheme over a CPU-based scheme converges to a certain level if the size of the problem set is large enough. On the contrary, if the problem set is small, adoption of a GPU may not be an attractive solution. Thus, when we adopt the power of a GPU into very critical areas such as satellite control, a detailed analysis regarding the size of the problem set, the time complexity of the problem, and the programming/maintenance cost must be performed a priori.

The next Korean GEO satellite projects after the Chollian will require much larger mission requests and more complex mission scheduling algorithms due to the improved imaging performance of the new satellite imager. Likewise, not only for the mission scheduling, but also for the other parts of satellite ground systems, such as event prediction, image processing, image analysis and weather forecasting, larger amounts of data will be generated and more complex prediction models will be applied. Based on this work, we plan to utilize the power of the GPU as much as possible for upcoming satellite projects.

References

1) Lim, J., Back, G. and Yun, T.: Polarization-Diversity Cross-Shaped Patch Antenna for Satellite-DMB Systems, ETRI J., 32 (2010), pp. 312–318.
2) Park, J., Ahn, D., Lee, H. and Park, D.: Feasibility of Coexistence of Mobile-Satellite Service and Mobile Service in Cofrequency Bands, ETRI J., 32 (2010), pp. 255–264.
3) Hwang, Y., Lee, B., Kim, Y., Roh, K., Jung, O. and Kim, H.: GPS-based Orbit Determination for KOMPSAT-5 Satellite, ETRI J., 33 (2011), pp. 487–496.
4) Lee, S., Jung, W. and Kim, J.: CFTUE: Construct First-Finished Task Avoiding Unconditional Exclusion Algorithm for Single-Resource Satellite Mission Scheduling, Proc. International Communications Satellite Systems Conference, AIAA Paper 2007-3217, 2007.
5) Lee, S., Jung, W. and Kim, J.: Task Scheduling Algorithm for the Communication, Ocean, and Meteorological Satellite, ETRI J., 30 (2008), pp. 1–12.
6) Lee, S., Jung, W. and Kim, J.: Scheduling North-South Mirror Motion between Two Consecutive Meteorological Images of COMS, KOSST J., 3 (2009), pp. 26–31.
7) Lee, B. and Kim, J.: Design and Implementation of the Mission Planning Functions for the KOMPSAT-2 Mission Control Element, J. Astron. Space Sci., 20 (2003), pp. 227–238.
8) Kim, J., Jung, W., Lee, B., Hwang, Y., Kim, I., Lee, S. and Kim, H.: KOMPSAT-3 Mission Control Element System Detailed Design Document, ETRI Technical Document, MCE-SYS-006, ver. A, 2009.
9) Kim, J., Jung, W., Lee, B., Hwang, Y., Kim, I., Lee, S. and Kim, H.: KOMPSAT-5 MCE Mission Planning Subsystem Detailed Design Document, ETRI Technical Memo, 2009.
10) Barbulescu, L., Watson, J., Whitley, L. and Howe, A.: Scheduling Space-Ground Communications for the Air Force Satellite Control Network, J. Scheduling, 7 (2004), pp. 7–34.
11) Manner, R. and Manderick, B.: Parallel Problem Solving from Nature 2, North-Holland, Amsterdam, 1992.
12) Caseau, Y. and Laburthe, F.: Cumulative Scheduling with Task Intervals, Proc. Joint International Conference on Logic Programming, 1996, pp. 363–377.
13) Kramer, L. A. and Smith, S. F.: Task Swapping for Schedule Improvement: A Broader Analysis, Proc. International Conference on Automated Planning and Scheduling, 2004, pp. 235–243.
14) Owens, J. D., Luebke, D., Govindaraju, N., Harris, M., Krüger, J., Lefohn, A. and Purcell, T. J.: A Survey of General-Purpose Computation on Graphics Hardware, Proc. Eurographics, 2005, pp. 21–51.
15) CUDA Background, http://www.nvidia.com/object/what_is_cuda_new.html
16) GPU.NET, http://www.tidepowerd.com/
17) Cloud Services for GPU Computing, http://www.hoopoe-cloud.com/Solutions/CUDA.NET/Default.aspx
18) NVIDIA CUDA™: NVIDIA CUDA C Programming Guide, version 3.1, 2010.
19) Basiege, F. et al.: COMS Mission Planning Specifications (SYS-48), Technical Document, COMS.SPT.00005.DP.T.ASTR, ver. 03.05, 2008.

Appendix

For those who are not familiar with the pseudo codes in Figs. 3–5 and 7–9, flowcharts of the CPU- and GPU-based conflict checking schemes are shown in this section.

A. CPU-based schemes

Figure 15 shows the flowcharts of the CPU-based conflict checking schemes. Note that, even though the flowcharts of CheckExclusion and CheckNonSequence look the same, the time complexity of the two is different. When checking whether a rule is violated or not, CheckExclusion requires an internal loop while CheckNonSequence does not. Thus, for CheckMIMaxDuration, CheckExclusion and CheckNonSequence, one, three and two loops are required, respectively.

B. GPU-based schemes

Figure 16 shows the flowcharts of the GPU-based conflict checking schemes. cudaCheckMIMaxDuration and cudaCheckNonSequence finish in just one CUDA kernel invocation, so no loop is required for them. Instead, checking whether a rule is violated is performed in each CUDA thread, in parallel. However, cudaCheckExclusion may require a loop because the problem is sometimes too large to be solved in a single CUDA kernel invocation. Thus, the problem is divided into many pieces and each piece is executed in a CUDA kernel with multiple CUDA threads.

[Fig. 15, panel CheckMIMaxDuration: flowchart. If the mission request list is not empty, select the first MI mission, then loop over the MI missions in the mission request list; when the rule is violated, set the mission's conflict flag to true; stop after the last MI mission.]

[Fig. 15, panel CheckExclusion, CheckNonSequence: flowchart. If the rules list and the mission request list are not empty, loop over the rules in the rules list and, for each rule, loop over the missions in the mission request list; when the rule is violated, set the missions' conflict flags to true; stop after the last rule.]

Fig. 15. Flowcharts of the CPU-based conflict checking schemes.
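
The loop structure of the three CPU-based schemes can be illustrated with a small sketch. It is not the Chollian implementation: the `Mission` record, the rule semantics (a duration limit, a forbidden immediate successor, and a forbidden time overlap) and all names are assumptions chosen only to reproduce the one-, two- and three-loop shapes described in Appendix A.

```python
from dataclasses import dataclass


@dataclass
class Mission:
    # Hypothetical mission record; the field names are assumptions.
    kind: str
    start: float
    end: float
    conflict: bool = False


def check_mi_max_duration(missions, max_duration):
    """One loop: flag every MI mission longer than max_duration."""
    for m in missions:
        if m.kind == "MI" and m.end - m.start > max_duration:
            m.conflict = True


def check_non_sequence(missions, rules):
    """Two loops (rules x missions); the rule test itself needs no loop.

    Assumed rule: a mission of kind `nxt` must not immediately follow a
    mission of kind `first` (missions are assumed sorted by start time).
    """
    for first, nxt in rules:
        for i in range(len(missions) - 1):
            a, b = missions[i], missions[i + 1]
            if a.kind == first and b.kind == nxt:
                a.conflict = b.conflict = True


def check_exclusion(missions, rules):
    """Three loops: the rule test scans all other missions internally.

    Assumed rule: missions of kind_a and kind_b must not overlap in time.
    """
    for kind_a, kind_b in rules:
        for a in missions:
            if a.kind != kind_a:
                continue
            for b in missions:  # internal loop of the rule check
                overlaps = a.start < b.end and b.start < a.end
                if b is not a and b.kind == kind_b and overlaps:
                    a.conflict = b.conflict = True
```

With R rules and N missions, the sketch performs on the order of N, R·N and R·N² checks for the three functions, which is why the two schemes that share a flowchart still differ in time complexity.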

[Fig. 16, panel cudaCheckMIMaxDuration, cudaCheckNonSequence: flowchart. Start; invoke CUDA threads; in each CUDA thread, when the rule is violated, set the missions' conflict flags to true; stop.]

[Fig. 16, panel cudaCheckExclusion: flowchart. Start; divide the problem into pieces; invoke CUDA threads for a piece; in each CUDA thread, when the rule is violated, set the missions' conflict flags to true; if it is not the last loop, go to the next loop; otherwise stop.]

Fig. 16. Flowcharts of the GPU-based conflict checking schemes.
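
The division of cudaCheckExclusion into pieces can be sketched on the CPU as follows. This is plain Python that only emulates the control flow, not CUDA code and not the authors' kernel: the per-launch limit `max_threads`, the mapping of one thread to one mission pair, and the overlap test used as the rule are all assumptions made for illustration.

```python
def exclusion_kernel(thread_id, offset, starts, ends, flags):
    """Work of one emulated thread: test a single pair of missions."""
    n = len(starts)
    pair = offset + thread_id
    if pair >= n * n:  # surplus threads in the last piece do nothing
        return
    i, j = divmod(pair, n)
    # Assumed rule: two distinct missions must not overlap in time.
    if i < j and starts[i] < ends[j] and starts[j] < ends[i]:
        flags[i] = True  # threads only ever write True, so the order
        flags[j] = True  # in which they run does not change the result


def check_exclusion_chunked(starts, ends, max_threads):
    """Host-side loop: one emulated kernel launch per piece of the problem."""
    n = len(starts)
    flags = [False] * n
    launches = 0
    for offset in range(0, n * n, max_threads):
        # On a GPU these iterations would run as parallel threads.
        for thread_id in range(max_threads):
            exclusion_kernel(thread_id, offset, starts, ends, flags)
        launches += 1
    return flags, launches
```

For example, three missions give nine pairs, so a hypothetical limit of four threads per launch needs three launches, while a limit of sixteen needs one; the conflict flags are the same in both cases.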