Optimizing for Performance

by Keith Gladstien

In this article, you'll find strategies for optimizing the performance of applications made with Flash Professional. Optimization involves editing your FLA project files so that the published application's realized (actual) frame rate is high enough for animations to play back fluidly.

If you've ever run a Flash project and seen stuttering animation, that is the behavior you want to avoid. To reproduce it, create a project with a simple animation and assign a frame rate below 10 (such as 5). When you test the movie by publishing the SWF file, you'll see the animation stutter.
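If you prefer to set up this test in code rather than in the Properties panel, a minimal sketch for frame 1 of the main timeline might look like the following. It assumes a new ActionScript 3.0 FLA with an empty Stage; the names box and moveBox are illustrative only.

```actionscript
import flash.display.Shape;
import flash.events.Event;

// Deliberately low frame rate to make the stutter obvious.
stage.frameRate = 5;

var box:Shape = new Shape();
box.graphics.beginFill(0x000000);
box.graphics.drawRect(0, 0, 40, 40);
box.graphics.endFill();
addChild(box);

addEventListener(Event.ENTER_FRAME, moveBox);

// Moves the box one step per frame and wraps it at the right edge.
function moveBox(e:Event):void {
    box.x = (box.x + 10) % stage.stageWidth;
}
```

Changing the value assigned to stage.frameRate to 24 or 30 lets you compare the same motion at a higher frame rate.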

Two main factors determine Flash performance: CPU/GPU usage and memory usage. These factors are not independent of each other; some adjustments that improve one have a negative impact on the other. In the sections below, I'll explain how that works and why you might judiciously decide, for example, to increase memory use in order to decrease the CPU/GPU load.

If you develop Flash games for mobile devices, it's likely you'll need to use some of the techniques discussed below to achieve acceptable frame rates. If you are creating non-game apps for the desktop, it's possible to achieve acceptable frame rates with little or no familiarity with the techniques described in this article.

Judging and measuring game performance with Adobe Scout

The only way to accurately judge the performance of your app is to run it on the target platform – your development platform can have very different performance characteristics, especially if you're developing for mobile devices. In late 2012, Adobe released a new tool called Adobe Scout (formerly known as "Project Monocle") that lets you do just this.

Scout is a profiler and performance debugging tool for Flash content. It lets you accurately measure the performance of your app – its frame rate, CPU utilization, memory use, rendering performance, and much more. It supports remote profiling, meaning that you can profile your app while it's running on a mobile device. This lets you tune the performance of your app for the specific platform you're targeting.

You can download Scout from Adobe. For more information about what Scout can do and how to use it, read the Scout getting started guide.

Memory tracking, memory use, and performance testing

Scout can help you detect memory leaks in your content, but version 1.0 doesn't show you which objects are in memory and causing the leak. More detailed memory features are planned for future releases. In the meantime, if you discover memory issues on the target platform, you can use the MT class to debug your app and resolve them. (In the provided sample files folder, open the ActionScript class located at MT/com/kglad/MT.as.)

The code in the MT class is adapted from code provided by Damian Connolly on his site divillysausages.com. The MT class reports the frame rate and memory consumption, and it lists any tracked objects that are still in memory. To use the MT class, follow these steps:

1. Import the MT class:

   import com.kglad.MT;

2. Initialize it from the document class or the project's main timeline with this line:

   MT.init(this, reportFrequency);

   In this line, the this keyword references the movie's main Timeline and reportFrequency is an integer. The main Timeline reference is used to compute the realized frame rate, and reportFrequency is the interval (in seconds) at which the trace output reports the frame rate and the amount of memory consumed by the Flash application. If you don't want periodic frame rate and memory reports, pass 0 (or any smaller value). Even if you choose not to output the frame rate, you can still use the memory tracker part of this class.

3. To track objects you create in your app, add this line:

   MT.track(whatever_object, any_detail);

   The first parameter is the object you want to track (to see whether it is removed from memory) and the second parameter is an optional string containing any detail you want reported with it. (Developers typically use this parameter to note what the object is, and where or when they started tracking it.)

4. To create a report that reveals whether your tracked objects still exist in memory, add this line:

   MT.report();
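Put together, the four steps might look like this on frame 1 of the main timeline. This sketch assumes the MT.as file from the sample files is on your classpath (so that com.kglad.MT resolves) and uses only the three calls described above; the enemy Sprite and the 5-second reporting interval are arbitrary choices for illustration.

```actionscript
import com.kglad.MT;
import flash.display.Sprite;

// Report frame rate and memory use every 5 seconds (0 disables periodic reports).
MT.init(this, 5);

// Track an object so the report can show whether it is still in memory.
var enemy:Sprite = new Sprite();
addChild(enemy);
MT.track(enemy, "enemy created on frame 1");

// When the object is no longer needed, remove every reference to it.
removeChild(enemy);
enemy = null;

// Lists tracked objects that still exist. Call it again later as well:
// an object only disappears after the garbage collector has actually run.
MT.report();
```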

You don't need to understand how the MT class works in order to use it. However, it is a good idea to look at how it uses the Dictionary class to store weak references to all the objects passed to MT.track(). The class includes extensive comments that describe how it works.
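The core idea can be sketched in a few lines. The following is a simplified, hypothetical tracker, not the MT class itself: a Dictionary created with weakKeys set to true does not keep its keys alive, so a tracked object can still be garbage collected, and any key that remains in the Dictionary belongs to an object that is still in memory.

```actionscript
import flash.display.Sprite;
import flash.utils.Dictionary;

// weakKeys = true: being a key in this Dictionary does not prevent
// an object from being garbage collected.
var tracked:Dictionary = new Dictionary(true);

// Hypothetical helper: remember an object along with a description.
function trackObject(obj:Object, detail:String = ""):void {
    tracked[obj] = detail;
}

// Hypothetical helper: every key still present is an object still in memory.
function reportTracked():int {
    var count:int = 0;
    for (var obj:Object in tracked) {
        trace("still in memory:", obj, tracked[obj]);
        count++;
    }
    return count;
}

var temp:Sprite = new Sprite();
trackObject(temp, "temporary sprite");
temp = null; // the Dictionary alone no longer keeps the Sprite alive
// Until the garbage collector runs, reportTracked() may still list it.
reportTracked();
```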

Many of the sample file tests provided at the beginning of this article use the MT class. Review those tests to see more examples of how the MT class is used.

Similar to the observer effect in physics, the mere act of measuring frame rate and memory use, or tracking objects in memory, changes the frame rate and memory use of your app. However, you can minimize the effect of measurement by keeping trace output relatively infrequent. Additionally, the absolute numbers are usually not important; it is the change in frame rate and memory use over time that matters for debugging and optimization. The MT class does a good job of reporting these changes.

To minimize spuriously low frame rate reports, the MT class does not allow trace output more than once per second, because calling trace() frequently can itself slow the frame rate. If desired, you can eliminate trace output as a confounder entirely by displaying the data in a TextField instead.
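For example, a TextField-based readout could be sketched as follows. This is a hypothetical stand-in, not part of the MT class: it counts frames with an Event.ENTER_FRAME listener, uses getTimer() to measure elapsed milliseconds, and refreshes the text at most once per second. System.totalMemory reports the memory currently in use by Flash Player, in bytes.

```actionscript
import flash.events.Event;
import flash.system.System;
import flash.text.TextField;
import flash.utils.getTimer;

var fpsField:TextField = new TextField();
fpsField.width = 250;
fpsField.selectable = false;
fpsField.mouseEnabled = false;
addChild(fpsField);

var framesCounted:int = 0;
var lastUpdate:int = getTimer(); // milliseconds since the SWF started

addEventListener(Event.ENTER_FRAME, updateStats);

function updateStats(e:Event):void {
    framesCounted++;
    var now:int = getTimer();
    var elapsed:int = now - lastUpdate;
    if (elapsed >= 1000) { // refresh at most once per second
        var fps:Number = framesCounted * 1000 / elapsed;
        var memMB:Number = System.totalMemory / (1024 * 1024);
        fpsField.text = "fps: " + fps.toFixed(1) + "   memory: " + memMB.toFixed(1) + " MB";
        framesCounted = 0;
        lastUpdate = now;
    }
}
```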

The MT class is the only tool used by the sample file test projects to check memory usage and pinpoint memory problems. It also indirectly measures CPU/GPU usage (by checking the actual frame rate of the executing app).

Implementing optimization techniques

In the sections below, I'll begin by discussing memory management guidelines, with subtopics listed in alphabetical order. Next, I'll provide information on CPU/GPU management, with subtopics related to that goal.

Splitting the techniques into two sections is convenient, but as you read through this article, remember that memory management affects CPU/GPU usage, so the recommendations in the memory management section work in tandem with the tips in the CPU/GPU section.

Before describing the specific best practices, I'll list the techniques in order from easiest to hardest to implement, followed by a second list that prioritizes them from greatest to least benefit.

Keep in mind that these lists are subjective. The order depends on personal developer experience and capabilities, as well as the test situation and test environment.

Easiest to hardest optimization techniques to implement

1. Do not use filters.
2. Always use reverse for-loops. Avoid writing do-loops and while-loops.
3. Explicitly stop Timers to ready them for garbage collection.
4. Use weak event listeners and remove listeners when no longer needed.
5. Strictly type variables whenever possible.
6. Explicitly disable mouse interactivity when mouse interactivity is not required.
7. Replace dispatchEvents with callback functions whenever possible.
8. Stop Sounds to enable garbage collection for Sounds and SoundChannels.
9. Use the most basic DisplayObject needed for each element.
10. Always use cacheAsBitmap and cacheAsBitmapMatrix with AIR apps (mobile devices).
11. Reuse Objects whenever possible.
12. Event.ENTER_FRAME loops: Use different listeners and different listener functions applied to as few DisplayObjects as possible.
13. Pool Objects instead of creating and garbage collecting Objects.
14. Use partial blitting.
15. Use stage blitting.
16. Use Stage3D.

Greatest to least benefit of optimization techniques

1. Use stage blitting (if there is enough system memory).
2. Use Stage3D.
3. Use partial blitting.
4. Use cacheAsBitmap and cacheAsBitmapMatrix with mobile devices.
5. Explicitly disable mouse interactivity when mouse interactivity is not needed.
6. Do not use filters.
7. Use the most basic DisplayObject needed.
8. Reuse Objects whenever possible.
9. Event.ENTER_FRAME loops: Use different listeners and different listener functions applied to as few DisplayObjects as possible.
10. Use reverse for-loops. Avoid writing do-loops and while-loops.
11. Pool Objects instead of creating and garbage collecting Objects.
12. Strictly type variables whenever possible.
13. Use weak event listeners and remove listeners.
14. Replace dispatchEvents with callback functions whenever possible.
15. Explicitly stop Timers to prepare them for garbage collection.
16. Stop Sounds to enable garbage collection for Sounds and SoundChannels.

With these priorities in mind, proceed to the next section to learn how to update your Flash projects to manage memory more efficiently.

Managing memory

The list of suggestions below is not exhaustive, but it contains strategies that can significantly improve the performance of Flash content.

Using a callback function vs. dispatchEvent

Dispatching events increases memory use because each event must be created and allocated memory. That makes sense: events are objects, and objects require memory.

I tested a handful of events and found each consumed 40 to 128 bytes. I also discovered that using callback functions used less memory and ran more efficiently than using events. (See the test files in the sample files folder: callback_v_dispatchEvent.)
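The difference is easy to see side by side. In this sketch, the "scoreChanged" event type and the handler names are hypothetical. The event version allocates a new Event object for every notification, while the callback version stores a Function reference and calls it directly.

```actionscript
import flash.events.Event;
import flash.events.EventDispatcher;

var eventCalls:int = 0;
var callbackCalls:int = 0;

// Event approach: every notification creates a new Event object.
var dispatcher:EventDispatcher = new EventDispatcher();
dispatcher.addEventListener("scoreChanged", onScoreChangedEvent);

function onScoreChangedEvent(e:Event):void {
    eventCalls++;
}

dispatcher.dispatchEvent(new Event("scoreChanged"));

// Callback approach: keep a Function reference and call it directly;
// no Event object is allocated.
var onScoreChanged:Function = handleScoreChanged;

function handleScoreChanged(newScore:int):void {
    callbackCalls++;
    trace("new score:", newScore);
}

if (onScoreChanged != null) {
    onScoreChanged(10);
}
```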

Applying filters

Memory use increases when you apply a dynamic filter. According to the Adobe Help documentation, using a filter doubles memory use. In real-world testing with Flash Professional CS6, I've found that while filters do increase memory use, they come nowhere near doubling it. (To review the test examples, see the sample files in the filters folder.)

Using the correct type of display objects for each element

The Shape, Sprite, and MovieClip objects each use different amounts of memory. A Shape object requires 236 bytes, a Sprite requires 412 bytes, and a MovieClip requires 448 bytes.

If you are using many thousands of DisplayObjects in a project, you may be able to save substantial memory by using a Shape when interactivity is not required, or a Sprite when a timeline is not needed.
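You can check these sizes on your own setup with flash.sampler.getSize(), which returns an object's size in bytes. Note that the flash.sampler functions work only in the debugger version of Flash Player, and the numbers may vary between player versions. The second half of the sketch shows a non-interactive background drawn with a Shape rather than a Sprite or MovieClip; its color and size are arbitrary.

```actionscript
import flash.display.MovieClip;
import flash.display.Shape;
import flash.display.Sprite;
import flash.sampler.getSize;

// getSize() returns an object's size in bytes (debugger Flash Player only).
trace("Shape:", getSize(new Shape()));
trace("Sprite:", getSize(new Sprite()));
trace("MovieClip:", getSize(new MovieClip()));

// A purely decorative element needs no interactivity and no timeline,
// so a Shape is enough.
var background:Shape = new Shape();
background.graphics.beginFill(0x336699);
background.graphics.drawRect(0, 0, 200, 100);
background.graphics.endFill();
addChild(background);
```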

Object pooling

At the start of your app, create all the object references you'll ever need during the entire time your app is open and pool (store) those references in an array. Whenever an object is needed, retrieve it from the array.

Whenever an object is no longer needed, return it to the array. It is common practice to use Vectors instead of arrays to store objects of the same type. A Vector may be twice as fast as an array, but unless you're doing many hundreds of thousands of operations, you're unlikely to notice a difference, because both are fast when limited to thousands of operations. (For examples, see the sample files located in the array_v_vector folder.)

While object pooling has performance benefits, its main benefit is that it makes memory easy to manage. If memory use in your app grows without limit, object pooling can prevent that problem. In general, the technique improves performance and reduces memory use.

I saw a 10% increase in frames per second using pooling and a decrease in memory use of about half when testing a SWF file that contains many objects being garbage collected and recycled on each frame. (Check out the sample files in the folder named pooling_v_gc.)
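A minimal pool might look like the following sketch. The pool size, the use of Shape, and the getShape()/returnShape() names are assumptions for illustration. Unlike the strict approach described above, this version falls back to creating a new object if the pool runs dry, so it keeps working if you underestimate how many objects you need.

```actionscript
import flash.display.Shape;

const POOL_SIZE:int = 100;
var pool:Vector.<Shape> = new Vector.<Shape>();

// At startup: create the objects the app is expected to need.
for (var i:int = POOL_SIZE - 1; i >= 0; i--) {
    pool.push(new Shape());
}

// Take an object from the pool (fallback: create one if the pool is empty).
function getShape():Shape {
    return pool.length > 0 ? pool.pop() : new Shape();
}

// Give an object back instead of leaving it for the garbage collector.
function returnShape(s:Shape):void {
    if (s.parent != null) {
        s.parent.removeChild(s);
    }
    s.graphics.clear(); // reset state before the object is reused
    pool.push(s);
}

// Usage: a short-lived bullet.
var bullet:Shape = getShape();
bullet.graphics.beginFill(0x000000);
bullet.graphics.drawCircle(0, 0, 3);
bullet.graphics.endFill();
addChild(bullet);
// ...later, when the bullet leaves the screen:
returnShape(bullet);
bullet = null;
```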

Reusing objects

Whenever you create objects in a loop, strive to create one object outside the loop and reuse it repeatedly inside the loop. This is not always possible, but there are many situations where the technique helps.

The section that describes blitting includes an example that reuses a number of objects. You can examine that test file to see how that is accomplished.
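Here is a small sketch of the idea, using a Matrix that positions a dot as it is drawn into a BitmapData (the dot, canvas size, and spacing are arbitrary). The wasteful version would call new Matrix() on every iteration; this version creates one Matrix before the loop and resets it each time.

```actionscript
import flash.display.BitmapData;
import flash.display.Shape;
import flash.geom.Matrix;

var dot:Shape = new Shape();
dot.graphics.beginFill(0xFF0000);
dot.graphics.drawCircle(0, 0, 4);
dot.graphics.endFill();

var canvas:BitmapData = new BitmapData(200, 200, true, 0x00000000);

// Created once, outside the loop, then reused on every iteration.
var m:Matrix = new Matrix();

for (var i:int = 9; i >= 0; i--) {
    // Avoid: canvas.draw(dot, new Matrix(1, 0, 0, 1, i * 20 + 10, 100));
    m.identity();
    m.translate(i * 20 + 10, 100);
    canvas.draw(dot, m);
}
```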

Working with sounds

The issue with sounds in relation to memory use is relatively minor. While a sound is playing, it cannot be garbage collected (when testing the file with Flash Professional CS6). When the Sound finishes playing, or when you call stop() on its SoundChannel instance, the Sound becomes eligible for garbage collection. (To learn more, see the sample test files in the folder named sound_test.)
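In code, that means keeping the SoundChannel returned by play() so you can stop the sound when you are done with it. In this sketch, "music.mp3" is a placeholder path and the function names are hypothetical; the sound starts only after it has fully loaded, and the cleanup function also removes the load listeners so nothing fires after disposal.

```actionscript
import flash.events.Event;
import flash.events.IOErrorEvent;
import flash.media.Sound;
import flash.media.SoundChannel;
import flash.net.URLRequest;

var music:Sound = new Sound();
var channel:SoundChannel;

music.addEventListener(Event.COMPLETE, onMusicLoaded);
music.addEventListener(IOErrorEvent.IO_ERROR, onMusicError);
music.load(new URLRequest("music.mp3")); // placeholder path

function onMusicLoaded(e:Event):void {
    channel = music.play(); // may be null if no sound output is available
}

function onMusicError(e:IOErrorEvent):void {
    trace("could not load sound:", e.text);
}

// Call when the sound is no longer needed (for example, when a level ends).
function disposeMusic():void {
    if (channel != null) {
        channel.stop(); // a stopped Sound becomes eligible for garbage collection
        channel = null;
    }
    if (music != null) {
        music.removeEventListener(Event.COMPLETE, onMusicLoaded);
        music.removeEventListener(IOErrorEvent.IO_ERROR, onMusicError);
        music = null;
    }
}
```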

Using Timers

The issue with Timers is more critical. If a Timer has not stopped (because its currentCount is less than its repeatCount, or because its stop() method has not been called), the Timer will not be garbage collected, even if you remove its listener and null all references to it. The Timer's listener function won't be called again once you remove the listener, but the Timer still consumes memory.

A Timer only uses 72 bytes of memory, so this is unlikely to become a noticeable problem in a desktop/browser Flash game. However, if you open, play, and then close a Flash game running on a mobile device repeatedly without ever restarting the game, you may see a noticeable problem.

To see the code, open the files in the folder named gc_timer_test.
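A cleanup routine for a repeating Timer could be sketched like this (spawnTimer and the function names are hypothetical). The key line is the call to stop(): removing the listener and nulling the reference alone would leave a running Timer in memory.

```actionscript
import flash.events.TimerEvent;
import flash.utils.Timer;

// delay 1000 ms, repeatCount 0: the Timer runs until stop() is called.
var spawnTimer:Timer = new Timer(1000, 0);
spawnTimer.addEventListener(TimerEvent.TIMER, onSpawnTick);
spawnTimer.start();

function onSpawnTick(e:TimerEvent):void {
    trace("tick", Timer(e.currentTarget).currentCount);
}

// Stop first, then remove the listener and the reference.
function disposeSpawnTimer():void {
    if (spawnTimer != null) {
        spawnTimer.stop();
        spawnTimer.removeEventListener(TimerEvent.TIMER, onSpawnTick);
        spawnTimer = null;
    }
}
```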

Weak listeners vs. strong listeners

Another unexpected result of testing with the MT class is that it makes no difference whether you use weak or strong listeners. They were both treated like weak listeners in my testing with Flash Professional CS6. (See the test files in the strong_v_weak_listeners folder.)
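For reference, a weak listener is requested through the fifth parameter of addEventListener(), useWeakReference. Given the test results above, the sketch still removes the listener explicitly rather than relying on the weak reference; the button Sprite and the handler names are illustrative.

```actionscript
import flash.display.Sprite;
import flash.events.MouseEvent;

var button:Sprite = new Sprite();
addChild(button);

// Parameters: type, listener, useCapture, priority, useWeakReference.
button.addEventListener(MouseEvent.CLICK, onButtonClick, false, 0, true);

function onButtonClick(e:MouseEvent):void {
    trace("button clicked");
}

// Weak or strong, remove the listener once it is no longer needed.
function disposeButton():void {
    button.removeEventListener(MouseEvent.CLICK, onButtonClick);
    removeChild(button);
    button = null;
}
```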

Managing CPU / GPU usage

Currently, the only way I know to measure this directly is with an operating system tool. Windows includes the Windows Task Manager (Performance tab) and Mac OS provides Activity Monitor. Both tools show CPU usage but, generally, neither is very useful for testing Flash performance.

As a result, you are left measuring CPU/GPU usage indirectly by checking your app's actual frame rate. The MT class enables you to check a project's frame rate, along with memory use reporting and memory tracking.

Working with cacheAsBitmap and cacheAsBitmapMatrix

Enabling the cacheAsBitmap property of a DisplayObject significantly improves performance (at the cost of increased memory use) as long as the DisplayObject does not undergo frequent changes that require the bitmap to be updated. Essentially, this means the DisplayObject should not change its appearance in any way other than its location on the Stage. If the bitmap must be updated frequently, performance will decrease.

How often you can update a cached bitmap and still see a performance benefit depends on several factors, the most important of which is, not surprisingly, the update frequency itself. In any case, use the MT class to test your specific project, both with and without cacheAsBitmap enabled, for DisplayObjects that require bitmap updates. (For DisplayObjects that require no bitmap updates, the decision is a no-brainer: use cacheAsBitmap!)

If you have a DisplayObject (movie clip) and you want to enable its cacheAsBitmap property, add this line: mc.cacheAsBitmap = true;

When publishing for mobile devices, enabling cacheAsBitmap is always beneficial, even when you change the scale, skew, alpha, and/or rotation of a DisplayObject (but not a movie clip's frames).

Specifically, when publishing a project for mobile devices, you can enable cacheAsBitmap and assign the cacheAsBitmapMatrix property of your DisplayObjects to realize a substantial performance boost, like this: mc.cacheAsBitmap = true; mc.cacheAsBitmapMatrix = new Matrix();

You do not have to use the default identity matrix (which is what new Matrix() creates); however, there are only a few reasons to use anything else.
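A self-contained version of those two lines, including the Matrix import, might look like the sketch below. It assumes an AIR for mobile target, since cacheAsBitmapMatrix is an AIR feature aimed at mobile devices; the badge Shape and the transform values are illustrative.

```actionscript
import flash.display.Shape;
import flash.geom.Matrix;

var badge:Shape = new Shape();
badge.graphics.beginFill(0xFFCC00);
badge.graphics.drawRect(-20, -20, 40, 40);
badge.graphics.endFill();
badge.x = badge.y = 100;
addChild(badge);

badge.cacheAsBitmap = true;
// new Matrix() is the identity matrix: cache at the object's own scale.
badge.cacheAsBitmapMatrix = new Matrix();

// With cacheAsBitmapMatrix set, these transforms can reuse the cached
// bitmap instead of re-rendering the vector content.
badge.rotation = 45;
badge.scaleX = badge.scaleY = 1.5;
badge.alpha = 0.8;
```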

Stage blitting

Stage blitting (blitting is short for bit block transfer) uses bitmaps to render the final display. Instead of adding DisplayObjects to the display list, you draw pixels to a stage-sized bitmap that has been added to the Stage. To convey animation, the bitmap's pixels are updated in a loop. Typically, an Event.ENTER_FRAME loop calls the BitmapData class's copyPixels() method on the stage-sized bitmap's bitmapData property, copying pixels from other BitmapData objects created outside the animation loop.

This technique is more complicated than adding objects directly to the display list, but it is much more efficient, often making the difference between unacceptable and excellent frame rates for a Flash app. That said, there is no reason to use this strategy unless you need the increased frame rate.

I compared a SWF file with 10,000 squares moving and rotating across the Stage using movie clips (see the sample file titled blit_test/blit_test_mc.fla). Then I updated the same SWF file with some basic optimization techniques (see the sample file named blit_test/blit_test_basic_optimizations.fla) and stage blitting (see blit_test/blit_test2).

The first SWF file ran at about 15 fps, which is unacceptable. However, a few basic tweaks can easily be applied to improve performance before you embark on harder-to-implement techniques like blitting. First, I reversed the for loops to gain a small performance boost (see the section on loops below) and, more importantly, I used constants instead of recalculating the same values repeatedly. Those changes provided a significant (~40%) speed boost, to an almost acceptable ~21 fps.

Using stage blitting to render the same display yielded 54 fps, an impressive boost to more than 3.5 times the original frame rate.

However, as I previously mentioned, the process of blitting is more complex. The steps involve:

1. Initialize the Stage display bitmap assets (a Bitmap instance, a BitmapData instance, and a Rectangle instance) onto which all the displayed pixels are copied during each Event.ENTER_FRAME loop.
2. Populate a data array with all the data used to update the display. (This step is not always necessary.)
3. Populate an array of BitmapData objects. If you had an animation on a movie clip's timeline, this is where you store a BitmapData object for each frame (for example, by using a sprite sheet). In the sample test file, I used ActionScript to create a BitmapData instance for each angle to which the rectangles can be rotated.
4. Create an Event.ENTER_FRAME loop.
5. In the Event.ENTER_FRAME loop, update the data, then copy the appropriate pixels from the array created in step 3 to the appropriate location (determined using the data array from step 2) of the BitmapData instance created in step 1.
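The following is a compact sketch of those five steps for frame 1 of the main timeline. It is not the code from blit_test2: the square count, the 36 pre-rendered angles, the 15-pixel cell size, and all variable names are assumptions chosen for illustration, and the squares only move horizontally. Note that the Rectangle, Point, and Matrix objects are created once and reused.

```actionscript
import flash.display.Bitmap;
import flash.display.BitmapData;
import flash.display.Shape;
import flash.events.Event;
import flash.geom.Matrix;
import flash.geom.Point;
import flash.geom.Rectangle;

const NUM_SQUARES:int = 1000;
const ANGLES:int = 36; // pre-rendered rotations, 10 degrees apart
const CELL:int = 15;   // holds a 10 x 10 square at any angle (diagonal ~14.1)

// Step 1: a stage-sized canvas onto which every visible pixel is copied.
var canvas:BitmapData = new BitmapData(stage.stageWidth, stage.stageHeight, false, 0xFFFFFF);
var canvasRect:Rectangle = canvas.rect;
addChild(new Bitmap(canvas));

// Step 2: data arrays describing each square.
var xs:Vector.<Number> = new Vector.<Number>(NUM_SQUARES, true);
var ys:Vector.<Number> = new Vector.<Number>(NUM_SQUARES, true);
var angleIndex:Vector.<int> = new Vector.<int>(NUM_SQUARES, true);
for (var i:int = NUM_SQUARES - 1; i >= 0; i--) {
    xs[i] = Math.random() * canvasRect.width;
    ys[i] = Math.random() * canvasRect.height;
    angleIndex[i] = int(Math.random() * ANGLES);
}

// Step 3: one BitmapData per rotation angle, rendered once.
var square:Shape = new Shape();
square.graphics.beginFill(0x3366CC);
square.graphics.drawRect(-5, -5, 10, 10);
square.graphics.endFill();
var frames:Vector.<BitmapData> = new Vector.<BitmapData>(ANGLES, true);
var m:Matrix = new Matrix();
for (var a:int = ANGLES - 1; a >= 0; a--) {
    m.identity();
    m.rotate(a * 2 * Math.PI / ANGLES);
    m.translate(CELL / 2, CELL / 2);
    frames[a] = new BitmapData(CELL, CELL, true, 0x00000000);
    frames[a].draw(square, m);
}

// Reused every frame instead of being re-created.
var srcRect:Rectangle = new Rectangle(0, 0, CELL, CELL);
var dest:Point = new Point();

// Step 4: the Event.ENTER_FRAME loop.
addEventListener(Event.ENTER_FRAME, render);

// Step 5: update the data, then copy pixels into the canvas.
function render(e:Event):void {
    canvas.lock();
    canvas.fillRect(canvasRect, 0xFFFFFF); // erase the previous frame
    for (var j:int = NUM_SQUARES - 1; j >= 0; j--) {
        xs[j] += 2;
        if (xs[j] > canvasRect.width) {
            xs[j] = -CELL;
        }
        angleIndex[j] = (angleIndex[j] + 1) % ANGLES;
        dest.x = xs[j];
        dest.y = ys[j];
        canvas.copyPixels(frames[angleIndex[j]], srcRect, dest, null, null, true);
    }
    canvas.unlock();
}
```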

For more details, review the file in blit_test/blit_test2. It contains extensive comments.

The downside to stage blitting, other than the difficulty of coding it, is that a large amount of memory may be consumed when creating the needed bitmaps. That is a significant factor when creating an app for a device like the iPad, which has a high screen resolution (1024 x 768 for the first and second generation iPad, and 2048 x 1536 for the third generation iPad) and relatively little memory (RAM) (256MB, 512MB, and 1GB for the first, second, and third generation, respectively).

Generally, your game should consume no more than half the available RAM. That includes not just bitmaps but everything else in your game that consumes RAM.
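A little arithmetic shows how quickly bitmaps add up. A BitmapData stores 4 bytes (32-bit ARGB) per pixel, so a single stage-sized canvas on a third-generation iPad needs 12MB before you add any source bitmaps. The helper below is a hypothetical back-of-the-envelope calculation, not a measurement of actual runtime memory.

```actionscript
// Hypothetical helper: BitmapData uses 4 bytes (32-bit ARGB) per pixel.
function bitmapBytes(w:int, h:int):Number {
    return w * h * 4;
}

var ramMB:Number = 1024;         // third-generation iPad (1GB)
var budgetMB:Number = ramMB / 2; // no more than half the available RAM
var canvasMB:Number = bitmapBytes(2048, 1536) / (1024 * 1024);

trace("stage-sized canvas:", canvasMB, "MB of a", budgetMB, "MB budget");
// stage-sized canvas: 12 MB of a 512 MB budget
```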

Partial blitting

As the name implies, partial blitting combines the use of the Flash display list with copying pixels to BitmapData objects. Typically, each object displayed on the Stage is a Bitmap that is added to the display list and manipulated like any other display object, such as a movie clip. Each object's animation frames are blitted to an array of BitmapData objects.

For example, using the previous example of squares rotating and moving across the Stage, I blit the squares at their various rotations, store those BitmapData objects in an array, add Bitmaps to the display list, and then manipulate the Bitmaps just like any display object (like the movie clips described above) in the Event.ENTER_FRAME loop. Finally, in that loop I assign each Bitmap's bitmapData property to the appropriate array element. (To see how this works, review the blit_test/partial_blitting_test.fla file.)

The partial blitting test was not nearly as fast as stage blitting when tested on my PC (24-26 fps). But keep an open mind, because partial blitting may be faster than stage blitting in other situations. In addition, partial blitting is easier to code than stage blitting, so if you can achieve acceptable frame rates with partial blitting, you avoid the additional work that stage blitting requires.
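A stripped-down version of the approach might look like this. The sketch is hypothetical (500 squares, 36 pre-rendered angles, illustrative names) and is not the partial_blitting_test.fla code. Each square is an ordinary Bitmap on the display list; on every frame the loop moves it like any display object and swaps its bitmapData to show the next rotation.

```actionscript
import flash.display.Bitmap;
import flash.display.BitmapData;
import flash.display.Shape;
import flash.events.Event;
import flash.geom.Matrix;

const NUM_SQUARES:int = 500;
const ANGLES:int = 36;
const CELL:int = 15;

// Blit each rotation of the square into its own BitmapData, once.
var square:Shape = new Shape();
square.graphics.beginFill(0x3366CC);
square.graphics.drawRect(-5, -5, 10, 10);
square.graphics.endFill();
var frames:Vector.<BitmapData> = new Vector.<BitmapData>(ANGLES, true);
var m:Matrix = new Matrix();
for (var a:int = ANGLES - 1; a >= 0; a--) {
    m.identity();
    m.rotate(a * 2 * Math.PI / ANGLES);
    m.translate(CELL / 2, CELL / 2);
    frames[a] = new BitmapData(CELL, CELL, true, 0x00000000);
    frames[a].draw(square, m);
}

// Each square is a Bitmap on the display list.
var bitmaps:Vector.<Bitmap> = new Vector.<Bitmap>(NUM_SQUARES, true);
var angleIndex:Vector.<int> = new Vector.<int>(NUM_SQUARES, true);
for (var i:int = NUM_SQUARES - 1; i >= 0; i--) {
    var bmp:Bitmap = new Bitmap(frames[0]);
    bmp.x = Math.random() * stage.stageWidth;
    bmp.y = Math.random() * stage.stageHeight;
    addChild(bmp);
    bitmaps[i] = bmp;
    angleIndex[i] = int(Math.random() * ANGLES);
}

addEventListener(Event.ENTER_FRAME, animate);

function animate(e:Event):void {
    var w:Number = stage.stageWidth;
    for (var j:int = NUM_SQUARES - 1; j >= 0; j--) {
        var b:Bitmap = bitmaps[j];
        b.x += 2; // move it like any other display object
        if (b.x > w) {
            b.x = -CELL;
        }
        angleIndex[j] = (angleIndex[j] + 1) % ANGLES;
        b.bitmapData = frames[angleIndex[j]]; // show the next pre-rendered rotation
    }
}
```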

Working with Event.ENTER_FRAME loops

In my tests, adding several Event.ENTER_FRAME listeners to a single instance, each calling its own listener function, was slightly more efficient than adding one Event.ENTER_FRAME listener whose function then called the other functions.

However, the situation is different when you compare many objects, each with its own Event.ENTER_FRAME listener, to one object with a single Event.ENTER_FRAME listener. Using one object with one listener gave approximately a two-fold performance gain over many objects that each have their own listener. (To review the tests, check out the files in the enterframe_test_one_v_many_loops_with_different_movieclips folder.)
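The faster pattern can be sketched as follows: instead of giving every object its own listener, one listener on the main timeline updates all of them. The 100 balls, their motion, and the function names are illustrative.

```actionscript
import flash.display.Shape;
import flash.events.Event;

var balls:Vector.<Shape> = new Vector.<Shape>();
for (var i:int = 99; i >= 0; i--) {
    var ball:Shape = new Shape();
    ball.graphics.beginFill(0xCC3333);
    ball.graphics.drawCircle(0, 0, 5);
    ball.graphics.endFill();
    ball.y = i * 4;
    addChild(ball);
    balls.push(ball);
}

// Slower pattern: each ball gets its own listener, e.g.
// ball.addEventListener(Event.ENTER_FRAME, moveBall);
// Faster pattern: one listener on one object moves every ball.
addEventListener(Event.ENTER_FRAME, moveAllBalls);

function moveAllBalls(e:Event):void {
    var w:Number = stage.stageWidth;
    for (var j:int = balls.length - 1; j >= 0; j--) {
        var b:Shape = balls[j];
        b.x = (b.x + 3) % w;
    }
}
```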

Understanding the difference between for loops, while loops, and do loops

In Flash, reverse for loops are the fastest executing loops. If a stored list of same-type objects is needed in the loop, a reverse for loop using a Vector to reference the list of objects is the fastest way to go.

All three loops execute faster if you use an int rather than a uint for the loop variable. All three loops also execute faster if you decrement the loop variable rather than increment it. (Note: If you decrement a loop variable i and use i>=0 as the terminal condition, you will trigger an endless loop if i is a uint, because a uint can never be negative.)

All three loops execute faster if you use a variable or constant for the terminal condition rather than an expression or object property. Because the initial condition only needs to be evaluated once (and not with each loop iteration), there is no significant difference whether you use an expression or object property for the initial condition in any of these loops.

Anything that can be moved outside a loop without affecting the result should be moved. That includes object creation: a new constructor call inside a loop can sometimes be moved outside it (see the section about reusing objects). Likewise, if the terminal condition is an expression, evaluate it once before the loop.
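The guidelines above can be compared in a short sketch that sums the same Vector three ways (the data and variable names are arbitrary). All three loops produce the same result; they differ only in how the loop variable and terminal condition are handled.

```actionscript
var values:Vector.<Number> = new Vector.<Number>();
for (var k:int = 0; k < 1000; k++) {
    values.push(k);
}

// Slower: values.length is read again on every iteration.
var sumForward:Number = 0;
for (var i:int = 0; i < values.length; i++) {
    sumForward += values[i];
}

// Better: the terminal value is cached in a variable before the loop.
var n:int = values.length;
var sumCached:Number = 0;
for (var c:int = 0; c < n; c++) {
    sumCached += values[c];
}

// Reverse loop with an int counter, the fastest form in the tests above.
// j must be an int: as a uint, j >= 0 would always be true.
var sumReverse:Number = 0;
for (var j:int = values.length - 1; j >= 0; j--) {
    sumReverse += values[j];
}

trace(sumForward, sumCached, sumReverse); // 499500 499500 499500
```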

I have seen claims that using objects that each reference the next object (a linked list) is faster than using an array to reference the objects. In my tests, I found that statement to be false: an array was both easier and faster to initialize and to use. Using a Vector instead of an array was, of course, even faster. (See the sample test file in the for_loop_v_sequential_loop folder.)

None of these suggestions is likely to make a major difference under most conditions. However, these tweaks are worth implementing if you are trying to squeeze every bit of efficiency out of your coding or if your project involves iterating through a large number of loops.

Disabling mouse interactivity

Movie clips and sprites can interact with the mouse. Even when you do not code for any mouse interactivity, Flash Player checks for mouse interactions when these objects are present. You can save some CPU cycles by disabling mouse interactivity for objects that do not require mouse interactivity.

This strategy is very helpful in situations when you notice a performance problem (or your computer's fans increase speed) when your mouse moves across the Stage. Disabling mouse interactivity improves performance and can quiet your computer fan.

During testing, I saw the frame rate increase by about 2.5 times when I disabled mouse interactivity for all the movie clips in a test file. The sample test code is located in the mouse_interactivity folder.
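In ActionScript this takes two properties: mouseEnabled turns off mouse checks for the object itself, and mouseChildren turns them off for everything the object contains. In the sketch below, the decoration clip and the disableMouse() helper are hypothetical; the helper applies both properties to every child of a container, similar to disabling all the movie clips in a test file.

```actionscript
import flash.display.DisplayObjectContainer;
import flash.display.InteractiveObject;
import flash.display.MovieClip;

var decoration:MovieClip = new MovieClip();
addChild(decoration);

// Disable mouse checks for one object and everything inside it.
decoration.mouseEnabled = false;
decoration.mouseChildren = false;

// Disable mouse checks for every child of a container.
function disableMouse(container:DisplayObjectContainer):void {
    for (var i:int = container.numChildren - 1; i >= 0; i--) {
        var io:InteractiveObject = container.getChildAt(i) as InteractiveObject;
        if (io != null) {
            io.mouseEnabled = false;
            var c:DisplayObjectContainer = io as DisplayObjectContainer;
            if (c != null) {
                c.mouseChildren = false;
            }
        }
    }
}

disableMouse(this);
```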

Removing event listeners

Even though more recent Flash Player versions appear to remove listeners when objects are garbage collected, and strong listeners do not appear to delay garbage collection, you should still explicitly remove all event listeners as soon as possible. The sooner a listener is removed, the fewer CPU cycles it consumes. In addition, you may not know which Flash Player version a user has installed, and older versions of Flash Player may not garbage collect objects, even those with weak listeners. Do not rely on the newer capabilities of Flash Player to compensate for poor coding.
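One common way to remove listeners promptly is to do it when an object leaves the display list, as in this sketch. The enemy Sprite and the function names are hypothetical; Event.REMOVED_FROM_STAGE is dispatched when the object is removed from the Stage, and its handler removes both listeners.

```actionscript
import flash.display.Sprite;
import flash.events.Event;

var enemy:Sprite = new Sprite();
enemy.addEventListener(Event.ENTER_FRAME, moveEnemy);
enemy.addEventListener(Event.REMOVED_FROM_STAGE, cleanUpEnemy);
addChild(enemy);

function moveEnemy(e:Event):void {
    Sprite(e.currentTarget).x += 2;
}

// Remove listeners as soon as the object leaves the display list.
function cleanUpEnemy(e:Event):void {
    var s:Sprite = Sprite(e.currentTarget);
    s.removeEventListener(Event.ENTER_FRAME, moveEnemy);
    s.removeEventListener(Event.REMOVED_FROM_STAGE, cleanUpEnemy);
}

// Dispatches REMOVED_FROM_STAGE, which removes both listeners.
removeChild(enemy);
```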

Working with Stage3D

Stage3D is a GPU-enabled display rendering model that became available with the release of Flash Player 11. This model is especially helpful for 3D rendering, but 2D displays can also benefit from it through frameworks such as Starling.

Because display rendering has typically been handled by the CPU (which also does all the other work needed to run an app), harnessing the GPU for rendering frees the CPU for that other work. This significantly improves performance on devices with capable GPUs.

To view Stage3D content, users must have Flash Player 11 or higher. To use the Stage3D API, you need to publish SWF files that target Flash Player 11 or later. If you are working with Flash Professional CS6, you're all set. If you have Flash Professional CS5 or CS5.5, you can update your installation to enable publishing to Flash Player 11. For more details, see the blog post by Rich Galvan titled Adding Flash Player 11 support to Flash Professional CS5 and CS5.5.

Unfortunately, using the Stage3D API directly is difficult. However, several free public frameworks handle the low-level code needed to use Stage3D and offer easier-to-use APIs.

One of these frameworks, Starling, is designed for developing 2D games. It is easy to use and effectively abstracts the complexity of Stage3D. The Starling API documentation is available on the Starling reference site.

I tested Starling to see how it compared to blitting and partial blitting. In some situations, Starling performed worse than both blitting options. In fact, it performed much worse than the un-optimized 10,000 square movie clip test.

However, deselecting the Permit debugging option in the Starling test more than doubled the frame rate, making the resulting SWF file comparable to the un-optimized 10,000 square movie clip test. That is still disappointing. Part of the problem, though, is that I used the debug version of Flash Player to test the files, and Starling appears to perform much worse in the debug version than in the release version of Flash Player.

In addition, the 10,000 square movie clip test does not show Starling at its best. If you are using many movie clips that each contain a timeline with animation, Starling will almost certainly outperform anything you can build using simple optimizations.

Only blitting provides the performance needed to exceed the benefits of using Stage3D and Starling. But blitting may not be practical because of the memory required to create the needed bitmaps.

The sample test files are located in the starling_test folder.

To use the Starling framework, follow these steps:

1. Download the starling.swc file.
2. Add it to your Flash project's Library path by following these steps:
   1. Choose File > Publish Settings > ActionScript Settings.
   2. Click the Library path tab and then click the Browse to SWC file icon.
   3. In the Open File dialog box that appears, navigate to the starling.swc file on your desktop and select it.
   4. Click Open to add starling.swc to your Library path.
   5. Click OK to close the Advanced ActionScript 3.0 Settings panel and then click OK again to close the Publish Settings.
   6. Save the FLA file and you are ready to use Starling.

If you publish a mobile AIR game that uses Stage3D (which includes games built with frameworks like Starling that use Stage3D), set the Render mode to Direct. If you publish an embedding HTML file, set the Window Mode to Direct in the Publish Settings.
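Once starling.swc is on the Library path, a minimal starting point could look like the following document class (saved as Main.as next to the FLA, with Main entered as the document class). This is a sketch based on the public Starling API: new Starling(rootClass, stage) creates the Starling instance, start() begins rendering, and the root class extends starling.display.Sprite. The Game class and its single blue Quad are placeholders; Game is declared as a file-private helper class after the package block, which ActionScript allows.

```actionscript
package {
    import flash.display.Sprite;
    import starling.core.Starling;

    // Document class for the FLA (Main.as).
    public class Main extends Sprite {
        private var starlingInstance:Starling;

        public function Main() {
            // Starling creates the Game root object once Stage3D is ready.
            starlingInstance = new Starling(Game, stage);
            starlingInstance.start();
        }
    }
}

import starling.display.Quad;
import starling.display.Sprite;

// File-private helper class: the root of the Starling display list.
class Game extends starling.display.Sprite {
    public function Game() {
        var quad:Quad = new Quad(100, 100, 0x3366CC);
        quad.x = 50;
        quad.y = 50;
        addChild(quad);
    }
}
```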

You can learn more about Starling and the Stage3D API on the Adobe Gaming site.

Where to go from here

In addition to the optimization techniques described above, there are two other best practices you can adhere to when developing Flash projects to improve playback:

1. Specify the type of every variable you declare. Typed code runs faster, and the compiler displays more descriptive, helpful information when it encounters errors. Check out the test files in the sample files folder: variables_typed_v_untyped.
2. Rather than using arrays to store data, use Vectors. To see how this works, review the test files in the sample files folder: array_v_vector.
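As a brief illustration of both practices (the variable names are arbitrary): the first pair contrasts an untyped and a typed variable, and the second pair contrasts an Array, whose elements are untyped, with a Vector.<int>, whose elements are all ints.

```actionscript
// Untyped: the compiler cannot check this variable, and the runtime
// must treat it as a generic value (strict mode only warns here).
var untypedScore = 0;

// Typed: the compiler knows this is an int and reports type errors.
var typedScore:int = 0;

// Array elements are untyped; any value can be stored.
var scoresArray:Array = [3, 1, 2];

// Vector elements are all ints.
var scoresVector:Vector.<int> = new <int>[3, 1, 2];

var total:int = 0;
for (var i:int = scoresVector.length - 1; i >= 0; i--) {
    total += scoresVector[i];
}
trace(total); // 6
```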