11.Jan by Nikolai

This article is about enhancing the performance of iPhone applications using the power of Objective C++. By discussing a real-world problem from Savoy’s Spots application, the article shows the necessary optimizations to make the program run smoothly in three steps.

Spots

Using Objective C++ to enhance the performance of iPhone applications.

One thing I really like about software development is, that there are problems from so many different domains—and problem solving is, when the fun part starts. While developing Spots, I have been confronted with UI decisions over and over. Designing a streamlined user interface for the iPhone is not easy. »Shall I include a button for the provider filter or put it into the Settings.app?« I changed my mind on this about twenty times.

This post will not be about user interface but about coding and performance. One of the hardest problems to solve was the drawing of the map (the rectangle in the lower part of the screen). It took me several approaches to learn that the only way to draw 230,000 spots on the map was using OpenGL and a lot of tricks and cheats to make drawing performance acceptable. At last I used three different techniques depending on the zoom level and amount of visible spots. Performance was the most important issue, since I wanted map interaction (dragging and pinching) to be as smooth as possible, ideally at 60Hz.

map_spots

In this article, I want to share some of the techniques used to improve performance. They are very simple and straight forward but I took the time to write them down because I see them rarely used in a Cocoa context. For example, I’m not seeing a lot of C++ in Cocoa, even though both languages can be mixed in a very productive way. Objective C++ is a great extension to pure Cocoa when your aim is performance. And on the iPhone, everything is about performance.

The Problem

Spots contains about 230,000 hotspot locations, spread over a rectangular area (the world map in mercator projection). It is necessary to count the number of visible spots in the area covered by the current map rectangle, since this value decides which drawing technique to use. The coordinates are normalized, so the whole world covers the rectangle {{0.0, 1.0}, {0.0, 1.0}}. For a typical use of the program, the map rectangle is quite small: less than two percent of the whole map width or height.

map

How do I count the number of locations in a database of hundreds of thousands of locations efficiently?

I prepared an XCode project that contains all code discussed here. The example project does not contain the original database that comes with Spots. It simply spreads out the spots randomly over the world map, which is very different from the real scenario. But the stats I collected with the original data are very much the same like those from the random data.

The first Approach: Plain-vanilla Cocoa

The first approach is very simple and does not even try to be very efficient. The locations are represented as Objective C objects (@class Spot) and are kept in a database object (@class SpotsDB1) using an NSArray.

@interface Spot : NSObject {
    CGPoint _position;
}
@property (nonatomic) CGPoint position;
@end

@interface SpotsDB1 : NSObject {
    NSArray* _spots;
}
- (NSUInteger)countSpotsInRect:(CGRect)rect;
@end

To count the locations in a given CGRect, the database object simply iterates over the array and checks each location.

NSUInteger count = 0;
for (Spot* spot in _spots) {
    if (CGRectContainsPoint(aRect, spot.position))
        ++count;
}

With this code it takes about 250 milliseconds to count the number of spots in a {0.02, 0.02} random rectangle from a total of 230,000. Too bad, if you target a frame rate of 60Hz, where you have 16 ms to draw a frame.

What makes this solution slow is not only the lack of a smarter algorithm, but also the amount of time wasted to make a couple of function calls in the tight loop. Another negative impact on the performance is the fact, that the locations are stored in objects in an NSArray, since the large memory footprint of objects and the pointer indirection of the array all add to processing time.

The second Approach: Simple optimizations

To make the same solution run much faster, we can simply put the locations into a more compact data structure: a standard C array of CGPoint structs. For ease of use, we keep that C array in an NSMutableData object.

_spotCount = [spots count];
_spotsData = [[NSMutableData alloc] init];
for (Spot* spot in spots) {
    CGPoint p = spot.position;
    [data appendBytes:&p length:sizeof(CGPoint)];
}

Removing the function calls from the loop we get a times four performance boost:

CGFloat xMin = CGRectGetMinX(aRect);
CGFloat yMin = CGRectGetMinY(aRect);
CGFloat xMax = CGRectGetMaxX(aRect);
CGFloat yMax = CGRectGetMaxY(aRect);

CGPoint* begin = (CGPoint*)[_spotsData bytes];
CGPoint* end = begin + _spotCount;

NSUInteger count = 0;
for (CGPoint* i = begin; i != end; ++i) {
    if (i->x > xMin && i->y > yMin && i->x < xMax && i->y < yMax)
        ++count;
}

Without changing the simple algorithm, we now calculate the spot count in 55 milliseconds, which is more than four times faster. What’s even better, regarding the very constrained resources of the iPhone, is the fact, that we use less than half of the memory compared to the first version.

The third Approach: Algorithmic optimizations

Of course there are lots of clever ways to optimize lookup of points in a planar map. For example, you could use a quad tree to store the points. In this example I’m using a less complex solution, which does not imply a change to the way, the data is stored. And I’m using built-in features of C++ so that I don’t even have to write the small piece of code that gets a little more complex.

sorted

The idea is to sort the locations in the array from left to right. By doing that, it is possible to determine the left and right margins of the given rectangle very quickly (see below). When the margins in the array are known, only the points between the margins have to be checked.

sorted2

So it’s all about finding the margins quickly. We do this by using binary search, which works on an array of sorted values. It does not check every value to find the matching position but jumps in the middle, checks if the value is less or greater, then jumps in the middle of the remaining section and so on. By using this cutting-in-half algorithm, binary search takes log2(n) checks to find the best element. With our array of 230,000 locations, that’s only 22 elements to try!

Back to the code, we find how easy it is to implement the above algorithm. Of course, now we have to sort our database:

// comparison function for array sorting
NSInteger leftToRight(Spot* a, Spot* b, void* context) {
    CGFloat xa = a.position.x;
    CGFloat xb = b.position.x;
    if (xa < xb)
        return NSOrderedAscending;
    return xa > xb ? NSOrderedDescending : NSOrderedSame;
}

...

spots = [spots sortedArrayUsingFunction:leftToRight context:NULL];

Now the interesting part: binary search using C++. We use the library function std::lower_bound from #include <algorithm>. It takes a starting iterator, an end iterator, the value to look for and a comparison function. It returns the first value for which the comparison function returns false. Start and end iterators can be simple C pointers into the array.

#include <algorithm>
...

bool cmpX(const CGPoint& a, const CGPoint& b) {
    return a.x < b.x;
}
...

CGFloat yMin = CGRectGetMinY(aRect);
CGFloat yMax = CGRectGetMaxY(aRect);

CGPoint* begin = (CGPoint*)[_spotsData bytes];
CGPoint* end = begin + _spotCount;

CGPoint leftMargin = aRect.origin;
CGPoint rightMargin;
rightMargin.x = CGRectGetMaxX(aRect);

CGPoint* left  = std::lower_bound(begin, end, leftMargin, cmpX);
CGPoint* right = std::lower_bound(left, end, rightMargin, cmpX);

NSUInteger count = 0;
for (CGPoint* i = left; i != right; ++i) {
    if (i->y > yMin && i->y < yMax)
        ++count;
}

Note, that the second binary search, the one that finds the right margin, only starts searching from the left margin, not the whole array. Another optimization was to remove the X-axis checks from the loop, since that’s already done by determining the margins.

With these optimizations, the code takes only about one millisecond to count the points in the rectangle. That’s two hundred times faster than the first version and fast enough for the intended use.

Results

Here’s the output of the example project, running on three different devices. If you want run the code on your own device, keep in mind that you have to change the Code Signing Identity in the target settings. If you don’t use a universal application identifier in your profile, you also have to change the Bundle Identifier in the Info.plist file from »com.savoysoftware.spotsDB« to whatever your identifier is.

iPhone 3G:
SpotsDB1 needed 3.2 seconds to count spots in 13 rects
that's 244.423 ms per rect
SpotsDB2 needed 3.0 seconds to count spots in 55 rects
that's 55.230 ms per rect
SpotsDB3 needed 3.0 seconds to count spots in 2988 rects
that's 1.004 ms per rect
Second generation iPod Touch:
SpotsDB1 needed 3.1 seconds to count spots in 16 rects
that's 194.756 ms per rect
SpotsDB2 needed 3.0 seconds to count spots in 65 rects
that's 46.450 ms per rect
SpotsDB3 needed 3.0 seconds to count spots in 3399 rects
that's 0.883 ms per rect
Simulator running on a Mac Pro:
SpotsDB1 needed 3.0 seconds to count spots in 687 rects
that's 4.372 ms per rect
SpotsDB2 needed 3.0 seconds to count spots in 2687 rects
that's 1.117 ms per rect
SpotsDB3 needed 3.0 seconds to count spots in 187276 rects
that's 0.016 ms per rect

Conclusion

When looking at the results it gets obvious that there’s a huge difference in performance between a desktop computer and the iPhone. Translating the numbers to a rule of thumb you could say: »What takes one second on a Mac takes one minute on the iPhone«. Or, what sounds even worse to me: »A frame rate of 60Hz on a Mac means one update per second on the iPhone«.

There’s obviously a big need for optimized code on the iPhone. In this article, I showed a couple of ways to reduce the overhead of Objective C and how to maximize performance using C++. Of course it’s not always so easy to get at the bottlenecks and simply switch some data structures to make the code fly. But when designing the architecture of data intensive applications, it’s absolutely worth looking at non-Cocoa ways to help coping with the restrictions of mobile devices.

31 Comments

Thanks, nice article. So what’s the C++ story in general on the iPhone? Do you get full-blown C++? Zero cost exceptions, multiple inheritance, RTTI and the like? Are there any special hurdles to clear, like weird Xcode linker flags?

by Wolf Rentzsch, Jan 19th, 2009, 23:45  

@Wolf:

The iPhone’s C++ is in every way comparable to the desktop gcc version. There’s no need to set up any compiler flags or hidden XCode settings, just name your files foo.mm or bar.cpp and everything works as expected.

As far as I’ve been trying, every bit of C++ is working: The standard library loads just fine, the iostream part works (std::cout < < "Hello, World!" << std::endl). STL containers are all there and type safe, as they used to be. Multiple inheritance and dynamic_cast<>() work as well. All complex template magic works fine.

C++ experts will inevitably ask:”What about boost?” Since I’m no bjam expert, I did not try to compile the libraries yet. But I’ve been using the header-only parts with success (I cannot live without smart_ptr.hpp).

Exceptions are working on the iPhone but behave a little different than on the desktop. In Mac OS, @try ist slow but @catch is quite fast. In C++ it’s the other way round: try is zero cost while throw is very expensive. At the last WWDC an apple engineer explained that the objc way was a bad design idea, since trying is common and catching (or throwing) is seldom. When they invented Objective C 2.0, they switched to the C++ exception model, but, for compatibility, only in the 64bit part.
On the iPhone there are no compatibility issues, so they did it right from the beginning: using the C++ exception model everywhere in objective C. This leads to a slight difference in performance between the platforms: @try’ing is zero cost on the iPhone while expensive on the simulator, @catch’ing is the opposite.

by Nikolai, Jan 20th, 2009, 00:04  

iPhone exceptions are compatible between Objective-C and C++, like 64-bit Mac. But iPhone exceptions are not zero-cost; they are implemented with setjmp/longjmp in both Objective-C and C++.

(Why? Space: exception unwind tables are big, and iPhone storage is limited. And engineering time: that’s how the ARM gcc implementation worked at the time.)

by Greg Parker, Jan 20th, 2009, 00:28  

Thanks for writing this up!

You mention:
“What takes one second on a Mac takes one minute on the iPhone”.

That’s high, but in line with some other measurements that have been published. I’m guessing that memory access speed is the main culprit for the difference between the 60x seen here and the 6x the clock cycle time suggests (there are other architectural effects, likely, but I’d expect it to make up the smaller portion of the 10x that remains). Do you know how you cache is behaving during your measurements?

And yes, I think using the C++ facilities is great. I also regularly use C99 extensions available in the GNU C++ implementation for convenience; especially compound literals (e.g., (CGRect){ orig, size }).

by David V., Jan 20th, 2009, 01:54  

[...] and Performance by admin on January 19th, 2009 Here’s an interesting blog post about using ObjC++ to improve performance in iPhone apps. I may be in a minority on this, but [...]

by iWyre, Jan 20th, 2009, 02:28  

Memory access speed is a gating factor on the device. While the CPU and GPU are comparable or better than the original iMac, the CPU cache is tiny, and the main memory speed is ~50% of the original iMac.

That means that data optimization that reduce the amount of memory thrashing tend to be very good. It also means you really need to profile the app, since sometimes specializing a bit of code through templates is a speed up, and sometimes doing it too much just results in a larger codebase that ruins your instruction cache.

In other words, no sacred cows. Tthe same solution does not work in all cases, what is most important is to pay attention and be flexible.

by Louis Gerbarg, Jan 20th, 2009, 02:33  

It’s nice to see an article about optimization on the iPhone platform, but I wouldn’t think of the iPhone as slow. It’s actually a very powerful mobile device. Think of it more like a top of the line Macintosh, from 10 years ago, with a more feature-filled version of OpenGL, running in the palm of your hand.

A good strategy for optimizing your performance is running it through Instruments first, and seeing where your overall bottlenecks are, and start with the lowest hanging fruit.

One area the iPhone _is_ slow at is software rendering, which much of the Cocoa/Quartz drawing calls use, but the PowerVR graphics chip can push a lot of textured polygons per cycle — John Carmack is porting Quake 3 to the phone, and there are number of powerful graphical things that you can do with the hardware that will run > 60fps on both your Mac Pro, and your iDevice.

by Mark Johns, Jan 20th, 2009, 02:36  

What a strange world we live in when C++ is being used to “optimize”.

Coming from a C background I’d suggest just using bsearch(), since you don’t have to monkey around with .m vs. .mm files/the nightmare that is the Obj-C++ compiler.

I also wouldn’t be the least bit surprised if bsearch is faster or more efficient than a C++ implementation (C++ sucks my balls).

by Eric, Jan 20th, 2009, 02:51  

I don’t think this says anything about C++, as I believe you went from “slow ObjC implementation” to “slow C implementation” to “slightly optimised implementation that just happened to be in C++”.

You can definitely achieve exactly the same kind of results in C if you were to implement the binary tree bounding algorithm. This article only highlights that a better algorithm does a better job.

by steve, Jan 20th, 2009, 03:27  

Excellent article..

If there was a job where I could spend all day just optimizing other people’s code I’d be happy as a pig in s&*t.

I think it’s the most fun you can have with your clothes on, or not falling out of a plane…:)

by scratt, Jan 20th, 2009, 04:06  

I was looking at a similar problem of finding points on a grid and came across http://en.wikipedia.org/wiki/Geohash which would allow you to just do a range lookup on the bit prefix based on the precision that you need for your display.

by Charlie, Jan 20th, 2009, 05:13  

Hmm, I like my IPhone just the way it is!

RT
http://www.anonweb.pro.tc

by John Weston, Jan 20th, 2009, 05:20  

[...] from Savoy Software wrote a very interesting article about performance improvement on the [...]

This is kind of an aside from the point of the article, but I wonder why you don’t maintain a reverse lookup of location (to the smallest resolution you’re likely to need at a given zoom) to points? You can often solve speed problems by throwing memory at the issue, and the iPhone has got plenty of that, compared to traditional embedded applications.

by Todd, Jan 20th, 2009, 08:32  

Nice article, and some useful info there.

Minor point on writing style: you might want to consider using less commas. It unnecessarily interrupts the flow on occasion. E.g.:

One thing I really like about software development is, that there are problems from so many different domains—and problem solving is, when the fun part starts.

by Matt, Jan 20th, 2009, 09:32  

Have you tried using the THUMB compiler option

by Nemo, Jan 20th, 2009, 10:31  

@Eric:
In this case, C++’s standard library (lower_bound) buys you quite a bit of performance over C’s standard library (bsearch). The key difference is that the comparison call in C++ is inlined, whereas in C it’s indirect.

by David V., Jan 20th, 2009, 19:32  

@nemo Compiling this sort of code for Thumb would be a pessimization. Thumb code generation calls out to library functions for floating-point operations, even on VFP2 chips like the one in the iPhone. The Thumb code will be smaller and so that’s the default in Xcode.

by alexr, Jan 20th, 2009, 23:19  

When you evaluated your binary search algorithm, did you consider R-Tree ( http://delicious.com/search?p=r-tree in particular http://www.win.tue.nl/~mdberg/Papers/prtree.pdf ) — it seems to be optimized for spacial data. It sounds like your dataset is small enough that you don’t need it, but for the next order of magnitude larger R-Tree might be a better solution.

by Christopher Allen, Jan 21st, 2009, 01:11  

1. C++0x will make the STL actually palatable for regular use. I’m looking forward to it :-)

2. One nice thing about that Mac Pro is that it runs dtrace.

by Lally Singh, Jan 21st, 2009, 05:11  

[...] c++ on the iphone (tags: c++ optimization) This entry was posted on Thursday, January 22nd, 2009 at 12:08 am and is filed under Uncategorized. You can follow any responses to this entry through the RSS 2.0 feed. You can leave a response, or trackback from your own site. « 2008 Video Game Summary [...]

My understanding is that the iPhone and Mac desktop tools are not identical – some Objective C 2.0 features (such as, sensibly, garbage collection) are not supported on the iPhone, and the iPhone’s Cocoa depends more upon formal required protocols than desktop Cocoa.

by Todd, Jan 22nd, 2009, 23:52  

Thanks!…

I hope you can continue to build a sounds website here, I’ve quite loved the experience of reading your blog so far….

by Vedic math guy, Nov 22nd, 2010, 09:23  

Nice Post…

[...]I saw this really good post today. Just linked back to it from my site. Thanks![...]…

by river blindness, Dec 16th, 2010, 18:47  
03.Feb by Jan

We are happy to announce that Liquid Scale is available in the App Store. You can get more information on our product site.

20.Jan by Nikolai

Today is a good day to buy software. Indie developers are donating their sales to Haiti. Savoy Software will donate sales to Doctors without Borders.

»You get great software, Haiti gets financial help in its time of crisis.«

13.Nov by Jan

Content aware image resizing for the iPhone.

15.Apr by Jan

We plan to bring you some tasty new things in the near future. But for now we like to whet your appetite with this little movie we made for the Great Indie Bake Off in January. Unfortunately we missed the deadline for submission.
Turn on your volume, and… Buon appetito!

31.Mar by Nikolai

We are happy to announce the release of Spots 1.1 today. We added 13,000 free hotspots in the US and 2,500 T-Mobile US hotspots. All locations have been updated so that Spots now contains a total of 250,000+ hotspots worldwide.
Spots currently contains hotspot locations of the following providers:

Free US locations: 13,000+ (NEW)
T-Mobile US: 2,500 (NEW)
T-Mobile [...]

04.Jan by Jan

In a series of articles starting next week we will give you some technical details about the technologies under the hood of Spots, our new iPhone application. The first article My iPhone is not a Mac Pro will discuss how we tweaked the map rendering code of Spots in order to display over 230,000 hotspots [...]

24.Dec by Jan

Today, Spots has been released in the App Store. Please check out the screenshots and a video on our product website. [...]