Homepage: www.whit537.org           Email: chad@zetaweb.com

Monday, April 14, 2008

Scaling gheat

I gave a lightening talk on gheat at DevHouse Pittsburgh last Thursday. DevHouse was a good time, and it worked! I met Isaac and we had a good conversation about how to scale gheat.

Heatmaps are expensive to generate, which makes them infeasible for applications approaching anything like real time. Most of the cost is in the multiply operation (see the source). This operation is already written in C (it's part of PIL), so what else could we do to optimize this?

We came up with two ideas. First, we could parallelize it. Split up tiles even further, generate mini-heatmaps, and then knit them back together. This is basically carrying Google's tiling strategy even further. You'd have to distribute the parallelization over a number of CPUs to make it worthwhile. And since points just off the edge of a tile affect the tile itself, you end up with this overlap area where you are really doing double work. The smaller the tiles, the larger the overlap. At maximum parallelization you are doing fully twice the work.

The other idea was to reduce the dot resolution. Perhaps multiplying 100 dots looks about the same as multiplying only 10 well-placed dots. And probably, finding those 10 dots is cheaper than multiplying all 100. To make this work, you would need to calibrate your dot resolution to the zoom level. You would also need to normalize across all your data so the heatmap meant the same thing everywhere. For this reason, local changes could potentially ripple out to affect the entire dataset, which would hurt caching (our current optimization strategy). As always, optimizations are not without trade-offs.

Wouldn't it be neat to have real-time heatmaps?

7 comments:

illume said...

Have you tried using pygame?

It has a lot of fast image manipulation routines - many of them much faster than those in PIL.

Here's how to use pygame without it creating a window.
http://www.pygame.org/wiki/HeadlessNoWindowsNeeded


cheers,

whit537 said...

Oooh, good tip, thanks. Off the bat I don't see a multiply operation in pygame.transforms, but maybe average_surfaces could work? Hrmmm ...

illume said...

Hi,

there is blit with a multiply flag.
http://pygame.org/docs/ref/surface.html#Surface.blit

Maybe you can use that... I'm not really sure what the algorithm does.

Do you want to blend two images together multiplying the colors?
eg.
surf1.blit(surface2, (0,0), 0, BLEND_MULT)



cheers,

whit537 said...

I managed to get Pygame installed today, and (though I probably should have profiled it first) I've gone ahead and refactored gheat to use multiple backends. Tomorrow hopefully I'll be able to actually rewrite the heatmap algorithm using Pygame. I'm excited to see how it turns out! Thanks Rene!

illume said...

Cool.

Let me know if you have any issues, or if you need help making it faster.

There's others on the pygame mailing list that could help out too.
http://pygame.org/wiki/info


cu,

Kush Sharma said...

Hi,
I was looking for a heatmap solution and I got reference of gheat from a research paper and I decided to intgrate it with my application.I a java guy though I don't mind working with other open source languages. python is new to me so I am having some problems. first of all I have taken a weeks time to install aspen pygame and gheat. but when I run the example.html I don't see the heatmap tile represented on the google map. Aspen is running without errors but the images which are needed for the heatmap are getting created in the empties directory. I don't know how to work from here onwards. I have started digging the code but to no avail.

plz help

kush

whit537 said...

Kush,

Ooh! A research paper? Is that something you could email to me? chad@zetaweb.com.

As far as getting gheat running, are you on Windows? Could you be more specific about what steps you have taken and what exactly the behavior is that you're seeing? This will help me try to reproduce the problem.

Lastly, could you please post debugging help to the mailing list instead of here?

http://groups.google.com/group/gheat

Thanks!


chad