Search This Blog

Thursday, June 01, 2006

GNUPlot Fun

So, I'm trying to actually graph the full data sets. I believe I mentioned previously that I liked GNUPlot better than some of the others. However, it ended up being obscenely slow - it took 66 minutes to graph the realloc data set, which is about 60 megs large, and contains about 3.3 million data points (don't recall if I mentioned that). This absolutely confounded me. I couldn't conceive of any method that could possibly be that slow. I wrote a little program that used STL streams and extraction to see how long it would take to parse the file myself (keep in mind that STL streams are horribly slow): 2 minutes and 15 seconds.

So, I tried repeating the graph, telling it to graph every 20th data point, to see if the graph quality would be acceptable, while rendering in a reasonable amount of time (66 minutes / 20 = 3 minutes and 18 seconds, right?), using the handy stopwatch on my cell phone. This took 15 seconds. After a brief "OMGWTFBBQ" moment, the solution came to me: it was executing in exponential time. Gathering a few more times (1/17, 1/15, 1/12, 1/10, 1/8, and 1/5) confirmed this hypothesis: it was executing in O(N^2) time.



I can think of one "good" reason why it might be doing that: it's probably storing the data in a vector - a dynamically resizing array. Every time it gets resized, it has to copy the entire data set (which is almost 500 megs, by the end). This would produce the observed O(N^2) time. It also explains an odd behavior I observed while watching the GNUPlot execute in task manager. While loading the file, the memory usage increases over time. But the physical memory usage was ping-ponging all over the freakin place, varying at times by hundreds of megs (up and down) from second to second. I couldn't figure that one out, either; it was way too fast to be due to paging (my first thought). But it makes perfect sense if it's continually reallocating its buffer, copying the contents, and then freeing the old buffer.

*sigh* It seems like lately every program I've used I've needed to debug some catastrophic flaw in it. I have enough to do with debugging my own programs; you people need to leave me alone :P

1 comment:

Anonymous said...

That's pretty insane... Nice find! I wonder if they even tried using such large data sets?

I just downloaded the source, but I after looking at alloc.c, I don't think I want to look at the rest :|