I’ve always worked to make ClearTrace perform well. That’s probably because I spend so much time watching it work. I’m often going through two or three gigabytes of trace files but I rarely get the chance to run it on a really large set of files.
One of my clients wanted to run a full trace for a week and then analyze the results. At the end of that week we had 847 trace files of 200MB each, for a total of nearly 170GB.
I regularly use 200MB trace files when I monitor production systems. I usually get around 300,000 statements in a file that size if it’s mostly stored procedures, so those 847 trace files contained roughly 250 million statements. (That’s 730 bytes per statement if you’re keeping track. Newer trace files have some compression in them but I’m not exactly sure what they’re doing.) On a system running 1,000 statements per second, I get a new file every five minutes or so.
It took 27 hours to process these files on an older development box. That works out to 1.77MB/second, or about 2,654 statements per second. You can query the data while it’s loading, but I’ve found it works better to use a second instance of ClearTrace for the queries. I’m not sure why yet, but I think there’s still some dependency between loading and querying within a single instance.
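If you want to redo this arithmetic for your own trace set, here is a minimal Python sketch. It is not part of ClearTrace. The function names are made up for illustration, and it uses the rounded inputs from this post (847 files, 200MB each, 300,000 statements per file, 27 hours), so its results land close to the figures above rather than exactly on them.

```python
def throughput(file_count, file_size_mb, statements_per_file, hours):
    """Return (total_mb, total_statements, mb_per_second, statements_per_second)."""
    seconds = hours * 3600
    total_mb = file_count * file_size_mb
    total_statements = file_count * statements_per_file
    return total_mb, total_statements, total_mb / seconds, total_statements / seconds


def seconds_per_file(statements_per_file, statements_per_second):
    """How long a server takes to fill one trace file at a steady statement rate."""
    return statements_per_file / statements_per_second


# Rounded inputs taken from the post; swap in your own numbers.
total_mb, total_statements, mb_per_s, stmts_per_s = throughput(847, 200, 300_000, 27)
rollover = seconds_per_file(300_000, 1_000)

print(f"total size:       {total_mb:,} MB")
print(f"total statements: {total_statements:,}")
print(f"throughput:       {mb_per_s:.2f} MB/s, {stmts_per_s:,.0f} statements/s")
print(f"new file every:   {rollover / 60:.0f} minutes")
```

With these rounded inputs the sketch gives roughly 1.74MB/second and about 2,600 statements per second, in the same ballpark as the numbers above.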
ClearTrace is almost always CPU bound. It’s really just a huge, ugly collection of regular expressions. It only writes a summary to its database at the end of each trace file so that usually isn’t a bottleneck. At the end of this process, the executable was using roughly 435MB of RAM. Certainly more than when it started but I think that’s acceptable.
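To make the "regular expressions plus one summary per file" idea concrete, here is a rough Python sketch of the general pattern. It is not ClearTrace’s actual code. The three patterns, the `normalize` and `summarize_file` names, and the `(sql, cpu_ms, reads)` event shape are all assumptions made up for illustration.

```python
import re
from collections import defaultdict

# Hypothetical normalization rules; the real expressions are far more numerous.
_RULES = [
    (re.compile(r"'(?:[^']|'')*'"), "'?'"),    # string literals, including '' escapes
    (re.compile(r"\b\d+(?:\.\d+)?\b"), "?"),   # standalone numeric literals
    (re.compile(r"\s+"), " "),                 # collapse runs of whitespace
]


def normalize(sql):
    """Strip literal values so statements that differ only in parameters match."""
    for pattern, replacement in _RULES:
        sql = pattern.sub(replacement, sql)
    return sql.strip().upper()


def summarize_file(events):
    """Aggregate one file's events in memory; return one row per normalized statement."""
    totals = defaultdict(lambda: {"count": 0, "cpu_ms": 0, "reads": 0})
    for sql, cpu_ms, reads in events:
        row = totals[normalize(sql)]
        row["count"] += 1
        row["cpu_ms"] += cpu_ms
        row["reads"] += reads
    return dict(totals)


# Made-up events standing in for the contents of one trace file.
sample_events = [
    ("EXEC GetOrder @id = 42, @name = 'Bob'", 3, 120),
    ("exec GetOrder @id = 7,  @name = 'O''Brien'", 5, 98),
    ("SELECT * FROM Orders WHERE Total > 19.99", 11, 4500),
]
summary = summarize_file(sample_events)
for statement, row in summary.items():
    print(statement, row)
```

All of the per-statement work here is regex matching and dictionary updates in memory, which is why a design like this tends to be CPU bound. The only output is a handful of summary rows per file, and those could be written to a database in a single batch once the file is finished.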
The database where all this is stored started out at 100MB. After processing 170GB of trace files the database had grown to 203MB. The space savings are due to the “datawarehouse-ish” design and only storing a summary of each trace file.