Monday, August 18, 2008

I'm alive

Yes I am :D ... just in case my inactivity here raised any doubts...and I won't surprised if it did because, despite several reminders from my mentor that a blog update has been pending, I have been putting it off for the time that I have something substantial ( or .. was it my laziness? ). Now that the pencils-down date has arrived, I see no further excuse for postponing it.

Since the last time..
  • I got my disk and CPU info without much use of guru mode code (used the queue_stats tapset.. thought had to create my own copy of it where I could change the default time unit to milliseconds instead of microseconds.
The tapset says..
# qstats.stp: Queue statistics gathering tapset

# -------------------------------------------------------------------------

# The default timing function: microseconds. This function could

# go into a separate file (say, qstats_qs_time.stp), so that a user

# script can override it with another definition.

function qs_time () { return gettimeofday_ms () }

# -------------------------------------------------------------------------

Till that is not done.. I might have to stick to my own copy of queue stats.
  • My renderer module is up. Even though it supports only svg for now, I'll extend it to support other formats very soon.
  • I gather per process CPU statistics which show how much system and user time they take (got this idea from bootprobe).
  • I trace sys_open and gather statistics like which process reads/write to what file etc.
  • I also trace the blockIO (in this case I just provide a way to bring out the blockIO information as gathered by the tapset in XML format).
The idea behind tracing points 3, 4 and 5 is to have as much information as possible at least in text format so that even if it cannot be rendered (will terribly clutter the graph if rendered), we can get as much detail as possible.
As all the above information is timestamped, correlation is very easy.
  • The bootlimn (as it has been named tentatively) installs and uninstalls very cleanly.
  • A jar file is packaged along with the source code. It can be run simply by executing ./ .
  • A build.xml (to be used with ant) is also available.
  • Failure of a part of bootlimn does not crash the entire application. It still tries to give as much output as possible.For example,if one of the XML files cannot be parsed, the others are not affected and neither is the renderer module unless it is *very* critical for the creation of the graph.Even if the XML generated is screwed, the bootlimn still renders till the first occurence of improper entry.
  • The user can specify where to stop by changing the -c option in stpcaller.

Known bugs (Taken care of)):
  • The XML created, sometimes, has negative timestamps. (see update 3)
  • The IOblock parser gives errors at times.(This again is because of the screwed XML) ( see update 1).

What needs more work:
  • All the unique processes are rendered. The user as of now has no control over the degree of detail.
  • The state transitions can be bettered.
  • The CPU wait stats can also be added ( code already present in the stps, just requires slight modification in the XSD and corresponding changes to parser and renderer.. will do it soon)
  • Other formats of images to be supported.( next task at hand after debugging)
  • Header information needs to be added (this will be done soon too)

And anymore that will be suggested when the code is reviewed ( code can be checked out from here ).

Update 1: A temporary workaround is to define the sector, bdev etc fields ( which get some funny values on very rare occasion) as a String type so that just a single instance of screwed up XML does not hinder the parsing of the entire file. Not the best solution but just a temporary workaround as there is no further calculation based on these fields and the only function that uses them is a 'tostring' which converts them to a string anyway.

Update 2: Egads!!! revision 14 is sort of broken.. I am rectifying it.. please don't checkout the code now.

Update 3: The negative timestamps error has been solved ***phew***. The code can now be checked out. Logging has been changed to disk as opposed to in memory (see comments for further details).


Nandeep Mali said...

Nice work. :) Wish I had systemtap here to run it. Any where I can see the sample output? Love these system chart thingies.

You keep it up!

wcohen said...

Very cool project.

I looked through the systemtap portion of the code.

Some of "probe end" have "e=1". That shouldn't be needed.

It might be cleaner to use sprintf() rather than the '.' (concatenate) operation for generating the strings.

For processinfo.stp can have multiple probes with the same body:

probe kernel.function("sys_fork"),

The gettimeofday_ms() may return different values. For a probe it might be better to just take the time once and use that value in the probe:

t= gettimeofday_ms()-starttime

For readwriteinfo.stp:

The logic for determining read/write/read_write doesn't look correct in "sys_open" probe. Shouldn't it be "flags & 3 == 0" (or "== 1" or "== 2")?

Could factor out the code for "file = " out of the if statements of the "sys_open" probe. Similarly output statement could factored out.

In general, try to keep the line length down to 80 columns where possible. Wrapped lines are not so pretty.

satya said...

Thank you very much sir, wcohen :)
I had initially tried without the e=1, but it gave me a 'cannot determine type' error and hence I used that.. But I'll check that again :)

I'll make these improvements now :) and will report back with updates.

Anonymous said...

Any reason on why you choose java to be used? It is a big pain in the ass to get java(free version) working atleast on non-fedora OS'es.

satya said...

:) hi anonymous,
This project was initially meant to extend the existing bootchart to use systemtap, so we thought we'd just have to do the information gathering part from scratch and the rest of the task could be accomplished by making modifications to the existing java code. But later on, we discovered that it turned out to be easier to build from scratch because our parser and renderer modules were heavily dependent on the way the information is stored (but natural :D ) which was very different. As we had already started with java, due to time constraint we had to carry on with it... but yes.. we do have plans to port it to python.. but that is out of GSoC, of course.

satya said...

Revision 14 is broken.. Please don't checkout the code now.. Sorry :(

wcohen said...

You are welcome for the review. Let me know if there are newer versions to look at. I am a willing test subject.

Another question I had was the use of the associative arrays to buffer things until the systemtap script ends. The associative arrays are of limited size and they might run out of room storing all those time stamped entries. Is there a reason that the data is not logged when the data is collected?

satya said...

Yes sir, we wanted to interfere as little as possible with the actual boot process so we aimed at minimizing the disk reads and writes. I had compiled the readwrite.stp with a -D MAXMAPENTRIES option setting it to a higher value.. It worked fine when i had originally posted this blog entry, but yes, after some changes that I made (which did not amount to having extra entries in any way)... i am running out of memory for that and two other stps too.. I am looking into that at the moment.. as I did not understand how cleaning the files up such that no line is longer than 80 cols could result in additional entries

I will update you when I am able to figure that out :) Thanks a lot for offering me help with testing. I will surely let you know when I have a newer version up.

wcohen said...

Ah, trying to avoid the disk writes (the scripts shouldn't be reading anything). Thus, buffer in the associative array and delaying printing output until the probe end is to avoid disk operations. Were there earlier versions where the instrumentation logging output was generated immediately and the perturbation of measurements was observed? It may be simpler to just log the events as they occur (avoids issues with guessing the appropriate -D MAXMAPENTRIES).

Systemtap allows free format of script code. Strings for printf cannot be broken up, so those lines with strings may still be longer than 80 characters, but the remaining arguments for a printf could go on following lines. The 80-column line length was a suggestion to make it easier for people to read with editor windows that are 80 columns wide. This was a cosmetic suggestion, it shouldn't change how the actual script executes.

satya said...

Very sorry for the late reply sir, was desperately trying to correct the negative timestamps before submission. Thankfully, I was able to do so.
I tried logging as and when I probe. There are a few extra entries in readwrite and blockio but the overall distortion does not seem to be much.

The current code at the google code link
is the new revised version consisting of all these changes.

I would be really grateful if you could test it ou and let me know what you feel about it.

Sankarshan Mukhopadhyay said...

By any chance are you using the same blog entry and updating ? Or, is something broken at my end ?

satya said...

No, I used the same blog entry as I had mentioned the bugs here, thought a one line update on the current status of the blog might as well go in there... just realized now that any updated entry comes up on the planets all over again.