Friday, July 4, 2008

Mid term evals!! eeks!

Quoting from my previous post "Now I just need to extend this structure to include the CPU and Disk info."

Well.. it turned out to be anything but just.

For one, since I am not supposed to use /proc files, passing those values to my systemtap script as command-line arguments is ruled out.
So we decided to look into the kernel code and see how those entries are filled in the first place, so that we could tap the information at the source.

From there, I figured out that the files that particularly interest me are:
1. fs/proc/proc_misc.c
2. fs/proc/array.c
3. block/genhd.c

Then I mailed the systemtap mailing list and they pointed me to this. It is a very interesting tapset, but the following bits left me a little unsure about using it:

Note that blktrace needs to be running in order for these scripts to
have any effect - __blk_add_trace() and therefore the tapset probe
isn't actually called unless tracing is active. Of course, that means
that you need to enable CONFIG_BLK_DEV_IO_TRACE in the kernel.



Things works fine this way, if you're careful - for some reason, if you
don't define overrides for *all* the callback functions, *none* of them
get called. The same thing is apparently true wrt optimization - if any
one of the callback functions gets optimized out, they all do. So in
your script, you need to define handler functions for every event type
whether you use it or not, and furthermore the bodies of the unused
handlers need to contain code that won't be optimized out


Installing blktrace sounded like additional baggage to us. Also, I haven't tried blktrace yet, so I do not know how to use it or whether it would serve my purpose. I could of course play around with it for some time, but mid-term evaluations start on the 7th, so I probed sys_read and sys_write to count the total reads and writes, and then used guru mode to find the number of I/O tasks pending at the moment.
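The read/write counting part can be sketched roughly as below. This is only a minimal illustration, not the actual Bootchart script: it assumes the syscall tapset aliases syscall.read and syscall.write as stand-ins for direct probes on sys_read and sys_write, and the one-second reporting interval is an arbitrary choice. It does not cover the guru-mode part that reads the pending I/O count.

```systemtap
# Assumption: the syscall tapset aliases syscall.read and syscall.write
# are used here in place of direct probes on sys_read and sys_write.
global reads, writes

# Count every read and write system call, system-wide.
probe syscall.read  { reads++ }
probe syscall.write { writes++ }

# Print the running totals once a second (interval chosen arbitrarily).
probe timer.s(1) {
    printf("reads=%d writes=%d\n", reads, writes)
}
```

Since this part uses no embedded C, it should not need guru mode.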

Yeah, I know guru mode should be used only when there is no other alternative. I couldn't see any. Any suggestions would be very welcome.

Thankfully, it works and I have all the diskstats that I require. So far so good.

But CPU information is turning out to be trickier. The variables that I require are not accessible.

Running stap -p2 -e 'probe kernel.function("do_task_stat") {$foo}' -u (a trick I learnt from fche)
lists the alternatives as: task buffer whole vsize eip esp wchan priority nice tty_pgrp tty_nr sigign sigcatch state res ppid pgid sid num_threads mm start_time cmin_flt cmaj_flt min_flt maj_flt cutime cstime utime stime cgtime gtime rsslim tcomm flags ns

But the variables utime and stime are defined within the function, and
stap -p2 -e 'probe kernel.function("do_task_stat") {$utime}' -u
gives semantic error: not accessible at this address: identifier '$utime' at <input>:1:40

I guess the only variables accessible are the parameters that are passed to the function and the return value, as the probe is placed at the start of the function definition. The variables I need have their values worked out inside the function and get printed via seq_printf from there itself, so these values are not even returned.

I cannot use the kernel.statement construct either, as absolute addresses may differ between computers.

As of now .. I am trying a very very ugly solution..

Edit 1: License stuff :D
This program is free software; you can redistribute it and/or modify
it under the terms of the GNU General Public License as published by
the Free Software Foundation; version 2 of the License.

This program is distributed in the hope that it will be useful,
but WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
GNU General Public License for more details.


%{
#include <linux/kernel_stat.h>
#include <linux/sched.h>
%}

function get_state_info:long(val:long)
%{
    struct task_struct *temp = (struct task_struct *)(long)THIS->val;
    /* Mask the run-state bits (0, 1, 2, 4, 8) and fold in exit_state. */
    long x = (long)((temp->state & (0 | 1 | 2 | 4 | 8)) | temp->exit_state);
    THIS->__retvalue = x;
%}

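function get_u_info_1:long(val:long)
%{
    struct task_struct *temp = (struct task_struct *)(long)THIS->val;
    cputime_t ut = cputime_zero;    /* start from zero before accumulating */
    long usr;
    struct signal_struct *si = temp->signal;
    struct task_struct *t = temp;
    /* Sum the user time of every thread in the group. */
    do {
        ut = cputime_add(ut, t->utime);
        t = next_thread(t);
    } while (t != temp);
    /* Add the time accumulated by threads that have already exited. */
    ut = cputime_add(ut, si->utime);
    usr = (long) ut;
    THIS->__retvalue = usr;
%}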


function get_u_info_2:long(val:long)
%{
    struct task_struct *temp = (struct task_struct *)(long)THIS->val;
    cputime_t ut;
    long usr;
    ut = temp->utime;
    usr = (long) ut;
    THIS->__retvalue = usr;
%}

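function get_s_info_1:long(val:long)
%{
    struct task_struct *temp = (struct task_struct *)(long)THIS->val;
    cputime_t st = cputime_zero;    /* was undeclared; start from zero */
    long sys;
    struct signal_struct *si = temp->signal;
    struct task_struct *t = temp;
    /* Sum the system time of every thread in the group. */
    do {
        st = cputime_add(st, t->stime);
        t = next_thread(t);
    } while (t != temp);
    /* Add the time accumulated by threads that have already exited. */
    st = cputime_add(st, si->stime);
    sys = (long) st;
    THIS->__retvalue = sys;
%}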

function get_s_info_2:long(val:long)
%{
    struct task_struct *temp = (struct task_struct *)(long)THIS->val;
    cputime_t st;
    long sys;
    st = temp->stime;
    sys = (long) st;
    THIS->__retvalue = sys;
%}

probe kernel.function("do_task_stat")
{
    if ($whole)
    {
        u = get_u_info_1($task)
        s = get_s_info_1($task)
        state = get_state_info($task)
    }
    else
    {
        u = get_u_info_2($task)
        s = get_s_info_2($task)
        state = get_state_info($task)
    }
}


Of course, the above code still needs to be debugged before it compiles: function get_state_info:long works, but the rest need some more work (probably because of the calls to other inline functions, next_thread() and cputime_add(), within the function). The reason I am putting up the unfinished code is that I need to know whether it's worth spending time debugging it. Is it the right way to proceed?

What I am basically trying to do is take the values of task and whole (they are passed as parameters to the function do_task_stat and hence are available). From there I calculate the variables I need the same way the original function does. *Very ugly*, but I am at a loss for ideas (for now at least).

Anyway, I'll just work on this today and tomorrow. If I am unable to find a better way, I guess I'll move on to the cleaning up tasks so that my Bootchart is perfectly ready with the process and disk info at least.

Then I can come back to this after mid-term evaluations.

3 comments:

Frank Ch. Eigler said...

You can dereference pointer chains with the $-syntax, as long as the types of the internal pointers are not hidden (with void* or whatever):

probe kernel.function("do_task_stat") {
println($task->utime + $task->signal->utime)
}

Regarding disk info, you may find the ioblock tapset useful; if you need more probes or more data at each point, they can be extended.

Regarding /proc files, there may be nothing wrong with using values from there as command-line parameters for systemtap scripts, though in your case you may not be able to assume that /proc is mounted in time.

Eugene said...

Thanks for your help, Frank.

Satya, thanks for the update.

It's probably good to play with blktrace when you are well ahead of the schedule.

In get_str_info(), perhaps you should define constants for those magic numbers.

In get_u_info_1(), try to use kread() to dereference any pointers that could potentially be invalid, eg:

struct signal_struct *si = kread(&temp->signal);

kread() protects against null pointer accesses that could crash the system. I have some examples at http://kernel.sg/scripts. Not the best examples, as I am sure I make mistakes too, but you could check it out as a reference.

Consider using a license (i.e. GPL) for the code you display in your blog, or at least state what license it is in.

Don't give up. Persevere.

satya said...

thanks a lot :)