Saturday, October 11, 2008

Boot (limn/chart)?

I drafted a small write-up on bootlimn some time ago. It covers what bootlimn is, how it works, how it differs from bootchart, and how to interpret its output.

I am pasting the write-up here for anyone who is interested.

Bootlimn

===========================================

One line description:
It is an analysis and visualization tool for the Linux boot process.

===========================================

Working:
Bootlimn uses systemtap[0], a kernel-probing language, to extract data
in an event-based fashion: the systemtap scripts trigger a probe handler
whenever one of the probed functions is called or a short time period
elapses. The probe handler specifies how the information is logged.

This information is stored in XML format for standardization and for
facilitating its use by interested third parties. This information is
parsed using a SAX interface and is used for rendering an SVG image,
whose format is heavily inspired by the SVG output of Bootchart[1].
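
To make the parsing step concrete, here is a minimal sketch of reading
an event log through a SAX interface. Bootlimn's own parser is written in
Java and its XML schema is not reproduced in this write-up, so the
element name (event) and the attribute names (time, pid, execname,
probefunc) below are assumptions chosen only for illustration.

```python
import xml.sax

# Hypothetical log layout: the real bootlimn XML schema is not shown in this
# write-up, so the element and attribute names here are assumptions.
SAMPLE_XML = b"""<bootlog>
  <event time="120" pid="1" execname="init" probefunc="sys_execve"/>
  <event time="340" pid="57" execname="udevd" probefunc="sys_clone"/>
</bootlog>"""


class EventHandler(xml.sax.ContentHandler):
    """Collects one dict per <event> element as the parser streams past it."""

    def __init__(self):
        super().__init__()
        self.events = []

    def startElement(self, name, attrs):
        # SAX calls this once for every opening tag; other tags are ignored.
        if name == "event":
            self.events.append({
                "time_ms": int(attrs["time"]),
                "pid": int(attrs["pid"]),
                "execname": attrs["execname"],
                "probefunc": attrs["probefunc"],
            })


def parse_events(xml_bytes):
    """Parse an in-memory XML document and return the collected events."""
    handler = EventHandler()
    xml.sax.parseString(xml_bytes, handler)
    return handler.events


events = parse_events(SAMPLE_XML)
```

A SAX parser streams through the document and never builds the whole tree
in memory, which suits a long, flat log of boot events.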

===========================================

Interpretation of results (most important for any user):

An example file is sent along with this text. The XML folder contains
the output as given by the systemtap scripts.

The final bootlimn output consists of an SVG image and five text files.

a) SVG image (this image has a template similar to Bootchart):

Header: The time shows how long (in seconds) the boot process took.

CPU stats: The first rectangle shows the CPU stats. The pink plot (in
the background) shows the CPU utilization and the blue plot (in the
foreground) shows the CPU throughput.

Disk stats: The second rectangle shows the disk stats. The pink plot
(in the background) shows the disk utilization and the green plot
(in the foreground) shows the disk throughput.

The syscalls that have been probed are listed along with their color
coding.

Process Tree: The process tree differs from the classical process tree
in Bootchart in the following ways:

i) The processes are listed in the chronological order of their initial
call and not necessarily as parent child blocks as in bootchart. As the
boot is sequential, a child is never rendered before a parent but the
child and the parent may be separated by a few intervening processes.
The parent-child relationship is shown by dotted lines connecting the
parent and the child.

ii) All the actions are rendered. But to bring the image to a practical
size, process blocks instead of processes have been used. The processes
with the same name have been merged into a process block (the individual
rectangles in the tree), and all the calls made to the processes in the
process block are rendered sequentially. Hence, one might observe
sys_exit being called more than once on the same process block, but the
block might have a sys_clone called before the first exit. The criteria
for trimming the tree can be easily altered to suit various purposes
by changing the condition in the uniqueprocess generator (in
SVGRenderer.java).
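
The merging described in ii) can be sketched as follows. This is not the
uniqueprocess generator from SVGRenderer.java; it is a simplified
illustration that groups events by process name only, and the event
fields and sample values are made up for the example.

```python
def build_process_blocks(events):
    """Group events into process blocks keyed by process name.

    Blocks appear in the order of their first event, and the calls inside
    each block stay in chronological order.
    """
    blocks = {}
    for ev in sorted(events, key=lambda e: e["time_ms"]):
        call = (ev["time_ms"], ev["pid"], ev["probefunc"])
        blocks.setdefault(ev["execname"], []).append(call)
    return blocks


# Made-up sample: two separate modprobe processes end up in one block.
sample_events = [
    {"time_ms": 100, "pid": 10, "execname": "modprobe", "probefunc": "sys_execve"},
    {"time_ms": 150, "pid": 10, "execname": "modprobe", "probefunc": "sys_exit"},
    {"time_ms": 160, "pid": 12, "execname": "udevd", "probefunc": "sys_clone"},
    {"time_ms": 200, "pid": 14, "execname": "modprobe", "probefunc": "sys_execve"},
    {"time_ms": 260, "pid": 14, "execname": "modprobe", "probefunc": "sys_exit"},
]

blocks = build_process_blocks(sample_events)
exit_count = sum(1 for _, _, func in blocks["modprobe"] if func == "sys_exit")
```

In this sample the modprobe block records sys_exit twice, once per merged
process, which is the effect noted above.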

iii) The color code represents the last system call that acted upon the
process and not the process state directly. This is because Bootlimn,
unlike Bootchart, does not poll the /proc directory; it probes the system
calls instead.


iv) Flexible level of detail: By default, to render the entire image
with manageable dimensions and still be accurate, the timestamps are
collected in milliseconds, and each pixel represents 0.1 s when rendering.
As no information is discarded while rendering, changing the level of
detail is very easy.
The level of detail in bootlimn is defined by:
The timestamp unit. Systemtap offers options to gather timestamps in ns,
ms, etc., so replacing gettimeofday_ms in the systemtap scripts in the stp
folder changes the level of detail.
The image size and the scale factor in renderer.java.
The scalability of SVG images can be used to keep the image size manageable.
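
As a worked example of the default scale (millisecond timestamps, 0.1 s
per pixel), the sketch below maps a timestamp to a horizontal pixel
position. The function and constant names are hypothetical and are not
taken from the bootlimn renderer.

```python
import math

MS_PER_PIXEL = 100  # 0.1 s per pixel, the default scale described above


def time_to_x(time_ms, boot_start_ms=0, ms_per_pixel=MS_PER_PIXEL):
    """Horizontal pixel offset of an event relative to the start of boot."""
    return (time_ms - boot_start_ms) / ms_per_pixel


def image_width(boot_duration_ms, ms_per_pixel=MS_PER_PIXEL):
    """Whole number of pixels needed to draw the complete boot."""
    return math.ceil(boot_duration_ms / ms_per_pixel)


x_default = time_to_x(2500)            # an event 2.5 s into the boot
width_default = image_width(42000)     # a 42 s boot at 0.1 s per pixel
width_detailed = image_width(42000, ms_per_pixel=10)  # 0.01 s per pixel
```

Making the scale ten times finer makes the image ten times wider, which is
why the scale factor and the image size have to be chosen together.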

There is no provision to concentrate on a part of boot process and give a
separate detailed view of that part as of now.

b) The text files.
The systemtap scripts are written so as to gather as much information as
possible. As displaying all the details on a graph is not possible, there
are five complementary files that give all the details collected by the
systemtap scripts.

i) The Ioblock.txt gives all the block I/O details:
It prints the text output of the ioblock tapset.
It has:
type - whether it was a request for IO or a signal to end
time - timestamp
devname - block device name
ino - i-node number of the mapped file
error - its value is zero on success
sector - beginning sector for the entire bio
flags -
    BIO_UPTODATE     0  ok after I/O completion
    BIO_RW_BLOCK     1  RW_AHEAD set, and read/write would block
    BIO_EOF          2  out-of-bounds error
    BIO_SEG_VALID    3  nr_hw_seg valid
    BIO_CLONED       4  doesn't own data
    BIO_BOUNCED      5  bio is a bounce bio
    BIO_USER_MAPPED  6  contains user pages
    BIO_EOPNOTSUPP   7  not supported
rw - binary trace for read/write request
vcnt - bio vector count, which represents the number of array elements
(page, offset, length) that make up this I/O request
idx - offset into the bio vector array
phys_segments - number of segments in this bio after physical address
coalescing is performed.
hw_segments - number of segments after physical and DMA remapping
hardware coalescing is performed
size - total size in bytes
bdev - target block device
bdev_contains - points to the device object which contains the
partition (when bio structure represents a partition)
p_start_sect - points to the start sector of the partition
structure of the device
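
The flags field can be decoded with a few lines of code. The sketch below
assumes that the numbers listed for each flag are bit positions within
the flags word (as they are in the kernel's bio header of that era) and
that Ioblock.txt prints the field as a plain integer; neither point is
confirmed by this write-up.

```python
# Bit positions as listed above; treating them as bit numbers within the
# flags word is an assumption.
BIO_FLAG_BITS = {
    0: "BIO_UPTODATE",
    1: "BIO_RW_BLOCK",
    2: "BIO_EOF",
    3: "BIO_SEG_VALID",
    4: "BIO_CLONED",
    5: "BIO_BOUNCED",
    6: "BIO_USER_MAPPED",
    7: "BIO_EOPNOTSUPP",
}


def decode_bio_flags(flags):
    """Return the names of all flags whose bit is set, lowest bit first."""
    return [
        name
        for bit, name in sorted(BIO_FLAG_BITS.items())
        if flags & (1 << bit)
    ]


decoded = decode_bio_flags(0b10001)  # bits 0 and 4 are set
```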

ii) The Perpro.txt gives the per-process CPU usage details.
It has:
time - timestamp
pid - process id
execname - name of the process
probefunc - the probing function
utime - the user time of the process
stime - the system time of the process

iii) The Process.txt contains the process details (the process tree
is derived from the same XML as this file).
It has:
time - timestamp
pid - process id
ppid - parent process id
execname - process name
probefunc - probing function
pexecname - parent process name
misc - null as of now; any additional information can be added here

iv) The Readwrite.txt contains the details of individual system reads
and writes.
It has:
rcount - read count so far
wcount - write count so far
time - timestamp
pid - process id
execname - process name
pexecname - parent process name
type - read or write
file - the file that the data was written to or read from

v) The Stats.txt contains the CPU and disk statistics. This file is a
direct mapping to the first two rectangles in the image showing CPU and
disk stats.
It has:
time - timestamp
CPUutil - CPU utilization
CPUtput - CPU throughput
diskutil - Disk utilization
disktput - Disk throughput

==============================================
Source:
The svn version of bootlimn can be checked out from [2] and a tarball can
be found at [3].

==============================================

References

[0] http://sourceware.org/systemtap/
[1] http://www.bootchart.org/
[2] http://code.google.com/p/bootlimn/source/checkout
[3] http://code.google.com/p/google-summer-of-code-2008-fedora/downloads/list
