Saturday, November 25, 2006

Processing vxstat to read into R

I got bored with my iostat data, and found some interesting-looking vxstat logs to browse with the Cockcroft Headroom Plot. To get them into a regular format I wrote the short Awk script shown below. It skips the first record (the totals since boot), adds a custom header, and puts the timestamp of each sample in the first column of every row.


# process vxstat file into regular csv format
BEGIN { skipping=1; printf("time,vol,reads,writes,breads,bwrites,tread,twrite\n"); }
NR < 4 {next} # skip header
NF > 0 && skipping==1 {next} # skip first record of totals since boot
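NF == 0 {skipping=0} # blank line ends the first record
NF == 5 {time=$0} # timestamp line, e.g. Mon May 01 19:00:31 2000
NF == 8 {printf("%s,%s,%s,%s,%s,%s,%s,%s\n",time,$2,$3,$4,$5,$6,$7,$8);} # one csv row per volume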


It turns a file that looks like this:

OPERATIONS BLOCKS AVG TIME(ms)
TYP NAME READ WRITE READ WRITE READ WRITE

Mon May 01 19:00:01 2000
vol home 88159 346799 17990732 3680604 13.7 15.6
vol local 64308 103869 3848746 410899 6.0 22.0
vol orahome 80240 208372 18931823 886870 11.9 21.1
vol rootvol 336544 537741 21325442 8566302 4.8 323.1
vol swapvol 32857 339 4199304 58160 13.8 22.5
vol usr 396221 174834 11766646 2872832 3.5 547.6
vol var 316340 1688518 25138480 19275428 11.1 53.7

Mon May 01 19:00:31 2000
vol home 1 28 4 129 10.0 34.3
vol local 0 2 0 8 0.0 330.0
vol orahome 4 20 24 88 10.0 84.0
vol rootvol 0 80 0 720 0.0 9.4
vol swapvol 0 0 0 0 0.0 0.0
vol usr 0 1 0 16 0.0 20.0
vol var 4 235 54 2498 15.0 13.7

... and so on


into

% awk -f vx.awk < vxstat.out
time,vol,reads,writes,breads,bwrites,tread,twrite
Mon May 01 19:00:31 2000,home,1,28,4,129,10.0,34.3
Mon May 01 19:00:31 2000,local,0,2,0,8,0.0,330.0
Mon May 01 19:00:31 2000,orahome,4,20,24,88,10.0,84.0
Mon May 01 19:00:31 2000,rootvol,0,80,0,720,0.0,9.4
Mon May 01 19:00:31 2000,swapvol,0,0,0,0,0.0,0.0
Mon May 01 19:00:31 2000,usr,0,1,0,16,0.0,20.0
Mon May 01 19:00:31 2000,var,4,235,54,2498,15.0,13.7
... and so on
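
The timestamp lands in the first column exactly as vxstat printed it. If a sortable form is more convenient, a second Awk pass can rewrite it. The following is a minimal sketch, assuming the English three-letter month names seen in the sample; the file name vx_sample.csv and the ISO-style target format are illustrative choices, not part of the original script.

```shell
# Sample rows in the format produced by the script above.
cat > vx_sample.csv <<'EOF'
time,vol,reads,writes,breads,bwrites,tread,twrite
Mon May 01 19:00:31 2000,home,1,28,4,129,10.0,34.3
Mon May 01 19:00:31 2000,local,0,2,0,8,0.0,330.0
EOF

# Rewrite column 1 from "Mon May 01 19:00:31 2000" to "2000-05-01 19:00:31".
iso=$(awk -F, -v OFS=, '
BEGIN {
    # Map month abbreviations to two-digit month numbers.
    n = split("Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec", m, " ")
    for (i = 1; i <= n; i++) mon[m[i]] = sprintf("%02d", i)
}
NR == 1 { print; next }   # pass the header through unchanged
{
    split($1, t, " ")     # t[2]=month, t[3]=day, t[4]=time of day, t[5]=year
    $1 = sprintf("%s-%s-%02d %s", t[5], mon[t[2]], t[3], t[4])
    print
}' vx_sample.csv)
echo "$iso"
```

The remaining columns are untouched, so the result has the same header and can be loaded the same way as the original output.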


Saved to a file (the example below assumes ~/vxstat.csv), this can easily be read into R and plotted using


> vx <- read.csv("~/vxstat.csv", header=TRUE)
> vxhome <- vx[vx$vol=="home",]
> chp(vxhome$reads,vxhome$tread)


One of the files I tried was quite long, half a million lines. It loaded into R in fifteen seconds, and the subsequent analysis operations didn't take too long. Try that with a spreadsheet... :-)
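
With a file that long, a quick check before loading can confirm that every volume contributed the same number of samples. The following is a rough sketch, assuming the CSV layout shown above; vxstat_sample.csv is a hypothetical stand-in for the real file, and its rows are copied from the sample output.

```shell
# Small stand-in for the converted file; rows copied from the sample output above.
cat > vxstat_sample.csv <<'EOF'
time,vol,reads,writes,breads,bwrites,tread,twrite
Mon May 01 19:00:31 2000,home,1,28,4,129,10.0,34.3
Mon May 01 19:00:31 2000,local,0,2,0,8,0.0,330.0
Mon May 01 19:00:31 2000,orahome,4,20,24,88,10.0,84.0
EOF

# Count data rows per volume (column 2), skipping the header line.
counts=$(awk -F, 'NR > 1 { n[$2]++ } END { for (v in n) printf("%s %d\n", v, n[v]) }' vxstat_sample.csv | sort)
echo "$counts"
```

A volume whose count differs from the rest would suggest a truncated or interrupted log, which is easier to spot here than after plotting.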
