Introduction#

The idea is to monitor running computer systems seamlessly, collect historical information, and make that information easy to analyze and graph.

Design#

Java Unix Agent#

An API that makes Unix (and other) commands accessible in a more homogeneous way. The source code is here.

The internal API is used to build a connected agent that collects information.
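
A minimal sketch of what "homogeneous access" to commands could look like, assuming nothing about the actual JUA code. The names CommandRunner and Result are hypothetical; only ProcessBuilder and the standard I/O classes are real. Every command, whatever it prints, comes back in the same shape: the argument vector, the exit code and the output lines.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical wrapper for illustration; this is not the JUA API. */
final class CommandRunner {

    /** Uniform result returned for every command. */
    static final class Result {
        final List<String> command;
        final int exitCode;
        final List<String> lines;

        Result(List<String> command, int exitCode, List<String> lines) {
            this.command = command;
            this.exitCode = exitCode;
            this.lines = lines;
        }
    }

    /** Runs the command and captures stdout and stderr, merged, line by line. */
    static Result run(String... argv) throws IOException, InterruptedException {
        ProcessBuilder builder = new ProcessBuilder(argv);
        builder.redirectErrorStream(true); // fold stderr into stdout
        Process process = builder.start();
        List<String> lines = new ArrayList<>();
        try (BufferedReader reader = new BufferedReader(
                new InputStreamReader(process.getInputStream(), StandardCharsets.UTF_8))) {
            String line;
            while ((line = reader.readLine()) != null) {
                lines.add(line);
            }
        }
        int exitCode = process.waitFor(); // output is fully read, so this cannot block on a full pipe
        return new Result(Arrays.asList(argv), exitCode, lines);
    }
}
```

A caller would write something like CommandRunner.run("uname", "-a") and then inspect the exit code and the lines, regardless of which command was run.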

Recording/Logging#

JUA produces JSON data that can be stored for future retrieval. The store could be a NoSQL database, either document-oriented or big-table style. It should be easy to extract a subset of the data and export it, so that it can be loaded into another instance and analyzed there. Compression matters if old logs are to be kept.
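
To illustrate the compression point, here is a small sketch that gzips a JSON record before storage and restores it afterwards, using only java.util.zip. The class name RecordCodec is hypothetical, and no particular database or JSON layout is implied; repetitive records such as periodic samples are where this is likely to pay off most.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

/** Hypothetical helper for illustration; not part of JUA. */
final class RecordCodec {

    /** Compresses a JSON document (UTF-8) with gzip. */
    static byte[] compress(String json) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (GZIPOutputStream gzip = new GZIPOutputStream(buffer)) {
            gzip.write(json.getBytes(StandardCharsets.UTF_8));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        // The gzip stream is closed here, so the trailer has been written.
        return buffer.toByteArray();
    }

    /** Restores the JSON document from its gzip form (needs Java 9+ for readAllBytes). */
    static String decompress(byte[] compressed) {
        try (GZIPInputStream gzip = new GZIPInputStream(new ByteArrayInputStream(compressed))) {
            return new String(gzip.readAllBytes(), StandardCharsets.UTF_8);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

The compressed bytes can be stored as a binary field or attachment, and an exported subset stays loadable elsewhere because gzip is a standard format.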

Reports/Viewserver#

The view server runs reports and produces summaries from the recorded information, then caches the results so that they can be retrieved quickly.
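
The caching idea can be sketched as follows. This is an assumption about one possible shape, not a description of an existing view server: ReportCache and its methods are hypothetical names, and a report is reduced to a function from a report name to a summary string.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

/** Hypothetical in-memory cache for report summaries. */
final class ReportCache {

    private final Map<String, String> summaries = new ConcurrentHashMap<>();
    private final Function<String, String> reportRunner;

    /** reportRunner computes the summary for a report name (the expensive step). */
    ReportCache(Function<String, String> reportRunner) {
        this.reportRunner = reportRunner;
    }

    /** Returns the cached summary, running the report only on the first request. */
    String get(String reportName) {
        return summaries.computeIfAbsent(reportName, reportRunner);
    }

    /** Drops a cached summary, for example after new data has been recorded. */
    void invalidate(String reportName) {
        summaries.remove(reportName);
    }
}
```

A real view server would also have to decide when summaries go stale; here that decision is left to the caller through invalidate.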

Client/Visualization#

The client needs to display the information, show views, and drill down into low-level information in the database (individual OS-specific data, log file contents etc.).

Navigation#

Eksplane is a simplistic Django prototype that left out reports and recording and focused on real-time navigation and on the links between different aspects of a live Unix system (processes, files, sockets, ports etc.). The idea was to render the output of Unix commands as HTML enriched with hyperlinks specific to the type of information in that output (e.g. a PID in the output is clickable, so you get the choice of pstack, lsof, arguments etc.).
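
The hyperlinking idea can be sketched in Java, the language of the agent, rather than in Django. This is not Eksplane code: the class PidLinker and the /process/ URL scheme are hypothetical, and the sketch assumes a ps-style line whose first column is the PID.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical renderer: turns the leading PID of a ps-style line into a link. */
final class PidLinker {

    // Optional leading spaces, then the PID, then the rest of the line.
    private static final Pattern LEADING_PID = Pattern.compile("^(\\s*)(\\d+)(.*)$");

    /** Escapes the characters that would otherwise be read as HTML markup. */
    static String escapeHtml(String text) {
        return text.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;");
    }

    /** Renders one output line as HTML; lines without a leading PID are only escaped. */
    static String linkLine(String line) {
        Matcher m = LEADING_PID.matcher(line);
        if (!m.matches()) {
            return escapeHtml(line); // header line or unexpected format
        }
        String pid = m.group(2);
        // "/process/<pid>" is an assumed URL scheme for the page offering pstack, lsof etc.
        return m.group(1) + "<a href=\"/process/" + pid + "\">" + pid + "</a>"
                + escapeHtml(m.group(3));
    }
}
```

Other information types (file names, ports, sockets) would need their own patterns and link targets, chosen per command since the same number can mean different things in different outputs.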

Background information#

Parsing text#

Parsing text is key, as most important commands output text and most log files are text-based. Some tools output HTML, so parsing HTML (probably bad HTML) is important too. TagSoup and Jericho may be useful.
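
As an example of the kind of text parsing involved, the sketch below turns whitespace-separated columnar output (a header line followed by data lines) into one map per row. ColumnParser is a hypothetical name, and the approach assumes single-word column headers and that only the last column may contain spaces, which holds for output such as "ps -o pid,args" but not for every command.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical parser for header-plus-rows command output. */
final class ColumnParser {

    /** Returns one map per data line, keyed by the column names of the header line. */
    static List<Map<String, String>> parse(String output) {
        List<Map<String, String>> rows = new ArrayList<>();
        String[] lines = output.split("\\R"); // any line terminator
        if (lines.length == 0 || lines[0].trim().isEmpty()) {
            return rows; // no header, nothing to parse
        }
        String[] headers = lines[0].trim().split("\\s+");
        for (int i = 1; i < lines.length; i++) {
            String line = lines[i].trim();
            if (line.isEmpty()) {
                continue;
            }
            // The limit keeps spaces inside the last column (e.g. command arguments).
            String[] cells = line.split("\\s+", headers.length);
            Map<String, String> row = new LinkedHashMap<>();
            for (int c = 0; c < headers.length && c < cells.length; c++) {
                row.put(headers[c], cells[c]);
            }
            rows.add(row);
        }
        return rows;
    }
}
```

Commands with multi-word headers or fixed-width columns need a per-command parser, which is one reason a homogeneous API on top of the raw text is worth having.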

Unix commands#

There are many Unix commands and other tools that are good sources of information for this project:

  • ps: list processes, arguments, resource usage etc.
  • pstack, gstack, procstack, dbx/gdb + where : show the call stack(s) of a given process
  • pfiles, procfiles, lsof : show files used by one or multiple processes
  • pmap, procmap, jmap: memory usage of a given process
  • netstat, ifconfig
  • df: show disk utilisation of partitions etc.
  • iostat, iotop etc.
  • sp_sysmon: show Sybase activity summary
  • uname
  • prtdiag
  • vmstat, free: system memory usage
  • lpr
  • dmesg, cat /var/log/messages, errpt -a : display operating system messages

Log files#

  • core, hs_err*.log, javacore*: files generated by programs that have errors
  • LOG4J files
  • strace, truss output

Database commands#

Sybase#

  • sp_sysmon
  • sp_help

Oracle#

  • AWR report

This page (revision-13) was last changed on 20-May-2012 00:32 by pgaillard