thomas.apestaart.org » intarweb help

intarweb help

Filed under: General — Thomas @ 19:26

2007-10-01
19:26

Hey people.

I am looking for software that allows me to track "things that have happened" over "large amounts of time". I would like to be able to categorize things, assign them some level of importance, describe what "objects" are affected by them, and so on.

I want to be able to hook up this software to simple scripts that can enter events into the system.

Examples of these are: software upgrades on machines (can be culled from /var/log/messages). Incidents filed in a bug tracker. Alarms reported by a monitoring system. Me noting down when we did a change in some config.
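To make the idea concrete, here is a minimal sketch in Python of the kind of store such scripts could feed, using only the standard sqlite3 module. The table layout, the 1 to 5 importance scale, and the names `open_store` and `record_event` are all hypothetical, made up for illustration and not taken from any existing tool.

```python
import sqlite3
from datetime import datetime


def open_store(path=":memory:"):
    """Open (or create) a hypothetical event store with one flat table."""
    db = sqlite3.connect(path)
    db.execute(
        """CREATE TABLE IF NOT EXISTS events (
               id INTEGER PRIMARY KEY,
               happened_at TEXT NOT NULL,   -- ISO 8601 timestamp
               category TEXT NOT NULL,      -- e.g. 'upgrade', 'incident', 'alarm'
               importance INTEGER NOT NULL, -- assumed scale: 1 (minor) to 5 (critical)
               affected TEXT NOT NULL,      -- the "object": a host, package, config file
               description TEXT NOT NULL
           )"""
    )
    return db


def record_event(db, happened_at, category, importance, affected, description):
    """Insert one event and return its row id."""
    cur = db.execute(
        "INSERT INTO events (happened_at, category, importance, affected, description)"
        " VALUES (?, ?, ?, ?, ?)",
        (happened_at.isoformat(), category, importance, affected, description),
    )
    db.commit()
    return cur.lastrowid
```

A script watching a bug tracker or a monitoring system would then only need to call `record_event` once per thing that happened.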

After that, the software would let me go look for patterns somehow, and zoom in and out on the data, and graphs made from it. It should let me track the history, so I can answer questions like "when was the last time we upgraded this piece of software", but also let me find causes and effects (what happened right before so many things started failing).
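The two questions above can be sketched as small functions over a list of events. This is only an illustration in Python: the `Event` record, the function names, and the one-hour default window are assumptions, not features of any particular product.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class Event:
    when: datetime
    category: str
    affected: str
    description: str


def last_event(events, category, affected):
    """Return the most recent matching event, or None if there is none."""
    matches = [e for e in events if e.category == category and e.affected == affected]
    return max(matches, key=lambda e: e.when, default=None)


def events_before(events, moment, window=timedelta(hours=1)):
    """Return events in the window leading up to `moment`, oldest first."""
    start = moment - window
    hits = [e for e in events if start <= e.when < moment]
    return sorted(hits, key=lambda e: e.when)
```

`last_event` answers "when was the last time we upgraded this piece of software", and `events_before` lists candidate causes for whatever started failing at `moment`.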

Any suggestions are welcome. I have no idea what sort of words I should be googling for to find the sort of thing I want.

Comments (9)


9 Comments »

Try Cacti: http://cacti.net/ — we’re using it at rPath for a lot of custom monitoring requirements such as rows in a database, files on a disk, etc. I think it’s pretty easy to write scripts to feed it data. Good luck!

Comment by Tim — 2007-10-01 @ 20:40
While this isn’t quite the intended use for these tools, security event management systems like ossim [1] or prelude [2] could be modified to do what you are looking for. They aggregate event data from firewalls, intrusion detection systems, switches, web servers, etc., usually by parsing log files and storing the results in a database, where the events are correlated to determine the likelihood and severity of a security compromise and to show event history.

All the tools are there. You could write agents to parse the events you are interested in out of /var/log/messages, or watch your changes to /etc/sysctl.conf, or anything else really. It is just a little more complicated than writing a python module.
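As a rough idea of what such an agent involves, here is a sketch in Python that pulls package upgrades out of syslog lines. It assumes the traditional "Mon DD HH:MM:SS host tag: message" layout and that the package manager logs lines such as "yum: Updated: <package>"; both the format and the names `parse_line` and `upgrade_events` are assumptions, so check them against your own /var/log/messages.

```python
import re
from datetime import datetime

# Traditional syslog line: "Oct  1 19:26:03 host tag[pid]: message"
SYSLOG_LINE = re.compile(
    r"^(?P<stamp>[A-Z][a-z]{2} [ \d]\d \d{2}:\d{2}:\d{2}) "
    r"(?P<host>\S+) (?P<tag>[^\s:\[]+)(?:\[\d+\])?: (?P<message>.*)$"
)


def parse_line(line, year):
    """Parse one syslog line into a dict, or return None if it does not match.

    Syslog timestamps carry no year, so the caller has to supply one.
    Month names are parsed with %b, which assumes an English/C locale.
    """
    m = SYSLOG_LINE.match(line.rstrip("\n"))
    if m is None:
        return None
    when = datetime.strptime(f"{year} {m['stamp']}", "%Y %b %d %H:%M:%S")
    return {"when": when, "host": m["host"], "tag": m["tag"], "message": m["message"]}


def upgrade_events(lines, year, tag="yum", prefix="Updated: "):
    """Yield (when, host, package) for lines that look like package upgrades."""
    for line in lines:
        parsed = parse_line(line, year)
        if parsed and parsed["tag"] == tag and parsed["message"].startswith(prefix):
            yield parsed["when"], parsed["host"], parsed["message"][len(prefix):]
```

Feeding it `open("/var/log/messages")` and the current year would give a stream of upgrade events ready to be stored wherever you like.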

The main problem is that they require either PostgreSQL or MySQL. You would have to port them to something like sqlite3 if you didn’t want those running on your desktop.

[1] http://www.ossim.net/
[2] http://www.prelude-ids.org/

Comment by Kyle Ambroff — 2007-10-01 @ 20:50
Sounds like you need some sort of blog with tags. Blog the event, with the time and date forced to match the time and date of the event. Then tag the blog entries with tags like “upgrade”, “webhead”, “brownout”, “bug”, “alarm”.

Comment by ken — 2007-10-01 @ 21:30
Shameless, unhelpful post follows, feel free to skip.

I have no idea where you would find software like that, but if you were going to start writing something like it, I’d recommend using the back-end library I’ve written for the next version of Dates (a calendaring app, http://pimlico-project.org/dates.html). It’s pretty easy to write back-ends and things for it, and there are already a few useful widgets that use it. Plus, excluding the widgets (which are already over half-way there), it’s fully documented.

Docs @ http://chrislord.net/docs/

SVN @ http://svn.o-hand.com/repos/dates/branches/jana/

Comment by Chris Lord — 2007-10-01 @ 21:55
Splunk? (http://www.splunk.com/). It’s for multi-logfile analysis, and is intended for the cause-and-effect searching you’re looking for. All you need to do is generate log events. The search stuff is pretty good; there’s also reporting/charting in it, which I’ve not tried.

Not open source, but it has a limited free license that’s usable; it limits you to indexing 500MB/day: http://www.splunk.com/product/2018

Comment by Baz — 2007-10-01 @ 23:56
I use the ‘logger’ command for this. Your syslog routes the data to the appropriate files, logrotate compresses and manages the history (keeping it as long as you want). The entries are timestamped and tagged, making them easily parsed. For example:

$ logger system.downtime "Install new kernel"
$ logger application.downtime "Upgrade WINE application XYZ"
$ logger system.patch "Applied patch 123456"

If your needs are simple, this does the job.
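Reading such entries back is just as simple. The sketch below, in Python, assumes the message text has already been extracted from the syslog line and that it starts with a "scope.kind" word as in the commands above; the function names are hypothetical.

```python
def split_tagged_message(message):
    """Split 'scope.kind free text' into (scope, kind, text).

    Returns None when the first word is not of the form 'scope.kind'
    or when no text follows it.
    """
    head, _, text = message.partition(" ")
    scope, dot, kind = head.partition(".")
    if not (scope and dot and kind and text):
        return None
    return scope, kind, text


def count_by_kind(messages):
    """Count messages per (scope, kind) pair, ignoring untagged ones."""
    counts = {}
    for message in messages:
        parts = split_tagged_message(message)
        if parts is not None:
            scope, kind, _ = parts
            counts[(scope, kind)] = counts.get((scope, kind), 0) + 1
    return counts
```

Counting per pair is about the crudest possible report, but it shows that the tag convention is enough to group entries later.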

Comment by Mace Moneta — 2007-10-02 @ 03:49
Zabbix or Nagios can do what you want. Both are horrifically difficult to configure, and both are resource-intensive pigs.

Comment by James Cape — 2007-10-02 @ 11:53
Splunk works well for me too. I never thought about using ‘logger’, but now that I think about it after reading the previous comments, logger + Splunk seems like an obvious choice. Splunk is rather resource hungry though, so that might be a factor. Also, 500MB of logs a day is a rather low ceiling.

Comment by Andrew — 2007-10-02 @ 12:10
I think that what you’re looking for is some kind of data mining tool tailored for software and server management. I think I see what you’re after; I’ve been searching for something similar, but I didn’t find anything close.

Cacti and Nagios only solve one part of the equation, which is broader IMHO. I think what you definitely want to look at is data mining, especially for finding patterns.

After my searches I came to the conclusion that if you want to “datamine” seriously, you have to build custom, specific software. In that case a framework like Orange (http://magix.fri.uni-lj.si/orange/) might help, but I gave up at that point as it is way above my skills.

Hope it helps.

Comment by Ludovic Danigo — 2007-10-02 @ 16:08
