Twin Peaks

My Monday LISA tutorial was on system log aggregation, analysis, and statistics. mjr taught it, and he’s as good a public speaker as ever. Also the topic was pretty damned fascinating. I’ll be dumping a pile of links into del.icio.us sometime soonish now.

Highlights, some of which are significant and some of which are just cool:

You can set up an invisible loghost. Specify a non-existent host as the loghost on all your DMZ servers. You’re gonna need to manually stuff an entry for that host into each server’s ARP table so that your DMZ servers will blithely send syslog packets off into thin air. Then hook the real loghost up to the DMZ with no IP address and its interface in promiscuous mode. Run tcpdump on it to capture all the packets, and write some cheap Perl to strip the syslog payloads out of the captured packets.

Or use mjr’s plog instead of tcpdump, since it’ll automate all that complex stuff for you. Neat.
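For the curious, the payload-stripping step might look roughly like the sketch below, in Python rather than Perl. It is my own illustration, not anything from the tutorial or from plog. It assumes you already have raw Ethernet frames as bytes (getting them out of a capture is left out), that the traffic is plain IPv4 with no VLAN tags or fragmentation, and that syslog is on the usual UDP port 514. The `build_frame` helper is hypothetical and exists only to fabricate a test frame.

```python
import struct

SYSLOG_PORT = 514  # standard syslog-over-UDP port


def syslog_payload(frame):
    """Return the syslog message carried in a raw Ethernet frame, or None."""
    if len(frame) < 14:
        return None
    ethertype = struct.unpack("!H", frame[12:14])[0]
    if ethertype != 0x0800:              # not IPv4
        return None
    ip = frame[14:]
    if len(ip) < 20 or ip[0] >> 4 != 4:  # too short, or not IP version 4
        return None
    ihl = (ip[0] & 0x0F) * 4             # IP header length in bytes
    if ihl < 20 or ip[9] != 17:          # protocol 17 is UDP
        return None
    udp = ip[ihl:]
    if len(udp) < 8:
        return None
    _src, dst, length, _csum = struct.unpack("!HHHH", udp[:8])
    if dst != SYSLOG_PORT:
        return None
    # The UDP length field counts the 8-byte header plus the payload.
    return udp[8:length].decode("utf-8", errors="replace")


def build_frame(message, dst_port=SYSLOG_PORT):
    """Hypothetical helper: fabricate a minimal Ethernet/IPv4/UDP frame."""
    payload = message.encode("utf-8")
    udp = struct.pack("!HHHH", 40000, dst_port, 8 + len(payload), 0) + payload
    ip = struct.pack(
        "!BBHHHBBH4s4s",
        0x45, 0, 20 + len(udp), 0, 0, 64, 17, 0,
        bytes([10, 0, 0, 5]), bytes([10, 0, 0, 250]),
    )
    ethernet = b"\x00" * 12 + struct.pack("!H", 0x0800)
    return ethernet + ip + udp


frame = build_frame("<34>su: 'su root' failed for lonvick")
message = syslog_payload(frame)
```

Anything that isn’t UDP to port 514 comes back as None, so you can throw every captured frame at it and keep whatever survives.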

Artificial ignorance. Cute term. It’s basically the same rule of thumb as “block everything, then permit what you want” but reversed. “It’s interesting unless I’ve explicitly said it’s boring.” At a very basic level, it looks like this: grep -v -f patternfile. As you figure out what you don’t care about, stick a regexp that matches it into patternfile and you won’t see it again. The process speeds up over time, obviously, since every pattern you add leaves less to look at. This cries out for a slick web front end.
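Here’s a minimal sketch of the same idea in Python, in case you want something to hang that front end on. It’s only an approximation of grep -v -f: it uses Python regexps rather than grep’s, and the patterns and log lines are made-up samples, not real data.

```python
import re


def load_patterns(lines):
    """Compile one regexp per non-blank line, like a grep -f pattern file."""
    return [re.compile(line.strip()) for line in lines if line.strip()]


def interesting(log_lines, patterns):
    """Yield the log lines that match none of the 'boring' patterns."""
    for line in log_lines:
        if not any(p.search(line) for p in patterns):
            yield line


# Made-up "boring" patterns; in practice these would come from patternfile.
boring = load_patterns([
    r"sshd\[\d+\]: Accepted publickey",
    r"CRON\[\d+\]:",
])

# Made-up sample log lines.
sample = [
    "Nov 1 10:00:01 web1 CRON[123]: (root) CMD (run-parts)",
    "Nov 1 10:00:05 web1 sshd[456]: Accepted publickey for alice",
    "Nov 1 10:00:09 web1 kernel: eth0: link down",
]

result = list(interesting(sample, boring))
```

Only the kernel line survives, which is the point: whatever you haven’t declared boring is what you get to read.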

First-seen anomaly detection. It’s sort of like artificial ignorance, except that instead of filtering out what you’ve declared boring, you alert every time something completely new appears in the logs. There is a tool for this, also written by mjr, called NBS (Never Before Seen). It uses Berkeley DB and is very fast. You feed it input for a specified dataset and it tells you if it’s seen that particular chunk of input before. It can report on its database in a bunch of useful ways.

Example: record DHCP servers giving out IP addresses. (Sample string after a bit of log parsing: “10.0.0.10 gives IP 10.0.1.1 to MAC 0:2:2d:10:10:10”.) If a new MAC address shows up, it’ll be flagged by NBS as a new chunk of input, because that string is guaranteed to differ in that case. If an old MAC address gets a different IP address, that’ll show up too, but only the first time it gets that particular IP. As a bonus, you’ll find out if any new DHCP servers show up. Pure gold.
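To make the DHCP example concrete, here’s a toy sketch of the first-seen idea in Python. This is not NBS and doesn’t pretend to be: it keeps everything in an in-memory dict instead of Berkeley DB, and the `FirstSeen` class and its `observe` method are names I made up for illustration.

```python
class FirstSeen:
    """Toy first-seen detector: remembers every record it has been fed."""

    def __init__(self):
        self.counts = {}  # record -> number of times seen

    def observe(self, record):
        """Return True the first time a record shows up, False afterwards."""
        is_new = record not in self.counts
        self.counts[record] = self.counts.get(record, 0) + 1
        return is_new


db = FirstSeen()
events = [
    "10.0.0.10 gives IP 10.0.1.1 to MAC 0:2:2d:10:10:10",  # new MAC
    "10.0.0.10 gives IP 10.0.1.1 to MAC 0:2:2d:10:10:10",  # same lease again
    "10.0.0.10 gives IP 10.0.1.2 to MAC 0:2:2d:10:10:10",  # old MAC, new IP
    "10.0.0.99 gives IP 10.0.1.1 to MAC 0:2:2d:10:10:10",  # new DHCP server
]
flags = [db.observe(event) for event in events]
```

The repeated lease is the only one that stays quiet; the new IP and the new DHCP server both get flagged because the whole string is the key.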

Another example, which happens to be the first use I thought of: turn it loose on my HTTPD log files. Filter said log files for referrer and URL pairs; report the first time a new referrer/URL pair is seen. I have something like this in place now, but it’s written in Perl and it’s fairly fragile; this will be better.

Or just dump URLs into the database. “Hm, someone just tried to load /cgi/foobar.exe for the first time; looks like a new exploit.”
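The HTTPD ideas above could be sketched like this. It’s a rough illustration, not my existing script and not NBS: it assumes Apache-style combined log format, uses a plain Python set where NBS would use its database, and the log lines are invented samples.

```python
import re

# Pulls the request URL and the referrer out of a combined-format log line.
COMBINED = re.compile(
    r'"\S+ (?P<url>\S+) [^"]*" \d{3} \S+ "(?P<referrer>[^"]*)"'
)


def new_pairs(lines, seen):
    """Yield each (referrer, url) pair the first time it appears."""
    for line in lines:
        match = COMBINED.search(line)
        if not match:
            continue  # not a line we know how to parse
        pair = (match.group("referrer"), match.group("url"))
        if pair not in seen:
            seen.add(pair)
            yield pair


# Invented sample log lines.
log = [
    '1.2.3.4 - - [10/Nov/2005:13:55:36 -0800] "GET /index.html HTTP/1.0" '
    '200 2326 "http://example.com/start" "Mozilla/4.08"',
    '1.2.3.5 - - [10/Nov/2005:13:56:01 -0800] "GET /index.html HTTP/1.0" '
    '200 2326 "http://example.com/start" "Mozilla/4.08"',
    '1.2.3.6 - - [10/Nov/2005:13:57:12 -0800] "GET /cgi/foobar.exe HTTP/1.0" '
    '404 209 "-" "Mozilla/4.08"',
]

seen = set()
first_time = list(new_pairs(log, seen))
```

To track bare URLs instead of pairs, key on `match.group("url")` alone; the first request for /cgi/foobar.exe would pop out the same way.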

So yeah, a very cool tutorial. I’m all jazzed about the possibilities. Check out his web site on the topic.
