Wednesday, May 06, 2009

firewall-wizards Digest, Vol 37, Issue 6

Send firewall-wizards mailing list submissions to
firewall-wizards@listserv.icsalabs.com

To subscribe or unsubscribe via the World Wide Web, visit
https://listserv.icsalabs.com/mailman/listinfo/firewall-wizards
or, via email, send a message with subject or body 'help' to
firewall-wizards-request@listserv.icsalabs.com

You can reach the person managing the list at
firewall-wizards-owner@listserv.icsalabs.com

When replying, please edit your Subject line so it is more specific
than "Re: Contents of firewall-wizards digest..."


Today's Topics:

1. Re: Handling large log files (Paul Melson)
2. Re: Handling large log files (david@lang.hm)


----------------------------------------------------------------------

Message: 1
Date: Tue, 5 May 2009 23:38:49 -0400
From: Paul Melson <pmelson@gmail.com>
Subject: Re: [fw-wiz] Handling large log files
To: Firewall Wizards Security Mailing List
<firewall-wizards@listserv.icsalabs.com>
Message-ID:
<40ecb01f0905052038y39cd0aafqc8ce6372dd1f0db8@mail.gmail.com>
Content-Type: text/plain; charset=ISO-8859-1

On Tue, May 5, 2009 at 6:41 PM, Nate Hausrath <hausrath@gmail.com> wrote:
> Hello everyone,
>
> I have a central log server set up in our environment that would
> receive around 200-300 MB of messages per day from various devices
> (switches, routers, firewalls, etc). With this volume, logcheck was
> able to effectively parse the files and send out a nice email. Now,
> however, the volume has increased to around 3-5 GB per day and will
> continue growing as we add more systems. Unfortunately, the old
> logcheck solution now spends hours trying to parse the logs, and even
> if it finishes, it will generate an email that is too big to send.
>
[...]
> Are there other solutions that would be better suited to log volumes
> like this? Should I look at commercial products?
>
> Any comments/criticisms/suggestions would be greatly appreciated!
> Please let me know if I need to provide more information. Again, my
> lack of experience in this area makes me hesitant to make a solid
> decision without asking for some guidance first. I don't want to
> spend a lot of time going in one direction, only to find that I was
> completely wrong.


What are you trying to achieve with your log analysis, as in, what
sort of actions would the review of this daily log report trigger?
Would you want to or should you move to a model where search/analysis
is happening in near-real time instead of once daily? That's going to
be helpful in knowing what kind of solution you should be looking at.
Also, while it's overpowering your logcheck scripts, 5GB/day of log
data is nothing when you're talking about firewall logs.

PaulM


------------------------------

Message: 2
Date: Wed, 6 May 2009 05:30:01 -0700 (PDT)
From: david@lang.hm
Subject: Re: [fw-wiz] Handling large log files
To: Firewall Wizards Security Mailing List
<firewall-wizards@listserv.cybertrust.com>
Message-ID: <alpine.DEB.1.10.0905060502590.5928@asgard>
Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed

On Tue, 5 May 2009, Nate Hausrath wrote:

> Hello everyone,
>
> I have a central log server set up in our environment that would
> receive around 200-300 MB of messages per day from various devices
> (switches, routers, firewalls, etc). With this volume, logcheck was
> able to effectively parse the files and send out a nice email. Now,
> however, the volume has increased to around 3-5 GB per day and will
> continue growing as we add more systems. Unfortunately, the old
> logcheck solution now spends hours trying to parse the logs, and even
> if it finishes, it will generate an email that is too big to send.
>
> I'm somewhat new to log management, and I've done quite a bit of
> googling for solutions. However, my problem is that I just don't have
> enough experience to know what I need. Should I try to work with
> logcheck/logsentry in hopes that I can improve its efficiency more?
> Should I use filters on syslog-ng to cut out some of the messages I
> don't want to see as they reach the box?
>
> I have also thought that it would be useful to cut out all the
> duplicate messages and just simply report on the number of times per
> day I see each message. After this, it seems likely that logcheck
> would be able to effectively parse through the remaining logs and
> report the items that I need to see (as well as new messages that
> could be interesting).
>
> Are there other solutions that would be better suited to log volumes
> like this? Should I look at commercial products?

I don't like the idea of filtering out messages completely; the number of
times that an otherwise 'uninteresting' message shows up can be significant
(if the number of requests for a web image per day suddenly jumps to 100
times what it was before, that's a significant thing to know).

The key is to categorize and summarize the data. I have not found a good
commercial tool to do this job (there are good tools for drilling down and
querying the logs); the task of summarizing the data is just too
site-specific. I currently get 40-80G of logs per day and have a nightly
process that summarizes them.

I first have a process (a perl script) that goes through the logs and splits
them into separate files based on the program name in the logs. Internally
it does a lookup of the program name to a bucket name and then outputs the
message to that bucket (this lets me combine all the mail logs into one
file, no matter which OS they are from and all the different ways that the
mail software identifies itself). For things that I haven't defined a
specific bucket for, I have a bucket called 'other'.
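That split step could be sketched roughly like this in shell/awk (the
original is a perl script; the bucket map and the sample log lines below
are made up purely for illustration):

```shell
#!/bin/sh
# Route each syslog line into a per-bucket file keyed by program name,
# and print per-bucket message counts at the end.
# Bucket map and sample input are illustrative, not from the post.
mkdir -p buckets
cat > sample.log <<'EOF'
May  6 05:30:01 host1 sshd[123]: Accepted password for admin
May  6 05:30:02 host2 sendmail[45]: message queued
May  6 05:30:03 host2 postfix/smtpd[46]: connect from unknown
May  6 05:30:04 host3 cron[9]: job started
EOF
awk '
BEGIN {
    # program name -> bucket; unmapped programs fall into "other"
    map["sendmail"] = "mail"; map["postfix/smtpd"] = "mail"
    map["sshd"] = "ssh"
}
{
    # syslog format: "MMM DD HH:MM:SS host program[pid]: message"
    prog = $5
    sub(/\[[0-9]+\]:$/, "", prog)   # strip "[pid]:" suffix
    sub(/:$/, "", prog)             # or a bare trailing colon
    bucket = (prog in map) ? map[prog] : "other"
    print > ("buckets/" bucket)
    count[bucket]++
}
END {
    for (b in count) print count[b], b   # per-bucket message counts
}
' sample.log | sort -rn
```

With the sample above, the mail bucket collects both the sendmail and the
postfix lines, which is the whole point of the program-name-to-bucket
indirection.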

I then run separate processes against each of these buckets to create
summary reports of the information in that bucket. Some of these processes
are home-grown scripts, some are log summary scripts that came with
specific programs.

One of the reports is how many log messages there are in each bucket (this
report is generated by my splitlogs program).

For the 'other' bucket, I have a sed line from hell that filters out
'uninteresting' details in the log messages (timestamps, port numbers, etc.)
and then runs them through sort | uniq -c | sort -rn to produce a report
that shows how many times each message template shows up (the
sed line works hard to collapse similar messages together).
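That normalize-and-count step might look roughly like this; the sed
patterns below are a tiny illustrative subset, not the actual 'sed line
from hell', and the sample input is made up:

```shell
#!/bin/sh
# Normalize away variable details (timestamp, pid, port, IP) so that
# similar messages collapse into one template, then count occurrences,
# most frequent first.
normalize_and_count() {
    sed -E \
        -e 's/^[A-Z][a-z]{2} +[0-9]+ [0-9:]{8} //' \
        -e 's/\[[0-9]+\]/[PID]/g' \
        -e 's/port [0-9]+/port N/g' \
        -e 's/[0-9]+\.[0-9]+\.[0-9]+\.[0-9]+/IP/g' |
    sort | uniq -c | sort -rn
}

# illustrative sample: two sshd failures collapse into one counted line
normalize_and_count > other.report <<'EOF'
May  6 05:30:01 host sshd[123]: Failed password from 10.0.0.1 port 40001
May  6 05:31:09 host sshd[456]: Failed password from 10.0.0.2 port 40250
May  6 05:32:00 host cron[9]: session opened
EOF
cat other.report
```

The real filter accumulates many more substitutions over time; each one
you add collapses another family of near-duplicate messages.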

I then have a handful of scripts that assemble e-mails from these reports
(different e-mails reporting on different things going to different
groups). For a lot of the summaries I don't put the entire report in the
e-mail, but instead just do a head -X (X=20-50 in many cases) to show the
most common items.
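Assembling a mail body from the top of each report could be sketched like
this (the report names, the TOP value, and the commented-out recipient are
hypothetical, not from the post):

```shell
#!/bin/sh
# Trim each summary report to its top N lines and stitch them into one
# mail body, so the nightly message stays small enough to send.
TOP=5
mkdir -p reports
seq 1 100 | sed 's/^/line /' > reports/other.summary   # stand-in report

: > nightly-mail.txt
for report in reports/*.summary; do
    {
        echo "== top $TOP entries: $(basename "$report" .summary) =="
        head -"$TOP" "$report"
        echo
    } >> nightly-mail.txt
done
# then e.g.: mail -s "nightly log summary" secteam@example.com < nightly-mail.txt
cat nightly-mail.txt
```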

For example, I have a report that shows all the websites that were hit by
people on the desktop network. I have another report that shows the hits
by desktop -> website. I generate an e-mail showing the top 50 entries in
each of these reports and send it to the folks looking for unusual
activity on the desktop network (it's amazing how accurately a simple
report like this can pinpoint a problem desktop machine).

Getting this set up takes a bit of time and tuning, but with a bit of
effort you can quickly knock out a LOT of your messages, and then you
start finding interesting things (machines that are misconfigured and
generating errors on a regular basis, etc.). As you fix some of these
problems, the 'other' report goes from an overwhelming tens of thousands of
lines to a much smaller report. Just concentrate on killing the big items
and don't try to deal with the entire report at once (the nightly e-mail
to me shows the top several hundred lines of this report so that I can
work on tuning it; when I can keep up on the tuning, it's not unusual for
this to be the entire report).

With this approach (and a reasonably beefy log reporting machine), it
takes about 3-6 hours to generate the reports (6 hours being the 80G days).

I have other tools watching the logs in real time for known bad things (to
generate alerts), and am installing splunk to let me go searching in the
logs when I find something in the reports that I want to investigate
further (with this sort of log volume, just doing a grep through the logs
can take days).

hope this helps.

David Lang


------------------------------

_______________________________________________
firewall-wizards mailing list
firewall-wizards@listserv.icsalabs.com
https://listserv.icsalabs.com/mailman/listinfo/firewall-wizards


End of firewall-wizards Digest, Vol 37, Issue 6
***********************************************
