One of the cool things about working at Paglo is being able to use the product you are developing to detect problems on your own systems before they become serious failures. This morning Paglo helped me detect a failing disk drive on one of my systems.
How did Paglo help here? Well, we have all of the system logs from these machines captured by Paglo’s Log system. You can see how to set up Paglo to capture logs here: Configure log collection…
NOTE: To access that page you need a Paglo account. A brief summary of the page: for Windows machines, logs are collected automatically once a day, up to the free limit of 10 MB per day. For Unix machines, you need to set up syslog-ng or RSyslog to send logs securely to your company’s Paglo account.
Once you are collecting logs there are lots of things you can do to monitor and inspect the goings-on of your systems. In this blog post we are going to focus on log messages that are related to disk or kernel issues. We want to look for log messages that contain either the words 'kernel' and 'failure', or the words 'root' and 'ZFS'. This may capture more than we want, but we can always refine the query later if it triggers too often.
In Paglo log search you use parentheses, i.e. ‘(’ and ‘)’, to group search clauses, and boolean operators such as AND and OR to join them. That makes our search term: “(kernel failure -user) OR (root ZFS)”.
The operator must be written in upper case, “OR”, so that Paglo can tell the difference between a search for the word ‘or’ and an OR clause. Apart from that, log searches are case insensitive. We have added “-user” to our search to tell Paglo that we want to exclude log messages that have the word “user” in them.
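To make the logic of that search term concrete, here is a minimal Python sketch of what it asks for. This is not Paglo's implementation. It assumes that the terms inside a clause must all be present, that matching is done on whole words, and the function names and sample log lines are made up for illustration.

```python
import re


def words(message):
    """Lower-case a log message and return the set of words in it."""
    return set(re.findall(r"[a-z0-9]+", message.lower()))


def matches(message):
    """Approximate "(kernel failure -user) OR (root ZFS)" for one log line.

    Assumption: terms inside a clause are ANDed and matched as whole words.
    """
    found = words(message)
    # (kernel failure -user): both words present, and 'user' absent
    kernel_clause = {"kernel", "failure"} <= found and "user" not in found
    # (root ZFS): both words present, case ignored
    zfs_clause = {"root", "zfs"} <= found
    return kernel_clause or zfs_clause


# Hypothetical log lines, not taken from a real system
sample_lines = [
    "kernel: ata1: device failure detected",
    "root: ZFS: vdev I/O error",
    "kernel: failure in user process",
    "sshd: accepted password for root",
]

for line in sample_lines:
    print(matches(line), line)
```

Only the first two sample lines match: the third is dropped by the “-user” exclusion, and the fourth mentions ‘root’ without ‘ZFS’.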
Running the search against the logs of our machine with the failing disk drive returns these results:

Turning this into an alert that notifies me via email whenever a matching log message occurs is pretty simple. Go to the “Alerts” app and click the “Create a new alert” link. Then fill in the ‘Generate alert when’ section with our search term, being sure to select the ‘Log search’ type:

I have already created this alert and shared it with the Paglo community. You can find it at: Kernel / ZFS failures
If you use it, be sure to add some destinations to the alert so that Paglo has somewhere to send alert notifications.
Although this does not remove the sinking feeling you get when you need to replace a failed hard drive, it should at least tell you that a drive is beginning to fail before the whole system goes south.


