HomeSection_sub_breakBlogsSection_sub_breakTechnical Blog
Icon_blog_forum_small Analyzing log data with PQL
Icon_person
Chris
Icon_time
07/01/2009 at 11:39
Icon_post
0 comments

The results of log searches can be used together with PQL to perform sophisticated analytics on log data. In a PQL query the FROM clause can be a log query. For example if we were interested in analyzing which user names were most common in failed logins we might use a query like:


  select *
  from %"failed password for" -"invalid user" source::/var/log/auth.log from::"3 days ago"%

This query generates output like:

event_id timestamp length content source source_type host_name host_address
635581 2009-07-01T02:58:35.000000Z 97 Jun 30 19:58:35 kamidake sshd3052: Failed password for paglo from 64.139.30.162 port 55417 ssh2 /var/log/auth.log syslog kamidake.apricot.com
552084 2009-07-01T02:38:33.000000Z 93 Jun 30 19:38:33 kamidake sshd84933: Failed password for paglo from 10.0.1.3 port 52002 ssh2 /var/log/auth.log syslog kamidake.apricot.com
531168 2009-07-01T02:27:09.000000Z 97 Jun 30 19:27:09 kamidake sshd6694: Failed password for paglo from 64.139.30.162 port 65375 ssh2 /var/log/auth.log syslog kamidake.apricot.com

The columns in the output are primarily meta information about the event like its timestamp, source and length. In some cases there will be additional columns based on information extracted from the event (e.g. if it contains key=value formatted information).

We can treat the output of the search just as if it were a SQL table. For example, to count the total number of matching events we could write:


  select count(*)
  from %"failed password for" -"invalid user" source::/var/log/auth.log from::"3 days ago"%

giving:

count(*)
209

In this case we want to find the number of failed logins by user name. The user name is available, within the content of the event message. Using PQL it can be extracted using the EXTRACT clause like this:


  select user
  from %"failed password for" -"invalid user" source::/var/log/auth.log from::"3 days ago"%
  extract "password for (?'user'[^ ]+)"

giving:

user
paglo
paglo
paglo
paglo
root
root

The EXTRACT clause uses a regular expression (in PCRE syntax) to identify the portion of the content to extract as a pseudo-column in the table. Multiple EXTRACT clauses can be used, and multiple fields can be specified in each EXTRACT clause. The extracted field is delimited by the parentheses () and named by the word in quotes. Here is another example:


  select user, source_ip, source_port
  from %"failed password for" -"invalid user" source::/var/log/auth.log from::"3 days ago"%
  extract "password for (?'user'[^ ]+)"
  extract "from (?'source_ip'\d+\.\d+\.\d+\.\d+) port (?'source_port'\d+)"

giving

user source_ip source_port
paglo 64.139.30.162 55417
paglo 10.0.1.3 52002
paglo 64.139.30.162 65375
paglo 10.0.1.3 60944
root 190.2.0.151 49170
root 190.2.0.151 49033

Once the fields are extracted any PQL feature can be applied, such as column formatting, sorting and aggregation. So here is the final query we need to find how often each user name appears in failed logins:


  select user as User, count(*) as Count
  from %"failed password for" -"invalid user" source::/var/log/auth.log from::"3 days ago"%
  extract "password for (?'user'[^ ]+)"
  group by user
  order by 2 desc

and the answer

User Count
root 79
invalid 67
paglo 59
postfix 2
cyrus 1
pgsql 1

As always, if there is a complex query you need to write, we here at Paglo are happy to help so just contact support.

Add a Comment