My previous post showed how PQL can be used with log searches to perform sophisticated analysis. If you are dealing with up to a few hundred thousand log events then queries can complete within a second or so. However with larger log data sets it may take many minutes to perform a PQL query that involves tens of millions of events. This makes these queries impractical to use in dashboards where you want to be able to see and navigate the data more quickly. To support analytics over massive data sets Paglo support “consolidation queries”.
A consolidation query is a PQL statement which executes a PQL query and stores the results back into PQL. This is analogous to the practice in SQL of storing intermediate results into a temporary table. The syntax of consolidation queries follows the form:
consolidate into pql_path (key_field*) pql_select_statement
The consolidation query executes the PQL select statement and stores the results at the point in the tree specified by the PQL path. Optional key fields can be specified to created a nested structure. Consolidation queries run periodically (approximately every 15 minutes). Each time a consolidation query runs history is maintained in the resulting tree so that PQL history statements can be used to see how data has changed over time. As a special case, if one of the columns in the query is named ‘timestamp’ then that column will not be stored, but instead will be used to timestamp the data history for each of the other columns. If the select statement includes a log search then the select statement only considers events which have been indexed since the previous time the consolidation statement was run.
Some examples might make all of this clearer.
To compute the number of log events received with a resolution of 5 minutes a query like this could be used:
select timestamp(round(timestamp(timestamp), '5 minutes')) as timestamp, count(*) as total_volume
from %*%
group by timestamp(round(timestamp(timestamp), '5 minutes'))
Since summing across tens of millions of events can be time consuming we can use a consolidation query to pre-compute the result for us, and also allow us to use PQL’s HISTORY syntax to then chart the result. This query
consolidate into /consolidation/log_volume
select timestamp(round(timestamp(timestamp), '5 minutes')) as timestamp, count(*) as total_volume
from %*%
group by timestamp(round(timestamp(timestamp), '5 minutes'))
computes the log volume and stores it into the single tree node /consolidation/log_volume/total_volume but with timestamped history every 5 minutes. We can then use another PQL query to see the log volume over time:
select sum(total_volume) as vol
from /consolidation/log_volume
history to 'now' from '1 week ago'
This query only takes a fraction of a second regardless of how many log events are involved because it is dealing with the aggregated data, not the raw events any more. To store the log volume by source we need to key on the source field:
consolidate into /consolidation/log_volume/sources
keys (source)
select timestamp(round(timestamp(timestamp), '5 minutes')) as timestamp, source, count(*) as total_volume
from %*%
group by timestamp(round(timestamp(timestamp), '5 minutes')), source
This will create multiple /consolidation/log_volume/sources nodes, each one with source and total_volume sub nodes containing the selected values. Then we can use a query like this to get the history of a source:
select sum(total_volume) as vol
from /consolidation/log_volume/sources[source = 'source::/var/log/messages’]
history to 'now' from '1 week ago'
So consolidation queries can be used to distill massive log data sets down to summary information that can be viewed in real-time.


