Crawler 3.0 Beta Available
Posted by Chris on 12/05/2008 at 08:58

Paglo Crawler 3.0 is now in beta, and we would appreciate it if you would try it out and send us feedback. It has been rebuilt to improve performance and reliability. Because so much has changed, we suggest that you first run it in a test lab or in a new environment that you have not already gathered data from. For example, if you are a consultant, you might use it when you set up Paglo for your next client.

Paglo Crawler 3.0 boasts a number of significant improvements over the current version including:

  • It now runs as a service (this means that a user no longer needs to be logged into the machine for it to run)
  • We have changed the architecture to be more fault-tolerant (similar to the Google Chrome architecture)
  • You can control the Crawler’s resource utilization (for the machine it is running on)
  • It collects more extensive Windows and Unix data (e.g. Windows routing table, disk quotas, and motherboard configuration)
  • You can also now control the Crawler’s configuration directly through your account via the Crawler Application

Download Beta Crawler 3.0

The new multi-process Crawler architecture

Data collection is hard work, especially in today’s complex and extremely heterogeneous IT environments. There are more systems, applications, and users than ever before, and great volumes of data are generated every minute. The data collection techniques of the past are no longer reliable enough for demanding environments. With this in mind, and given that modern computers and servers have more resources to spare, we improved the architecture of the Paglo Crawler.

Paglo Crawler takes advantage of the fact that data collection techniques can run in parallel and puts different plug-ins in separate processes. This means that one misbehaving technique or an unresponsive asset will not affect the Crawler or the other data collection processes that are running. It also means that each process is essentially running in a restrictive sandbox, which helps limit the impact if it stops completely. Each process is also now carefully monitored by a ‘watchdog’ which can give it a swift kick if it needs help getting going again.
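To make the idea concrete, here is a minimal Python sketch of the general pattern: each plug-in runs in its own operating-system process, and a watchdog restarts it if it crashes or hangs. This is not the Crawler's actual code. The names `run_with_watchdog`, `crawl`, and `PluginResult`, the timeout and retry values, and the toy plug-in commands are all assumptions made for illustration.

```python
import subprocess
import sys
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass


@dataclass
class PluginResult:
    name: str
    ok: bool
    attempts: int
    output: str


def run_with_watchdog(name, command, timeout_s=10.0, max_attempts=3):
    """Run one plug-in in its own process; restart it if it hangs or crashes."""
    for attempt in range(1, max_attempts + 1):
        try:
            proc = subprocess.run(
                command, capture_output=True, text=True, timeout=timeout_s
            )
        except subprocess.TimeoutExpired:
            # subprocess.run has already killed the hung child; try again.
            continue
        if proc.returncode == 0:
            return PluginResult(name, True, attempt, proc.stdout.strip())
        # Non-zero exit code: the plug-in crashed, so fall through and retry.
    return PluginResult(name, False, max_attempts, "")


def crawl(plugins, max_processes=4, timeout_s=10.0, max_attempts=3):
    """Supervise several plug-ins at once, at most max_processes at a time."""
    with ThreadPoolExecutor(max_workers=max_processes) as pool:
        futures = {
            name: pool.submit(run_with_watchdog, name, cmd, timeout_s, max_attempts)
            for name, cmd in plugins.items()
        }
        return {name: future.result() for name, future in futures.items()}


if __name__ == "__main__":
    # Hypothetical plug-ins: one healthy, one that crashes, one that hangs.
    demo = {
        "healthy": [sys.executable, "-c", "print('42 interfaces')"],
        "crashing": [sys.executable, "-c", "raise SystemExit(1)"],
        "hanging": [sys.executable, "-c", "import time; time.sleep(60)"],
    }
    for result in crawl(demo, timeout_s=2.0, max_attempts=2).values():
        print(result)
```

In this sketch the hanging and crashing plug-ins are retried and then reported as failed, while the healthy one returns its output regardless of what the others do. The `max_processes` cap is one simple way to trade collection speed against load on the host machine.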

New Crawler Application for online Crawler configuration

You can now configure the Crawler and all of the plug-ins directly through your Web account. Simply log in to your account and click on the word Applications in the left-hand menu. Install the Crawler Application and you are ready to go. Once installed, you can control the data collection process from just about anywhere through the Crawler Application without actually accessing the Crawler directly. You also now have granular control over when the Crawler runs and what data it collects through a powerful Scheduler.

Use the Scheduler to configure the Crawler to gather data from specific machines at custom time intervals. You simply need to select the plug-in that you want to use (e.g. SNMP Interface Statistics), the hosts that you want to scan (e.g. your key switches and routers), and the interval at which you want to gather data from them (e.g. every five minutes). You can also view the Crawler’s activity and the log files that it generates.
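As a rough illustration of what a schedule amounts to, the Python sketch below models each entry as a plug-in, a list of hosts, and an interval, and then works out which scans fall due within a given window. It is only a sketch of the concept, not the Scheduler's real data model. `ScheduleEntry`, `plan_runs`, the host names, and the "Windows Inventory" plug-in name are hypothetical.

```python
import heapq
from dataclasses import dataclass


@dataclass(frozen=True)
class ScheduleEntry:
    plugin: str       # e.g. "SNMP Interface Statistics"
    hosts: tuple      # the hosts to scan with this plug-in
    interval_s: int   # how often to scan them, in seconds

    def __post_init__(self):
        if self.interval_s <= 0:
            raise ValueError("interval_s must be positive")


def plan_runs(entries, horizon_s):
    """List (offset_s, plugin, host) for every scan due before horizon_s."""
    # Min-heap of (next due time, entry index); every entry is first due at 0.
    heap = [(0, index) for index in range(len(entries))]
    heapq.heapify(heap)
    runs = []
    while heap:
        due, index = heapq.heappop(heap)
        if due >= horizon_s:
            continue  # outside the planning window, so do not reschedule
        entry = entries[index]
        for host in entry.hosts:
            runs.append((due, entry.plugin, host))
        heapq.heappush(heap, (due + entry.interval_s, index))
    return runs


if __name__ == "__main__":
    schedule = [
        # Poll key network gear every five minutes.
        ScheduleEntry("SNMP Interface Statistics", ("core-switch", "edge-router"), 300),
        # Hypothetical hourly inventory of one server.
        ScheduleEntry("Windows Inventory", ("fileserver",), 3600),
    ]
    for run in plan_runs(schedule, horizon_s=600):
        print(run)
```

A priority queue keyed on the next due time keeps the plan in time order even when entries use very different intervals.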

Things to keep in mind

Because we have re-architected the Crawler, we strongly suggest that you use it in a test or new environment first, before moving it to an existing account. While it has been extensively tested, every account is different, and we want to make sure that the new data it collects does not affect the data you already have in an existing account.

In addition, because we have moved to a multi-process architecture, the Crawler will use more of the resources on the machine that it is running on. You will also notice in the Task Manager that many more Paglo processes are running simultaneously. However, you can control how many processes it runs directly through the Crawler. Simply pull the Crawler Performance slider to the desired level based on your environment. Also, the credentials that you give to the Crawler so it can gather rich information about your environment still need to be configured directly through the Crawler. This information is never transferred into your Search Index, so it remains in the Crawler.

Let us know what you think and if you have any questions.

Download Beta Crawler 3.0
