The Radix DM Crawler is a program that can be run from any machine at any time, but is generally set to run as a Windows scheduled task on the machine on which Radix DM was originally installed. This program verifies that data stored about the documents in Radix DM matches the actual file details.
For the Radix DM Crawler to function, the machine on which it runs must have a MAPI component installed. For most users, Microsoft Outlook provides this functionality. For servers or other machines which do not have Microsoft Outlook installed, you will need to install Microsoft Exchange Server MAPI Client and Collaboration Data Objects 1.2.1 or equivalent.
The Radix DM Crawler can be run from the command line, as a Windows application, or by Radix DM Administrators from the Radix DM Search Administration tab.
The initial configuration screen will resemble the following:
Create Commandline Shortcut: This command creates a new shortcut to the program (with parameters set as defined by the current window) which can be saved.
Validate Library File Paths: This process works in the reverse direction of a normal Radix DM Crawler operation: it scans the details stored in Radix DM to identify any files that may be missing from the file store. The results appear in the section Progress. This function cannot be called from the command line.
Commandline Parameters Information: Displays a list of command line parameters, which are detailed in the section Running as a Scheduled Task.
Process all Locations and Library Groups: This toggle button switches the process range of Locations and Library Groups from the list of selected values in accompanying drop downs to all possible values.
Process Selected Locations/Library Groups: This toggle button switches the process range of Locations and Library Groups from all possible values to the list of values selected in the accompanying drop downs. If Radix DM is configured to run across a WAN, the Radix DM Crawler should be run on a local server for each physical location. In each of these locations, use this option to select only the library groups that are local to that particular server.
Move files to quarantine: If this option is checked, files that the Radix DM Crawler finds in the document file locations but that do not belong to Radix DM are moved to the Quarantine folder. The default path for this folder is \\SharedNetworkLocation\Programs\RadixDM\Quarantine, as determined by the Radix DM installation. No files are deleted as part of this process.
Locations: This dropdown allows the user to select the Locations which the Radix DM Crawler process will operate on.
Library Groups: This dropdown allows the user to select the Library Groups which the Radix DM Crawler process will operate on.
Process: Click this button to verify that the files and folders in the physical directory store match the data stored in the database, based on the settings selected. In addition, tasks associated with the three tabs will be performed. Results will appear in the text box Progress.
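The two checks described above (the normal Process run and Validate Library File Paths) amount to comparing two sets of paths in opposite directions. The following is a minimal sketch of that idea only, not the Crawler's actual implementation; the function name `reconcile` and the sample paths are hypothetical, and the case-insensitive comparison is an assumption based on how Windows treats file names.

```python
from pathlib import PureWindowsPath


def reconcile(db_paths, disk_paths):
    """Compare paths recorded in the database with paths found on disk.

    Returns (unexpected, missing):
      unexpected - on disk but not in the database (quarantine candidates)
      missing    - in the database but not on disk (path validation findings)
    """
    # Assumption: Windows file names are compared case-insensitively.
    def key(path):
        return str(PureWindowsPath(path)).lower()

    db = {key(p): p for p in db_paths}
    disk = {key(p): p for p in disk_paths}

    unexpected = sorted(disk[k] for k in disk.keys() - db.keys())
    missing = sorted(db[k] for k in db.keys() - disk.keys())
    return unexpected, missing


# Hypothetical sample data for illustration only.
recorded = [r"\\srv\docs\A.docx", r"\\srv\docs\b.pdf"]
on_disk = [r"\\srv\docs\a.docx", r"\\srv\docs\stray.tmp"]
unexpected, missing = reconcile(recorded, on_disk)
```

In this sample, `stray.tmp` is the file that would be eligible for quarantine, and `b.pdf` is the file that a path validation would report as missing.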
If the check box Set Document Titles is checked, then for each designated file it processes the Radix DM Crawler also sets the document property Title to the value of the corresponding Radix DM system field Title. The only extensions supported for this operation are: doc, docx, xls, xlsx, ppt, pptx. To exclude any of these document types, remove them from the Set Document Extensions text box if you are running the program from Windows, or from the value for the key SetDocumentTitleExtensions in the RadixDMCrawler.exe.config file, which is located in the same folder as the program executable. Please note that this operation will fail for documents that are password protected or designated as Read Only.
When the Radix DM Crawler processes the designated files, it also sets the built-in Radix DM system field PageCount to the value of the document's PageCount property for each document whose extension is listed in the section Page Count Extensions. The only extensions supported for this operation are: doc, docx, xls, xlsx, ppt, pptx, pdf. To exclude any of these document types, remove them from the Page Count Extensions text box if you are running the program from Windows, or from the value for the key PageCountExtensions in the RadixDMCrawler.exe.config file, which is located in the same folder as the program executable. Please note that this operation will fail for documents that are password protected.
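For installations that run the Crawler only as a scheduled task, the extension lists have to be edited in RadixDMCrawler.exe.config. The sketch below shows one way to drop an extension from such a key. It assumes the file follows the usual .NET layout (`<appSettings>` containing `<add key="..." value="..."/>` elements) and that the value is a comma-separated list; neither detail is confirmed by this guide, so check the actual file first. The helper name `remove_extension` and the sample XML are hypothetical.

```python
import xml.etree.ElementTree as ET


def remove_extension(config_xml, key, extension):
    """Return the config XML with `extension` removed from the list stored under `key`."""
    root = ET.fromstring(config_xml)
    for add in root.iter("add"):
        if add.get("key") == key:
            # Assumption: the value is a comma-separated list of extensions.
            kept = [
                ext.strip()
                for ext in add.get("value", "").split(",")
                if ext.strip() and ext.strip().lower() != extension.lower()
            ]
            add.set("value", ",".join(kept))
    return ET.tostring(root, encoding="unicode")


# Hypothetical sample of the assumed file layout.
SAMPLE = """<configuration>
  <appSettings>
    <add key="PageCountExtensions" value="doc,docx,xls,xlsx,ppt,pptx,pdf" />
  </appSettings>
</configuration>"""

updated = remove_extension(SAMPLE, "PageCountExtensions", "pdf")
```

The same helper could be applied to the key SetDocumentTitleExtensions. Back up the configuration file before changing it.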
If the check box OCR PDF Files is checked, the Radix DM Crawler attempts to OCR every PDF file it processes that does not contain fonts. If the OCR completes successfully, the original file is moved to the Quarantine\OCR folder (\\SharedNetworkLocation\Programs\RadixDM\Quarantine\OCR) and the processed file replaces it. This allows the original file to be recovered if there are any issues associated with the process.
The DPI used for the OCR can be changed. A lower DPI results in smaller files but less accurate OCR. A higher DPI results in larger files but more accurate OCR. 300 DPI offers a good balance between size and accuracy.
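To see why DPI affects size so strongly, note that the number of pixels in a rendered page grows with the square of the DPI. The short calculation below illustrates this for a US Letter page (8.5 x 11 inches, an assumed page size). It counts raw pixels only; the actual size of the processed file also depends on compression, which this sketch does not model.

```python
def page_pixels(dpi, width_in=8.5, height_in=11.0):
    """Number of pixels in one page rendered at the given DPI."""
    return round(width_in * dpi) * round(height_in * dpi)


for dpi in (150, 300, 600):
    print(dpi, page_pixels(dpi))
# prints:
# 150 2103750
# 300 8415000
# 600 33660000
```

Each doubling of the DPI quadruples the pixel count, which is why moving from 300 to 600 DPI is costly compared with the gain in accuracy.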
The Radix DM Crawler is designed to be run as a scheduled task. Schedule this task to occur at regular intervals, generally at least once per day (in the evenings), to ensure that Radix DM search results are accurate.
The scheduled task can be created with the following details, which are also specified on the tab Command Line Parameters:
Command Line Parameter | Description |
---|---|
/A | If this argument is used, the program runs automatically and closes when it is complete. |
/LOC="xxx" | A comma separated list of Location IDs that the Radix DM Crawler will use. All documents in library groups that belong to these locations will be operated on. |
/LIB="xxx" | A comma separated list of Library Group IDs that the Radix DM Crawler will use. All documents in these library groups will be operated on. If this parameter is omitted, all library groups will be used. |
/SETTITLE | If this argument is used, then the title property will be set for documents in the selected library groups. |
/OCR | If this argument is used, then the text will be OCRed for documents in the selected library groups. This argument is not required if /OCRDPI is used. |
/OCRDPI="xxx" | The DPI of the OCR scanning that will be applied to documents in the selected library groups. |
/MQ | If this argument is used, then documents that appear in the Radix DM document folders that do not belong in Radix DM are moved to Quarantine. |
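
When several scheduled tasks are needed (for example, one per physical location), it can help to generate the argument string rather than type it by hand. The following is a minimal sketch that assembles the parameters from the table above. The helper name `build_crawler_args` and the ID values are hypothetical, and the result still has to be combined with the path to the Crawler executable in the scheduled task definition.

```python
def build_crawler_args(locations=None, library_groups=None, set_title=False,
                       ocr=False, ocr_dpi=None, move_to_quarantine=False):
    """Assemble a Radix DM Crawler argument string for unattended runs."""
    args = ["/A"]  # run automatically and close when complete
    if locations:
        args.append('/LOC="{}"'.format(",".join(str(i) for i in locations)))
    if library_groups:
        args.append('/LIB="{}"'.format(",".join(str(i) for i in library_groups)))
    if set_title:
        args.append("/SETTITLE")
    if ocr_dpi is not None:
        # /OCR is not required when /OCRDPI is supplied.
        args.append('/OCRDPI="{}"'.format(ocr_dpi))
    elif ocr:
        args.append("/OCR")
    if move_to_quarantine:
        args.append("/MQ")
    return " ".join(args)


# Hypothetical Location and Library Group IDs.
print(build_crawler_args(locations=[1, 2], library_groups=[7],
                         set_title=True, ocr_dpi=300, move_to_quarantine=True))
# prints: /A /LOC="1,2" /LIB="7" /SETTITLE /OCRDPI="300" /MQ
```

Omitting `library_groups` leaves out /LIB, which according to the table means all library groups are processed.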