Available in PaperCut MF only.

Set up self-hosted Document Processing

To set up self-hosted Document Processing, you need to:

  1. Determine where to install Document Processing

  2. Install Document Processing

  3. Configure the host location and available languages

  4. Tune Document Processing server performance


Step 1: Determine where to install Document Processing

For smaller environments, it makes sense to install Document Processing alongside the Application Server. In medium to larger environments, though, you can ensure optimum system and Application Server performance by setting up one or more dedicated Document Processing servers that the Application Server can contact.

See the table below for recommendations.

Environment size | Approx. scan jobs per day | Recommended processors* | Recommended installation location | Benefits
Small | 0–50 | 2 | Application Server | Less infrastructure cost. Great for smaller businesses with an occasional Document Processing load.
Medium | 50–200 | 3 | Start on a well-resourced Application Server. Monitor and plan for a separate server on an as-needed basis. | Balances resource use, system performance, and Document Processing performance.
Large | 200+ | 4+ | One or more separate high-performing Document Processing servers | Dedicated resources mean better handling of high scanning load, spikes, and multiple jobs, for example in larger Enterprise or Education environments. Document Processing’s heavy resource requirements don’t interfere with the normal operation of the Application Server.

*Recommended number of available processors for Document Processing to use (to support parallel jobs).
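As a rough illustration, the sizing table can be encoded as data so that an estimated daily scan volume maps to a recommendation. This is a hypothetical sketch, not a PaperCut tool: the names `recommend` and `SizingRecommendation` are invented here, and because the table's ranges share their endpoints, the sketch assumes that 50 jobs per day counts as Small and 200 counts as Medium.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SizingRecommendation:
    size: str            # "Small", "Medium", or "Large"
    min_processors: int  # lower bound from the "Recommended processors" column
    location: str        # summary of the recommended installation location


def recommend(scan_jobs_per_day: int) -> SizingRecommendation:
    """Map an approximate daily scan-job count to the sizing table's row."""
    if scan_jobs_per_day < 0:
        raise ValueError("scan_jobs_per_day must not be negative")
    if scan_jobs_per_day <= 50:   # assumption: boundary value 50 counts as Small
        return SizingRecommendation("Small", 2, "Application Server")
    if scan_jobs_per_day <= 200:  # assumption: boundary value 200 counts as Medium
        return SizingRecommendation(
            "Medium", 3,
            "Well-resourced Application Server; plan a separate server as needed",
        )
    return SizingRecommendation(
        "Large", 4, "One or more dedicated Document Processing servers"
    )


if __name__ == "__main__":
    for jobs in (30, 120, 500):
        print(jobs, recommend(jobs))
```

The table is a starting point only; actual load depends on document size and type, so monitoring remains the better guide.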

Keep in mind that the more storage and processing power available, the better Document Processing performs, so make as much available as you can. For any environment size, we recommend:

  • at least 10 GB available disk space

  • 512 MB available memory

  • running a 64-bit edition of Microsoft Windows.
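A quick pre-installation check against these recommendations might look like the following hypothetical Python sketch. It reads free disk space with `shutil.disk_usage` and the platform with `platform.system()` and `platform.machine()`. The standard library has no portable way to read available memory, so that figure is passed in. The sketch also assumes that 10 GB means 10 × 1024³ bytes and that Document Processing goes on drive C:.

```python
import platform
import shutil

# Thresholds from the recommendations above; "GB" is treated as 1024**3 bytes (an assumption).
MIN_FREE_DISK_BYTES = 10 * 1024**3
MIN_FREE_MEMORY_BYTES = 512 * 1024**2


def check_minimums(free_disk_bytes: int, free_memory_bytes: int,
                   system: str, machine: str) -> list[str]:
    """Return a list of unmet recommendations (empty if all are met)."""
    problems = []
    if free_disk_bytes < MIN_FREE_DISK_BYTES:
        problems.append("less than 10 GB of free disk space")
    if free_memory_bytes < MIN_FREE_MEMORY_BYTES:
        problems.append("less than 512 MB of available memory")
    if system != "Windows":
        problems.append("not running Microsoft Windows")
    elif not machine.upper().endswith("64"):
        # platform.machine() typically reports "AMD64" on 64-bit Windows.
        problems.append("not a 64-bit edition of Windows")
    return problems


if __name__ == "__main__":
    # Assumes Document Processing will be installed on drive C:.
    root = "C:\\" if platform.system() == "Windows" else "/"
    free_disk = shutil.disk_usage(root).free
    # The standard library can't read available memory portably; enter it yourself.
    free_memory = 4 * 1024**3  # hypothetical placeholder value
    print(check_minimums(free_disk, free_memory, platform.system(), platform.machine()))
```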


Step 2: Install Document Processing

  1. Download and install both of the following:

  2. Download the Document Processing (OCR) installer.

  3. On the Document Processing server, run the downloaded installer. The Setup Wizard is displayed.

  4. Follow the prompts during the install.

    • If you intend to scan documents to PDF, ensure that the GhostTrap component is selected for installation.

    • If you intend to scan to DOCX, ensure that the Pandoc component is selected for installation.

    On Windows servers, the installer configures the Windows Firewall.

  5. If you are using a firewall other than Windows Firewall, open inbound port 9181 to allow connections from the PaperCut MF Application Server.

  6. Repeat the process for each Document Processing server you wish to add.
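To confirm that the Application Server can reach each Document Processing server, you can attempt a TCP connection to port 9181 from the Application Server. The following is a minimal sketch using only Python's standard `socket` module. It assumes the service listens on TCP, and it only checks that the port accepts connections, not that Document Processing itself is healthy.

```python
import socket

DOCUMENT_PROCESSING_PORT = 9181  # inbound port from the installation steps


def can_reach(host: str, port: int = DOCUMENT_PROCESSING_PORT, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # covers refused connections, timeouts, and DNS failures
        return False


if __name__ == "__main__":
    import sys

    for host in sys.argv[1:]:
        status = "reachable" if can_reach(host) else "NOT reachable"
        print(f"{host}:{DOCUMENT_PROCESSING_PORT} is {status}")
```

Run it on the Application Server with one or more Document Processing hostnames as arguments; the script name and hostnames you use are your own.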

Step 3: Configure the host location and available languages

  1. In the PaperCut MF Admin web interface, do one of the following:

    • If you’re already on the Capture page, refresh the page.

    • Otherwise, click Options > Capture. The Capture page is displayed.

  2. In the Hosting area, select Use self-hosted Document Processing (requires additional setup).

  3. In the Add Document Processing Server area, in Hostname, type the hostname or the IP address of the server where you installed Document Processing.

    NOTE

    We recommend that you use the server hostname. Only use the server IP address if it’s static.

  4. Click Add.

  5. If you want to set up multiple Document Processing servers, click Add new Document Processing Server, then repeat steps 3 and 4.

    Each Document Processing server is listed on the Capture page.

  6. Click Apply.

  7. Ensure that your scan actions have been configured with the desired Document Processing options enabled.

  8. Run a test job for each configured Document Processing option and check the output files.
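Before entering servers in the Admin web interface, you might review your planned list against the hostname recommendation in the note above. This hypothetical helper flags entries that are IP address literals (so you can confirm they are static) and entries listed more than once. It uses Python's `ipaddress` module and doesn't interact with PaperCut MF; the function names are invented for illustration.

```python
import ipaddress


def classify_server_entry(entry: str) -> str:
    """Return 'ip' if entry is an IPv4/IPv6 literal, otherwise 'hostname'."""
    try:
        ipaddress.ip_address(entry.strip())
        return "ip"
    except ValueError:
        return "hostname"


def review_entries(entries: list[str]) -> list[str]:
    """Produce reminders for a planned list of Document Processing servers."""
    notes = []
    seen = set()
    for raw in entries:
        entry = raw.strip()
        key = entry.lower()  # hostnames are case-insensitive
        if key in seen:
            notes.append(f"{entry}: listed more than once")
            continue
        seen.add(key)
        if classify_server_entry(entry) == "ip":
            notes.append(f"{entry}: IP address; confirm it is static")
    return notes


if __name__ == "__main__":
    print(review_entries(["ocr01.example.com", "10.0.0.5", "OCR01.example.com"]))
```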

Step 4: Tune Document Processing server performance

The approach to tuning a Document Processing server's performance depends on whether it's on a standalone system or co-located with other services.

By default, a Document Processing server processes two jobs in parallel at normal CPU priority. You can change the number of parallel jobs by editing the configuration file at [ocr-server-path]/data/config/config.toml, as described below.

After changing the config file, restart the PaperCut OCR Server Windows service.
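If you prefer to script the restart, one option is to call PowerShell's `Restart-Service` cmdlet from Python. This sketch assumes that "PaperCut OCR Server" is the service's display name as shown in the Services console, and that it runs in an elevated (administrator) session on the Document Processing server.

```python
import subprocess

# Assumption: "PaperCut OCR Server" is the display name shown in the Services console.
SERVICE_DISPLAY_NAME = "PaperCut OCR Server"


def restart_command(display_name: str = SERVICE_DISPLAY_NAME) -> list[str]:
    """Build a PowerShell command line that restarts a service by display name."""
    # Single quotes make PowerShell treat the name as a literal string;
    # names that themselves contain a single quote are not handled here.
    return [
        "powershell.exe",
        "-NoProfile",
        "-Command",
        f"Restart-Service -DisplayName '{display_name}'",
    ]


def restart_service(display_name: str = SERVICE_DISPLAY_NAME) -> None:
    """Run the restart; requires Windows and an elevated session."""
    subprocess.run(restart_command(display_name), check=True)
```

Call `restart_service()` after saving config.toml; `check=True` raises `subprocess.CalledProcessError` if PowerShell reports a failure.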

Tuning for installation on a standalone system

For best performance when installing the Document Processing server on a standalone system, it's a good idea to maximize the number of jobs that can be processed in parallel.

The ideal number to use depends on many factors, such as the type and size of the documents being processed and the system architecture. A reasonable starting point is to use the total number of virtual CPUs (or cores times threads on a “bare metal” system) minus two.

Put another way, if you want to process four jobs in parallel and you're installing Document Processing on a virtual machine, give it six virtual CPUs.
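The rule of thumb above is simple arithmetic: MaxJobsInParallel = total vCPUs − 2, or equivalently vCPUs = desired parallel jobs + 2. The sketch below expresses both directions. `os.cpu_count()` reports logical CPUs (cores × threads), and the floor of one job on very small machines is an assumption, not documented guidance.

```python
import os

RESERVED_CPUS = 2  # "total number of virtual CPUs ... minus two"


def suggested_max_jobs(total_vcpus: int) -> int:
    """Starting value for MaxJobsInParallel on a standalone server."""
    # Assumption: never suggest fewer than one job, even on tiny machines.
    return max(1, total_vcpus - RESERVED_CPUS)


def vcpus_needed(parallel_jobs: int) -> int:
    """Inverse: how many vCPUs to give a VM that should run parallel_jobs jobs."""
    return parallel_jobs + RESERVED_CPUS


if __name__ == "__main__":
    cpus = os.cpu_count() or 1  # cpu_count() can return None
    print(f"{cpus} logical CPUs: start with MaxJobsInParallel = {suggested_max_jobs(cpus)}")
```

For the example above, `vcpus_needed(4)` gives 6 and `suggested_max_jobs(6)` gives 4.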

To make this change:

  1. In the config.toml file, remove the # at the start of the MaxJobsInParallel line to uncomment the option and make it active.

  2. Set the line to the number of parallel jobs you want, for example: MaxJobsInParallel = 4

  3. Restart the Windows service: PaperCut OCR Server
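The two config.toml edits (uncomment the line and set its value) can also be scripted. The following hypothetical Python sketch rewrites the first `MaxJobsInParallel` line, commented or not, and saves a `.bak` copy first. It assumes the file contains a single such line in the `MaxJobsInParallel = value` form described above, and that the file is UTF-8.

```python
import re
from pathlib import Path

# Matches a MaxJobsInParallel line, whether or not it is commented out with a leading "#".
_MAX_JOBS_LINE = re.compile(r"^[ \t]*#?[ \t]*MaxJobsInParallel[ \t]*=.*$", re.MULTILINE)


def set_max_jobs(config_text: str, jobs: int) -> str:
    """Return config_text with the first MaxJobsInParallel line uncommented and set to jobs."""
    if jobs < 1:
        raise ValueError("jobs must be at least 1")
    new_text, count = _MAX_JOBS_LINE.subn(f"MaxJobsInParallel = {jobs}", config_text, count=1)
    if count == 0:
        raise ValueError("no MaxJobsInParallel line found")
    return new_text


def update_config_file(path: Path, jobs: int) -> None:
    """Apply set_max_jobs to a file on disk, keeping a .bak copy of the original."""
    original = path.read_text(encoding="utf-8")  # universal newlines: \r\n becomes \n
    path.with_suffix(path.suffix + ".bak").write_text(original, encoding="utf-8")
    path.write_text(set_max_jobs(original, jobs), encoding="utf-8")
```

You would still restart the PaperCut OCR Server service afterwards for the change to take effect.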

Tuning for co-location with the Application Server

NOTE

For medium to large environments, we do not recommend this approach; see the table above. Document Processing’s heavy resource requirements can interfere with the normal operation of the Application Server.

If your system has additional available processors (beyond what the Application Server is using), you might want to consider increasing the number of jobs that are processed in parallel from the default of two.

To make this change:

  1. In the config.toml file, remove the # at the start of the MaxJobsInParallel line to uncomment the option and make it active.

  2. Set the MaxJobsInParallel line to MaxJobsInParallel = 3

  3. Restart the Windows service: PaperCut OCR Server