Document Scanning Glossary

The scanning and capture industry is full of acronyms, buzzwords, and inconsistent terminology. We’ve cut through the noise to give you clear, reliable definitions—so you can focus on choosing the right solution, not deciphering the lingo.

Advanced Document Separation

Advanced document separation breaks out individual pages or groups of pages into their own distinct image files. This process may use recognition technology, artificial intelligence, or machine learning tools to detect page context, intelligently breaking up jobs with little to no manual intervention.

After Scan Correction

A proprietary feature found in PaperStream Capture Pro that lets the user adjust image quality on documents that have already been scanned, without rescanning them. After Scan Correction provides several enhanced versions of a document to choose from, which can be helpful when working with hard-to-read originals or when rescanning is too time-consuming, too costly, or simply impossible.

Agentic AI

An AI workflow that works toward a goal through a repeating, iterative loop. AI agents can be chained together to form a multi-agent workflow, such as a business process.

Artificial Intelligence (AI)

A branch of computer science that enables machines to perform tasks that typically require human intelligence, such as recognizing patterns, interpreting information, or making decisions. This can be used for advanced document processing, character recognition, and streamlining complex workflows.

Autofill

A software feature that automatically completes similar form fields with previously entered data. For example, if a user enters their name in the name field of a form, that name will automatically populate in the name field of a different form.

Automation

The use of technology to perform work with minimal human intervention, allowing humans to focus on more complex tasks. In document workflows, automation helps streamline repetitive processes such as filing, tagging, and routing.

Batch Scanning

Scanning multiple documents in one continuous process. Batch scanning is most easily done with an Automatic Document Feeder (ADF) scanner. The maximum possible batch size will be determined by the size of the ADF hopper or the capabilities of the scanning software.

Business Process Automation (BPA)

The broader strategy of using technology to automate complex business workflows, helping to increase efficiency, reduce errors, and improve productivity. This often involves multiple departments or systems to handle customer orders, process invoices, or onboard new employees, freeing up employees to focus on more strategic work.

Business Rules Engine (BRE)

A component that applies decision logic in an automated workflow after a document is captured. For example, a BRE might automatically send invoices totaling over $10,000 to a manager for approval.
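
The invoice example above can be sketched as a single rule. This is a minimal illustration, not how any particular BRE product works: the Invoice type, the route_invoice function, and the queue names are hypothetical, and only the $10,000 threshold comes from the example.

```python
from dataclasses import dataclass
from decimal import Decimal

# Threshold taken from the glossary example; a real engine would likely
# load rules like this from configuration rather than hard-code them.
APPROVAL_THRESHOLD = Decimal("10000")


@dataclass(frozen=True)
class Invoice:
    """Hypothetical captured invoice; the fields are illustrative only."""
    number: str
    total: Decimal


def route_invoice(invoice: Invoice) -> str:
    """Return the name of the (hypothetical) queue the invoice goes to."""
    if invoice.total > APPROVAL_THRESHOLD:
        return "manager_approval"
    return "auto_process"
```

Because the rule says "over $10,000", an invoice for exactly $10,000 is not escalated in this sketch.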

Closed-Loop Feedback Mechanism

A process that captures human corrections and feeds them back into the system to improve classification/extraction models in scanning workflows.

Cognitive Capture

The use of artificial intelligence (AI) and machine learning (ML) to automatically extract, classify, and validate information from various types of documents. Cognitive capture applies language understanding and context recognition to interpret a document’s meaning, extract useful data such as dates, names, and line items, and send that data to the right automated systems.

Compression

Compression refers to reducing the digital storage space that digital images, such as scanned documents, require. More compressed images take up less space but may also lose more of the detail from the original, uncompressed version, depending on the type of file and compression used.

Computer Vision

A field of AI that trains computers to “see” and process visual images in the natural world, such as scanned forms or handwritten notes. It supports intelligent document capture and validation processes.

Content Enrichment

The process of adding value to scanned documents by improving, expanding, and organizing them with additional information to make them more complete, accurate, and useful for both humans and machines.

Crop

Removing unwanted parts of an image by changing its size or shape to focus on a particular part. While all kinds of editing processes can be used to adjust an image, cropping is often the simplest. A cropped image will not necessarily retain its uncropped version’s aspect ratio (relative length and width).
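
The point about aspect ratio can be shown with a small sketch. It assumes an image is simply a list of pixel rows; the crop and aspect_ratio helpers are hypothetical and stand in for whatever an imaging library would provide.

```python
from fractions import Fraction


def crop(pixels, left, top, width, height):
    """Return the width x height region whose top-left corner is (left, top)."""
    if left < 0 or top < 0 or width <= 0 or height <= 0:
        raise ValueError("crop box must have a non-negative origin and positive size")
    if top + height > len(pixels) or left + width > len(pixels[0]):
        raise ValueError("crop box extends past the image")
    return [row[left:left + width] for row in pixels[top:top + height]]


def aspect_ratio(pixels):
    """Width divided by height, as an exact fraction."""
    return Fraction(len(pixels[0]), len(pixels))


# A 6-wide by 4-tall placeholder image; each pixel is just its (x, y) position.
image = [[(x, y) for x in range(6)] for y in range(4)]
cropped = crop(image, left=1, top=0, width=2, height=4)
```

Here the original image has an aspect ratio of 3:2, while the 2-by-4 crop has a ratio of 1:2.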

Database

An organized collection of data housed within on-premises storage or in the cloud. Databases usually contain structured data, which follows a standardized format that can be easily marked up with schema to make it machine-readable. Databases can also contain unstructured data, like photos or audio files, but this information is much more difficult to process.

Desktop Scanner

A small device designed to convert physical documents into digital files. Desktop scanners are highly valued in home offices or small business environments. Their compact size and versatility make them an excellent fit for anyone worried about their scanner taking up too much space.

Digital Mailroom Automation

A single point of intake for all the external information that flows into an organization. Whether the documents it receives are digital or physical, a digital mailroom ensures inbound information is processed, categorized, and sent to the correct recipients. This includes the digitization of physical mail.

Digital Transformation

Moving an organization from a system of “analog” resources and processes to more efficient digital counterparts. Digital transformation allows organizations to access their information and act on it faster and more accurately, which may give them an edge over their competition.

Digitalization

Converting an analog process into a digital one. Consider an organization that requires employees to submit a paper form to a physical inbox to request paid time off. Digitalizing that process could mean replacing it with an electronic one employees can complete from their laptops or phones.

Digitization

Converting physical assets into digital ones. For instance, an organization could use a scanner to digitize years’ worth of paper records. The resulting digital files can be stored on premises, in the cloud, or a hybrid of the two.

Document

A physical or digital file that stores written information. Physical documents may also include images, while digital documents may feature a range of multimedia such as images, sound, and video files.

Document Classification

Using an application to sort, categorize, and organize documents based on their content as part of a scanning workflow. Algorithms analyze files like emails, invoices, contracts, or forms to determine their type and route them to the correct destination.
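
As a rough illustration of "determine the type, then route", the sketch below scores text against keyword lists. It is deliberately simplistic: production classifiers are typically trained models, and the keyword lists, document types, and destination names here are all hypothetical.

```python
# Hypothetical keyword lists for two document types.
KEYWORDS = {
    "invoice": ("invoice", "amount due", "bill to"),
    "contract": ("agreement", "party", "hereby"),
}

# Hypothetical destinations for each document type.
DESTINATIONS = {
    "invoice": "accounts_payable",
    "contract": "legal",
    "unknown": "manual_review",
}


def classify(text: str) -> str:
    """Return the type whose keywords appear most often, or "unknown" if none match."""
    lowered = text.lower()
    scores = {
        doc_type: sum(lowered.count(word) for word in words)
        for doc_type, words in KEYWORDS.items()
    }
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else "unknown"


def route(text: str) -> str:
    """Map the classified type to its destination."""
    return DESTINATIONS[classify(text)]
```

Documents that match no keywords fall through to manual review rather than being guessed at.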

Document Lifecycle Automation

End-to-end handling of a document through its entire lifespan with minimal human intervention, from capture (scan) through retention or deletion, including triggers, routing, and audit trails.

Document Scanner

A device that uses imaging technology to create a digital replica of a document, optimize image quality, extract relevant data, and initiate digitization workflows. Document scanners come in a variety of formats and capabilities, from compact scanners to production scanners, and are key to successfully automating tedious manual processes.

Dots Per Inch (DPI)

A measure of detail with which a device can scan or print, such as 300 DPI. Higher dots per inch means more detailed images, since more points are scanned or printed in the same amount of space. Higher DPI images also tend to take more space to store digitally.
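
The link between DPI, image detail, and storage can be worked out with simple arithmetic. The sketch below assumes a US Letter page (8.5 by 11 inches) and uncompressed 24-bit color (3 bytes per pixel); the function names are illustrative.

```python
def pixel_dimensions(width_in: float, height_in: float, dpi: int) -> tuple[int, int]:
    """Pixel width and height of a page scanned at the given DPI."""
    return round(width_in * dpi), round(height_in * dpi)


def uncompressed_bytes(
    width_in: float, height_in: float, dpi: int, bytes_per_pixel: int = 3
) -> int:
    """Raw size before compression; 3 bytes per pixel assumes 24-bit color."""
    width_px, height_px = pixel_dimensions(width_in, height_in, dpi)
    return width_px * height_px * bytes_per_pixel
```

Under these assumptions, a Letter page at 300 DPI is 2550 by 3300 pixels, roughly 25 MB before compression. Doubling the resolution to 600 DPI quadruples the pixel count and therefore the raw size.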

Driver

A program that allows a computer to communicate with a peripheral, such as a scanner or printer. Some common device drivers come pre-installed on modern operating systems, while others must be downloaded from the device’s manufacturer and installed. Also known as “device driver.”

Duplex Scanning

Scanning both sides of a page in a single pass. Duplex scanners automate two-sided scanning, eliminating the extra time, complexity, and potential for errors of doing it manually. With a non-duplex scanner, digitizing a two-sided document requires scanning it once, removing and reorienting it, and then scanning it again to capture the other side.

Electronic Records Management

A system that controls the creation and use of digital records. Electronic records management (ERM) platforms also centralize numerous processes tied to how organizations interact with their records. They can handle document storage, data security, user access, and legal compliance, helping to keep records secure with minimal lift.

Envelope Separation

Automatic separation and sorting of scanned content each time an envelope is detected. Once an envelope and its contents are scanned, they go into a single document file. Ideal for mailrooms, this function was developed for PaperStream Capture Pro.

Exception Management Automation

Using automated systems to identify and resolve anomalies in an automated workflow, or automatically route them for human review.
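
One common way to decide what counts as an anomaly is a confidence cutoff on extracted data. The sketch below assumes each extracted field carries a confidence score between 0.0 and 1.0; the ExtractedField type, the 0.90 cutoff, and the queue names are hypothetical.

```python
from dataclasses import dataclass

# Hypothetical cutoff; a real workflow would tune this per field or document type.
MIN_CONFIDENCE = 0.90


@dataclass(frozen=True)
class ExtractedField:
    """Hypothetical extraction result with a 0.0-1.0 confidence score."""
    name: str
    value: str
    confidence: float


def find_exceptions(fields: list[ExtractedField]) -> list[ExtractedField]:
    """Return fields that are empty or fall below the confidence cutoff."""
    return [f for f in fields if not f.value.strip() or f.confidence < MIN_CONFIDENCE]


def route_document(fields: list[ExtractedField]) -> str:
    """Send clean documents straight through; route the rest to a person."""
    return "human_review" if find_exceptions(fields) else "straight_through"
```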

Extraction

The process of taking important data found in digitized documents and putting it into business-critical software. For example, a user would scan their patient records, and intelligent data extraction would automatically take elements like the patient’s name, address, and medical history and insert them into the hospital database for easier, long-term access.

Field

An individual element within a form designed to capture data. Forms contain numerous fields that gather names, dates, comments, and other necessary information. They can be customized to permit or restrict certain types of data; for example, a “price” field could be set to only capture number values and send an error if the user enters any other type of data. Once the user submits a form, details captured within fields are stored within a database as structured data.
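
The "price" example can be sketched as a small validation function. This is an illustration only: the parse_price name and the decision to reject negative values are assumptions, not part of any specific form tool.

```python
from decimal import Decimal, InvalidOperation


def parse_price(raw: str) -> Decimal:
    """Return the entered price as a Decimal, or raise ValueError for invalid input."""
    try:
        value = Decimal(raw.strip())
    except InvalidOperation:
        raise ValueError(f"price must be a number, got {raw!r}") from None
    # Reject NaN, infinities, and negative amounts (an assumption of this sketch).
    if not value.is_finite() or value < 0:
        raise ValueError(f"price must be a non-negative number, got {raw!r}")
    return value
```

A form would call a check like this when the user submits, show the error message next to the field on failure, and store the parsed value as structured data on success.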

Form Template Builder

A tool that simplifies the process of creating online forms. Instead of hand-coding individual form elements and structure, users work from pre-built templates and easy-to-use customization options that let them capture customer data, gather event registrations, process payments, and more within minutes.

High Volume Scanner

A type of document scanner built to support organizations with large daily workloads. Most high volume scanners are built to have hoppers that can hold hundreds of documents and scanning speeds that can digitize a large quantity of files in minutes.

Image Enhancement

A technology that improves clarity and recognition of scanned images. It includes a suite of capabilities that can remove background noise and image artifacts. It can also enhance, remove, or fix imperfections found on original hard copies, like wrinkles, faded text, or other noise. The end result is a smaller digital file that is clearer and easier to read, giving the user more accurate, actionable information that they can use more effectively.

Integration API [for Capture Workflows]

Application programming interfaces that allow digital document workflow systems to feed data into additional systems downstream, as part of the automation.

Intelligent Document Processing (IDP)

Entrusts rote, repetitive data management tasks to sophisticated computer algorithms. This frees up employees to do more interesting, demanding work instead. IDP can also help reduce transcription errors and save money in the long run.

Job

A collection of user-defined settings that determine how a scanner will process a stack of documents once the user presses the scan button. Users can set whether a job will process documents in color or black and white, increase or decrease the image resolution, choose paper sizing options, scan one or both sides of a document, or even apply After Scan Correction.

Machine Learning (ML)

A subset of AI focused on algorithms that learn from data and improve their performance over time without being explicitly programmed. ML powers intelligent document processing, data extraction, and classification tools used in digital workflows.

Metadata

The information contained within the file of a digital scan that describes critical details about it. Metadata can reveal a lot about a particular file: when it was scanned, who scanned it, and where it was scanned, along with any additional tags that can be used to organize the file and surface it during a digital search.

Natural Language Processing (NLP)

A field of AI that enables machines to understand, interpret, and respond to human language. NLP is used in intelligent document processing to extract meaning from text-based documents and emails in automated workflows.

Network Scanner

A type of scanner that connects to a personal or corporate network and handles data extraction, processing, and delivery of scanned images over that network, in some cases without the need for a PC.

Neural Network

A type of machine learning model inspired by the structure of the human brain. Neural networks are often used in document analysis to identify patterns in text, handwriting, or layout.

Operational Intelligence Dashboard

A real-time interface showing KPIs of document-centric workflows to support automation optimization. Examples include the number of documents processed, error rate, exceptions handled, etc.

Optical Character Recognition (OCR)

Technology that digitizes text to make it searchable, indexable, and even editable. OCR allows users to convert printed text into machine-readable, editable data once the printed text has been scanned. It is a component of document digitization and the first step in automating information workflows.

Page Separation

The process during a scan that separates pages within a single job into multiple distinct files. Page separation processes use simplified rules, either placing each page into its own file or identifying basic markers, such as blank pages or separator sheets with barcodes, to separate page groupings. PaperStream Capture Pro can use file identification for this, too.
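
The blank-page rule described above can be sketched as follows. Pages are represented by placeholder strings, and the separate_pages function and its is_separator callback are hypothetical; real software would inspect the page image or read a barcode instead.

```python
def separate_pages(pages, is_separator):
    """Split a scanned job into groups, starting a new group at each separator page.

    Separator pages themselves (for example blank pages) are dropped.
    """
    groups, current = [], []
    for page in pages:
        if is_separator(page):
            # Close the current group, ignoring back-to-back separators.
            if current:
                groups.append(current)
            current = []
        else:
            current.append(page)
    if current:
        groups.append(current)
    return groups


# Stand-in job: an empty string plays the role of a blank page.
job = ["cover", "page 2", "", "form A", "", "", "letter", "page 2"]
documents = separate_pages(job, is_separator=lambda page: page == "")
```

With this input the job splits into three groups, each of which would be written to its own file. The simpler "one page per file" rule needs no separator detection at all.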

Planetary Scanner

A device purpose-built for scanning rare, old, or oversized documents, books, or files. The document is placed on a flat surface, and a mounted camera takes a scan from above. These devices are often found in museums and other archival facilities that need to make digital copies without damaging the original.

Production Scanner

A type of high volume scanner designed for heavy-duty use. Some production scanners can hold up to 750 pages — 15 times more pieces of paper than standard document scanners. Large enterprises and industries that rely on high volumes of physical documents can leverage their capable feature set.

Profile

A collection of scanner image enhancement settings that determine how software processes scanned documents based on the type of file scanned. Profiles are immensely helpful for scanning standardized forms quickly and with minimal fuss. For example, if a user needs to scan a W-9 tax form, all they need to do is select the W-9 profile, and the scanner will take care of the rest.

Pull Scanning

A scanning process initiated at the target computer. The computer “pulls” the digitized documents from the scanner and stores them on its hard drive using communication standards like TWAIN.

Push Scanning

A scanning process initiated at the scanner. The user selects the destination point from the scanner, whether that’s a computer on the network or the cloud. The scanner then “pushes” the digitized documents to the chosen destination.

Quality Control (QC)

The act of checking scanned files to ensure final files are named and organized correctly, are free from physical blemishes and inaccuracies, and meet applicable compliance and legal requirements. Quality control relies on a series of manual and automated checks to verify that the final scan is as close to the original physical document as possible.

Recognition Technology

A suite of tools built to recognize patterns of symbols (usually letters and numbers) to make content found on physical documents usable within a digital environment. Includes tools like optical character recognition, form recognition technology, intelligent character recognition, and barcode recognition, which make scans of different forms of typed and handwritten text editable, extractable, and efficiently deliverable.

Robotic Process Automation (RPA) [for Document Workflows]

Uses software robots to automate repetitive, rule-based tasks typically done by humans, such as data entry, data reconciliation, and file transfers.

Tagging

The act of updating a file’s metadata with keywords that improve accessibility within document management systems. Tools like PaperStream Capture Pro feature advanced tagging capabilities that enhance workflows by making documents searchable and organized with minimal lift.

WIA

A Microsoft driver model and API that standardizes scanning processes running on Microsoft Windows. Short for Windows Image Acquisition, WIA operates at the layer between the scanning hardware and the application. This means that users can expect consistent basic functionality and results regardless of the specific equipment they’re using.

Work Orchestration Engine

A software application that automates, manages, and executes a series of tasks in a specific order across different systems to achieve a business or technical goal. The software component triggers, routes, and monitors automated tasks (including document scanning triggers) in a given process.