Is it time for a new DMS?

November 3, 2024 @ 9:08 pm

I’ve been using PaperPort Professional as my document management system (DMS) since I first started scanning documents (instead of keeping a file cabinet full of paper) back in 1999. Over those past 25 (!!) years I’ve used multiple versions, gone through multiple ownership changes, and dealt with technical issues, but I’ve stuck with it. The “current” version, 14.7, was released in 2019, but that was really just a minor facelift to add the Kofax branding to v14.5 which was released in 2015 (and since then Kofax has been renamed to Tungsten Automation). That’s right, the latest major release of this software is almost 10 years old.

It’s odd that a technophile like myself wouldn’t have looked for something newer and shiny by now but PaperPort still works and I’ve built my document storage and archive folder organization around it for (it’s weird saying it this way) a quarter of a century. But the end of 2024 is approaching and traditionally over the holiday break I’ve done some sort of tech project (this past year was switching registrars, the year before that was leaving LastPass for Vaultwarden). Doing a little planning ahead, I’m thinking maybe it’s time for a DMS switch.

Taking a hard look at my PaperPort usage, I’m realizing that I really just use it for scanning and minor PDF edits to the resulting scanned files (combining pages, cropping, making notes/annotations, etc.). Because its built-in All-in-One search is so bad/outdated, I’ve never used it for searching/indexing. My folder organization (which is at the OS-level on my NAS, not something specific to PaperPort) helps with finding documents at a high level. If I do need to find a document, I can typically find it myself because either a) I know its location (a 2023 bank statement, for example) or I can search for the filename (typically by using the ‘find’ command in Linux). This works for me 90% of the time, but I have to admit having OCR‘ed full-text search capabilities could be nice. My wife is relatively familiar with the folder structure setup, but having a better, more user-friendly way to search for archived files couldn’t be a bad thing.

My initial searching led me to FileCenter, which on the surface seems almost like a direct PaperPort replacement: an all-in-one piece of (Windows) software that handles the three components I need: scanning, editing, and organization. It’d cost $200 for the Professional Edition which includes the OCR and editing features that I require. I played around with the free trial and it’s pretty much just like PaperPort, only a little more modern and maintained. The more I tinkered with it made me realize that if I wasn’t going to make big changes to how I work with my documents, it probably wasn’t worth $200 … I could just keep using PaperPort until it finally doesn’t work anymore. And rather than trying to find the all-in-one replacement, maybe splitting those three components makes more sense.

Scanning

For scanning, I could just use the utility that came with my Canon MF455dw multi-function printer, but instead my searches led me to NAPS2 (Not Another PDF Scanner). It’s a free, cross-platform piece of open-source software that can use WIA or TWAIN drivers, has some basic editing features (crop, rotate, etc.), and built-in OCR. It worked perfectly with my scanner, all I did was tweak some of the scanning profiles to match what I had in PaperPort (black-and-white document, grayscale document, color document, etc.).

Editing

My PDF editing typically consists of either highlighting or annotations (notes), both of which can be done in Adobe’s Acrobat Reader (or most free PDF readers). However, in PaperPort, I also do a lot of “stacking” (combining PDFs and then re-ordering the pages). For example, when I pay a medical bill online, I will combine the scan of the bill I received in the physical mail with the e-mail receipt showing I paid it, that way both items are in a single document. Adobe Acrobat (Standard or Pro) seems too expensive for my meager needs and there are free online PDF merging sites, but I don’t trust uploading my sensitive scanned documents to an unknown cloud service. I found some free open-source projects like PDFTK Builder Enhanced and pdfarranger which worked fine, but they only covered the stacking piece of my requirement and not the highlighting or notes. Then I found PDFgear: “all-in-one PDF Software for Windows … read, edit, convert, merge, and sign PDF files for completely free and without signing up.” It seemed to be good to be true, but third-party reviews seemed to support the claims. After giving it a quick test drive, I was pretty impressed: I can add, delete, and re-order pages in a PDF, and add highlighting and notes (and more!) in a single tool. I’ve even removed Acrobat Reader from my PC and am now using PDFgear as my primary PDF file viewer.

Organization

So now that I have potential new ways of scanning and editing my documents, what about the meat of any DMS: how to organize everything so things can be found and retrieved when needed? My research pointed me to paperless-ngx (although later I also found Docspell, which is similar). paperless-ngx is “a community-supported open-source document management system that transforms your physical documents into a searchable online archive so you can keep, well, less paper.” Sounds pretty much what I’m looking for. It can be installed using Docker, which means it fits in perfectly to my existing Synology setup. Using one of the sample docker compose files, I easily had it up and running on the NAS in short order and started learning my way around.

Ok, wow. This is going to take a little work to get used to and figure out. But my biggest hurdle initially is how paperless handles documents. It “consumes” them and makes a copy of the original in its own (by default, flat) folder structure. The way it’s designed, you’re not really meant to be browsing the folder structure directly like I do now. You’re supposed to do everything through the paperless GUI. This is probably fine if you’re starting from scratch, but with 25 years of folders and documents meticulously organized, I’m struggling with how to replicate that in paperless and trying to decide if I even should. There are ways to specify a folder structure and file naming convention for paperless to use with consumed documents, but it’s going to take some effort on my part to see if I can architect a new folder structure that makes sense outside of paperless. Not to mention the extra work of adding correspondents, document types, and tags.

I also tend to put non-document files in my archive folders. These could be images or videos (like photos related to an insurance claim), or my various diagrams made in draw.io which are saved in XML format. Supported image types are converted into PDF documents, and with the Tika/Gotenberg option, paperless will convert Office documents and e-mails into searchable PDFs for its database. But if a file isn’t one of those types, you can’t put it into paperless (and it will ignore these files in the consume directory). Granted these aren’t the types of files I’m typically searching for, but again the inertia of how I’m used to organizing my files and folders is tough to overcome.

This is definitely going to be a bigger project than what I could complete over the holiday break, so I’m glad I’ve taken the first steps now. I have a chance to “start clean” with my 2025 documents, but still need to figure out how I want paperless to consume all of my historical stuff so that a) it still makes organizational sense and b) minimizes the amount of duplicate copies of all these files and folders I have on the NAS and in my backups.

More to come!

chmod 644

World Readable: a personal blog about anything that comes to mind for anybody who cares to read it.

Is it time for a new DMS?

Scanning

Editing

Organization

Leave a Reply Cancel reply

Scanning

Editing

Organization

Top Related Posts:

Leave a Reply Cancel reply