Hi Don,

It depends.

From the command line you can use Ghostscript. The latest PASE version from the IBM repository should be OK. With

gs -sDEVICE=txtwrite -o output.txt input.pdf

you should get output - but you may have to experiment with the encoding, as this is not fixed in PDF documents.
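To run the Ghostscript call over several files, it can be wrapped in a small shell function. This is only a sketch: the helper names txt_name and pdf_to_text are made up for illustration, and it assumes gs is on the PATH in your PASE shell. Note the `s` prefix in `-sDEVICE`, which Ghostscript uses for string-valued parameters.

```shell
#!/bin/sh
# Hypothetical helper: derive the text file name, e.g. report.pdf -> report.txt
txt_name() {
    printf '%s\n' "${1%.pdf}.txt"
}

# Hypothetical helper: convert one PDF to plain text with the txtwrite device.
# -q suppresses the startup banner; -o sets the output file and implies batch mode.
pdf_to_text() {
    gs -q -sDEVICE=txtwrite -o "$(txt_name "$1")" "$1"
}
```

Usage would then be something like `for f in *.pdf; do pdf_to_text "$f"; done`. Check the encoding of the resulting files before relying on them.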

From RPG I would always use PDFBox (https://pdfbox.apache.org/) - with this, you have complete control over the PDF processing.

But you can also use PDFBox from the command line using

java -jar pdfbox-app-3.y.z.jar export:text [OPTIONS] -i=<infile>

But make sure to use a reasonably new 64-bit JVM - I'm using Java 21 64-bit, and it's quite fast. In fact, after the initial JVM loading, Java is near native performance.
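The PDFBox command-line call can be wrapped the same way. A minimal sketch, with some assumptions: the function name pdfbox_to_text, the PDFBOX_JAR variable and the DRY_RUN switch are invented for this example, the jar path is a placeholder you have to replace with your actual pdfbox-app version, and `-o=<outfile>` is used as the output option of export:text.

```shell
#!/bin/sh
# Placeholder path to the PDFBox app jar (assumption: adjust to your installed version)
PDFBOX_JAR="${PDFBOX_JAR:-pdfbox-app-3.y.z.jar}"

# Hypothetical helper: extract text from $1 into $2 using PDFBox export:text.
# With DRY_RUN set, the command is only printed, not executed.
pdfbox_to_text() {
    set -- java -jar "$PDFBOX_JAR" export:text "-i=$1" "-o=$2"
    if [ -n "${DRY_RUN:-}" ]; then
        echo "$*"
    else
        "$@"
    fi
}
```

Running `java -version` first shows which JVM the shell picks up. Calling `DRY_RUN=1 pdfbox_to_text in.pdf out.txt` prints the command line without starting the JVM.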

I had the task of splitting PDF files. Up to 5 or 6 pages, Ghostscript (PASE) was faster - but with 10 or more pages, PDFBox (Java 21 64-bit) was always faster. And it got even better if more than one file was to be split in the same job/session - PDFBox was always faster, as the JVM stayed in memory and even the JAR file was kept loaded.

So as I said - it really depends on what exactly you want to do, and how. E.g. if this text should go into a database table, I would recommend going the RPG/Java/PDFBox way.

I'm planning to write a bit about RPG, Java and PDFBox on my blog in the next few weeks. If you like, I can give you a sneak peek of it. It's a bit overwhelming at the beginning with JNI, JVM initialization and RPG-to-Java prototypes - but once you've got it, pack everything you need into a service program, and be happy.

HTH and kind regards,
Daniel


On 16.06.2025 at 10:37, Patrik Schindler <poc@xxxxxxxxxx> wrote:
Hello Don,

On 16.06.2025 at 09:12, Don Brown via MIDRANGE-L <midrange-l@xxxxxxxxxxxxxxxxxx> wrote:

1. Does anyone have a recommended solution for converting a PDF to text? I am after a PHP or native RPG-ish solution. Not Python please.

I'd use the pdftotext command from the poppler-utils package in PASE. I assume the poppler-utils package is available for installation via yum.

https://en.wikipedia.org/wiki/Poppler_(software)
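A pdftotext call could look roughly like this. It is a sketch only: the function name pdf_to_text_poppler is made up, and whether poppler-utils is actually installable on your system is the same assumption as above, so the function checks for the binary first.

```shell
#!/bin/sh
# Hypothetical helper: convert $1 to text with poppler's pdftotext.
# If no output file is given as $2, "-" sends the text to stdout.
pdf_to_text_poppler() {
    if ! command -v pdftotext >/dev/null 2>&1; then
        echo "pdftotext not found; is poppler-utils installed?" >&2
        return 127
    fi
    # -layout keeps the physical page layout; -enc selects the output encoding
    pdftotext -layout -enc UTF-8 "$1" "${2:--}"
}
```

For example, `pdf_to_text_poppler invoice.pdf invoice.txt` writes a file, while `pdf_to_text_poppler invoice.pdf | head` shows the first lines on the terminal.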

:wq! PoC

--
This is the Midrange Systems Technical Discussion (MIDRANGE-L) mailing list
To post a message email: MIDRANGE-L@xxxxxxxxxxxxxxxxxx
To subscribe, unsubscribe, or change list options,
visit: https://lists.midrange.com/mailman/listinfo/midrange-l
or email: MIDRANGE-L-request@xxxxxxxxxxxxxxxxxx
Before posting, please take a moment to review the archives
at https://archive.midrange.com/midrange-l.

Please contact support@xxxxxxxxxxxxxxxxxxxx for any subscription related questions.


This mailing list archive is Copyright 1997-2025 by midrange.com and David Gibbs as a compilation work. Use of the archive is restricted to research of a business or technical nature. Any other uses are prohibited. Full details are available on our policy page.