Hi Daniel,
While I have this working, I would like to see your blog article.
With PDFbox I did have to include the -sort switch to get the text back in
a more meaningful order.
Have you noticed any difference between Ghostscript and PDFbox (or any other
tools) in the accuracy/completeness of the text returned?
Thanks
Don
Don Brown
Senior Consultant
[1]OneTeam IT Pty Ltd
P: 1300 088 400
-----Original Message-----
From: MIDRANGE-L <midrange-l-bounces@xxxxxxxxxxxxxxxxxx> On Behalf Of
Daniel Gross
Sent: Monday, 16 June 2025 7:00 PM
To: midrange-l@xxxxxxxxxxxxxxxxxx
Subject: Re: Convert PDF to text
Hi Don,
it depends.
From the command line you can use Ghostscript. The latest PASE version
from the IBM repository should be OK. With
gs -sDEVICE=txtwrite -o output.txt input.pdf
you should get output - but you may have to experiment with the
encoding, as this is not fixed in PDF documents.
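
The Ghostscript step and the encoding experiment can be wrapped in a small shell function. This is only a sketch: the function name pdf_to_text_gs, the temporary file name and the choice of iconv for re-encoding are assumptions for illustration, and it assumes that txtwrite writes UTF-8 with its default settings and that gs and iconv are on the PATH.

```shell
# Sketch: extract text with Ghostscript, then re-encode the result.
# pdf_to_text_gs is a hypothetical helper name, not part of Ghostscript.
pdf_to_text_gs() {
    infile=$1
    outfile=$2
    target_enc=${3:-UTF-8}      # e.g. ISO-8859-1; pick what the consumer expects

    tmpfile="$outfile.utf8"     # intermediate file (assumed to be UTF-8)

    # -q suppresses the startup banner; -o names the output file
    gs -q -sDEVICE=txtwrite -o "$tmpfile" "$infile" || return 1

    # Convert from the assumed UTF-8 output to the requested encoding
    iconv -f UTF-8 -t "$target_enc" "$tmpfile" > "$outfile" || return 1

    rm -f "$tmpfile"
}
```

A call such as pdf_to_text_gs input.pdf output.txt ISO-8859-1 would then let you try different target encodings without retyping the gs command.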
From RPG I would always use PDFbox ([2] https://pdfbox.apache.org/) - with
this, you have complete control over the PDF processing.
But you can also use PDFbox from the command line using
java -jar pdfbox-app-3.y.z.jar export:text [OPTIONS] -i=<infile>
But make sure to use a reasonably new 64-bit JVM - I'm using Java 21
64-bit, and it's quite fast - in fact, after the initial JVM loading, Java
is near native performance.
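
As a sketch of the PDFbox command-line route, the call can be wrapped like this. The function name pdf_to_text_pdfbox is made up for illustration, the JAR path is passed in because the exact version is not given here, and the -sort and -o options are used as I understand the PDFbox 3.x export:text tool; please check them against the help output of your version.

```shell
# Sketch: run the PDFbox 3.x text export from the command line.
# pdf_to_text_pdfbox is a hypothetical helper name, not part of PDFbox.
pdf_to_text_pdfbox() {
    jar=$1          # path to the pdfbox-app JAR (version left to the caller)
    infile=$2
    outfile=$3

    # -sort asks PDFbox to sort the text by position before writing it,
    # which tends to give a more readable order for multi-column pages
    java -jar "$jar" export:text -sort -i="$infile" -o="$outfile"
}
```

Each call starts a new JVM, so this suits occasional conversions rather than large batches.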
I had the task of splitting PDF files - with up to 5 or 6 pages, Ghostscript
(PASE) was faster - but with 10 or more pages, PDFbox (Java 21 64-bit) was
always faster. And it got even better if more than one file was to be split
in the same job/session - PDFbox was always faster, as the JVM stayed in
memory and even the JAR file was kept loaded.
So as I said - it really depends on what exactly you want to do - and how.
For example, if this text should go into a database table, I would recommend
going the RPG/Java/PDFbox way.
I'm planning to write a bit about RPG, Java and PDFbox on my blog in the
next few weeks. If you like, I can give you a sneak peek of it. It's a bit
overwhelming at the beginning with JNI, JVM initialization and RPG to Java
prototypes - but once you've got it, pack everything you need into a service
program, and be happy.
HTH and kind regards,
Daniel
> On 16.06.2025 at 10:37, Patrik Schindler <poc@xxxxxxxxxx> wrote:
> Hello Don,
>
> On 16.06.2025 at 09:12, Don Brown via MIDRANGE-L <midrange-l@xxxxxxxxxxxxxxxxxx> wrote:
>
>> 1. Does anyone have a recommended solution for converting a PDF to text?
>> I am after a PHP or native RPG-ish solution. Not Python, please.
>
> I'd use the pdftotext command from the poppler-utils package in PASE. I
> assume the poppler-utils package is available for installation via yum.
>
> [3] https://en.wikipedia.org/wiki/Poppler_(software)
>
> :wq! PoC
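
For the pdftotext suggestion above, a minimal sketch could look like this. It assumes that poppler-utils is installed and pdftotext is on the PATH; the function name pdf_to_text_poppler is invented for illustration.

```shell
# Sketch: extract text with pdftotext from poppler-utils.
# pdf_to_text_poppler is a hypothetical helper name, not part of poppler.
pdf_to_text_poppler() {
    infile=$1
    outfile=$2

    # -layout tries to keep the physical page layout of the text;
    # -enc sets the output encoding (UTF-8 is also the default)
    pdftotext -layout -enc UTF-8 "$infile" "$outfile"
}
```

Dropping -layout gives the text in reading order instead, which may be the better choice if the result goes into a database column.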
>
> --
> This is the Midrange Systems Technical Discussion (MIDRANGE-L) mailing list
> To post a message email: MIDRANGE-L@xxxxxxxxxxxxxxxxxx
> To subscribe, unsubscribe, or change list options,
> visit: [4] https://lists.midrange.com/mailman/listinfo/midrange-l
> or email: MIDRANGE-L-request@xxxxxxxxxxxxxxxxxx
> Before posting, please take a moment to review the archives
> at [5] https://archive.midrange.com/midrange-l.
>
> Please contact support@xxxxxxxxxxxxxxxxxxxx for any subscription related
> questions.
References
Visible links
1. https://www.oneteamit.com.au/
2. https://pdfbox.apache.org/
3. https://en.wikipedia.org/wiki/Poppler_(software)
4. https://lists.midrange.com/mailman/listinfo/midrange-l
5. https://archive.midrange.com/midrange-l
6. https://lists.midrange.com/mailman/listinfo/midrange-l
7. https://archive.midrange.com/midrange-l
8. https://www.mailguard.com.au/