Hi Don,

No, both extract the same text, as both read the same data structures inside the PDF.

The only difference is that when you call PDFBox methods directly from RPG (using *JAVA objects and prototypes), you have complete control over the whole process.

If anything fails, you get a Java exception that you can handle and recover from. This is much easier than checking a sub-process for success or failure.
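
For comparison, checking a sub-process from a shell script could look roughly like the sketch below. The helper name run_step and the stand-in commands (true, false) are hypothetical and not from this thread; the point is only that the caller has to inspect the exit status itself.

```shell
#!/bin/sh
# Hypothetical helper: run any command and report a non-zero exit status.
run_step() {
    status=0
    "$@" || status=$?
    if [ "$status" -ne 0 ]; then
        echo "step failed with exit status $status: $*" >&2
        return "$status"
    fi
    return 0
}

# Stand-in commands; replace them with the real extraction command.
run_step true && echo "extraction ok"
run_step false || echo "extraction failed, recover here"
```

With a sub-process, the exit status and whatever the tool wrote to stderr are all the caller gets back, which is why the exception route is more comfortable.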

You also have complete control over JVM startup, which means you decide when the JVM is loaded. In my case it's a background job that waits for work and loads the JVM right at the start, so there is no delay when it does its work.

When you use PDFBox as a command-line utility, there is practically no difference to Ghostscript.

I think the blog post will be out in July. I will announce it on LinkedIn, and hopefully I will remember to send you a mail when it's out.

Regards,
Daniel


On 23.06.2025 at 01:11, Don Brown via MIDRANGE-L <midrange-l@xxxxxxxxxxxxxxxxxx> wrote:

Hi Daniel,

While I have this working, I would like to see your blog article.

With PDFbox I did have to include the -sort switch to get the text back in
a more meaningful order.
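
As a sketch, a command-line call with the -sort switch could be wrapped like this. The function name extract_sorted, the PDFBOX_JAR variable and the file names are assumptions for illustration. -o=<outfile> is assumed to be the output option of export:text in PDFBox 3, so check it against the version you run.

```shell
#!/bin/sh
# Placeholder JAR name; the version (3.y.z) is left as given in the thread.
PDFBOX_JAR="${PDFBOX_JAR:-pdfbox-app-3.y.z.jar}"

# Hypothetical wrapper: $1 = input PDF, $2 = output text file.
extract_sorted() {
    if [ ! -f "$PDFBOX_JAR" ] || [ ! -f "$1" ]; then
        echo "missing $PDFBOX_JAR or input file: $1" >&2
        return 1
    fi
    # -sort is the switch that returns the text in a more meaningful order
    java -jar "$PDFBOX_JAR" export:text -sort -i="$1" -o="$2"
}

# Usage (not executed here):
#   extract_sorted invoice.pdf invoice.txt
```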

Have you noticed any difference between Ghostscript and PDFBox (or any
other tools) in the accuracy/completeness of the text returned?

Thanks
Don



Don Brown

Senior Consultant

[1]OneTeam IT Pty Ltd
P: 1300 088 400

-----Original Message-----
From: MIDRANGE-L <midrange-l-bounces@xxxxxxxxxxxxxxxxxx> On Behalf Of
Daniel Gross
Sent: Monday, 16 June 2025 7:00 PM
To: midrange-l@xxxxxxxxxxxxxxxxxx
Subject: Re: Convert PDF to text

Hi Don,

it depends.

From the command line you can use Ghostscript. The latest PASE version
from the IBM repository should be OK. With

    gs -sDEVICE=txtwrite -o output.txt input.pdf

you should get text output, but you may have to experiment with the
encoding, as it is not fixed in PDF documents.
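
One way to experiment with the encoding is sketched below. The function name gs_to_text and the file names are assumptions. -dTextFormat=3 asks the txtwrite device for UTF-8 output (the documented default), and the iconv call in the comment uses example code-page names that may be spelled differently on your system.

```shell
#!/bin/sh
# Hypothetical wrapper: $1 = input PDF, $2 = output text file.
gs_to_text() {
    if ! command -v gs >/dev/null 2>&1 || [ ! -f "$1" ]; then
        echo "gs not found or missing input file: $1" >&2
        return 1
    fi
    # -q suppresses the startup banner; -dTextFormat=3 selects UTF-8 text
    gs -q -sDEVICE=txtwrite -dTextFormat=3 -o "$2" "$1"
}

# Usage (not executed here), followed by an optional conversion step:
#   gs_to_text input.pdf output.txt
#   iconv -f UTF-8 -t ISO-8859-1 output.txt > output-latin1.txt
```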

From RPG I would always use PDFBox ([2]https://pdfbox.apache.org/), which
gives you complete control over the PDF processing.

But you can also use PDFBox from the command line:

    java -jar pdfbox-app-3.y.z.jar export:text [OPTIONS] -i=<infile>

But make sure to use a reasonably new 64-bit JVM. I'm using Java 21
64-bit, and it's quite fast. In fact, after the initial JVM load, Java
comes close to native performance.

I had the task of splitting PDF files. With up to 5 or 6 pages,
Ghostscript (PASE) was faster, but with 10 or more pages, PDFBox (Java 21
64-bit) was always faster. And it got even better when more than one file
had to be split in the same job/session: there PDFBox was always faster,
as the JVM stayed in memory and even the JAR file was kept loaded.

So as I said, it really depends on what exactly you want to do, and how.
For example, if this text should go into a database table, I would
recommend going the RPG/Java/PDFBox way.

I'm in the process of writing a bit about RPG, Java and PDFBox on my blog
over the next few weeks. If you like, I can give you a sneak peek. It's a
bit overwhelming at the beginning, with JNI, JVM initialization and
RPG-to-Java prototypes, but once you've got it, pack everything you need
into a service program and be happy.

HTH and kind regards,
Daniel

On 16.06.2025 at 10:37, Patrik Schindler <poc@xxxxxxxxxx> wrote:

Hello Don,

On 16.06.2025 at 09:12, Don Brown via MIDRANGE-L
<midrange-l@xxxxxxxxxxxxxxxxxx> wrote:

1. Does anyone have a recommended solution for converting a PDF to text?
I am after a PHP or native RPG-ish solution. Not Python, please.

I'd use the pdftotext command from the poppler-utils package in PASE. I
assume the poppler-utils package is available for installation via yum.
[3]https://en.wikipedia.org/wiki/Poppler_(software)

:wq! PoC
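
A minimal sketch of the pdftotext route, assuming poppler-utils is installed and pdftotext is on the PATH. The function name pdf_to_text and the file names are hypothetical; -layout and -enc are standard pdftotext options.

```shell
#!/bin/sh
# Hypothetical wrapper: $1 = input PDF, $2 = output text file.
pdf_to_text() {
    if ! command -v pdftotext >/dev/null 2>&1 || [ ! -f "$1" ]; then
        echo "pdftotext not found or missing input file: $1" >&2
        return 1
    fi
    # -layout keeps the physical page layout; -enc sets the output encoding
    pdftotext -layout -enc UTF-8 "$1" "$2"
}

# Usage (not executed here):
#   pdf_to_text input.pdf output.txt
```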
--
This is the Midrange Systems Technical Discussion (MIDRANGE-L) mailing
list To post a message email: MIDRANGE-L@xxxxxxxxxxxxxxxxxx To
subscribe, unsubscribe, or change list options,
visit: [4]https://lists.midrange.com/mailman/listinfo/midrange-l
or email: MIDRANGE-L-request@xxxxxxxxxxxxxxxxxx
Before posting, please take a moment to review the archives at
[5]https://archive.midrange.com/midrange-l.
Please contact support@xxxxxxxxxxxxxxxxxxxx for any subscription related
questions.

References

Visible links
1. https://www.oneteamit.com.au/
2. https://pdfbox.apache.org/
3. https://en.wikipedia.org/wiki/Poppler_(software)
4. https://lists.midrange.com/mailman/listinfo/midrange-l
5. https://archive.midrange.com/midrange-l
6. https://lists.midrange.com/mailman/listinfo/midrange-l
7. https://archive.midrange.com/midrange-l
8. https://www.mailguard.com.au/


This mailing list archive is Copyright 1997-2025 by midrange.com and David Gibbs as a compilation work. Use of the archive is restricted to research of a business or technical nature. Any other uses are prohibited. Full details are available on our policy page. If you have questions about this, please contact [javascript protected email address].
