

Hi Daniel,

While I have this working, I would like to see your blog article.

With PDFbox I did have to include the -sort switch to get the text back in
a more meaningful order.

Have you noticed any difference between Ghostscript and PDFbox (or any
other tools) in the accuracy/completeness of the text returned?

Thanks
Don

 

Don Brown

Senior Consultant
 
[1]OneTeam IT Pty Ltd
P: 1300 088 400

-----Original Message-----
From: MIDRANGE-L <midrange-l-bounces@xxxxxxxxxxxxxxxxxx> On Behalf Of
Daniel Gross
Sent: Monday, 16 June 2025 7:00 PM
To: midrange-l@xxxxxxxxxxxxxxxxxx
Subject: Re: Convert PDF to text

Hi Don,

it depends.

From the "command line" you can use Ghostscript. The latest PASE version
from the IBM repository should be OK, with

gs
-sDEVICE=txtwrite
-o output.txt
input.pdf

you should get an output - but you may have to experiment with the
encoding, as this is not fixed in PDF documents.
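
As a minimal sketch, the Ghostscript call above can be wrapped in a small
shell function so it is easy to reuse from a PASE shell script. The names
pdf_to_text_gs and GS_BIN are assumptions made for this example and are not
part of Ghostscript; only the gs options already shown above are used.

```shell
# Sketch only: pdf_to_text_gs and GS_BIN are hypothetical names for this example.
# GS_BIN lets you point at a specific Ghostscript binary; it defaults to "gs" on PATH.
GS_BIN="${GS_BIN:-gs}"

pdf_to_text_gs() {
    # Expect exactly two arguments: the input PDF and the output text file.
    if [ "$#" -ne 2 ]; then
        echo "usage: pdf_to_text_gs input.pdf output.txt" >&2
        return 2
    fi
    # txtwrite is the Ghostscript text extraction device; -o names the output file.
    "$GS_BIN" -sDEVICE=txtwrite -o "$2" "$1"
}
```

Calling pdf_to_text_gs input.pdf output.txt then runs the same command as
above, and the function returns whatever exit status gs reports.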

From RPG I would always use PDFbox ([2]https://pdfbox.apache.org/) - with
this, you have complete control over the PDF processing.

But you can also use PDFbox from the command line using

java
-jar pdfbox-app-3.y.z.jar
export:text
[OPTIONS]
-i=<infile>
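
A rough sketch of the same call as a reusable shell function, assuming the
-sort and -o=<outfile> options of the PDFbox 3.x export:text command. The
names pdf_to_text_pdfbox, JAVA_BIN and PDFBOX_JAR are made up for this
example, and the JAR name is left as the placeholder used above, so it has
to be set to the real file name first.

```shell
# Sketch only: pdf_to_text_pdfbox, JAVA_BIN and PDFBOX_JAR are hypothetical names.
# JAVA_BIN defaults to "java" on PATH; PDFBOX_JAR keeps the placeholder file name
# and must be pointed at the actual pdfbox-app JAR before use.
JAVA_BIN="${JAVA_BIN:-java}"
PDFBOX_JAR="${PDFBOX_JAR:-pdfbox-app-3.y.z.jar}"

pdf_to_text_pdfbox() {
    # Expect exactly two arguments: the input PDF and the output text file.
    if [ "$#" -ne 2 ]; then
        echo "usage: pdf_to_text_pdfbox input.pdf output.txt" >&2
        return 2
    fi
    # -sort asks PDFbox to order the text by position before writing it out.
    "$JAVA_BIN" -jar "$PDFBOX_JAR" export:text -sort -i="$1" -o="$2"
}
```

Note that every call starts a fresh JVM, so this form pays the JVM startup
cost once per file.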

But make sure to use a reasonably new 64-bit JVM - I'm using Java 21
64-bit, and it's quite fast - in fact, after the initial JVM loading, Java
is near native performance.

I had the task of splitting PDF files - with up to 5 or 6 pages,
Ghostscript (PASE) was faster - but with 10 or more pages, PDFbox (Java 21
64-bit) was always faster. And it got even better if more than one file
had to be split in the same job/session - PDFbox was always faster, as the
JVM stayed in memory and even the JAR file was kept loaded.

So as I said - it really depends on what exactly you want to do - and how.
For example, if this text should go into a database table, I would
recommend going the RPG/Java/PDFbox way.

I'm planning to write a bit about RPG, Java and PDFbox on my blog in the
next few weeks. If you like, I can give you a sneak peek of it. It's a bit
overwhelming at the beginning with JNI, JVM initialization and RPG to Java
prototypes - but once you've got it, pack everything you need into a
service program, and be happy.

HTH and kind regards,
Daniel

> On 16.06.2025 at 10:37, Patrik Schindler <poc@xxxxxxxxxx> wrote:
> Hello Don,
>
> On 16.06.2025 at 09:12, Don Brown via MIDRANGE-L
> <midrange-l@xxxxxxxxxxxxxxxxxx> wrote:
>
>> 1. Does anyone have a recommended solution for converting a PDF to
>> text? I am after a PHP or native RPG-ish solution. Not Python please.
>
> I'd use the pdftotext command from the poppler-utils package in PASE. I
> assume the poppler-utils package is available for installation via yum.
>
> [3]https://en.wikipedia.org/wiki/Poppler_(software)
>
> :wq! PoC
>
> --
> This is the Midrange Systems Technical Discussion (MIDRANGE-L) mailing
> list To post a message email: MIDRANGE-L@xxxxxxxxxxxxxxxxxx To
> subscribe, unsubscribe, or change list options,
> visit: [4]https://lists.midrange.com/mailman/listinfo/midrange-l
> or email: MIDRANGE-L-request@xxxxxxxxxxxxxxxxxx
> Before posting, please take a moment to review the archives at
> [5]https://archive.midrange.com/midrange-l.
>
> Please contact support@xxxxxxxxxxxxxxxxxxxx for any subscription related
> questions.

References

Visible links
1. https://www.oneteamit.com.au/
2. https://pdfbox.apache.org/
3. https://en.wikipedia.org/wiki/Poppler_(software)
4. https://lists.midrange.com/mailman/listinfo/midrange-l
5. https://archive.midrange.com/midrange-l
6. https://lists.midrange.com/mailman/listinfo/midrange-l
7. https://archive.midrange.com/midrange-l
8. https://www.mailguard.com.au/


This mailing list archive is Copyright 1997-2025 by midrange.com and David Gibbs as a compilation work. Use of the archive is restricted to research of a business or technical nature. Any other uses are prohibited. Full details are available on our policy page. If you have questions about this, please contact [javascript protected email address].
