Thanks Daniel
Cheers
Don
Don Brown
Senior Consultant
OneTeam IT Pty Ltd (https://www.oneteamit.com.au/)
P: 1300 088 400
-----Original Message-----
From: MIDRANGE-L <midrange-l-bounces@xxxxxxxxxxxxxxxxxx> On Behalf Of
Daniel Gross
Sent: Monday, 23 June 2025 2:51 PM
To: midrange-l@xxxxxxxxxxxxxxxxxx
Subject: Re: Convert PDF to text
Hi Don,
no - both extract the same text, as both read the same data
structures inside the PDF.
The only difference is that when you call PDFbox procedures directly from
RPG (using *JAVA objects and prototypes), you have complete control over
the whole process.
If anything fails, you get a Java exception that you can handle and
recover from. This is much easier than checking a sub-process for success
or failure.
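To illustrate the point about exceptions versus exit codes, here is a minimal Java sketch that uses only the standard library. It is not PDFbox code: `readText` is a hypothetical stand-in for an in-process extraction routine (it just reads a file as text), and the names `FailureHandlingSketch`, `inProcess` and `subProcess` are invented for this example.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.NoSuchFileException;
import java.nio.file.Path;
import java.util.List;

final class FailureHandlingSketch {

    // Hypothetical stand-in for an in-process extraction routine:
    // it simply reads the file as text and throws on any I/O problem.
    static String readText(Path file) throws IOException {
        return Files.readString(file);
    }

    // In-process style: each failure arrives as a typed exception that
    // can be handled individually, and the caller can recover from it.
    static String inProcess(Path file) {
        try {
            return readText(file);
        } catch (NoSuchFileException e) {
            return "recovered: missing file " + e.getFile();
        } catch (IOException e) {
            return "recovered: " + e.getMessage();
        }
    }

    // Sub-process style: the caller only learns the exit code and has to
    // interpret it (and whatever the tool printed) on its own.
    static int subProcess(List<String> command)
            throws IOException, InterruptedException {
        Process p = new ProcessBuilder(command)
                .redirectErrorStream(true)
                .redirectOutput(ProcessBuilder.Redirect.DISCARD)
                .start();
        return p.waitFor();
    }

    public static void main(String[] args) {
        // The file does not exist, so the first catch branch is taken.
        System.out.println(inProcess(Path.of("does-not-exist.pdf")));
    }
}
```

With the sub-process variant, a non-zero return value says only that something went wrong; the reason has to be dug out of the tool's output.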
You also have complete control over the JVM start, which means you
can decide when the JVM is loaded. In my case it's a background job
that waits for work and loads the JVM right at startup,
so there is no delay when it does its work.
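The "background job waiting for work" idea can be sketched as a worker that blocks on a queue inside an already running JVM. This is only an illustration under assumptions: the in-memory `LinkedBlockingQueue` stands in for whatever the real job waits on (on IBM i that might be a data queue), `handle` is a placeholder for the actual PDF work, and `WarmWorkerSketch` and the `*STOP` marker are hypothetical names.

```java
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.LinkedBlockingQueue;

final class WarmWorkerSketch {

    // Hypothetical marker that tells the worker to shut down.
    static final String STOP = "*STOP";

    final BlockingQueue<String> requests = new LinkedBlockingQueue<>();
    final List<String> results = new CopyOnWriteArrayList<>();

    // Placeholder for the real work (for example a PDFbox call).
    String handle(String pdfPath) {
        return "extracted:" + pdfPath;
    }

    // Runs inside the already started JVM and blocks while idle, so a
    // new request never pays the JVM start-up cost.
    void run() throws InterruptedException {
        while (true) {
            String pdfPath = requests.take();
            if (STOP.equals(pdfPath)) {
                return;
            }
            results.add(handle(pdfPath));
        }
    }

    public static void main(String[] args) throws InterruptedException {
        WarmWorkerSketch worker = new WarmWorkerSketch();
        Thread job = new Thread(() -> {
            try {
                worker.run();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        job.start();                    // worker is up before any work arrives
        worker.requests.put("a.pdf");
        worker.requests.put("b.pdf");
        worker.requests.put(STOP);
        job.join();
        System.out.println(worker.results);
    }
}
```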
When using PDFbox as a command line utility, there is practically no
difference.
I think the blog post will be out in July - I will post on
LinkedIn, and hopefully I'll remember to send you a mail when it's out.
Regards,
Daniel
> On 23.06.2025 at 01:11, Don Brown via MIDRANGE-L
> <midrange-l@xxxxxxxxxxxxxxxxxx> wrote:
>
> Hi Daniel,
>
> While I have this working, I would like to see your blog article.
>
> With PDFbox I did have to include the -sort switch to get the text
> back in a more meaningful order.
>
> Have you noticed any difference between Ghostscript and PDFbox (or any
> other tools) in the accuracy/completeness of the text returned?
>
> Thanks
> Don
>
> -----Original Message-----
> From: MIDRANGE-L <midrange-l-bounces@xxxxxxxxxxxxxxxxxx> On Behalf Of
> Daniel Gross
> Sent: Monday, 16 June 2025 7:00 PM
> To: midrange-l@xxxxxxxxxxxxxxxxxx
> Subject: Re: Convert PDF to text
>
> Hi Don,
>
> it depends.
>
> From the "command line" you can use Ghostscript. The latest PASE
> version from the IBM repository should be OK. With
>
>   gs -sDEVICE=txtwrite -o output.txt input.pdf
>
> you should get output - but you may have to experiment with the
> encoding, as this is not fixed in PDF documents.
>
> From RPG I would always use PDFbox (https://pdfbox.apache.org/) -
> with this, you have complete control over the PDF processing.
>
> But you can also use PDFbox from the command line using
>
>   java -jar pdfbox-app-3.y.z.jar export:text [OPTIONS] -i=<infile>
>
> But make sure to use a reasonably new 64-bit JVM - I'm using Java 21
> 64-bit, and it's quite fast - in fact, after the initial JVM loading,
> Java is near native performance.
>
> I had the task of splitting PDF files - with up to 5 or 6 pages,
> Ghostscript (PASE) was faster - but with 10 or more pages, PDFbox
> (Java 21 64-bit) was always faster. And it got even better if more than
> one file had to be split in the same job/session - PDFbox was always
> faster, as the JVM stayed in memory and even the JAR file was kept loaded.
>
> So as I said - it really depends on what exactly you want to do, and
> how. For example, if this text should go into a database table, I would
> recommend going the RPG/Java/PDFbox way.
>
> I'm in the process of writing a bit about RPG, Java and PDFbox on my
> blog over the next few weeks. If you like, I can give you a sneak peek.
> It's a bit overwhelming at the beginning with JNI, JVM initialization
> and RPG-to-Java prototypes - but once you've got it, pack everything you
> need into a service program, and be happy.
>
> HTH and kind regards,
> Daniel
>
>> On 16.06.2025 at 10:37, Patrik Schindler <poc@xxxxxxxxxx> wrote:
>> Hello Don,
>>> On 16.06.2025 at 09:12, Don Brown via MIDRANGE-L
>>> <midrange-l@xxxxxxxxxxxxxxxxxx> wrote:
>>> 1. Does anyone have a recommended solution for converting a PDF
>>> to text? I am after a PHP or native RPG-ish solution. Not Python please.
>> I'd use the pdftotext command from the poppler-utils package in PASE.
>> I assume the poppler-utils package is available for installation via yum.
>> https://en.wikipedia.org/wiki/Poppler_(software)
>> :wq! PoC
>> --
>> This is the Midrange Systems Technical Discussion (MIDRANGE-L)
>> mailing list
>> To post a message email: MIDRANGE-L@xxxxxxxxxxxxxxxxxx
>> To subscribe, unsubscribe, or change list options,
>> visit: https://lists.midrange.com/mailman/listinfo/midrange-l
>> or email: MIDRANGE-L-request@xxxxxxxxxxxxxxxxxx
>> Before posting, please take a moment to review the archives at
>> https://archive.midrange.com/midrange-l.
>> Please contact support@xxxxxxxxxxxxxxxxxxxx for any subscription
>> related questions.