On 23 June 2025 at 01:11, Don Brown via MIDRANGE-L <midrange-l@xxxxxxxxxxxxxxxxxx> wrote:
Hi Daniel,
While I have this working, I would like to see your blog article.
With PDFbox I did have to include the -sort switch to get the text back in
a more meaningful order.
Have you noticed any difference between Ghostscript and PDFbox (or any other
tools) in the accuracy/completeness of the text returned?
Thanks
Don
Don Brown
Senior Consultant
[1]OneTeam IT Pty Ltd
P: 1300 088 400
-----Original Message-----
From: MIDRANGE-L <midrange-l-bounces@xxxxxxxxxxxxxxxxxx> On Behalf Of
Daniel Gross
Sent: Monday, 16 June 2025 7:00 PM
To: midrange-l@xxxxxxxxxxxxxxxxxx
Subject: Re: Convert PDF to text
Hi Don,
it depends.
From the command line you can use Ghostscript. The latest PASE version
from the IBM repository should be OK. With
   gs -sDEVICE=txtwrite -o output.txt input.pdf
you should get output, but you may have to experiment with the
encoding, as this is not fixed in PDF documents.
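One way to experiment with the encoding is to decode the bytes of the output file strictly with a few candidate charsets and see which ones succeed. The following is only a minimal sketch under assumptions: the class name EncodingProbe, its method names, and the candidate list are illustrative and are not part of Ghostscript or of anything posted in this thread.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.Optional;

// Hypothetical helper: probes which charset decodes a text file without errors.
public class EncodingProbe {

    // Strictly decodes the bytes; returns empty if they are invalid for the charset.
    public static Optional<String> tryDecode(byte[] data, Charset cs) {
        try {
            return Optional.of(cs.newDecoder()
                    .onMalformedInput(CodingErrorAction.REPORT)
                    .onUnmappableCharacter(CodingErrorAction.REPORT)
                    .decode(ByteBuffer.wrap(data))
                    .toString());
        } catch (CharacterCodingException e) {
            return Optional.empty();
        }
    }

    // Returns the first candidate that decodes the bytes without errors.
    // ISO-8859-1 accepts every byte sequence, so it only makes sense as the last candidate.
    public static Optional<Charset> firstCleanCharset(byte[] data, List<Charset> candidates) {
        for (Charset cs : candidates) {
            if (tryDecode(data, cs).isPresent()) {
                return Optional.of(cs);
            }
        }
        return Optional.empty();
    }

    public static void main(String[] args) throws Exception {
        if (args.length == 0) {
            System.err.println("usage: java EncodingProbe output.txt");
            return;
        }
        byte[] data = Files.readAllBytes(Path.of(args[0]));
        // Assumed candidate order; extend it with whatever encodings your documents may use.
        List<Charset> candidates = List.of(StandardCharsets.UTF_8, StandardCharsets.ISO_8859_1);
        System.out.println(firstCleanCharset(data, candidates)
                .map(Charset::name)
                .orElse("no candidate decoded cleanly"));
    }
}
```

A clean decode only shows that the bytes are valid in that charset, not that the text is right, so it is still worth looking at umlauts and other special characters in the result.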
From RPG I would always use PDFbox ([2]https://pdfbox.apache.org/) - with
this, you have complete control over the PDF processing.
But you can also use PDFbox from the command line using
   java -jar pdfbox-app-3.y.z.jar export:text [OPTIONS] -i=<infile>
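If the command is to be started from a program rather than typed by hand, the argument list can be assembled and run with ProcessBuilder. This is a hedged sketch, not a definitive wrapper: the class PdfboxExportText is hypothetical, the jar name is the placeholder from the command above, -sort is the switch mentioned in this thread, and -o=<outfile> is assumed to be the output option of export:text (check the tool's help output for your PDFBox version).

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical wrapper around the PDFBox command-line text export.
public class PdfboxExportText {

    // Builds the argument list for "export:text"; the -o= option is an assumption.
    public static List<String> buildCommand(String jarPath, String inFile,
                                            String outFile, boolean sort) {
        List<String> cmd = new ArrayList<>();
        cmd.add("java");
        cmd.add("-jar");
        cmd.add(jarPath);
        cmd.add("export:text");
        if (sort) {
            cmd.add("-sort"); // asks PDFBox to sort the text before writing it
        }
        cmd.add("-i=" + inFile);
        cmd.add("-o=" + outFile);
        return cmd;
    }

    // Starts the command, passes its output through, and returns the exit code.
    public static int run(List<String> cmd) throws IOException, InterruptedException {
        return new ProcessBuilder(cmd).inheritIO().start().waitFor();
    }

    public static void main(String[] args) throws Exception {
        if (args.length < 3) {
            System.err.println("usage: java PdfboxExportText <pdfbox-app jar> <infile> <outfile>");
            return;
        }
        int rc = run(buildCommand(args[0], args[1], args[2], true));
        System.out.println("exit code: " + rc);
    }
}
```

Each call made this way starts a fresh JVM, so it pays the JVM start-up cost every time.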
But make sure to use a reasonably new 64-bit JVM. I'm using Java 21
64-bit, and it's quite fast - in fact, after the initial JVM load, Java
runs at near-native performance.
I had the task of splitting PDF files. For files of up to 5 or 6 pages,
Ghostscript (PASE) was faster, but with 10 or more pages, PDFbox (Java 21
64-bit) was always faster. And it got even better when more than one file
had to be split in the same job/session - then PDFbox was always faster, as
the JVM stayed in memory and even the JAR file was kept loaded.
So, as I said, it really depends on what exactly you want to do, and how.
For example, if this text should go into a database table, I would recommend
going the RPG/Java/PDFbox way.
I'm planning to write a bit about RPG, Java and PDFbox on my blog in the
next few weeks. If you like, I can give you a sneak peek. It's a bit
overwhelming at the beginning, with JNI, JVM initialization and RPG-to-Java
prototypes - but once you've got it, pack everything you need into a service
program, and be happy.
HTH and kind regards,
Daniel
On 16 June 2025 at 10:37, Patrik Schindler <poc@xxxxxxxxxx> wrote:
Hello Don,
On 16 June 2025 at 09:12, Don Brown via MIDRANGE-L <midrange-l@xxxxxxxxxxxxxxxxxx> wrote:
1. Does anyone have a recommended solution to achieve converting a pdf
to text? I am after a php or native rpg ish solution. Not python please.
I'd use the pdftotext command from the poppler-utils package in PASE. I
assume the poppler-utils package is available for installation via yum.
[3]https://en.wikipedia.org/wiki/Poppler_(software)
:wq! PoC
--
This is the Midrange Systems Technical Discussion (MIDRANGE-L) mailing list
To post a message email: MIDRANGE-L@xxxxxxxxxxxxxxxxxx
To subscribe, unsubscribe, or change list options,
visit: [4]https://lists.midrange.com/mailman/listinfo/midrange-l
or email: MIDRANGE-L-request@xxxxxxxxxxxxxxxxxx
Before posting, please take a moment to review the archives at
[5]https://archive.midrange.com/midrange-l.
Please contact support@xxxxxxxxxxxxxxxxxxxx for any subscription related
questions.
--
Message protected by MailGuard: e-mail anti-virus, anti-spam and content
filtering.
[8]https://www.mailguard.com.au
References
Visible links
1. https://www.oneteamit.com.au/
2. https://pdfbox.apache.org/
3. https://en.wikipedia.org/wiki/Poppler_(software)
4. https://lists.midrange.com/mailman/listinfo/midrange-l
5. https://archive.midrange.com/midrange-l
6. https://lists.midrange.com/mailman/listinfo/midrange-l
7. https://archive.midrange.com/midrange-l
8. https://www.mailguard.com.au/
This mailing list archive is Copyright 1997-2025 by midrange.com and David Gibbs as a compilation work. Use of the archive is restricted to research of a business or technical nature. Any other uses are prohibited. Full details are available on our policy page. If you have questions about this, please contact [javascript protected email address].