I have installed the Apache Tika Java application from apache.org on our
Power server running IBM i version 7.2 and on my PC running Windows
10. We receive a Word document from one of our vendors, and we need to
extract the text from it in order to process the document.
I have this working on my PC by using the following command:
C:\Users\gord\Downloads>java -jar tika-app-1.14.jar -t
"C:\Users\gord\Documents\Goralta Invoice.doc" >
"C:\Users\gord\Documents\Goralta Invoice.txt"
The next step was to get the same result on our Power server. I first
ran the following command:
JAVA CLASS('/java/Tika/tika-app-1.14.jar') PARM('-t'
'/java/Tika/Goralta Invoice.doc')
It displays the extracted text on the screen. Great. So I know the
Java program is working on our system. The last step is to get the
results written to a file.
I have tried many different combinations but I cannot get it to work.
Here is an example of what I have tried and the error we are getting:
JAVA CLASS('/java/Tika/tika-app-1.14.jar') PARM('-t'
'"/java/Tika/Goralta Invoice.doc" > "/java/Tika/Goralta Invoice.txt"')
Exception in thread "main" java.net.MalformedURLException: no protocol:
"/java/Tika/Goralta Invoice.doc" > "/java/Tika/Goralta Invoice.txt"
at java.net.URL.<init>(URL.java:609)
at java.net.URL.<init>(URL.java:506)
at java.net.URL.<init>(URL.java:455)
at org.apache.tika.cli.TikaCLI.process(TikaCLI.java:472)
at org.apache.tika.cli.TikaCLI.main(TikaCLI.java:145)
Java program completed with exit code 1
I am not a Java programmer. We use Java tools that make things easier
for us as RPG programmers. Any help you can provide will be very much
appreciated.
Gordon Schneider
403-236-0601
Trans Am Piping Products Ltd.
--
This is the Java Programming on and around the IBM i (JAVA400-L)
mailing list To post a message email: JAVA400-L@xxxxxxxxxxxx To
subscribe, unsubscribe, or change list options,
visit: http://lists.midrange.com/mailman/listinfo/java400-l
or email: JAVA400-L-request@xxxxxxxxxxxx Before posting, please take a
moment to review the archives at
http://archive.midrange.com/java400-l.
JAVA CLASS('/java/Tika/tika-app-1.14.jar') PARM('-t'
'/java/Tika/Goralta Invoice.doc')
Jack
I tried QSH and the command works. The only problem is that my output has
unrecognizable characters instead of the text I was looking for. Is there
something else I can do to get the text in a proper format?
David
I see your option to output the text to a database file. That could work,
and then I would write an RPG program to read the data and parse out what I
need. I will keep that solution as an option. Thanks.
Gord
-----Original Message-----
From: JAVA400-L [mailto:java400-l-bounces@xxxxxxxxxxxx] On Behalf Of
java400-l-request@xxxxxxxxxxxx
Sent: January 20, 2017 11:00 AM
To: java400-l@xxxxxxxxxxxx
Subject: JAVA400-L Digest, Vol 15, Issue 13
Today's Topics:
1. Output results from tika to a file (Gordon Schneider)
2. Re: Output results from tika to a file (Jack Woehr)
3. Re: Output results from tika to a file (David Gibbs)
----------------------------------------------------------------------
message: 1
date: Fri, 20 Jan 2017 17:29:18 +0000
from: Gordon Schneider <schneiderg@xxxxxxxxxxxxxxxxx>
subject: Output results from tika to a file
------------------------------
message: 2
date: Fri, 20 Jan 2017 10:34:39 -0700
from: Jack Woehr <jwoehr@xxxxxxxxxxxxxxxxxxxxxxxx>
subject: Re: Output results from tika to a file
Redirection with the > symbol is a feature of the shell, not of Java.
Why not run it from QSH or QP2SHELL? Then what you are doing will work.
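Jack's point can be illustrated in any POSIX-style shell. The sketch below is only an illustration under stated assumptions: show_args is a hypothetical stand-in for the Tika jar, and a temporary directory stands in for /java/Tika. An unquoted > is consumed by the shell, which opens the output file itself, while a quoted > reaches the program as ordinary argument text, which is what happened inside PARM('...').

```shell
#!/bin/sh
# show_args is a hypothetical stand-in for "java -jar tika-app-1.14.jar":
# it prints each argument it receives on its own line.
show_args() {
    for arg in "$@"; do
        printf 'arg: %s\n' "$arg"
    done
}

# Temporary directory used in place of /java/Tika (assumption for the demo).
workdir=$(mktemp -d)

# Quoted: the whole string, ">" included, arrives as ONE argument.
# This mirrors the failing JAVA command, where Tika tried to treat
# the entire string as a file name or URL.
quoted=$(show_args -t "\"$workdir/in.doc\" > \"$workdir/out.txt\"")

# Unquoted ">": the shell opens out.txt and removes the redirection
# from the command line, so the program sees only "-t" and the input path.
show_args -t "$workdir/in.doc" > "$workdir/out.txt"
redirected=$(cat "$workdir/out.txt")

printf 'quoted call received:\n%s\n' "$quoted"
printf 'redirected file contains:\n%s\n' "$redirected"
```

In QSH the real command would presumably be the PC command with the IFS paths substituted (an assumption; the exact QSH command used is not shown in this thread):
java -jar /java/Tika/tika-app-1.14.jar -t "/java/Tika/Goralta Invoice.doc" > "/java/Tika/Goralta Invoice.txt"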
--
Absolute Performance, Inc.
12303 Airport Way, Suite 100
Broomfield, CO 80021
------------------------------
message: 3
date: Fri, 20 Jan 2017 11:46:17 -0600
from: David Gibbs <david@xxxxxxxxxxxx>
subject: Re: Output results from tika to a file
On 1/20/2017 11:29 AM, Gordon Schneider wrote:
JAVA CLASS('/java/Tika/tika-app-1.14.jar') PARM('-t'
'/java/Tika/Goralta Invoice.doc')
Use the OVRDBF command to override STDOUT to a database file:
OVRDBF FILE(STDOUT) TOFILE(yourfile) OVRSCOPE(*JOB)
david
--
David Gibbs
midrange.com
------------------------------
End of JAVA400-L Digest, Vol 15, Issue 13
*****************************************