


But you may want to keep the <title> text from the <head> section.

Also, formatting the text output may need a bit of rework if DIVs or TABLEs are used for layout in the original HTML.

And just for added fun, look out for JavaScript and stylesheets (and embedded style info). :)
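
To make the three points above concrete, here is a minimal sketch in Python (not RPG, purely to illustrate the logic) using the standard library's html.parser. The class name TextExtractor, the function extract, and the set of tags treated as block-level are my own assumptions for illustration, not anything posted in this thread.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collect visible text, keep the <title>, ignore script/style content."""

    SKIP = {"script", "style"}  # contents are never printable text
    # Assumed set of layout tags that should separate words in the output.
    BLOCK = {"p", "div", "br", "table", "tr", "td", "th", "li", "ul", "ol"}

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.parts = []
        self.title = ""
        self._skip_depth = 0
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self._skip_depth += 1
        elif tag == "title":
            self._in_title = True
        elif tag in self.BLOCK:
            self.parts.append(" ")

    def handle_endtag(self, tag):
        if tag in self.SKIP and self._skip_depth > 0:
            self._skip_depth -= 1
        elif tag == "title":
            self._in_title = False
        elif tag in self.BLOCK:
            self.parts.append(" ")

    def handle_data(self, data):
        if self._skip_depth:
            return  # inside <script> or <style>
        if self._in_title:
            self.title += data
        else:
            self.parts.append(data)


def extract(markup):
    """Return (title, body_text) with runs of whitespace collapsed."""
    parser = TextExtractor()
    parser.feed(markup)
    parser.close()
    text = " ".join("".join(parser.parts).split())
    return parser.title.strip(), text
```

Inline style attributes disappear along with the tags that carry them, so only <style> and <script> elements need the explicit skip.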


Regards,
John McKay mba
www.rpglanguage.com
www.mckaysoftware.ie

----- Original Message -----
From: "Vern Hamberg" <vhamberg@xxxxxxxxxxx>
To: "RPG programming on the IBM i / System i" <rpg400-l@xxxxxxxxxxxx>
Sent: Tuesday, September 21, 2010 1:51 PM
Subject: Re: Convert HTML to plain text


Mike

No code, just questions!

Is the HTML in a PF or in a STMF? The latter is preferable, methinks. I
can see a couple of things to do - first look for closing tags - scan for
"</" - and then scan back for the matching opening tag. Then take on the
unary (my term) tags like <br>.

There's also the need, perhaps, to take out <html> and <head> and not
the contents of <body>.

Or maybe, as I get from another site, it's enough to strip everything
between "<" and ">" in that order - unless you have comparison operators
in there! The sites from the Google search below discuss these issues.
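
A rough sketch of that "strip everything between < and >" idea, in Python rather than RPG, just to show the behaviour and the comparison-operator pitfall. The function name strip_tags_naive is made up for illustration; the regular expression is the simplest one that fits the description, not the one from either linked article.

```python
import html
import re

# Anything from "<" up to the next ">" is treated as a tag.
TAG_RE = re.compile(r"<[^>]*>")


def strip_tags_naive(markup):
    """Remove every <...> run, then decode entities such as &lt; and &amp;."""
    return html.unescape(TAG_RE.sub("", markup))


# Ordinary markup comes out clean.
print(strip_tags_naive("<p>Total: <b>42</b></p>"))

# Raw comparison operators are mistaken for a tag and text is lost:
# the run "< b and c >" is removed along with the real tags.
print(strip_tags_naive("<p>if a < b and c > d</p>"))

# Properly escaped operators survive, because entities are decoded
# only after the tags have been removed.
print(strip_tags_naive("<p>a &lt; b</p>"))
```

If the rich-text editor escapes "<" and ">" in user text as entities (most do, but that is an assumption worth checking against the stored data), the naive approach holds up reasonably well.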

I did a quick google on "strip html tags". One link -
http://weblogs.asp.net/rosherove/archive/2003/05/13/6963.aspx -
discusses using regular expressions. Another -
http://nadeausoftware.com/articles/2007/09/php_tip_how_strip_html_tags_web_page
- discusses issues about text you still want inside some tags.

Looks as if grep or sed or the like could do the work, with an
appropriate expression. And those are callable from RPG or CL through QSH.

HTH
Vern

On 9/21/2010 7:13 AM, Mike Cunningham wrote:
Would anyone happen to have RPG code to take HTML and strip off all the tags and just have plain text that would be printed using normal print files? I have a form that needs to be displayed on a web page and also printed from an RPG application. Part of the form is data collected using a rich-text editor on a web page that is stored as HTML in a variable length field. Works great when the form is on a webpage as it is a what-you-see-is-what-you-get function. Any special editing put in the rich-text editor shows on the web page exactly as entered. Problem is taking that HTML code and printing it using a normal print file to an outq then the printer. Stripping out the HTML tags might not be too bad. Dealing with <br> tags and <p> tags and <ul><li> can be a bit more challenging but I think word wrap is going to be the hardest. The print file line is 80 characters and I need to be sure to not break a word between lines. Some tricky code and I thought I would just see if anyone might have done this already and would share their code.

Thanks
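
For what it's worth, the overall flow Mike describes (turn <br>, <p> and <li> into line breaks, drop the remaining tags, then wrap at 80 columns without splitting words) can be sketched as below. This is Python, not RPG, and only meant to show the order of the steps. The function name html_to_print_lines, the "* " bullet prefix for list items, and the blank line after each paragraph are my assumptions, not requirements from the original post.

```python
import html
import re
import textwrap


def html_to_print_lines(markup, width=80):
    """Return a list of print lines, each at most `width` characters."""
    # 1. Replace tags that imply a line break before the tags are stripped.
    text = re.sub(r"(?i)<br\s*/?>", "\n", markup)
    text = re.sub(r"(?i)</p\s*>", "\n\n", text)       # blank line after a paragraph
    text = re.sub(r"(?i)<li[^>]*>", "\n* ", text)     # assumed bullet style
    text = re.sub(r"(?i)</(ul|ol)\s*>", "\n", text)
    # 2. Drop whatever tags remain and decode entities.
    text = html.unescape(re.sub(r"<[^>]*>", "", text))

    lines = []
    for para in text.split("\n"):
        para = " ".join(para.split())                 # collapse whitespace
        if not para:
            if lines and lines[-1] != "":
                lines.append("")                      # keep one blank line at most
            continue
        # 3. Wrap on word boundaries; hyphenated words stay together.
        lines.extend(textwrap.wrap(para, width=width, break_on_hyphens=False))
    while lines and lines[-1] == "":
        lines.pop()                                   # no trailing blank line
    return lines


for line in html_to_print_lines(
    "<p>Line one<br>Line two</p><ul><li>first</li><li>second</li></ul>"
):
    print(line)
```

A single word longer than the line width is still split by textwrap.wrap, since there is no other way to fit it on an 80-character print line. In RPG the wrapping step would come down to scanning backward from position 80 for the last blank, which is the same idea.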

--
This is the RPG programming on the IBM i / System i (RPG400-L) mailing list
To post a message email: RPG400-L@xxxxxxxxxxxx
To subscribe, unsubscribe, or change list options,
visit: http://lists.midrange.com/mailman/listinfo/rpg400-l
or email: RPG400-L-request@xxxxxxxxxxxx
Before posting, please take a moment to review the archives
at http://archive.midrange.com/rpg400-l.




This mailing list archive is Copyright 1997-2024 by midrange.com and David Gibbs as a compilation work. Use of the archive is restricted to research of a business or technical nature. Any other uses are prohibited. Full details are available on our policy page. If you have questions about this, please contact [javascript protected email address].
