rssfeed.pl - Generate RSS feed from web pages
option 1: rssfeed [<newsfile> [<rssfile>]]
If a newsfile name is given, that file is read instead of the file named in the config section. If an rssfile name is given, the RSS output is written to that file.
option 2: rssfeed [<newsfilename> [<rssfilename>]] < <newsfile>
newsfilename is optional; if it is not present, the updated html goes to the newsfile named in the config section. If rssfilename is present, that file is used for the RSS output; otherwise the rssfile from the config section is used.
option 3: cat xx | rssfeed [<newsfilename> [<rssfilename>]]
Like option 2, but the input comes from a pipe.
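The three invocation forms above can be summarized in a short Python sketch. This is only an illustration, not the script's Perl code: the function name resolve_files, the "-" marker for standard input, and the idea of testing whether stdin is a terminal are all assumptions. The config keys newsfile and rssfile are the ones named later in this page.

```python
def resolve_files(args, stdin_is_tty, config):
    """Return (input_source, html_output, rss_output) for one run.

    args          positional arguments: [newsfile [rssfile]]
    stdin_is_tty  True when nothing is redirected or piped into stdin
    config        dict with the 'newsfile' and 'rssfile' defaults
    """
    news = args[0] if len(args) > 0 else config["newsfile"]
    rss = args[1] if len(args) > 1 else config["rssfile"]
    # Option 1 reads the news file itself; options 2 and 3 read stdin ("-")
    # and use the news file name only as the destination for the new html.
    source = news if stdin_is_tty else "-"
    return source, news, rss
```

In a real program stdin_is_tty could come from sys.stdin.isatty(); whether rssfeed.pl detects redirected input that way is not stated here.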
-h --help
Display help
-m --man
Display a full man page
-c configfile --config=configfile
use the named file as the configuration file. For example: rssfeed.pl --config=path/test.config
-n --noesc
Do not escape < or >. If not set, < is written as &lt; and > is written as &gt;.
-r --resetdate
Do not use the value in date="..." if it exists; use today's date and time instead.
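The effect of --noesc and --resetdate can be sketched in Python as follows. This is a rough illustration under assumptions: the function names are hypothetical, and the generated date is assumed to use the same RFC 822 style as the pubDate example later in this page.

```python
from email.utils import formatdate


def escape_angle_brackets(text, noesc=False):
    """Escape < and > unless noesc is set (mirrors -n / --noesc)."""
    if noesc:
        return text
    return text.replace("<", "&lt;").replace(">", "&gt;")


def item_date(date_attr, resetdate=False):
    """Pick the date for an item (mirrors -r / --resetdate).

    date_attr is the value of date="..." on the rssfeed tag, or None.
    """
    if date_attr and not resetdate:
        return date_attr
    # Current time formatted like "Sun, 26 Apr 2009 19:58:59 GMT"
    return formatdate(usegmt=True)
```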
This program reads a news (html) file and looks for <rssfeed> tags, which should be inside html comments like this:
<!-- <rssfeed> --> some html <!-- </rssfeed> -->
Strictly speaking, the rssfeed tag does not need to be inside comments. You can also have:
<!-- <rssfeed> then html which is inside the comment </rssfeed> -->
which lets you have code that does not appear on the web page.
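Both comment styles shown above can be located with one regular expression. The Python sketch below is an assumption about how such a search might look, not the script's own parser; find_blocks and RSSFEED_BLOCK are hypothetical names, and an attribute value containing > would defeat this simple pattern.

```python
import re

# Opening tag with optional attributes, then everything up to </rssfeed>
RSSFEED_BLOCK = re.compile(r"<rssfeed\b([^>]*)>(.*?)</rssfeed>",
                           re.DOTALL | re.IGNORECASE)


def find_blocks(html):
    """Return a list of (attribute_text, body) pairs, one per rssfeed block."""
    blocks = []
    for match in RSSFEED_BLOCK.finditer(html):
        body = match.group(2)
        # When the tags sit in their own comments, drop the "-->" that
        # closes the first comment and the "<!--" that opens the last one.
        body = re.sub(r"^\s*-->", "", body)
        body = re.sub(r"<!--\s*$", "", body)
        blocks.append((match.group(1).strip(), body.strip()))
    return blocks
```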
This program extracts the <h2> element as the <title> element of the rss.
If there is an <a name=...> tag, the text of the name is appended to the link with a # so the link goes directly to the anchor.
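A minimal Python sketch of the title and anchor extraction described above. The function name title_and_link and the base_link parameter are hypothetical, and the regular expressions are a simplification rather than the script's actual matching rules.

```python
import re


def title_and_link(body, base_link):
    """Take the title from the first <h2> and add '#name' from <a name=...>."""
    h2 = re.search(r"<h2[^>]*>(.*?)</h2>", body, re.DOTALL | re.IGNORECASE)
    title = h2.group(1).strip() if h2 else ""
    # Accept name="x", name='x' or name=x
    anchor = re.search(r"<a\s+name=[\"']?([^\"'\s>]+)", body, re.IGNORECASE)
    link = f"{base_link}#{anchor.group(1)}" if anchor else base_link
    return title, link
```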
The program creates a temp file news.php.rss in which each <rssfeed> tag is replaced with <rssfeed date='...'>, carrying the date this program was run. If the rssfeed tag in news.php already has the date='...' attribute, that date is used instead of the current date. After the program is done it copies the news.php file to news.php.old and then moves the news.php.rss file to replace news.php.
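The date stamping and file rotation described above might look like this in Python. It is a sketch under assumptions: stamp_dates and rotate are hypothetical names, and the real script may handle quoting, errors and file permissions differently.

```python
import os
import re
import shutil
from pathlib import Path


def stamp_dates(html, now):
    """Add date='...' to every <rssfeed> tag that has no date attribute yet."""
    def add_date(match):
        attrs = match.group(1)
        if re.search(r"\bdate\s*=", attrs):
            return match.group(0)  # keep the existing date untouched
        return f"<rssfeed{attrs} date='{now}'>"
    return re.sub(r"<rssfeed\b([^>]*)>", add_date, html)


def rotate(news_path):
    """Back up news.php to news.php.old, then move news.php.rss over it."""
    news = Path(news_path)
    temp = news.with_name(news.name + ".rss")
    backup = news.with_name(news.name + ".old")
    shutil.copy2(news, backup)  # copy, so news.php exists until replaced
    os.replace(temp, news)      # move the stamped temp file into place
```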
The <rssfeed> tag can take several other attributes:
date="..."    article date
title="..."   article title
url="..."     the base url of the target page
page="..."    the page file name
anchor="..."  the anchor name
noesc         do not escape html codes
Each of these attributes takes the place of the corresponding item that would otherwise be taken from the html between the <rssfeed> tags. For example:
<rssfeed url="http://www.xyz.com" page="XYZ.php" anchor="this" title="XYZ test" date="Sun, 26 Apr 2009 19:58:59 GMT">
<h2>Some text here</h2>
<p>Some more text as a description</p>
</rssfeed>
This section of code would produce the following <item> section in the rssfeed.xml file:
<item>
<title>XYZ test</title>
<link>http://www.xyz.com/XYZ.php#this</link>
<description>&lt;h2&gt;Some text here&lt;/h2&gt;&lt;p&gt;Some more text as a description&lt;/p&gt; </description>
<pubDate>Sun, 26 Apr 2009 19:58:59 GMT</pubDate>
</item>
The <h2> tag is ignored as a title if the title attribute is provided. The same goes for the other attributes. The link attributes (url, page, anchor) take the place of the default link set in the configuration section; this lets you have <rssfeed> tags in one file that reference another file or site.
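To illustrate how the attributes override the defaults, here is a small Python sketch that parses the attribute text and builds the link from the example above. The names parse_attrs, build_link, default_url and default_page are hypothetical, and splitting the configured link into a url and a page is an assumption made for the sketch.

```python
import re


def parse_attrs(attr_text):
    """Parse name="value" pairs plus bare flags such as noesc."""
    attrs = dict(re.findall(r'(\w+)\s*=\s*"([^"]*)"', attr_text))
    # Whatever is left after removing the pairs is treated as bare flags
    leftover = re.sub(r'\w+\s*=\s*"[^"]*"', " ", attr_text)
    for flag in leftover.split():
        attrs[flag] = True
    return attrs


def build_link(attrs, default_url, default_page):
    """Build the item link; attributes win over the configured defaults."""
    url = attrs.get("url", default_url)
    page = attrs.get("page", default_page)
    link = f"{url.rstrip('/')}/{page}"
    if "anchor" in attrs:
        link += "#" + attrs["anchor"]
    return link
```

With the attributes from the example tag, build_link returns http://www.xyz.com/XYZ.php#this, which matches the <link> element shown above.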
rssfeed.pl
The default behavior: the files named in the configuration file, or the built-in defaults, are used.
rssfeed.pl def.html
The file 'def.html' is read instead of the 'newsfile' named in the configuration file or default. The file 'def.html' is updated and 'def.html.old' is created as the backup. The rss feed goes into the file named in the configuration.
rssfeed.pl def.html abc.xml
Like above but the rss feed goes into 'abc.xml'.
rssfeed.pl xyz.html < def.html
The file to be parsed is 'def.html', the rss feed output goes to the file named in the configuration file or defaults, and the new html goes to 'xyz.html'.
rssfeed.pl xyz.html abc.xml < def.html
The file to be parsed is 'def.html', the rss feed output goes to 'abc.xml', and the new html goes to 'xyz.html'.
wget -O - http://localhost/def.php | rssfeed.pl
If you have rssfeed tags that are generated dynamically, you can pipe the output from the webpage to rssfeed.pl. Assuming the configuration file or defaults are set to 'newsfile=webpath/def.php' and 'rssfile=webpath/abc.xml', the rss output would go to webpath/abc.xml, the file webpath/def.php would be updated, and a backup file webpath/def.php.old would be created.
rssfeed.config is the default configuration file. It should be in the same directory as rssfeed.pl. It can be created by cutting and pasting the default configuration from the script and changing the variables to fit your site.
The <rssfeed ...> tag can be split over several lines; however, nothing else may follow the ending > of the tag on its line. The one exception: if the <rssfeed> tag is inside a comment, the end of the comment can be on the same line as the ending > of the tag.
This is OK:
<!-- <rssfeed title="Hi There"> -->
This is NOT OK:
<!-- <rssfeed title="Hi There"> --> <p>Some html on the same line</p>
I guess this could be thought of as a BUG but I like to think of it as a feature:)
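The restriction above can be checked with a few lines of Python before running the script. This is a hypothetical helper (opening_tag_line_ok is not part of rssfeed.pl) and it only looks at lines where the whole opening tag fits on one line.

```python
import re


def opening_tag_line_ok(line):
    """True if nothing but an optional '-->' follows the <rssfeed ...> tag."""
    match = re.search(r"<rssfeed\b[^>]*>", line)
    if not match:
        return True  # no complete opening tag on this line, nothing to check
    rest = line[match.end():].strip()
    return rest in ("", "-->")
```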
Probably. If you find any, please let me know at the email addresses below. Thanks.
Barton Phillips
Last Modified May 02, 2010 15:44:14 MDT