What is RSS, and what does RSS stand for?
The RSS Parser
Outputting the Feed
Specific implementations
The basic idea of the RSS parser described in this article is to turn the feed into an associative array whose structure represents the XML structure of the feed, with two exceptions:
1. As already mentioned, the RSS and CHANNEL tags are skipped, as are any XML document declarations at the start of the feed. This is to simplify the resulting array.
2. The series of ITEM tags is grouped together in an ITEMS array within the tree structure. Otherwise only one ITEM could be stored, because each key in an associative array holds a single value.
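The two exceptions above can be illustrated with a minimal sketch. It is not the RSSParser class itself: it assumes PHP's SimpleXML extension is available, and the function names element_to_array and feed_to_array are hypothetical helpers invented for this illustration. Tag names are upper-cased only to match the array keys used in this article.

```php
<?php
// Hypothetical helper (not part of RSSParser): convert the children of
// an element into an associative array keyed by upper-cased tag name.
function element_to_array(SimpleXMLElement $element)
{
    $result = array();
    foreach ($element->children() as $child) {
        $name = strtoupper($child->getName());
        // A leaf tag becomes a string; a tag with children becomes a nested array.
        $value = count($child->children()) > 0
            ? element_to_array($child)
            : (string) $child;
        if ($name === 'ITEM') {
            // Exception 2: collect the repeated ITEM tags in one ITEMS list.
            $result['ITEMS'][] = $value;
        } else {
            $result[$name] = $value;
        }
    }
    return $result;
}

// Hypothetical helper: parse a feed string into the array structure.
function feed_to_array($xml)
{
    $rss = simplexml_load_string($xml);
    // Exception 1: skip the RSS and CHANNEL wrappers and start
    // from the children of CHANNEL.
    return element_to_array($rss->channel);
}

$xml = '<rss version="2.0"><channel><title>Demo</title>'
     . '<item><title>First</title></item>'
     . '<item><title>Second</title></item>'
     . '</channel></rss>';
$feed = feed_to_array($xml);
print_r($feed);
```

Without the ITEMS grouping, the second ITEM would overwrite the first under the same key, which is why the list is needed.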
For example, the following is a simple RSS feed:
<rss version="2.0">
<channel>
<title>English Standard Version Bible Verses</title>
<link>http://www.gnpcb.org/esv/rss2.0/</link>
<description>Verses from the ESV Bible.</description>
<item>
<title>Proverbs 3:11-12</title>
<link>http://www.gnpcb.org/esv/search/?passage=Proverbs+3%3A11-12</link>
<description>My son, do not despise the LORD's discipline or be weary of his reproof, for the LORD reproves him whom he loves, as a father the son in whom he delights.</description>
</item>
<item>
<title>John 3:16</title>
<link>http://www.gnpcb.org/esv/search/?passage=John+3%3A16</link>
<description>"For God so loved the world, that he gave his only Son, that whoever believes in him should not perish but have eternal life."</description>
</item>
</channel>
</rss>
In general, the content of each tag is assumed not to contain any XML or HTML code (i.e. markup intended to be output), since the parser treats all tags in the same way by default and would be confused by it. There are three exceptions:
1. The HTML is encoded using HTML character entities (e.g. &lt;strong&gt;).
2. The HTML is enclosed in a <![CDATA[htmlcode]]> section.
3. The XML namespace is specified in the parent tag, for example: <body xmlns="http://www.w3.org/1999/xhtml">HTML in here</body>
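The first two methods can be compared with a short sketch. It assumes PHP's SimpleXML extension rather than the RSSParser class, and the sample HTML string is invented for illustration. Both encodings should give the parser the same text to hand back.

```php
<?php
// Sample HTML content (an assumption for this illustration).
$html = '<strong>Proverbs</strong> 3:11-12';

// Method 1: encode the HTML with character entities.
$encoded = '<description>' . htmlspecialchars($html) . '</description>';

// Method 2: wrap the HTML in a CDATA section.
$cdata = '<description><![CDATA[' . $html . ']]></description>';

// Parse both variants; LIBXML_NOCDATA merges CDATA into ordinary text.
$fromEntities = (string) simplexml_load_string($encoded);
$fromCdata = (string) simplexml_load_string($cdata, 'SimpleXMLElement', LIBXML_NOCDATA);

echo $encoded, "\n";
echo $cdata, "\n";
var_dump($fromEntities === $fromCdata);
```

In both cases the tag content reaches the parser as plain text, so the embedded HTML is not mistaken for feed structure.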
To use the feed effectively, you need to know its structure. Most feeds simply use the three TITLE, LINK and DESCRIPTION elements, but there are some variations. The Good News Publishers "Read Through the Bible in a Year" feeds are a good example: they use method 3 to include HTML in the main content, which is contained in a BODY tag rather than a DESCRIPTION tag (the DESCRIPTION simply contains a link to the text on the ESV web site).
To parse an RSS feed, you need to create an instance of the RSSParser class:
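<?php
# Synopsis: new RSSParser(xmlfilepath[, encodeampersands = FALSE])
# "feed.xml" is a placeholder path; substitute the location of your own feed.
$myrssfeed = new RSSParser("feed.xml");
?>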
The XML file path can be (and often is) a remote URL, as long as your web server allows this (remote fopen, controlled by PHP's allow_url_fopen setting).
The second, optional argument specifies whether to encode any ampersands contained in the feed. Most feeds will not require this, and in some instances you won't want to do it, but it was necessary to correctly handle HTML-encoded characters in the "Read Through the Bible in a Year" feeds.
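The article does not show how the ampersand option is implemented. As a rough sketch, assuming it amounts to replacing every & with &amp; before parsing, the effect on entity-encoded HTML would look like this. The helper name encode_ampersands and the sample tag are hypothetical, and SimpleXML stands in for the parser.

```php
<?php
// Hypothetical helper: one possible reading of the encodeampersands option.
function encode_ampersands($xml)
{
    return str_replace('&', '&amp;', $xml);
}

// Sample tag containing entity-encoded HTML (invented for illustration).
$raw = '<description>&lt;em&gt;Genesis 1&lt;/em&gt;</description>';

// Parsed as-is, the entities are decoded into real markup characters.
$plain = (string) simplexml_load_string($raw);

// With ampersands encoded first, the entities survive parsing as literal text.
$preEncoded = (string) simplexml_load_string(encode_ampersands($raw));

echo $plain, "\n";
echo $preEncoded, "\n";
```

Under that assumption, the option decides whether the parsed content holds real markup or still-encoded entities, which may explain why some feeds need it and others do not.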
Parsing the RSS feed listed above produces the following PHP array:
<?php
# contents of $myrssfeed->rssarray
array (
"TITLE" => "English Standard Version Bible Verses",
"LINK" => "http://www.gnpcb.org/esv/rss2.0/",
"DESCRIPTION" => "Verses from the ESV Bible.",
"ITEMS" => array (
0 => array (
"TITLE" => "Proverbs 3:11-12",
"LINK" => "http://www.gnpcb.org/esv/search/?passage=Proverbs+3%3A11-12",
"DESCRIPTION" => "My son, do not despise the LORD's discipline or be weary of his reproof, for the LORD reproves him whom he loves, as a father the son in whom he delights."
),
1 => array (
"TITLE" => "John 3:16",
"LINK" => "http://www.gnpcb.org/esv/search/?passage=John+3%3A16",
"DESCRIPTION" => '"For God so loved the world, that he gave his only Son, that whoever believes in him should not perish but have eternal life."'
)
)
)
?>
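Once the structure is known, reading values out of it is ordinary array access. The sketch below uses a hand-written stand-in for $myrssfeed->rssarray, trimmed to the fields it needs, so that it runs without the RSSParser class. The variable names $rssarray and $lines are chosen for this illustration.

```php
<?php
// Stand-in for $myrssfeed->rssarray, trimmed to the fields used below.
$rssarray = array(
    "TITLE" => "English Standard Version Bible Verses",
    "ITEMS" => array(
        array(
            "TITLE" => "Proverbs 3:11-12",
            "LINK" => "http://www.gnpcb.org/esv/search/?passage=Proverbs+3%3A11-12",
        ),
        array(
            "TITLE" => "John 3:16",
            "LINK" => "http://www.gnpcb.org/esv/search/?passage=John+3%3A16",
        ),
    ),
);

// Channel-level values sit at the top of the array;
// each entry of ITEMS is itself an associative array.
$lines = array();
foreach ($rssarray["ITEMS"] as $index => $item) {
    $lines[] = $index . ": " . $item["TITLE"] . " <" . $item["LINK"] . ">";
}

echo $rssarray["TITLE"], "\n";
echo implode("\n", $lines), "\n";
```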
Paul Davey
Paul Davey is an administrator here at CWM and is the webmaster of Whitford Church in Western Australia.