An Introduction to RSS

What is RSS, and what does RSS stand for?
The RSS Parser
Outputting the Feed
Specific implementations


What is RSS, and what does RSS stand for?

RSS is the name given to several related XML formats that provide a framework for sharing news headlines and news stories. It can also be used for just about any set of information that you wish to share. A good introduction to RSS is available at XML.com. RSS has been given several expansions: Really Simple Syndication, Rich Site Summary and RDF Site Summary. If you are interested in how to create a basic RSS feed, there is an article right here on the CWM website.

RSS versions 0.91-0.94 and 2.0 have the same basic structure. RSS 1.0 is slightly different (and slightly more complicated) - it is based on RDF (another XML format). The RSS parser described in this article is designed to be simple and flexible in the way it reads the feed, and seems to be compatible with most of the RSS formats. It does not, as yet, handle the attributes of the RDF format.

An example of an RSS feed is available from Good News Publishers. Note that this particular feed is RSS 2.0. An RSS feed may contain a number of tags, or elements, but several fundamental tags are common to most feeds:

TITLE
DESCRIPTION
LINK

and one or more Items that have the following structure:

ITEM
-> TITLE
-> DESCRIPTION
-> LINK

Other common tags are IMAGE, CHANNEL and RSS. The RSS parser in this article ignores the RSS and CHANNEL tags since these generally offer no benefit and just add extra levels to the resulting tree-structure.


Paul Davey
Paul Davey is an administrator here at CWM and is the webmaster of Whitford Church in Western Australia.

The RSS Parser

The basic idea of the RSS parser described in this article is to turn the feed into an associative array whose structure represents the XML structure of the feed, with two exceptions:
1. As already mentioned, the RSS and CHANNEL tags are skipped, as are any XML document declarations at the start of the feed. This is to simplify the resulting array.
2. The series of ITEM tags are grouped together in an ITEMS array within the tree structure. Otherwise it would only be possible to store one ITEM in the associative array.

The following is an example of a simple RSS feed:

<rss version="2.0">
<channel>
<title>English Standard Version Bible Verses</title>
<link>http://www.gnpcb.org/esv/rss2.0/</link>
<description>Verses from the ESV Bible.</description>
<item>
<title>Proverbs 3:11-12</title>
<link>http://www.gnpcb.org/esv/search/?passage=Proverbs+3%3A11-12</link>
<description>My son, do not despise the LORD's discipline or be weary of his reproof, for the LORD reproves him whom he loves, as a father the son in whom he delights.</description>
</item>
<item>
<title>John 3:16</title>
<link>http://www.gnpcb.org/esv/search/?passage=John+3%3A16</link>
<description>"For God so loved the world, that he gave his only Son, that whoever believes in him should not perish but have eternal life."</description>
</item>
</channel>
</rss>

In general, the content of each tag is assumed not to contain any XML or HTML code (i.e. code intended to be output), since this would confuse the parser, which by default treats all tags in the same way. There are three exceptions:

1. The HTML is encoded using HTML character entities (e.g. &lt;strong&gt;).
2. The HTML is enclosed in a <![CDATA[htmlcode]]> section.
3. The XML namespace is specified in the parent tag, for example: <body xmlns="http://www.w3.org/1999/xhtml">HTML in here</body>
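
To illustrate the first two exceptions, here is a minimal sketch using PHP's XML parser functions. The character_data() helper is hypothetical and is not part of the RSSParser class. It suggests that entity-encoded HTML and HTML inside a CDATA section both reach the character data handler as the same literal markup, so neither is mistaken for feed tags.

```php
<?php
// Hypothetical helper (not part of RSSParser): collect all character data
// found in a small XML fragment.
function character_data(string $xml): string
{
    $text = '';
    $parser = xml_parser_create();
    xml_set_character_data_handler($parser, function ($p, $data) use (&$text) {
        // The handler may be called several times per element, so append.
        $text .= $data;
    });
    if (xml_parse($parser, $xml, true) !== 1) {
        throw new RuntimeException(xml_error_string(xml_get_error_code($parser)));
    }
    return $text;
}

// Method 1: HTML encoded with character entities.
$encoded = '<description>&lt;strong&gt;Hello&lt;/strong&gt;</description>';
// Method 2: HTML wrapped in a CDATA section.
$cdata = '<description><![CDATA[<strong>Hello</strong>]]></description>';

// Compare what the parser hands back for each form.
var_dump(character_data($encoded) === character_data($cdata));
```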

To use the feed effectively, you need to know its structure. Most feeds simply use the three TITLE, LINK and DESCRIPTION elements, but there are some variations. The Good News Publishers "Read Through the Bible in a Year" feeds are a good example: they use method 3 to include HTML in the main content, which is contained in a BODY tag rather than a DESCRIPTION tag (the DESCRIPTION tag simply contains a link to the text on the ESV web site).

To parse an RSS feed, you need to create an instance of the RSSParser class:

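<?php
# Signature: RSSParser(xmlfilepath [, encodeampersands = FALSE])
$xmlfilepath = 'feed.xml'; # placeholder path; may also be a remote URL
$myrssfeed = new RSSParser($xmlfilepath);
?>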

The XML file path can be (and often is) a remote file path, as long as your web server allows remote file access through fopen (the allow_url_fopen setting in PHP).
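
Whether remote paths work depends on PHP's allow_url_fopen setting. The following is a minimal sketch for checking it before handing a remote path to the parser. The variable name $remote_ok is illustrative only.

```php
<?php
// ini_get() returns the setting as a string (for example "1", "", "On"),
// so normalise it to a real boolean before testing it.
$remote_ok = filter_var(ini_get('allow_url_fopen'), FILTER_VALIDATE_BOOLEAN);

if (!$remote_ok) {
    echo "Remote feeds cannot be opened: allow_url_fopen is disabled.\n";
}
```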

The second, optional, argument specifies whether you want to encode any ampersands contained in the feed. Most feeds will not require this, and in some instances you won't want to do it, but it was necessary to correctly handle HTML-encoded characters in the "Read Through the Bible in a Year" feeds.
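
The article does not show how the ampersand option is implemented. As a rough sketch, assuming it amounts to replacing every "&" with "&amp;" before the feed is parsed, the step could look like the code below. The encode_ampersands() function is hypothetical. The effect is that entity references, including HTML entities such as &nbsp; that XML itself does not define, come out of the XML parser as literal text and can be echoed into an HTML page unchanged.

```php
<?php
// Hypothetical pre-processing step; an assumption about what the
// encodeampersands option does, not the actual RSSParser code.
function encode_ampersands(string $xml): string
{
    // Turn every "&" into "&amp;" so the XML parser passes entity
    // references through as text instead of trying to resolve them.
    return str_replace('&', '&amp;', $xml);
}

$raw = '<description>&lt;em&gt;Day 1&lt;/em&gt; &nbsp; Genesis 1</description>';
echo encode_ampersands($raw), "\n";
```

This also explains why you would not want it for most feeds: a feed that already encodes its content correctly would end up double-encoded.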

If the RSS feed listed above were parsed, it would be turned into the following PHP array:

<?php
# contents of $myrssfeed->rssarray
array (
    "TITLE" => "English Standard Version Bible Verses",
    "LINK" => "http://www.gnpcb.org/esv/rss2.0/",
    "DESCRIPTION" => "Verses from the ESV Bible.",
    "ITEMS" => array (
        0 => array (
            "TITLE" => "Proverbs 3:11-12",
            "LINK" => "http://www.gnpcb.org/esv/search/?passage=Proverbs+3%3A11-12",
            "DESCRIPTION" => "My son, do not despise the LORD's discipline or be weary of his reproof, for the LORD reproves him whom he loves, as a father the son in whom he delights."
        ),
        1 => array (
            "TITLE" => "John 3:16",
            "LINK" => "http://www.gnpcb.org/esv/search/?passage=John+3%3A16",
            "DESCRIPTION" => "\"For God so loved the world, that he gave his only Son, that whoever believes in him should not perish but have eternal life.\""
        )
    )
)
?>
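
The conversion from feed to array can be sketched with PHP's XML parser functions. This is a simplified illustration under stated assumptions, not the RSSParser class itself. The function name rss_to_array() is hypothetical, attributes and namespaces are ignored, and a tag repeated at the same level (other than ITEM) overwrites the earlier one. It relies on the parser's default case folding, which reports tag names in upper case.

```php
<?php
// Hypothetical sketch of the feed-to-array idea; not the article's RSSParser.
function rss_to_array(string $xml): array
{
    $skip = ['RSS', 'CHANNEL'];   // wrapper tags left out of the result
    // One frame per open tag; the bottom frame stands for the document.
    $stack = [['text' => '', 'children' => []]];

    $parser = xml_parser_create();
    xml_set_element_handler(
        $parser,
        // Start of tag: open a new frame.
        function ($p, $name, $attrs) use (&$stack) {
            $stack[] = ['text' => '', 'children' => []];
        },
        // End of tag: fold the finished frame into its parent.
        function ($p, $name) use (&$stack, $skip) {
            $frame = array_pop($stack);
            $parent = &$stack[count($stack) - 1];
            if (in_array($name, $skip, true)) {
                // Skipped wrapper: lift its children one level up.
                $parent['children'] = array_merge($parent['children'], $frame['children']);
                return;
            }
            // A tag with child tags becomes an array, otherwise its text.
            $value = $frame['children'] ?: trim($frame['text']);
            if ($name === 'ITEM') {
                $parent['children']['ITEMS'][] = $value;   // group all items
            } else {
                $parent['children'][$name] = $value;
            }
        }
    );
    xml_set_character_data_handler($parser, function ($p, $data) use (&$stack) {
        $stack[count($stack) - 1]['text'] .= $data;
    });

    if (xml_parse($parser, $xml, true) !== 1) {
        throw new RuntimeException(xml_error_string(xml_get_error_code($parser)));
    }
    return $stack[0]['children'];
}

$feed = '<rss version="2.0"><channel><title>Example</title>'
      . '<item><title>First</title></item>'
      . '<item><title>Second</title></item>'
      . '</channel></rss>';
$array = rss_to_array($feed);
echo $array['ITEMS'][1]['TITLE'], "\n";
```

Keeping a stack of open tags is one simple way to mirror the XML nesting in the array. Grouping ITEM tags under a numeric ITEMS list avoids each item overwriting the one before it.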


Outputting the Feed

The end result of having this array is being able to output the RSS feed in whatever format you choose. A fairly generic way would be to use the following code to print out a single item:

<?php
# set $item to be one of the items in the ITEMS array
$item = $myrssfeed->rssarray['ITEMS'][0];
?>
<p><a href="<?php echo $item['LINK']; ?>" target="extnews"><?php echo $item['TITLE']; ?></a><br />
<?php echo $item['DESCRIPTION']; ?></p>

This would produce the following HTML:

<p><a href="http://www.gnpcb.org/esv/search/?passage=Proverbs+3%3A11-12" target="extnews">Proverbs 3:11-12</a><br />
My son, do not despise the LORD's discipline or be weary of his reproof, for the LORD reproves him whom he loves, as a father the son in whom he delights.</p>

Since the RSS items are stored in the ITEMS array, it is simply a matter of looping through this array to display them all:

<?php
foreach ($myrssfeed->rssarray['ITEMS'] as $key => $item) {
# code to display each item: $item['TITLE'], $item['LINK'], $item['DESCRIPTION']
}
?>
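
As a fuller sketch of such a loop, the hypothetical render_items() function below builds the same markup as the single-item example for every item in the array. It escapes the title and link with htmlspecialchars(), which is an addition not present in the article's own output code. The description is echoed as-is, on the assumption that the feed is trusted and may contain intended HTML.

```php
<?php
// Hypothetical helper: render every item of a parsed feed as HTML.
function render_items(array $rssarray): string
{
    $html = '';
    foreach ($rssarray['ITEMS'] ?? [] as $item) {
        $html .= '<p><a href="' . htmlspecialchars($item['LINK']) . '" target="extnews">'
            . htmlspecialchars($item['TITLE']) . "</a><br />\n"
            // Output unescaped because it may hold HTML from a trusted feed.
            . $item['DESCRIPTION'] . "</p>\n";
    }
    return $html;
}

$sample = [
    'ITEMS' => [
        [
            'TITLE' => 'John 3:16',
            'LINK' => 'http://www.gnpcb.org/esv/search/?passage=John+3%3A16',
            'DESCRIPTION' => 'For God so loved the world...',
        ],
    ],
];
echo render_items($sample);
```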

Many RSS feed providers ask that you acknowledge them as the source of your information by providing a link to their site, which is usually included in the feed for your convenience:

<p>This feed is provided courtesy of <a href="<?php echo $myrssfeed->rssarray['LINK']; ?>" target="extnews"><?php echo $myrssfeed->rssarray['TITLE']; ?></a></p>

The PHP code that handles the XML namespace tags makes use of some functions that are listed, but not documented, in the PHP manual.

For normal tags, the execution order is quite simple:

Start of tag (eg LINK) -> function startElement()

End of tag (eg LINK) -> function endElement()

For namespace tags, the execution order of these functions is shown below:

Start of namespace tag (eg BODY) -> function nsstartElement() -> function startElement()

End of namespace tag (eg BODY) -> function nsendElement() -> function endElement()

These handlers are defined as below:

xml_set_start_namespace_decl_handler($this->parser, "nsstartElement");

xml_set_end_namespace_decl_handler($this->parser, "nsendElement");

The other key function needed to make use of the namespace functionality of the expat extension is xml_parser_create_ns(), which is used instead of xml_parser_create().
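
The execution order can be observed with a small standalone sketch. It uses closures rather than the class methods named above and records only the order in which the handlers fire. One caveat: on PHP builds that use libxml2 rather than expat, the end namespace handler may never be called, so the sketch should not be read as guaranteeing that step.

```php
<?php
$events = [];   // names of the handlers, in the order they fire

// Namespace-aware parser, as opposed to xml_parser_create().
$parser = xml_parser_create_ns();

xml_set_element_handler(
    $parser,
    function ($p, $name, $attrs) use (&$events) { $events[] = 'startElement'; },
    function ($p, $name) use (&$events) { $events[] = 'endElement'; }
);
xml_set_start_namespace_decl_handler(
    $parser,
    function ($p, $prefix, $uri) use (&$events) { $events[] = 'nsstartElement'; }
);
xml_set_end_namespace_decl_handler(
    $parser,
    function ($p, $prefix) use (&$events) { $events[] = 'nsendElement'; }
);

// ITEM is a normal tag; BODY declares a namespace.
$xml = '<item><body xmlns="http://www.w3.org/1999/xhtml">HTML in here</body></item>';
xml_parse($parser, $xml, true);

echo implode(' -> ', $events), "\n";
```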


Specific implementations

It is possible to query the "Read Through the Bible in a Year" feeds for previous or future days, in case someone misses a day. You just need to construct the appropriate query string for the feed filepaths (such as this link, for example).

It is also possible to alter its content in a number of ways, such as whether or not to include the footnotes, as shown here.

You can also obtain a plain-text version of the feed, without any HTML formatting, at this address.

The last format may be better suited to including a passage in a plain-text email.

To see an example of the implementation of this script, please see these pages:
Bible Reading Plan (www.whitford.org.au/resources/bible-reading-plan)

News Feeds (www.whitford.org.au/resources/news-feeds)