PHP Seo – Sitemap and Robots.txt

by Ryan on March 19, 2010

PHP-Seo is a module that can plug into any application or framework to make SEO life a bit easier. Specifically, it provides an object-oriented way of generating sitemap.xml and robots.txt files.

Download

php-seo.tar.gz
Github

Documentation

  1. Installation
  2. Creating a Sitemap
  3. Creating Robots.txt
  4. Unit Testing

Installation

After downloading PHP-Seo, add its library directory to PHP's include path:

set_include_path(
    get_include_path() . PATH_SEPARATOR .
    '/path/to/php-seo/library/'
);
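Every example below loads classes with require_once. The class names appear to follow the PEAR/Zend naming convention (PSeo_Sitemap_Xml lives in PSeo/Sitemap/Xml.php), so an autoloader can map one to the other. This is a sketch under that assumption, not something PHP-Seo ships, and pseoClassToPath() is a hypothetical name:

```php
<?php
// Sketch, assuming PHP-Seo follows the PEAR/Zend naming convention suggested
// by its require paths: class PSeo_Sitemap_Xml lives in PSeo/Sitemap/Xml.php.
// pseoClassToPath() is a hypothetical helper, not part of the library.
function pseoClassToPath(string $class): string
{
    // Underscores in the class name become directory separators.
    return str_replace('_', '/', $class) . '.php';
}

spl_autoload_register(function (string $class): void {
    if (strpos($class, 'PSeo_') !== 0) {
        return; // leave non-PHP-Seo classes to other autoloaders
    }
    // Look the file up on the include path configured with set_include_path().
    $file = stream_resolve_include_path(pseoClassToPath($class));
    if ($file !== false) {
        require_once $file;
    }
});
```

Registered after the set_include_path() call, this should let new PSeo_Sitemap_Xml() load without an explicit require_once, provided every PHP-Seo class uses the same file layout.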

Creating a Sitemap

Creating a sitemap is easy.

require_once 'PSeo/Sitemap/Xml.php';
 
$sitemap = new PSeo_Sitemap_Xml();
$sitemap->addUrl('http://www.potstuck.com');
$sitemap->addUrl('http://www.potstuck.com/category/programming/');
 
echo $sitemap->content();

Will output:

<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
<url>
<loc>http://www.potstuck.com</loc>
</url>
<url>
<loc>http://www.potstuck.com/category/programming/</loc>
</url>
</urlset>
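The output follows the sitemaps.org protocol: a urlset root in the sitemap namespace with one url element per page. For comparison, here is a standalone sketch that builds the same structure with PHP's built-in XMLWriter. It is not PHP-Seo's implementation, and buildSitemapXml() is a hypothetical name. One detail worth knowing: the protocol requires characters such as & in URLs to be entity-escaped, which XMLWriter does automatically.

```php
<?php
// Standalone sketch (not PHP-Seo's code): builds a sitemaps.org urlset
// document using PHP's XMLWriter extension.
function buildSitemapXml(array $urls): string
{
    $w = new XMLWriter();
    $w->openMemory();
    $w->setIndent(true);
    $w->startDocument('1.0', 'UTF-8');
    $w->startElement('urlset');
    $w->writeAttribute('xmlns', 'http://www.sitemaps.org/schemas/sitemap/0.9');
    foreach ($urls as $url) {
        $w->startElement('url');
        // writeElement() escapes &, < and > in the text content.
        $w->writeElement('loc', $url);
        $w->endElement(); // </url>
    }
    $w->endElement(); // </urlset>
    $w->endDocument();
    return $w->outputMemory();
}

echo buildSitemapXml([
    'http://www.potstuck.com',
    'http://www.potstuck.com/category/programming/',
]);
```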

When building a sitemap, the only required field is the URL. The sitemap protocol also defines several optional fields you may wish to pass:

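  • loc – The URL (required).
  • lastmod – Last modification date, in W3C Datetime format (a profile of ISO 8601), e.g. YYYY-MM-DD.
  • changefreq – How often the page is likely to change: always, hourly, daily, weekly, monthly, yearly, or never.
  • priority – Priority relative to other URLs on the same site, from 0.0 to 1.0, with 1.0 the most important (the default is 0.5).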
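Because the optional fields have fixed formats, it can help to check data before handing it to the sitemap. The following is a hypothetical helper (isValidUrlData() and isW3cDate() are illustrative names, not part of PHP-Seo) that applies the rules listed above. For lastmod it accepts only the YYYY-MM-DD form and a full date-time with a numeric offset, a deliberate simplification of the W3C Datetime format.

```php
<?php
// Hypothetical helpers (not part of PHP-Seo): validate an addUrlData()-style
// array against the sitemaps.org field rules.
const CHANGEFREQ_VALUES = ['always', 'hourly', 'daily', 'weekly', 'monthly', 'yearly', 'never'];

function isValidUrlData(array $data): bool
{
    // loc is the only required field and must be an absolute URL.
    if (!isset($data['loc']) || filter_var($data['loc'], FILTER_VALIDATE_URL) === false) {
        return false;
    }
    if (isset($data['lastmod']) && !isW3cDate((string) $data['lastmod'])) {
        return false;
    }
    if (isset($data['changefreq']) && !in_array($data['changefreq'], CHANGEFREQ_VALUES, true)) {
        return false;
    }
    if (isset($data['priority'])) {
        if (!is_numeric($data['priority'])) {
            return false;
        }
        $p = (float) $data['priority'];
        if ($p < 0.0 || $p > 1.0) {
            return false;
        }
    }
    return true;
}

// Accepts YYYY-MM-DD or e.g. 2010-03-15T09:30:00+00:00 (Z is not handled).
function isW3cDate(string $value): bool
{
    foreach (['Y-m-d', DATE_ATOM] as $format) {
        $dt = DateTime::createFromFormat($format, $value);
        // The round-trip comparison rejects overflowed dates such as 2010-02-30.
        if ($dt !== false && $dt->format($format) === $value) {
            return true;
        }
    }
    return false;
}
```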

In most cases, passing only the URL with $sitemap->addUrl() is sufficient. To use the optional fields, call addUrlData() with an associative array keyed by field name. Example:

require_once 'PSeo/Sitemap/Xml.php';
 
$sitemap = new PSeo_Sitemap_Xml();
$sitemap->addUrlData(array(
    'loc' => 'http://www.potstuck.com',
    'lastmod' => '2010-03-15',
    'changefreq' => 'monthly',
    'priority' => '1.0'
));
 
echo $sitemap->content();

Will output:

<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
<url>
<loc>http://www.potstuck.com</loc>
<lastmod>2010-03-15</lastmod>
<changefreq>monthly</changefreq>
<priority>1.0</priority>
</url>
</urlset>
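In practice lastmod usually comes from a timestamp in your own data, such as a post's last-modified time. A hypothetical helper, toUrlData(), that turns such a row into the array shape addUrlData() takes, formatting the date as YYYY-MM-DD in UTC and the priority as a one-decimal string like the example above:

```php
<?php
// Hypothetical helper (not part of PHP-Seo): converts one row of your own
// data into the array shape that addUrlData() accepts.
function toUrlData(string $loc, int $modifiedAt, string $changefreq = 'monthly', float $priority = 0.5): array
{
    return [
        'loc'        => $loc,
        'lastmod'    => gmdate('Y-m-d', $modifiedAt), // UTC date as YYYY-MM-DD
        'changefreq' => $changefreq,
        'priority'   => number_format($priority, 1),  // e.g. "1.0"
    ];
}

$rows = [
    ['url' => 'http://www.potstuck.com', 'modified' => 1268611200], // 2010-03-15 00:00 UTC
];

foreach ($rows as $row) {
    $data = toUrlData($row['url'], $row['modified'], 'monthly', 1.0);
    // With PHP-Seo loaded you would call: $sitemap->addUrlData($data);
    print_r($data);
}
```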

To generate a plain-text sitemap, which is simply a list of URLs, use the PSeo_Sitemap_Txt class instead of PSeo_Sitemap_Xml. Only the require_once line and the constructor change:

// old
// require_once 'PSeo/Sitemap/Xml.php';
// $sitemap = new PSeo_Sitemap_Xml();
 
// new
require_once 'PSeo/Sitemap/Txt.php';
$sitemap = new PSeo_Sitemap_Txt();
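Under the sitemaps.org protocol, a text sitemap contains nothing but absolute URLs, one per line, in UTF-8, and a single sitemap may list at most 50,000 URLs. A standalone sketch of that format (buildTextSitemaps() is a hypothetical name, not PHP-Seo's code) that splits longer lists into several files' worth of content:

```php
<?php
// Standalone sketch (not PHP-Seo's code): one URL per line, split into
// chunks because a single sitemap may list at most 50,000 URLs.
function buildTextSitemaps(array $urls, int $maxPerFile = 50000): array
{
    $files = [];
    foreach (array_chunk($urls, $maxPerFile) as $chunk) {
        $files[] = implode("\n", $chunk) . "\n";
    }
    return $files;
}

$files = buildTextSitemaps([
    'http://www.potstuck.com',
    'http://www.potstuck.com/category/programming/',
]);
echo $files[0];
```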

Creating Robots.txt

Creating a robots.txt file is just as simple. By default, the User-Agent is *, the path / is allowed, and nothing is disallowed, as the following code and output show.

require_once 'PSeo/Robots/Txt.php';
 
$robots = new PSeo_Robots_Txt();
 
echo $robots->content();

Will output:

User-Agent: *
Allow: /

You can change any of these defaults with the following PSeo_Robots_Txt methods:

$robots->setUserAgent('User Agent');
Sets the User-Agent line. Any string is accepted.

$robots->setSitemap('Url To Sitemap');
Adds a Sitemap line. Pass the absolute URL of the sitemap, including http://.

$robots->allowUrl('Url');
Allows a single URL path.

$robots->blockUrl('Url');
Disallows (blocks) a single URL path.
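To make the mapping from these calls to output lines concrete, here is a standalone sketch that renders one robots.txt group in the same line order the examples in this post show: User-Agent, then Disallow lines, then Allow lines, then Sitemap. It is not PHP-Seo's implementation, and renderRobotsGroup() is a hypothetical name.

```php
<?php
// Standalone sketch (not PHP-Seo's implementation): renders one robots.txt
// group in the line order shown in this post's example output.
function renderRobotsGroup(string $userAgent, array $disallow, array $allow, ?string $sitemap = null): string
{
    $lines = ['User-Agent: ' . $userAgent];
    foreach ($disallow as $path) {
        $lines[] = 'Disallow: ' . $path;
    }
    foreach ($allow as $path) {
        $lines[] = 'Allow: ' . $path;
    }
    if ($sitemap !== null) {
        $lines[] = 'Sitemap: ' . $sitemap; // must be an absolute URL
    }
    return implode("\n", $lines) . "\n";
}

echo renderRobotsGroup('*', ['/private', '/wp-admin'], ['/'], 'http://www.potstuck.com/sitemap.xml');
```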

An Example

require_once 'PSeo/Robots/Txt.php';
 
$robots = new PSeo_Robots_Txt();
 
$robots->blockUrl('/private');
$robots->blockUrl('/wp-admin');
$robots->setSitemap('http://www.potstuck.com/sitemap.xml');
 
echo $robots->content();

Will output:

User-Agent: *
Disallow: /private
Disallow: /wp-admin
Allow: /
Sitemap: http://www.potstuck.com/sitemap.xml
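Note that Allow: / stays in the output next to the two Disallow lines. Crawlers that follow RFC 9309 (the IETF robots exclusion standard, which matches Google's documented behavior) use the most specific match, meaning the rule with the longest matching path, and prefer Allow on a tie, so /private remains blocked. A simplified sketch of that evaluation, ignoring the * and $ wildcards; isAllowed() is a hypothetical name:

```php
<?php
// Simplified sketch of RFC 9309 rule precedence within one group: the rule
// with the longest matching path prefix wins, and Allow wins a tie.
// Wildcards (* and $) are deliberately not handled.
function isAllowed(string $path, array $rules): bool
{
    $bestLength = -1;
    $allowed = true; // a path no rule matches may be crawled
    foreach ($rules as [$type, $prefix]) {
        // An empty path (e.g. "Disallow:") matches nothing.
        if ($prefix === '' || strncmp($path, $prefix, strlen($prefix)) !== 0) {
            continue;
        }
        $len = strlen($prefix);
        if ($len > $bestLength || ($len === $bestLength && $type === 'Allow')) {
            $bestLength = $len;
            $allowed = ($type === 'Allow');
        }
    }
    return $allowed;
}

$rules = [
    ['Disallow', '/private'],
    ['Disallow', '/wp-admin'],
    ['Allow', '/'],
];

var_dump(isAllowed('/private/notes.html', $rules));    // bool(false)
var_dump(isAllowed('/category/programming/', $rules)); // bool(true)
```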

Multiple User Agents

You may need to block or allow different URLs for different user agents. To do this, create a separate PSeo_Robots_Txt object for each agent and output their content one after another. Example:

require_once 'PSeo/Robots/Txt.php';
 
$robotsBotA = new PSeo_Robots_Txt();
$robotsBotA->setUserAgent('Bot A');
$robotsBotA->blockUrl('/blockAAA');
 
$robotsBotB = new PSeo_Robots_Txt();
$robotsBotB->setUserAgent('Bot B');
$robotsBotB->blockUrl('/blockBBB');
 
echo $robotsBotA->content();
echo $robotsBotB->content();

Will output:

User-Agent: Bot A
Disallow: /blockAAA
Allow: /
User-Agent: Bot B
Disallow: /blockBBB
Allow: /
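In the output above the two groups run together. Many robots.txt files put a blank line between groups, which is how the original robots exclusion convention separated records, and it also makes the file easier to read. A standalone sketch that joins group strings (such as the values returned by each object's content()) that way; joinRobotsGroups() is a hypothetical name:

```php
<?php
// Standalone sketch: joins robots.txt groups with one blank line between them.
// Each input string is one group, e.g. the return value of content().
function joinRobotsGroups(array $groups): string
{
    // Drop trailing newlines so the separator is always exactly one blank line.
    $trimmed = array_map(fn(string $g): string => rtrim($g, "\r\n"), $groups);
    return implode("\n\n", $trimmed) . "\n";
}

echo joinRobotsGroups([
    "User-Agent: Bot A\nDisallow: /blockAAA\nAllow: /\n",
    "User-Agent: Bot B\nDisallow: /blockBBB\nAllow: /\n",
]);
```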

Unit Testing

PHP-Seo includes a PHPUnit test suite for the library. To run it:

[ryan@localhost]$ cd /path/to/php-seo/tests/
[ryan@localhost]$ phpunit AllTests
PHPUnit 3.3.17 by Sebastian Bergmann.
 
....................
 
Time: 0 seconds
 
OK (20 tests, 29 assertions)

Testing within your application

To run the PHP-Seo tests as part of your application's test suite, include the PHP-Seo suite from your own AllTests file, along these lines:

require_once '/usr/share/php/php-seo/tests/AllTests.php';
 
class AllTests extends PHPUnit_Framework_TestSuite {
 
    public static function suite() {
 
        $suite = new AllTests();
 
        // your unit test suites here
 
        // test php-seo
        $suite->addTestSuite('PseoTests_AllTests');
 
        return $suite;
    }
}

Once this is in place, the PHP-Seo tests run every time you test your application.
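If your own tests should also check the sitemap your application builds, parsing the XML is sturdier than comparing strings, because whitespace and formatting may vary. A hypothetical helper using PHP's DOM extension (extractSitemapLocs() is an illustrative name, not part of PHP-Seo) that returns the loc values:

```php
<?php
// Hypothetical test helper (not part of PHP-Seo): parses sitemap XML and
// returns its <loc> values so tests can compare URLs rather than raw strings.
function extractSitemapLocs(string $xml): array
{
    $doc = new DOMDocument();
    if (!$doc->loadXML($xml)) {
        throw new InvalidArgumentException('Sitemap is not well-formed XML');
    }
    $xpath = new DOMXPath($doc);
    // The sitemap elements live in this namespace, so XPath needs a prefix for it.
    $xpath->registerNamespace('sm', 'http://www.sitemaps.org/schemas/sitemap/0.9');
    $locs = [];
    foreach ($xpath->query('/sm:urlset/sm:url/sm:loc') as $node) {
        $locs[] = trim($node->textContent);
    }
    return $locs;
}

$xml = '<?xml version="1.0" encoding="UTF-8"?>'
    . '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">'
    . '<url><loc>http://www.potstuck.com</loc></url>'
    . '</urlset>';
print_r(extractSitemapLocs($xml));
```

Inside a PHPUnit test case you could then compare the result with $this->assertEquals(array(...), extractSitemapLocs($sitemap->content())).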
