Setting up a Google sitemap is an easy way to help Google notice your site. It can't force Google to index anything, but it does tell Google exactly which pages exist. A sitemap is just a simple XML file that lists every URL you want Google to know about.
Sitemaps are especially useful if…
- You have dynamic content.
- Your site is new and Google is unaware of it.
- You use a lot of AJAX or Flash.
You also get the added benefit of seeing where Googlebot looked last, where it encountered errors, and your site’s top search keywords.
So it’s helpful, but is it easy to set up? If you’re using Ruby on Rails (or any other Ruby-based framework) it’s cake!
Step 1: Create a script (RAILS_ROOT/script/sitemap)
This script collects all relevant URLs and creates a file at RAILS_ROOT/public/sitemap.xml that contains info about each one. For example, let’s pretend we have a site devoted to hippo pictures. Our script would look like this…
#!/usr/bin/env ruby
ENV['RAILS_ENV'] ||= "production"
Dir.chdir(File.expand_path(File.dirname(__FILE__) + "/..")) # Change current directory to RAILS_ROOT
# Start up Rails (absolute path, since Ruby 1.9.2+ no longer searches the current directory)
require File.expand_path("config/environment")

# These two lines make life super easy: they let you call url_for and named routes outside of a controller or view
include ActionController::UrlWriter
default_url_options[:host] = 'www.hippos-are-awesome.com'

filename = "#{RAILS_ROOT}/public/sitemap.xml"
hippo_pics = HippoPic.find(:all) # Such a wonderful collection

File.open(filename, "w") do |file|
  xml = Builder::XmlMarkup.new(:target => file, :indent => 2)
  xml.instruct! # Writes the <?xml ... ?> declaration at the top of the file
  xml.urlset "xmlns" => "http://www.sitemaps.org/schemas/sitemap/0.9" do
    hippo_pics.each do |hippo_pic|
      xml.url do
        # :action => "show" is an assumption; use whichever action displays a single picture
        xml.loc url_for(:controller => "hippos", :action => "show", :id => hippo_pic.id)
        xml.lastmod hippo_pic.updated_at.xmlschema
        xml.changefreq "weekly"
        xml.priority 0.5
      end
    end
  end
end
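If you want to see the shape of the file without booting Rails, here is a rough, Rails-free sketch of the same structure. It is not the script above: the Entry struct and the sitemap_xml helper are made-up names for illustration, and it builds the XML by hand instead of using Builder.

```ruby
require "time" # adds Time#xmlschema on older Rubies

# Hypothetical stand-in for the HippoPic records used in the script above.
Entry = Struct.new(:loc, :updated_at)

# Builds a sitemap document as a string, mirroring the
# urlset > url > loc/lastmod/changefreq/priority layout of the Builder script.
def sitemap_xml(entries, changefreq = "weekly", priority = 0.5)
  lines = ['<?xml version="1.0" encoding="UTF-8"?>',
           '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">']
  entries.each do |entry|
    lines << "  <url>"
    # encode(:xml => :text) escapes &, < and > so the URL is valid XML text
    lines << "    <loc>#{entry.loc.encode(:xml => :text)}</loc>"
    lines << "    <lastmod>#{entry.updated_at.utc.xmlschema}</lastmod>"
    lines << "    <changefreq>#{changefreq}</changefreq>"
    lines << "    <priority>#{priority}</priority>"
    lines << "  </url>"
  end
  lines << "</urlset>"
  lines.join("\n") + "\n"
end

entries = [
  Entry.new("http://www.hippos-are-awesome.com/hippos/show/1?size=big&page=2",
            Time.utc(2008, 2, 1, 12, 0, 0))
]
puts sitemap_xml(entries)
```

Note that the & in the query string comes out as &amp; in the file. Builder does the same escaping for you in the real script.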
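For more info about what lastmod, changefreq and priority mean in the sitemap, Google explains it all here. Basically, lastmod and changefreq tell Google when a URL last changed and how often it is likely to change, and priority (0.0 to 1.0) tells it which of your URLs matter most relative to each other.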
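The script above hardcodes "weekly" for every URL. If your pages change at different rates, you could derive changefreq from how recently each record was updated. Here is a minimal sketch; the changefreq_for name and the thresholds are my own assumptions for illustration, not anything Google recommends.

```ruby
SECONDS_PER_DAY = 86_400

# Maps the age of a record to one of the changefreq values allowed by the
# sitemap protocol. The cutoffs below are illustrative assumptions.
def changefreq_for(updated_at, now = Time.now)
  age_in_days = (now - updated_at) / SECONDS_PER_DAY
  if age_in_days < 1
    "hourly"
  elsif age_in_days < 7
    "daily"
  elsif age_in_days < 30
    "weekly"
  elsif age_in_days < 365
    "monthly"
  else
    "yearly"
  end
end

now = Time.utc(2008, 3, 1)
puts changefreq_for(now - 3 * SECONDS_PER_DAY, now)   # updated three days ago
puts changefreq_for(now - 400 * SECONDS_PER_DAY, now) # updated over a year ago
```

In the sitemap script you would then write xml.changefreq changefreq_for(hippo_pic.updated_at) instead of the fixed string.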
Step 2: Create a daily or weekly cron job to run the sitemap script
Make sure the script is executable (chmod +x), then switch to the user that runs your Ruby apps and add this to its crontab.
20 2 * * * PATH_TO_RAILS_APP/script/sitemap # Runs the sitemap script every day at 2:20 AM
Step 3: Let Google know about your sitemap
Head over to Google’s webmaster tools and follow the instructions on how to point Google to your sitemap.
That’s it. A few other things to consider:
- gzip your sitemap. Google can read gzipped sitemaps just fine and you save on bandwidth.
- If you have more than 50,000 URLs you need to split your sitemap into several files and list them in a sitemap index file.
- Other search engines (like Yahoo) can take Google-style sitemaps too.
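The first two points can be sketched in plain Ruby with the standard library. This is only an outline under my own assumptions: sitemap_groups and write_gzipped are hypothetical helper names, the file name is made up, and the file body here is a placeholder rather than a full sitemap.

```ruby
require "zlib"
require "tmpdir"

MAX_URLS_PER_SITEMAP = 50_000 # per-file limit from the sitemap protocol

# Splits a list of URLs into groups small enough for one sitemap file each.
def sitemap_groups(urls, limit = MAX_URLS_PER_SITEMAP)
  urls.each_slice(limit).to_a
end

# Writes a string to a gzip-compressed file and returns the path.
def write_gzipped(path, contents)
  Zlib::GzipWriter.open(path) { |gz| gz.write(contents) }
  path
end

# 120,000 pretend URLs need three files: two full ones and a smaller third.
urls = (1..120_000).map { |id| "http://www.hippos-are-awesome.com/hippos/show/#{id}" }
groups = sitemap_groups(urls)
puts groups.map(&:size).inspect

# Write a placeholder body to a temp directory and read it back to check it.
Dir.mktmpdir do |dir|
  path = write_gzipped(File.join(dir, "sitemap1.xml.gz"), "<urlset></urlset>\n")
  puts Zlib::GzipReader.open(path) { |gz| gz.read }
end
```

In a real script you would generate one XML file per group, the same way as in Step 1, and point the sitemap index at each of them.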
4 Comments
Shouldn’t the sitemap file go into RAILS_ROOT/public/sitemap.xml instead of RAILS_ROOT/sitemap.xml?
It totally should. Thanks for the catch.
This is a good tutorial, thanks for sharing it. Would you mind posting a comment on how you would modify it to include multiple model types?
Also, even more important than a Google sitemap is making sure your site is appropriately cached and speedy for Googlebot when it is going through. Even on a large site, if you have flat HTML files the bot can go through and index your content relatively quickly. Too many database calls will make Googlebot slow down and ultimately not go through your site completely.
Michael
I’ve posted a Google sitemap generator which works with multiple models and Rails.
http://scoop.cheerfactory.co.uk/2008/02/26/google-sitemap-generator/