Wednesday 21 December 2011

Updated PostgreSQL Download Infrastructure

Back at the tail end of November, the PostgreSQL sysadmin/web team successfully migrated the PostgreSQL website to a new platform based on PostgreSQL, Django, Lighttpd, and Varnish, replacing the somewhat complex and messy PHP platform used in the past. Functionally and visually the website is almost identical to what it was, though the behind-the-scenes management interface is now vastly improved, as is the "Your Account" section, which now offers users much more control over their submissions, such as news and events.

One other change that went largely unnoticed, however, was in the downloads section of the website, specifically the FTP area. This is a web interface over the content on the PostgreSQL FTP site, ftp.postgresql.org, that gives users a nice way to browse the files and directories on the site. On the old website, when users clicked to download a file they were taken to a page of flags where they could select a mirror site to download the file from. The mirrors were all third party servers over which we had no control, aside from an automated system that ensured we only listed those whose content was no more than 48 hours out of date. This arrangement made sense years ago when bandwidth was more precious. With the bandwidth available to us these days, however, it's really just clunky and inconvenient for users to have to choose one of 75 flags to reach a server that may not be entirely up to date.

To resolve this, as the new website infrastructure went live we also pushed a new download infrastructure into production. What you'll see now on the website is direct links to files on download servers we run ourselves. This gives us a number of advantages:

  • The user has a much slicker experience when downloading, both in terms of the workflow, and often the speed of downloads (because some of the old mirrors were much faster than others).
  • We can push out files to the download servers in minutes, rather than days.
  • We can collect meaningful statistics to help us understand what users are downloading.
  • We can automatically (and invisibly) disable download servers in the event of problems, within minutes.

So what does this mean for the end user? Well, for a user coming from the website, downloads take fewer clicks and avoid the "flags" page, which could be daunting and potentially confusing. For those using the FTP site directly, there are a number of changes:

  • The site can be accessed at ftp://ftp.postgresql.org/, rather than using one of the individual mirror hostnames we used in the past.
  • Because there are no third party mirrors, there are no inconsistent paths to the content (one mirror in the past may have had content under /pub/mirrors/postgresql/ whilst another may have used /u/postgres/).
  • We can now offer downloads over HTTP using the same paths as FTP - you can use http://ftp.postgresql.org/pub/... or ftp://ftp.postgresql.org/pub/...
  • We've also opened up rsync access to all users, where previously only registered mirror sites could rsync the downloads from us: rsync://ftp.postgresql.org/pgsql-ftp (or ftp.postgresql.org::pgsql-ftp in rsync's double-colon form).
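
As a small illustration of the "same paths over HTTP and FTP" point above, here is a minimal Python sketch that builds both URLs from a single path. The helper name download_urls and the example file path are hypothetical and only there for illustration; only the hostname and the /pub/ prefix come from this post.

```python
from urllib.parse import urlunsplit

# Hostname taken from the post; everything else here is illustrative.
HOST = "ftp.postgresql.org"


def download_urls(path):
    """Return the HTTP and FTP URLs that point at the same path.

    `path` is expected to be an absolute path under /pub/.
    """
    if not path.startswith("/pub/"):
        raise ValueError("expected a path under /pub/")
    # The same host and path are used for both schemes.
    return {
        scheme: urlunsplit((scheme, HOST, path, "", ""))
        for scheme in ("http", "ftp")
    }


if __name__ == "__main__":
    # Hypothetical file path, not a real file on the server.
    for scheme, url in download_urls("/pub/example/file.tar.gz").items():
        print(scheme, url)
```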

Because there are lots of links on the web to files on the old mirror network, we've left the old mirror hostnames in place for the time being, though they are no longer being monitored. They will be phased out over time.

Happy downloading!