Friday, 30 July 2010

Cleaning up the Oracle mess

I've been caught off-guard by the number of news stories that have been posted about Oracle turning off their Solaris boxes that participated in the PostgreSQL Buildfarm and the fact that I've been working to replace them. I think most of us in the community assumed it was going to happen sooner or later - Sun were contributors to PostgreSQL, but once they bought MySQL I think it was the beginning of the end of their contributions to the project. Being acquired by Oracle was the solid-gold nail in the coffin.

Anyway, just to clarify a few points in some of the articles I've seen:
  1. My employer and corporate contributor to PostgreSQL, EnterpriseDB is providing the hardware for the new buildfarm members. We try to give back to the community wherever we can, and this is one thing that we can do quite easily.
     
  2. So far I've enabled two Solaris SPARC buildfarm members, one running the Sun Studio compiler, and one using GCC. There are equivalent Solaris Intel members being setup at the moment, as well as a couple of Windows 7 ones using VC++ 2008 32 and 64 bit.
     
  3. Contrary to reports and what might be inferred from the short bio on my blog, I actually work for EnterpriseDB on a full time basis. The "in my spare time I work for..." bit was my poor British attempt at humour. Sorry 'bout that - it won't happen again!
Well, that's about it I guess. We now return you to your scheduled programming.

Wednesday, 28 July 2010

Postgres, Passwords and Installers

By far the most common issues we see reported with the "one-click" PostgreSQL installers that we build here at EnterpriseDB are password related. In this post I'll explain what the passwords are, why we need them, and how to reset them.

Superuser Password

The Superuser Password is used by Postgres to secure the built-in "postgres" superuser account in the database itself. This is the only account found in a fresh installation. The password is setup during the initial installation of the database server, and may be changed at any point in the future using pgAdmin, or by issuing an SQL query such as:

ALTER USER postgres WITH PASSWORD 'VeryVerySecret';


The superuser password is not required to upgrade Postgres to a newer version, however it will be required if you install certain StackBuilder modules, such as PostGIS or any of the PHP applications that are available. The password is required for these installers because they connect to the database server and create databases and other objects required to run the software.


Service Password

The service password is the one that tends to confuse people. Postgres runs under a special operating system user account for security reasons. This account is created on your machine when the installer runs, and unless overridden on the command line, it will be called "postgres". On Unix-like operating systems such as Linux and Mac OS X, the account is setup without a password and users generally never need to worry about it again.

On Windows however, things are slightly different. In a normal installation of Postgres we'll setup the service account on Windows to use the same password that we use for the superuser account (expert users can override this behaviour using command line options for the installer). We need to do this because Windows requires that service accounts have secured passwords.

Note: If you change the superuser password in the future, that does not change the service password as well.

When you upgrade Postgres on Windows, in order to re-install the service we need to know the service password which is why the installer will require that it be re-entered correctly.

If you have forgotten the password, you can reset it on the command line (there are GUI tools that can be used as well, but they are not available on all versions of Windows). Start a command prompt, and then enter a command like:

net user postgres *

You will be prompted to enter and confirm a new password for the user. If you do this, you must then also update the password in the service configuration for any Postgres or related services (such as pgAgent or pgBouncer) that may use the account, or expect them to fail to start at the next reboot. You can do this using the Services management console which can be found in the Administrative Tools folder. Just right-click each service, select Properties, and enter the new password and click OK.

We get a lot of people asking us to reset their service password, as they mistakenly think it's related to either their Postgres Community Login, or their account on the EnterpriseDB website. It's not - and we cannot change it for you!

Uninstallation


If you uninstall Postgres, the service account will not be removed from the operating system. This is because each individual uninstaller has no way of knowing if you are using the service account with other packages or installations. This isn't a huge problem unless you come to reinstall Postgres at a later time and realise you've completely forgotten what the password was set to - in which case you can reset it as shown above. If you prefer to remove the account, you can use a command like:

net user postgres /delete

Summary

So to recap, we have Superuser passwords which are used to secure the database server's superuser account, and on Windows a service password to secure the operating system service account used by Postgres. Accounts use the same username (postgres) and the same password in a normal installation, but either password can be changed independently of the other post-installation. Both passwords are specific to your own machine, and can be changed (or the service account deleted) as shown.

For a walkthrough of the installation process, please see this guide.

If you need any further assistance having read this post, please use the pgsql-general@postgresql.org mailing list, or the EnterpriseDB Forums.

Thursday, 1 July 2010

Postgres Testimonials

Those of you who attended PG East in Philadelphia earlier this year may recall that we were recording user testimonials on video. Well, we've finally got a bunch of those edited and ready for public consumption - you can find them on the EnterpriseDB YouTube channel.

These videos are ideal material to help persuade your boss or co-workers to adopt Postgres - for example, this video features Jim Nasby talking about how Enova Financial managed $1 billion worth of loans last year using Postgres, having previously scrapped MySQL after just a few months of use.



Check them out. Spread the word.

SQL Injection attacks

There was another malicious attack targeting websites running Microsoft Internet Information Server (IIS) recently. I would imagine that the vast majority of website developers and DBAs are aware of SQL Injection attacks and how to avoid them - primarily, by using nothing but parameterised queries in your code - and a large percentage of those have probably seen the excellent XKCD comic strip that illustrates (yeah, yeah, pun intended) the point nicely.

It never hurts to have a belt-and-braces approach to these things though, which is why version 9.0 of our Postgres Plus Standard Server and version 8.4 of Advanced Server will include a new module called SQL/Protect. This is a relatively small plugin to Postgres prototyped by Jim Mlodgenski, the Chief Architect here at EnterpriseDB which helps protect against these sort of attacks. It uses a number of techniques, such as detection of tautology in WHERE clauses of your SQL to try to detect and prevent injection attacks. It can also run in a learning mode in which it will monitor your web app and learn the usage patterns of the database so it can prevent anything out of the ordinary being executed.

Look out for it in our upcoming releases.

Thursday, 27 May 2010

Comparing VoltDB to Postgres

I've been asked a few times recently for my opinion on VoltDB, the new database server architected by the 'father' of Postgres, Dr Michael Stonebraker so rather than repeating myself over and over again it seems like a good idea to write it all down.

VoltDB is an in-memory, lockless relational database that maintains ACID compliance, has a SQL interface and claims to offer massive performance increases and scalability over 'traditional' relational databases. If you take the time to visit the website, and download some of the docs or even the product itself, if you're a database geek like me you'll probably be pretty impressed. The technology is interesting - the ability to avoid locking seems like paradise, as does the linear scaling.

There are downsides though. In order to implement the key features of the DBMS, Stonebraker has had to design the system to work with a pretty narrow set of use-cases. Lets consider some of the pros and cons and compare them to Postgres.

In memory database

VoltDB is an in-memory database. This means it can be very fast. It also gives us two potential problems:

  1. The database must fit into the available memory on the system. That means that with a single server with 4GB of RAM, a practical database size limit may be in the order of 3GB or so, once you allow 1GB for the OS and memory required to actually operate the database server.

  2. Durability (the D in ACID) must be provided through replication of data to one or more secondary servers over the network, or through writing periodic snapshots to disk. Because replication is synchronous within the cluster and asynchronous between clusters, it is possible for a cluster-wide power failure to cause the loss of committed transactions.

Of course, in some circumstances these may be non-issues. VoltDB scales horizontally extremely well (near linearly in fact), so if your database is large, you can add more servers to get the storage you need. This won't suit people running multi-terabyte databases of course - RAM is cheap these days, but not that cheap - especially when you multiply by 2 or more for durability and redundancy!

In contrast, Postgres stores it's data on non-volatile storage - a direct attached hard disk, or SAN for example. The issues here are:

  1. Since Postgres uses both a shared buffer cache and the kernel cache (and potentially Infinite Cache in EnterpriseDB's Postgres Plus Advanced Server), in a well-configured system - like VoltDB - it will read most if not all of its data from memory. Unlike VoltDB, Postgres will still have locking and buffer management overhead however and of course, any disk reads will be much slower.

  2. Durability is achieved through the immediate logging of all transactions to a sequential transaction log, and later writing of updated pages to the heap. This is much slower than VoltDB's in-memory operations as we write everything to disk twice.

Summary: VoltDB is fast, because it's in-memory. This creates serious practical limits on database size though - how big is your budget? You'll also need at least two servers, with independent power supplies for any real durability. Postgres is slower, most noticeably when the working set doesn't fit into a cache, but you can store multi-terabyte databases on very cheap hard disks with full durability.

Lockless database

Traditional DBMSs allow for concurrent access by using granular locking and other techniques such as MVCC (Multi Version Concurrency Control). By locking only the smallest part of the database required to perform a specific operation, they ensure that other users can still access the rest of the database concurrently. Locks are held for as short a period as possible.

In contrast, VoltDB is a lockless database and therefore doesn't suffer from any of the complexities involved in lock management or MVCC snapshot management that Postgres does. This is especially important when scaling horizontally, as lock management and snapshot management are two of the most complex problems to solve when building clustered database systems based on Postgres.

VoltDB's solution isn't a panacea for these issues though. To achieve lockless operation, every request is serialised through each partition of the database. This means that if you have one partition (i.e. a single server), each database request will be run sequentially, with no parallelism at all. If you have partitioned data, then requests that affect different individual partitions may run in parallel, however any multi-partition requests must be run against the entire cluster on their own.

So what does this mean? Primarily, the effect is that performance is likely to tank if you try to run any complex queries on the system. Any long running, multi-partition query will block all other users of the database until it completes, which means that the system is only suitable for simple OLTP applications, with well thought out data structures. Don't be tempted to sneak any reports into your apps!

In contrast, Postgres can easily handle both OLTP and DW (data warehouse) workloads. MVCC ensures that 'readers never block writers', so you can run complex reports at the same time as tens or hundreds of OLTP operations are running concurrently. The downside is, that it is significantly harder to build multi-master clusters with Postgres.

Summary: VoltDB's lockless architecture makes it easy to scale horizontally, but limits concurrency. Postgres is harder to scale out, but offers excellent concurrency through fine-grained locking and MVCC.

Evolution and growth

The schema in a VoltDB database is defined in a runtime 'catalog'. This includes not just the schema, but also the details of the different nodes in the cluster, and the size of the database. Any changes that are required to the schema, to the configuration of the cluster, or to the size of the database currently require:

  • Dump of the data to non-volatile storage and shutdown of the VoltDB server on each node.
  • Reconfiguration of the runtime catalog.
  • Restart of the VoltDB server on each node, and reload of the data from storage.

In Postgres, database objects can be modified entirely 'on the fly', without shutting down the server or dumping or reloading data. The database size can grow to theoretically unlimited sizes, constrained only by the amount of storage available without the need to restart. Most of the clustering solutions for Postgres don't require a shutdown to install or reconfigure them.

Summary: VoltDB cannot offer high availability if the database grows beyond the predicted size, nor can changes to the schema (such as those that may be required by a software upgrade) be made without shutting down the system. Postgres allows your system to grow and evolve without requiring downtime.

Ecosystem, drivers and tools

VoltDB is a new project. At present, it doesn't have any real community, the only tools available are those shipping with the server, and the API is based on calling Java stored procedures. Actually, that's a bit awkward really. You essentially have to write a data access layer on the server in Java, which encapsulates all of the SQL queries you need to run. Your client code then calls those procedures directly.

Postgres has a long history. There is a large and vibrant community, with hundreds of tools, add-ons, utilities and so-on available. APIs to access the database are available for a host of languages including C, C++, .NET, Perl, Python, TCL, Ruby, PHP, Java and more. Stored procedures (functions) can be written in common languages including C, Perl, TCL, Ruby, Python PHP and Java.

Summary: The long Postgres history means that there are far more tools, utilities and interfaces available for it. VoltDB can catch up, but it will likely take many years. Further, the Java API limits how you can access the database - there is no way to connect via a standard ODBC or JDBC driver to allow you to use generic query tools for example.

Conclusion

VoltDB is an interesting product, but one with limited use-cases. If your database can fit in the memory of your server cluster, and you can architect your application to avoid any kind of complex query then it can offer vast performance advantages over traditional DBMSs like Postgres, and has potential to scale extremely well.

For most users though, concurrency and the ability to run complex queries are real issues, as is the ability to scale the database beyond sizes that are economical to keep in RAM, without having to dump and reload the data and restart the server to accommodate expansion and evolution of the system.

I can see some interesting use cases for VoltDB, and I'll be keeping an eye on its evolution and time permitting, trying it out for size. It's by no means a universal replacement for Postgres or any other similar DBMS (nor does it claim to be), but it could prove to be a very useful tool in the right situation.

Tuesday, 25 May 2010

PGCon 2010 summary

I arrived home from PGCon 2010 yesterday. As always the conference was held in Ottawa, Canada to which many of the PostgreSQL developers now make an annual pilgrimage.

We started on Wednesday with the third annual developer meeting, this year held at Arc The Hotel for the second time. Many thanks to Nadine for ensuring everything went without a hitch. There were 29 developers present, working their way through a packed agenda in an extremely productive manner. There were some great ideas for 9.1, with lots of people claiming items to work on. If we get only half of that done, 9.1 will be an incredible release.

There are some photos from the meeting on my SmugMug page.

Thursday was the start of the conference proper, beginning with an excellent presentation from Gavin M. Roy comparing PostgreSQL with NoSQL databases. The talk can be heard here. I ducked in and out of a few more talks over the day, but spent much of my time working with Magnus and Stefan on the new infrastructure hosting platform that we're polishing off for the project's hosting.

On Thursday night we had the annual EnterpriseDB party, held at the Velvet Room. Good food, plenty of drink, and people discussing Postgres well into the early hours of Friday. Unfortunately I don't have any photos as I was too busy running around giving out beer tokens to remember to take any.

Friday was the second day of the main conference, starting at 10AM (thankfully) with David Fetter and Jonathan Leto kicking off my day talking about PL/Parrot. Dan hosted the closing session which consisted largely of an auction, the main purpose of which was to show how the PostgreSQL community can raise more money than the BSD community for charity. And we did. By quite some margin. The biggest earner being a PostgreSQL 9.0 sweatshirt signed by all of the developers at the developer meeting, which amid a circus of bidding between teams led by Jim Nasby and Gavin Roy ended up going for $1517, donated by both camps. Photos, are here.

The week was finished off with a short stop in Montreal with a number of other community members, before taking advantage of cheap Sunday flights to get back home for Monday morning!

Elected to the PostgreSQL Europe Board

Magnus posted the results of the PostgreSQL Europe board elections yesterday, and I'm happy to have be one of the successful candidates, along with Guillaume Lelarge and Andreas Scherbaum. Everyone standing has made important contributions to PostgreSQL however and now that the election is over I can say that any of them would have been good choices in my opinion.

I'm looking forward to working on new initiatives to promote PostgreSQL within Europe, as well as the ongoing work to organise our 2010 conference in Amsterdam, so a big "thank you" goes to all those who voted for me.