Wednesday, 10 July 2019

Reverse Proxying to pgAdmin

Reverse proxying requests to a pgAdmin server is becoming more and more popular, if posts to the mailing lists are any indication; more often than not, pgAdmin is being run in a container (of which there have now been over 10 million pulls)! Typically users deploy a reverse proxy for one of two reasons: to host multiple applications in different subdirectories under the same domain, or to add SSL/TLS support independently of the application.

Because of the number of questions asked, I spent a little time over the last couple of days doing some testing and updating the documentation with some examples. Here's a blog-ified version of that work.


Nginx

Nginx is winning the battle of the web servers these days, beating out Lighttpd (which is still a fine piece of software) and the ageing and arguably bloated Apache HTTPD. All of these servers support reverse proxying, and whilst I've looked at Nginx, the examples shown below can easily be translated to the other servers if you prefer to run them instead.

In the following examples, we have pgAdmin running in a Docker container (in which it's hosted under Gunicorn). For simplicity, the examples have Nginx running on the host machine, but it can also be easily run in another container, sharing a Docker network with pgAdmin. In such a configuration there is no need to map the pgAdmin container port to the host.

The container is launched as shown below. See the documentation for information on other useful environment variables you can set and paths you can map.

The commands below will pull the latest version of the container from the repository, and run it with port 5050 on the host mapped to port 80 on the container. It will set the default username and password to user@domain.com and SuperSecret respectively.

docker pull dpage/pgadmin4
docker run -p 5050:80 \
    -e "PGADMIN_DEFAULT_EMAIL=user@domain.com" \
    -e "PGADMIN_DEFAULT_PASSWORD=SuperSecret" \
    -d dpage/pgadmin4
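
If you'd rather run Nginx in a container as well, sharing a Docker network with pgAdmin as mentioned above, a minimal sketch might look like the one below. The network name (pgadmin_net) and the mounted configuration path are placeholders of my own choosing; note that proxy_pass in the Nginx configuration would then point at http://pgadmin4:80/ rather than a port mapped on the host:

docker network create pgadmin_net
docker run --network pgadmin_net --name pgadmin4 \
    -e "PGADMIN_DEFAULT_EMAIL=user@domain.com" \
    -e "PGADMIN_DEFAULT_PASSWORD=SuperSecret" \
    -d dpage/pgadmin4
docker run --network pgadmin_net --name nginx \
    -p 80:80 \
    -v /path/to/nginx.conf:/etc/nginx/conf.d/default.conf:ro \
    -d nginx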

A simple configuration to reverse proxy with Nginx to pgAdmin at the root directory looks like this:

server {
    listen 80;
    server_name _;

    location / {
        proxy_set_header Host $host;
        proxy_pass http://localhost:5050/;
        proxy_redirect off;
    }
}

Here we tell Nginx to listen on port 80, and respond to any server name (sent by the client in the Host header). We then specify that all requests under the root directory are proxied back to port 5050 on the local host, and that the Host header is passed along as well. The proxy_redirect option tells the server not to rewrite the Location header.

But what if we want to host pgAdmin under a subdirectory, say /pgadmin4? In this case we need to change the path at the top of the location block and add the X-Script-Name header to the requests made to the pgAdmin container to tell it what subdirectory it's hosted under. This is shown below:

server {
    listen 80;
    server_name _;

    location /pgadmin4/ {
        proxy_set_header X-Script-Name /pgadmin4;
        proxy_set_header Host $host;
        proxy_pass http://localhost:5050/;
        proxy_redirect off;
    }
}

OK, so that's cool, but we're talking about super top secret database stuffs here. It needs to be encrypted! Adding SSL/TLS support to the configuration is largely unrelated to pgAdmin itself, except that, as with the subdirectory, we need to tell it the URL scheme (http or https) to use. We do this by setting the X-Scheme header. The other changes are to add a redirect from http to https, and to configure SSL/TLS:

server {
    listen 80;
    return 301 https://$host$request_uri;
}

server {
    listen 443 ssl;
    server_name _;

    ssl_certificate /etc/nginx/server.crt;
    ssl_certificate_key /etc/nginx/server.key;

    ssl_session_cache builtin:1000 shared:SSL:10m;
    ssl_protocols TLSv1 TLSv1.1 TLSv1.2;
    ssl_ciphers HIGH:!aNULL:!eNULL:!EXPORT:!CAMELLIA:!DES:!MD5:!PSK:!RC4;
    ssl_prefer_server_ciphers on;

    location /pgadmin4/ {
        proxy_set_header X-Script-Name /pgadmin4;
        proxy_set_header X-Scheme $scheme;
        proxy_set_header Host $host;
        proxy_pass http://localhost:5050/;
        proxy_redirect off;
    }
}


Traefik

Traefik is becoming increasingly popular in containerised environments because it's designed to integrate with the orchestration platform being used and auto-configure itself as much as possible. The examples below show it working with plain Docker, but it will work with Swarm, Compose, Kubernetes and other orchestrators as well. Note that Traefik is designed purely for reverse proxying, routing and load balancing; it's not a general-purpose web server.

In the examples below, the following Traefik configuration is used. Because it auto-configures itself, the changes we make to get the desired configuration are to the way the pgAdmin container is deployed, not to the Traefik configuration:

defaultEntryPoints = ["http", "https"]

[entryPoints]
  [entryPoints.http]
    address = ":80"
      [entryPoints.http.redirect]
        entryPoint = "https"
  [entryPoints.https]
    address = ":443"
      [entryPoints.https.tls]

[docker]
domain = "domain_name"
watch = true

With this configuration, Traefik will automatically detect containers as they are launched, and reverse proxy to them using the virtual hostname generated from the container name and the domain in its config file, e.g. <container_name>.<domain>. SSL/TLS is enabled in this setup, with a redirect from plain http to https. The certificate used will be the default one built into Traefik; see the documentation for details on how Let's Encrypt or certificates from other issuers can be used.

To host pgAdmin at the root directory, we simply launch a container with the correct name, and no host to container port mapping:

docker pull dpage/pgadmin4
docker run --name "pgadmin4" \
    -e "PGADMIN_DEFAULT_EMAIL=user@domain.com" \
    -e "PGADMIN_DEFAULT_PASSWORD=SuperSecret" \
    -d dpage/pgadmin4

With the configuration and commands above, Traefik will host pgAdmin at https://pgadmin4.domain_name/. Of course, the domain name should be changed to a real one, and a suitable CNAME, A or AAAA record should be added to the DNS zone file.

In order to host pgAdmin under a subdirectory, as in the Nginx example we need to tell both the proxy server and pgAdmin about the subdirectory. We tell pgAdmin by setting the SCRIPT_NAME environment variable, and we tell Traefik by adding a label to the container instance. For example:

docker pull dpage/pgadmin4
docker run --name "pgadmin4" \
    -e "PGADMIN_DEFAULT_EMAIL=user@domain.com" \
    -e "PGADMIN_DEFAULT_PASSWORD=SuperSecret" \
    -e "SCRIPT_NAME=/pgadmin4" \
    -l "traefik.frontend.rule=PathPrefix:/pgadmin4" \
    -d dpage/pgadmin4


Conclusion

Many users run a reverse proxy server to provide an interface between their clients and the pgAdmin server. This can be a more traditional server such as Nginx, or a purpose-designed reverse proxy server such as Traefik. In either case, it's simple to configure SSL/TLS support or to host pgAdmin in a subdirectory.

Sunday, 30 June 2019

Indexing Documents for Full Text Search

During a conversation with $BOSS a couple of days ago the topic of the new Full Text Search in MongoDB came up. He (knowing of, but presumably having never worked with FTS in PostgreSQL) suggested I might create a tool like the one the MongoDB guys were proudly showing off in their keynote video from their recent conference to make it easy to index and search HTML documents on disk. I looked at him with a slight tilt of the head and remarked that it's already easy to do that, and just takes a few lines of Python code or similar.

So, here I am on my flight back to the UK having written 36 lines of code whilst munching on my breakfast and enjoying my free seat upgrade. I should probably note that those 36 lines of code include import statements, comments, blank lines for readability, and a nicely formatted 7 line SQL query that could easily be just one line. Plus, the vast majority of the code is really just boilerplate to read a directory of files and extract their contents.

Anyway, before I can go into that, it was necessary to first create a schema to hold the data we'd need to query. This is basically just a table to hold whatever information you like about each document, with a tsvector column for searching. For my test I decided to make the path to the file the primary key, and to store the title of the HTML document and the text in the body. The body text isn't really needed, but it does make it a touch easier to use a trigger to automatically maintain the tsvector data - plus of course, I could also then use that column and get rid of the source documents.

Here's what the table looked like:

  CREATE TABLE docs
  (
      path text NOT NULL,
      title text NOT NULL,
      body text NOT NULL,
      tsv tsvector NOT NULL,
      PRIMARY KEY (path)
  );

  CREATE INDEX docs_tsv_idx
      ON docs USING gin (tsv);

I also added the aforementioned trigger to maintain the tsvector. I got slightly fancy with this, indexing both the title and the body fields, and giving them different weights for ranking purposes:

CREATE FUNCTION update_tsv() RETURNS trigger
    LANGUAGE 'plpgsql' VOLATILE NOT LEAKPROOF
AS $BODY$
begin
  new.tsv :=
    setweight(to_tsvector('pg_catalog.english',
      coalesce(new.title,'')), 'A') ||
    setweight(to_tsvector('pg_catalog.english',
      coalesce(new.body,'')), 'D');
  return new;
end
$BODY$;

CREATE TRIGGER update_tsv
       BEFORE INSERT OR UPDATE ON docs
       FOR EACH ROW EXECUTE PROCEDURE update_tsv();
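
To sanity-check the trigger, we can insert a hypothetical test row (a document of my own invention) and look at the tsvector it generates. Note the A weights on the title lexemes; the default D weights on the body lexemes are not displayed:

INSERT INTO docs (path, title, body)
    VALUES ('test.html', 'Trigger test', 'Some body text about triggers.');

SELECT tsv FROM docs WHERE path = 'test.html';
-- Should return something like:
--   'bodi':4 'test':2A 'text':5 'trigger':1A,7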

Now that we have somewhere to store our data, here's the code to load it with data. This should be run in a Python Virtual Environment that has the psycopg2 and BeautifulSoup4 packages installed:

from bs4 import BeautifulSoup
import psycopg2
import glob

# Load from the docs/ directory
path = 'docs/*.html'

# Connect to the database and begin a transaction
conn = psycopg2.connect("dbname=fts_blog user=fts_blog password=fts_blog")
cur = conn.cursor()

for file in glob.iglob(path):
    print("Loading: {} ".format(file))

    with open(file) as f:
        html = f.read()

        # Read the HTML and extract the title and body
        soup = BeautifulSoup(html, features="html.parser")
        title = soup.find('title').decode_contents(formatter="html").replace('&nbsp;', ' ')
        body = soup.find('body').decode_contents(formatter="html")

        # Create the SQL statement
        sql = """INSERT INTO docs
            (path, title, body)
        VALUES
            (%s, %s, %s)
        ON CONFLICT ON CONSTRAINT docs_pkey DO UPDATE SET
            title = %s,
            body = %s;"""

        cur.execute(sql, [file, title, body, title, body])

conn.commit()
cur.close()
conn.close()

As you can see, the code is pretty simple. It basically just loops around all the HTML files found in the docs/ subdirectory, uses BeautifulSoup4 to extract the document title and body, and then runs an INSERT on the database table. For convenience, the query also handles conflicts to allow easy re-indexing.

To load some data, I created a directory called docs/ alongside the script, and copied the PostgreSQL HTML documentation into it. I could then run the indexer:

$ python indexer.py
Loading: docs/sql-alterroutine.html 
Loading: docs/backup-dump.html 
Loading: docs/gist-examples.html 
...
...

Process finished with exit code 0

Finally, I needed to test the search capabilities. That's done through a SQL query of course, for example:

SELECT
    ts_rank("tsv", plainto_tsquery('trigger')) AS "rank",
    path,
    title,
    ts_headline(body,
                 plainto_tsquery('trigger'),
                 'StartSel=*,StopSel=*,MaxFragments=2,' ||
                 'FragmentDelimiter=...,MaxWords=30,MinWords=1') AS "headline"
FROM
    docs
WHERE
    tsv @@ plainto_tsquery('trigger')
ORDER BY rank DESC LIMIT 20;

So here we are selecting the document rank, the path and title of the document, as well as a headline - a small snippet of the document that highlights where the matches were found within it, as you'd probably see on your favourite search engine. That's all coming from the docs table of course, and is restricted by our search query and then sorted by the rank and limited to 20 results.

NOTE: The search term in the query above is 'trigger'. This word is actually included three times in the query text, so make sure you change them all if using the query above as a starting point for your own.
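To avoid that repetition, here's a quick sketch (my own addition, not part of the original work) of how the query might be wrapped in Python, reusing the connection string from the indexer above and binding the search term as a single named parameter so it only needs to be specified once:

import psycopg2

def search(term, limit=20):
    # Connect using the same credentials as the indexer
    conn = psycopg2.connect("dbname=fts_blog user=fts_blog password=fts_blog")
    cur = conn.cursor()

    # The term is bound once as %(term)s, so there's nothing to
    # keep in sync when the search string changes
    cur.execute("""SELECT
            ts_rank(tsv, plainto_tsquery(%(term)s)) AS rank,
            path,
            title,
            ts_headline(body,
                        plainto_tsquery(%(term)s),
                        'StartSel=*,StopSel=*,MaxFragments=2,' ||
                        'FragmentDelimiter=...,MaxWords=30,MinWords=1') AS headline
        FROM docs
        WHERE tsv @@ plainto_tsquery(%(term)s)
        ORDER BY rank DESC LIMIT %(limit)s""",
        {'term': term, 'limit': limit})

    results = cur.fetchall()
    cur.close()
    conn.close()
    return results

for rank, path, title, headline in search('trigger'):
    print("{:.4f}  {}  ({})".format(rank, title, path))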

When we run the query, we see a set of ranked results with headline snippets, which can, of course, be used however we like in an application.


So there we have it: a table with an index and a trigger, plus a trivial Python script, and we have a working indexer and search engine. Oh, and in case you were wondering, no, this is not new functionality in PostgreSQL. It was added to the core server in version 8.3, which was released in 2008. Prior to that there was a contrib module called tsearch2, offering largely equivalent functionality from version 7.4, which was released in 2003. Do try to keep up Mongo 😀

Wednesday, 29 May 2019

Avoiding Gmail's confidential mode

So this is one of the very few (maybe the first?) blog entries I've written that aren't directly related to PostgreSQL, however, it does affect how I (and others) may work on the project.

Last night I received email from Google about my personal G Suite account which I use for all my day-to-day email, which for the most part is related to work on pgAdmin and PostgreSQL. Google were proudly announcing the rollout of their new Gmail Confidential Mode update. If you've not come across this yet, then essentially what it does is allow users to send emails that can be deleted or expired after a certain amount of time, optionally require SMS verification to open them, and prevent printing (but not screen-shots of course), forwarding or downloading etc.

When using the Gmail web interface, this all works fairly seamlessly, and I can see why some people would want it if that's all they use. However, like many people, I also use other clients, for example via IMAP. In that case, instead of the original email, Gmail sends a placeholder message containing a link that allows you to log in to Google and view the message online (assuming the SMS verification passes and the message hasn't been deleted or expired, of course). That's going to be quite inconvenient for me; besides which, I really don't want anyone to be able to control access to emails they've sent me after I've received them.

There's another problem affecting PostgreSQL's mailing lists however. How long will it be until someone sends such a message to one of the PostgreSQL lists, where it will do nothing but clutter up the archives and annoy other users (who won't be able to read the message anyway as they won't be able to login to Google as pgsql-hackers@postgresql.org or whatever the list address was)?


Fixing the PostgreSQL mail servers

After some discussion with some of the PostgreSQL sysadmin team, we discovered that Gmail adds a header to the messages that have confidential mode enabled (X-Gm-Locker: <token>). This is easy for us to deal with - Stefan was very quickly able to add a rule to the Exim configuration on our inbound SMTP servers that will immediately reject any messages with this header, sending a message back to the user explaining why their message was rejected, and to resend it with confidential mode disabled.
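
I don't have the exact rule Stefan deployed to hand, but a minimal sketch of such a check in an Exim DATA ACL (the header and rejection wording are from the behaviour described above; the ACL itself is my own illustration) might look like this:

# In the ACL run at DATA time (acl_smtp_data): reject any message
# carrying the X-Gm-Locker header that Gmail adds in confidential mode
deny condition = ${if def:h_X-Gm-Locker: {yes}{no}}
     message   = Messages sent with Gmail Confidential Mode enabled \
                 are not accepted here. Please disable it and resend.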


Fixing my Gmail outbound email

Disabling confidential mode for my own outbound email was pretty easy:
  1. Select the option to "Manage this domain"
  2. Navigate the G Suite admin interface to Apps -> G Suite -> Gmail -> User Settings.
  3. Under Confidential Mode, set the Disable option, and save the settings.
Now, confidential mode will no longer be available in the compose message interface in Gmail. Well, once the settings have propagated around and the UI has refreshed anyway.


Rejecting confidential messages from Gmail

This part is a little more complex, and requires a suitable G Suite account. It will not work with plain Gmail.
  • Select the option to "Manage this domain"
  • Navigate the G Suite admin interface to Apps -> G Suite -> Gmail -> Advanced Settings -> Content compliance
  • Add a policy:
    • Give it a name
    • Select "Inbound" for the email messages to be affected in section 1.
    • Set the option for "If ANY of the following match the message" in section 2, then add a new expression:
      • Set the type to "Metadata match".
      • Select "Gmail confidential mode" under Attribute.
      • Select "Message is in Gmail confidential mode" under Match type.
      • Save the expression.
    • In section 3, select Reject the message as the action and enter a suitable custom message, e.g. "This domain does not accept messages sent with Gmail Confidential Mode enabled. Please disable and resend."
  • Save the settings.
You should now find that email is rejected by Google, if confidential mode is enabled. I had my friend Jonathan send me a test message, which was duly rejected with the message:

550 5.7.1 This domain does not accept messages sent with Gmail Confidential Mode enabled. Please disable and resend. - gcdp z20sor188854uan.2 - gsmtp

There. Normality has been restored.

Wednesday, 5 September 2018

Why do we install as root?

A couple of common questions I hear from customers (usually long-time users of a particular database from Redwood) via our guys in the field are “why do we install our software as root?” and “why do we run services as postgres?”. The simple TL;DR answer is “for security”. For a detailed explanation, read on…

A basic principle when securing a software installation is “install with maximum privilege requirements and run with minimal”. In practice this equates to having software being installed and binaries/executables etc. owned by the root user, whilst the services themselves are actually run under a minimally privileged (and ideally dedicated) service user account, typically postgres in a PostgreSQL installation. Data files, and any other files that need to be modified by the software in normal operation are also owned by the service user account.
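As a concrete illustration, using the classic from-source installation layout (the paths are examples only and will vary with your platform and packaging), the split looks something like this:

# Software: installed and owned by root, not writable by the service user
chown -R root:root /usr/local/pgsql
chmod -R go-w /usr/local/pgsql/bin /usr/local/pgsql/lib

# Data: owned by the unprivileged service account, and private to it
chown -R postgres:postgres /usr/local/pgsql/data
chmod 700 /usr/local/pgsql/data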

Let’s look at the running software first. Postgres (which will in fact refuse to run as root), is a server process which is often running on a network port that is accessible from other nodes on the network. Of course, we should limit access as much as possible to only those nodes that need access using both a firewall (even simple iptables rules will work), and Postgres’ pg_hba.conf access control file, but even with those measures in place, it’s possible that a determined attacker (let’s call him Zero Cool) can still gain access to the port the database server is running on.

Once our arch-nemesis Zero Cool has access to the database server port, he needs a way to escalate his attack. This may involve exploiting an unknown security issue in Postgres itself (as with any software, we hope there are none, but we’re kidding ourselves if we think it’s totally secure), or it may be that he’s used other techniques such as social engineering to learn a user’s credentials.

If Zero gains “regular” access to Postgres, then he will be subject to any security measures (access control lists, RLS policies etc) that limit the scope of what the user account he’s used can access/delete/update/whatever. If the user account has superuser privileges or access to un-trusted procedural languages, or if Zero gained access using a lower-level exploit that allows him to execute arbitrary code in other ways, then he will be able to wreak chaos at a lower level in the system, such as overwriting files on disk.

However - and this is the important bit - assuming there are no exploits in the Operating System that he can leverage to gain further privileges, his chaos will be restricted to things that the service account under which Postgres is running can do. In a well secured system where an unprivileged account like postgres is used, that will be limited to damage to the Postgres data files and other files (or processes etc) that user can modify or control. If Postgres were running under a privileged account like root, Zero would have pwned (in script-kiddie parlance) the entire system at this point!

Now consider the case where the Postgres software files were also owned by the postgres user. Zero would not only be able to affect files and processes owned by the service account, but would also be able to modify the software itself, allowing him the opportunity to add backdoors for future access or other malware such as spyware etc. In the case of software that is started as root (even that which later drops those privileges or switches to another user account for normal operation), this could be exploited to gain even easier privileged access at a later time.

This is why we want our software to be installed and owned by a high privilege user such as root and run as a low privileged user such as postgres. Doing so ensures that even if Zero manages to crack his way into Postgres and potentially access or modify data, he cannot modify the software or other aspects of the host system and thus has a much harder time further escalating his attack.

Thursday, 13 July 2017

Some numbers, oh, and pgAdmin 4 v1.6 was released

It's been far too long since I wrote a blog post (again!), so in a bid to start fixing that, here's an easy one:

pgAdmin 4 v1.6 released

The pgAdmin Development Team are pleased to announce the release of pgAdmin 4 version 1.6. This release of pgAdmin 4 includes over 70 bug fixes and a dozen new features. For details, please see the release notes.
Notable changes in this release include:

  • Significant performance improvements on Windows, massively reducing initial load time and improving UI response for the vast majority of users during testing.
  • Enhancements to the Query Tool enabling the viewing of large query resultsets far more quickly. For example, a simple test query with 96K rows rendered results within 1 second vs. 22 seconds in pgAdmin III during testing!
  • A major rewrite of the Query History tab allows browsing of queries executed in the query tool with full details including the entire query, in a much nicer user interface.
  • The Desktop Runtime now supports detachable tabs, allowing the Query Tool and Debugger to be opened in new tabs and then moved to alternate displays (from 1.5 this was possible in web mode only).
  • The Query Tool's Results Grid has been overhauled with a new, sleek look and feel, supporting selection of arbitrary rows, columns or blocks of cells, with full copy support and column sizing retention.
  • The Dashboard tab can now be closed if desired, to minimise query traffic resulting from graph updates.
For more information, check out the online documentation, the screenshots, and of course the download page.

Collaboration

For those that aren't aware, pgAdmin 4 is the fourth complete rewrite of pgAdmin since the first code was written way back in 1998(!). One of the major aims of the technology change we made for the new version was to help attract new developers as we always found them extremely hard to find when we used C++ (we now use Python and Javascript). To kickstart the project, EnterpriseDB (my employer) allowed me to make use of one of our development teams and we embarked on a pretty epic journey to get the first release done. Much of that is documented in an earlier post.

We were happy to find that our plan worked - better than expected in fact - when Pivotal started contributing heavily to the project, bringing new skills and techniques that have enabled us to improve both pgAdmin and the way we design and develop it even further. This release of pgAdmin is the first to include major changes in functionality developed by the Pivotal team, which you can see in the overhauled Query Tool result grid UI and the enhanced Query History tab.

Today, community members, EnterpriseDB staff and Pivotal staff all collaborate to make pgAdmin better and better. Look out on the mailing lists for ways you can get involved, by joining periodic meetings we hold, participating in user focus sessions, or contributing to the project in any other way that may interest you.

Numbers

Well, I promised you some numbers and here they are. This info was compiled for my boss Marc last week, who was curious to see how things were working out expanding the developer base on the project. Numbers are based on stats from the Git repository for 2017-01-01 to 2017-07-05:

Number of committed changes by EDB:     256 (mostly smaller items such as bug fixes)
Number of committed changes by Pivotal:  67 (some larger changes)
Number of committed changes by Others:   62

Lines of code added by EDB:        32,542
Lines of code added by Pivotal:   110,017
Lines of code added by others:     19,221

Lines of code removed by EDB:      20,056
Lines of code removed by Pivotal:  35,688
Lines of code removed by Others:    2,184

I love it when a plan comes together :-)

Friday, 15 April 2016

pgAdmin 4 - The elephant nears the finish line

As you may know, many of us from the pgAdmin team have been hard at work on pgAdmin 4 for some time now. pgAdmin 4 is a complete rewrite of pgAdmin (the fourth, as you may guess), the previous version having reached the end of its maintainable life after 14 years of development.

Work on the project began slowly, almost two years ago, however the team at EnterpriseDB have ramped up the development pace over the last few months. Right now, we're approaching alpha readiness, which we expect to reach within a few weeks.

Architecture

This new application is designed to run on both the desktop and a web server. Written in Python using the Flask framework for the backend, and Javascript/jQuery/Backbone for the frontend, it can easily be deployed as a WSGI application for multiple users in practically any network environment. A small runtime application allows it to be run as a desktop application as well - this is a Qt executable incorporating a Python interpreter and web browser along with the main application, in a single package that can be installed on a developer laptop as with previous versions of pgAdmin.

Functionality

Whilst the core functionality of pgAdmin 4 remains similar to pgAdmin 3, there are a number of changes we've made:

  • Support for unsupported database versions has been dropped.
  • We haven't re-implemented support for some object types that no one really used in the tool before - for example, operator classes and families.
  • We haven't (yet) reimplemented some of the tools that didn't work so well in pgAdmin 3, such as the graphical query builder or database designer (which was always disabled entirely by default).
  • The Query Tool and Edit Grid have been merged into a single tool. Over coming releases we'll be improving the functionality further to allow in-grid updates to be made to results from arbitrary queries (where a query is determined to be updateable). For now though, updating is allowed when pgAdmin knows the data source is a single table with a primary key.
  • The user interface is more flexible than ever, allowing tabs to be docked and re-arranged in more ways than previously.
  • We've spent time redesigning some of the UI paradigms in pgAdmin 3. Gone are the list controls with Add/Remove buttons, replaced with what we call sub-node grid controls that will allow in-grid editing of key values, with more detail available when needed in expandable rows.
  • We also spent time thinking about how to make pgAdmin faster to use, by minimising the need to switch between dialogues, using searchable combo boxes and more.
  • The UI is much more attractive, making use of control groupings and expandable regions to make things more readable.

Screenshots

So, enough of the babble, here are some pre-release, semi-polished screenshots:

The main user interface, showing the properties of a function. 

Setting the ACL on a function. 
Adding a member to a composite type using the sub-node grid control. 
The Query Tool and Data Editor.

The Procedural Language Debugger.

Team

As you can imagine, there has been a significant amount of work done to get to this stage, and I really need to express my gratitude to those who have contributed, as well as the executive management team at EnterpriseDB who have allowed me to commit so many people to this project:

Project leadership
  • Ashesh Vashi (engineering team manager, code guru)
  • Karen Blatchley (project manager)

Development team
  • Khushboo Vashi
  • Akshay Joshi
  • Arun Kollan
  • Harshal Dhumal
  • Murtuza Zabuawala
  • Neel Patel
  • Sanket Mehta
  • Surinder Kumar

Packaging
  • Muhammad Aqeel
  • Paresh More
  • Sandeep Thakkar

QA
  • Priyanka Shendge
  • Fahar Abbas

Of course, there are also community members who are starting to contribute fixes and other improvements, such as Thom Brown (on his own time, not EDB's), Seçkin Alan, Ronan Dunklau and Prasad Somwanshi, all of whom (along with others I may have missed) deserve thanks.

Want to help or learn more?

If you want to help, you can check out the code and start playing with it. We're not yet feature complete (for example, the Tables node in the treeview is still in development), but we're pretty close. Feel free to try out the code, and report, or better yet fix, any bugs or issues you may find. If you wish to start working on new features that is also welcome, but please do email the hackers list first to ensure your work is not something that's already on our project plan!

If anyone would like to talk more about pgAdmin 4, I'll be at PGConf.US next week - the organisers know me well and should be able to help you find me for a chat or demo. See you there!

Sunday, 7 December 2014

The story of pgAdmin

I’m often asked how I first became involved in PostgreSQL, and how the pgAdmin project got started. Much as I’m happy to tell the story over beer, it becomes tedious to do so over email after the first half-dozen or so times. So in a vain attempt to save my fingers from future carpal tunnel syndrome, here’s the tale…

I first discovered PostgreSQL in 1997. I was working at a social housing provider in the UK where, at the time, we were running Informix SE and Pick databases. We wanted to start building in-house web applications for managing additional areas of the business. There were no useful interfaces for Pick that could be used with any of the web technologies available to us at the time and the licensing model for Informix and its ODBC driver was prohibitively expensive, so I started looking for alternatives. 

After spending some time researching mSQL and MySQL, I eventually found PostgreSQL, which seemed to offer everything we needed—a SQL-based DBMS with ODBC and C interfaces, a liberal license, a rich set of features, and a vibrant community supporting it that was responsive to bug reports and open to feature requests. My colleagues and I developed a number of applications using PostgreSQL in the following years and the database became a critical part of our business. We had applications written in PHP for the web as well as Visual Basic for the desktop users. 

However, it was early on in our journey with Postgres that, as a Windows shop (on the desktop at least), we found ourselves wishing for a good graphical management tool for the database. At the time, the only option was pgAccess, which was a basic tool written in TCL/TK that was not easy to get to work on Windows. So I decided to write my own in Visual Basic. The first version was never released to the public and was modeled on the pgAccess design. Called pgManager, it was quickly abandoned as we found the UI to be restrictive and, well, not overly usable. The second incarnation was pgAdmin, which was redesigned from the ground up. It was released as an Open Source project and quickly became a popular tool amongst Postgres users (which was extremely gratifying as it was my first Open Source project). 

Some years later, we once again found ourselves suffering due to the way the application had been designed. I shut myself away for a couple of weeks whilst my family were away visiting relatives in Scotland and wrote pgAdmin II, using a far more structured and maintainable design that implemented a much more usable user interface. I was particularly proud of the design and cleanliness of that code, but by 2002, we needed to start over again. This time it wasn't the application design that was holding us back, but rather the choice of technology. Visual Basic didn't handle internationalization or localization well, nor did it run on platforms other than Windows without the use of WINE, under which it was never particularly stable. The hard decision was made to rewrite everything again, this time using C++ with the wxWidgets framework. pgAdmin III looked much like pgAdmin II, but it solved all of the problems our (in particular) Japanese Linux-using friends were experiencing. Now in its third incarnation, pgAdmin remains the most popular Open Source GUI tool for managing Postgres.

I continued to work at the housing provider and make good use of Postgres until 2007, at which time I moved to EnterpriseDB so I could concentrate on my Postgres work full time. At EDB, not only do I work on our own tools, I also continue to contribute to the PostgreSQL community in various ways. I have the privilege of having full support of the management at EDB for my community work, allowing me the freedom to work on whatever I need to in order to fulfill my community roles on the core team, the PostgreSQL Europe board, and of course, pgAdmin. One of the products I’m responsible for at EDB is Postgres Enterprise Manager, which has a fat client interface that’s based on pgAdmin. This has allowed us to put more effort into the maintenance of pgAdmin, with members of the PEM team at EDB regularly contributing code, fixing bugs and reviewing the patches from elsewhere. Ashesh Vashi has even been made a committer for his efforts.

Despite the hard work over the last 12 or so years, the pgAdmin developers have come to the conclusion that there is limited life left in the current code. Aside from it being extremely hard to find C++ developers these days, we’ve also run into numerous difficult-to-fix bugs that can be traced back to issues or design choices in our code, the underlying wxWidgets framework, and some versions of GTK (the choice of which we typically have no control over, as we inherit GTK from the user’s Linux distribution). What’s more, the world is shifting to a web based model these days. Cloud deployments are becoming more widely used, as well as simple instances supplied through hosting providers. Users are getting used to being able to pick up their laptop or a tablet and do whatever they need to do without having to install software – open a browser, read and write email, build a spreadsheet or create a database!

Consequently, we're now starting to design pgAdmin 4. The plan is to build a single Python-based pgAdmin that users can either deploy on a web server or run from their desktop. It’ll be designed to “live query” databases rather than query and cache schema details like the current implementation (which was designed with use over a dialup connection in mind!), and we’ll be making the application much more task focused (and therefore usable) than it is at present, as well as implementing some of the oft-requested features that are hard to do in the current code such as direct data editing in the query tool and tabbed query windows. The new code will also be extensible from the ground up, allowing support for new object types or functionality to be simply “dropped in” and auto-discovered at runtime.

We're still in the proof of concept stage at the moment though, so there's nothing available to download just yet. But stay tuned as work kicks up a gear over the next 12 months!