Monday 16 November 2020

macOS Big Sur upgrade breaking PostgreSQL installations

Update: I, and others, have subsequently been unable to reproduce the problem below. If you do run into the issue, please let me know in the comments below.

Some of the readers of my blog may be aware that I'm responsible for the PostgreSQL Installers produced by EDB for the PostgreSQL Community. This includes both the Windows and macOS installers, and the now-deprecated Linux installers (use RPMs/DEBs instead)!

Over the weekend I upgraded my laptop to the new Big Sur release from Apple, a.k.a. macOS 11.0. Everything went just fine until I tried to connect to my PostgreSQL 13 development server and found it had failed to start up. I did some digging, and quickly found that there was a permissions problem:

dpage@hal:~$ sudo su - postgres -c '/Library/PostgreSQL/13/bin/postmaster -D/Library/PostgreSQL/13/data'
2020-11-16 16:00:18.544 GMT [9532] FATAL: data directory "/Library/PostgreSQL/13/data" has invalid permissions
2020-11-16 16:00:18.544 GMT [9532] DETAIL: Permissions should be u=rwx (0700) or u=rwx,g=rx (0750).

Thankfully that's pretty easy to fix:

sudo chmod 0700 /Library/PostgreSQL/13/data

LaunchDaemon then tried again to start PostgreSQL, and succeeded:

dpage@hal:~$ ps -ef |grep postgres
  503  9548  9547   0  4:02pm ??         0:00.00 postgres: logger
  503  9550  9547   0  4:02pm ??         0:00.00 postgres: checkpointer
  503  9551  9547   0  4:02pm ??         0:00.00 postgres: background writer
  503  9552  9547   0  4:02pm ??         0:00.00 postgres: walwriter
  503  9553  9547   0  4:02pm ??         0:00.00 postgres: autovacuum launcher
  503  9554  9547   0  4:02pm ??         0:00.00 postgres: stats collector
  503  9555  9547   0  4:02pm ??         0:00.00 postgres: logical replication launcher
  503  9556  9547   0  4:02pm ??         0:00.03 postgres: postgres ml ::1(49777) idle
  501  9558  9469   0  4:02pm ttys004    0:00.00 grep postgres

So what happened here? It looks like the upgrade process for Big Sur "fixed" the permissions on the data directory to set them to what it thought was appropriate. Unfortunately, PostgreSQL considers the new permissions too broad, and refuses to start in order to draw attention to the possible security issue.

Since correcting the permissions I've rebooted my machine and found that they stayed correct, so it's likely this issue happened during the upgrade process, and not as part of the boot process or some other process that periodically "fixes" permissions.

I've also tried installing PostgreSQL 12, and found that a fresh installation works as it should; the installer correctly sets the permissions on the data directory and PostgreSQL happily starts.

So, if you upgrade to Big Sur and PostgreSQL stops working, check that the upgrade process hasn't modified the permissions on the data directory. The directory should look like this:

dpage@hal:~$ ls -al /Library/PostgreSQL/13 | grep data
drwx------  31 postgres  daemon    992 16 Nov 16:02 data
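If you'd rather script that check than read ls output, something along the following lines should do. This is only a rough sketch: the function name data_dir_mode_ok and the ALLOWED_MODES set are made up for illustration, and it simply compares against the two modes named in the error message above rather than reproducing the server's own check. The demonstration runs against a throwaway temporary directory; point it at your real data directory (as a user that can stat it) to check an installation.

```python
import os
import stat
import tempfile

# The two modes named in PostgreSQL's error message (assumption: we only
# accept exactly these, which approximates the server's own test).
ALLOWED_MODES = {0o700, 0o750}


def data_dir_mode_ok(path):
    """Return True if the permission bits of `path` are 0700 or 0750."""
    mode = stat.S_IMODE(os.stat(path).st_mode)
    return mode in ALLOWED_MODES


if __name__ == "__main__":
    # Demonstrate on a temporary directory rather than a live data directory.
    with tempfile.TemporaryDirectory() as demo:
        for mode in (0o755, 0o700):
            os.chmod(demo, mode)
            print(oct(mode), "ok" if data_dir_mode_ok(demo) else "too broad")
```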

We will be raising this issue with Apple.

Tuesday 25 August 2020

Testing pgAdmin

Like any software, pgAdmin has bugs. At the time of writing there are 119 new or in-progress issues in the bug tracker, which is pretty good for a piece of software with so many moving parts, and regular-as-clockwork monthly releases. Over 10% of these issues are currently assigned to the reporter to gather further information, plus there will undoubtedly be some duplicates and support issues/feature requests in that number that haven’t yet been picked up in our weekly backlog refinement meeting and either closed or moved to the feature or support trackers.

Manual Testing

We have one person, Fahar, who is permanently assigned as QA to the pgAdmin team at EDB, though he is also responsible for a number of other product releases such as the PostgreSQL installers for Windows and macOS (pgAdmin is the vast majority of his work however).

Fahar primarily does four things:

Works to verify that issues reported by users are reproducible, when they require complex setup or the developers have failed to reproduce them.
Verifies that fixes made by the developers actually resolve the reported problem.
Smoke tests release builds; that is, ensures the packages can be installed and basic functionality works prior to public release.
Tests new features once development is believed to be complete.

Automation

All pgAdmin tests and builds are automated using Jenkins. The system will notify the development team of any failures via email and via a dedicated Slack channel (internal to EDB). There are also large screens mounted on the wall in development offices to show the status of the test and build runs in real time.

Automated Testing

We have three primary automated test suites for pgAdmin, which are essential in allowing us to release quickly and regularly, with minimal issues.

Python Tests

The Python tests (sometimes referred to as the API tests) are intended to test the Python application that implements the pgAdmin server. The test suite will instantiate the server and then make hundreds of calls to the server with different inputs to ensure it responds as expected. A small number of these tests operate directly on the Python code, and do not run through the API (e.g. for testing server import/export).

One of the Python tests is the RE-SQL (Reverse Engineered SQL) test. This is a special case as it’s a single test which is actually made up of around 460 sub-tests, designed to test the SQL generation in pgAdmin; each sub-test will create or modify an object, and then validate that the SQL generated by pgAdmin is correct.
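To give a feel for the "one test, many sub-tests" pattern, here is a toy sketch using the standard unittest module. It is not pgAdmin's code: generate_sql, SCENARIOS and the test class are hypothetical stand-ins, and the "object" is just a dictionary rather than something created in a real database.

```python
import unittest


def generate_sql(obj):
    """Hypothetical SQL generator: render CREATE TABLE from a dict."""
    cols = ", ".join(f"{name} {typ}" for name, typ in obj["columns"])
    return f"CREATE TABLE {obj['schema']}.{obj['name']} ({cols});"


# Each scenario pairs an object definition with the SQL we expect back.
SCENARIOS = [
    ({"schema": "public", "name": "t1", "columns": [("id", "integer")]},
     "CREATE TABLE public.t1 (id integer);"),
    ({"schema": "public", "name": "t2",
      "columns": [("id", "integer"), ("label", "text")]},
     "CREATE TABLE public.t2 (id integer, label text);"),
]


class ReverseEngineeredSQLTest(unittest.TestCase):
    def test_scenarios(self):
        # A single test method; each scenario is reported as its own sub-test.
        for obj, expected in SCENARIOS:
            with self.subTest(obj=obj["name"]):
                self.assertEqual(generate_sql(obj), expected)


if __name__ == "__main__":
    loader = unittest.defaultTestLoader
    suite = loader.loadTestsFromTestCase(ReverseEngineeredSQLTest)
    unittest.TextTestRunner(verbosity=1).run(suite)
```

Because each scenario runs inside subTest, one failing scenario is reported individually without stopping the remaining ones.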

The Python tests are run on every check-in to the pgAdmin Git repository. There are nearly 1200 tests (plus the 460 RE-SQL tests), which are run on Windows, macOS and Linux against all supported database servers (PostgreSQL 9.5 through 12, and EDB Postgres Advanced Server 9.5 through 12). This equates to nearly 100,000 Python tests being run on every check-in. The exact number isn’t quite that because some tests only run on certain database server versions.

Jasmine Tests

The Jasmine tests are unit tests for the JavaScript code used in the frontend user interface of pgAdmin, named after the framework that is used to run them. At the time of writing, there are 866 JavaScript unit tests, each of which is run on Windows, macOS and Linux. That means that 2,598 JavaScript unit tests are run on every check-in.

Feature Tests

Feature tests are “full system” tests in which an instance of pgAdmin is started and loaded into a browser, and then inputs are automatically given to the browser to simulate end-to-end operation of the application, with the displayed output then being verified. The feature tests are run on a nightly basis as they take a long time to execute. They are run under Selenoid, with Google Chrome as the browser (we plan to add Firefox in the future), against all supported database server versions.

There are only around 15 feature tests in the suite (which actually runs from the Python test framework); however, each test may verify anything from 1 to 20 or more individual pieces of functionality.

Linting

Various tasks run as part of the automated tests to check the code formatting, including JavaScript linting and Python PEP-8 validation.

Automated Builds

Test Builds

Test builds of all packages (Docker, tarballs, docs, RPMs, DEBs, installers, etc.) are made with every check-in to the Git repository. These are discarded after the build completes as their primary purpose is to verify that all the code builds without error on every platform, and that packages can be successfully generated.

Snapshot Builds

Snapshot builds are created on a nightly basis and made available for testing and use by anyone that is interested in contributing or needs a fix in a hurry.

QA Builds

QA builds are produced on the Monday before one of the monthly releases. They are uploaded to a server from where Fahar will download and smoke test them to ensure there are no obvious show-stopping issues. These builds will also be tested to ensure they are carrying the latest version of certain dependencies, such as the OpenSSL libraries.

Static Code Analysis

EDB maintains a SonarQube server for performing static code analysis of all the product code within the company, including the Open Source pgAdmin code. This allows us to monitor the state of a number of key metrics that can be detected through analysis:

Bugs
Security vulnerabilities
Security hotspots (code that should be reviewed as it’s a possible source of a vulnerability)
Code smells (code that looks badly structured or implemented)
Code duplications

The current state of the pgAdmin code can be seen below:

Note that the sharp rise on July 19th was the result of a SonarQube upgrade that added/changed a number of the analysis rules.

We’re in the process of working through the different issues, with focus currently on code smells (mostly overly complex functions that need to be refactored) and security hotspots (mostly regexps in the code). Each of the issues reported may or may not involve a code change - at the very least they will be reviewed and marked as “Won’t fix”.

Developer Testing

All the developers are required to test patches before they are submitted for inclusion in pgAdmin. This includes running all relevant existing tests (there’s no point in running Jasmine tests for a change that only touches Python code for example), and linting tests (e.g. the PEP-8 checks). They are also expected to include suitable tests (or updates to tests) to validate changes they’ve made.

Many of the developers also use static code analysis in their development environments, either the tools built directly into products like PyCharm, or personal SonarQube scans.

Statistics

There are various statistics we can look at as some form of code quality metric.

Team

The team size is interesting because it gives us an idea of how much each member of the team is responsible for on average.

There are currently 6 people working on pgAdmin full time. In addition, we have a number of people at EDB who are primarily focussed on Postgres Enterprise Manager; as that product is based on pgAdmin, that team occasionally contributes to pgAdmin as well. We also have a number of community contributors who submit the occasional patch.

Counting only the full time developers, that’s 19.8 not-yet-fixed bugs per developer.

Users

The number of users is interesting because the bigger the user base, the more likely we are to get issues reported.

Unfortunately, we don’t know how many users we have because pgAdmin is freely available. We do know that Docker is currently reporting over 50 million pulls of the official pgAdmin container, and that’s almost certainly not where most people get their copy of pgAdmin from.

Counting only the Docker users, and assuming each one of them has pulled every one of the 34 releases since the first one uploaded there, that's 1.47 million users (yeah, OK, we definitely don’t have that many users, but still…). That works out at 0.000081 not-yet-fixed bugs per user, though arguably this number is also pretty meaningless. A better one might be the number of bugs reported over the lifetime of the codebase (which is 1856, including issues in new features that ended up being fixed prior to release), which gives us 0.001263 reported and verified bugs per user.
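For anyone who wants to check the arithmetic, the per-developer and per-user figures can be reproduced with a few lines of Python. This is just a sketch using the numbers quoted in this post; the variable and function names are made up for illustration, and the user estimate is rounded to 1.47 million before dividing, which is what gives the per-user figures quoted above.

```python
# Figures quoted in the post.
OPEN_BUGS = 119            # new or in-progress issues
LIFETIME_BUGS = 1856       # bugs reported over the lifetime of the codebase
DEVELOPERS = 6             # full-time developers
DOCKER_PULLS = 50_000_000  # "over 50 million pulls"
RELEASES = 34              # releases uploaded to Docker


def estimated_users():
    """Pulls divided by releases, rounded to the nearest 10,000."""
    return round(DOCKER_PULLS / RELEASES, -4)


def bugs_per_developer():
    return OPEN_BUGS / DEVELOPERS


def bugs_per_user(bug_count):
    return bug_count / estimated_users()


if __name__ == "__main__":
    print(f"Estimated users:        {estimated_users() / 1e6:.2f} million")
    print(f"Open bugs per dev:      {bugs_per_developer():.1f}")
    print(f"Open bugs per user:     {bugs_per_user(OPEN_BUGS):.6f}")
    print(f"Lifetime bugs per user: {bugs_per_user(LIFETIME_BUGS):.6f}")
```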

Code

Looking at the code, we see the following stats:

dpage@hal:~/git/pgadmin4$ perl ~/bin/cloc.pl .
    3650 text files.
    3148 unique files.
    1115 files ignored.
https://github.com/AlDanial/cloc v 1.66 T=12.18 s (217.1 files/s, 27242.3 lines/s)
--------------------------------------------------------------------
Language                   files       blank     comment        code
--------------------------------------------------------------------
Python                       714       21204       30011       92424
JavaScript                   285       10216        9984       75844
SQL                         1384        5383        2889       38516
JSON                         108          31           0       24752
SASS                          34         995          16        5210
make                           2         111          37        3717
CSS                           39         376          66        2012
HTML                          28          42          20        1548
XML                           13           0           0        1208
C++                            9         342         247        1177
Bourne Shell                  12         298         204        1096
Qt                             3           0           0         945
DOS Batch                      1          85           8         284
C/C++ Header                  10          94         107         244
Qt Project                     1          25           7          99
Objective C++                  1           2          11          14
Windows Resource File          1           0           0           1
--------------------------------------------------------------------
SUM:                        2645       39204       43607      249091
--------------------------------------------------------------------

So, there are a shade under a quarter of a million lines of code in pgAdmin, which equates to one not-yet-fixed bug per 2,093 lines of code. By way of comparison, this Stack Overflow answer cites a book that in turn cites a number of studies into industry average bug rates, which are believed to be in the order of 1-25 bugs per 1000 lines of code.
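To put the two numbers on the same scale, the pgAdmin figure can be converted to bugs per 1000 lines of code. The snippet below is a small sketch using only the figures quoted in this post; the names are illustrative, and the 1-25 range is the industry estimate mentioned above rather than anything measured here.

```python
OPEN_BUGS = 119              # new or in-progress issues
LINES_OF_CODE = 249_091      # "code" total from the cloc output
INDUSTRY_RANGE = (1, 25)     # bugs per 1000 lines, as cited above


def lines_per_bug():
    """Lines of code for every not-yet-fixed bug."""
    return LINES_OF_CODE / OPEN_BUGS


def bugs_per_kloc():
    """Not-yet-fixed bugs per 1000 lines of code."""
    return OPEN_BUGS / LINES_OF_CODE * 1000


if __name__ == "__main__":
    low, high = INDUSTRY_RANGE
    print(f"Lines per open bug:  {lines_per_bug():,.0f}")
    print(f"Open bugs per KLOC:  {bugs_per_kloc():.2f}")
    print(f"Below industry range ({low}-{high} per KLOC): {bugs_per_kloc() < low}")
```

Bear in mind that this compares currently open bugs with industry figures that may be counted differently, so it's an indication rather than a like-for-like measurement.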

Conclusion

pgAdmin has a relatively small number of developers, but a large amount of code and number of users. From the data we have, we can assume that with such a large user base, bugs are more likely to be reported than if there were a user base of tens or hundreds of users. Despite this, the number of issues per 1000 lines of code seems to be very low in comparison to industry studies.

Despite the huge amount of effort that goes into automated testing and builds, there will always be bugs. We rely on users to give us feedback when they find issues, and aim to fix everything as soon as possible, prioritising based on impact and likelihood of any given user running into each issue. We also try to balance time spent on bug fixes with new features and housekeeping tasks (aimed at keeping the code maintainable).

None of this would be possible without the automation in use for building and testing the code, or without the hard-working developers (including Fahar) and users that report and help them fix issues, to whom I owe a huge debt of gratitude.

Thursday 25 June 2020

system_stats extension for PostgreSQL

All functions are restricted for use by superusers and members of the monitor_system_stats role due to the potentially sensitive nature of the information they make available. To grant a user permission to use the extension, grant them that role. For example:

GRANT monitor_system_stats TO nagios;

The extension is currently supported on Linux, macOS and Windows, though not all statistics are available or relevant on all platforms, so you may see some NULL values being returned.

Pull Requests for additional platform support, additional functions, or bug fixes are welcome, as are bug reports.

You can find the code and a source release on Github; please see the README for more information. Devrim Gündüz has also made RPMs available on yum.postgresql.org.

Wednesday 10 July 2019

Reverse Proxying to pgAdmin

Reverse proxying requests to a pgAdmin server is becoming more and more popular, if posts to the mailing lists are to be taken as an indicative measure, and more often than not it's with pgAdmin running in a container (of which there have now been over 10 million pulls)! Typically users will deploy a reverse proxy for one of a couple of reasons: to host multiple applications in different subdirectories under the same domain, or to add SSL/TLS support independently of the application.

Because of the number of questions asked, I spent a little time over the last couple of days doing some testing and updating the documentation with some examples. Here's a blog-ified version of that work.

Nginx

Nginx is winning the battle of the web servers these days, beating out Lighttpd (which is still a fine piece of software) and the ageing and arguably bloated Apache HTTPD. All of these servers support reverse proxying, and whilst I've looked at Nginx, the examples shown below can easily be translated to the other servers if you prefer to run them instead.

In the following examples, we have pgAdmin running in a Docker container (in which it's hosted under Gunicorn). For simplicity, the examples have Nginx running on the host machine, but it can also be easily run in another container, sharing a Docker network with pgAdmin. In such a configuration there is no need to map the pgAdmin container port to the host.

The container is launched as shown below. See the documentation for information on other useful environment variables you can set and paths you can map.

The commands below will pull the latest version of the container from the repository, and run it with port 5050 on the host mapped to port 80 on the container. It will set the default username and password to user@domain.com and SuperSecret respectively.

docker pull dpage/pgadmin4
docker run -p 5050:80 \
    -e "PGADMIN_DEFAULT_EMAIL=user@domain.com" \
    -e "PGADMIN_DEFAULT_PASSWORD=SuperSecret" \
    -d dpage/pgadmin4

A simple configuration to reverse proxy with Nginx to pgAdmin at the root directory looks like this:

server {
    listen 80;
    server_name _;

    location / {
        proxy_set_header Host $host;
        proxy_pass http://localhost:5050/;
        proxy_redirect off;
    }
}

Here we tell Nginx to listen on port 80, and respond to any server name (sent by the client in the Host header). We then specify that all requests under the root directory are proxied back to port 5050 on the local host, and that the Host header is passed along as well. The proxy_redirect option tells the server not to rewrite the Location header.

But what if we want to host pgAdmin under a subdirectory, say /pgadmin4? In this case we need to change the path at the top of the location block and add the X-Script-Name header to the requests made to the pgAdmin container to tell it what subdirectory it's hosted under. This is shown below:

server {
    listen 80;
    server_name _;

    location /pgadmin4/ {
        proxy_set_header X-Script-Name /pgadmin4;
        proxy_set_header Host $host;
        proxy_pass http://localhost:5050/;
        proxy_redirect off;
    }
}

OK, so that's cool but we're talking about super top secret database stuffs here. It needs to be encrypted! Adding SSL/TLS support to the configuration is largely unrelated to pgAdmin itself, except that as with the subdirectory, we need to tell it the URL scheme (http or https) to use. We do this by setting the X-Scheme header. The other changes are to add a redirect from http to https, and to configure SSL/TLS:

server {
    listen 80;
    return 301 https://$host$request_uri;
}

server {
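    # The ssl parameter on listen enables SSL/TLS for this server block;
    # it replaces the deprecated "ssl on;" directive.
    listen 443 ssl;
    server_name _;

    ssl_certificate /etc/nginx/server.crt;
    ssl_certificate_key /etc/nginx/server.key;

    ssl_session_cache builtin:1000 shared:SSL:10m;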
    ssl_protocols TLSv1 TLSv1.1 TLSv1.2;
    ssl_ciphers HIGH:!aNULL:!eNULL:!EXPORT:!CAMELLIA:!DES:!MD5:!PSK:!RC4;
    ssl_prefer_server_ciphers on;

    location /pgadmin4/ {
        proxy_set_header X-Script-Name /pgadmin4;
        proxy_set_header X-Scheme $scheme;
        proxy_set_header Host $host;
        proxy_pass http://localhost:5050/;
        proxy_redirect off;
    }
}
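To illustrate what the application side can do with those two headers, here is a minimal WSGI middleware sketch in Python. It is an illustration of the general technique, not a copy of pgAdmin's own code: the class name ReverseProxied, the demo_app function and the example.com host are all assumptions made for the example. The middleware copies X-Script-Name into SCRIPT_NAME and X-Scheme into wsgi.url_scheme, so that any URLs the application generates include the subdirectory and the https scheme.

```python
class ReverseProxied:
    """WSGI middleware applying X-Script-Name and X-Scheme from a proxy."""

    def __init__(self, app):
        self.app = app

    def __call__(self, environ, start_response):
        # WSGI exposes request headers as HTTP_<NAME> with dashes as underscores.
        script_name = environ.get("HTTP_X_SCRIPT_NAME", "")
        if script_name:
            environ["SCRIPT_NAME"] = script_name
            # Strip the prefix if the proxy did not already remove it.
            path_info = environ.get("PATH_INFO", "")
            if path_info.startswith(script_name):
                environ["PATH_INFO"] = path_info[len(script_name):]
        scheme = environ.get("HTTP_X_SCHEME", "")
        if scheme:
            environ["wsgi.url_scheme"] = scheme
        return self.app(environ, start_response)


def demo_app(environ, start_response):
    """Tiny app reporting the external base URL it believes it is served at."""
    base = (f"{environ['wsgi.url_scheme']}://{environ['HTTP_HOST']}"
            f"{environ['SCRIPT_NAME']}/")
    start_response("200 OK", [("Content-Type", "text/plain")])
    return [base.encode("utf-8")]


def call(app, environ):
    """Invoke a WSGI app once and return the response body as text."""
    chunks = app(environ, lambda status, headers: None)
    return b"".join(chunks).decode("utf-8")


if __name__ == "__main__":
    # Roughly what the app sees behind the proxy configuration above.
    environ = {
        "wsgi.url_scheme": "http",
        "HTTP_HOST": "example.com",
        "SCRIPT_NAME": "",
        "PATH_INFO": "/browser/",
        "HTTP_X_SCRIPT_NAME": "/pgadmin4",
        "HTTP_X_SCHEME": "https",
    }
    print(call(ReverseProxied(demo_app), environ))
```

Without the middleware, the demo app would believe it was served at http://example.com/, and any links or redirects it produced would bypass both the subdirectory and the encryption.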

Traefik

Traefik is becoming increasingly popular in containerised environments because it's designed to integrate with the orchestration platform being used and auto-configure itself as much as possible. The examples below show it working with plain Docker, but it will work with Swarm, Compose, Kubernetes and other orchestrators as well. Note that Traefik is designed purely for reverse proxying, routing and load balancing; it's not a general purpose webserver.

In the examples below, the following Traefik configuration is used. Because it auto-configures itself, the changes we make to get the desired configuration are to the way the pgAdmin container is deployed, not to the Traefik configuration:

defaultEntryPoints = ["http", "https"]

[entryPoints]
  [entryPoints.http]
    address = ":80"
      [entryPoints.http.redirect]
        entryPoint = "https"
  [entryPoints.https]
    address = ":443"
      [entryPoints.https.tls]

[docker]
domain = "domain_name"
watch = true

With this configuration, Traefik will automatically detect containers as they are launched, and reverse proxy to them using the virtual hostname generated from the container name and the domain in its config file, e.g. <container_name>.<domain>. SSL/TLS is enabled in this setup, with a redirect from plain http to https. The certificate used will be the default one built into Traefik; see the documentation for details on how Let's Encrypt or certificates from other issuers can be used.

To host pgAdmin at the root directory, we simply launch a container with the correct name, and no host-to-container port mapping:

docker pull dpage/pgadmin4
docker run --name "pgadmin4" \
    -e "PGADMIN_DEFAULT_EMAIL=user@domain.com" \
    -e "PGADMIN_DEFAULT_PASSWORD=SuperSecret" \
    -d dpage/pgadmin4

With the configuration and commands above, Traefik will host pgAdmin at https://pgadmin4.domain_name/. Of course, the domain name should be changed to a real one, and a suitable CNAME, A or AAAA record should be added to the DNS zone file.

In order to host pgAdmin under a subdirectory, as in the Nginx example we need to tell both the proxy server and pgAdmin about the subdirectory. We tell pgAdmin by setting the SCRIPT_NAME environment variable, and we tell Traefik by adding a label to the container instance. For example:

docker pull dpage/pgadmin4
docker run --name "pgadmin4" \
    -e "PGADMIN_DEFAULT_EMAIL=user@domain.com" \
    -e "PGADMIN_DEFAULT_PASSWORD=SuperSecret" \
    -e "SCRIPT_NAME=/pgadmin4" \
    -l "traefik.frontend.rule=PathPrefix:/pgadmin4" \
    -d dpage/pgadmin4

Conclusion

Users are using reverse proxy servers to provide an interface between their clients and the pgAdmin server. These can be more traditional servers such as Nginx, or purpose designed reverse proxy servers such as Traefik. In either case, it's simple to configure SSL/TLS support or to host pgAdmin in a subdirectory.