Tuesday, 25 August 2020

Testing pgAdmin

Like any software, pgAdmin has bugs. At the time of writing there are 119 new or in-progress issues in the bug tracker, which is pretty good for a piece of software with so many moving parts and regular-as-clockwork monthly releases. Over 10% of these issues are currently assigned to the reporter to gather further information, and that number will undoubtedly also include some duplicates, support issues, and feature requests that haven't yet been picked up in our weekly backlog refinement meeting and either closed or moved to the feature or support trackers.

Manual Testing

We have one person, Fahar, who is permanently assigned as QA to the pgAdmin team at EDB. He is also responsible for a number of other product releases, such as the PostgreSQL installers for Windows and macOS, though pgAdmin accounts for the vast majority of his work.


Fahar primarily does four things:


  • Works to verify that issues reported by users are reproducible, when they require complex setup or the developers have failed to reproduce them.

  • Verifies that fixes committed by the developers actually resolve the reported problem.

  • Smoke tests release builds; that is, ensures the packages can be installed and basic functionality works prior to public release.

  • Tests new features once development is believed to be complete.

Automation

All pgAdmin tests and builds are automated using Jenkins. The system will notify the development team of any failures via email and via a dedicated Slack channel (internal to EDB). There are also large screens mounted on the wall in development offices to show the status of the test and build runs in real time.

Automated Testing

We have three primary automated test suites for pgAdmin, which are essential in allowing us to release quickly and regularly, with minimal issues.

Python Tests

The Python tests (sometimes referred to as the API tests) are intended to test the Python application that implements the pgAdmin server. The test suite will instantiate the server and then make hundreds of calls to it with different inputs to ensure it responds as expected. A small number of these tests operate directly on the Python code and do not run through the API (e.g. for testing server import/export).


One of the Python tests is the RE-SQL (Reverse Engineered SQL) test. This is a special case as it's a single test which actually comprises around 460 sub-tests designed to exercise the SQL generation in pgAdmin; each sub-test will create or modify an object, and then validate that the SQL generated by pgAdmin is correct.
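The core of such a check can be sketched as a normalised string comparison. This is a simplified illustration, not the real framework (which is driven by scenario files and talks to a live database server), but it shows the essential idea: cosmetic layout differences in the generated SQL should be tolerated, while real differences should fail.

```python
import re


def normalise_sql(sql):
    """Collapse runs of whitespace so layout differences don't cause failures."""
    return re.sub(r"\s+", " ", sql).strip()


def check_resql(generated, expected):
    """One RE-SQL style sub-test: does the SQL generated for an object
    match the SQL we expect for it, ignoring whitespace?"""
    return normalise_sql(generated) == normalise_sql(expected)


# Whitespace/layout differences are tolerated...
assert check_resql("CREATE TABLE public.t (\n    id integer\n);",
                   "CREATE TABLE public.t ( id integer );")
# ...but real differences are not.
assert not check_resql("CREATE TABLE public.t (id integer);",
                       "CREATE TABLE public.t (id bigint);")
```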


The Python tests are run on every check-in to the pgAdmin Git repository. There are nearly 1,200 tests (plus the 460 RE-SQL tests), which are run on Windows, macOS and Linux against all supported database servers (PostgreSQL 9.5 through 12, and EDB Postgres Advanced Server 9.5 through 12). This equates to nearly 100,000 Python tests being run on every check-in; the exact number isn't quite that high because some tests only run on certain database server versions.

Jasmine Tests

The Jasmine tests are unit tests for the JavaScript code used in the frontend user interface of pgAdmin, named after the framework that is used to run them. At the time of writing, there are 866 JavaScript unit tests, each of which is run on Windows, macOS and Linux. That means that 2,598 JavaScript unit tests are run on every check-in.

Feature Tests

Feature tests are “full system” tests in which an instance of pgAdmin is started and loaded into a browser, and inputs are then automatically given to the browser to simulate end-to-end operation of the application, with the displayed output being verified. The feature tests are run on a nightly basis as they take a long time to execute. They are run using Selenoid with Google Chrome as the browser (we plan to add Firefox in the future), against all supported database server versions.


There are only around 15 feature tests in the suite (which actually runs from the Python test framework); however, each test may verify anything from one to twenty or more individual pieces of functionality.

Linting

Various tasks run as part of the automated tests to check the code formatting, including JavaScript linting and Python PEP-8 validation.

Automated Builds

Test Builds

Test builds of all packages (Docker, tarballs, docs, RPMs, DEBs, installers etc.) are made with every check-in to the Git repository. These are discarded after the build completes, as their primary purpose is to verify that all the code builds without error on every platform, and that packages can be successfully generated.

Snapshot Builds

Snapshot builds are created on a nightly basis and made available for testing and use by anyone who is interested in contributing or who needs a fix in a hurry.

QA Builds

QA builds are produced on the Monday before one of the monthly releases. They are uploaded to a server from where Fahar will download and smoke test them to ensure there are no obvious show-stopping issues. These builds will also be tested to ensure they are carrying the latest version of certain dependencies, such as the OpenSSL libraries.

Static Code Analysis

EDB maintains a SonarQube server for performing static code analysis of all the product code within the company, including the Open Source pgAdmin code. This allows us to monitor the state of a number of key metrics that can be detected through analysis:


  • Bugs

  • Security vulnerabilities

  • Security hotspots (code that should be reviewed as it’s a possible source of a vulnerability)

  • Code smells (code that looks badly structured or implemented)

  • Code duplications


The current state of the pgAdmin code can be seen below:

[Chart: SonarQube dashboard for pgAdmin, showing issue counts over time]

Note that the sharp rise on July 19th was the result of a SonarQube upgrade that added or changed a number of the analysis rules.


We’re in the process of working through the different issues, with the focus currently on code smells (mostly overly complex functions that need to be refactored) and security hotspots (mostly regexps in the code). Each reported issue may or may not result in a code change; at the very least it will be reviewed and, if no change is warranted, marked as “Won’t fix”.

Developer Testing

All the developers are required to test patches before they are submitted for inclusion in pgAdmin. This includes running all relevant existing tests (there’s no point in running Jasmine tests for a change that only touches Python code for example), and linting tests (e.g. the PEP-8 checks). They are also expected to include suitable tests (or updates to tests) to validate changes they’ve made.


Many of the developers also use static code analysis in their development environments, either the tools built directly into products like PyCharm, or personal SonarQube scans.

Statistics

There are various statistics we can look at as some form of code quality metric.

Team

The team size is interesting because it gives us an idea of how much each member of the team is responsible for on average.


There are currently 6 people working on pgAdmin full time. In addition, we have a number of people at EDB who are primarily focussed on Postgres Enterprise Manager which, as it is based on pgAdmin, means that team occasionally contributes to pgAdmin as well. We also have a number of community contributors who submit the occasional patch.


Counting only the full time developers, that’s 19.8 not-yet-fixed bugs per developer.

Users

The number of users is interesting because the bigger the user base, the more likely we are to get issues reported.


Unfortunately, we don’t know how many users we have because pgAdmin is freely available. We do know that Docker is currently reporting over 50 million pulls of the official pgAdmin container, and that’s almost certainly not where most people get their copy of pgAdmin from.


Counting only the Docker users, and assuming each one of them has pulled every one of the 34 releases since the first one uploaded there, that's 1.47 million users (yeah, OK, we definitely don’t have that many users, but still…). That works out at 0.000081 not-yet-fixed bugs per user, though arguably this number is also pretty meaningless. A better one might be the number of bugs reported over the lifetime of the codebase (which is 1856, including issues in new features that ended up being fixed prior to release), which gives us 0.001263 reported and verified bugs per user.
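The back-of-the-envelope arithmetic behind the figures in this section and the previous one can be checked directly (the pull count was "over" 50 million, so the final decimal of the last figure may differ slightly from the one quoted above):

```python
# Figures quoted in this post
pulls = 50_000_000      # Docker pulls, taken as exactly 50 million here
releases = 34           # releases since the first Docker upload
open_bugs = 119         # new or in-progress issues
lifetime_bugs = 1856    # reported and verified bugs over the codebase's lifetime
developers = 6          # full-time developers

users = pulls / releases
print(round(users))                       # 1470588  (~1.47 million "users")
print(round(open_bugs / developers, 1))   # 19.8 not-yet-fixed bugs per developer
print(round(open_bugs / users, 6))        # 0.000081 not-yet-fixed bugs per user
print(round(lifetime_bugs / users, 6))    # 0.001262 reported bugs per user
```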

Code

Looking at the code, we see the following stats:


dpage@hal:~/git/pgadmin4$ perl ~/bin/cloc.pl .
    3650 text files.
    3148 unique files.
    1115 files ignored.

https://github.com/AlDanial/cloc v 1.66  T=12.18 s (217.1 files/s, 27242.3 lines/s)
---------------------------------------------------------------------------
Language                  files          blank        comment           code
---------------------------------------------------------------------------
Python                      714          21204          30011          92424
JavaScript                  285          10216           9984          75844
SQL                        1384           5383           2889          38516
JSON                        108             31              0          24752
SASS                         34            995             16           5210
make                          2            111             37           3717
CSS                          39            376             66           2012
HTML                         28             42             20           1548
XML                          13              0              0           1208
C++                           9            342            247           1177
Bourne Shell                 12            298            204           1096
Qt                            3              0              0            945
DOS Batch                     1             85              8            284
C/C++ Header                 10             94            107            244
Qt Project                    1             25              7             99
Objective C++                 1              2             11             14
Windows Resource File         1              0              0              1
---------------------------------------------------------------------------
SUM:                       2645          39204          43607         249091
---------------------------------------------------------------------------


So, there are a shade under a quarter of a million lines of code in pgAdmin, which equates to one not-yet-fixed bug per 2,093 lines of code. By way of comparison, this Stack Overflow answer cites a book that in turn cites a number of studies into industry average bug rates, which are believed to be on the order of 1-25 bugs per 1,000 lines of code.
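A quick arithmetic check of that figure against the cloc output above, also expressed in the per-1,000-lines units used by the industry studies:

```python
lines_of_code = 249_091   # the "code" total from the cloc output above
open_bugs = 119           # new or in-progress issues

print(round(lines_of_code / open_bugs))              # 2093 lines per open bug
print(round(open_bugs / (lines_of_code / 1000), 2))  # 0.48 open bugs per 1,000 lines
```

That 0.48 is well below the bottom of the 1-25 range cited, though of course it counts only currently open bugs rather than all bugs ever written.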

Conclusion

pgAdmin has a relatively small number of developers, but a large amount of code and a large number of users. From the data we have, we can assume that with such a large user base, bugs are far more likely to be reported than they would be with a user base of tens or hundreds of users. Despite this, the number of issues per 1,000 lines of code appears to be very low in comparison to industry studies.


Despite the huge amount of effort that goes into automated testing and builds, there will always be bugs. We rely on users to give us feedback when they find issues, and aim to fix everything as soon as possible, prioritising based on impact and likelihood of any given user running into each issue. We also try to balance time spent on bug fixes with new features and housekeeping tasks (aimed at keeping the code maintainable).


None of this would be possible without the automation in use for building and testing the code, or without the hard-working developers (including Fahar) and the users who report issues and help fix them, to whom I owe a huge debt of gratitude.

