Like any software, pgAdmin has bugs. At the time of writing there are 119 new or in-progress issues in the bug tracker, which is pretty good for a piece of software with so many moving parts and regular-as-clockwork monthly releases. Over 10% of these issues are currently assigned back to the reporter to gather further information, and there will undoubtedly be some duplicates, support issues and feature requests in that number that haven’t yet been picked up in our weekly backlog refinement meeting and either closed or moved to the feature or support trackers.
Manual Testing
We have one person, Fahar, who is permanently assigned as QA to the pgAdmin team at EDB, though he is also responsible for a number of other product releases, such as the PostgreSQL installers for Windows and macOS (pgAdmin is, however, the vast majority of his work).
Fahar primarily does four things:
Works to verify that issues reported by users are reproducible, when they require complex setup or the developers have failed to reproduce them.
Verifies that fixes made by the developers actually do resolve the reported problem.
Smoke tests release builds; that is, ensures the packages can be installed and basic functionality works prior to public release.
Tests new features once development is believed to be complete.
Automation
All pgAdmin tests and builds are automated using Jenkins. The system will notify the development team of any failures via email and via a dedicated Slack channel (internal to EDB). There are also large screens mounted on the wall in development offices to show the status of the test and build runs in real time.
Automated Testing
We have three primary automated test suites for pgAdmin, which are essential in allowing us to release quickly and regularly, with minimal issues.
Python Tests
The Python tests (sometimes referred to as the API tests) are intended to test the Python application that implements the pgAdmin server. The test suite will instantiate the server and then make hundreds of calls to it with different inputs to ensure it responds as expected. A small number of these tests operate directly on the Python code and do not run through the API (e.g. for testing server import/export).
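To give a flavour of what these tests look like, here’s a minimal sketch of an API-style check using Python’s unittest and the requests library against a locally running server. The URL and endpoint below are illustrative rather than taken from the real suite, which drives the Flask application through its own test framework and fixtures.

import unittest

import requests

# Illustrative only: hypothetical local server URL and endpoint.
PGADMIN_URL = "http://localhost:5050"


class PingApiTest(unittest.TestCase):
    def test_server_responds(self):
        # Call a hypothetical endpoint and check the server answers successfully.
        response = requests.get(PGADMIN_URL + "/misc/ping", timeout=10)
        self.assertEqual(response.status_code, 200)


if __name__ == "__main__":
    unittest.main()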
One of the Python tests is the RE-SQL (Reverse Engineered SQL) test. This is a special case as it’s a single test which is actually comprised of around 460 sub-tests, designed to test the SQL generation in pgAdmin; each sub-test will create or modify an object, and then validate that the SQL generated by pgAdmin is correct.
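Each RE-SQL sub-test follows broadly the same pattern; a simplified sketch (with hypothetical URLs and an illustrative expected-SQL file, not the real test scenarios) might look like this:

import requests

# Sketch of an RE-SQL style check; the URLs and the known-good SQL file
# passed in are hypothetical stand-ins for the real test scenarios.
def check_reverse_engineered_sql(create_url, sql_url, expected_sql_file):
    # Create (or modify) the object through the API.
    requests.post(create_url, json={"name": "test_table"}, timeout=10).raise_for_status()

    # Ask pgAdmin for the reverse-engineered SQL of that object.
    generated_sql = requests.get(sql_url, timeout=10).text

    # Compare against the known-good SQL, ignoring surrounding whitespace.
    with open(expected_sql_file) as f:
        expected_sql = f.read()
    assert generated_sql.strip() == expected_sql.strip()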
The Python tests are run on every check-in to the pgAdmin Git repository. There are nearly 1,200 tests (plus the 460 RE-SQL sub-tests), which are run on Windows, macOS and Linux against all supported database servers (PostgreSQL 9.5 through 12, and EDB Postgres Advanced Server 9.5 through 12). This equates to nearly 100,000 Python tests being run on every check-in; the exact number isn’t quite that high, because some tests only run against certain database server versions.
Jasmine Tests
The Jasmine tests are unit tests for the JavaScript code used in the frontend user interface of pgAdmin, and are named after the framework used to run them. At the time of writing, there are 866 JavaScript unit tests, each of which is run on Windows, macOS and Linux. That means that 2,598 JavaScript unit tests are run on every check-in.
Feature Tests
Feature tests are “full system” tests in which an instance of pgAdmin is started and loaded into a browser, and inputs are then given to the browser automatically to simulate end-to-end operation of the application, with the displayed output being verified. The feature tests are run on a nightly basis as they take a long time to execute. They are run using Selenoid with Google Chrome as the browser (we plan to add Firefox in the future), against all supported database server versions.
There are only around 15 feature tests in the suite (which actually runs from the Python test framework); however, each test may verify anything from one to 20 or more individual pieces of functionality.
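In outline, a feature test drives a remote browser session through the Selenoid hub along these lines. The URLs are illustrative and the sketch assumes Selenium 4; the real tests perform far richer interactions than a simple title check.

from selenium import webdriver

# Illustrative URLs; Selenoid exposes a standard Selenium hub endpoint.
SELENOID_HUB = "http://localhost:4444/wd/hub"
PGADMIN_URL = "http://localhost:5050"

options = webdriver.ChromeOptions()
driver = webdriver.Remote(command_executor=SELENOID_HUB, options=options)
try:
    # Load the application in the remote Chrome session and check the page
    # title, as a stand-in for the richer checks the real tests perform.
    driver.get(PGADMIN_URL)
    assert "pgAdmin" in driver.title
finally:
    driver.quit()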
Linting
Various tasks run as part of the automated tests to check the code formatting, including JavaScript linting and Python PEP-8 validation.
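As an example, the PEP-8 side of this can be scripted with the pycodestyle library; the directory path below is illustrative.

import pycodestyle

# Run a PEP-8 style check over the Python sources; the path is illustrative.
style = pycodestyle.StyleGuide()
report = style.check_files(["web/pgadmin"])

if report.total_errors:
    raise SystemExit(f"{report.total_errors} PEP-8 problem(s) found")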
Automated Builds
Test Builds
Test builds of all packages (Docker, tarballs, docs, RPMs, DEBs, installers, etc.) are made on every check-in to the Git repository. These are discarded after the build completes, as their primary purpose is to verify that all the code builds without error on every platform, and that packages can be successfully generated.
Snapshot Builds
Snapshot builds are created on a nightly basis and made available for testing, and for use by anyone who is interested in contributing or needs a fix in a hurry.
QA Builds
QA builds are produced on the Monday before each monthly release. They are uploaded to a server from which Fahar will download and smoke test them, to ensure there are no obvious show-stopping issues. These builds are also checked to ensure they are carrying the latest versions of certain dependencies, such as the OpenSSL libraries.
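As one illustrative way of making that kind of check, the OpenSSL release a bundled Python runtime was linked against can be read from the standard library:

import ssl

# Report the OpenSSL release this Python runtime was built against, e.g. to
# confirm that an expected security update is present in the package.
print(ssl.OPENSSL_VERSION)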
Static Code Analysis
EDB maintains a SonarQube server for performing static code analysis of all the product code within the company, including the Open Source pgAdmin code. This allows us to monitor the state of a number of key metrics that can be detected through analysis:
Bugs
Security vulnerabilities
Security hotspots (code that should be reviewed as it’s a possible source of a vulnerability)
Code smells (code that looks badly structured or implemented)
Code duplications
The current state of the pgAdmin code can be seen below:
Note that the sharp rise on July 19th was the result of a SonarQube upgrade that added/changed a number of the analysis rules.
We’re in the process of working through the various issues, with the focus currently on code smells (mostly overly complex functions that need to be refactored) and security hotspots (mostly regexps in the code). Each of the reported issues may or may not involve a code change; at the very least, each will be reviewed and, where no change is warranted, marked as “Won’t fix”.
Developer Testing
All the developers are required to test patches before they are submitted for inclusion in pgAdmin. This includes running all relevant existing tests (there’s no point in running Jasmine tests for a change that only touches Python code for example), and linting tests (e.g. the PEP-8 checks). They are also expected to include suitable tests (or updates to tests) to validate changes they’ve made.
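For example, a typical pre-submission run might look something like the sketch below; the script path, directory layout and arguments are illustrative and may differ between pgAdmin versions.

import subprocess

# Sketch of a pre-submission check: run the Python test suite and the PEP-8
# checks before sending a patch. Paths and arguments are illustrative.
subprocess.run(["python", "regression/runtests.py"], cwd="web", check=True)
subprocess.run(["pycodestyle", "web/pgadmin"], check=True)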
Many of the developers also use static code analysis in their development environments, either the tools built directly into products like PyCharm, or personal SonarQube scans.
Statistics
There are various statistics we can look at as some form of code quality metric.
Team
The team size is interesting because it gives us an idea of how much each member of the team is responsible for on average.
There are currently 6 people working on pgAdmin full time. In addition, we have a number of people at EDB who are primarily focussed on Postgres Enterprise Manager which, as it is based on pgAdmin, means that team occasionally contributes to pgAdmin as well. We also have a number of community contributors who submit the occasional patch.
Counting only the full-time developers, that’s 19.8 not-yet-fixed bugs per developer.
Users
The number of users is interesting because the bigger the user base, the more likely we are to get issues reported.
Unfortunately, we don’t know how many users we have because pgAdmin is freely available. We do know that Docker is currently reporting over 50 million pulls of the official pgAdmin container, and that’s almost certainly not where most people get their copy of pgAdmin from.
Counting only the Docker users, and assuming each one of them has pulled every one of the 34 releases since the first one uploaded there, that's 1.47 million users (yeah, OK, we definitely don’t have that many users, but still…). That works out at 0.000081 not-yet-fixed bugs per user, though arguably this number is also pretty meaningless. A better one might be the number of bugs reported over the lifetime of the codebase (which is 1856, including issues in new features that ended up being fixed prior to release), which gives us 0.001263 reported and verified bugs per user.
Code
Looking at the code, we see the following stats:
dpage@hal:~/git/pgadmin4$ perl ~/bin/cloc.pl .
    3650 text files.
    3148 unique files.
    1115 files ignored.

https://github.com/AlDanial/cloc v 1.66  T=12.18 s (217.1 files/s, 27242.3 lines/s)
--------------------------------------------------------------------
Language                      files      blank    comment       code
--------------------------------------------------------------------
Python                          714      21204      30011      92424
JavaScript                      285      10216       9984      75844
SQL                            1384       5383       2889      38516
JSON                            108         31          0      24752
SASS                             34        995         16       5210
make                              2        111         37       3717
CSS                              39        376         66       2012
HTML                             28         42         20       1548
XML                              13          0          0       1208
C++                               9        342        247       1177
Bourne Shell                     12        298        204       1096
Qt                                3          0          0        945
DOS Batch                         1         85          8        284
C/C++ Header                     10         94        107        244
Qt Project                        1         25          7         99
Objective C++                     1          2         11         14
Windows Resource File             1          0          0          1
--------------------------------------------------------------------
SUM:                           2645      39204      43607     249091
--------------------------------------------------------------------
So, there are a shade under a quarter of a million lines of code in pgAdmin, which equates to one not-yet-fixed bug per 2,093 lines of code. By way of comparison, this Stack Overflow answer cites a book that in turn cites a number of studies into industry average bug rates, which are believed to be on the order of 1-25 bugs per 1,000 lines of code.
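For reference, the back-of-the-envelope figures quoted in this section can be reproduced as follows:

# Reproduce the rough figures quoted in the Statistics section.
open_bugs = 119          # new or in-progress issues at the time of writing
developers = 6           # full-time developers
docker_pulls = 50_000_000
releases = 34            # releases available on Docker Hub
lifetime_bugs = 1856     # reported and verified bugs over the codebase lifetime
lines_of_code = 249091   # SUM from the cloc output above

users = docker_pulls / releases        # ~1,470,588, the "1.47 million users"
print(open_bugs / developers)          # ~19.8 open bugs per developer
print(open_bugs / users)               # ~0.000081 open bugs per user
print(lifetime_bugs / users)           # ~0.00126 reported bugs per user
print(lines_of_code / open_bugs)       # ~2,093 lines of code per open bug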
Conclusion
pgAdmin has a relatively small number of developers, but a large amount of code and a large number of users. From the data we have, we can assume that with such a large user base, bugs are far more likely to be reported than they would be with a user base of tens or hundreds. Despite this, the number of issues per 1,000 lines of code appears to be very low in comparison to industry studies.
Despite the huge amount of effort that goes into automated testing and builds, there will always be bugs. We rely on users to give us feedback when they find issues, and aim to fix everything as soon as possible, prioritising based on impact and likelihood of any given user running into each issue. We also try to balance time spent on bug fixes with new features and housekeeping tasks (aimed at keeping the code maintainable).
None of this would be possible without the automation in use for building and testing the code, or without the hard-working developers (including Fahar) and the users who report issues and help get them fixed, to whom I owe a huge debt of gratitude.