How difficult is to create a database software like PostgreSQL ?

Recently on twitter, i have seen a poll by Qovery asking how hard is it to create a database like PostgreSQL. With no surprise, people mostly voted it as "Very hard". However, the word "like" in this question is very ambiguous, as rightly said in one of the comments by Dave Cramer, who is one of the very active contributors and committers of PostgreSQL JDBC driver. Anyways, this ambiguity resulted into an idea of writing an article on this topic by breaking it into 2 questions and discuss them further. PostgreSQL

  1. How difficult is to install and create a PostgreSQL database.
  2. How difficult is it to create an Open Source database software like PostgreSQL ?

Please do not forget to take our Survey at the end of this article. This survey is to help us understand what you think about PostgreSQL. Results will be published through another blog post.

I would like to start with the first question here.

Installing and creating a PostgreSQL database

It might be very easy to get inclined towards a DBaaS platform where a database can be created through a few clicks. DBaaS may also be marketed as a database (available as a service) that requires no maintenance and no administration or tuning as it is automatically managed. Is this really true ? Is Postgres offered on DBaaS really PostgreSQL ? Are users paying more than they estimated and getting into a deep vendor lock-in ? Are users loosing the complete visibility into their databases and lacking a lot of features with DBaaS ? To enable automatic management of backups and high availability similar to what DBaaS promises as a managed service, isn't it wise to consider projects like pgBackRest for backups and Patroni for high availability ? Anyways, i would love to answer these questions in a separate article. Meanwhile, I would encourage you to watch the presentation on - Why Public Database as a Service is Prime for Open Source Distribution, by Peter Zaitsev, CEO of Percona.

Through this article, i would encourage users to try installing PostgreSQL and understand the overall architecture, before falling into a marketing trap leading to a vendor lock-in.

Packages or installers of PostgreSQL are available for Linux, macOS, Windows, Solaris and BSD Operating systems as seen in this PostgreSQL documentation. I would like to take an example of how i would install PostgreSQL on CentOS.

Step 1 : On the PostgreSQL downloads page, click on the Operating system family. You should then see the list of distributions (of the OS family) for which the packages are available.

PostgreSQL packages for various Operating Systems

Step 2 : You could then proceed to choose the appropriate PostgreSQL version and then see the exact commands you could copy and paste to finish the installation and setup.

Steps to Install PostgreSQL

These are the steps i have executed on a CentOS 7.8 machine. As you see in the following log, i am able to install and start PostgreSQL in 3 simple steps.

# Install the repository RPM:
$ sudo yum install -y https://download.postgresql.org/pub/repos/yum/reporpms/EL-7-x86_64/pgdg-redhat-repo-latest.noarch.rpm

# Install PostgreSQL:
$ sudo yum install -y postgresql13-server

# Optionally initialize the database and enable automatic start:
$ sudo /usr/pgsql-13/bin/postgresql-13-setup initdb
$ sudo systemctl enable postgresql-13
$ sudo systemctl start postgresql-13

To create a database, i could just use the create database command as seen in the following log. See this PostgreSQL documentation for more details on the available configurations while creating a database.

$ psql -c "CREATE DATABASE opensource"

Now, let us discuss about the second question.

How difficult is it to create an Open Source database software like PostgreSQL ?

To let you figure out the answer yourself, let us discuss some interesting facts about PostgreSQL.

PostgreSQL Community

PostgreSQL Community is one of the most advanced Open Source communities in the world. Nobody owns the software (see PostgreSQL license) and no acquisition can put a hold to the community contributions. There exists a rule that no more than 3 members from the same company can be part of the 7 member Core Team. All the communication about features, bug fixes and design changes are all discussed openly through emails. By subscribing to the mailing lists, one may participate in discussions with the developers about their ideas, code contributions, feature requests or bugs.

Postgres is not an Open Source Product from a single Company. It is an Open Source project by a Global Community, spanned across the world. Contributions are also done by developers from various companies such as NTT Data, Fujitsu, Microsoft, VMware, Red Hat and many more companies that are addressing their domain specific challenges. It is not just a source available software but a software open for criticism, open for modifications, open for re-distributions, open for derived software (see postgres derived databases) giving enough freedom to a user.

SQL Compliance

PostgreSQL is the only database among the most popular databases like Oracle, DB2, SQL Server and MySQL to qualify 170 of the 177 SQL standards (ISO/IEC 9075:2016 aka SQL:2016). If you love sticking to standards, PostgreSQL is the best choice.

Some of the PostgreSQL features and its Open Source ecosystem.

Following are a list of features that are most advanced in Postgres when compared with several other databases. By the way, these are all part of the Community PostgreSQL which is Open Source and requires no licensing. Another fact to remember is that there is no Enterprise version of PostgreSQL that is maintained by the Community.

  1. Streaming replication (byte-by-byte replication similar to Oracle) for High Availability and READ-ONLY standby servers. See Streaming Replication in PostgreSQL for more details.
  2. Built-in Logical Backups and Physical Backups.
  3. Enterprise Grade Open Source Backup Tool - pgBackRest for handling Terabytes of databases.
  4. Logical Replication and Logical decoding. See more details about logical replication in this documentation.
  5. Partitioning and Sub-Partitioning. Supports List, Hash and Range partitions.
  6. Foreign Data Wrappers. Native sharding is possible using Partitioning and Foreign Data Wrappers.
  7. Parallel Query. See this article about parallelism in postgres.
  8. Online Table and Index maintenance including Online Table move. See pg_repack for more details.
  9. Faster major version upgrades using pg_upgrade.
  10. Several hundreds of extensions and numerous Open Source tools for an Enterprise-Grade PostgreSQL setup.
PostgreSQL Release Cycle

There usually exists a major release around the months of September or October each year. A minor version release is aimed every quarter. Usually, on the second Tuesday of the second month of each quarter, you would see a minor release for the currently supported major versions. Depending on the impact of the bugs, we may see a minor version released more sooner than expected.

One has the privilege of watching the CommitFest where the patches that are being reviewed and committed and the discussion can be seen.

Migrations to PostgreSQL

To help with migrations to PostgreSQL, contributors like Gilles Darold (Ora2PG, pgtt and orafce), Peter Eisentraut , Laurenz Albe (Oracle_FDW), Pavel Stehule (plpgsql_check and orafce) and more developers have done huge contributions through Open Source extensions and tools. One just need to know that there exists a huge Open Source ecosystem around PostgreSQL to help with database migrations to PostgreSQL without the need of any enterprise licensing.

PostgreSQL Documentation

Being an Author of multiple books on PostgreSQL, i feel happy to share that there is no such book that includes more information than what we see in the beautiful PostgreSQL documentation. PostgreSQL documentation can be a primary source for all the internals of any topic in PostgreSQL. One can simply rely on Postgres docs to install, configure, tune, perform replication and for backups. All credits to Jonathan Katz, Peter Eisentraut, Josh Berkus, Thomas G Lockhart and the entire Community who have been continuously contributing to the PostgreSQL documentation and the website. If you are interested, also see this post by Robert Haas, one of the very active PostgreSQL contributors and a Core Team member, to know the number of commits or the contributions done by each developer in the year 2019.

Conclusion

Well, after reading this article, you might have already understood that it is very difficult to build another database software like PostgreSQL and it may take decades. The Community gets all the credits on making PostgreSQL the world's most loved database. Building a similar team as the PostgreSQL Global Community is almost an impossible task. As this is impossible, it is wise to enjoy the freedom of using PostgreSQL and contribute to PostgreSQL. The collaboration of a global community made it possible for PostgreSQL to become the DBMS of the year in DB engine rankings for 3 times in the last 5 years (2017, 2018 and 2020).

We will update you with the results of this Survey.

Please Contact Us to know more about the robust Open Source solutions available for PostgreSQL and how we can help you migrate from Oracle and other commercial databases to PostgreSQL.

Leave a Comment

Your email address will not be published.

Scroll to Top