CHAR(1) to Boolean transformation while migrating to PostgreSQL

PostgreSQL supports a wide variety of data types when compared to a few commercial databases like Oracle. For performance benefits, it may be wise to choose the optimal data types supported in PostgreSQL, instead of choosing the same ones as the source, while migrating from another database. In this article, we are going to discuss one such case where there exists a boolean column type but is stored as CHAR(1) with a CHECK constraint in Oracle. This is because Oracle does not support a boolean datatype at SQL level. We shall discuss how we can perform a seamless transformation of CHAR(1) to boolean datatype while migrating to PostgreSQL.

Let us consider an example of an Oracle table with a CHAR(1) type and a CHECK constraint as following. Here, we see a table named : order_status that consists of columns : id and order_processed.

-- In Oracle
CREATE TABLE "ORDER_STATUS"
( "ID" NUMBER(4,0),
"ORDER_PROCESSED" CHAR(1),
CHECK (status in ('Y','N', '1', '0', 'J', 'N', 'T', 'F')) ENABLE
)

-- When we insert a valid record, the insert is successful.
SQL> INSERT INTO ORDER_STATUS VALUES (200,'Y');

1 row created.

-- When we insert an invalid record, the insert fails as expected.
SQL> INSERT INTO ORDER_STATUS VALUES (201, 'Z');
INSERT INTO ORDER_STATUS VALUES (201, 'Z')
*
ERROR at line 1:
ORA-02290: check constraint (SYS.SYS_C0012821) violated

Now, the application might use the following query to determine the orders with status set to ‘Y’ or ‘T’ or ‘J’ or ‘1’.

SQL> select ID, ORDER_PROCESSED from ORDER_STATUS WHERE ORDER_PROCESSED IN ('Y','J','T','1');

ID S
---------- -
100 Y
102 J
103 T
106 1

Replicating a similar behavior in PostgreSQL and in other relational databases is not a complex job. We can choose the same datatype in PostgreSQL using the following approach.

To demonstrate the approach, let us create the same table in PostgreSQL and add a CHECK constraint and validate if the behavior looks the same.

-- In PostgreSQL
migops=# CREATE TABLE order_status
(id INT, 
order_processed char(1) check (order_processed in ('Y','N', '1', '0', 'J', 'N', 'T', 'F')));
CREATE TABLE

-- When we insert a valid record, the insert is successful.
migops=# INSERT INTO order_status VALUES (200,'Y');
INSERT 0 1

-- When we insert an invalid record, the insert fails as expected.
migops=# INSERT INTO order_status VALUES (201, 'Z');
ERROR: new row for relation "order_status" violates check constraint "order_status_order_processed_check"
DETAIL: Failing row contains (201, Z).

Now, let us execute the same application query and see the results.

migops=# select ID, ORDER_PROCESSED from ORDER_STATUS where ORDER_PROCESSED IN ('Y','J','T','1');
id | status 
-----+--------
200 | Y
100 | Y
102 | J
103 | T
106 | 1
(5 rows)

With the above example, we notice that the same behavior can be easily replicated in PostgreSQL.

Transformation from char(1) to boolean

Unlike Oracle, PostgreSQL supports the boolean data type, which can be defined at the column level. In the following example, we can see the column : order_processed is set to a boolean datatype. We will be able to insert a list of acceptable values such as ‘yes’, ‘no’, ‘1’,’0’,’true’,’false’,’Y’,’N’,’T’,’F’ ,etc. that are all converted to true or false.

migops=# CREATE TABLE order_status
(id INT, 
order_processed boolean);
CREATE TABLE

migops=# INSERT INTO order_status VALUES (101,'yes');
INSERT 0 1
migops=# INSERT INTO order_status VALUES (102,'no');
INSERT 0 1
migops=# INSERT INTO order_status VALUES (103,'1');
INSERT 0 1
migops=# INSERT INTO order_status VALUES (104,'0');
INSERT 0 1
migops=# INSERT INTO order_status VALUES (105,'true');
INSERT 0 1

While migrating data, we can write a simple transformation that updates all the true character flags as boolean true and the false character flags as false. The application query would then be simplified to the following.

migops=# select * from order_status where order_processed;
id | order_processed 
-----+-----------------
101 | t
103 | t
105 | t
(3 rows)

The above query returns all the records that had the order_processed column values set to ‘yes’,’1’,’true’,’Y’,’T’ or other legal values. This would be the best datatype for the purpose of performance benefits. We would see a gain up to 30% or more in performance when we choose a boolean datatype instead of CHAR(1).

However, let us assume that these character boolean flag values are inserted via an external or a 3rd party service, where we do not have any control over the values. The values can be anything other than a ‘yes’ or  ‘no’ or ‘true’ or ‘false’. For example, when we try to insert the value as ‘J’, we see an error as follows.

migops=# INSERT INTO order_status VALUES (502,'J');
ERROR: invalid input syntax for type boolean: "J"

But the requirement is to still let the application insert this value and also let the query return true only when one of the expected flags is found.

As we all know, PostgreSQL offers many developer friendly features. One of the greatest ones is called "CAST". By using "CAST" on objects, we can apply some kind of transformations during the query execution. To create a "CAST" object that does some kind of data transformation, we need a transformation or an output function. Let us write a simple transformation function for this purpose.

Transformation function for Casting Char to Boolean

The kind of transformation we need here is to convert all the true character flag values as boolean true values. Which means, if the order_processed column has one of it's values as 'Y', 'T', 'J', '1' then treat it as boolean true, otherwise treat it as boolean false. Following output/transformation function that can be used for this purpose.

migops=# CREATE OR REPLACE FUNCTION char_to_bool(char) RETURNS bool AS
$
BEGIN
RETURN $1 IN ('Y','1', 'J', 'T');
END;
$
LANGUAGE PLPGSQL;
CREATE FUNCTION

This function takes one argument, and then returns TRUE if the argument matches with one of the values from ('Y','1', 'J', 'T') as specified in the function.

Now, let us create the CAST object. Here, the CAST(type casting or type conversion) is happening from char to boolean, with the help of the output/transformation function that we created above.

migops=# CREATE CAST (char as bool) WITH FUNCTION char_to_bool(char);
CREATE CAST

To test how type conversion works, let us create the same table but with a CHAR(1) datatype.

migops=# CREATE TABLE order_status
migops-# (id INT, 
migops(# order_processed char(1));
CREATE TABLE
migops=# INSERT INTO order_status VALUES (500,'Y');
INSERT 0 1
migops=# INSERT INTO order_status VALUES (501,'N');
INSERT 0 1
migops=# INSERT INTO order_status VALUES (502,'J');
INSERT 0 1
migops=# INSERT INTO order_status VALUES (503,'T');
INSERT 0 1
migops=# INSERT INTO order_status VALUES (504,'F');
INSERT 0 1
migops=# INSERT INTO order_status VALUES (505,'0');
INSERT 0 1
migops=# INSERT INTO order_status VALUES (506,'1');
INSERT 0 1

Without type casting, it returns an error when we try to select the records that contain the expected values.

migops=# select * from order_status where order_processed ;
ERROR: argument of WHERE must be type boolean, not type character
LINE 1: select * from order_status where order_processed ;
                                         ^

The above behavior is expected as PostgreSQL is unable to identify the cast object that was created. Let us try the same query with an explicit casting and see the results.

migops=# select * from order_status where order_processed::BOOL;
id | order_processed 
-----+-----------------
500 | Y
502 | J
503 | T
506 | 1
(4 rows)

After adding the explicit casting to the order_processed column, we are able to get the expected output.

Implicit Casting to make the tranformation seamless

PostgreSQL also offers us to create implicit cast objects, which get invoked automatically whenever required. Now, let us re-create the same cast object as implicit and see the results.

migops=# DROP CAST (char as bool);
DROP CAST
migops=# CREATE CAST (char as bool) WITH FUNCTION char_to_bool(char) AS IMPLICIT;
CREATE CAST

As you see in the above snippet, while creating the cast object, we created it as implicit by using the 'AS IMPLICIT' clause. Now, let us execute the same query without any explicit casting.

migops=# select * from order_status where order_processed ;
id | order_processed 
-----+-----------------
500 | Y
502 | J
503 | T
506 | 1
(4 rows)

After creating the cast object as implicit cast, PostgreSQL is able to identify the right casting automatically. By leveraging PostgreSQL developer friendly features, we can define our own type conversions, which simplifies the SQL operations.

Please Note : As this type of casting can be a global behavior for all the sql operations, we need to make sure that we know the implications involved. It might even make it difficult to find the application bugs.

If we need to avoid such global behavior, we can rather create our own custom type and then casting the custom type to boolean would make it a wise approach.

CHAR(1) vs Explicit Casting vs Implicit Casting vs Boolean performance difference.

A simple benchmark showed the following TPS with CHAR(1), Type casting and Implicit Type casting. Following are the results observed with a simple benchmark against 100,000 records.

CHAR vs Boolean datatype in PostgreSQL

With an expected performance improvement by using boolean datatype, if your application only inserts a valid boolean, then, it is wise to use the boolean datatype for the benefit of performance.

Please Contact Us to know more about the robust Open Source solutions available for PostgreSQL and how we can help you migrate from Oracle and other commercial databases to PostgreSQL. Ask us about our fastest data migration tool that has been developed in-house for the purpose of data migrations from Oracle to PostgreSQL.

Leave a Comment

Your email address will not be published.

Scroll to Top