Ora2Pg now supports oracle_fdw to increase the data migration speed

It has been 20 years since I started maintaining the Ora2Pg project, an open source software for Oracle to PostgreSQL migrations. The first version of Ora2Pg was released on May 9, 2001. Since then, many features related to schema conversion and data migration have been added. Over time, I have witnessed several tens of thousands of migrations using Ora2Pg, which also increased the need for optimizations. One such feature is faster data migration by copying multiple tables in parallel, with parallel jobs to migrate each table. To push the speed further, version 22.0 of Ora2Pg now supports the Foreign Data Wrapper oracle_fdw to increase the data migration speed. This is particularly useful for tables with BLOB columns, because the data needs a transformation to bytea that was known to be slow in Ora2Pg and is much faster with the C implementation in oracle_fdw.

Installation

In order to use Ora2Pg with oracle_fdw, you need to install both Ora2Pg and oracle_fdw. Oracle_fdw has to be installed on the PostgreSQL server to which you are migrating the data. Ora2Pg can be installed on the PostgreSQL server or on any intermediate server between the Oracle and PostgreSQL servers. To install the oracle_fdw extension, please follow these installation steps. To install Ora2Pg, please follow the steps in the installation manual of the official Ora2Pg documentation.
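Once oracle_fdw is compiled and installed on the PostgreSQL server, the extension still has to be created in the target database. A minimal sketch of that step, assuming the target database is the hr database used later in this article, could look like this:

```sql
-- Create the extension in the target database (run as a superuser):
CREATE EXTENSION oracle_fdw;

-- oracle_diag() is provided by oracle_fdw and reports the oracle_fdw,
-- PostgreSQL and Oracle client versions; it is a quick way to verify
-- that the Oracle client libraries are found by the PostgreSQL server.
SELECT oracle_diag();
```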

Ora2Pg Configuration

After installing the Oracle Instant Client, ORACLE_HOME is set as follows in the ora2pg.conf configuration file.

ORACLE_HOME    /u01/app/instantclient_12_2
Connection to Oracle

In this example, I have an Oracle 18c instance running on host 192.168.1.37, with a pluggable database (PDB) called pdb1, where the sample HR schema has been loaded.

ORACLE_DSN dbi:Oracle:host=192.168.1.37;service_name=pdb1;port=1521
ORACLE_USER HR
ORACLE_PWD hrpwd

Once you have set the Oracle database DSN, you can run ora2pg to verify that the connection works.

$ ora2pg -t SHOW_VERSION -c config/ora2pg.conf
Oracle Database 18c Enterprise Edition Release 18.0.0.0.0

Getting to this step successfully is the hardest part of implementing Ora2Pg when you are a novice because of the relative complexity of the installation process. Reading the documentation carefully will normally get you there easily.

Configuration before starting Ora2Pg for the Schema and Data Migration

I want to export the content of the Oracle HR schema into the hr schema in PostgreSQL and set the owner of all objects to the user gilles. I also want to preserve the case of the Oracle object names. For this purpose, I have set the following configuration in the ora2pg.conf file.

SCHEMA       HR
EXPORT_SCHEMA 1
PG_SCHEMA hr
FORCE_OWNER gilles
PRESERVE_CASE 1

Note that Ora2Pg 22.0 needs these settings to be able to export data through oracle_fdw. The upcoming release 22.1 will allow the use of oracle_fdw without preserving case and without schema export. UPDATE: Release 22.1 has been published and is available for download on GitHub.

Migrating Schema using Ora2Pg

Before starting the data migration, we need the schema to be created in the PostgreSQL database. Ora2Pg can perform the schema migration from Oracle to PostgreSQL: the Oracle schema, extracted using ORACLE_DSN and the configuration specified in the ora2pg.conf file, is converted to PostgreSQL-specific syntax with an appropriate data type mapping. In this example, we will just export the table definitions and create the same tables in PostgreSQL. We first create a directory in which the converted PostgreSQL DDL can be stored.

$ mkdir tables
$ ora2pg -t TABLE -p -c config/ora2pg.conf -b tables
[========================>] 7/7 tables (100.0%) end of scanning.
Retrieving table partitioning information…
[========================>] 7/7 tables (100.0%) end of table export.

As seen in the above log, the DDL export converted to PostgreSQL syntax has been created in the file tables/output.sql. We can now create the PostgreSQL database named hr, owned by the user gilles, and import the DDL definitions generated in the previous step.

$ createdb hr -O gilles
$ psql -d hr -U gilles -f tables/output.sql
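To give an idea of what the import contains, here is a hypothetical excerpt of the kind of DDL Ora2Pg generates in tables/output.sql for one of the HR tables, with case preserved through quoted identifiers; the exact type mapping and statements depend on your configuration, so treat this as an illustration rather than actual output:

```sql
-- Hypothetical excerpt of tables/output.sql (illustrative only).
-- PRESERVE_CASE 1 keeps the uppercase Oracle names via quoting,
-- and FORCE_OWNER assigns ownership to gilles.
CREATE TABLE "REGIONS" (
    "REGION_ID" numeric NOT NULL,
    "REGION_NAME" varchar(25)
);

ALTER TABLE "REGIONS" OWNER TO gilles;
```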
Data Migration using Ora2Pg

First, configure Ora2Pg to perform the data migration on the fly instead of dumping the Oracle data to files and then importing them with psql. To enable this direct data migration, we need to set PG_DSN (the PostgreSQL data source) to connect to PostgreSQL.

PG_DSN          dbi:Pg:dbname=hr;host=localhost;port=5432
PG_USER gilles
PG_PWD gilles

You need the Perl module DBD::Pg to be installed. All Linux distributions have their own package for this library; otherwise, you can find the sources on CPAN.

We can then proceed with the data migration, as in the following example:

$ ora2pg -t COPY -c config/ora2pg.conf 
[========================>] 7/7 tables (100.0%) end of scanning.
[========================>] 25/25 rows (100.0%) Table COUNTRIES (25 recs/sec)
[==> ] 25/215 total rows (11.6%) - (1 sec., avg: 25 recs/sec).
[========================>] 27/27 rows (100.0%) Table DEPARTMENTS (27 recs/sec)
[=====> ] 52/215 total rows (24.2%) - (1 sec., avg: 52 recs/sec).
[========================>] 107/107 rows (100.0%) Table EMPLOYEES (107 recs/sec)
[=================> ] 159/215 total rows (74.0%) - (1 sec., avg: 159 recs/sec).
[========================>] 19/19 rows (100.0%) Table JOBS (19 recs/sec)
[===================> ] 178/215 total rows (82.8%) - (2 sec., avg: 89 recs/sec).
[========================>] 10/10 rows (100.0%) Table JOB_HISTORY (10 recs/sec)
[====================> ] 188/215 total rows (87.4%) - (2 sec., avg: 94 recs/sec).
[========================>] 23/23 rows (100.0%) Table LOCATIONS (23 recs/sec)
[=======================> ] 211/215 total rows (98.1%) - (2 sec., avg: 105 recs/sec).
[========================>] 4/4 rows (100.0%) Table REGIONS (4 recs/sec)
[========================>] 215/215 total rows (100.0%) - (2 sec., avg: 107 recs/sec).
[========================>] 215/215 rows (100.0%) on total estimated data (2 sec., avg: 107 recs/sec)
Data Migration using Ora2Pg with oracle_fdw

Enabling oracle_fdw is done simply by giving a name for the foreign server to be created; everything else is automatic, based on the configuration we have already set. In ora2pg.conf, just set the following:

FDW_SERVER      orcl
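Behind the scenes, Ora2Pg uses this name together with the ORACLE_DSN, ORACLE_USER and ORACLE_PWD settings to create the foreign server, the user mapping, and the foreign tables for you. The generated objects are roughly equivalent to the following SQL, a sketch based on standard oracle_fdw usage rather than the exact statements Ora2Pg issues:

```sql
-- Foreign server pointing to the Oracle instance from ORACLE_DSN
CREATE SERVER orcl FOREIGN DATA WRAPPER oracle_fdw
    OPTIONS (dbserver '//192.168.1.37:1521/pdb1');

-- User mapping carrying the Oracle credentials from ORACLE_USER / ORACLE_PWD
CREATE USER MAPPING FOR gilles SERVER orcl
    OPTIONS (user 'HR', password 'hrpwd');

-- Foreign tables are then created in the ora2pg_fdw_import schema,
-- for example by importing the Oracle schema:
IMPORT FOREIGN SCHEMA "HR" FROM SERVER orcl INTO ora2pg_fdw_import;
```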

As I had already migrated the data in the previous step, I instructed Ora2Pg to truncate the PostgreSQL tables before migrating the data again.

TRUNCATE_TABLE    1

I then started the data migration, with the FDW_SERVER parameter enabling data transfer through oracle_fdw.

$ ora2pg -t COPY -c config/ora2pg.conf
[========================>] 7/7 tables (100.0%) end of scanning.
[========================>] 7/7 tables (100.0%) end of table export
NOTICE: schema "ora2pg_fdw_import" does not exist, skipping
[========================>] 215/215 rows (100.0%) on total estimated data (2 sec., avg: 107 recs/sec)

That’s it.

After the migration, Ora2Pg leaves behind a schema named ora2pg_fdw_import containing all the foreign tables created to migrate the data. To clean up the ora2pg_fdw_import schema and the mapping objects created in it, use the following SQL command.

DROP SCHEMA ora2pg_fdw_import CASCADE;

And to remove the oracle_fdw extension along with the foreign server and user mapping created by Ora2Pg, we can use the following command.

DROP EXTENSION oracle_fdw CASCADE;
Data Migration Performance

The timings reported in the above examples are not representative of real data migration speed, given the small number of rows and the systems hosting the databases. Let us consider a table with more data, so that we can see the timing differences.

With a 30 GB table containing 100,000,000 rows and a geometry column, things are a bit different. Exporting the data through oracle_fdw with a single process, we obtain the following results.

$ ora2pg -t COPY -c config/ora2pg.conf -a TABLE_TEST
[========================>] 1/1 tables (100.0%) end of scanning.
NOTICE: user mapping for "gilles" already exists for server "orcl", skipping
[========================>] 1/1 tables (100.0%) end of table export.
NOTICE: drop cascades to 7 other objects
DETAIL: drop cascades to foreign table ora2pg_fdw_import."COUNTRIES"
drop cascades to foreign table ora2pg_fdw_import."DEPARTMENTS"
drop cascades to foreign table ora2pg_fdw_import."EMPLOYEES"
drop cascades to foreign table ora2pg_fdw_import."JOBS"
drop cascades to foreign table ora2pg_fdw_import."JOB_HISTORY"
drop cascades to foreign table ora2pg_fdw_import."LOCATIONS"
drop cascades to foreign table ora2pg_fdw_import."REGIONS"
[========================>] 100000000/100000000 rows (100.0%) on total estimated data (4183 sec., avg: 23906 recs/sec)

23,900 tuples per second is not bad for an export from an Oracle database hosted on a VirtualBox machine with 4 cores and 6 GB of memory, but this uses only a single process to export the data.

Let’s try to use 4 cores to export the data and see the performance. This test table does not have a primary key or a unique index, so we need to give Ora2Pg a column that can be used to balance the data export across the 4 processes. I edited ora2pg.conf to specify a numeric column with unique values, as follows:

DEFINED_PK      TABLE_TEST:ACQTN_SEQ_NO

Column ACQTN_SEQ_NO of table TABLE_TEST is populated from a sequence, so we can expect a good distribution of the data based on the modulo, even though there is no unique constraint.
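With DEFINED_PK set, the parallel export splits the table by applying a modulo on that column; conceptually, each of the 4 Oracle connections runs a query of the following form. This is a simplified sketch of the splitting technique, not the literal queries Ora2Pg generates:

```sql
-- Each connection n (0..3) fetches only the rows where the
-- sequence-based column modulo the number of jobs equals n,
-- so each process reads a disjoint quarter of the table.
SELECT * FROM TABLE_TEST WHERE ABS(MOD(ACQTN_SEQ_NO, 4)) = 0;  -- job 0
SELECT * FROM TABLE_TEST WHERE ABS(MOD(ACQTN_SEQ_NO, 4)) = 1;  -- job 1
-- ... and likewise for jobs 2 and 3.
```

Because the column is fed by a sequence, the four modulo classes are of roughly equal size, which keeps the workload balanced across the processes.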

Here are the new results, using Oracle data extraction over 4 connections (-J 4):

$ ora2pg -t COPY -c config/ora2pg.conf -a TABLE_TEST -J 4
[========================>] 1/1 tables (100.0%) end of scanning.
NOTICE: user mapping for "gilles" already exists for server "orcl", skipping
[========================>] 1/1 tables (100.0%) end of table export.
NOTICE: drop cascades to foreign table ora2pg_fdw_import."TABLE_TEST"
[========================>] 100000000/100000000 total rows (100.0%) - (1250 sec., avg: 80000 recs/sec), TABLE_TEST
[========================>] 100000000/100000000 total rows (100.0%) - (1250 sec., avg: 80000 recs/sec), TABLE_TEST
[========================>] 100000000/100000000 total rows (100.0%) - (1250 sec., avg: 80000 recs/sec), TABLE_TEST
[========================>] 100000000/100000000 total rows (100.0%) - (1250 sec., avg: 80000 recs/sec), TABLE_TEST
[========================>] 100000000/100000000 rows (100.0%) on total estimated data (1250 sec., avg: 80000 tuples/sec)

So 80,000 tuples per second is far better. Of course, with better hardware and 16 or 32 cores, we could reach an even higher throughput.

Running the export the same way but without the use of oracle_fdw gives the following results.

$ ora2pg -t COPY -c config/ora2pg.conf -a TABLE_TEST -J 4 -j 2
[========================>] 1/1 tables (100.0%) end of scanning.
[======> ] 25000000/100000000 rows (25.0%) Table TABLE_TEST-part-0 (3032 sec., 8245 recs/sec)
[======> ] 25000000/100000000 rows (25.0%) Table TABLE_TEST-part-2 (3031 sec., 8248 recs/sec)
[======> ] 25000000/100000000 rows (25.0%) Table TABLE_TEST-part-3 (3031 sec., 8248 recs/sec)
[======> ] 25000000/100000000 rows (25.0%) Table TABLE_TEST-part-1 (3031 sec., 8248 recs/sec)
[========================>] 100000000/100000000 rows (100.0%) Table TABLE_TEST (3032 sec., 32981 recs/sec)
[========================>] 100000000/100000000 rows (100.0%) on total estimated data (3033 sec., avg: 32970 tuples/sec)

Ora2Pg parallel migration using 4 jobs but without oracle_fdw migrates at an average of 32,970 tuples per second, compared to 80,000 tuples/sec with oracle_fdw. The performance gain is more than double.

[Chart: Ora2Pg data migration performance with oracle_fdw]

Conclusion

As seen in the above chart, the performance of Ora2Pg with oracle_fdw can be more than double that of Ora2Pg without it. To use this feature, you must have the latest version of Ora2Pg installed. Also note that this feature may not be available to you when you are migrating data to a PostgreSQL service that does not support oracle_fdw. For example, Amazon RDS/Aurora and some other cloud providers do not offer oracle_fdw as a supported extension. In such a case, please contact MigOps to learn how we can help you optimize the data migration speed while migrating to the cloud.

