Announcing PG_DBMS_JOB in PostgreSQL for Oracle DBMS_JOB compatibility

When you are working on Oracle to PostgreSQL migrations, one of the Oracle packages that cause conversion issues is DBMS_JOB. Traditionally, we used extensions like : pg_agent, pg_cron or more recently pg_timetable for scheduling jobs. All of these tools or extensions use a cronjob like scheduling method which does not give a full compatibility of the features provided by Oracle DBMS_JOB. Translating calls to DBMS_JOB.SUBMIT() into a cron setting is more painful than being impossible when there is an execution interval lesser than a minute. There are several such DBMS_JOB compatibility issues including asynchronous scheduling which is not possible with the already existing extensions in PostgreSQL. Myself and MigOps hate to stay without Open Source solutions. So, we are announcing PG_DBMS_JOB extension for Oracle DBMS_JOB compatibility, released under PostgreSQL License. An interesting point here is that we have included full compatibility of Oracle DBMS_JOB in PG_DBMS_JOB PostgreSQL extension.

Why did we develop the Oracle DBMS_JOB compatibility for PostgreSQL ?

MigOps is started with a motto to eliminate vendor lock-in while migrating from Oracle to PostgreSQL. So, we take a very unique approach while solving migration problems. If we see that there is a feature that is only available in Oracle but not in PostgreSQL, unless possible with a work-around costing a significant developer time, we brain-storm to see if we can develop a solution that is Open Source. For example, one of our customers was looking for compatibility with DBMS_ERRLOG feature in Oracle and thus we developed an extension : pg_dbms_errlog. Similarly, for another customer we developed the extension : credcheck for enforcing username and password rules/policies. This way, not only our customers, but we enable other PostgreSQL users across the world to enjoy the freedom of PostgreSQL.

We generally start database migrations from Oracle to PostgreSQL upon a detailed assessment that is beyond some of the automated tools, to avoid migration errors. Similarly, In a recent migration assessment for one of our customers, I have encountered a use case where DBMS_JOB was used to run asynchronous tasks immediately, without letting the current session to wait until the end of the task. There was no such equivalent feature in PostgreSQL for such a use case. We could use extensions like pg_background or dblink to execute asynchronous tasks, but at the end of the session you need to wait for the result of the asynchronous tasks or else they will get cancelled.

So, we decided to implement the exact equivalent of Oracle DBMS_JOB for PostgreSQL and thus developed this extension : pg_dbms_job.

The new pg_dbms_job PostgreSQL extension provides full compatibility with the DBMS_JOB Oracle module. It implements the same behavior and provide the same API as Oracle DBMS_JOB. It allows to manage scheduled jobs from a job queue and allows to execute tasks asynchronously. A job definition in pg_dbms_job consists of - a code to execute, the date of its execution and how often the job has to be executed. A job could be an SQL command, a plpgsql code or an existing stored procedure.

Architecture of pg_dbms_job

This extension executes an SQL script, that creates all the objects needed by this extension and a daemon that must be attached to the database where jobs are defined. The daemon is responsible for executing the asynchronous jobs that are queued and also the jobs that are scheduled. It can be executed on the same host as the PostgreSQL database server where the jobs are defined or on any other remote host. This enables the possibility to use pg_dbms_job on DBaaS platforms like Amazon RDS, Aurora, Google Cloud SQL for PostgreSQL and Microsoft Azure Database for PostgreSQL.

The schedule time is taken from the database host but not from where the daemon is running.

PG_DBMS_JOB in PostgreSQL for Oracle DBMS_JOB compatibility

 

The job execution is caused by a NOTIFY event received by the scheduler when a new job is submitted or modified. The notifications are polled every 0.1 second. When there is no notification, the scheduler polls the tables where job definitions are stored, every job_queue_interval seconds (5 seconds by default). This mean that in a worst-case scenario, a job will be executed job_queue_interval seconds after the next execution date is defined.

The number of jobs that can be executed at the same time is limited to 1000 by default. If this limit is reached, the daemon will wait for a process to end and then start a new one.

The use of an external scheduler daemon instead of a background worker is a choice made in our design decisions. This is because a background worker being able to fork thousands of sub-processes from a PostgreSQL background worker is not a good idea for a Production database.

Using the extension : pg_dbms_job

In this article, I will take three examples :

  • An asynchronous job with immediate execution
  • An asynchronous job with deferred execution
  • A scheduled job that runs every 12 hours

A job can be submitted using the dbms_job.submit() function, similar to Oracle. It has the following syntax.

dbms_job.submit (
    job       OUT bigint,
    what      IN  text,
    [ next_date IN  timestamp(0) with time zone DEFAULT current_timestamp
    [ , interval  IN  text DEFAULT NULL
    [ , no_parse  IN  boolean DEFAULT false ] ] ] );

The last parameter is provided for compatibility only but has no effect. PostgreSQL will evaluate the code only during the job execution.

Parameter what is the code that the job has to execute. It will be executed from a plpgsql DO block.

If the submit() stored procedure is called without the next_date (when) and interval (how often) parameters or that they are set to NULL, then, the job is executed immediately as an asynchronous process. If next_date is set to current timestamp or a date in the past and that interval is NULL, then, the job is also executed immediatly as an asynchronous process.

In all other cases, the job is to be started when appropriate but if the interval is NULL, the job is executed only once and then the job is deleted.

1. Asynchronous job with immediate execution

Let us consider a job that can be executed to send an email when a new user account is created. Sending mail can take some time and we don't want to wait for a timeout or until the email is really sent. To send an email from the database, we could use this excellent Orafce extension orafce_mail contributed by Pavel Stehule that implements the UTL_MAIL Oracle package.

BEGIN
CALL utl_mail.send(sender => 'contact@migops.com',
                   recipients => user_email,
                   subject => 'Welcome to your new account',
                   message => e'blah, blah, blah...');
END;

If we want to submit this job in an asynchronous way using the pg_dbms_job PostgreSQL extension, we will execute the following code :

BEGIN;
DO $
DECLARE
    jobno bigint;
    sender_email text := 'contact@migops.com';
    to_email text := 'newuser@domain.com';
BEGIN
   SELECT dbms_job.submit(
      'BEGIN CALL utl_mail.send(sender => '''||sender_email||''',
                   recipients => '''||to_email||''',
                   subject => ''Welcome to your new account'',
                   message => e''blah, blah, blah...''); END;'
      ) INTO jobno;
   RAISE NOTICE 'Email to % send by job %', to_email, jobno;
END;
$;
COMMIT;

Once the job is submitted, an entry is stored in table dbms_job.all_async_jobs and a trigger would send a notification to the scheduler to inform that a new asynchronous job is to be executed. A job is stored in an asynchronous job queue if the interval is NULL and if the next_date is in the past or is NULL. Here both are NULL.

While the transaction is not committed, the scheduler cannot see the new job, so we can have a look into the queue of asynchronous job :

regress_dbms_job=*# SELECT * FROM dbms_job.all_async_jobs ;
-[ RECORD 1 ]-------------------------------------------------------------
job         | 15
log_user    | gilles
schema_user | "$user", public
create_date | 2021-08-26 15:05:00.905228+02
what        | BEGIN CALL utl_mail.send(sender => 'contact@migops.com',    +
            |                    recipients => 'newuser@domain.com',      +
            |                    subject => 'Welcome to your new account',+
            |                    message => e'blah, blah, blah...'); END;
this_date   | 

Then, we can COMMIT the transaction to really submit the job.

The scheduler daemon is looking for job submission notifications every 0.1 seconds when there is no activity. When a new job is notified, the scheduler collects all asynchronous jobs from the queue table, forks a sub-process per job to execute and delete the job in the queue before processing it. The jobs are processed asynchronously and the scheduler goes back immedialetly to listen for notification.

After the job ends, a corresponding entry is available in the table dbms_job.all_scheduler_run_details, in our case with the following information :

regress_dbms_job=# SELECT * FROM dbms_job.all_scheduler_job_run_details;
-[ RECORD 1 ]-----+----------------------------------------------------------------------------
log_id            | 17
log_date          | 2021-08-26 15:07:19.056399+02
owner             | gilles
job_name          | 15
job_subname       | 
status            | ERROR:  orafce.smtp_url is not specified                                   +
                  | DETAIL:  The address (url) of smtp service is not known.                   +
                  | CONTEXT:  SQL statement "CALL utl_mail.send(sender => 'contact@migops.com',+
                  |                    recipients => 'newuser@domain.com',                     +
                  |                    subject => 'Welcome to your new account',               +
                  |                    message => e'blah, blah, blah...')"                     +
                  | PL/pgSQL function inline_code_block line 7 at CALL
error             | 55000
req_start_date    | 
actual_start_date | 2021-08-26 15:07:19+02
run_duration      | 0
instance_id       | 
session_id        | 
slave_pid         | 30888
cpu_used          | 
additional_info   | The query failed

Here, we have intentionally generated an error by not setting orafce_mail correctly for the purpose of dedmonstration of the information that is logged.

The information is also available in the scheduler log file as seen in the following log.

2021-08-26 15:07:19 [30888]: [793] LOG:  executing job 15
2021-08-26 15:07:19 [30888]: [909] ERROR:  job 15 failure, reason: ERROR:  orafce.smtp_url is not specified
DETAIL:  The address (url) of smtp service is not known.
CONTEXT:  SQL statement "CALL utl_mail.send(sender => 'contact@migops.com',
                   recipients => 'newuser@domain.com',
                   subject => 'Welcome to your new account',
                   message => e'blah, blah, blah...')"
PL/pgSQL function inline_code_block line 7 at CALL
2021-08-26 15:07:19 [30888]: [773] LOG:  end of subprocess executing job 15
2. Asynchronous job with deferred execution

Now suppose that I want to execute a task 30 seconds after my session ends. Lets say that this is to clear some cache after a certain time. I can write a job that is called with pg_sleep set to 30 seconds. This way, it gets executed immediately as seen in the following log.

DO $
DECLARE
jobid bigint;
BEGIN
        SELECT dbms_job.submit( 'BEGIN PERFORM pg_sleep(30); CALL cache_expires(); END;' ) INTO jobid;
END;
$;

But, the above approach is not very neat and optimal.

For this kind of use case, we can use the next_date parameter to set the execution date and remove the call to pg_sleep, as seen in the following log.

DO $
DECLARE
jobid bigint;
BEGIN
        SELECT dbms_job.submit(
        'BEGIN CALL cache_expires(); END;',
        current_timestamp + '30 seconds'::interval
        ) INTO jobid;
END;
$;
SELECT current_timestamp;

The execution of the above command shows that the asynchronous Job has been successfully executed at least 30 seconds after its submission:

DO
regress_dbms_job=> SELECT current_timestamp;
-[ RECORD 1 ]-----+------------------------------
current_timestamp | 2021-08-26 16:39:14.338956+02

regress_dbms_job=> SELECT * FROM dbms_job.all_scheduler_job_run_details;
-[ RECORD 1 ]-----+------------------------------
log_id            | 22
log_date          | 2021-08-26 16:39:46.052998+02
owner             | regress_dbms_job_user
job_name          | 20
job_subname       | 
status            | 
error             |      
req_start_date    | 
actual_start_date | 2021-08-26 16:39:45+02
run_duration      | 1
instance_id       | 
session_id        | 
slave_pid         | 33262
cpu_used          | 
additional_info   | 
3. Scheduled job that runs every 12 hours

To schedule a job that needs to run periodically, we need to set the next date of execution and the SQL code to automatically calculate the next execution date, using interval.

To execute the previous example periodically twice a day, i.e. every 12 hours, we can submit the job using the following code.

DO $
DECLARE
jobid bigint;
BEGIN
        SELECT dbms_job.submit(
        what => 'BEGIN CALL cache_expires(); END;',
        next_date => date_trunc('day', current_timestamp)+'1 day'::interval,
        job_interval => 'date_trunc(''hour'', current_timestamp)+''12 hours''::interval'
        ) INTO jobid;
END;
$;

Note that interval is a string that contains the code that will be evaluated to calculate the next execution date, unlike next_date which is a timestamp.

regress_dbms_job=# SELECT * FROM dbms_job.all_scheduled_jobs;
-[ RECORD 1 ]-----------------------------------------------------------
job         | 21
log_user    | gilles
priv_user   | gilles
schema_user | "$user", public
last_date   | 
last_sec    | 
this_date   | 
this_sec    | 
next_date   | 2021-08-27 00:00:00+02
next_sec    | 
total_time  | 
broken      | f
interval    | date_trunc('hour', current_timestamp)+'12 hours'::interval
failures    | 
what        | BEGIN CALL cache_expires(); END;
nls_env     | 
misc_env    | 
instance    | 0

If a scheduled job completes successfully, then, its new execution date is placed in next_date. The new date is calculated by evaluating the SQL expression defined as interval. The interval parameter must evaluate to a time in the future.

Conclusion

For more information on using this extension, you should read the documentation where all the API's and tables attributes are described. This is clearly not the same approach as a cron based scheduler. If you are migrating from an Oracle world, you may spend a little time to find the corresponding date functions to schedule jobs, but you would find that the way you use pg_dbms_job is completely similar to Oracle DBMS_JOB. We have included all the compatibility of Oracle DBMS_JOB in this extension pg_dbms_job to help you save time while migrating from Oracle to PostgreSQL. If you are thinking to migrate an Oracle database to PostgreSQL and worried about how to proceed further, please contact us or fill the following form. One of our experts will contact you.

Need our assistance in PostgreSQL Migrations, Assessments, PostgreSQL Consulting and Support ?
Please fill this form to have a conversation with our team.

Leave a Comment

Your email address will not be published.

Scroll to Top