Migrate from Sybase to PostgreSQL Without Downtime: Achieve Seamless Failback
Apr 24, 2025
In response to rising demands for data reliability and regulatory compliance, a leading public sector organization embarked on a critical project to migrate its core healthcare and public health information systems from Sybase ASE to PostgreSQL. Facing the imminent end-of-support for Sybase ASE and increasing performance bottlenecks, the organization prioritized a seamless, zero-downtime migration strategy.
This case study explores how TapData enabled real-time, bi-directional database synchronization to ensure business continuity, minimal risk, and a smooth transition to a modern, high-availability PostgreSQL environment.

Overview

This project involved migrating mission-critical applications from a Sybase ASE database to a PostgreSQL database. To ensure a smooth, accurate, and complete migration, with minimal downtime and the ability to fall back, our strategy was to establish a real-time replication path from Sybase ASE to PostgreSQL; then, following data validation, cut the application over to PostgreSQL while reversing the replication flow so that the Sybase ASE database stayed current with the new PostgreSQL database during a burn-in period of parallel running. This preserved a fail-back path for the application, without data loss, in the event of any issues with the application running on PostgreSQL.


Step 1: TapData installation and database configuration

TapData and its underlying MongoDB are deployed across three servers to form a highly available, load-balanced three-node TapData cluster. This architecture ensures that if one node fails, the remaining nodes continue to provide service, preventing downtime from a single point of failure. The load-balancing mechanism distributes traffic evenly across all nodes so that no single node becomes overloaded, improving the system's response speed and stability. At the same time, data consistency is maintained across the cluster nodes through synchronization mechanisms, so data is not lost even if a node fails or switches over, ensuring business continuity and reliability.
📖 Sharing the TapData Three-Node Load Balancing and High Availability Deployment Documentation:
 https://docs.tapdata.io/administration/production-deploy/install-tapdata-ha-with-3-node

Step 2: Migration of the database objects from Sybase ASE to PostgreSQL

Besides the base schema, other database objects required migration and careful consideration of their impact on the replication streams: logical views, materialized views, stored procedures, triggers, indexes, constraints, sequences, and default values. Even the base schema itself required careful adjustments for identity columns, primary keys, foreign keys, and field type mapping. While Sybase ASE and PostgreSQL use similar SQL syntax, there are differences that need to be accounted for. Using TapData for the migration, all of these were handled automatically except for views, stored procedures, triggers, and check constraints, which had to be migrated manually.

Field Type Mapping

Invariably, when replicating from one database to another, some field types will not have an exact equivalent, in which case the closest match must be chosen. Such inexact matches can produce differences in value range or precision between the two databases. Since the existing application uses Sybase ASE, our preference was for the PostgreSQL field types to hold the larger range or precision, so that no data precision would be lost going from Sybase ASE to PostgreSQL. This means that when we reverse the replication flow to run from PostgreSQL to Sybase ASE, there is a risk of precision loss in the reverse direction. This risk is partially mitigated by the fact that the same application (modified for PostgreSQL) will be writing to PostgreSQL, so there is no reason to expect data to fall outside the bounds of the original Sybase ASE schema.

Primary Key / Uniqueness Considerations

However, special care must be taken when such fields are used in the primary keys for matching data, where a difference in precision can result in unexpected primary key mismatches. For example, a date/time type exists in both databases, but with different precisions:
The DATETIME type in Sybase ASE has a precision of 1/300th of a second (~3.33 milliseconds).
The TIMESTAMP type in PostgreSQL (both TIMESTAMP WITHOUT TIME ZONE and TIMESTAMP WITH TIME ZONE) supports precision down to microseconds (1/1,000,000th of a second).
Note: The default precision is microseconds, but PostgreSQL allows specifying precision explicitly, e.g., TIMESTAMP(3) for milliseconds or TIMESTAMP(6) for full microsecond precision.
So the values Dec 23 2024 18:48:13:125 and Dec 23 2024 18:48:13:128 would be unique in PostgreSQL, but would both end up as Dec 23 2024 18:48:13:126 in Sybase ASE, which could cause problems if this field is part of the primary key or requires uniqueness.
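To illustrate the collision, here is a small Python sketch. It assumes a simple round-to-nearest-tick model of ASE's 1/300-second DATETIME storage (the engine's exact rounding rules may differ), and shows two values that are distinct at PostgreSQL's microsecond precision collapsing onto the same ASE tick:

```python
from datetime import datetime, timedelta

def to_ase_datetime(dt: datetime) -> datetime:
    """Quantize the sub-second part to 1/300-second ticks, a simplified
    model of how Sybase ASE stores DATETIME values."""
    ticks = round(dt.microsecond * 300 / 1_000_000)   # nearest 1/300 s tick
    micros = round(ticks * 1_000_000 / 300)
    if micros >= 1_000_000:                           # tick carried into the next second
        return dt.replace(microsecond=0) + timedelta(seconds=1)
    return dt.replace(microsecond=micros)

a = datetime(2024, 12, 23, 18, 48, 13, 125_000)
b = datetime(2024, 12, 23, 18, 48, 13, 128_000)
assert a != b                                         # distinct at microsecond precision
assert to_ase_datetime(a) == to_ase_datetime(b)       # collapse onto the 1/300 s grid
```

If such a column participates in a primary key, the two source rows would replicate cleanly into PostgreSQL but collide on the reverse path back to Sybase ASE.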

Field Type Mapping Table

Here is the way in which TapData mapped the data types using its default mapping (see table below). While this is not an exhaustive map of all possible field types in Sybase ASE, it covers all the field types in use for our particular project. Note that TapData also allows custom field type mapping to be configured if needed.


| Type Category | Sybase ASE Type | PostgreSQL Type |
|---|---|---|
| Character | CHAR | CHAR |
|  | VARCHAR | VARCHAR |
|  | TEXT | TEXT |
|  | UNICHAR | CHAR |
|  | UNITEXT | TEXT |
|  | UNIVARCHAR | VARCHAR |
| Integer | BIGINT | BIGINT |
|  | INT | INT |
|  | SMALLINT | SMALLINT |
|  | TINYINT | SMALLINT |
|  | UNSIGNED BIGINT | NUMERIC(20) |
|  | UNSIGNED INT | BIGINT |
|  | UNSIGNED SMALLINT | INT |
| Numeric, decimal, float | NUMERIC | NUMERIC |
|  | DECIMAL | DECIMAL |
|  | FLOAT | DOUBLE PRECISION |
|  | DOUBLE | DOUBLE PRECISION |
|  | REAL | REAL |
| Date/Time | DATE | DATE |
|  | DATETIME | TIMESTAMP |
|  | SMALLDATETIME | TIMESTAMP(0) |
|  | TIME | TIME |
| Money | MONEY | MONEY |
|  | SMALLMONEY | MONEY |
| Binary | BINARY | BYTEA |
|  | VARBINARY | BYTEA |
|  | IMAGE | BYTEA |
|  | TIMESTAMP | BYTEA |
| Bit | BIT | BOOLEAN |


Some of these mappings also require a mapping of data (e.g., BIT: 0/1 => BOOLEAN: true/false). TapData performs this mapping automatically for this case.

Loss of Range cases

  1. UNSIGNED BIGINT => NUMERIC(20,0)
UNSIGNED BIGINT (Sybase ASE 15.7+):
  • Range: 0 to 18,446,744,073,709,551,615 (2⁶⁴ – 1).
NUMERIC(20,0) (PostgreSQL):
  • Range: -9,999,999,999,999,999,999 to 9,999,999,999,999,999,999.
PostgreSQL has the smaller range in this case, but it was confirmed that no values in the Sybase UNSIGNED BIGINT fields exceeded 9,999,999,999,999,999,999.
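The confirmation described above amounts to a range check over the affected columns. A minimal Python sketch of that pre-migration check (the helper name is illustrative, not a TapData API):

```python
# Ranges involved in the UNSIGNED BIGINT -> NUMERIC(20,0) mapping.
ASE_UNSIGNED_BIGINT_MAX = 2**64 - 1        # 18,446,744,073,709,551,615
PG_NUMERIC20_MAX = 10**19 - 1              # 9,999,999,999,999,999,999

# The PostgreSQL target range is smaller on the positive side:
assert ASE_UNSIGNED_BIGINT_MAX > PG_NUMERIC20_MAX

def fits_numeric20(value: int) -> bool:
    """Pre-migration sanity check for a single column value."""
    return -PG_NUMERIC20_MAX <= value <= PG_NUMERIC20_MAX

assert fits_numeric20(9_999_999_999_999_999_999)
assert not fits_numeric20(2**64 - 1)       # would overflow NUMERIC(20,0)
```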

Space padding

Some field types, such as CHAR() and UNICHAR(), automatically pad values with spaces up to the declared string length; this behavior is consistent between the two databases. However, if you choose to map such fields to VARCHAR() fields, these padded spaces can show up as differences between the databases that you may need to account for when performing data validation. We avoided this issue with the mapping we used.
TapData automatically trims trailing spaces from CHAR() and UNICHAR() data read from the source before writing it to the target. PostgreSQL then pads the value with spaces again when it is written to a CHAR column.
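The trim-and-re-pad behavior can be sketched in a few lines of Python (an illustration of the effect described above, not TapData code):

```python
def trim_char(value: str) -> str:
    """Right-trim trailing spaces, mirroring the treatment of
    CHAR()/UNICHAR() data read from the source."""
    return value.rstrip(" ")

stored = "abc".ljust(10)            # a CHAR(10) column pads to the declared length
assert stored == "abc       "
assert trim_char(stored) == "abc"   # value as transmitted to the target

# A CHAR(10) target re-pads on write, restoring the original value;
# a VARCHAR target would keep "abc" as-is, producing the validation
# difference described above.
assert trim_char(stored).ljust(10) == stored
```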

Timestamp

Sybase ASE has a non-standard TIMESTAMP type that does not store date/time values. Instead, it is an automatically updated binary value used for row versioning and optimistic concurrency control. When migrating schemas with TapData, the TIMESTAMP field type is mapped to BYTEA, allowing the exact data from the Sybase ASE source to be written to the PostgreSQL target without any loss. However, if new records are written directly to the same table in PostgreSQL outside of TapData's replication, no version value is generated for the new row and the column will be null. Application developers must therefore implement concurrency control by a different method after migrating to PostgreSQL. If the project only needs an actual timestamp (not concurrency control), consider using PostgreSQL's TIMESTAMP DEFAULT now() syntax for this case.

Primary Keys & Identity Columns

Primary Keys

Not all of the source database tables had primary keys defined in the schema, but CDC replication requires primary keys so that it can distinguish between an insert and an update. In TapData, these fields are identified as "update fields". When in doubt, the primary key can be set to all fields in a table if no primary key was defined in the source.
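Why CDC needs a key can be shown with a tiny Python sketch: the replicator looks the incoming row up by its "update fields" to decide between insert and update (illustrative logic only, not TapData's implementation):

```python
def apply_change(state: dict, key_fields: tuple, row: dict) -> str:
    """Decide insert vs. update by looking up the row's "update fields"
    (its primary key). `state` stands in for the target table,
    keyed by primary key."""
    key = tuple(row[f] for f in key_fields)
    op = "update" if key in state else "insert"
    state[key] = row
    return op

employees = {}
assert apply_change(employees, ("id",), {"id": 1, "name": "Ann"}) == "insert"
assert apply_change(employees, ("id",), {"id": 1, "name": "Bea"}) == "update"
```

Without a key (or with all fields treated as the key), a changed row cannot be matched to its earlier version, which is why keyless source tables need special handling.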

Identity Columns

Identity (or sequence) columns are fields where unique numeric values are auto-generated for a column, which are typically used as a primary key. This is the same for both databases. However, there are differences.
PostgreSQL does not allow the NUMERIC field type for an identity column, but Sybase ASE does, so TapData automatically handles this case differently from the type mapping table above. Instead of mapping Sybase's NUMERIC(20,0) to PostgreSQL's NUMERIC(20), TapData maps it to PostgreSQL's BIGINT if the column is marked as an IDENTITY column. If you were simply to map a Sybase ASE NUMERIC IDENTITY column to a PostgreSQL NUMERIC IDENTITY column, PostgreSQL would throw an error.

Identity Columns and CDC

Because we are using CDC to capture changes in Sybase ASE and then write the same records to PostGreSQL, identity columns carry some risks if not handled properly. Imagine if the target database was replacing the generated IDs from Sybase ASE with its own generated IDs and they didn’t match for some reason. This would cause all kinds of problems, particularly as these IDs are often used as foreign keys in other tables and in those other tables, it would be the source database IDs that are used rather than the newly generated ones on the target side. Some examples that could lead to these mismatched IDs include if updates were written out of sequence, an update was missed, or the internal sequence number generator was adjusted in one database without adjusting in the other, etc.
To avoid these risks, we implemented all identity columns such that manual insertion of values is allowed. This ensures that the values generated in Sybase ASE are written exactly as-is to the PostgreSQL database. Because we may also need to reverse the replication direction for a fail-back scenario, the same flexibility must exist in both databases. In PostgreSQL, the column can be created as GENERATED BY DEFAULT AS IDENTITY PRIMARY KEY to enable this option.
Unfortunately, the Sybase ASE implementation for IDENTITY columns is problematic. To override the auto-generated value, you need to have SET IDENTITY_INSERT ON and you can only do so for one table at a time. If it is turned on for a second table, then Sybase ASE will automatically switch it off for the first. Thankfully, TapData handles this automatically, executing SET IDENTITY_INSERT ON for each table just before writing to it, if that table has an IDENTITY column.
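The one-table-at-a-time behavior is easy to trip over, so here is a toy Python model of it (our simplification; table names are illustrative and this is not TapData code):

```python
class AseIdentityInsertState:
    """Toy model of Sybase ASE's rule that IDENTITY_INSERT may be ON
    for only one table at a time within a session."""
    def __init__(self) -> None:
        self.enabled_for = None

    def set_identity_insert_on(self, table: str) -> None:
        # Turning it on for a second table implicitly turns it off for
        # the first, so a writer must re-enable it per table before writing.
        self.enabled_for = table

state = AseIdentityInsertState()
state.set_identity_insert_on("Employees")
state.set_identity_insert_on("Orders")
assert state.enabled_for == "Orders"      # "Employees" silently lost the setting
```

This is why TapData re-issues the SET statement per table just before each write, as described above.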
Note that Sybase ASE’s sequences work differently and offer more flexibility than identity, but these did not feature in our project so are not explained here.

Identity numbers

Note that when an explicit value is provided for an identity/auto-increment field, the sequence number does not advance automatically. If the sequence number needs to be updated, then that will require additional steps, such as these queries for PostGreSQL:
SELECT setval('EmployeeSeq', 50001);
ALTER TABLE Employee ALTER COLUMN EmployeeSeq SET GENERATED BY DEFAULT RESTART WITH 50001;
Additionally, in the event of a fail-over scenario where the primary is offline and the secondary becomes the new primary, there is a chance that some transactions in the old primary were not replicated to the new primary before the old one went offline. So, to allow for easier options for recovering such data from the old primary when it is recovered by avoiding primary key ID conflicts, it is best for the secondary (new primary) to start with sequence numbers that are not simply the next numbers after the latest ID received.
To handle these issues, TapData offers a clever feature: you can set a page size for synchronizing identity sequence values. For example, with a page size of 100, every time the CDC process reads another page of logs from the source database, it inserts an additional SQL statement that advances the sequence number in the target PostgreSQL database to the source's current sequence number + 100. This way, if the applications ever switch from writing to Sybase ASE to writing to PostgreSQL, the sequence numbers in PostgreSQL are already set to values that have not been used, and no additional work is needed to adjust them. When switching, there may be a gap in the numbers, but this is not a major issue; in fact, it helps in a fail-over scenario where some rows inserted in the source had not yet been transferred to the target before the source went down. Upon recovering the source, those missing rows can be synchronized to the target without risk of sequence number collisions.
With this approach, we don't have to keep manually advancing the sequence numbers on the target server: TapData updates them on an ongoing basis, no adjustments are needed when failing over, and there is no additional risk of ID conflicts.
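The per-page advancement can be sketched as follows; the sequence name 'EmployeeSeq' and the helper are hypothetical illustrations of the scheme, not TapData internals:

```python
def sequence_advance_sql(source_seq_value: int, page_size: int = 100) -> str:
    """Build the kind of statement the page-size feature issues per page
    of CDC logs, keeping the target sequence ahead of the source."""
    return f"SELECT setval('EmployeeSeq', {source_seq_value + page_size})"

# After reading a log page in which the source sequence reached 50000,
# the target is advanced to 50100, so a fail-over can begin inserting
# immediately without primary key collisions (at the cost of a gap).
assert sequence_advance_sql(50000) == "SELECT setval('EmployeeSeq', 50100)"
```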

Foreign Keys & Constraints

Foreign keys and similar cross-table constraints are a potential minefield when working with CDC. SQL queries on the source database that update, insert, or delete more than one row are translated into separate per-row statements when the database records the change in the log files used for CDC. To increase throughput, TapData has an option to enable parallel writes, splitting the updates from the log files across separate threads that execute in parallel. To maintain a consistent order of updates within a table, this allocation ensures that all updates to the same table are handled by the same thread. However, it does not guarantee that updates to separate tables are performed in exactly the same sequence as on the source database.
Consequently, to avoid write failures from foreign key checks that can fail under parallel writes, TapData takes the following approach when writing to PostgreSQL: it turns off foreign key and other constraint checks when inserting replicated data, but only for that session, using the setting session_replication_role = 'replica'. Other connections are not affected. This allows replication to proceed with the parallel writes feature enabled.
For Sybase ASE, there is no equivalent session-level feature for disabling constraints, so when replicating data to Sybase ASE with foreign keys or other constraints in place, it is advisable not to enable the parallel writes feature.
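The per-table thread allocation described above can be sketched as a stable hash over the table name; this is an illustration of the scheme, not TapData's actual implementation:

```python
import zlib

def thread_for_table(table_name: str, n_threads: int = 4) -> int:
    """Route every change event for a table to a fixed worker thread so
    that per-table update order is preserved."""
    return zlib.crc32(table_name.encode("utf-8")) % n_threads

# The same table always lands on the same thread...
assert thread_for_table("patients") == thread_for_table("patients")
# ...but different tables may run on different threads, so cross-table
# ordering (and thus foreign-key insert order) is not guaranteed.
assert 0 <= thread_for_table("visits") < 4
```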

Default Values

In theory, default values should be very straightforward. Both Sybase ASE and PostgreSQL apply defaults only on INSERT, so there are no UPDATE scenarios to consider. Of course, you can explicitly update a value to DEFAULT to revert it to its default value, but that does not cause an issue.
Say you have a default value of 1 for the field "quantity" in your source database, and the same default is set up in your target database. Here are some scenarios:
| Scenario | Source | Target |
|---|---|---|
| Explicitly set quantity=3 in source | quantity=3 | quantity=3 |
| Insert new row with quantity blank in source | quantity=1 (from default) | quantity=1 (from source) |
| Insert new row with quantity blank in target | (no change) | quantity=1 (from default) |
| Explicitly set quantity=3 in target | (no change) | quantity=3 |
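The insert-only behavior of defaults is easy to verify in a few lines; here SQLite serves as a stand-in (like Sybase ASE and PostgreSQL, it applies a column default only when the value is omitted on INSERT):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE items (id INTEGER PRIMARY KEY, quantity INTEGER DEFAULT 1)")
con.execute("INSERT INTO items (id) VALUES (1)")                # quantity omitted
con.execute("INSERT INTO items (id, quantity) VALUES (2, 3)")   # quantity explicit

rows = dict(con.execute("SELECT id, quantity FROM items"))
assert rows == {1: 1, 2: 3}   # default applied only where the value was omitted
con.close()
```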
That said, there are some differences between Sybase ASE and PostgreSQL for default values. For example, the syntax for defining a timestamp default is slightly different.
Sybase ASE:
CREATE TABLE Employees (
    -- other fields...
    CreatedAt DATETIME DEFAULT getdate()
);
PostgreSQL:
CREATE TABLE Employees (
    -- other fields...
    CreatedAt TIMESTAMP DEFAULT NOW()
);
Further, Sybase ASE doesn't support defaults in a number of cases where PostgreSQL does. In particular, in Sybase ASE you can't set a default value for TEXT or IMAGE fields, complex expressions are not supported, and there is only limited support for using functions as the default value (getdate(), newid(), host_name(), and suser_name(), only in v16+). For our project, these didn't present any challenges, since the migration is from Sybase ASE (with more limited support for default values) to PostgreSQL (which has broader support).

Indexes

In general, the migration of indexes was fairly straightforward, with a few differences in what the two databases support. Sybase ASE supports CLUSTERED INDEX but PostgreSQL does not; PostgreSQL supports expression/function indexes and partial indexes, but Sybase ASE does not. For our project, we only needed to handle the CLUSTERED INDEX case.
PostgreSQL allows periodic clustering by executing the CLUSTER command, but it does not maintain the clustering automatically. Since this affects only performance characteristics, not functional behavior, it is simple enough to set up a regular job that performs CLUSTER operations during times of low load. Because executing CLUSTER on an empty table has no effect, and TapData cannot know a suitable time to run it, this is not handled by TapData; it is left to the DBAs to set up their own schedule for regular CLUSTER operations. When TapData creates the schema and indexes in the PostgreSQL target, it simply ignores the CLUSTER part of the source schema definition.
The only other comment I will make on indexing is that the primary keys used for CDC update lookups should all have indexes, but TapData creates these automatically where missing, so no additional action was required.

Logical and Materialized Views

Neither logical nor materialized views cause update logs to be written to the database, so they have no effect on the CDC capture mechanism. In short, setting up these views in source and target are independent activities that can be maintained without involving or impacting the CDC flows. The migration of these views was therefore performed manually rather than with a tool; TapData ignores views when replicating schemas and data.

Stored Procedures

Stored procedures are not called by queries that are written to the database update log. Instead, SQL queries that call stored procedures result in one or more database updates that are, in turn, recorded in the database update log as the simple inserts, updates, and deletes that result. So, most stored procedures can be implemented on both the source and target side of the CDC replication and will only be executed once on the source side when called, with the resulting impact to the data still replicated to both databases. The implementation on the target side is ignored until such a time as the users switch over to use the target database directly instead. Some stored procedures create DDL (structural) changes to the schema rather than changes to the data. If these DDL changes are also CDC-replicated, then they are treated the same way. If DDL is not replicated, then careful consideration is needed for how to handle these scripts in order to ensure that the target stays in sync with the source.
There tend to be more differences in the syntax, capabilities, and functions available to stored procedures, so the migration of each procedure is a task in itself that goes beyond the scope of this document and was handled manually.

Triggers

Triggers are like stored procedures, except that because they can be triggered by updates to the data, they don’t mix well with CDC replication and require careful adjustments. For example, if I have a trigger for any updates on table A which should cause an insert to table B, and that trigger is defined in both source and target, I could end up with two rows inserted into table B of my target. One is the replicated row that was inserted into table B of the source, and the other is the target database’s trigger-generated row. For this reason, each trigger needs to be looked at carefully as to its function and a decision made as to how to handle that case. Here are some options to choose from:
1) If the trigger’s only function is to create an audit record in another table (“audit table”), and there are no manual records written to this table, only the ones written by the triggers, then the copy of the triggers in the target database can be turned on, but the CDC replication from source to target should exclude the audit table. This way, the audit records are local audit records.
2) If the trigger writes to or updates a table that is also updated by other means than the triggers, then it is best to let the CDC process replicate data for that table and the associated triggers should be disabled in the target while the CDC replication is active and then switched back on when the CDC replication is stopped and the target becomes the active primary database. TapData takes this approach, but disables the triggers at a session-level automatically, so that they are still on for other users of the database.
3) If the trigger writes to or updates multiple tables, and they don't all cleanly fall into option 1 or 2 above, then option 2 is likely needed, unless the trigger function is split to separate the impacts on the multiple tables. In all cases, the migrated script implementation needs to be properly tested to ensure the desired result.

Conclusion

Through the deployment of TapData’s high-availability replication solution and customized migration strategies, the organization successfully transitioned from Sybase ASE to PostgreSQL with zero business disruption, enhanced scalability, and long-term operational resilience.
This project not only safeguarded critical data services during the migration but also established a repeatable framework for future system upgrades. The proven flexibility and reliability of TapData in heterogeneous database environments position it as a trusted partner for complex real-time data integration and modernization initiatives.


About Tapdata:

Tapdata is a low-latency data movement platform that offers real-time data integration and services. It provides 100+ built-in connectors, supporting both cloud and on-premises deployment, making it easy for businesses to connect with various sources. The platform also offers flexible billing options, giving users the freedom to choose the best plan for their needs.

Email: team@tapdata.io
Address: #4-144, 18 BOON LAY WAY, SINGAPORE 609966
Copyright © 2023 Tapdata. All Rights Reserved