Invalid XML character during changelog snapshot

Description

The rationale for the changes in this commit is hard to track down, as is what the implementation is specifically trying to remove: https://github.com/liquibase/liquibase/commit/61254b816a8516672194c173684b9c81eb163d68

We encounter this problem when attempting to generate a db snapshot from postgres. The error message doesn't include the actual `text` or the character that was determined to be invalid.

Instead we get an error like:
liquibase.exception.UnexpectedLiquibaseException: Invalid string encoding on column.value. To resolve, remove the invalid character on the database and try again in changeSet

with no indication of which table, column, or row it occurred in. This makes it hard to diagnose.
It is also difficult to determine which code points are being rejected based on this condition:

if (!(Character.isISOControl(current) && current != '\n' && current != '\r' && current != '\t')
        && ((codePoint == 0x9) || (codePoint == 0xA) || (codePoint == 0xD)
        || ((codePoint >= 0x20) && (codePoint <= 0xD7FF)) || ((codePoint >= 0xE000) && (codePoint <= 0xFFFD))
        || ((codePoint >= 0x10000) && (codePoint <= 0x10FFFF))))
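To make the condition easier to reason about, here is a minimal standalone sketch that mirrors it and reports which code points in a string would fail it, along with their positions. It is not Liquibase code: the class and method names (`InvalidXmlCharFinder`, `isAccepted`, `findRejected`) are made up for illustration, and it assumes that `current` and `codePoint` refer to the same value and that a code point failing the condition is the one being rejected.

```java
import java.util.ArrayList;
import java.util.List;

public class InvalidXmlCharFinder {

    // Mirrors the condition quoted above. Assumption: `current` and
    // `codePoint` in the original are the same code point.
    static boolean isAccepted(int codePoint) {
        // ISO control characters other than newline, carriage return and tab.
        boolean disallowedControl = Character.isISOControl(codePoint)
                && codePoint != '\n' && codePoint != '\r' && codePoint != '\t';
        // The ranges listed in the original condition.
        boolean inAllowedRange = codePoint == 0x9 || codePoint == 0xA || codePoint == 0xD
                || (codePoint >= 0x20 && codePoint <= 0xD7FF)
                || (codePoint >= 0xE000 && codePoint <= 0xFFFD)
                || (codePoint >= 0x10000 && codePoint <= 0x10FFFF);
        return !disallowedControl && inAllowedRange;
    }

    // Lists every code point that fails the check, with its char index,
    // which is the detail the current error message leaves out.
    static List<String> findRejected(String text) {
        List<String> rejected = new ArrayList<>();
        int i = 0;
        while (i < text.length()) {
            int cp = text.codePointAt(i);
            if (!isAccepted(cp)) {
                rejected.add(String.format("U+%04X at index %d", cp, i));
            }
            i += Character.charCount(cp); // step over surrogate pairs as one unit
        }
        return rejected;
    }

    public static void main(String[] args) {
        // prints [U+0000 at index 2, U+0085 at index 6]
        System.out.println(findRejected("ok\u0000bad\u0085"));
    }
}
```

Under these assumptions, the rejected set works out to the control characters U+0000 to U+001F (except tab, LF, and CR), U+007F to U+009F, unpaired surrogates, and U+FFFE and U+FFFF. Running something like `findRejected` over suspect column values pulled from the database would at least show which characters trigger the exception.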

I don't know if this is a "bug", but it is at least hard to deal with when encountered. It isn't clear what the workaround is when you get this error message. It seems the db data itself just contains Unicode code points that are being rejected.

Environment

postgres db snapshot

Status

Assignee

Unassigned

Reporter

Mike Rodriguez

Labels

None

Components

Affects versions

3.6.0
3.6.1
3.6.2
3.6.3

Priority

Blocker