One of our partners came across an unusual issue with an AS/400 (a.k.a. iSeries, a.k.a. IBM System i) returning data in EBCDIC encoding to Pentaho Data Integration.
This can happen, for instance, when one of the tables or fields has a CCSID of 65535, which means no encoding: the data is treated as binary and is not converted.
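To make the symptom concrete, here is a minimal sketch of what untranslated EBCDIC bytes look like from the Java side. It assumes the field holds text in EBCDIC code page 037 (the actual code page on your system may differ) and that the JVM ships the IBM037 charset, which standard JDK builds normally do. The class and method names are made up for this illustration.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

final class EbcdicDemo {
    // Assumption: the data is EBCDIC code page 037; adjust for your system.
    static final Charset EBCDIC = Charset.forName("IBM037");

    // Decode the raw bytes the way they were meant to be read.
    static String decodeEbcdic(byte[] raw) {
        return new String(raw, EBCDIC);
    }

    // Decode the same bytes as if they were Latin-1, which is roughly
    // what you see when nothing translates the data for you.
    static String decodeAsLatin1(byte[] raw) {
        return new String(raw, StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        // "HELLO" encoded in EBCDIC code page 037.
        byte[] raw = {(byte) 0xC8, (byte) 0xC5, (byte) 0xD3, (byte) 0xD3, (byte) 0xD6};
        System.out.println(decodeEbcdic(raw));   // HELLO
        System.out.println(decodeAsLatin1(raw)); // accented letters, not HELLO
    }
}
```

The bytes are identical in both calls; only the charset used to interpret them changes.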
PDI in this instance was running on the Pentaho Platform.
There are two relevant entries in the JTOpen FAQ:
- What character conversion issues must my program deal with?
- Why is the Toolbox JDBC returning EBCDIC characters to my Java program?
The second entry led to the solution. All that was needed was adding the following option to the URL in the JNDI data source:
;translate binary=true
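As a rough sketch, this is how such a URL is put together: the Toolbox driver takes its properties as a semicolon-separated list after the host and default library. The host name MYHOST, the library MYLIB, and the helper class below are placeholders for this example, not values from our partner's setup.

```java
import java.util.LinkedHashMap;
import java.util.Map;

final class As400Url {
    // Builds jdbc:as400://host/library;key=value;key=value ...
    static String build(String host, String library, Map<String, String> props) {
        StringBuilder url = new StringBuilder("jdbc:as400://")
                .append(host).append('/').append(library);
        for (Map.Entry<String, String> e : props.entrySet()) {
            // Each property is appended as ";name=value"; the name may contain a space.
            url.append(';').append(e.getKey()).append('=').append(e.getValue());
        }
        return url.toString();
    }

    public static void main(String[] args) {
        Map<String, String> props = new LinkedHashMap<>();
        props.put("translate binary", "true");
        // Prints: jdbc:as400://MYHOST/MYLIB;translate binary=true
        System.out.println(build("MYHOST", "MYLIB", props));
    }
}
```

Note that the property name really does contain a space; it is not a typo.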
If you are using a JDBC connection, you can add the option in the Options pane:
Option: translate binary
Value : true
This solved the issue.
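If you want to reproduce the setting outside of PDI, the same option can, as far as we know, also be handed to the driver through a java.util.Properties object instead of the URL. The sketch below is only an illustration: the class name, host, and credentials are hypothetical, and the connect method needs jt400.jar on the classpath and a reachable system, so it is not called here.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.SQLException;
import java.util.Properties;

final class As400Options {
    // Collects the connection properties, including the translation option.
    static Properties driverOptions(String user, String password) {
        Properties p = new Properties();
        p.setProperty("user", user);
        p.setProperty("password", password);
        // Same option as in the Options pane: name "translate binary", value "true".
        p.setProperty("translate binary", "true");
        return p;
    }

    // Assumption: the JTOpen driver is on the classpath and the host is reachable.
    static Connection connect(String host, String user, String password) throws SQLException {
        return DriverManager.getConnection("jdbc:as400://" + host, driverOptions(user, password));
    }
}
```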
To make this issue a bit more vexing, when you develop the transformation in Spoon, doing a Preview will display correctly converted characters, even if you haven't set the translate binary option. So don't get caught out: it can work on your laptop and still be wrong on the server!