Not valid characters - how to fix?
Hello
in some TXT files used to fed BoardI have some characters that are not recognized.
I know that I can do something in the source data but for now I jsut want to fix the issue replacing such characters.
Setting a formula in the ETL is seems working but when I run the data reader it does not work
see the picture
can someone help?
Answers
-
you can set the encoding of your csv files directly in the datareader settings:
Try to find the encoding of your files and the characters should then display correctly (however, the prerequisite for this is that the characters are displayed correctly in your source file).
Cheers
Johannes
1 -
Hi Daniele,
I don't have time to test myself, right at the moment - but I'm more inclined to think that you'll need to use the CHAR() function, rather than (as I'm assuming you've done) copying and pasting the funny character and putting it inside double-quotes. I don't know how many different invalid characters you potentially have ..... it could become quite a large number of nested SUBSTITUTE() formulas to remove all of the possible characters that need to be stripped out
0 -
Thank you Johannes Blischke and Craig Parris
I know that I have to work on source data readers playing with encoding setup but unfortunately I have multiple sources from txt files and it requires too much work at the moment. Moreover in the future such files have to replaced with some obdc connection to a sql db so for sure I solve these issues.
But now I was searching a workaround for characters that are already "not identified" in board and appears as <?>.
I print the tree or entity affected and re-reading it I was trying to repair the character, but without success (as explained above)
but, as far I can see, seems not possible
0 -
Ciao Daniele Di Lorenzo,
As already suggested you can control the encoding of the file in the datareader protocol itself. The ETL will inherit the encoding of the file when executing so it wont be able to execute the formula properly in case of a mismatch, like in your specific case.
When having to deal with multiple files without knowing the source encoding, the best thing you can do is batch-convert them all into UTF8 and set all your readers to UTF 8, if you are on premise.
On the cloud you can still do this via an ETL tool or copying, converting and copying back via AZCopy tool
You can run this simple powershell command on a file
Get-Content .\test.txt | Set-Content -Encoding utf8 test-utf8.txt
This will convert any text file to a utf8 encoded file, whatever source encoding they have
Hope this helps
Michele2 -
0