 |
SoftTree Technologies
Technical Support Forums
gemisigo
Joined: 11 Mar 2010 Posts: 2165
|
|
[11.1.125 Pro] - File encoding in SE |
|
SE correctly recognizes a file's encoding as UTF only when the file has been saved as UTF8 with BOM. If I save it as UTF8 without BOM in another editor, SE always takes it as ASCII. The problem is that I've found no way to convince it to assume the file is UTF8. Setting the encoding in File > File Encoding converts the file from ASCII to UTF8 with BOM, and that ruins a perfectly encoded UTF8 file.
Is there a fix or a workaround for this?
|
|
Fri Feb 14, 2020 6:55 am |
|
 |
SysOp
Site Admin
Joined: 26 Nov 2006 Posts: 7948
|
|
|
|
The workaround here is to use the File -> Convert Text -> UTF8 to Text menu after the file is opened.
|
|
Mon Feb 17, 2020 9:43 am |
|
 |
SysOp
Site Admin
Joined: 26 Nov 2006 Posts: 7948
|
|
|
|
Just want to clarify one point: if there is no BOM, the file is loaded as ASCII, which is the default file format. The editor doesn't analyze file content for UTF-8 encoding; it only checks for the presence of U+FEFF.
A BOM in UTF-8 files is optional unless the files are intended to be shared with other programs that don't know up front which file encoding is used.
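To make the difference between a BOM-only check and a content check concrete, here is a minimal Python sketch. It is not how SE works internally; `sniff_encoding` and the `cp1252` fallback are assumptions made for illustration. U+FEFF serialized as UTF-8 is the byte sequence EF BB BF, which is what `codecs.BOM_UTF8` holds.

```python
import codecs


def sniff_encoding(data: bytes, fallback: str = "cp1252") -> str:
    """Guess a text encoding: BOM first, then a strict UTF-8 trial decode.

    `fallback` is a hypothetical default for files that are not valid UTF-8.
    """
    # A UTF-8 BOM is the three bytes EF BB BF at the very start of the file.
    if data.startswith(codecs.BOM_UTF8):
        return "utf-8-sig"
    try:
        # Strict decoding raises on any byte sequence that is not valid UTF-8.
        data.decode("utf-8")
        return "utf-8"
    except UnicodeDecodeError:
        return fallback
```

Pure ASCII content also comes back as `utf-8` here, which is harmless because ASCII is a subset of UTF-8.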
|
|
Mon Feb 17, 2020 10:47 am |
|
 |
gemisigo
Joined: 11 Mar 2010 Posts: 2165
|
|
|
|
 |
 |
Just want to clarify one point: if there is no BOM, the file is loaded as ASCII, which is the default file format. |
That's exactly my point. SE thinks the file is ASCII when it is not. It is encoded as UTF8 but has no BOM, because the other apps I open that file with do not require one, and when concatenating multiple files into a single file, a BOM gets in the way. So it is just plain UTF8 without BOM.
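The concatenation problem can be sketched in a few lines of Python. This is only an illustration; `concat_without_boms` is a hypothetical helper, not part of any tool mentioned in this thread. When BOM-prefixed files are joined byte for byte, every BOM after the first ends up in the middle of the text as a stray U+FEFF character.

```python
import codecs
from typing import Iterable


def concat_without_boms(chunks: Iterable[bytes]) -> bytes:
    """Join UTF-8 byte chunks, dropping a leading BOM from each one."""
    out = bytearray()
    for chunk in chunks:
        # Strip the three BOM bytes if this chunk starts with them.
        if chunk.startswith(codecs.BOM_UTF8):
            chunk = chunk[len(codecs.BOM_UTF8):]
        out += chunk
    return bytes(out)


# Two small "files", each saved as UTF-8 with BOM.
first = codecs.BOM_UTF8 + "első\n".encode("utf-8")
second = codecs.BOM_UTF8 + "második\n".encode("utf-8")

# Naive concatenation keeps the second BOM inside the text.
naive_text = (first + second).decode("utf-8-sig")
clean_text = concat_without_boms([first, second]).decode("utf-8")
```

Here `naive_text` still contains a U+FEFF before the second line, while `clean_text` does not.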
The proposed workaround does not work (around), it actually wrecks the contents. There are comments in the file. Those comments are in Hungarian and contain many accented characters. When I open the file without BOM, it is treated as ASCII, and the accented characters are garbled.
These are the original comments:
 |
 |
COMMENT ON COLUMN "TADA_DISTANCE_MATRIX_ELEMENT"."START_TARIFF_ZONE_TID" IS 'Kezdő zóna azonosító';
COMMENT ON COLUMN "TADA_DISTANCE_MATRIX_ELEMENT"."END_TARIFF_ZONE_TID" IS 'Záró zóna azonosító';
|
And this is what SE thinks they are (and how they are going to be added to the database object in case the code is executed):
 |
 |
COMMENT ON COLUMN "TADA_DISTANCE_MATRIX_ELEMENT"."START_TARIFF_ZONE_TID" IS 'KezdÅ‘ zóna azonosÃtÅ‘';
COMMENT ON COLUMN "TADA_DISTANCE_MATRIX_ELEMENT"."END_TARIFF_ZONE_TID" IS 'Záró zóna azonosÃtÅ‘';
|
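The garbling above can be reproduced with a short Python sketch. It assumes that the editor's "ASCII" mode behaves like the Windows-1252 code page (`cp1252`); that is an assumption, not something confirmed in this thread. Each accented letter occupies two bytes in UTF-8, and a single-byte code page shows those two bytes as two unrelated characters.

```python
original = "Kezdő zóna azonosító"

# The bytes on disk: UTF-8 without BOM.
raw = original.encode("utf-8")

# Assumption: the file is misread with a single-byte code page (cp1252 here).
misread = raw.decode("cp1252")

# Misreading alone does not change the bytes, so it is still reversible.
recovered = misread.encode("cp1252").decode("utf-8")
```

Under that assumption, "ő" (bytes C5 91) shows up as "Å" followed by a left single quotation mark, which matches the "KezdÅ‘" seen in the garbled comment.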
Now, if I apply the suggested workaround, SE converts the file from ASCII (which it is not) to UTF8 (and saves it with BOM), and that completely maims the comments. When that happens, even the apps that correctly recognized the file as UTF8 see the comments as a heap of disfigured characters and are no longer able to display them properly.
The code survives, but the comments are slaughtered and that's sort of undesired.
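What happens in that conversion is double encoding: bytes that are already UTF-8 get decoded with the wrong code page and then encoded to UTF-8 again. The Python sketch below simulates this and shows one way to undo a single round of it. Both `double_encode` and `repair` are hypothetical helpers, and `cp1252` is again an assumption about the code page involved.

```python
import codecs


def double_encode(raw_utf8: bytes, wrong: str = "cp1252") -> bytes:
    """Simulate converting an already-UTF-8 file 'to UTF-8 with BOM'."""
    # Misread the bytes with the wrong code page, then encode the result again.
    return codecs.BOM_UTF8 + raw_utf8.decode(wrong).encode("utf-8")


def repair(damaged: bytes, wrong: str = "cp1252") -> bytes:
    """Undo one round of double encoding; returns UTF-8 bytes without BOM."""
    # "utf-8-sig" drops the BOM and yields the garbled text.
    text = damaged.decode("utf-8-sig")
    # Encoding with the wrong code page maps each garbled character
    # back to the original byte.
    return text.encode(wrong)


raw = "Záró zóna azonosító".encode("utf-8")
damaged = double_encode(raw)
restored = repair(damaged)
```

This round trip only works when every byte of the original file is defined in the assumed code page. Windows-1252 leaves a few byte values undefined, so text containing, for example, "Á" (UTF-8 bytes C3 81) would make Python's `cp1252` codec raise an error instead.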
I don't want SE to convert the file to anything. I don't even want it to analyze the contents (though that would be nice). I need a way to dissuade it from stubbornly treating that file as ASCII, and to convince it that it should interpret the contents as UTF8 instead.
|
|
Tue Feb 18, 2020 9:52 am |
|
 |
SysOp
Site Admin
Joined: 26 Nov 2006 Posts: 7948
|
|
|
|
Thank you. I will share your feedback with the team.
|
|
Tue Feb 18, 2020 10:06 am |
|
 |