[11.1.125 Pro] - File encoding in SE

SE correctly recognizes a file's encoding as UTF8 only when the file has been saved as UTF8 with BOM. If I save it as UTF8 without BOM in another editor, SE always treats it as ASCII. The problem is that I've found no way to convince it to assume the file is UTF8. Setting the encoding in File > File Encoding converts the file from ASCII to UTF8 with BOM, and that ruins a perfectly encoded UTF8 file.

Is there a fix or a workaround for this?

The workaround here is to use the File -> Convert Text -> UTF8 to Text menu command after the file is opened.

Just want to clarify one point, if there is no BOM, then a file is loaded as ASCII, which is the default file format. The editor doesn't analyze file content for UTF-8 encoding; it only checks for the presence of a BOM (U+FEFF) at the start of the file.

BOM use in UTF-8 files is optional, unless the files are intended to be shared with other programs, which don't know upfront which file encoding is used.
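To make the difference between the two checks concrete, here is a minimal Python sketch. It is not SE's code; the function names `has_utf8_bom` and `looks_like_utf8` are made up for illustration. The first function only looks for the encoded BOM (bytes EF BB BF), while the second tries a strict UTF-8 decode of the content, which is roughly what content-based detection would do.

```python
import codecs


def has_utf8_bom(data: bytes) -> bool:
    """BOM-only check: True if the data starts with EF BB BF (U+FEFF in UTF-8)."""
    return data.startswith(codecs.BOM_UTF8)


def looks_like_utf8(data: bytes) -> bool:
    """Content check: True if the bytes decode as strict UTF-8.

    Note that pure ASCII also passes, since ASCII is a subset of UTF-8.
    """
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


# UTF-8 text without a BOM, as produced by many editors.
sample = "Kezdő zóna azonosító".encode("utf-8")

print(has_utf8_bom(sample))     # BOM-only check misses it
print(looks_like_utf8(sample))  # content check recognizes it

# Text saved in a single-byte ANSI code page is usually not valid UTF-8.
ansi_sample = "Záró".encode("cp1252")
print(looks_like_utf8(ansi_sample))
```

A content check like this can give false positives on short single-byte files, which may be one reason an editor would prefer an explicit user override over guessing.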

	SysOp wrote:
	Just want to clarify one point, if there is no BOM, then a file is loaded as ASCII, which is the default file format.

That's exactly my point. SE thinks the file is ASCII when it is not. It is encoded as UTF8, but it doesn't have a BOM: the other apps I open that file with do not require one, and when multiple files are concatenated into a single file, the BOM gets in the way. So it is just plain UTF8 without BOM.
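To illustrate why the BOM gets in the way when concatenating, here is a small Python sketch. The helper name `concat_without_boms` is hypothetical and not part of any tool mentioned here. If each piece keeps its BOM, the second and later BOMs end up in the middle of the combined file as stray U+FEFF characters, so the sketch strips them before joining.

```python
import codecs


def concat_without_boms(chunks: list[bytes]) -> bytes:
    """Join UTF-8 byte chunks, dropping a leading BOM from each one."""
    cleaned = []
    for chunk in chunks:
        if chunk.startswith(codecs.BOM_UTF8):
            chunk = chunk[len(codecs.BOM_UTF8):]
        cleaned.append(chunk)
    return b"".join(cleaned)


first = codecs.BOM_UTF8 + "-- első\n".encode("utf-8")
second = codecs.BOM_UTF8 + "-- második\n".encode("utf-8")

naive = first + second                        # second BOM lands mid-file
clean = concat_without_boms([first, second])  # no BOM anywhere

print("\ufeff" in naive.decode("utf-8")[1:])  # stray U+FEFF inside the text
print("\ufeff" in clean.decode("utf-8"))
```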

The proposed workaround does not work (around), it actually wrecks the contents. There are comments in the file. Those comments are in Hungarian and contain accented characters. Many of them. When I open the file without BOM, it is considered as ASCII, and the accented characters are garbled.

These are the original comments:

	Quote:
	COMMENT ON COLUMN "TADA_DISTANCE_MATRIX_ELEMENT"."START_TARIFF_ZONE_TID" IS 'Kezdő zóna azonosító'; COMMENT ON COLUMN "TADA_DISTANCE_MATRIX_ELEMENT"."END_TARIFF_ZONE_TID" IS 'Záró zóna azonosító';

And this is what SE thinks they are (and how they are going to be added to the database object in case the code is executed):

	Quote:
	COMMENT ON COLUMN "TADA_DISTANCE_MATRIX_ELEMENT"."START_TARIFF_ZONE_TID" IS 'KezdÅ‘ zÃ³na azonosÃtÅ‘'; COMMENT ON COLUMN "TADA_DISTANCE_MATRIX_ELEMENT"."END_TARIFF_ZONE_TID" IS 'ZÃ¡rÃ³ zÃ³na azonosÃtÅ‘';
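The garbling above is what you get when UTF-8 bytes are decoded with a single-byte ANSI code page. The Python sketch below reproduces it, assuming the code page involved is Windows-1252 (an assumption on my part, inferred from the characters shown, not something confirmed for SE). It also shows that the bytes on disk are still intact at this stage: reversing the wrong decode recovers the original text.

```python
original = "Kezdő zóna azonosító"

# The file on disk: plain UTF-8, no BOM.
raw = original.encode("utf-8")

# What an editor shows if it decodes those bytes as Windows-1252 (assumed).
garbled = raw.decode("cp1252")
print(garbled)

# Nothing is lost yet: undoing the wrong decode gives the original back.
recovered = garbled.encode("cp1252").decode("utf-8")
print(recovered == original)
```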

Now, if I apply the suggested workaround, SE converts the file from ASCII (which it is not) to UTF8 (and saves it with BOM), and that completely maims the comments. When that happens, even the apps that previously recognized the file correctly as UTF8 see the comments as a heap of disfigured characters and are no longer able to display them properly.
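Here is a Python sketch of what I believe happens during that conversion, again assuming Windows-1252 as the code page SE falls back to (my assumption, not confirmed). The already mis-decoded text is encoded to UTF-8 a second time, so the garbled characters become the real content of the file, and a correct UTF-8 reader faithfully displays the garbage.

```python
import codecs

original = "Záró zóna azonosító"
raw = original.encode("utf-8")  # the good file: UTF-8 without BOM

# Step 1: the bytes are misread as Windows-1252 (assumed code page).
misread = raw.decode("cp1252")

# Step 2: "convert to UTF8" encodes the misread text and prepends a BOM.
converted = codecs.BOM_UTF8 + misread.encode("utf-8")

# A correct UTF-8 reader now sees the garbled text, not the original.
seen_by_other_apps = converted.decode("utf-8-sig")
print(seen_by_other_apps == original)  # the damage is now in the file itself
print(len(raw), len(converted))        # the file has grown

# The damage is reversible only by undoing both steps in order.
repaired = seen_by_other_apps.encode("cp1252").decode("utf-8")
print(repaired == original)
```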

The code survives, but the comments are slaughtered and that's sort of undesired.

I don't want SE to convert the file to anything. I don't even want it to analyze the contents (though that would be nice). I need a way to dissuade it from stubbornly treating that file as ASCII, and to tell it to interpret the contents as UTF8 instead.

Thank you. I will share your feedback with the team.