Java String encoding (UTF-8)




I have come across a line of legacy code that I am trying to figure out:

String newString = new String(oldString.getBytes("UTF-8"), "UTF-8");

As far as I can understand, this is encoding and decoding using the same charset.

How is this different from the following?

String newString = oldString;

Is there any scenario in which the two lines will have different outputs?

P.S.: Just to clarify, yes, I am aware of the excellent article on encoding by Joel Spolsky!

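This could be a complicated way of doing

String newString = new String(oldString);

This shortens the String if the underlying char[] is much longer than the text it holds (as could happen on JVMs before Java 7u6, where substring() shared the parent's char[]).

However, more specifically, it will be checking that every character can be UTF-8 encoded.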
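If the check is the only thing wanted, it can be written explicitly instead of relying on a round trip. The following is a minimal sketch, not the legacy code's actual intent; the class name `Utf8Check` and method name `isUtf8Encodable` are made up for illustration. It uses the standard `CharsetEncoder.canEncode(CharSequence)` method.

```java
import java.nio.charset.CharsetEncoder;
import java.nio.charset.StandardCharsets;

// Hypothetical helper class, named for illustration only.
class Utf8Check {
    // Returns true if the whole string can be encoded as UTF-8
    // without any replacement taking place.
    static boolean isUtf8Encodable(String s) {
        // Encoders are stateful, so create a fresh one per call.
        CharsetEncoder encoder = StandardCharsets.UTF_8.newEncoder();
        return encoder.canEncode(s);
    }

    public static void main(String[] args) {
        System.out.println(isUtf8Encodable("hello"));
        // A high surrogate with no matching low surrogate.
        System.out.println(isUtf8Encodable("\uD800"));
    }
}
```

Unlike the round trip, this reports the problem rather than silently changing the string.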

There are some "characters" you can have in a String that cannot be encoded, and these are turned into ?

Any unpaired surrogate, that is, a char between \uD800 and \uDFFF that is not part of a valid surrogate pair, cannot be encoded and is turned into '?'

String oldString = "\uD800";
String newString = new String(oldString.getBytes("UTF-8"), "UTF-8");
System.out.println(newString.equals(oldString));

prints

false
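To see which strings survive the round trip and which do not, a slightly larger sketch may help. The class name `Utf8RoundTrip` and the sample strings are illustrative assumptions. It uses `StandardCharsets.UTF_8` rather than the `"UTF-8"` name so that no checked `UnsupportedEncodingException` has to be handled.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical demo class, named for illustration only.
class Utf8RoundTrip {
    // Encode to UTF-8 bytes and decode again, like the legacy line.
    static String roundTrip(String s) {
        return new String(s.getBytes(StandardCharsets.UTF_8), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String plain = "h\u00E9llo";          // ordinary text with an accented letter
        String validPair = "\uD83D\uDE00";    // surrogate pair for code point U+1F600
        String loneSurrogate = "\uD800";      // high surrogate with no partner

        System.out.println(roundTrip(plain).equals(plain));                 // true
        System.out.println(roundTrip(validPair).equals(validPair));         // true
        System.out.println(roundTrip(loneSurrogate).equals(loneSurrogate)); // false

        // The lone surrogate is replaced by a single '?' byte (63) on encoding.
        System.out.println(Arrays.toString(loneSurrogate.getBytes(StandardCharsets.UTF_8)));
    }
}
```

So the two lines from the question differ only for strings containing unpaired surrogates; for any well-formed string the round trip yields an equal String.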

