Java String encoding (UTF-8)
I have come across a line of legacy code that I am trying to figure out:
String newString = new String(oldString.getBytes("UTF-8"), "UTF-8");
As far as I can understand, this is encoding and decoding using the same charset.
How is this different from the following?
String newString = oldString;
Is there any scenario in which the 2 lines will have different outputs?
P.S.: Just to clarify, yes, I am aware of the excellent article on encoding by Joel Spolsky!
This could be a complicated way of doing
String newString = new String(oldString);
This shortens the String if the underlying char[] used is much longer.
However, more specifically, it will be checking that every character can be UTF-8 encoded.
There are some "characters" you can have in a String which cannot be encoded, and these would be turned into ?.
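If the goal is to detect such characters rather than have them silently replaced, one option is to ask a UTF-8 encoder directly. The following is a minimal sketch, not the legacy code's approach; the class name Utf8Encodable and the helper isEncodable are hypothetical names chosen for illustration, while CharsetEncoder.canEncode is part of the standard library.

```java
import java.nio.charset.CharsetEncoder;
import java.nio.charset.StandardCharsets;

// Hypothetical helper class; the names are illustrative, not from the original post.
final class Utf8Encodable {

    // Returns true if the whole string can be encoded as UTF-8,
    // which for UTF-8 means it contains no unpaired surrogate.
    static boolean isEncodable(String s) {
        // A fresh encoder per call, since CharsetEncoder is not thread-safe.
        CharsetEncoder encoder = StandardCharsets.UTF_8.newEncoder();
        return encoder.canEncode(s);
    }

    public static void main(String[] args) {
        System.out.println(isEncodable("hello"));        // true
        System.out.println(isEncodable("\uD83D\uDE00")); // true: valid surrogate pair
        System.out.println(isEncodable("\uD800"));       // false: lone high surrogate
    }
}
```

Unlike the getBytes round trip, this leaves the original String untouched and only reports whether it is safe to encode.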
Any unpaired surrogate, that is, a lone char between \uD800 and \uDFFF, cannot be encoded and will be turned into '?'.
String oldString = "\uD800";
String newString = new String(oldString.getBytes("UTF-8"), "UTF-8");
System.out.println(newString.equals(oldString));
prints
false
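To see where the round trip does and does not change a String, here is a small self-contained sketch. It assumes StandardCharsets.UTF_8 in place of the "UTF-8" name used above, which behaves the same here but avoids the checked UnsupportedEncodingException; the class name Utf8RoundTrip and the helper roundTrip are hypothetical.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical demo class; the names are illustrative, not from the original post.
final class Utf8RoundTrip {

    // Encodes to UTF-8 bytes and decodes back, like the legacy line.
    static String roundTrip(String s) {
        return new String(s.getBytes(StandardCharsets.UTF_8), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String plain = "hello";
        String pair = "\uD83D\uDE00"; // high + low surrogate forming one code point
        String lone = "a\uD800b";     // high surrogate with no matching low surrogate

        System.out.println(roundTrip(plain).equals(plain)); // true
        System.out.println(roundTrip(pair).equals(pair));   // true
        System.out.println(roundTrip(lone).equals(lone));   // false
        System.out.println(roundTrip(lone));                // a?b
    }
}
```

So for well-formed text the two lines in the question give equal Strings; they differ only when the String contains an unpaired surrogate.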