Dear software people,
Unicode is older now than ASCII was when Unicode was introduced. It’s not a weird new fad.
It’s complicated, but so is the domain it represents. We recognize that we have to think about time zones and leap days and leap seconds, for instance. And it’s a cleaner abstraction when you aren’t halfhearted about it.
Sincerely,
Charlie
@vruba but isn't it a solved problem by now? Nobody writes new software that's not Unicode-aware (and it would be hard to, because all the systems and languages are Unicode-aware by default now). Converting old software is another matter, of course, but that's not specific to Unicode.
@isagalaev @vruba not by default, no. What’s "🤦🏼‍♂️".length in your language of choice?
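For reference, a minimal Python sketch of why the answer depends on what is being counted. The emoji is spelled out with escapes so the zero-width joiner is explicit; the variable names are only for illustration.

```python
# Facepalm + medium-light skin tone + ZWJ + male sign + variation selector-16
s = "\U0001F926\U0001F3FC\u200D\u2642\uFE0F"

code_points = len(s)                           # Python's len() counts code points
utf8_bytes = len(s.encode("utf-8"))            # storage size in UTF-8
utf16_units = len(s.encode("utf-16-le")) // 2  # UTF-16 code units, as JavaScript's .length counts

print(code_points, utf8_bytes, utf16_units)  # 5 17 7
```

A reader would still call this one character, which is none of the three numbers above.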
@isagalaev @nikitonsky At a systems programming level, people certainly care more about storage size. But at a UI level, they might want to ensure that only one emoji can be used in a certain context. So perhaps it’s better to say that there are multiple useful senses of the idea of “length” that might matter in different areas. But I think we basically agree about the important parts of this issue.
@isagalaev @nikitonsky Think of emoji reactions or status fields. They could be implemented in such a way that it is reasonable to check that the user only submits one emoji at a time, or at least that only one is displayed at a time. Or consider CSS use cases like https://developer.mozilla.org/en-US/docs/Web/CSS/::first-letter
@isagalaev @nikitonsky All I’m saying is that the length of a string as a reader would understand it (not only as a hard drive would understand it) is a useful concept that should be exposed by at least some string libraries. I strongly agree with you that it’s not worth optimizing for at the cost of, well, almost any other operation.
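As far as I know, Python's standard library does not ship grapheme cluster segmentation, so the sketch below is a deliberately simplified approximation of "length as a reader would understand it", not the full Unicode UAX #29 algorithm. `approx_grapheme_count` and `is_single_cluster` are hypothetical helper names. It handles combining marks, variation selectors, skin-tone modifiers, and ZWJ sequences, and it will miscount things like flag emoji and Hangul jamo.

```python
import unicodedata

ZWJ = 0x200D  # zero-width joiner

def approx_grapheme_count(s: str) -> int:
    """Rough count of user-perceived characters (simplified, not UAX #29)."""
    count = 0
    joined = False  # True when the previous code point was a ZWJ
    for ch in s:
        cp = ord(ch)
        extends = (
            unicodedata.category(ch) in ("Mn", "Me", "Mc")  # combining marks, variation selectors
            or 0x1F3FB <= cp <= 0x1F3FF                      # emoji skin-tone modifiers
            or cp == ZWJ
        )
        # Start a new cluster unless this code point attaches to the previous one.
        if count == 0 or not (extends or joined):
            count += 1
        joined = cp == ZWJ
    return count

def is_single_cluster(s: str) -> bool:
    """Hypothetical check for a reaction or status field: exactly one cluster."""
    return approx_grapheme_count(s) == 1

facepalm = "\U0001F926\U0001F3FC\u200D\u2642\uFE0F"
print(approx_grapheme_count(facepalm))           # 1
print(is_single_cluster("\U0001F44D\U0001F44D")) # False (two thumbs up)
```

For anything user-facing, a library that implements the real segmentation rules would be the safer choice; this only shows the shape of the idea.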
@nikitonsky @vruba as for truncation with "…", UIs seem to be converging on visual hiding with a transparency gradient, because the visual width of a string only makes sense after it has been rendered with a particular font on a particular device. Doing `str[:max] + '...'` was only good enough in the beginning.
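A small Python sketch of what goes wrong with the slicing approach even before fonts enter the picture. `naive_truncate` is a hypothetical name for the `str[:max] + '...'` pattern; the example word uses a decomposed accent so the cut lands inside one visible character.

```python
def naive_truncate(s: str, max_len: int) -> str:
    """The `str[:max] + '...'` approach: cut at a code point index."""
    return s if len(s) <= max_len else s[:max_len] + "..."

word = "cafe\u0301"            # "café" written as "e" + combining acute accent
cut = naive_truncate(word, 4)  # slices between the "e" and its accent

print(cut)  # cafe...
```

The reader sees four characters in the input, yet the truncated output has silently dropped the accent.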
@isagalaev @vruba remember when every Cyrillic letter was counted as two toward the character limit at Twitter? Those were not fun times.
@vruba @nikitonsky but the part about "only one emoji can be used in a certain context" is interesting. What context? Emoji pickers in a UI couldn't care less about `.length`; they are tables of grapheme clusters, where each emoji is a full UTF-8 encoded string that gets appended to the string in a text input. Nobody cares if its length is 1 or more.
It's just not a(n important) use case. Just like it turned out that nobody needs random access to "characters" in a string by index in O(1).