**Charlie Loyd** @vruba@everything.happens.horse · Oct 6, 2023

Dear software people,

Unicode is older now than ASCII was when Unicode was introduced. It’s not a weird new fad.

It’s complicated, but so is the domain it represents. We recognize that we have to think about time zones and leap days and leap seconds, for instance. And it’s a cleaner abstraction when you aren’t halfhearted about it.

Sincerely,
Charlie

**Ivan Sagalaev** @isagalaev@mastodon.social · Oct 6, 2023

@vruba but isn't it a solved problem by now? Nobody writes new software that's not unicode-aware (and it would be hard to do, because all the systems and languages do it by default now). Converting old software is another matter of course, but that's not specific to Unicode.

**Niki Tonsky** @nikitonsky@mastodon.online · Oct 6, 2023

@isagalaev @vruba not by default, no. What’s "".length in your language of choice?

**Ivan Sagalaev** @isagalaev@mastodon.social · Oct 7, 2023

@nikitonsky @vruba okay, okay, admittedly I meant it in a very narrow sense of "everyone thankfully uses utf-8 everywhere by now", so text data is interoperable. I didn't mean all the interesting cases are solved.

Like the length of that emoji, where the correct answer is "emojis don't have a well-defined meaning of length", so nobody should assume anything in this case. But as it turns out, mostly people care about the count of utf-8-encoded bytes, for storage or memory allocation.
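The different senses of "length" in this exchange can be made concrete with a short Python sketch (Python is an arbitrary choice; the thread does not name a language). The emoji sequence below is an illustrative assumption, not necessarily the one from the post above, and `length_measures` is a hypothetical helper. The three counts all differ, and none of them is the 1 that a reader would see.

```python
# Illustrative example only (assumed), not necessarily the emoji from the thread:
# U+1F926 FACE PALM + U+1F3FC skin-tone modifier + U+200D ZERO WIDTH JOINER
# + U+2642 MALE SIGN + U+FE0F VARIATION SELECTOR-16, usually rendered as one emoji.
FACEPALM = "\U0001F926\U0001F3FC\u200D\u2642\uFE0F"


def length_measures(s: str) -> dict[str, int]:
    """Hypothetical helper: several common, and different, notions of length."""
    return {
        # Storage / allocation size when encoded as UTF-8.
        "utf8_bytes": len(s.encode("utf-8")),
        # UTF-16 code units, which is what JavaScript's .length counts.
        "utf16_units": len(s.encode("utf-16-le")) // 2,
        # Unicode code points, which is what Python's len() counts.
        "code_points": len(s),
    }


print(length_measures(FACEPALM))
# {'utf8_bytes': 17, 'utf16_units': 7, 'code_points': 5}
```

Which number is useful depends on the job: UTF-8 bytes for storage limits, UTF-16 units when matching JavaScript or Java string APIs, code points for Unicode-level processing.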

**Charlie Loyd** @vruba@everything.happens.horse · Oct 7, 2023

@isagalaev @nikitonsky At a systems programming level, people certainly care more about storage size. But at a UI level, they might want to ensure that only one emoji can be used in a certain context. So perhaps it’s better to say that there are multiple useful senses of the idea of “length” that might matter in different areas. But I think we basically agree about the important parts of this issue.

**Ivan Sagalaev** @isagalaev@mastodon.social · Oct 7, 2023

@vruba @nikitonsky but the part about "only one emoji can be used in a certain context" is interesting. What context? Emoji pickers in UI couldn't care less about `.length`, they are tables of grapheme clusters, where each emoji is a full utf-8 encoded string that gets appended to a string in a text input. Nobody cares if its length is 1 or more.

It's just not a(n important) use case. Just like it turned out that nobody needs random access to "characters" in a string by index in O(1).

**Charlie Loyd** @vruba@everything.happens.horse · Oct 7, 2023

@isagalaev @nikitonsky Think of emoji reactions or status fields. They could be implemented in such a way that it is reasonable to check that the user only submits one emoji at a time, or at least that only one is displayed at a time. Or consider CSS use cases like https://developer.mozilla.org/en-US/docs/Web/CSS/::first-letter

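A check like "the user submitted exactly one emoji" needs a count of grapheme clusters (user-perceived characters), which `len()` does not give. The Python below is a simplified approximation, assumed here for illustration only: it handles combining marks, variation selectors, skin-tone modifiers, ZWJ sequences, tag sequences and regional-indicator flag pairs, but it is not a full implementation of the Unicode text segmentation rules (UAX #29). For example, it ignores the Hangul syllable rules and glues any character after a ZWJ. `approx_grapheme_count` and `is_single_grapheme` are hypothetical names.

```python
import unicodedata

# Simplified approximation of UAX #29 extended grapheme clusters (assumption: not spec-complete).
ZWJ = "\u200D"  # ZERO WIDTH JOINER


def _extends_previous(ch: str) -> bool:
    """True if ch attaches to the preceding cluster under this simplified model."""
    cp = ord(ch)
    return (
        unicodedata.category(ch) in ("Mn", "Mc", "Me")  # combining marks, incl. variation selectors
        or 0x1F3FB <= cp <= 0x1F3FF  # emoji skin-tone modifiers
        or 0xE0020 <= cp <= 0xE007F  # tag characters used in subdivision flags
        or ch == ZWJ
    )


def _is_regional_indicator(ch: str) -> bool:
    return 0x1F1E6 <= ord(ch) <= 0x1F1FF


def approx_grapheme_count(s: str) -> int:
    """Approximate number of user-perceived characters in s."""
    count = 0
    prev = None
    ri_run = 0  # consecutive regional indicators seen so far
    for ch in s:
        if prev is None:
            count += 1  # the first character always starts a cluster
        elif _extends_previous(ch):
            pass  # mark, modifier, tag, or ZWJ joins the current cluster
        elif prev == ZWJ:
            pass  # the character after a ZWJ joins the current cluster
        elif _is_regional_indicator(ch) and ri_run % 2 == 1:
            pass  # second regional indicator of a flag pair
        else:
            count += 1
        ri_run = ri_run + 1 if _is_regional_indicator(ch) else 0
        prev = ch
    return count


def is_single_grapheme(s: str) -> bool:
    """Hypothetical validation for a one-emoji reaction or status field."""
    return approx_grapheme_count(s) == 1


# A face-palm ZWJ sequence has len() == 5 but counts as one cluster here.
print(is_single_grapheme("\U0001F926\U0001F3FC\u200D\u2642\uFE0F"))  # True
print(is_single_grapheme("\U0001F44D\U0001F44D"))  # False (two thumbs-up)
```

In production, a maintained segmenter is the safer choice, for example ICU's break iterators or the `\X` pattern in Python's third-party `regex` module.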

**Charlie Loyd** @vruba@everything.happens.horse · Oct 7, 2023

@isagalaev @nikitonsky All I’m saying is that the length of a string as a reader would understand it (not only as a hard drive would understand it) is a useful concept that should be exposed by at least some string libraries. I strongly agree with you that it’s not worth optimizing for at the cost of, well, almost any other operation.
