Dear software people,
Unicode is older now than ASCII was when Unicode was introduced. It’s not a weird new fad.
It’s complicated, but so is the domain it represents. We accept that we have to think about time zones, leap days, and leap seconds, for instance. And it’s a cleaner abstraction when you aren’t halfhearted about it.
Sincerely,
Charlie
@vruba but isn't it a solved problem by now? Nobody writes new software that's not Unicode-aware (and it would be hard to, because all the systems and languages do it by default now). Converting old software is another matter, of course, but that's not specific to Unicode.
@isagalaev @vruba not by default, no. What’s `"🤦🏼‍♂️".length` in your language of choice?
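For concreteness, here is what the common answers look like in Python. A minimal sketch, assuming the third-party `regex` module (`pip install regex`) for the grapheme-cluster count, since the standard library has no equivalent:

```python
import regex

# The facepalm emoji is five code points: face, skin tone,
# zero-width joiner, male sign, variation selector.
s = "\U0001F926\U0001F3FC\u200D\u2642\uFE0F"  # 🤦🏼‍♂️

print(len(s))                           # 5  — code points (Python's len)
print(len(s.encode("utf-8")))           # 17 — UTF-8 bytes (storage size)
print(len(s.encode("utf-16-le")) // 2)  # 7  — UTF-16 code units (JavaScript's .length)
print(len(regex.findall(r"\X", s)))     # 1  — grapheme clusters (what a reader sees)
```

Four answers, all defensible, depending on which layer of the system is asking.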
@nikitonsky @vruba okay, okay, admittedly I meant it in a very narrow sense of "everyone thankfully uses UTF-8 everywhere by now", so text data is interoperable. I didn't mean all the interesting cases are solved.
Like the length of that emoji, where the correct answer is "emojis don't have a well-defined meaning of length", so nobody should assume anything in this case. But as it turns out, mostly people care about the count of UTF-8-encoded bytes, for storage or memory allocation.
@isagalaev @nikitonsky At a systems programming level, people certainly care more about storage size. But at a UI level, they might want to ensure that only one emoji can be used in a certain context. So perhaps it’s better to say that there are multiple useful senses of the idea of “length” that might matter in different areas. But I think we basically agree about the important parts of this issue.
@isagalaev @nikitonsky Think of emoji reactions or status fields. They could be implemented in such a way that it is reasonable to check that the user only submits one emoji at a time, or at least that only one is displayed at a time. Or consider CSS use cases like https://developer.mozilla.org/en-US/docs/Web/CSS/::first-letter
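A sketch of that kind of check, under the same assumption of the `regex` module; the function name is hypothetical:

```python
import regex

def is_single_grapheme(text: str) -> bool:
    # One user-perceived character == exactly one extended grapheme cluster.
    return len(regex.findall(r"\X", text)) == 1

is_single_grapheme("🤦🏼‍♂️")  # True: five code points, one cluster
is_single_grapheme("🙂🙂")     # False: two clusters
```

Note this only validates "one cluster", not "is an emoji"; restricting input to emoji proper would additionally need the Unicode emoji property data.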
@nikitonsky @vruba as for truncation with "…", UIs seem to have universally converged on visual hiding with a transparency gradient, because the visual width of a string only makes sense after it's rendered with a particular font on a particular device. Doing `str[:max] + '...'` was only good enough in the beginning.
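That said, if you do have to truncate in code rather than at the display layer, cutting on grapheme boundaries at least avoids tearing a ZWJ sequence apart or stranding a combining mark. A sketch, again assuming the `regex` module:

```python
import regex

def truncate(text: str, max_graphemes: int) -> str:
    # Cut on grapheme-cluster boundaries, not code points, so a
    # multi-code-point emoji or accented letter is kept whole or dropped whole.
    clusters = regex.findall(r"\X", text)
    if len(clusters) <= max_graphemes:
        return text
    return "".join(clusters[:max_graphemes]) + "…"
```

Even this only counts clusters, not rendered width, which is exactly why the gradient approach wins in UIs.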
@isagalaev @vruba remember when every Cyrillic letter counted as two toward the character limit on Twitter? Those were not fun times
@isagalaev @nikitonsky All I’m saying is that the length of a string as a reader would understand it (not only as a hard drive would understand it) is a useful concept that should be exposed by at least some string libraries. I strongly agree with you that it’s not worth optimizing for at the cost of, well, almost any other operation.