Wait, what?

❤️ is an emoji, so it is built up from surrogate pairs, right?

NOPE! Turns out it consists of U+2764 (the plain ❤ symbol) and U+FE0F (Variation Selector-16).

This is why you should use Intl.Segmenter() and just deal with its abysmal performance 😭
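
A quick way to see this for yourself, as a minimal sketch: it assumes a runtime that ships Intl.Segmenter (recent browsers, Node 16+), and the helper name countGraphemes is made up for illustration.

```javascript
// U+2764 HEAVY BLACK HEART followed by U+FE0F VARIATION SELECTOR-16.
const heart = "\u2764\uFE0F";

// Both code points sit in the Basic Multilingual Plane, so there are
// no surrogate pairs here: two code points, two UTF-16 code units.
const codePoints = [...heart].map(
  (c) => "U+" + c.codePointAt(0).toString(16).toUpperCase().padStart(4, "0")
);

// Hypothetical helper: counts user-perceived characters
// (extended grapheme clusters) with Intl.Segmenter.
function countGraphemes(text) {
  const segmenter = new Intl.Segmenter(undefined, { granularity: "grapheme" });
  let n = 0;
  for (const _ of segmenter.segment(text)) n++;
  return n;
}

console.log(heart.length);          // 2 (code units)
console.log(codePoints);            // [ 'U+2764', 'U+FE0F' ]
console.log(countGraphemes(heart)); // 1 (one visible character)
```

For contrast, 😭 (U+1F62D) really is a surrogate pair: one code point, two code units. Both emoji report a .length of 2, for entirely different reasons.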

Peter Brett

@sir_pepe There's scope for improving extended grapheme cluster segmentation performance by using vector instructions.

Since the vast majority of text has one codepoint per grapheme cluster, some applications use text data structures that break text into runs of trivial and multi-CP graphemes.
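
A rough sketch of that run idea in JavaScript, not how any particular application does it. The names toRuns and graphemeCount and the shape of a run are assumptions made up for illustration. Segmentation cost is paid once, when the text is ingested; later queries only look at the stored runs.

```javascript
// Hypothetical run shape: { simple: boolean, text: string, graphemes: number }.
// A run is "simple" when every grapheme in it is exactly one code point.
function toRuns(text) {
  const segmenter = new Intl.Segmenter(undefined, { granularity: "grapheme" });
  const runs = [];
  for (const { segment } of segmenter.segment(text)) {
    // Spreading a string iterates by code point, not by UTF-16 code unit.
    const simple = [...segment].length === 1;
    const last = runs[runs.length - 1];
    if (last && last.simple === simple) {
      // Same kind as the previous grapheme: extend the current run.
      last.text += segment;
      last.graphemes += 1;
    } else {
      runs.push({ simple, text: segment, graphemes: 1 });
    }
  }
  return runs;
}

// Counting graphemes no longer needs the segmenter at all.
function graphemeCount(runs) {
  return runs.reduce((n, run) => n + run.graphemes, 0);
}

const runs = toRuns("I \u2764\uFE0F JS");
console.log(runs.map((r) => [r.simple, r.graphemes]));
console.log(graphemeCount(runs));
```

In the simple runs, stepping by code point is the same as stepping by grapheme, so only the (usually short and rare) multi-codepoint runs need the slow path.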

@krans Yeah, if you are shackled to JavaScript like I am, splitting the string is the only thing that helps. Provided you know how things like ❤️ work ._.

@sir_pepe I think that JS implementations should consider implementing Unicode algorithms as intrinsics. Every non-trivial JS program needs to be able to handle text robustly and fast…

@krans Exactly what I'm thinking. On the other hand, things on the web appear to mostly work. I don't know how or why.