James Smith @Floppy

**Peter Kröner** @sir_pepe@mastodon.social · Nov 26, 2024

Wait, what?

❤️ is an emoji, so it is built up from surrogate pairs, right?

NOPE! Turns out it consists of U+2764 (plain symbol) and U+FE0F (Variation Selector 16)

This is why you should use Intl.Segmenter() and just deal with its abysmal performance

#javascript #unicode #fml
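
The point above can be checked with a few lines of JavaScript. This is only a minimal sketch, assuming a runtime that ships `Intl.Segmenter` (recent browsers and Node.js do); the variable names are made up for illustration.

```javascript
// U+2764 (heavy black heart) followed by U+FE0F (Variation Selector 16).
const heart = "\u2764\uFE0F";

// Both code points sit in the Basic Multilingual Plane, so each one is a
// single UTF-16 code unit and no surrogate pairs are involved.
const codePoints = [...heart].map(
  (ch) => "U+" + ch.codePointAt(0).toString(16).toUpperCase().padStart(4, "0")
);

// Grapheme segmentation treats the two code points as one user-perceived character.
const segmenter = new Intl.Segmenter("en", { granularity: "grapheme" });
const graphemes = [...segmenter.segment(heart)].map((s) => s.segment);

console.log(heart.length);     // 2 (UTF-16 code units)
console.log(codePoints);       // [ 'U+2764', 'U+FE0F' ]
console.log(graphemes.length); // 1
```

So checking for surrogate pairs would not flag this string at all, while the segmenter reports a single grapheme.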

Peter Brett @krans@mastodon.me.uk

@sir_pepe There's scope for improving extended grapheme cluster segmentation performance by using vector instructions.

Since the vast majority of text has one codepoint per grapheme cluster, some applications use text data structures that break text into runs of trivial and multi-CP graphemes. #Unicode

Nov 26, 2024, 08:48 AM · Moshidon
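
A rough sketch of the "runs" idea mentioned above, in JavaScript. It does not reproduce any particular application's data structure: `splitIntoRuns` is a hypothetical helper, and it assumes "trivial" means a grapheme cluster made of exactly one code point. It still pays for one segmentation pass up front; the possible gain is that later operations on trivial runs can skip grapheme logic.

```javascript
// Hypothetical helper, not a standard API: groups consecutive grapheme
// clusters into runs that are either all single-code-point ("trivial")
// or all multi-code-point.
function splitIntoRuns(text, locale = "en") {
  const segmenter = new Intl.Segmenter(locale, { granularity: "grapheme" });
  const runs = [];
  for (const { segment } of segmenter.segment(text)) {
    // Spreading a string iterates by code point, so length 1 means one code point.
    const trivial = [...segment].length === 1;
    const last = runs[runs.length - 1];
    if (last && last.trivial === trivial) {
      last.graphemes.push(segment); // extend the current run
    } else {
      runs.push({ trivial, graphemes: [segment] }); // start a new run
    }
  }
  return runs;
}

const runs = splitIntoRuns("I \u2764\uFE0F JS");
for (const run of runs) {
  console.log(run.trivial ? "trivial" : "multi-CP", JSON.stringify(run.graphemes.join("")));
}
```

For the sample string this yields a trivial run (`"I "`), a multi-CP run holding the heart, and another trivial run (`" JS"`).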


**Peter Kröner** @sir_pepe@mastodon.social · Nov 26, 2024

@krans Yeah, if you are shackled to JavaScript like I am, splitting the string is the only thing that helps. Provided you know how things like ❤️ work ._.

**Peter Brett** @krans · Nov 26, 2024

@sir_pepe I think that JS implementations should consider implementing Unicode algorithms as intrinsics. Every non-trivial JS program needs to be able to handle text robustly and fast…

**Peter Kröner** @sir_pepe@mastodon.social · Nov 26, 2024

@krans Exactly what I'm thinking. On the other hand, things on the web appear to mostly work. I don't know how or why.
