The character 👩‍👩‍👧‍👦 (family with two women, one girl, and one boy) is encoded as follows:

U+1F469 WOMAN,
U+200D ZWJ,
U+1F469 WOMAN,
U+200D ZWJ,
U+1F467 GIRL,
U+200D ZWJ,
U+1F466 BOY
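One way to confirm this list from Swift itself is to walk the string's `unicodeScalars` view and print each code point. A minimal sketch, assuming Swift 5; `family` and `codePoints` are illustrative names. The string is built from `\u{...}` escapes so the source file does not depend on how the emoji survives copy and paste:

```swift
// Sketch (assumes Swift 5): list the code points behind the family emoji.
// `family` and `codePoints` are illustrative names only.
let family = "\u{1F469}\u{200D}\u{1F469}\u{200D}\u{1F467}\u{200D}\u{1F466}"

// Format each Unicode scalar as "U+XXXX" in uppercase hex.
let codePoints = family.unicodeScalars.map { scalar in
    "U+" + String(scalar.value, radix: 16, uppercase: true)
}
print(codePoints)
// ["U+1F469", "U+200D", "U+1F469", "U+200D", "U+1F467", "U+200D", "U+1F466"]
```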

So it’s very interestingly encoded, which makes it the perfect target for a unit test. However, Swift doesn’t seem to know how to treat it. Here’s what I mean:

"👩‍👩‍👧‍👦".contains("👩‍👩‍👧‍👦") // true
"👩‍👩‍👧‍👦".contains("👩") // false
"👩‍👩‍👧‍👦".contains("\u{200D}") // false
"👩‍👩‍👧‍👦".contains("👧") // false
"👩‍👩‍👧‍👦".contains("👦") // true

So, Swift says it contains itself (good) and a boy (good!). But it then says it does not contain a woman, girl, or zero-width joiner. What’s happening here? Why does Swift know it contains a boy but not a woman or girl? I could understand if it treated it as a single character and recognized it as containing only itself, but the fact that it got one subcomponent and no others baffles me.
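The results above come from an older Swift (the `.characters` API used below dates from Swift 2/3). A `String` is searched Character by Character, and a Character is an extended grapheme cluster, so a Character-level search can only ever match whole clusters. The sketch below, assuming Swift 4 or later (where the whole ZWJ sequence segments as one Character), contrasts that with a search over the `unicodeScalars` view; the variable names are illustrative:

```swift
// Sketch (assumes Swift 4 or later): Character-level vs scalar-level search.
let family = "\u{1F469}\u{200D}\u{1F469}\u{200D}\u{1F467}\u{200D}\u{1F466}"
let woman: Unicode.Scalar = "\u{1F469}"
let zwj: Unicode.Scalar = "\u{200D}"
let boy: Unicode.Scalar = "\u{1F466}"

// String is a collection of Characters (extended grapheme clusters).
// The whole ZWJ sequence is one Character, so a lone emoji never matches it.
let characterCount = family.count                          // 1
let hasWomanCharacter = family.contains(Character(woman))  // false
let hasBoyCharacter = family.contains(Character(boy))      // false

// unicodeScalars exposes the individual code points, so every part is found.
let hasWomanScalar = family.unicodeScalars.contains(woman) // true
let hasZWJScalar = family.unicodeScalars.contains(zwj)     // true
let hasBoyScalar = family.unicodeScalars.contains(boy)     // true
```

In other words, on a current toolchain the Character-level answers are at least consistent (nothing but the whole sequence matches), and component-level questions belong on the scalar view.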

This does not change if I use something like "👩".characters.first!.


Even more confounding is this:

let manual = "\u{1F469}\u{200D}\u{1F469}\u{200D}\u{1F467}\u{200D}\u{1F466}"
Array(manual.characters) // ["👩‍", "👩‍", "👧‍", "👦"]
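For comparison, here is a sketch of the same inspection in Swift 5, where `.characters` is no longer available and `String` is itself the Character collection. It assumes Swift 4+ grapheme breaking, under which the whole sequence is a single Character, and it also counts the other views so the seven code points stay visible:

```swift
// Sketch (assumes Swift 5): the same string seen through each view.
let manual = "\u{1F469}\u{200D}\u{1F469}\u{200D}\u{1F467}\u{200D}\u{1F466}"

let characters = Array(manual)                 // one grapheme cluster in Swift 4+
let scalarCount = manual.unicodeScalars.count  // 4 emoji + 3 ZWJs = 7
let utf16Count = manual.utf16.count            // 4 surrogate pairs + 3 units = 11
let utf8Count = manual.utf8.count              // 4 × 4 bytes + 3 × 3 bytes = 25

print(characters.count, scalarCount, utf16Count, utf8Count) // 1 7 11 25
```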

Even though I placed the ZWJs in there, they aren’t reflected in the character array. What followed was a little telling:

manual.contains("👩") // false
manual.contains("👧") // false
manual.contains("👦") // true

So I get the same behavior with the character array… which is supremely annoying, since I know what the array looks like.
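The array itself points to one plausible explanation: each ZWJ ended up glued to the emoji before it, so the woman and girl elements are really "👩 + ZWJ" and "👧 + ZWJ", while the final boy is a bare "👦". A search that compares whole elements would then match the boy and nothing else. The sketch below is a hypothetical model of that segmentation (`legacySegments` is an illustrative name, not Swift's real grapheme-breaking code) that reproduces all four results from the question:

```swift
// Hypothetical model of the segmentation shown in the question's array:
// every ZWJ is attached to the scalar before it. This is NOT Swift's real
// grapheme-breaking algorithm; it only illustrates why the results differ.
func legacySegments(_ s: String) -> [String] {
    let zwj: Unicode.Scalar = "\u{200D}"
    var segments: [String] = []
    for scalar in s.unicodeScalars {
        if scalar == zwj, !segments.isEmpty {
            // Glue the joiner onto the previous segment.
            segments[segments.count - 1].unicodeScalars.append(scalar)
        } else {
            segments.append(String(Character(scalar)))
        }
    }
    return segments
}

let manual = "\u{1F469}\u{200D}\u{1F469}\u{200D}\u{1F467}\u{200D}\u{1F466}"
let segments = legacySegments(manual)  // woman+ZWJ, woman+ZWJ, girl+ZWJ, boy

// Mimic a Character search: look for a segment equal to the needle.
let findsWoman = segments.contains("\u{1F469}") // false: segment is woman + ZWJ
let findsZWJ = segments.contains("\u{200D}")    // false: no segment is a bare ZWJ
let findsGirl = segments.contains("\u{1F467}")  // false: segment is girl + ZWJ
let findsBoy = segments.contains("\u{1F466}")   // true: last segment has no ZWJ
```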

This also does not change if I use something like "👩".characters.first!.
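For the unit-test use case, one option is to do the containment check on code points rather than Characters. A minimal sketch, assuming Swift 5; `containsScalars` is a hypothetical helper, not a standard-library or Foundation API:

```swift
// Sketch: code-point-level substring test.
// `containsScalars` is a hypothetical helper, not a standard-library API.
func containsScalars(_ haystack: String, _ needle: String) -> Bool {
    let h = Array(haystack.unicodeScalars)
    let n = Array(needle.unicodeScalars)
    if n.isEmpty { return true }
    if n.count > h.count { return false }
    // Naive search: compare the needle against every window of the haystack.
    for start in 0...(h.count - n.count) {
        if h[start..<(start + n.count)].elementsEqual(n) {
            return true
        }
    }
    return false
}

let family = "\u{1F469}\u{200D}\u{1F469}\u{200D}\u{1F467}\u{200D}\u{1F466}"
let results = [
    containsScalars(family, "\u{1F469}"),                  // true: woman
    containsScalars(family, "\u{200D}"),                   // true: ZWJ
    containsScalars(family, "\u{1F467}\u{200D}\u{1F466}"), // true: girl + ZWJ + boy
    containsScalars(family, "\u{1F466}\u{1F467}"),         // false: not in that order
]
```

The window comparison is O(n·m), which is fine for short test strings. It deliberately ignores grapheme boundaries, so it reports matches that split a Character; that suits a test of the encoding, but is usually not what user-facing text search wants.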
