ArcanistLintEngine does not take multi-byte characters into account


Observed Behavior & Reproduction steps:
With the following file as test:


From the context of ArcanistLintEngine:

list($line, $char) = $this->getLineAndCharFromOffset('test', 15);
echo("line=$line, char=$char");

We get the following output:

line=0, char=15

Expected Behavior:
The multi-byte character should be counted as a single character, so the output should be line=2, char=1.

Phabricator Version:
This issue is present at commit d581c453b83c515f3acac963bbc117e8dd0d1ef4.


One way to resolve this would be to use mb_strlen instead of strlen, since mb_strlen counts a multi-byte character as a single character.
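To illustrate the difference, here is a minimal sketch of a character-aware offset lookup. It is not Arcanist's actual implementation: the function name line_and_char_from_byte_offset and the sample string are hypothetical, and it assumes the input is UTF-8, the offset is a byte offset, and the mbstring extension is available.

```php
<?php
// Hypothetical helper, not part of Arcanist: maps a byte offset in $data
// to a zero-based (line, char) pair, counting the column in characters.
function line_and_char_from_byte_offset(string $data, int $offset): array {
    // Everything before the offset, still measured in bytes.
    $prefix = substr($data, 0, $offset);

    // Zero-based line number = number of newlines before the offset.
    $line = substr_count($prefix, "\n");

    // Byte position where the current line starts.
    $last_newline = strrpos($prefix, "\n");
    $line_start = ($last_newline === false) ? 0 : $last_newline + 1;

    // Column counted in characters (mb_strlen) rather than bytes (strlen).
    $char = mb_strlen(substr($prefix, $line_start), 'UTF-8');

    return array($line, $char);
}

// Sample text (an assumption, the report's test file is not shown):
// "\xC3\xA9" is the two-byte UTF-8 encoding of 'é'.
$data = "ab\n\xC3\xA9x\n";
list($line, $char) = line_and_char_from_byte_offset($data, 5);
echo "line=$line, char=$char\n";
```

For this sample, byte offset 5 points at the x after the two-byte character, so the sketch reports line=1, char=1, where a strlen-based column would be 2.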


Thanks, that looks reasonable.
I think there are some use cases that actually depend on this being a byte offset rather than a character offset, so fixing it properly would require some work. Also, linting is a very low priority upstream right now.

I’ve created a ticket upstream, but if you can get your tool to output a byte offset instead of a character offset (or, better, line and character info), you’ll reach your goal faster than by waiting for this to be resolved.
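If the tool cannot be changed, the character offset could be converted to a byte offset before it is handed to the linter. The following is a rough sketch under a few assumptions: the helper name byte_offset_from_char_offset is hypothetical, the file is UTF-8, the tool counts Unicode code points (some tools count UTF-16 code units instead), and the mbstring extension is available.

```php
<?php
// Hypothetical helper: converts a character offset (as reported by an
// external tool) into the byte offset of that position within $data.
function byte_offset_from_char_offset(string $data, int $char_offset): int {
    // Take the first $char_offset characters, then measure them in bytes.
    return strlen(mb_substr($data, 0, $char_offset, 'UTF-8'));
}

// Sample text (an assumption): "\xC3\xA9" is the two-byte UTF-8 form of 'é'.
$data = "ab\n\xC3\xA9x\n";
echo byte_offset_from_char_offset($data, 4), "\n";
```

Here character offset 4 (the x) corresponds to byte offset 5, because the preceding two-byte character pushes every later byte position forward by one.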