12

Can someone say if "²" is a symbol or a digit? (alt+1277, power of two)

print("²".isdigit())
# True
print("²".isnumeric())
# True

Python says it's a digit, but it's not actually a digit. Am I wrong, or is it a bug?

3
  • 3
    It is a digit -- it's clearly the numeral 2. However, it is not ASCII. Nov 27, 2020 at 17:44
  • Japanese kanji digits are digits too. It's just that you probably convert it to int Nov 27, 2020 at 17:44
  • 3
    The key is that neither isdigit nor isnumeric implies that a string consisting of these values can be used as an argument to, e.g., int. If you want to know if a string s represents a particular int value, don't inspect s, try it with int(s) and catch an exception if raised.
    – chepner
    Nov 27, 2020 at 17:52
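
A minimal sketch of the try-and-catch approach suggested in the comment above. The helper name to_int_or_none is hypothetical, made up for this illustration; it only relies on int() raising ValueError for strings it cannot parse.

```python
def to_int_or_none(s):
    """Return int(s) if Python can parse s as a base-10 integer, else None.

    Hypothetical helper for illustration; not part of the standard library.
    """
    try:
        return int(s)
    except ValueError:
        return None


print("²".isdigit())          # True: the character is classified as a digit
print(to_int_or_none("²"))    # None: but int() refuses to parse it
print(to_int_or_none("12"))   # 12
```

So passing isdigit() does not guarantee that int() will accept the string.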

2 Answers 2

17

It is explicitly documented as a digit:

str.isdigit()

Return True if all characters in the string are digits and there is at least one character, False otherwise. Digits include decimal characters and digits that need special handling, such as the compatibility superscript digits. This covers digits which cannot be used to form numbers in base 10, like the Kharosthi numbers. Formally, a digit is a character that has the property value Numeric_Type=Digit or Numeric_Type=Decimal.

Regarding Numeric_Type, this is defined by Unicode:

Numeric_Type=Digit

Variants of positional decimal characters (Numeric_Type=Decimal) or sequences thereof. These include super/subscripts, enclosed, or decorated by the addition of characters such as parentheses, dots, or commas.
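
To see how these Unicode categories surface in Python, here is a small sketch comparing str.isdecimal(), str.isdigit() and str.isnumeric() on a few characters. The classify helper and the choice of sample characters are assumptions made for this illustration.

```python
def classify(ch):
    """Return (isdecimal, isdigit, isnumeric) for a single character.

    Hypothetical helper for illustration only.
    """
    return (ch.isdecimal(), ch.isdigit(), ch.isnumeric())


# ASCII digit, Arabic-Indic digit three, superscript two, vulgar fraction one half
for ch in ["5", "٣", "²", "½"]:
    print(repr(ch), classify(ch))

# '5' (True, True, True)
# '٣' (True, True, True)
# '²' (False, True, True)
# '½' (False, False, True)
```

The superscript two is a digit but not a decimal character, which matches the Numeric_Type=Digit definition quoted above.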

0
4

Python is smart enough to tag unicode characters as digits, just because it's possible.

To complete this good answer, note that you can even get the floating point representation of the character:

>>> from unicodedata import numeric
>>> numeric("²")
2.0

It's a float because there are Unicode characters for fractions such as 1/2, 3/2 ...

(see How to convert unicode numbers to ints?)
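
As a short illustration of why a float is returned, the sketch below calls unicodedata.numeric() on two fraction characters, and unicodedata.digit() on the superscript two to get an int instead. The sample characters are chosen for this example only.

```python
import unicodedata

print(unicodedata.numeric("½"))   # 0.5
print(unicodedata.numeric("¾"))   # 0.75

# digit() returns an int for characters that have a digit value
print(unicodedata.digit("²"))     # 2

# A fraction has no digit value, so digit() raises ValueError
# unless a default is supplied as the second argument.
print(unicodedata.digit("½", None))  # None
```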

6
  • Coming from the C and C++ world, this makes me feel rather ill. Out of interest, what happens with numeric(lemniscate)?
    – Bathsheba
    Nov 27, 2020 at 17:54
  • 1
    yeah :) but you don't have to use all the features. Personally I never use unicode in my scripts. Nov 27, 2020 at 17:55
  • I'm primarily a Python developer, and it makes me a bit queasy as well :) However, keep in mind that programs that do math are not the only use case for Unicode. Just because something is considered a digit doesn't necessarily mean it should be used numerically.
    – chepner
    Nov 27, 2020 at 17:57
  • @chepner: Yes that's a good point and there are parallels in C and C++. E.g. 08 is an invalid integer constant in both languages as the leading 0 denotes octal and 8 is an invalid octal digit. But it's still a digit.
    – Bathsheba
    Nov 27, 2020 at 18:03
  • except that you can do int("08") and as the base is specified (default, 10) it will convert to 8 all right Nov 27, 2020 at 18:06
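
The two questions raised in these comments can be checked directly. This is only a sketch: it assumes the lemniscate meant here is the infinity sign U+221E, and it uses the optional default argument of unicodedata.numeric() to avoid an exception.

```python
import unicodedata

infinity = "\u221e"  # the infinity sign, assumed to be the lemniscate in question

# The infinity sign has no numeric value in Unicode.
print(infinity.isnumeric())                  # False
print(unicodedata.numeric(infinity, None))   # None (without the default, ValueError is raised)

# A leading zero is fine when parsing a string in base 10.
print(int("08"))                             # 8
```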
