
Python is (mostly) made of syntactic sugar


By Jake Edge
August 29, 2023
PyCon

"Sugar" is, to a certain extent, in the eye of the beholder—at least when it comes to syntax. Programming languages are often made up of a (mostly) irreducible core, with lots of sugary constructs sprinkled on top—the syntactic sugar. No one wants to be forced to do without the extra syntax—at least not for their favorite pieces—but it is worth looking at how a language's constructs can be built from the core. That is just what Brett Cannon has been doing for Python, on his blog and in talks, including a talk at PyCon back in April (YouTube video).


Cannon began with a definition of "syntactic sugar" as "syntax that is not necessary to get the semantic results you want in the end". But, eliminating the sugar will not necessarily result in code that is as fast as it would be with those "optional" constructs. In fact, eliminating some of them will "abuse the way the language works" in various ways. The idea is that if a construct had not been added to the language by the core developers, the same result can be obtained, "but probably in a much messier, long-winded way".

He set some rules for himself in his investigation, including that any syntactic-sugar replacement needed to work in Python 3.8, which was "the newest hotness" at the time he started. He stuck to things that were in the language specification and avoided things that were CPython-specific. He also restricted his techniques to those that could be applied to a single Python file, as opposed to whole-program analysis; he wanted to be able to write a tool that would apply the de-sugaring to a file such that the output would be a drop-in replacement for the original.

He was willing to live with some oddities in highly dynamic parts of the runtime, such as in locals() and globals(). In fact, he said, that already happens; inside of a list comprehension, there is an "extra" entry in locals() for a "magic little variable where we construct that list for you". Since CPython cheats, he decided that he could too. Beyond that, he did not take performance into consideration at all; it was purely a theoretical exercise.
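
That hidden scope is easy to see under CPython 3.8 (an illustration, not from the talk; CPython 3.12's PEP 709 inlined comprehensions and changed the details):

    # Inside a comprehension, locals() contains the implicit ".0" variable
    # holding the iterator, alongside the loop variable.
    print([sorted(locals()) for x in "a"])   # prints [['.0', 'x']]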

Required syntax

As it turns out, "there are only ten pieces of syntax that you cannot live without"; he was able to eliminate more than 80 sugary pieces. He thinks, though, that once attendees see what has to be done to avoid some of them, they will agree that "a lot of this syntactic sugar is really handy to have".

The first element that you need is integers, which are the building blocks of lots of other things. A float can be represented as integers, bytes are arrays of integers, and Unicode strings are just arrays of codepoints, which are ... integers. "You can build all of the data representations that we have in the language using just integers." Everyone likes string literals, though, so even though you can reduce them to integers, that's not a pleasant path; that's true for other representations, as well, of course.
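
As a rough illustration (not from the talk) of data representations reducing to integers:

    # A Unicode string is ultimately a sequence of codepoints -- integers.
    text = "Py"
    codepoints = [ord(c) for c in text]              # [80, 121]
    assert "".join(chr(n) for n in codepoints) == text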

Function calls are another required piece; you need some way to call code. The while statement can provide both looping and conditional constructs, which makes it more useful than either for or if; those can only handle one or the other. So while is required and the other two can be defined in terms of it.
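
For example, here is one possible while-based rewrite of a for loop (a sketch, not taken from his blog; the iterable and the print() body are stand-ins):

    # Sugared form:
    #     for item in iterable:
    #         process(item)
    iterable = range(3)          # stand-in values
    process = print
    _SENTINEL = object()
    _iter = iter(iterable)
    item = next(_iter, _SENTINEL)
    while item is not _SENTINEL:
        process(item)
        item = next(_iter, _SENTINEL)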

The try/except statement is another required piece. He could actually get rid of that requirement if he was willing to do the whole-program analysis and rewrite all of the functions in the style of Rust, with return values that are either "ok" or "an exception got raised". The raise statement is also required because of the no-CPython-internals restriction he placed on his experiment.

def is required in order to create functions that provide the API that other parts of the program can call. Both return and yield are needed, though there are ways around that; he could have brought back lambda, which would have allowed eliminating them from the list. Cannon said that he was "too lazy" to do so when he learned how; it is quite a bit of work and eliminating class had already worn him out.

The final two pieces of required syntax are assignment, which is perhaps an obvious one, and nonlocal, which is decidedly less so. The original namespace picture for Python was much simpler, but less powerful; it consisted of local, global, and built-in namespaces. The addition of closures complicated that picture; Cannon needed to keep nonlocal as one of his ten core syntax pieces in order to be able to control access to other namespaces.
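
A small example (illustrative, not from the talk) of why writing to an enclosing function's namespace needs nonlocal:

    def make_counter():
        count = 0
        def bump():
            nonlocal count   # without this, "count = ..." would bind a new local
            count = count + 1
            return count
        return bump

    counter = make_counter()
    assert counter() == 1 and counter() == 2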

So, just those ten pieces of syntax are enough to reimplement every other part of Python, including import, addition, class, and more. Technically, you can write your Python code using just those pieces; "you'd just hate yourself because you'd be having to write the weirdest code in the world".

Essentially, those constructs make up a Turing machine "with some Python flourishes", Cannon said. A Turing machine is more or less "the scientific definition of a computer" that can read, write, and process data; it can also make conditional decisions based on the data. while is available to make decisions, def and function calls for processing data, assignment for reading and writing, and so on. Things like try and except add the Python flourishes.

Examples

On his blog, Cannon has unraveled over 80 other pieces of syntax, but was not able to do all of that in a talk. Instead, he would give some examples that would hopefully give attendees an idea of what it entails. In addition, he hopes that it gives people a look under the hood in order to better understand how Python operates.

He started with a simple example of attribute access for the expression X.Y. That can be replaced with:

    getattr(X, "Y")

Fundamentally, every attribute access in Python is reduced to a call to getattr(). It quickly gets tiring to write attribute accesses that way, however, especially for long chains of them; in fact, he would continue to use the sugary version for the other examples in his talk, he said.
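
For a chain like X.Y.Z, the calls nest, which is part of what makes the unraveled form tiresome; a quick illustration (not from the talk) using os.path.join:

    import os
    # os.path.join  ==  X.Y.Z, which becomes:
    assert getattr(getattr(os, "path"), "join") is os.path.join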

Sometimes, the language definition makes things a bit interesting, as with handling decorators. A Python decorator wraps a function definition, generally to transform the function in some fashion:

    # pre-decorators
    def Y():
        pass
    Y = X(Y)

    # using decorators
    @X
    def Y():
        pass

The decorator syntax was added to make it clearer what was going on right at the top of the function definition; if the function is long, it may be easy to miss that it is getting wrapped and re-assigned. But, as it turns out, the unraveled version cannot be the same as the pre-decorator version above; instead it must look like the following:

    def _Y():
        pass
    Y = X(_Y)

The language specification specifically requires that the new function name (Y) not be visible until after the wrapping has occurred, which surprised him when he was working on this. He believes that the decorator syntax is far more readable and is a great example of why syntactic sugar is added to the language. He knows that Python developers do not always agree with changes of that sort, pointing to the mess around the "walrus" operator (:=) as an obvious case in point, but "we have our reasons" for doing so.

He moved on to "special methods" (or "magic methods"), which are all of the "dunder" (double-underscore) method names on objects, __add__(), __index__(), __init__(), and so on. These are an area where syntactic sugar comes up the most and is the easiest to unravel, because those methods fundamentally implement the operations of interest. For example, ~X (using the bitwise inversion operator) unravels as: type(X).__invert__(X). That may be a little surprising, but the type() call is actually an optimization mechanism; it is much faster to retrieve __invert__() from the class than from the instance dictionary. In the C code, it just requires following a few pointers to make the call:

    (*Py_TYPE(X)->tp_as_number->nb_invert)(X);

Even though dictionary lookup is fast, the above is "stupid fast", he said.
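
The Python-level equivalence is easy to check interactively (an illustration, not from the talk):

    x = 5
    assert ~x == type(x).__invert__(x) == -6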

He moved on to the "most complex example we're going to have in this whole talk: binary operators". He said that attendees may think those operators should be fairly simple to unravel, but they will be surprised. His example was a simple subtraction: X - Y. The operator does not really matter as the process is the same; he chose subtraction because he wanted to show an operator where the order of the operands matters. For subtraction, the first step is:

    type(X).__sub__(X, Y)

That seems pretty straightforward, but what if X does not define __sub__() or if it returns NotImplemented for the type of Y? In that case, the second operand gets a chance to provide an implementation:

    type(Y).__rsub__(Y, X)

The language defines the __rop__() versions of the operators to handle this case. The dunder operators return NotImplemented if they are unable to work with the two objects passed. Another optimization is that if both operands have the same type, only the first dunder operator (__sub__() in this case) will be tried, since there is no point in asking again as it is already known that the two types cannot be accommodated.

But there are class hierarchies where a subclass needs to be able to override the parent class's operator, no matter which side of the operator it falls on. For example, if Y is a subclass of the class of X, the class of Y may want to ensure that subtraction always results in that type. There is a set of rules that Python follows to ensure that such a situation is properly handled: the two types must be different, the right-hand side (RHS), Y in this case, must be a subclass of the left, both operands must have an __rsub__(), and the __rsub__() for Y must be different than that of X. If all of those are true, the RHS can boss its parent around. "Got it all?", he asked with a smile.
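
Spelled out in Python, the whole dance looks roughly like the following sketch (an approximation of the rules as described; subtract(), lhs, and rhs are stand-in names, and CPython's actual logic lives in C and differs in detail):

    def subtract(lhs, rhs):
        # Rough sketch of the dispatch behind "lhs - rhs".
        lhs_type, rhs_type = type(lhs), type(rhs)
        # A proper subclass on the right goes first, if it overrides __rsub__.
        rhs_first = (
            lhs_type is not rhs_type
            and issubclass(rhs_type, lhs_type)
            and getattr(rhs_type, "__rsub__", None)
                is not getattr(lhs_type, "__rsub__", None)
        )
        if rhs_first and hasattr(rhs_type, "__rsub__"):
            result = rhs_type.__rsub__(rhs, lhs)
            if result is not NotImplemented:
                return result
        if hasattr(lhs_type, "__sub__"):
            result = lhs_type.__sub__(lhs, rhs)
            if result is not NotImplemented:
                return result
        # Same type: no point in asking the right-hand side again.
        if lhs_type is not rhs_type and not rhs_first and hasattr(rhs_type, "__rsub__"):
            result = rhs_type.__rsub__(rhs, lhs)
            if result is not NotImplemented:
                return result
        raise TypeError("unsupported operand type(s) for -")

    assert subtract(7, 3) == 4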

That is quite a bit of work to do every time that a binary operator is executed. That is why various efforts are made to avoid that overhead, such as Python extensions that move the operations into C code and the work being done on specialization by the Faster CPython team.

As his last example, he deconstructed if:

    if X:
        Y

    # can use while instead
    while X:
        Y
        break

However, that while version uses break, which is not one of the ten anointed syntax constructs. But break can be unraveled as well:

    _DONE = False
    while not _DONE and X:
        Y
        _DONE = True

But that contains two more pieces of syntax (not and and) that are not on the list. not is similar to the invert operation; it is just a special method call. In the talk, he did not discuss and at all, but for this particular case, he simply avoided both operators by using exceptions:

    try:
        while X:
            Y
            raise _DONE
    except _DONE:
        pass

"Once again, syntactic sugar is really frickin' handy, because I sure as heck would not want to write that every time I have an if." He almost forgot to point out that pass was also not on the list, but it could be replaced here with various things, such as None or a constant. His blog posts on not and and show other ways this could be handled.

He ended his talk by thanking his wife Andrea for proofreading most of the blog posts; he stopped asking after a while, so she is not responsible for typos in the later posts, he said with a chuckle. He also thanked Guido van Rossum for proofreading the class unraveling post, "which was huge". Beyond that, Van Rossum was always happy to answer the random questions Cannon posted to the python-dev mailing list about why things are the way they are in the language. Cannon suggested that those who are interested read the blog posts, perhaps starting with the last, "MVPy: Minimum Viable Python", which outlines the goals of the project and lists his ten syntactic elements.

In answer to an inaudible question, which must have referred to names like False and None, Cannon said that those were originally simply variable names and not syntax in Python, so he went back to that mode for his project. Those names were only turned into syntax as an aid to developers so that they did not overwrite them by mistake. The final question was about the != operator used in the subtraction example, which is, of course, also not on the list. Cannon acknowledged that, and pointed attendees at the rich comparison operators post for how that can be unraveled.



Python is (mostly) made of syntactic sugar

Posted Aug 29, 2023 16:34 UTC (Tue) by NYKevin (subscriber, #129325)

I would highly recommend reading his class blog post (linked at the end of the article). It is very interesting and gives a great explanation of classes and metaclasses in Python.

(Also, it is yet more evidence that zero-argument super(), despite being undeniably an improvement over classic two-argument super(), is an ugly hack internally.)

Python is (mostly) made of syntactic sugar

Posted Aug 30, 2023 1:05 UTC (Wed) by quotemstr (subscriber, #45331)

By this logic, isn't everything syntactic sugar over decrement and branch if zero? https://en.wikipedia.org/wiki/One-instruction_set_computer

A broader lesson, I think, about language design is that a "magical" standard library is a "smell". The ability to implement a language's standard library in the language itself is a sign that the language is sufficiently regular and powerful for general purpose use. The smaller the irreducible core, the better the language. C++, Java, Rust, and Python are all pretty good (if not perfect) by this metric. Go is at the other end of the spectrum, with a standard library full of magic.

Python is (mostly) made of syntactic sugar

Posted Aug 30, 2023 2:06 UTC (Wed) by tialaramex (subscriber, #21167)

I agree that the definition of "sugar" used here is greatly unsatisfactory.

When Rust says that its "for" loop is just sugar, they don't mean "You could hypothetically use a different syntax to write an equivalent loop", which isn't an interesting situation at all; they mean very literally that a de-sugaring algorithm takes your syntax and produces the hairier but semantically equivalent explicit "loop" syntax. As a result, Matt Godbolt's question about C++ (the one which led to the eventual creation of Compiler Explorer on his vanity domain) would not arise in Rust: "Does the fancy modern for-each syntax give the same machine code as my bare metal approach?" Yes, it is literally transformed by the compiler before optimisation.

On the other hand I can't agree about the "magical" smell.

In Rust, for example, you will find that not only does the Rust standard library (and especially core) _frequently_ use nightly features, it uses a lot of "nightly" features that are deliberately labelled as private internal features; these won't ever be "stabilised", and they exist precisely to achieve the "smell" you don't like.

You can't implement Box yourself; it's less magical than it was ten years ago, but it's still magic, and you're not allowed to do that. I have a published crate "nook" with niche types, but you can't use that in stable Rust, and you shouldn't use it even in nightly Rust because it's unsanctioned use of compiler internals; it's just a proof of concept I might return to some day.

In C++ there's no need for such explicit use of stuff that's off limits, because the standard authorises standard libraries to just do whatever they want if they have the blessing of the C++ implementer, so it's completely routine for the stdlib to be riddled with UB that has just been blessed (hopefully) by the compiler vendor as OK. Is it truly OK? Nobody cares.

In some cases the magic is just unavoidable, but it's also striking that Rust and C++ magic doesn't totally overlap. Much of MaybeUninit<T> is completely ordinary legal Rust you could have written. It's not all _safe_ Rust but it's not using any weird internals or magic. Meanwhile the equivalent C++ tricks are pure magic, the compiler vendor says it's OK for some reason to use this uninitialized memory - and everybody crosses their fingers. Given it took Rust a few years to realise their original design was wrong and invent MaybeUninit<T> who knows if those C++ compiler vendors were right.

Python is (mostly) made of syntactic sugar

Posted Aug 30, 2023 6:17 UTC (Wed) by rsidd (subscriber, #2582)

> When Rust says that its "for" loop is just sugar, they don't mean, "You could hypothetically use a different syntax to write an equivalent loop" which isn't an interesting situation at all, they mean very literally that a de-sugaring algorithm takes your syntax and produces the hairier but semantically equivalent explicit "loop" syntax.

Indeed, I had understood "syntactic sugar" to mean something purely to make the programmer's life easier, which is de-sugared by the compiler or interpreter into a 100% equivalent construct. Like how "a[i]" in C means "*(a+i)", and, weirdly, "i[a]" therefore means the same thing. Or, in python, list comprehensions are syntactic sugar for map + filter.
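
For instance, the loose equivalence I have in mind (a rough example):

squares_of_evens = [x * x for x in range(10) if x % 2 == 0]
assert squares_of_evens == list(map(lambda x: x * x,
                                    filter(lambda x: x % 2 == 0, range(10))))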

Whereas, his example of "while" being required, and "for"/"if" being definable in terms of while, would be an example of syntactic sugar only if the language itself defines "for" and "if" as convenient syntax for "while". AFAIK that's not how python handles "for" and "if", so those aren't syntactic sugar for python. They could be for some other language.

Python is (mostly) made of syntactic sugar

Posted Aug 30, 2023 9:30 UTC (Wed) by jwilk (subscriber, #63328)

> in python, list comprehensions are syntactic sugar for map + filter

No, they're not (in the strict sense you're proposing).

Python is (mostly) made of syntactic sugar

Posted Aug 31, 2023 6:07 UTC (Thu) by rsidd (subscriber, #2582)

I stand corrected: I was probably misled by articles using "syntactic sugar" in the loose sense. Though I don't see why it couldn't be done as syntactic sugar, i.e. what does it add to map/filter?

Python is (mostly) made of syntactic sugar

Posted Sep 5, 2023 17:27 UTC (Tue) by NYKevin (subscriber, #129325)

As of Python 3.6, you can use async for in a list comprehension to process an asynchronous iterator (i.e. an iterator whose next()-equivalent has to be awaited). As far as I can tell, there is no map/filter-like API for this functionality, so it has to be written either as a comprehension or as an explicit loop.

(Of course, you could probably implement your own async map function, but I'm just focusing on the stdlib as it exists out of the box.)

Python is (mostly) made of syntactic sugar

Posted Aug 30, 2023 8:47 UTC (Wed) by gmgod (subscriber, #143864)

By that definition, most languages are syntactic sugar... E.g. everything that has an LLVM frontend would be considered "just syntactic sugar for LLVM IR", which anyone can use directly to code portable programs. In my definition of sugar, it is a convenience that does not change the guarantees and behaviour of a program (i.e. its semantics).

It's nitpicking on a title (that I find bad, obviously) but it also has deeper implications: what is sugar and what is canonical is mostly subjective, and by that definition, the vast majority of what defines any specific language is only sugar. E.g. he chose `while` as a basic construct, when someone else could have used conditionals (e.g. the functional way). E.g. #2, assignments are absolutely not required to write a Python program, as they can be replaced by function calls.

Actually there is prior work on this, and a lot of it. Probably the best known nowadays is the brainfuck language. It is Turing-complete with 8 simple commands (including 2 for IO). So by this standard, all can be expressed with byte/pointer increase/decrease and conditional jumps... a different set of canonical operations. Everything else is sugar, since everything else is optional. (NB: the purpose of the language was to be minimal in order to allow the smallest possible compiler; however, minimalism comes with a search for the smallest and easiest-to-express set of canonical constructs in a language, which is the subject here.)

At any rate, and that's the most important point, something like a `for`-loop and a `while`-loop don't have the same semantics: they don't offer the same guarantees; even more so in Python, where `for` uses iterators and can therefore bypass bounds checks and, if it were an optimizing interpreter, potentially vectorize most `for`-loops as well. They do similar things, and ultimately one can be expressed as a subcase of the other, so yes, one can be considered redundant for achieving Turing-completeness and could be implemented as sugar for the other in a specific language; but semantically they are not equivalent, and it's not just a matter of "hiding away some boilerplate".

Python is (mostly) made of syntactic sugar

Posted Aug 30, 2023 13:31 UTC (Wed) by farnz (subscriber, #17727)

This is perhaps why this definition of syntactic sugar (something you can implement in terms of a "core" language) is not as useful as the other definition, of syntax whose meaning is fully defined in terms of a different language construct. By this definition, Rust for loops are syntax sugar, but Python doesn't have any syntax sugar in the core language (since even comprehensions are defined as their own thing, not in terms of other language constructs).

This definition of "syntax sugar" is useful, since it allows you to identify the parts of the language that cannot add new semantics (since their semantics are defined by the language constructs you use to desugar them). In theory, for example, there could be something you can express with a list comprehension in Python that cannot be expressed any other way, and you'd have to look into the details of how they're defined to be sure that they don't add to the semantics of the language.

Note, too, that even under this definition of syntax sugar, the implementation does not have to desugar before processing a sugared statement - a Rust compiler can directly process a for loop, as long as it's equivalent to desugaring a for loop into the equivalent defined by the language and then processing the desugared form. Some people reserve the term "syntax sugar" for things that are "elaborated" into the core language before processing by the implementation (i.e. things that are effectively preprocessed before compilation or interpretation), but I find that definition too strict; it would, for example, exclude the Rust for loop, since while this is equivalent semantically to a desugared form, the implementation doesn't necessarily desugar then compile.

Python is (mostly) made of syntactic sugar

Posted Aug 30, 2023 14:02 UTC (Wed) by gmgod (subscriber, #143864)

I think we are basically saying the same thing in your two first paragraphs.

> the implementation does not have to desugar before processing a sugared statement

That's where we don't agree. If you tell me

for i in range(10):
    print(i)

desugars into

i = 0
while i < 10:
    print(i)
    i += 1

it has to do exactly that so that the sugared version and the unsugared ones are strictly considered the same by the language, i.e. swapping one for the other does the same, feature for feature, bug for bug.

So in Python, the interpreter does not have to explicitly convert one to the other, but the generated AST has to be the same, always, guaranteed... otherwise it's not "sugar". If one version is faster, more memory-efficient, processes things in a different order, or is blatantly doing something merely similar to the other, even if "in spirit" they are the same thing, it's not syntactic sugar as generally accepted. It's a different implementation with different properties, and there is no "desugaring" relationship between them, i.e. one implementation can evolve differently from the other.

What the guy doing this work originally expressed is the fact that Turing-completeness in Python is achievable with fewer constructs than what is made available, since redundant constructs *can* be implemented in terms of others. All I was saying is that I don't like his definition of "syntactic sugar" because, if you look carefully at it, it is useless: anything beyond the 8 instructions defined in BF (among others) is "sugar", and the set he is choosing as canonical is ultimately arbitrary.

It's not a critique of the work itself: it's pretty fun to read actually! It's just that when you start with a wobbly dichotomy that you use constantly for classification, it just gets wobblier.

Python is (mostly) made of syntactic sugar

Posted Aug 30, 2023 15:01 UTC (Wed) by farnz (subscriber, #17727)

I don't think your disagreement is of any value; you cannot observe the intermediate ASTs between the fully sugared one that you start with, and the bytecode that the Python interpreter executes, and as such, it's not relevant whether or not there exists an intermediate AST where the desugaring has happened, or whether the implementation acts as-if such an AST exists.

What matters is that if I tell you that:


for i in range(10):
    print(i)

desugars into


i = 0
while i < 10:
    print(i)
    i += 1

then there must be no way for you to observe a difference between the two forms - no performance difference, no behaviour difference, nothing.

Saying that there has to be an intermediate AST that is in the desugared form simply forces the implementation down a lower-performance path, even if there are routes to generating the output code directly from the sugary form that are faster but semantically the same.

And it's the semantics that matter - if I say that "for" desugars like the above, what I'm saying is that the semantics of a for loop are exactly the same as the semantics of the while loop it desugars into - there's no distinction between the two, bar readability.

His definition has the problem that the semantics are close, but not necessarily the same. It also has the problem that there are other ways to minimise the Python language that come to a different set of "required" constructs; he seems to be doing a two-step thing of defining a minimal Python subset, and then showing that you can implement the rest of Python (more or less) as syntactic sugar atop that minimal subset.

Python is (mostly) made of syntactic sugar

Posted Aug 30, 2023 15:35 UTC (Wed) by NYKevin (subscriber, #129325)

IMHO if you're going to require this level of equivalence, then the only way to desugar Python is to talk of the C API (every statement in Python can ultimately be rewritten as a bunch of C calls that the bytecode interpreter otherwise would've made on your behalf). But then you're not really speaking of "sugar" so much as the entire interpreter frontend.

Python is (mostly) made of syntactic sugar

Posted Aug 30, 2023 15:09 UTC (Wed) by Wol (subscriber, #4433)

Makes me think of both FORTRAN, and DataBasic.

The first FORTRAN compiler I met basically compiled everything into a load of function calls. I think the only stuff it actually compiled into machine code was the control structures, IF and GOTO. (And presumably basic arithmetic.)

So pretty much the entire rest of the language could be described as syntactic sugar, defined in terms of the underlying function calls.

That was a Pr1me compiler, so I guess the people who wrote InfoBasic (the Pr1me version of Pick) knew all about this, and the language was defined as "what these function calls do". Over time, the core function calls were converted into statements the compiler recognised, so yes, I'd probably describe almost all of DataBasic as syntactic sugar, because it's defined as "what the underlying function calls do". It's just that with a statement it's much easier to understand what on earth is going on.

Cheers,
Wol

Python is (mostly) made of syntactic sugar

Posted Aug 30, 2023 15:28 UTC (Wed) by NYKevin (subscriber, #129325)

> E.g. he chose `while` as a basic construct, when someone else could have used conditionals (e.g. the functional way).

That doesn't work. Python has no tail recursion and a fixed maximum call stack depth. You're giving up Turing completeness if you convert everything to recursion.

Python is (mostly) made of syntactic sugar

Posted Aug 30, 2023 15:39 UTC (Wed) by mb (subscriber, #50428)

>Python has no tail recursion and a fixed maximum call stack depth.
>You're giving up Turing completeness if you convert everything to recursion.

By that definition, no recursive algorithm is Turing complete on any real machine.
All machines and languages have some fixed maximum call-stack depth limit, because memory is limited.

Real Turing completeness requires infinite memory. And because we don't have that yet, we usually ignore memory limits.

Python is (mostly) made of syntactic sugar

Posted Aug 30, 2023 17:48 UTC (Wed) by NYKevin (subscriber, #129325)

It's not that easy.

When we speak of "limited" memory in the context of Turing completeness, we usually mean limited *heap* memory. Heap memory is only strictly bounded by two things:

1. The amount of physical memory (plus swap) in the system.
2. The size of the virtual address space.

In practice, we don't see this as a "real" limit in the context of Turing completeness, because (1) can be remedied by adding more RAM (or swap), and (2) is huge on 64-bit systems.

However, the Python recursion limit is bounded by the size of the C stack.[1] If you try to increase it beyond what the stack size can handle, it'll overflow and cause UB. To some extent, it is possible to increase the size of the stack at runtime, but most modern operating systems are designed under the assumption that your stack size is no larger than (say) several megabytes or so. If you want to use a multi-gigabyte stack, I imagine that at least some operating systems will let you do that, but I find it hard to believe that they will perform gracefully under those conditions.

The other problem, of course, is that Python does not offer a standard facility for increasing the size of its C stack. You have to go monkeying around with ctypes or something and call into an OS-specific C API, so you can argue that "make a bigger C stack" is not within the competency of the language proper. In this interpretation, the Python recursion limit can only be safely adjusted up to whatever the default stack size will accommodate, and then it's not a matter of "limited memory" at all (the default stack size is a fixed value that does not depend on the amount of physical memory installed).

[1]: https://docs.python.org/3/library/sys.html#sys.setrecursi...

Python is (mostly) made of syntactic sugar

Posted Aug 30, 2023 18:48 UTC (Wed) by mb (subscriber, #50428)

What's your point? I know how Python and C work w.r.t. the stack.

It's been said that recursive Python cannot be Turing complete, because recursion is limited.
But that would mean that no recursive language could ever be Turing complete, because they all hit a limit at some point.
Therefore, I don't consider the original statement valid.

Python is (mostly) made of syntactic sugar

Posted Aug 30, 2023 19:18 UTC (Wed) by NYKevin (subscriber, #129325)

> But that would mean that no recursive language could ever be turing complete, because they all hit a limit at some point.

Most if not all "recursive" languages (as you call them) do tail call elimination as a mandatory language feature (i.e. the language explicitly specifies that tail calls will not cause stack overflow), which is enough to give you access to the equivalent of an unrestricted while loop. The fact that you can also write non-tail recursion is irrelevant, because you could always rewrite it as tail recursion with a separate stack that is passed as an explicit parameter.
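
(A rough sketch, using an accumulator as the simplest form of that explicit parameter:)

# Non-tail recursion; each frame has work left after the recursive call:
def fact(n):
    return 1 if n == 0 else n * fact(n - 1)

# Tail-recursive rewrite; a language with mandatory tail-call
# elimination runs this in constant call-stack space:
def fact_tail(n, acc=1):
    return acc if n == 0 else fact_tail(n - 1, acc * n)

assert fact(5) == fact_tail(5) == 120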

Python, on the other hand, not only does not provide tail call elimination as a language feature, it does not even provide it as an optimization.

Python is (mostly) made of syntactic sugar

Posted Aug 30, 2023 19:48 UTC (Wed) by mb (subscriber, #50428)

>because you could always rewrite it as tail recursion with a separate stack that is passed as an explicit parameter.

Which will eventually be full, just like the Python call stack.
There's no fundamental difference. It's just an implementation detail.

Python is (mostly) made of syntactic sugar

Posted Aug 30, 2023 19:51 UTC (Wed) by NYKevin (subscriber, #129325)

The "separate stack" would be a heap-allocated object, so as I already described upthread, its limit is much closer to the "we can just shove more RAM in the computer as needed" model than the "there is a static limit that must not be exceeded" model.

Python is (mostly) made of syntactic sugar

Posted Aug 30, 2023 15:30 UTC (Wed) by mb (subscriber, #50428)

I don't think the way syntactic sugar is defined here is very useful.
If we can define anything that produces the same result as sugar for the other thing, then all languages are just syntactic sugar for assembly code. And assembly code is just syntactic sugar for microcode, and so on, until we hit quantum physics limits.

IMO a much more useful definition would be:
Syntactic sugar is, if a compiler during compilation *actually* transforms one language construct into another language construct of the same language *and* that second language construct could instead have been written by the developer just like that in the first place. If both of these conditions are met, then the first construct is sugar.

Everything else is just an alternative or an abstraction, but not sugar.

Python is (mostly) made of syntactic sugar

Posted Aug 30, 2023 15:51 UTC (Wed) by Wol (subscriber, #4433)

I'd go along with that. Just phrase it differently:

Syntactic sugar is when certain language syntax is defined in terms of other features *of the same language*.

Actually, I don't think our definitions are quite the same, in that your definition is subject to compiler bugs, and mine to implementation bugs, but in a perfect world they're identical.

But it's the inclusion of "using the same language" that is crucial to the definition of sugar - as soon as you appeal to a different language it can't be sugar.

Cheers,
Wol

Python is (mostly) made of syntactic sugar

Posted Aug 30, 2023 15:57 UTC (Wed) by mb (subscriber, #50428)

>But it's the inclusion of "using the same language" that is crucial

Yep, but it's also crucial to include either my "the compiler actually does..." or your "the language actually defines...".

It's not enough if the language *could* implement/define it as such, because then you end up with claims that 'if' is just sugar for 'while'. Which is just as arbitrary as defining it the other way around with recursion.

Python is (mostly) made of syntactic sugar

Posted Aug 30, 2023 16:55 UTC (Wed) by mathstuf (subscriber, #69389)

How would you classify Haskell Core?

https://serokell.io/blog/haskell-to-core

Python is (mostly) made of syntactic sugar

Posted Aug 30, 2023 19:55 UTC (Wed) by Wol (subscriber, #4433)

Looking at that article, it appears Haskell is actually defined in terms of Core.

So imho none of Haskell is syntactic sugar, in that no features are defined in terms of other *Haskell* features. But Haskell itself is a sugar layer over Core.

Bit like git - you have your core (the plumbing) and then the porcelain on top. The porcelain here IS syntactic sugar, because it's defined in terms of the underlying plumbing, but it's all git, it's all fundamentally the same language.

And yep, I'm well aware that this (taken to its logical conclusion) defines pretty much any language as a sugar layer over assembly.

But, certainly at first glance and not knowing Haskell or Core, they look rather different. What would happen if you fed pure Core into a Haskell compiler? Would it complain? If the answer is "yes", then Haskell is not Core Syntactic Sugar. If "no", and the Haskell constructs are defined using Core, then it is.

Cheers,
Wol

Python is (mostly) made of syntactic sugar

Posted Aug 30, 2023 17:35 UTC (Wed) by farnz (subscriber, #17727)

I think a more charitable explanation is that he's done two things: first, defined a minimal language that's a subset of Python. Then, he's demonstrated that he can define the rest of Python as syntactic sugar atop his minimal language.

This is an interesting exercise, but it carries with it the issue that his minimal language is not the only minimal subset of Python in which you can define the rest of the language as syntactic sugar.

And I disagree with your definition of syntactic sugar - the second half is good, but the first half is unnecessarily implementation-focused. I prefer the definition of syntax sugar where A is syntax sugar for B if you could have written B instead of A, and without looking at the source code, there's no form of A that can be distinguished from B in terms of the implementation's output (by behaviour, performance, or inspecting object code or bytecode), which permits the compiler room for as-if implementations where it (e.g.) directly lowers A into the same IR as it lowers B into, without ever converting A into B.

Python is (mostly) made of syntactic sugar

Posted Aug 30, 2023 18:52 UTC (Wed) by mb (subscriber, #50428)

>which permits the compiler room for as-if implementations where it
>(e.g.) directly lowers A into the same IR as it lowers B into, without ever converting A into B.

With that definition every optimization input is syntactic sugar.

Python is (mostly) made of syntactic sugar

Posted Aug 31, 2023 9:25 UTC (Thu) by farnz (subscriber, #17727)

No, because that's not part of the definition of syntactic sugar; that's something syntactic sugar defined the way I prefer it to be defined permits (in that it permits the compiler to do something optimal instead of having to go the slow route).

The definition is that A is syntactic sugar for B if I could have written B instead of A in my source code, and there is no way to tell whether I wrote A or B without inspecting the source code; I accidentally omitted that A's semantics are defined in terms of its rewrite to B, not in their own right.

Python is (mostly) made of syntactic sugar

Posted Aug 30, 2023 21:16 UTC (Wed) by dvdeug (subscriber, #10998)

"The first element that you need is integers, which are the building blocks of lots of other things. A float can be represented as integers..."

Integers can be represented as booleans, can't they? A float could be represented as integers, but at a hardware level, unless you've got floating-point emulation, they're very different from integers, and there's no practical way that a compiler can turn actions on integers into floating-point code. (That's an interesting definition of syntactic sugar, somewhere between "will it always run the same code" (proposed in these comments by tialaramex) and the "theoretically equivalent" used by the article, that is, "could a good optimizing compiler produce the same code".)

Python is (mostly) made of syntactic sugar

Posted Sep 4, 2023 7:29 UTC (Mon) by cpitrat (subscriber, #116459)

Didn't go far enough. All you need is NAND gates, the rest is just syntactic sugar.

