PEP 204 - Range Literals: Getting closure

Almost 23 years ago, PEP 204 was rejected. It is arguably the single most requested feature over the years, appearing on the python-ideas mailing list more often than I can remember. It is about the infamous “Range Literals”.

The final resolution was:

After careful consideration, and a period of meditation, this proposal has been rejected. The open issues, as well as some confusion between ranges and slice syntax, raised enough questions for Guido not to accept it for Python 2.0, and later to reject the proposal altogether. The new syntax and its intentions were deemed not obvious enough.

This proposal is over two decades old. It wanted to reuse slice syntax on its own to generate a list, the same as the old range function would, with a few caveats and undefined behaviors.

In light of all the changes Python has received since then (the latest version was Python 1.5 at the time), we can see that the concept was really poor, so it made sense to reject it. For example, it wasn’t until two years later that iterators were properly introduced (PEP 234). Can you even write code without iterators nowadays?

But what got me to write this is at the very end, where it says:

[ TBD: Guido, amend/confirm this, please. Preferably both; this is a PEP, it should contain all the reasons for rejection and/or reconsideration, for future reference. ]

Can we get either of:

  1. Some closure: have GvR or someone from the Steering Council edit this with all the reasons and put a nail in the coffin.
  2. A new proposal that uses all the modern technology and knowledge acquired by the Python community to design a range literal syntax and functionality, which is then formally rejected (or accepted, who knows) so this question won’t pop up again.

Best regards


+1 This is worth a look with fresh eyes. We’ve had almost a quarter century of evolution since this was first considered.


If you want to drive a fresh look at this, please create a new PEP. Don’t reopen 204. The new PEP can refer to 204 though.


Personally, I would very much like to see a new PEP on range literals, and I (like probably everyone else in the community) have some opinionated views on what range literals could be, so I’d welcome some dialogue before a PEP is written. Am I right?

In any case, here is my take on what range literals would be if we reuse slice syntax for consistency, but lifting the several limitations from the original document:

Notice how the square brackets aren’t part of the slice, so there is no requirement that range literals have them, even if we follow slice syntax.

mylist[start:stop:step]
mylist[slice(start, stop, step)]

Also, multiple comma-separated slices are already accepted, so even with slice syntax, a slice need not be directly surrounded by square brackets.

mylist[1:2, 3:4, 5:6]

Going crazy for a minute:

class GimmeSlice:
    def __getitem__(self, thing):
        return thing

>>> GimmeSlice()[1:lambda x: 3: lambda y: 5, [x for x in range(10)]:[]:f'{1+1}']
(
    slice(1, <function <lambda> at 0x116cb2ca0>, <function <lambda> at 0x116cb19e0>),
    slice([0, 1, 2, 3, 4, 5, 6, 7, 8, 9], [], '2')
)

We can already write arbitrary expressions separated by colons, with a set precedence (see the unparenthesized lambdas). This suggests there isn’t much stopping us from using slices anywhere, as happened with the Ellipsis literal (...), which used to be valid only inside subscripts.

That way, a range literal could look something like:

foo = 1:10
bar = :len(foo):2

for x in 1:2:
    for y in 1:2::
        print('The last colon is the one requested by the for block.')

my_dict = {'a': (1:5)} # Parentheses could be required here

my_range = 1 : 10 : 2  # spaces allowed same as getitem syntax.

With a simple class like this, we can cover all the use cases of the range function and raise an error on invalid ones, such as non-integers and a missing stop:

class Range:
    def __getitem__(self, a_slice):
        if not isinstance(a_slice, slice):
            # Range()[10] is the same as range(10)
            return range(a_slice)
        # range() itself raises TypeError for a None stop or non-integer values
        return range(
            a_slice.start if a_slice.start is not None else 0,
            a_slice.stop,
            a_slice.step if a_slice.step is not None else 1,
        )

Range()[1:10]
Range()[3:15:2]
Range()[:10]  # (or also Range()[10])
Range()[1:]  # Error; it could also be an infinite generator, such as itertools.count
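The comment on the last line suggests an alternative to raising: treating a missing stop as an open-ended count. A minimal sketch of that variant, assuming we want itertools.count rather than an error (the class name OpenRange is hypothetical, not part of any proposal):

```python
import itertools


class OpenRange:
    """Hypothetical variant of Range: a missing stop yields an infinite count."""

    def __getitem__(self, key):
        if not isinstance(key, slice):
            # OpenRange()[10] behaves like range(10)
            return range(key)
        start = 0 if key.start is None else key.start
        step = 1 if key.step is None else key.step
        if key.stop is None:
            # No upper bound: fall back to an infinite arithmetic progression
            return itertools.count(start, step)
        return range(start, key.stop, step)


R = OpenRange()
print(list(R[3:15:2]))                   # [3, 5, 7, 9, 11, 13]
print(list(itertools.islice(R[1:], 3)))  # [1, 2, 3]
```

The trade-off is that R[1:] silently becomes infinite, so callers must bound it themselves (here with islice).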

But now here is an important question: why are slices just a 3-item struct with no functionality other than a method called indices, for which the docs say “You probably do not want to use this function.”? That’s right, no one does.
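For reference, slice.indices(length) resolves None and negative values against a concrete sequence length and returns a (start, stop, step) tuple that can be passed straight to range(). A small illustration of what it computes:

```python
s = slice(-3, None)
print(s.indices(10))                # (7, 10, 1)
print(list(range(*s.indices(10))))  # [7, 8, 9]

# A negative step resolves its defaults from the other end
print(slice(None, None, -1).indices(5))  # (4, -1, -1)

# The resulting range visits exactly the positions the slice selects
data = list("abcdefghij")
assert [data[i] for i in range(*s.indices(len(data)))] == data[s]
```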

Can’t the slice class be extended to generate useful things? For example, why can’t we reverse the roles so that, instead of an object handling a slice, the slice handles the object, like itertools.islice does?
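itertools.islice already lets a slice-like description drive an arbitrary iterable. A minimal sketch of a helper that applies an existing slice object to any iterable (the name apply_slice is hypothetical; note that islice rejects negative values, unlike sequence slicing):

```python
import itertools


def apply_slice(s: slice, iterable):
    """Hypothetical helper: let a slice object 'handle' any iterable lazily."""
    # islice accepts None for start/stop/step, but raises ValueError for negatives
    return itertools.islice(iterable, s.start, s.stop, s.step)


squares = (n * n for n in itertools.count())
print(list(apply_slice(slice(2, 10, 3), squares)))  # [4, 25, 64]
```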

For example, iter() asks an object to return an iterator, and these are all objects too; in fact, there are low-level iterator classes, defined in C, for each of the built-in data types:

>>> iter([])
<list_iterator at 0x116f34760>
>>> iter('')
<str_ascii_iterator at 0x116e69420>
>>> iter(b'')
<bytes_iterator at 0x1181abf10>
>>> iter({})
<dict_keyiterator at 0x116c49300>
>>> iter(range(1))
<range_iterator at 0x1181abea0>

There are others for tuples, sets, dict values, dict items and so on.
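To see which concrete iterator class each built-in container hands back, one can print the type name directly. The exact names are CPython implementation details (str iterators, for instance, may be reported as str_ascii_iterator on recent versions), so treat the output as illustrative:

```python
containers = {
    "list": [],
    "tuple": (),
    "set": set(),
    "dict": {},
    "dict.values()": {}.values(),
    "dict.items()": {}.items(),
    "range": range(1),
    "bytes": b"",
}

for label, obj in containers.items():
    # type(...).__name__ is the name of the C-level iterator class
    print(f"{label:>14} -> {type(iter(obj)).__name__}")
```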

But what if we want more? What if we could pack everything (range, slice, islice, iterators) into a single powerful entity and then define a special syntax for it?

# Sketch of the proposal: these methods would be added to the built-in types
import itertools

class int:
    @classmethod
    def __range__(cls, start, stop, step):
        if start is None:
            start = 0
        if step is None:
            step = 1
        if stop is None:
            return itertools.count(start, step)
        return range(start, stop, step)

class str:
    @classmethod
    def __range__(cls, start, stop, step):
        if not start or not stop:
            raise ValueError('start and stop required')
        start = ord(start)
        stop = ord(stop) + 1  # closed range
        if step is None:
            step = 1
        return map(chr, range(start, stop, step))

class float:
    @classmethod
    def __range__(cls, start, stop, step):
        return np.arange(start, stop, step)  # replace numpy with a similar implementation

class slice:
    def range(self):
        dtype = None
        if self.start is not None:
            dtype = type(self.start)
        if self.stop is not None:
            if dtype is not None and dtype is not type(self.stop):
                raise ValueError('start and stop must be of same type')  # Maybe we could check if they share the same __range__ method, as in both subclass the same class
            dtype = type(self.stop)
        if dtype is None:
            dtype = int  # arbitrary, for integer range compatibility

        return dtype.__range__(self.start, self.stop, self.step)

Then list(1:5) would execute as:

list(slice(1, 5, None).range())
list(int.__range__(1, 5, None))
list(range(1, 5))
[1, 2, 3, 4]

And 'a':'f' would execute as:

slice('a', 'f', None).range()
str.__range__('a', 'f', None)
<map object at 0x123456>

Which would become 'abcdef' with the help of ''.join()
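Built-in types can’t be given a new __range__ method today, but the dispatch logic itself can be tried out. A runnable emulation, assuming a registry keyed by exact type stands in for the dunder (RANGE_IMPLS, register, int_range, str_range and slice_range are all hypothetical names; a real proposal would presumably look the method up through the MRO so subclasses work):

```python
import itertools
from typing import Callable

# Hypothetical stand-in for the proposed __range__ dunder: built-in types
# can't be given new methods, so a registry maps each type to a function.
RANGE_IMPLS: dict[type, Callable] = {}


def register(tp: type):
    """Decorator that records func as the range implementation for tp."""
    def decorator(func: Callable) -> Callable:
        RANGE_IMPLS[tp] = func
        return func
    return decorator


@register(int)
def int_range(start, stop, step):
    start = 0 if start is None else start
    step = 1 if step is None else step
    if stop is None:
        return itertools.count(start, step)  # open-ended, like 1:
    return range(start, stop, step)          # half-open [start, stop)


@register(str)
def str_range(start, stop, step):
    if not start or not stop:
        raise ValueError("start and stop required")
    step = 1 if step is None else step
    # closed range: 'a':'f' includes 'f'
    return map(chr, range(ord(start), ord(stop) + 1, step))


def slice_range(s: slice):
    """Emulates the proposed slice.range(): dispatch on the endpoint types."""
    dtype = None
    if s.start is not None:
        dtype = type(s.start)
    if s.stop is not None:
        if dtype is not None and dtype is not type(s.stop):
            raise ValueError("start and stop must be of same type")
        dtype = type(s.stop)
    if dtype is None:
        dtype = int  # default, for compatibility with integer ranges
    # Exact-type lookup: a KeyError here means the type has no range support
    return RANGE_IMPLS[dtype](s.start, s.stop, s.step)


print(list(slice_range(slice(1, 5))))         # [1, 2, 3, 4]
print("".join(slice_range(slice("a", "f"))))  # abcdef
```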

I realize there could be some clash with the annotation syntax, but it might be possible to overcome it with parentheses wherever the two can appear together.


I like your “going crazy” @jbo !

I don’t fully understand why str ranges would be inclusive ['a', 'f'], while int ranges are open [1, 5). Please elaborate.

Note that slice has fewer methods than range. For one, slice is unhashable. Also, slice doesn’t have e.g. __contains__.
For slice and range to be used interchangeably, their API needs consideration.
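A quick illustration of the API gap: range supports membership tests and sequence methods, while the same bounds as a slice support neither:

```python
r = range(0, 10, 2)
s = slice(0, 10, 2)

print(4 in r, 5 in r)      # True False: range supports membership tests
print(r.index(6), len(r))  # 3 5: and sequence methods such as index() and len()

try:
    result = 4 in s        # slice has no __contains__ and isn't iterable
except TypeError as exc:
    print("TypeError:", exc)
```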

I’d also welcome Cartesian products of ranges: (3, 4) in 0 : 100 @ 0 : 100.
And when continuing: annotations with ranges to document a function’s (co)domain, e.g. in module math: def asin(x: -1.0 : +1.0) -> -pi/2 : +pi/2: ....
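Until something like @ exists for ranges, membership in a Cartesian product of ranges can already be checked coordinate by coordinate without materialising the product. A small sketch (the helper name in_product is hypothetical):

```python
def in_product(point, *ranges):
    """True if each coordinate of point lies in the corresponding range."""
    # range membership is O(1) for ints, so the 100x100 grid is never built
    return len(point) == len(ranges) and all(
        value in r for value, r in zip(point, ranges)
    )


print(in_product((3, 4), range(100), range(100)))    # True
print(in_product((3, 400), range(100), range(100)))  # False
```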

I see many challenges and objections (parsing by humans and computers is one of them), but if some of this would become real, I’d definitely use it.


While ranges and slices appear similar, their usage and behaviour is quite different, so, as mentioned on the rejection of the PEP, I think it’s confusing to reuse the slice syntax for ranges. I’d expect that the syntax would make a slice, but a slice isn’t iterable so that wouldn’t work. Why would the same syntax make a slice in one place and a range in another?

Slices have the benefit of flexibility in the interpretation of their meaning (e.g. ... in numpy array indexing, or negative values in start/end for indexing from the end of a sequence), whereas ranges are much more constrained, which allows them to be iterable. I can’t see how these things would be combined. When passing arbitrary objects into the new syntax, what decides how iteration works?

The conflicts with the current uses of colons also seem pretty undesirable, especially the clash with colons starting blocks, although the PEP’s usage of brackets fixes this (though it has the issue that it looks like a list containing a slice).

An alternative syntax would be using .., which I believe would almost work without conflict, other than in the case 1..end_var, which is currently accessing an attribute on a float literal, a pattern that’s probably (and rightly) rarely used, but would be a breaking change nonetheless. If this was like a current range, it would only work for integers, which is a bit of a shame.
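The breaking case is easy to check today: 1. is a complete float literal, so 1.. followed by a name is attribute access on that float:

```python
value = 1..real          # parsed as (1.).real
check = 1..is_integer()  # parsed as (1.).is_integer()
print(value, check)      # 1.0 True
```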

To allow flexibility, maybe a new dunder method would be needed. __range__ could be called on the start object, then fall back to __mrange__ (middle :P) and __rrange__ for stop and step!
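A sketch of how such a fallback might be dispatched, assuming it mirrors the __add__/__radd__ protocol (make_range, __range__ and __rrange__ are hypothetical; __mrange__ for the step is left out for brevity):

```python
def make_range(start, stop, step=None):
    """Hypothetical protocol modelled on __add__/__radd__ dispatch."""
    # 1. Ask the start object: start.__range__(stop, step)
    forward = getattr(type(start), "__range__", None)
    if forward is not None:
        result = forward(start, stop, step)
        if result is not NotImplemented:
            return result
    # 2. Fall back to the stop object: stop.__rrange__(start, step)
    reflected = getattr(type(stop), "__rrange__", None)
    if reflected is not None:
        result = reflected(stop, start, step)
        if result is not NotImplemented:
            return result
    raise TypeError(
        f"no range between {type(start).__name__!r} and {type(stop).__name__!r}"
    )


class Day:
    """Toy ordinal-day type that knows how to build ranges of itself."""

    def __init__(self, n: int):
        self.n = n

    def __repr__(self):
        return f"Day({self.n})"

    def __range__(self, stop, step):
        if not isinstance(stop, Day):
            return NotImplemented
        return [Day(n) for n in range(self.n, stop.n, step or 1)]

    def __rrange__(self, start, step):
        if not isinstance(start, int):
            return NotImplemented
        return [Day(n) for n in range(start, self.n, step or 1)]


print(make_range(Day(1), Day(4)))  # [Day(1), Day(2), Day(3)]
print(make_range(1, Day(3)))       # [Day(1), Day(2)]
```

In the second call, int has no __range__, so the stop object gets its chance, exactly like a reflected operator.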

Thanks for raising this, it’s an interesting proposal to revisit.

I don’t remember range literals appearing in Python-Ideas even once, so it has appeared more times than I remember too! :wink:

A very quick and in no way exhaustive google suggests that there was a lot of discussion on Python-Dev in 2000 and 2001 when the PEP was written, a short discussion again in 2006 on the Python-List and Python-3000 mailing lists, and very little apart from that.

When I started learning Python, I carried the Pascal-like iteration idiom into my code:

for i in range(len(mylist)):
    item = mylist[i]
    process(item)

but even in Python 1.5 days that was unpythonic code unless the item is being modified and re-written back to the sequence. In modern Python, even that no longer requires range:

for index, item in enumerate(mylist):
    mylist[index] = process(item)

Are there still common use-cases for range that are frequent enough to justify saving typing a few characters with a literal?

I admit to being almost partial to the Ruby range syntax, except that they get the dots backwards. Clearly 1..10 should be exclusive of 10 and 1...10 should be inclusive of 10. The extra dot should indicate that the range extends further, not less.

I do sometimes use range, but usually interactively in the REPL, when I need to exhaustively search some problem space. Two recent examples:

  1. iterating over a range of floats, using the struct module to cast a 64-bit int into an IEEE-754 float.
  2. iterating over Unicode code points.

In neither case was I particularly interested in the range object itself, or the integer counter except as a means to get something else.
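For the first use case, reinterpreting a 64-bit integer as an IEEE-754 double can be done with the struct module by packing as an unsigned 64-bit integer and unpacking as a double. A sketch (bits_to_float is a hypothetical helper name):

```python
import struct


def bits_to_float(bits: int) -> float:
    """Reinterpret the 64 bits of an unsigned integer as an IEEE-754 double."""
    return struct.unpack("<d", struct.pack("<Q", bits))[0]


# Walk the doubles just above 1.0 by counting their bit patterns
for bits in range(0x3FF0000000000000, 0x3FF0000000000003):
    print(bits_to_float(bits))
```

For positive doubles, consecutive bit patterns correspond to consecutive representable values, which is what makes an exhaustive search over a range of ints work.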

In the first case, looking back now, I could have written an iterator:

import math

def floats(start=0.0):
    x = start
    while True:
        yield x
        x = math.nextafter(x, math.inf)  # next representable float (Python 3.9+)

In the second case, I could have used character range syntax for char in '\0'...'\U0010FFFF' if it existed. But since it doesn’t, I don’t know that my use-case would justify adding it.
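Without new syntax, the character range can be spelled with chr and ord today. A sketch of an inclusive helper (char_range is a hypothetical name):

```python
def char_range(first: str, last: str):
    """Hypothetical helper: every character from first to last, inclusive."""
    return map(chr, range(ord(first), ord(last) + 1))


print("".join(char_range("a", "f")))  # abcdef

# All Unicode code points, the equivalent of '\0'...'\U0010FFFF'
count = sum(1 for _ in char_range("\0", "\U0010FFFF"))
print(count)                          # 1114112
```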

I am NOT a fan of being subtly different from another language while having identical syntax. Also, the “two dots or three” thing isn’t particularly clear. If we want an “inclusive or exclusive” syntax, we could just have another character in there:

1^..10 # doesn't include 1, does include 10
1..^10 # does include 1, doesn't include 10
1..10 # includes both
1^..^10 # includes neither

(Pike uses the < character to indicate that a slice should be counted from the end (like negative indexing, but avoiding the confusion of negative zero), which may be worth using for slices in the future, so I’d rather avoid using that character for ranges. ^ should be safe.)
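The four markers map directly onto keyword flags. A sketch for integers only, assuming the semantics listed above (interval_range is a hypothetical name):

```python
def interval_range(start: int, stop: int, *, include_start=True, include_stop=True):
    """Hypothetical integer range with explicit endpoint inclusion.

    include_start=False corresponds to 1^..10, include_stop=False to 1..^10.
    """
    first = start if include_start else start + 1
    end = stop + 1 if include_stop else stop  # range() always excludes its stop
    return range(first, end)


print(list(interval_range(1, 5)))                       # 1..5  -> [1, 2, 3, 4, 5]
print(list(interval_range(1, 5, include_start=False)))  # 1^..5 -> [2, 3, 4, 5]
print(list(interval_range(1, 5, include_stop=False)))   # 1..^5 -> [1, 2, 3, 4]
```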

To me, these seem like two independent asks which each should be considered separately.

This would be up to @guido or @thomas. IMO, it would be nice to have if they remember and have the time to share, but it’s been two decades and it’s really up to them whether they want to, given this is purely of historical value at this point.

IMO, if there’s interest, motivation and some amount of consensus, the PEP should put its best foot forward on its own merits rather than existing primarily or solely to get rejected.

I really don’t recall what my reasons were for rejecting PEP 204 at the time other than what was written, and enough has happened since then that it doesn’t matter, really. I presume that Thomas doesn’t recall either. The only way to get the SC’s attention is to submit a new PEP, and this discussion is a good start!


The reason integer ranges are open at the end is that Python (like C) indexes arrays starting from zero, so the size of the array cannot be an index, which makes it very useful to exclude it. String ranges, on the other hand, would be based on real-world usage such as regex ranges ([a-zA-Z]), which use arbitrary Unicode code points: you don’t need to know which character comes after Z or Ω to use it as the end of your range.

As of Python 3.12, you can hash slices as long as start, stop and step are all hashable.
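A version-guarded illustration: on 3.12+ equal slices can serve as dict keys, while a slice with an unhashable component still can’t be hashed:

```python
import sys

key = slice(1, 10, 2)
if sys.version_info >= (3, 12):
    # Equal slices hash equally, so they can be used as dict keys
    labels = {key: "odd numbers below 10"}
    print(labels[slice(1, 10, 2)])
    try:
        hash(slice(1, [], 2))  # still unhashable: one component is a list
    except TypeError:
        print("unhashable component")
else:
    try:
        hash(key)
    except TypeError:
        print("slices are unhashable before Python 3.12")
```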

The way I described it, any class would be able to define a magic __range__ method, so you could define that yourself or wrap the existing range objects in your own class, for example. But I think the best way right now to do your tuple check might be Structural Pattern Matching.

The syntax would build a slice in both places, as you can see at the bottom of my comment. But we would have to change the slice object to be either iterable or return the range when accessed outside the __getitem__ syntax. It would also benefit users of __getitem__.

To be honest, I do think Ruby has a nice-looking range literal, except for the step syntax, which doesn’t feel in place. The only issue I see with the triple-dot version is the clash with the Ellipsis object, since you might not be able to treat ... as both an operator (separator?) and a name depending on the context. Also, would ......... (aka 9 dots) mean range(Ellipsis, Ellipsis)?

That is a good argument against the Ruby syntax, which arguably inverts .. and ..., but my other point is that we rarely, if ever, need the closed/open distinction: for integers it’s easy to skip or add one element, and strings, as said above, are likely to be used with the exact literals. Even for floats, you might not care, as float precision might prevent you from getting exactly the element you want.

I believe each data type can decide for itself what range it should output, based on real-world usage. For integers, we already have an open-interval range class. For strings, closed-interval regex-style ranges; for floats, we have numpy.arange, which is also an open interval. None of these even let you flag whether you want the end or not (let alone the start, which is always included).

+1.


Yes, that’s correct, I don’t. Yesterday I couldn’t even remember something that happened in 2021, let alone 2001 :stuck_out_tongue:

For a new PEP on the same subject, I would strongly recommend focusing on what the proposal would allow that’s currently hard, hard to read, hard to get right or inefficient. Twenty years ago I don’t think we had any such examples.


Iterating over a grid with

for x, y in 1:10 @ 1:10:

instead of

for x in range(1, 10):
    for y in range(1, 10):

would be jolly nice
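For comparison, the flat loop can already be written without nesting using itertools.product, which iterates in the same order as the nested loops:

```python
import itertools

# Today's spelling of "for x, y in 1:10 @ 1:10:" without nested loops
for x, y in itertools.product(range(1, 10), range(1, 10)):
    pass  # process (x, y) here

pairs = list(itertools.product(range(1, 3), range(1, 3)))
print(pairs)  # [(1, 1), (1, 2), (2, 1), (2, 2)]

# Higher dimensions: repeat= avoids writing the same range several times
print(len(list(itertools.product(range(2), repeat=3))))  # 8
```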

I think that’s a good approach.

Since those days, range() has turned into a lazy sequence object and we also have enumerate(), plus a number of specialized iterator or range-like functions for other purposes (e.g. in numpy et al.), so a proposal for a literal with limited functionality would have to make a really good case to warrant the added complexity in the language.

I once came up with the idea of adding slicing to iterators (similar to itertools.islice()), but then dropped the idea again, since it would probably cause more confusion than do good, with iterators keeping state:

People would likely start using them with slices as a ready-to-use source of numbers, assuming they get reset after each such use, e.g.

integers = itertools.count()
for i in integers[:5]:
    print(i)
for i in integers[:5]:
    print(i)

giving:

0
1
2
3
4
5
6
7
8
9

and not

0
1
2
3
4
0
1
2
3
4
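The same statefulness can be observed today with itertools.islice, which is what hypothetical iterator slicing would most likely reduce to. A range, being a re-iterable sequence, behaves the way people would expect:

```python
import itertools

integers = itertools.count()
first = list(itertools.islice(integers, 5))
second = list(itertools.islice(integers, 5))  # continues where the first stopped
print(first)   # [0, 1, 2, 3, 4]
print(second)  # [5, 6, 7, 8, 9]

# A range is a sequence, so each slice starts afresh
numbers = range(10**9)
print(list(numbers[:5]), list(numbers[:5]))  # [0, 1, 2, 3, 4] [0, 1, 2, 3, 4]
```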

Which is generally not recommended: due to floating-point imprecision, it can lead to surprises. numpy.linspace() is usually a better idea (closed interval, totally different specification).
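A small demonstration of the kind of surprise meant here, using a naive float range built by repeated addition next to a pure-Python, linspace-style alternative (both functions are hypothetical illustrations, not numpy’s implementation):

```python
def naive_frange(start: float, stop: float, step: float):
    """Float range by repeated addition: rounding error accumulates."""
    x = start
    while x < stop:
        yield x
        x += step


def linspace(start: float, stop: float, num: int) -> list[float]:
    """Closed interval with num points, each computed directly from its index."""
    if num == 1:
        return [start]
    return [start + (stop - start) * i / (num - 1) for i in range(num)]


values = list(naive_frange(0.0, 1.0, 0.1))
print(len(values), values[-1])     # 11 0.9999999999999999
print(linspace(0.0, 1.0, 11)[-1])  # 1.0
```

The naive version yields an unexpected 11th element because ten additions of 0.1 land just below 1.0; computing each point from its index avoids the accumulation.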

Which to me really weakens this proposal – the requirements for a “range-like” for different types are different (as you said for strings), it would be more confusing than helpful to have the same API for all of them.

All that being said, I think a range/linspace function for floats would be useful in the stdlib, so this would be a way to add that :slight_smile:

There was a proposal a couple years ago for that in the python-ideas list – it petered out, but I’d like to revive it some day.


I’d quite like [a:b:c] to be short for range(a, b, c) and have whatever else supported by range itself (e.g. the @ outer product would be nice to have, and range(start, None, step) as a synonym of itertools.count(start, step) being [start::step] would make sense).


That’s fun to do. But when it comes time for a PEP, the best chance of success will come from covering basic needs and not looking weird to most Pythonistas. Some of the speculations in the previous post might doom the PEP and range literals for another two decades :wink:

Look back at other PEPs to see how much contention was faced by simply adding a thousands-separator or adding assignment expressions. Recently, the subinterpreters PEP was also pared down to a minimum viable proposal. Grammar changes such as this one have an even higher hill to climb. Perhaps focus on a nicer way to spell: for i in range(10, 20):.

I suggest not being distracted by itertools.islice, generic slicing, or numpy.linspace. Being all things to all people is at odds with being simple, obvious, elegant, and easy to learn. It’s likely that most people will look at the proposal for less than thirty seconds before they decide whether they like it or not. So make sure there is a good “elevator pitch”.


If all this proposal does is duplicate the existing range functionality using fewer characters, is there even any point? We’re not using Python 1.5 any more, iterating over range objects is not as common as it once was.


Decorators come to mind. And I think these would be nice to spell:

for i in [10:20]:  # 10, 11, ...
    ...

for i, j, k in [:2, :2, :2]:  # (0, 0, 0), (0, 0, 1), ...
    ...

You could even go as far as borrowing from numpy’s mgrid/ogrid shorthand for linspaces without introducing that much noise:

for x in [-1:1:5j]:  # -1, -0.5, 0, 0.5, 1
    ...
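The complex-step convention borrowed from numpy.mgrid can be sketched in plain Python: a complex step such as 5j means “this many evenly spaced points, stop included”, while a real step behaves like a half-open arange. The helper grid_points is hypothetical, not numpy’s code:

```python
import math


def grid_points(start: float, stop: float, step):
    """Hypothetical helper mimicking numpy.mgrid's complex-step convention."""
    if isinstance(step, complex):
        # Integer part of the magnitude is the number of points (closed interval)
        num = int(abs(step))
        if num == 1:
            return [float(start)]
        return [start + (stop - start) * i / (num - 1) for i in range(num)]
    # Real step: half-open interval, each point computed from its index
    count = max(0, math.ceil((stop - start) / step))
    return [start + i * step for i in range(count)]


print(grid_points(-1, 1, 5j))    # [-1.0, -0.5, 0.0, 0.5, 1.0]
print(grid_points(0, 1, 0.25))   # [0.0, 0.25, 0.5, 0.75]
```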

None of these are life-changing, but all of them are life-improving.

In what way? What sort of decorator uses a range object?

I think your examples of range objects [10:20] and [:2, :2, :2] are ugly and hard to understand. But putting that aside, under what circumstances are you iterating over a range object in 2023?

I also think people have forgotten what it is like to be a beginner to Python and having to face slice notation for the first time. The first time I tried to read Python code, back in 1.5 days or even older, I was so confused by all these for obj in mylist[1:] and new = old[:] slice notations, I just gave up and put Python away for two or three years. This experience has made me very aware that slice notation is not beginner friendly. Powerful and compact, yes, but also cryptic.

Admittedly this was before the first days of the public internet. My Python interpreter was on a CD of Macintosh software that came with a magazine made of actual dead tree. I had no internet connection, there were no mailing lists I could ask for help, no website tutorial I could follow, no StackOverflow. The good old days :slight_smile:
