April 10, 2008

Splitting a Python list into sublists

[Edited Dec 6, 2010 to mention another solution based on zip and iter.]

Suppose you want to divide a Python list into sublists of approximately equal size. Since the number of desired sublists may not evenly divide the length of the original list, this task is (just) a tad more complicated than one might at first assume.

One Python Cookbook entry is:

def slice_it(li, cols=2):
    start = 0
    for i in xrange(cols):
        # len(li[i::cols]) is the number of items the i-th piece should get
        stop = start + len(li[i::cols])
        yield li[start:stop]
        start = stop

which gives the exact number of subsequences, while varying the length of the subsequences a bit if necessary. It uses Python's slicing feature to get the lengths.
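
To see what that looks like in practice, here is a small sketch that runs slice_it on a ten-element list. Two assumptions on my part: it uses range instead of xrange so that it also runs on Python 3, and the sample list is made up for illustration.

```python
def slice_it(li, cols=2):
    """Yield `cols` contiguous pieces of `li` whose lengths differ by at most one."""
    start = 0
    for i in range(cols):
        # li[i::cols] takes every cols-th item starting at index i; its length
        # is the number of items the i-th piece should hold.
        stop = start + len(li[i::cols])
        yield li[start:stop]
        start = stop


pieces = list(slice_it([1, 2, 3, 4, 5, 6, 7, 8, 9, 10], cols=4))
print(pieces)  # [[1, 2, 3], [4, 5, 6], [7, 8], [9, 10]]
```

Note that the pieces are contiguous runs of the original list, and the longer pieces come first.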

That was written in response to an earlier cookbook entry which had the following one-liner:

[seq[i:i+size] for i  in range(0, len(seq), size)]

I like that it's a one-liner, but there are a couple of things I don't like about it. If your goal isn't a particular sublist length but rather to divide the list into a given number of pieces, you need another line to compute the size. And then it doesn't turn out too well. Suppose you want to divide a list of length 10 into 4 sublists:

>>> size=10//4
>>> size
2
>>> seq = [1,2,3,4,5,6,7,8,9,10]
>>> [seq[i:i+size] for i  in range(0, len(seq), size)]
[[1, 2], [3, 4], [5, 6], [7, 8], [9, 10]]

This leaves us with one sublist more than desired.

Try setting size to 3 to get fewer sublists:

>>> size=3
>>> [seq[i:i+size] for i  in range(0, len(seq), size)]
[[1, 2, 3], [4, 5, 6], [7, 8, 9], [10]] 

This gives the right number of sublists, but the lengths are lopsided: three sublists of length 3 and one of length 1.

Here's a briefer one-liner using the slice idea. It doesn't require you to compute the size in advance, it gives exactly the number of subsequences you ask for (num), and the lengths are divided more evenly:

[seq[i::num] for i in range(num)]

The drawback here is that the pieces are not contiguous runs of seq; the elements are dealt out round-robin, so seq is sliced and diced. But in many situations that doesn't matter. In any case, all the elements are in the output and the subsequences are as close as possible to the same length:

>>> seq = [1,2,3,4,5,6,7,8,9,10]
>>> num = 4
>>> [seq[i::num] for i in range(num)]
[[1, 5, 9], [2, 6, 10], [3, 7], [4, 8]]
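
Here is a quick sanity check of those two claims, as a sketch. The helper name interleaved_split is hypothetical (it just wraps the one-liner), and the checks are plain asserts.

```python
def interleaved_split(seq, num):
    """Split seq into num pieces by taking every num-th element (not contiguous)."""
    return [seq[i::num] for i in range(num)]


seq = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
parts = interleaved_split(seq, 4)
print(parts)  # [[1, 5, 9], [2, 6, 10], [3, 7], [4, 8]]

# Every element appears exactly once across the pieces.
assert sorted(x for part in parts for x in part) == seq

# Piece lengths differ by at most one.
lengths = [len(part) for part in parts]
assert max(lengths) - min(lengths) <= 1
```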

 

Update: I just read about a clever and interesting solution involving zip and iter that works in Python 2.7:

>>> items, chunk = [1,2,3,4,5,6,7,8,9], 3
>>> zip(*[iter(items)]*chunk)
[(1, 2, 3), (4, 5, 6), (7, 8, 9)]

Read a full explanation at Go deh! The disadvantage from a practical point of view is that if the list of items is not evenly divisible into chunks, the leftover items get dropped. But I still like it because it illuminates the nuances of iterators.
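
For what it's worth, the trick hinges on [iter(items)]*chunk being a list that holds the same iterator chunk times, so zip ends up pulling consecutive items from one shared source. The sketch below spells that out and shows the dropped remainder. It is written for Python 3, where zip is lazy and needs list() around it; the names shared, refs, groups and leftover are mine, and the ten-item list is chosen so that one item is left over.

```python
items, chunk = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10], 3

# One iterator object, referenced `chunk` times (not three separate iterators).
shared = iter(items)
refs = [shared] * chunk
assert all(ref is shared for ref in refs)

# zip asks each argument for one item in turn; since the arguments are all the
# same iterator, each output tuple is made of `chunk` consecutive items.
groups = list(zip(*refs))
print(groups)  # [(1, 2, 3), (4, 5, 6), (7, 8, 9)]

# 10 cannot fill a whole tuple, so zip stops there and the item is dropped.
leftover = [x for x in items if all(x not in g for g in groups)]
print(leftover)  # [10]
```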

April 10, 2008 in Python

Comments

Thanks for posting this... I was looking for an elegant way of partitioning python lists.

Posted by: Dave Dash at Oct 17, 2008 4:38:35 PM

Thanks for the list comprehension info. I was looking for that solution exactly, thanks for posting it :)

Posted by: Eric Pavey at Sep 28, 2009 1:26:58 PM

Very nice.

Posted by: Dale at Aug 17, 2010 7:18:14 PM

Elegant and uses the language the way it was meant to be used. I looked at the other items in the ActiveState cookbook and thought, "clunky! I just want a way to parse a long list into smaller lists of a given length."

Thanks for the insight.

mp

Posted by: Michael Powe at Aug 24, 2010 6:27:39 PM

As I tried to say in the comment section of the Go Deh! blog (and failed), you can get the behaviour of slicing without losing any items by using the zip_longest function instead of zip.

Posted by: nagisa at Dec 6, 2012 2:43:10 PM

>>> seq = [1,2,3,4,5,6,7,8,9,10]
>>> pieces = 4
>>> m = float(len(seq))/pieces
>>> [seq[int(m*i):int(m*(i+1))] for i in range(pieces)]
[[1, 2], [3, 4, 5], [6, 7], [8, 9, 10]]

Posted by: Tom Lynn at Dec 6, 2012 2:50:44 PM

nagisa, thanks for the note about zip_longest, but I'm not sure what it is. Do you mean izip_longest from itertools? That gives a different result, because it puts the remainder items into a separate tuple, filled out with Nones. Could be useful in some cases, so it's good to know about.


>>> from itertools import izip_longest
>>> items, chunk = [1,2,3,4,5,6,7,8,9, 10], 3
>>> list(izip_longest(*[iter(items)]*chunk))
[(1, 2, 3), (4, 5, 6), (7, 8, 9), (10, None, None)]

Posted by: Gary at Dec 6, 2012 3:07:19 PM

Or an integer equivalent:


>>> n = len(seq)
>>> [seq[n*i//pieces:n*(i+1)//pieces] for i in range(pieces)]
[[1, 2], [3, 4, 5], [6, 7], [8, 9, 10]]

Posted by: Tom Lynn at Dec 6, 2012 3:21:37 PM

wonderful solution. thanks for sharing.

Posted by: rodney at Feb 20, 2013 7:49:49 AM
