April 10, 2008
Splitting a Python list into sublists
[Edited Dec 6, 2010 to mention another solution based on zip and iter.]
Suppose you want to divide a Python list into sublists of approximately equal size. Since the number of desired sublists may not evenly divide the length of the original list, this task is (just) a tad more complicated than one might at first assume.
def slice_it(li, cols=2):
    start = 0
    for i in xrange(cols):
        stop = start + len(li[i::cols])
        yield li[start:stop]
        start = stop
This generator yields exactly the requested number of subsequences, varying their lengths a bit if necessary. It uses Python's slicing feature to get the lengths.
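To see how the slice lengths drive the result, here is a small sketch of the same generator run on a ten-item list. It swaps xrange for range so that it also runs on Python 3; that change, the comments, and the sample list are additions for illustration only.

```python
def slice_it(li, cols=2):
    """Yield `cols` contiguous sublists of li, as equal in length as possible."""
    start = 0
    for i in range(cols):
        # li[i::cols] takes every cols-th item starting at index i;
        # only its length is used, to decide how long this piece should be
        stop = start + len(li[i::cols])
        yield li[start:stop]
        start = stop


seq = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
parts = list(slice_it(seq, 4))
print(parts)  # [[1, 2, 3], [4, 5, 6], [7, 8], [9, 10]]
```

The strided slices for four columns have lengths 3, 3, 2 and 2, and those become the lengths of the contiguous pieces.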
That was written in response to an earlier cookbook entry which had the following one-liner:
[seq[i:i+size] for i in range(0, len(seq), size)]
I like that it's a one-liner but don't like a couple of things about it. If your goal isn't a particular sublist length but rather to divide the list up into a given number of pieces, you need another line to compute the size. And then it doesn't turn out too well. Suppose you want to divide a list of length 10 into 4 sublists:
>>> size=10/4
>>> size
2
>>> seq = [1,2,3,4,5,6,7,8,9,10]
>>> [seq[i:i+size] for i in range(0, len(seq), size)]
[[1, 2], [3, 4], [5, 6], [7, 8], [9, 10]]
This leaves us with one sublist more than desired.
Try setting size to 3 to get fewer substrings:
>>> size=3
>>> [seq[i:i+size] for i in range(0, len(seq), size)]
[[1, 2, 3], [4, 5, 6], [7, 8, 9], [10]]
This leaves us with dissimilar lengths.
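The number of pieces the fixed-size one-liner produces is the list length divided by size, rounded up, so size 2 gives five pieces and size 3 gives four with a short last one. A quick sketch of that arithmetic follows; chunks_of is a hypothetical helper name, and // is used so the division behaves the same on Python 2 and Python 3.

```python
def chunks_of(seq, size):
    """Fixed-size chunking; the last chunk holds whatever is left over."""
    return [seq[i:i + size] for i in range(0, len(seq), size)]


seq = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
for size in (10 // 4, 3):
    pieces = chunks_of(seq, size)
    # piece count is ceil(len(seq) / size), written with integer arithmetic
    expected_count = (len(seq) + size - 1) // size
    print(size, len(pieces), expected_count, [len(p) for p in pieces])
```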
Here's a briefer one-liner using the slice idea, which doesn't require you to compute the length in advance, and does give the exact number of subsequences you want and with lengths that are more appropriately divided:
[seq[i::num] for i in range(num)]
The drawback here is that the pieces are not contiguous runs of seq; seq is sliced and diced. But in many situations that doesn't matter. In any case, all the elements are in the output and the subsequences are as close as possible to the same length:
>>> seq = [1,2,3,4,5,6,7,8,9,10]
>>> num = 4
>>> [seq[i::num] for i in range(num)]
[[1, 5, 9], [2, 6, 10], [3, 7], [4, 8]]
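Both claims, that nothing is lost and that the lengths differ by at most one, are easy to check. A minimal sketch using the same ten-item list, with num set to 4 as an illustrative value:

```python
seq = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
num = 4

# strided slices: piece i holds items i, i+num, i+2*num, ...
parts = [seq[i::num] for i in range(num)]
lengths = [len(p) for p in parts]

print(lengths)                         # [3, 3, 2, 2]
print(max(lengths) - min(lengths))     # 1

# flattening and sorting the pieces recovers the original list
flattened = sorted(x for p in parts for x in p)
print(flattened == seq)                # True
```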
Update: I just read about a clever and interesting solution involving zip and iter that works in Python 2.7:
>>> items, chunk = [1,2,3,4,5,6,7,8,9], 3
>>> zip(*[iter(items)]*chunk)
[(1, 2, 3), (4, 5, 6), (7, 8, 9)]
Read a full explanation at Go deh! The disadvantage from a practical point of view is that if the list of items is not evenly divisible into chunks, some items get left out. But I still like it because it's illuminating about the nuances of iterators.
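Here is a sketch of why items go missing, and of one possible way to keep them. It assumes Python 3, where zip returns an iterator and the padding variant is itertools.zip_longest (named izip_longest on Python 2). The ten-item list and the None fill value are chosen only for illustration.

```python
from itertools import zip_longest

items, chunk = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10], 3

# [it] * chunk is the same iterator object repeated chunk times,
# so each output tuple pulls chunk consecutive items from it
it = iter(items)
truncated = list(zip(*[it] * chunk))
print(truncated)
# [(1, 2, 3), (4, 5, 6), (7, 8, 9)]   (the 10 is dropped)

# zip_longest pads the last tuple instead of discarding it
padded = list(zip_longest(*[iter(items)] * chunk, fillvalue=None))
print(padded)
# [(1, 2, 3), (4, 5, 6), (7, 8, 9), (10, None, None)]
```

zip stops as soon as any of its arguments is exhausted. Here all the arguments are the same iterator, so a partial final group is consumed but never emitted.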