Performance and Alternatives

intspan piggybacks on Python’s set type, and intspanlist piggybacks on list, so both store every integer individually. Unlike Perl’s Set::IntSpan, these types are not optimized for long contiguous runs. For sets of several hundred or even a few thousand members, you’ll probably never notice the difference.

But if you’re doing extensive processing of large sets (e.g. with 100K, 1M, or more elements), or doing numerous set operations on them (e.g. union or intersection), a data structure based on lists of ranges, run length encoding, or Judy arrays might perform and scale better. Horses for courses.
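To make the trade-off concrete, here is a minimal sketch of the "list of ranges" idea using only the standard library. It is not how intspan or any of the modules below are implemented; the helper names to_ranges, union_ranges, and count are hypothetical and exist only for illustration. The point is that a long contiguous run costs one (start, stop) pair rather than one entry per integer.

```python
def to_ranges(items):
    """Collapse an iterable of integers into sorted, inclusive (start, stop) runs."""
    ranges = []
    for n in sorted(set(items)):
        if ranges and n == ranges[-1][1] + 1:
            # n extends the current run by one
            ranges[-1] = (ranges[-1][0], n)
        else:
            # n starts a new run
            ranges.append((n, n))
    return ranges


def union_ranges(a, b):
    """Union two range lists without expanding them into individual integers."""
    merged = []
    for start, stop in sorted(a + b):
        if merged and start <= merged[-1][1] + 1:
            # overlapping or adjacent: extend the previous run
            merged[-1] = (merged[-1][0], max(merged[-1][1], stop))
        else:
            merged.append((start, stop))
    return merged


def count(ranges):
    """Number of integers covered by a range list."""
    return sum(stop - start + 1 for start, stop in ranges)


small = to_ranges([1, 2, 3, 5, 7, 8])   # three runs: 1-3, 5, 7-8

# Two range lists covering about 1.5 million integers in total,
# held as three tuples instead of 1.5 million set members.
big = [(1, 1_000_000)]
other = [(500_000, 1_500_000), (2_000_000, 2_000_010)]
combined = union_ranges(big, other)
```

Here the union touches three tuples regardless of how many integers the runs cover, whereas a set-based union has to visit every member. The flip side is that data with few contiguous runs gains nothing from this layout, which is why the choice depends on the workload.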

There are several modules you might want to consider as alternatives or supplements. AFAIK, none of them provide the convenient integer span specification intspan does, but they have other virtues:

  • cowboy provides generalized ranges and multi-ranges. Bonus points for the tagline: “It works on ranges.”
  • spans provides several different kinds of ranges and then sets for those ranges. Includes nice datetime based intervals similar to PostgreSQL time intervals, and float ranges/sets. More ambitious and general than intspan, but lacks truly convenient input or output methods akin to intspan.
  • ranger is a generalized range and range set module. It supports open and closed ranges, and includes mapping objects that attach one or more objects to range sets.
  • rangeset is a generalized range set module. It also supports infinite ranges.
  • judy is a Python wrapper around Judy arrays, which are implemented in C. No docs or tests to speak of.
  • RoaringBitmap is a hybrid array and bitmap structure designed for efficient compression and fast operations on sets of 32-bit integers.