Performance and Alternatives
intspan piggybacks Python’s set type, and intspanlist piggybacks
list. So they both store every integer individually. Unlike Perl’s
Set::IntSpan, these types are not optimized for long contiguous runs. For sets of
several hundred or even several thousand members, you’ll probably never
notice the difference.
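To make the storage difference concrete, here is a minimal sketch of collapsing individually stored integers into contiguous runs. The helper name `to_runs` is hypothetical and is not part of intspan or Set::IntSpan. It only illustrates why a run-based representation can be far smaller when the members are mostly contiguous.

```python
def to_runs(values):
    """Collapse integers into sorted (start, end) runs, both ends inclusive.

    Hypothetical helper for illustration; not part of intspan.
    """
    runs = []
    for v in sorted(set(values)):
        if runs and v == runs[-1][1] + 1:
            # v extends the current run by one
            runs[-1][1] = v
        else:
            # v starts a new run
            runs.append([v, v])
    return [tuple(r) for r in runs]


# A set stores one entry per member ...
members = set(range(1, 100_001)) | {200_000}

# ... while the run form needs only one pair per contiguous stretch.
runs = to_runs(members)

print(len(members))  # number of individually stored integers
print(runs)          # the same membership as inclusive runs
```

With mostly contiguous data, the run list stays tiny no matter how long each run gets. With scattered data, every member becomes its own run and the advantage disappears.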
But if you’re doing extensive processing of large sets (e.g. with 100K, 1M, or more elements), or doing numerous set operations on them (e.g. union or intersection), a data structure based on lists of ranges, run-length encoding, or Judy arrays might perform and scale better. Horses for courses.
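As a rough sketch of the "lists of ranges" idea, union and intersection can work on inclusive (start, end) pairs directly, so the cost depends on the number of runs rather than the number of members. The function names `union_runs` and `intersect_runs` are hypothetical and do not come from any of the libraries mentioned here. Both assume integer endpoints, and `intersect_runs` also assumes each input is sorted and non-overlapping.

```python
def union_runs(a, b):
    """Union of two lists of inclusive (start, end) integer runs."""
    merged = []
    for lo, hi in sorted(a + b):
        if merged and lo <= merged[-1][1] + 1:
            # Overlaps or touches the previous run: extend it
            merged[-1] = (merged[-1][0], max(merged[-1][1], hi))
        else:
            merged.append((lo, hi))
    return merged


def intersect_runs(a, b):
    """Intersection of two sorted, non-overlapping lists of inclusive runs."""
    out = []
    i = j = 0
    while i < len(a) and j < len(b):
        lo = max(a[i][0], b[j][0])
        hi = min(a[i][1], b[j][1])
        if lo <= hi:
            out.append((lo, hi))
        # Advance whichever run ends first
        if a[i][1] < b[j][1]:
            i += 1
        else:
            j += 1
    return out


x = [(1, 5), (10, 20)]
y = [(4, 8), (21, 30)]
print(union_runs(x, y))
print(intersect_runs(x, y))

# Two million members, but only two runs to process
print(union_runs([(0, 999_999)], [(1_000_000, 1_999_999)]))
```

The trade-off is the same one described above: this pays off for long contiguous runs, while scattered members give one run per integer and no savings over a plain set.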
There are several modules you might want to consider as alternatives or
supplements. AFAIK, none of them provide the convenient integer span
specification that intspan does, but they have other virtues:
- cowboy provides generalized ranges and multi-ranges. Bonus points for the package tagline: “It works on ranges”
- spans provides several different
kinds of ranges and then sets for those ranges. Includes nice
datetime-based intervals similar to PostgreSQL time intervals, and
float ranges/sets. More ambitious and general than
intspan, but lacks truly convenient input or output methods akin to
those of intspan.
- ranger is a generalized range and range set module. It supports open and closed ranges, and includes mapping objects that attach one or more objects to range sets.
- rangeset is a generalized range set module. It also supports infinite ranges.
- judy is a Python wrapper around Judy arrays, which are implemented in C. No docs or tests to speak of.
- RoaringBitmap, a hybrid array and bitmap structure designed for efficient compression and fast operations on sets of 32-bit integers.