Python "set" with duplicate/repeated elements
This depends on how you define a set. One may assume that to the OP
- order does not matter (whether ordered or unordered)
- replicates/repeated elements (a.k.a. multiplicities) are permitted
Given these assumptions, the options reduce to two abstract types: a list or a multiset. In Python, these type usually translate to a list
and Counter
respectively. See the Details on some subtleties to observe.
Given
import random
import collections as ct
random.seed(123)
elems = [random.randint(1, 11) for _ in range(10)]
elems
# [1, 5, 2, 7, 5, 2, 1, 7, 9, 9]
Code
A list of replicate elements:
list(elems)
# [1, 5, 2, 7, 5, 2, 1, 7, 9, 9]
A "multiset" of replicate elements:
ct.Counter(elems)
# Counter({1: 2, 5: 2, 2: 2, 7: 2, 9: 2})
Details
On Data Structures
We have a mix of terms here that easily get confused. To clarify, here are some basic mathematical data structures compared to ones in Python.
Type |Abbr|Order|Replicates| Math* | Python | Implementation
------------|----|-----|----------|-----------|-------------|----------------
Set |Set | n | n | {2 3 1} | {2, 3, 1} | set(el)
Ordered Set |Oset| y | n | {1, 2, 3} | - | list(dict.fromkeys(el)
Multiset |Mset| n | y | [2 1 2] | - | <see `mset` below>
List |List| y | y | [1, 2, 2] | [1, 2, 2] | list(el)
From the table, one can deduce the definition of each type. Example: a set is a container that ignores order and rejects replicate elements. In contrast, a list is a container that preserves order and permits replicate elements.
Also from the table, we can see:
- Both an ordered set and a multiset are not explicitly implemented in Python
- "Order" is a contrary term to a random arrangement of elements, e.g. sorted or insertion order
- Sets and multisets are not strictly ordered. They can be ordered, but order does not matter.
- Multisets permit replicates, thus they are not strict sets (the term "set" is indeed confusing).
On Multisets
Some may argue that collections.Counter
is a multiset. You are safe in many cases to treat it as such, but be aware that Counter
is simply a dict (a mapping) of key-multiplicity pairs. It is a map of multiplicities. See an example of elements in a flattened multiset:
mset = [x for k, v in ct.Counter(elems).items() for x in [k]*v]
mset
# [1, 1, 5, 5, 2, 2, 7, 7, 9, 9]
Notice there is some residual ordering, which may be surprising if you expect disordered results. However, disorder does not preclude order. Thus while you can generate a multiset from a Counter
, be aware of the following provisos on residual ordering in Python:
- replicates get grouped together in the mapping, introducing some degree of order
- in Python 3.6, dict's preserve insertion order
Summary
In Python, a multiset can be translated to a map of multiplicities, i.e. a Counter
, which is not randomly unordered like a pure set. There can be some residual ordering, which in most cases is ok since order does not generally matter in multisets.
See Also
*Mathematically, (according to N. Wildberger, we express braces {}
to imply a set and brackets []
to imply a list, as seen in Python. Unlike Python, commas ,
to imply order.