Why are standard iterator ranges [begin, end) instead of [begin, end]?

Question 1

Why does the Standard define end() as one past the end, instead of at the actual end?

Question 2

The best argument is easily the one made by Dijkstra himself:

You want the size of the range to be a simple difference end − begin;

including the lower bound is more “natural” when sequences degenerate to empty ones, and also because the alternative (excluding the lower bound) would require the existence of a “one-before-the-beginning” sentinel value.

You still need to justify why you start counting at zero rather than one, but that wasn’t part of your question.

The wisdom behind the [begin, end) convention pays off time and again when you have any sort of algorithm that deals with multiple nested or iterated calls to range-based constructions, which chain naturally. By contrast, using a doubly-closed range would incur off-by-ones and extremely unpleasant and noisy code. For example, consider a partition [n₀, n₁)[n₁, n₂)[n₂, n₃): each boundary is the end of one piece and the beginning of the next. Another example is the standard iteration loop for (it = begin; it != end; ++it), which runs end - begin times. The corresponding code would be much less readable if both ends were inclusive, and imagine how you’d handle empty ranges.
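To make the chaining point concrete, here is a minimal sketch that sums a vector piecewise over three adjacent half-open ranges. The helper names sum_range and sum_in_three_pieces and the cut offsets n1 and n2 are made up for illustration; the only assumption is 0 <= n1 <= n2 <= v.size().

```cpp
#include <cstddef>
#include <numeric>
#include <vector>

using It = std::vector<int>::const_iterator;

// Hypothetical helper: sums the elements in the half-open range [first, last).
int sum_range(It first, It last) {
    return std::accumulate(first, last, 0);
}

// Sums v over the adjacent pieces [p0, p1), [p1, p2), [p2, p3).
// Each cut point serves as the end of one piece and the begin of the next,
// so no +1 or -1 adjustment appears anywhere.
// Assumes 0 <= n1 <= n2 <= v.size().
int sum_in_three_pieces(const std::vector<int>& v,
                        std::ptrdiff_t n1, std::ptrdiff_t n2) {
    It p0 = v.begin();
    It p1 = p0 + n1;
    It p2 = p0 + n2;
    It p3 = v.end();
    return sum_range(p0, p1) + sum_range(p1, p2) + sum_range(p2, p3);
}
```

Note that a piece may be empty (n1 == n2, say) without any special handling.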

Finally, we can also make a nice argument why counting should start at zero: With the half-open convention for ranges that we just established, if you are given a range of N elements (say to enumerate the members of an array), then 0 is the natural “beginning” so that you can write the range as [0, N), without any awkward offsets or corrections.

In a nutshell: the fact that we don’t see the number 1 everywhere in range-based algorithms is a direct consequence of, and motivation for, the [begin, end) convention.

Question 3

In fact, a lot of iterator-related stuff suddenly makes much more sense if you consider the iterators as pointing not at the elements of the sequence but in between them, with dereferencing accessing the next element right next to it. Then the “one past end” iterator suddenly makes immediate sense:

   +---+---+---+---+
   | A | B | C | D |
   +---+---+---+---+
   ^               ^
   |               |
 begin            end

Obviously begin points to the beginning of the sequence, and end points to the end of the same sequence. Dereferencing begin accesses the element A, and dereferencing end makes no sense because there is no element to the right of it. Also, adding an iterator i in the middle gives

   +---+---+---+---+
   | A | B | C | D |
   +---+---+---+---+
   ^       ^       ^
   |       |       |
 begin     i      end

and you immediately see that the range of elements from begin to i contains the elements A and B, while the range of elements from i to end contains the elements C and D. Dereferencing i gives the element to the right of it, that is, the first element of the second sequence.
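The picture can be checked directly. The following is a small sketch, with a made-up Split struct and split_at function, that cuts a string at an iterator i and returns both half-open pieces plus the element i dereferences to. It assumes 0 <= k < s.size() so that *i is valid.

```cpp
#include <cstddef>
#include <string>

// Hypothetical result type for this illustration.
struct Split {
    std::string left;   // elements in [begin, i)
    std::string right;  // elements in [i, end)
    char at_i;          // the element directly to the right of i
};

// Cuts s at the iterator i = begin + k.
// Assumes 0 <= k < s.size(), so that dereferencing i is valid.
Split split_at(const std::string& s, std::ptrdiff_t k) {
    auto i = s.begin() + k;  // i sits between element k-1 and element k
    return Split{std::string(s.begin(), i),
                 std::string(i, s.end()),
                 *i};
}
```

For "ABCD" and k = 2 this gives "AB", "CD" and 'C', matching the diagram.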

Even the “off-by-one” for reverse iterators suddenly becomes obvious this way. Reversing that sequence gives:

   +---+---+---+---+
   | D | C | B | A |
   +---+---+---+---+
   ^       ^       ^
   |       |       |
rbegin     ri     rend
 (end)    (i)   (begin)

I’ve written the corresponding non-reverse (base) iterators in parentheses below. You see, the reverse iterator belonging to i (which I’ve named ri) still points in between elements B and C. However, because the sequence is reversed, element B is now the one to its right.
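Here is a short sketch of that behaviour using std::reverse_iterator. The function names seen_by_reverse and base_matches are invented for this example, and both assume 0 < k <= s.size() so that there is an element just before i.

```cpp
#include <cstddef>
#include <iterator>
#include <string>

using RevIt = std::reverse_iterator<std::string::const_iterator>;

// Returns the element that the reverse iterator built from i = begin + k
// dereferences to. Assumes 0 < k <= s.size().
char seen_by_reverse(const std::string& s, std::ptrdiff_t k) {
    auto i = s.begin() + k;
    RevIt ri(i);   // same position between elements as i
    return *ri;    // yields the element just before i in forward order
}

// Checks that ri.base() gives back the forward iterator i it was built from.
bool base_matches(const std::string& s, std::ptrdiff_t k) {
    auto i = s.begin() + k;
    RevIt ri(i);
    return ri.base() == i;
}
```

With "ABCD" and k = 2, i dereferences to C while ri dereferences to B, even though both mark the same gap.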

Question 4

Why does the Standard define end() as one past the end, instead of at the actual end?

Because:

It avoids special handling for empty ranges. For empty ranges, begin() is equal to end().

It makes the end criterion simple for loops that iterate over the elements: the loops simply continue as long as end() has not been reached.
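Both points show up in even the smallest loop. A minimal sketch (count_elements is a made-up name, and std::vector<int> is just a convenient container for the illustration):

```cpp
#include <vector>

// Counts elements by walking [begin, end).
// For an empty vector begin() == end(), so the loop body never runs
// and no separate "is it empty?" check is needed.
int count_elements(const std::vector<int>& v) {
    int n = 0;
    for (auto it = v.begin(); it != v.end(); ++it) {
        ++n;
    }
    return n;
}
```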

Question 5

Because then

size() == end() - begin()   // For iterators for whom subtraction is valid

and you won’t have to do awkward things like

// Never mind that this is INVALID for input iterators...
bool empty() { return begin() == end() + 1; }

and you won’t accidentally write erroneous code like

bool empty() { return begin() == end() - 1; }    // a typo from the first version
                                                 // of this post
                                                 // (see, it really is confusing)

bool empty() { return end() - begin() == -1; }   // Signed/unsigned mismatch
// Plus the fact that subtracting is also invalid for many iterators

Also: what would find() return if end() pointed to a valid element?

Do you really want another member called invalid() which returns an invalid iterator?!
Two iterators are already painful enough…

Oh, and see this related post.

Also:

If end were before the last element, how would you insert() at the true end?!
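Both questions have simple answers under the one-past-the-end convention, as this sketch tries to show. The function insert_if_missing is a hypothetical helper written for this post, not a standard facility.

```cpp
#include <algorithm>
#include <vector>

// Appends value to v only if it is not already present.
// Returns true if the value was inserted.
bool insert_if_missing(std::vector<int>& v, int value) {
    // std::find reports "not found" by returning the end iterator,
    // which can never be confused with a real element.
    auto pos = std::find(v.begin(), v.end(), value);
    if (pos != v.end()) {
        return false;          // found: pos refers to an actual element
    }
    // insert() places the new element before the given position,
    // so passing end() appends at the true end.
    v.insert(v.end(), value);
    return true;
}
```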

Question 6

The iterator idiom of half-closed ranges [begin(), end()) is originally based on pointer arithmetic for plain arrays. In that mode of operation, you would have functions that were passed an array and a size.

void func(int* array, size_t size)

Converting to half-closed ranges [begin, end) is very simple when you have that information:

int* begin = array;
int* end = array + size;

for (int* it = begin; it < end; ++it) { ... }

To work with fully-closed ranges, it’s harder:

int* begin = array;
int* end = array + size - 1;

for (int* it = begin; it <= end; ++it) { ... }

Since pointers to arrays are iterators in C++ (and the syntax was designed to allow this), it’s much easier to call std::find(array, array + size, some_value) than it is to call std::find(array, array + size - 1, some_value).
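As a minimal sketch of the pointer case, the following wraps std::find over a plain array. The function index_of is a made-up helper; it returns size when the value is absent, mirroring how std::find returns the end pointer.

```cpp
#include <algorithm>
#include <cstddef>

// Returns the index of the first occurrence of value in array[0 .. size),
// or size if the value is absent.
std::size_t index_of(const int* array, std::size_t size, int value) {
    // array + size is the one-past-the-end pointer: it may be formed and
    // compared, but must not be dereferenced.
    const int* end = array + size;
    const int* hit = std::find(array, end, value);
    return static_cast<std::size_t>(hit - array);
}
```

No size - 1 appears anywhere, and size == 0 works without a special case.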

Plus, if you work with half-closed ranges, you can use the != operator to check for the end condition, because (if your operators are defined correctly) < implies !=.

for (int* it = begin; it != end; ++it) { ... }

However, there’s no easy way to do this with fully-closed ranges. You’re stuck with <=.

The only iterators that support the < and > operations in C++ are random-access iterators. If C++ used fully-closed ranges, you would have to write a <= operator for every iterator class, making all of your iterators fully comparable, and you’d have fewer choices for creating less capable iterators (such as the bidirectional iterators on std::list, or the input iterators that operate on iostreams).
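To illustrate, here is a sketch of a generic loop that uses only !=, ++ and *, and therefore works with the less capable iterator categories mentioned above. The names sum_all, sum_list and sum_stream are invented for this example.

```cpp
#include <iterator>
#include <list>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical generic helper: requires only !=, ++ and * from the iterator,
// so input, forward, bidirectional and random-access iterators all qualify.
template <typename InputIt>
int sum_all(InputIt first, InputIt last) {
    int total = 0;
    for (; first != last; ++first) {
        total += *first;
    }
    return total;
}

// std::list provides bidirectional iterators, which have no < or <=.
int sum_list(const std::list<int>& xs) {
    return sum_all(xs.begin(), xs.end());
}

// std::istream_iterator is an input iterator; a default-constructed one
// acts as the end-of-stream marker.
int sum_stream(const std::string& text) {
    std::istringstream in(text);
    return sum_all(std::istream_iterator<int>(in),
                   std::istream_iterator<int>());
}
```

The same sum_all also accepts the random-access iterators of std::vector, since those support != as well.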

Question 7

With the end() pointing one past the end, it is easy to iterate a collection with a for loop:

for (iterator it = collection.begin(); it != collection.end(); it++)
{
    DoStuff(*it);
}

With end() pointing to the last element, a loop would be more complex:

iterator it = collection.begin();
while (!collection.empty())
{
    DoStuff(*it);

    if (it == collection.end())
        break;

    it++;
}

Question 8

If a container is empty, begin() == end().
C++ programmers tend to use != instead of < (less than) in loop conditions, so
having end() point to the position one past the end is convenient.