Wie erstelle ich einen Generator/Iterator mit der Python-C-API?

Lesezeit: 8 Minuten

Michaels Benutzeravatar
Michael

Wie repliziere ich den folgenden Python-Code mit der Python-C-API?

class Sequence():
    def __init__(self, max):
        self.max = max
    def data(self):
        i = 0
        while i < self.max:
            yield i
            i += 1

Bisher habe ich das hier:

#include  #include  /* Definiere eine neue Objektklasse, Sequence.  */ typedef struct { PyObject_HEAD size_t max;  } SequenceObject;  /* Instanzvariablen */ statische PyMemberDef Sequence_members[] = { {"max", T_UINT, offsetof(SequenceObject, max), 0, NULL}, {NULL} /* Sentinel */ };  static int Sequence_Init(SequenceObject *self, PyObject *args, PyObject *kwds) { if (!PyArg_ParseTuple(args, "k", &(self->max))) { return -1;  } Rückgabe 0;  } statisches PyObject *Sequence_data(SequenceObject *self, PyObject *args);  /* Methoden */ statische PyMethodDef Sequence_methods[] = { {"data", (PyCFunction)Sequence_data, METH_NOARGS, "sequence.data() -> iterator object\n" "Gibt den Iterator des Bereichs zurück [0, sequence.max)."},
    {NULL} /* Sentinel */
};

/* Define new object type */
PyTypeObject Sequence_Type = {
   PyObject_HEAD_INIT(NULL)
   0,                         /* ob_size */
   "Sequence",                /* tp_name */
   sizeof(SequenceObject),    /* tp_basicsize */
   0,                         /* tp_itemsize */
   0,                         /* tp_dealloc */
   0,                         /* tp_print */
   0,                         /* tp_getattr */
   0,                         /* tp_setattr */
   0,                         /* tp_compare */
   0,                         /* tp_repr */
   0,                         /* tp_as_number */
   0,                         /* tp_as_sequence */
   0,                         /* tp_as_mapping */
   0,                         /* tp_hash */
   0,                         /* tp_call */
   0,                         /* tp_str */
   0,                         /* tp_getattro */
   0,                         /* tp_setattro */
   0,                         /* tp_as_buffer */
   Py_TPFLAGS_DEFAULT | Py_TPFLAGS_BASETYPE, /* tp_flags*/
   "Test generator object",   /* tp_doc */
   0,                         /* tp_traverse */
   0,                         /* tp_clear */
   0,                         /* tp_richcompare */
   0,                         /* tp_weaklistoffset */
   0,                         /* tp_iter */
   0,                         /* tp_iternext */
   0,                         /* tp_methods */
   Sequence_members,          /* tp_members */
   0,                         /* tp_getset */
   0,                         /* tp_base */
   0,                         /* tp_dict */
   0,                         /* tp_descr_get */
   0,                         /* tp_descr_set */
   0,                         /* tp_dictoffset */
   (initproc)Sequence_init,   /* tp_init */
   0,                         /* tp_alloc */
   PyType_GenericNew,         /* tp_new */
};

static PyObject *Sequence_data(SequenceObject *self, PyObject *args)
{
    /* Now what? */
}

But I’m not sure where to go next. Could anyone offer some suggestions?

Edit

I suppose the main problem I’m having with this is simulating the yield statement. As I understand it, it is a pretty simple-looking, but in reality complex, statement — it creates a generator with its own __iter__() and next() methods which are called automatically. Searching through the docs, it seems to be associated with the PyGenObject; however, how to create a new instance of this object is unclear. PyGen_New() takes as its argument a PyFrameObject, the only reference to which I can find is PyEval_GetFrame(), which doesn’t seem to be what I want (or am I mistaken?). Does anyone have any experience with this they can share?

Further Edit

I found this to be clearer when I (essentially) expanded what Python was doing behind the scenes:

class IterObject():
    def __init__(self, max):
        self.max = max
    def __iter__(self):
        self.i = 0
        return self
    def next(self):
        if self.i >= self.max:
            raise StopIteration
        self.i += 1
        return self.i

class Sequence():
    def __init__(self, max):
        self.max = max
    def data(self):
        return IterObject(self.max)

Technically the sequence is off by one but you get the idea.

The only problem with this is it’s very annoying to create a new object every time one needs a generator — even more so in Python than C because of the required monstrosity that comes with defining a new type. And there can be no yield statement in C because C has no closures. What I did instead (since I couldn’t find it in the Python API — please point me to a standard object if it already exists!) was create a simple, generic generator object class that called back a C function for every next() method call. Here it is (note that I have not yet tried compiling this because it is not complete — see below):

#include <Python/Python.h>
#include <Python/structmember.h>
#include <stdlib.h>

/* A convenient, generic generator object. */

typedef PyObject *(*callback)(PyObject *callee, void *info) PyGeneratorCallback;

typedef struct {
    PyObject HEAD
    PyGeneratorCallback callback;
    PyObject *callee;
    void *callbackInfo; /* info to be passed along to callback function. */
    bool freeInfo; /* true if |callbackInfo| should be free'()d when object
                    * dealloc's, false if not. */
} GeneratorObject;

static PyObject *Generator_iter(PyObject *self, PyObject *args)
{
    Py_INCREF(self);
    return self;
}

static PyObject *Generator_next(PyObject *self, PyObject *args)
{
    return self->callback(self->callee, self->callbackInfo);
}

static PyMethodDef Generator_methods[] = { {"__iter__", (PyCFunction)Generator_iter, METH_NOARGS, NULL}, {"next", (PyCFunction)Generator_next, METH_NOARGS, NULL}, {NULL} /* Sentinel */ };  static void Generator_dealloc(GenericEventObject *self) { if (self->freeInfo && self->callbackInfo != NULL) { free(self->callbackInfo);  } self->ob_type->tp_free((PyObject *)self);  } PyTypeObject Generator_Type = { PyObject_HEAD_INIT(NULL) 0, /* ob_size */ "Generator", /* tp_name */ sizeof(GeneratorObject), /* tp_basicsize */ 0, /* tp_itemsize */ Generator_dealloc, /* tp_dealloc */ 0 , /* tp_print */ 0, /* tp_getattr */ 0, /* tp_setattr */ 0, /* tp_compare */ 0, /* tp_repr */ 0, /* tp_as_number */ 0, /* tp_as_sequence */ 0, /* tp_as_mapping */ 0, /* tp_hash */ 0, /* tp_call */ 0, /* tp_str */ 0, /* tp_getattro */ 0, /* tp_setattro */ 0, /* tp_as_buffer */ Py_TPFLAGS_DEFAULT |  Py_TPFLAGS_BASETYPE, /* tp_flags*/ 0, /* tp_doc */ 0, /* tp_traverse */ 0, /* tp_clear */ 0, /* tp_richcompare */ 0, /* tp_weaklistoffset */ 0, /* tp_iter */ 0 , /* tp_iternext */ 0, /* tp_methods */ 0, /* tp_members */ 0, /* tp_getset */ 0, /* tp_base */ 0, /* tp_dict */ 0, /* tp_descr_get */ 0, /* tp_descr_set */ 0, /* tp_dictoffset */ 0, /* tp_init */ 0, /* tp_alloc */ PyType_GenericNew, /* tp_new */ };  /* Gibt ein neues Generatorobjekt mit der gegebenen Callback-Funktion * und Argumenten zurück.  */ PyObject *Generator_New(PyObject *callee, void *info, bool freeInfo, PyGeneratorCallback callback) { GeneratorObject *generator = (GeneratorObject *)_PyObject_New(&Generator_Type);  if (generator == NULL) NULL zurückgeben;  Generator->angerufener = Angerufener;  Generator->info = info;  Generator->Rückruf = Rückruf;  self->freeInfo = kostenloseInfo;  return (PyObject *)generator;  } /* Ende der Generatordefinition.  */ /* Definiere eine neue Objektklasse, Sequence.  */ typedef struct { PyObject_HEAD size_t max;  } SequenceObject;  /* Instanzvariablen */ statische PyMemberDef Sequence_members[] = { {"max", T_UINT, offsetof(SequenceObject, max), 0, NULL}, {NULL} /* Sentinel */ } static int Sequence_Init(SequenceObject *self, PyObject *args, PyObject *kwds) { if (! PyArg_ParseTuple(args, "k", &self->max)) { return -1;  } Rückgabe 0;  } statisches PyObject *Sequence_data(SequenceObject *self, PyObject *args);  /* Methoden */ statische PyMethodDef Sequence_methods[] = { {"data", (PyCFunction)Sequence_data, METH_NOARGS, "sequence.data() -> iterator object\n" "Gibt den Bereichsgenerator zurück [0, sequence.max)."},
    {NULL} /* Sentinel */
};

/* Define new object type */
PyTypeObject Sequence_Type = {
   PyObject_HEAD_INIT(NULL)
   0,                         /* ob_size */
   "Sequence",                /* tp_name */
   sizeof(SequenceObject),    /* tp_basicsize */
   0,                         /* tp_itemsize */
   0,                         /* tp_dealloc */
   0,                         /* tp_print */
   0,                         /* tp_getattr */
   0,                         /* tp_setattr */
   0,                         /* tp_compare */
   0,                         /* tp_repr */
   0,                         /* tp_as_number */
   0,                         /* tp_as_sequence */
   0,                         /* tp_as_mapping */
   0,                         /* tp_hash */
   0,                         /* tp_call */
   0,                         /* tp_str */
   0,                         /* tp_getattro */
   0,                         /* tp_setattro */
   0,                         /* tp_as_buffer */
   Py_TPFLAGS_DEFAULT | Py_TPFLAGS_BASETYPE, /* tp_flags*/
   "Test generator object",   /* tp_doc */
   0,                         /* tp_traverse */
   0,                         /* tp_clear */
   0,                         /* tp_richcompare */
   0,                         /* tp_weaklistoffset */
   0,                         /* tp_iter */
   0,                         /* tp_iternext */
   0,                         /* tp_methods */
   Sequence_members,          /* tp_members */
   0,                         /* tp_getset */
   0,                         /* tp_base */
   0,                         /* tp_dict */
   0,                         /* tp_descr_get */
   0,                         /* tp_descr_set */
   0,                         /* tp_dictoffset */
   (initproc)Sequence_init,   /* tp_init */
   0,                         /* tp_alloc */
   PyType_GenericNew,         /* tp_new */
};

static PyObject *Sequence_data(SequenceObject *self, PyObject *args)
{
    size_t *info = malloc(sizeof(size_t));
    if (info == NULL) return NULL;
    *info = 0;

    /* |info| will be free'()d by the returned generator object. */
    GeneratorObject *ret = Generator_New(self, info, true,
                                         &Sequence_data_next_callback);
    if (ret == NULL) {
        free(info); /* Watch out for memory leaks! */
    }
    return ret;
}

PyObject *Sequence_data_next_callback(PyObject *self, void *info)
{
    size_t i = info;
    if (i > self->max) {
        return NULL; /* TODO: How do I raise StopIteration here? I can't seem to find
                      *       a standard exception. */
    } else {
        return Py_BuildValue("k", i++);
    }
}

However, unfortunately, I’m still not finished. The only question I have left is: How do I raise a StopIteration exception with the C API? I can’t seem to find it listed in the Standard Exceptions. Also, perhaps more importantly, is this the correct way to approach this problem?

Thanks to anyone that’s still following this.

  • You are aware that this is xrange( max ) ?

    – Jochen Ritzel

    Nov 29, 2009 at 15:33

  • Yes, but this is just a simple example. I do have a practical use for this.

    – Michael

    Nov 29, 2009 at 19:36

Tomek Szpakowicz's user avatar
Tomek Szpakowicz

Below is a simple implementation of module spam with one function myiter(int) returning iterator:

import spam
for i in spam.myiter(10):
    print i

prints numbers from 0 to 9.

It is simpler then your case but shows main points: defining object with standard __iter__() and next() methods, and implementing iterator behaviour including raising StopIteration when appropriate.

In your case iterator object needs to hold reference to Sequence (so you’ll need deallocator method for it to Py_DECREF it).
The sequence itself needs to implement __iter()__ and create an iterator inside it.


Structure containing state of iterator.
(In your version instead of m, it would have reference to Sequence.)

typedef struct {
  PyObject_HEAD
  long int m;
  long int i;
} spam_MyIter;

Iterator’s __iter__() method.
It always simply returns self.
It allows for both iterator and collection to be treated the same
in constructs like for ... in ....

PyObject* spam_MyIter_iter(PyObject *self)
{
  Py_INCREF(self);
  return self;
}

Implementation of our iteration: next() method.

PyObject* spam_MyIter_iternext(PyObject *self)
{
  spam_MyIter *p = (spam_MyIter *)self;
  if (p->i < p->m) {
    PyObject *tmp = Py_BuildValue("l", p->i);
    (p->i)++;
    return tmp;
  } else {
    /* Raising of standard StopIteration exception with empty value. */
    PyErr_SetNone(PyExc_StopIteration);
    return NULL;
  }
}

We need extended version of PyTypeObject structure to provide Python with
information about __iter__() and next().
We want them to be called efficiently, so no name-based lookup in dictionary.

static PyTypeObject spam_MyIterType = {
    PyObject_HEAD_INIT(NULL)
    0,                         /*ob_size*/
    "spam._MyIter",            /*tp_name*/
    sizeof(spam_MyIter),       /*tp_basicsize*/
    0,                         /*tp_itemsize*/
    0,                         /*tp_dealloc*/
    0,                         /*tp_print*/
    0,                         /*tp_getattr*/
    0,                         /*tp_setattr*/
    0,                         /*tp_compare*/
    0,                         /*tp_repr*/
    0,                         /*tp_as_number*/
    0,                         /*tp_as_sequence*/
    0,                         /*tp_as_mapping*/
    0,                         /*tp_hash */
    0,                         /*tp_call*/
    0,                         /*tp_str*/
    0,                         /*tp_getattro*/
    0,                         /*tp_setattro*/
    0,                         /*tp_as_buffer*/
    Py_TPFLAGS_DEFAULT | Py_TPFLAGS_HAVE_ITER,
      /* tp_flags: Py_TPFLAGS_HAVE_ITER tells python to
         use tp_iter and tp_iternext fields. */
    "Internal myiter iterator object.",           /* tp_doc */
    0,  /* tp_traverse */
    0,  /* tp_clear */
    0,  /* tp_richcompare */
    0,  /* tp_weaklistoffset */
    spam_MyIter_iter,  /* tp_iter: __iter__() method */
    spam_MyIter_iternext  /* tp_iternext: next() method */
};

myiter(int) function creates iterator.

static PyObject *
spam_myiter(PyObject *self, PyObject *args)
{
  long int m;
  spam_MyIter *p;

  if (!PyArg_ParseTuple(args, "l", &m))  return NULL;

  /* I don't need python callable __init__() method for this iterator,
     so I'll simply allocate it as PyObject and initialize it by hand. */

  p = PyObject_New(spam_MyIter, &spam_MyIterType);
  if (!p) return NULL;

  /* I'm not sure if it's strictly necessary. */
  if (!PyObject_Init((PyObject *)p, &spam_MyIterType)) {
    Py_DECREF(p);
    return NULL;
  }

  p->m = m;
  p->i = 0;
  return (PyObject *)p;
}

The rest is pretty boring…

static PyMethodDef SpamMethods[] = { {"myiter", spam_myiter, METH_VARARGS, "Iterate from i=0 while i

Im Sequence_datamüssen Sie entweder eine neue PyInt-Instanz zurückgeben oder eine auslösen StopIteration Ausnahme, die dem Code außerhalb mitteilt, dass es keine weiteren Werte gibt. Sehen PEP 255 für Details u 9.10 Generatoren.

Sehen Iterator-Protokoll für Hilfsfunktionen in der Python/C-API.

  • Hier wird beschrieben, wie eine next()-Methode erstellt wird, aber nicht, wie ein Generatorobjekt erstellt und zurückgegeben wird, das die Methode enthält.

    – Michael

    29. November 2009 um 18:54 Uhr

  • Es ist sehr schwer, "Ertrag" in C zu simulieren; mach das Sequence Klasse stattdessen einen Iterator.

    – Aaron Digulla

    30. November 2009 um 8:27 Uhr

1407920cookie-checkWie erstelle ich einen Generator/Iterator mit der Python-C-API?

This website is using cookies to improve the user-friendliness. You agree by using the website further.

Privacy policy