| line | stmt | bran | cond | sub | pod | time | code | 
| 1 |  |  |  |  |  |  | /* | 
| 2 |  |  |  |  |  |  | * libev linux aio fd activity backend | 
| 3 |  |  |  |  |  |  | * | 
| 4 |  |  |  |  |  |  | * Copyright (c) 2019 Marc Alexander Lehmann | 
| 5 |  |  |  |  |  |  | * All rights reserved. | 
| 6 |  |  |  |  |  |  | * | 
| 7 |  |  |  |  |  |  | * Redistribution and use in source and binary forms, with or without modifica- | 
| 8 |  |  |  |  |  |  | * tion, are permitted provided that the following conditions are met: | 
| 9 |  |  |  |  |  |  | * | 
| 10 |  |  |  |  |  |  | *   1.  Redistributions of source code must retain the above copyright notice, | 
| 11 |  |  |  |  |  |  | *       this list of conditions and the following disclaimer. | 
| 12 |  |  |  |  |  |  | * | 
| 13 |  |  |  |  |  |  | *   2.  Redistributions in binary form must reproduce the above copyright | 
| 14 |  |  |  |  |  |  | *       notice, this list of conditions and the following disclaimer in the | 
| 15 |  |  |  |  |  |  | *       documentation and/or other materials provided with the distribution. | 
| 16 |  |  |  |  |  |  | * | 
| 17 |  |  |  |  |  |  | * THIS SOFTWARE IS PROVIDED BY THE AUTHOR ``AS IS'' AND ANY EXPRESS OR IMPLIED | 
| 18 |  |  |  |  |  |  | * WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MER- | 
| 19 |  |  |  |  |  |  | * CHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED.  IN NO | 
| 20 |  |  |  |  |  |  | * EVENT SHALL THE AUTHOR BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPE- | 
| 21 |  |  |  |  |  |  | * CIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, | 
| 22 |  |  |  |  |  |  | * PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; | 
| 23 |  |  |  |  |  |  | * OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, | 
| 24 |  |  |  |  |  |  | * WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTH- | 
| 25 |  |  |  |  |  |  | * ERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED | 
| 26 |  |  |  |  |  |  | * OF THE POSSIBILITY OF SUCH DAMAGE. | 
| 27 |  |  |  |  |  |  | * | 
| 28 |  |  |  |  |  |  | * Alternatively, the contents of this file may be used under the terms of | 
| 29 |  |  |  |  |  |  | * the GNU General Public License ("GPL") version 2 or any later version, | 
| 30 |  |  |  |  |  |  | * in which case the provisions of the GPL are applicable instead of | 
| 31 |  |  |  |  |  |  | * the above. If you wish to allow the use of your version of this file | 
| 32 |  |  |  |  |  |  | * only under the terms of the GPL and not to allow others to use your | 
| 33 |  |  |  |  |  |  | * version of this file under the BSD license, indicate your decision | 
| 34 |  |  |  |  |  |  | * by deleting the provisions above and replace them with the notice | 
| 35 |  |  |  |  |  |  | * and other provisions required by the GPL. If you do not delete the | 
| 36 |  |  |  |  |  |  | * provisions above, a recipient may use your version of this file under | 
| 37 |  |  |  |  |  |  | * either the BSD or the GPL. | 
| 38 |  |  |  |  |  |  | */ | 
| 39 |  |  |  |  |  |  |  | 
| 40 |  |  |  |  |  |  | /* | 
| 41 |  |  |  |  |  |  | * general notes about linux aio: | 
| 42 |  |  |  |  |  |  | * | 
| 43 |  |  |  |  |  |  | * a) at first, the linux aio IOCB_CMD_POLL functionality introduced in | 
| 44 |  |  |  |  |  |  | *    4.18 looks too good to be true: both watchers and events can be | 
| 45 |  |  |  |  |  |  | *    batched, and events can even be handled in userspace using | 
| 46 |  |  |  |  |  |  | *    a ring buffer shared with the kernel. watchers can be canceled | 
| 47 |  |  |  |  |  |  | *    regardless of whether the fd has been closed. no problems with fork. | 
| 48 |  |  |  |  |  |  | *    ok, the ring buffer is 200% undocumented (there isn't even a | 
| 49 |  |  |  |  |  |  | *    header file), but otherwise, it's pure bliss! | 
| 50 |  |  |  |  |  |  | * b) ok, watchers are one-shot, so you have to re-arm active ones | 
| 51 |  |  |  |  |  |  | *    on every iteration. so much for syscall-less event handling, | 
| 52 |  |  |  |  |  |  | *    but at least these re-arms can be batched, no big deal, right? | 
| 53 |  |  |  |  |  |  | * c) well, linux as usual: the documentation lies to you: io_submit | 
| 54 |  |  |  |  |  |  | *    sometimes returns EINVAL because the kernel doesn't feel like | 
| 55 |  |  |  |  |  |  | *    handling your poll mask - ttys can be polled for POLLOUT, | 
| 56 |  |  |  |  |  |  | *    POLLOUT|POLLIN, but polling for POLLIN fails. just great, | 
| 57 |  |  |  |  |  |  | *    so we have to fall back to something else (hello, epoll), | 
| 58 |  |  |  |  |  |  | *    but at least the fallback can be slow, because these are | 
| 59 |  |  |  |  |  |  | *    exceptional cases, right? | 
| 60 |  |  |  |  |  |  | * d) hmm, you have to tell the kernel the maximum number of watchers | 
| 61 |  |  |  |  |  |  | *    you want to queue when initialising the aio context. but of | 
| 62 |  |  |  |  |  |  | *    course the real limit is magically calculated in the kernel, and | 
| 63 |  |  |  |  |  |  | *    is often higher than we asked for. so we just have to destroy | 
| 64 |  |  |  |  |  |  | *    the aio context and re-create it a bit larger if we hit the limit. | 
| 65 |  |  |  |  |  |  | *    (starts to remind you of epoll? well, it's a bit more deterministic | 
| 66 |  |  |  |  |  |  | *    and less gambling, but still ugly as hell). | 
| 67 |  |  |  |  |  |  | * e) that's when you find out you can also hit an arbitrary system-wide | 
| 68 |  |  |  |  |  |  | *    limit. or the kernel simply doesn't want to handle your watchers. | 
| 69 |  |  |  |  |  |  | *    what the fuck do we do then? you guessed it, in the middle | 
| 70 |  |  |  |  |  |  | *    of event handling we have to switch to 100% epoll polling. and | 
| 71 |  |  |  |  |  |  | *    that had better be as fast as normal epoll polling, so you practically | 
| 72 |  |  |  |  |  |  | *    have to use the normal epoll backend with all its quirks. | 
| 73 |  |  |  |  |  |  | * f) end result of this train wreck: it inherits all the disadvantages | 
| 74 |  |  |  |  |  |  | *    from epoll, while adding a number of its own. why even bother to use | 
| 75 |  |  |  |  |  |  | *    it? because if conditions are right and your fds are supported and you | 
| 76 |  |  |  |  |  |  | *    don't hit a limit, this backend is actually faster, doesn't gamble with | 
| 77 |  |  |  |  |  |  | *    your fds, batches watchers and events and doesn't require costly state | 
| 78 |  |  |  |  |  |  | *    recreates. well, until it does. | 
| 79 |  |  |  |  |  |  | * g) all of this makes this backend use almost twice as much code as epoll. | 
| 80 |  |  |  |  |  |  | *    which in turn uses twice as much code as poll. and that's not counting | 
| 81 |  |  |  |  |  |  | *    the fact that this backend also depends on the epoll backend, making | 
| 82 |  |  |  |  |  |  | *    it three times as much code as poll, or kqueue. | 
| 83 |  |  |  |  |  |  | * h) bleah. why can't linux just do kqueue. sure kqueue is ugly, but by now | 
| 84 |  |  |  |  |  |  | *    it's clear that whatever linux comes up with is far, far, far worse. | 
| 85 |  |  |  |  |  |  | */ | 
| 86 |  |  |  |  |  |  |  | 
| 87 |  |  |  |  |  |  | #include <sys/time.h> /* actually linux/time.h, but we must assume they are compatible */ | 
| 88 |  |  |  |  |  |  | #include <poll.h> | 
| 89 |  |  |  |  |  |  | #include <linux/aio_abi.h> | 
| 90 |  |  |  |  |  |  |  | 
| 91 |  |  |  |  |  |  | /*****************************************************************************/ | 
| 92 |  |  |  |  |  |  | /* syscall wrapdadoop - this section has the raw api/abi definitions */ | 
| 93 |  |  |  |  |  |  |  | 
| 94 |  |  |  |  |  |  | #include <sys/syscall.h> /* no glibc wrappers */ | 
| 95 |  |  |  |  |  |  |  | 
| 96 |  |  |  |  |  |  | /* aio_abi.h is not versioned in any way, so we cannot test for its existence */ | 
| 97 |  |  |  |  |  |  | #define IOCB_CMD_POLL 5 | 
| 98 |  |  |  |  |  |  |  | 
| 99 |  |  |  |  |  |  | /* taken from linux/fs/aio.c. yup, that's a .c file. | 
| 100 |  |  |  |  |  |  | * not only is this totally undocumented, not even the source code | 
| 101 |  |  |  |  |  |  | * can tell you what the future semantics of compat_features and | 
| 102 |  |  |  |  |  |  | * incompat_features are, or what header_length actually is for. | 
| 103 |  |  |  |  |  |  | */ | 
| 104 |  |  |  |  |  |  | #define AIO_RING_MAGIC                  0xa10a10a1 | 
| 105 |  |  |  |  |  |  | #define EV_AIO_RING_INCOMPAT_FEATURES   0 | 
| 106 |  |  |  |  |  |  | struct aio_ring | 
| 107 |  |  |  |  |  |  | { | 
| 108 |  |  |  |  |  |  | unsigned id;    /* kernel internal index number */ | 
| 109 |  |  |  |  |  |  | unsigned nr;    /* number of io_events */ | 
| 110 |  |  |  |  |  |  | unsigned head;  /* Written to by userland or by kernel. */ | 
| 111 |  |  |  |  |  |  | unsigned tail; | 
| 112 |  |  |  |  |  |  |  | 
| 113 |  |  |  |  |  |  | unsigned magic; | 
| 114 |  |  |  |  |  |  | unsigned compat_features; | 
| 115 |  |  |  |  |  |  | unsigned incompat_features; | 
| 116 |  |  |  |  |  |  | unsigned header_length;  /* size of aio_ring */ | 
| 117 |  |  |  |  |  |  |  | 
| 118 |  |  |  |  |  |  | struct io_event io_events[0]; | 
| 119 |  |  |  |  |  |  | }; | 
| 120 |  |  |  |  |  |  |  | 
| 121 |  |  |  |  |  |  | inline_size | 
| 122 |  |  |  |  |  |  | int | 
| 123 | 0 |  |  |  |  |  | evsys_io_setup (unsigned nr_events, aio_context_t *ctx_idp) | 
| 124 |  |  |  |  |  |  | { | 
| 125 | 0 |  |  |  |  |  | return ev_syscall2 (SYS_io_setup, nr_events, ctx_idp); | 
| 126 |  |  |  |  |  |  | } | 
| 127 |  |  |  |  |  |  |  | 
| 128 |  |  |  |  |  |  | inline_size | 
| 129 |  |  |  |  |  |  | int | 
| 130 | 0 |  |  |  |  |  | evsys_io_destroy (aio_context_t ctx_id) | 
| 131 |  |  |  |  |  |  | { | 
| 132 | 0 |  |  |  |  |  | return ev_syscall1 (SYS_io_destroy, ctx_id); | 
| 133 |  |  |  |  |  |  | } | 
| 134 |  |  |  |  |  |  |  | 
| 135 |  |  |  |  |  |  | inline_size | 
| 136 |  |  |  |  |  |  | int | 
| 137 | 0 |  |  |  |  |  | evsys_io_submit (aio_context_t ctx_id, long nr, struct iocb *cbp[]) | 
| 138 |  |  |  |  |  |  | { | 
| 139 | 0 |  |  |  |  |  | return ev_syscall3 (SYS_io_submit, ctx_id, nr, cbp); | 
| 140 |  |  |  |  |  |  | } | 
| 141 |  |  |  |  |  |  |  | 
| 142 |  |  |  |  |  |  | inline_size | 
| 143 |  |  |  |  |  |  | int | 
| 144 | 0 |  |  |  |  |  | evsys_io_cancel (aio_context_t ctx_id, struct iocb *cbp, struct io_event *result) | 
| 145 |  |  |  |  |  |  | { | 
| 146 | 0 |  |  |  |  |  | return ev_syscall3 (SYS_io_cancel, ctx_id, cbp, result); | 
| 147 |  |  |  |  |  |  | } | 
| 148 |  |  |  |  |  |  |  | 
| 149 |  |  |  |  |  |  | inline_size | 
| 150 |  |  |  |  |  |  | int | 
| 151 | 0 |  |  |  |  |  | evsys_io_getevents (aio_context_t ctx_id, long min_nr, long nr, struct io_event *events, struct timespec *timeout) | 
| 152 |  |  |  |  |  |  | { | 
| 153 | 0 |  |  |  |  |  | return ev_syscall5 (SYS_io_getevents, ctx_id, min_nr, nr, events, timeout); | 
| 154 |  |  |  |  |  |  | } | 
| 155 |  |  |  |  |  |  |  | 
| 156 |  |  |  |  |  |  | /*****************************************************************************/ | 
| 157 |  |  |  |  |  |  | /* actual backend implementation */ | 
| 158 |  |  |  |  |  |  |  | 
| 159 |  |  |  |  |  |  | ecb_cold | 
| 160 |  |  |  |  |  |  | static int | 
| 161 | 0 |  |  |  |  |  | linuxaio_nr_events (EV_P) | 
| 162 |  |  |  |  |  |  | { | 
| 163 |  |  |  |  |  |  | /* we start with 16 iocbs and increase from there | 
| 164 |  |  |  |  |  |  | * that's tiny, but the kernel has a rather low system-wide | 
| 165 |  |  |  |  |  |  | * limit that can be reached quickly, so let's be parsimonious | 
| 166 |  |  |  |  |  |  | * with this resource. | 
| 167 |  |  |  |  |  |  | * Rest assured, the kernel generously rounds up small and big numbers | 
| 168 |  |  |  |  |  |  | * in different ways (but doesn't seem to charge you for it). | 
| 169 |  |  |  |  |  |  | * The 15 here is because the kernel usually has a power of two as aio-max-nr, | 
| 170 |  |  |  |  |  |  | * and this helps to take advantage of that limit. | 
| 171 |  |  |  |  |  |  | */ | 
| 172 |  |  |  |  |  |  |  | 
| 173 |  |  |  |  |  |  | /* we try to fill 4kB pages exactly. | 
| 174 |  |  |  |  |  |  | * the ring buffer header is 32 bytes, every io event is 32 bytes. | 
| 175 |  |  |  |  |  |  | * the kernel takes the io requests number, doubles it, adds 2 | 
| 176 |  |  |  |  |  |  | * and adds the ring buffer. | 
| 177 |  |  |  |  |  |  | * the way we use this is by starting low, and then roughly doubling the | 
| 178 |  |  |  |  |  |  | * size each time we hit a limit. | 
| 179 |  |  |  |  |  |  | */ | 
| 180 |  |  |  |  |  |  |  | 
| 181 | 0 |  |  |  |  |  | int requests   = 15 << linuxaio_iteration; | 
| 182 | 0 |  |  |  |  |  | int one_page   =  (4096 | 
| 183 |  |  |  |  |  |  | / sizeof (struct io_event)    ) / 2; /* how many fit into one page */ | 
| 184 | 0 |  |  |  |  |  | int first_page = ((4096 - sizeof (struct aio_ring)) | 
| 185 |  |  |  |  |  |  | / sizeof (struct io_event) - 2) / 2; /* how many fit into the first page */ | 
| 186 |  |  |  |  |  |  |  | 
| 187 |  |  |  |  |  |  | /* if everything fits into one page, use count exactly */ | 
| 188 | 0 | 0 |  |  |  |  | if (requests > first_page) | 
| 189 |  |  |  |  |  |  | /* otherwise, round down to full pages and add the first page */ | 
| 190 | 0 |  |  |  |  |  | requests = requests / one_page * one_page + first_page; | 
| 191 |  |  |  |  |  |  |  | 
| 192 | 0 |  |  |  |  |  | return requests; | 
| 193 |  |  |  |  |  |  | } | 
| 194 |  |  |  |  |  |  |  | 
| 195 |  |  |  |  |  |  | /* we use our own wrapper structure in case we ever want to do something "clever" */ | 
| 196 |  |  |  |  |  |  | typedef struct aniocb | 
| 197 |  |  |  |  |  |  | { | 
| 198 |  |  |  |  |  |  | struct iocb io; | 
| 199 |  |  |  |  |  |  | /*int inuse;*/ | 
| 200 |  |  |  |  |  |  | } *ANIOCBP; | 
| 201 |  |  |  |  |  |  |  | 
| 202 |  |  |  |  |  |  | inline_size | 
| 203 |  |  |  |  |  |  | void | 
| 204 | 0 |  |  |  |  |  | linuxaio_array_needsize_iocbp (ANIOCBP *base, int offset, int count) | 
| 205 |  |  |  |  |  |  | { | 
| 206 | 0 | 0 |  |  |  |  | while (count--) | 
| 207 |  |  |  |  |  |  | { | 
| 208 |  |  |  |  |  |  | /* TODO: quite the overhead to allocate every iocb separately, maybe use our own allocator? */ | 
| 209 | 0 |  |  |  |  |  | ANIOCBP iocb = (ANIOCBP)ev_malloc (sizeof (*iocb)); | 
| 210 |  |  |  |  |  |  |  | 
| 211 |  |  |  |  |  |  | /* full zero initialise is probably not required at the moment, but | 
| 212 |  |  |  |  |  |  | * this is not well documented, so we better do it. | 
| 213 |  |  |  |  |  |  | */ | 
| 214 | 0 |  |  |  |  |  | memset (iocb, 0, sizeof (*iocb)); | 
| 215 |  |  |  |  |  |  |  | 
| 216 | 0 |  |  |  |  |  | iocb->io.aio_lio_opcode = IOCB_CMD_POLL; | 
| 217 | 0 |  |  |  |  |  | iocb->io.aio_fildes     = offset; | 
| 218 |  |  |  |  |  |  |  | 
| 219 | 0 |  |  |  |  |  | base [offset++] = iocb; | 
| 220 |  |  |  |  |  |  | } | 
| 221 | 0 |  |  |  |  |  | } | 
| 222 |  |  |  |  |  |  |  | 
| 223 |  |  |  |  |  |  | ecb_cold | 
| 224 |  |  |  |  |  |  | static void | 
| 225 | 0 |  |  |  |  |  | linuxaio_free_iocbp (EV_P) | 
| 226 |  |  |  |  |  |  | { | 
| 227 | 0 | 0 |  |  |  |  | while (linuxaio_iocbpmax--) | 
| 228 | 0 |  |  |  |  |  | ev_free (linuxaio_iocbps [linuxaio_iocbpmax]); | 
| 229 |  |  |  |  |  |  |  | 
| 230 | 0 |  |  |  |  |  | linuxaio_iocbpmax = 0; /* next resize will completely reallocate the array, at some overhead */ | 
| 231 | 0 |  |  |  |  |  | } | 
| 232 |  |  |  |  |  |  |  | 
| 233 |  |  |  |  |  |  | static void | 
| 234 | 0 |  |  |  |  |  | linuxaio_modify (EV_P_ int fd, int oev, int nev) | 
| 235 |  |  |  |  |  |  | { | 
| 236 | 0 | 0 |  |  |  |  | array_needsize (ANIOCBP, linuxaio_iocbps, linuxaio_iocbpmax, fd + 1, linuxaio_array_needsize_iocbp); | 
| 237 | 0 |  |  |  |  |  | ANIOCBP iocb = linuxaio_iocbps [fd]; | 
| 238 | 0 |  |  |  |  |  | ANFD *anfd = &anfds [fd]; | 
| 239 |  |  |  |  |  |  |  | 
| 240 | 0 | 0 |  |  |  |  | if (ecb_expect_false (iocb->io.aio_reqprio < 0)) | 
| 241 |  |  |  |  |  |  | { | 
| 242 |  |  |  |  |  |  | /* we handed this fd over to epoll, so undo this first */ | 
| 243 |  |  |  |  |  |  | /* we do it manually because the optimisations on epoll_modify won't do us any good */ | 
| 244 | 0 |  |  |  |  |  | epoll_ctl (backend_fd, EPOLL_CTL_DEL, fd, 0); | 
| 245 | 0 |  |  |  |  |  | anfd->emask = 0; | 
| 246 | 0 |  |  |  |  |  | iocb->io.aio_reqprio = 0; | 
| 247 |  |  |  |  |  |  | } | 
| 248 | 0 | 0 |  |  |  |  | else if (ecb_expect_false (iocb->io.aio_buf)) | 
| 249 |  |  |  |  |  |  | { | 
| 250 |  |  |  |  |  |  | /* iocb active, so cancel it first before resubmit */ | 
| 251 |  |  |  |  |  |  | /* this assumes we only ever get one call per fd per loop iteration */ | 
| 252 |  |  |  |  |  |  | for (;;) | 
| 253 |  |  |  |  |  |  | { | 
| 254 |  |  |  |  |  |  | /* on all relevant kernels, io_cancel fails with EINPROGRESS on "success" */ | 
| 255 | 0 | 0 |  |  |  |  | if (ecb_expect_false (evsys_io_cancel (linuxaio_ctx, &iocb->io, (struct io_event *)0) == 0)) | 
| 256 | 0 |  |  |  |  |  | break; | 
| 257 |  |  |  |  |  |  |  | 
| 258 | 0 | 0 |  |  |  |  | if (ecb_expect_true (errno == EINPROGRESS)) | 
| 259 | 0 |  |  |  |  |  | break; | 
| 260 |  |  |  |  |  |  |  | 
| 261 |  |  |  |  |  |  | /* the EINPROGRESS test is for nicer error message. clumsy. */ | 
| 262 | 0 | 0 |  |  |  |  | if (errno != EINTR) | 
| 263 |  |  |  |  |  |  | { | 
| 264 |  |  |  |  |  |  | assert (("libev: linuxaio unexpected io_cancel failed", errno != EINTR && errno != EINPROGRESS)); | 
| 265 | 0 |  |  |  |  |  | break; | 
| 266 |  |  |  |  |  |  | } | 
| 267 | 0 |  |  |  |  |  | } | 
| 268 |  |  |  |  |  |  |  | 
| 269 |  |  |  |  |  |  | /* increment generation counter to avoid handling old events */ | 
| 270 | 0 |  |  |  |  |  | ++anfd->egen; | 
| 271 |  |  |  |  |  |  | } | 
| 272 |  |  |  |  |  |  |  | 
| 273 | 0 |  |  |  |  |  | iocb->io.aio_buf = (nev & EV_READ  ? POLLIN  : 0) | 
| 274 | 0 | 0 |  |  |  |  | | (nev & EV_WRITE ? POLLOUT : 0); | 
| 275 |  |  |  |  |  |  |  | 
| 276 | 0 | 0 |  |  |  |  | if (nev) | 
| 277 |  |  |  |  |  |  | { | 
| 278 | 0 |  |  |  |  |  | iocb->io.aio_data = (uint32_t)fd | ((__u64)(uint32_t)anfd->egen << 32); | 
| 279 |  |  |  |  |  |  |  | 
| 280 |  |  |  |  |  |  | /* queue iocb up for io_submit */ | 
| 281 |  |  |  |  |  |  | /* this assumes we only ever get one call per fd per loop iteration */ | 
| 282 | 0 |  |  |  |  |  | ++linuxaio_submitcnt; | 
| 283 | 0 | 0 |  |  |  |  | array_needsize (struct iocb *, linuxaio_submits, linuxaio_submitmax, linuxaio_submitcnt, array_needsize_noinit); | 
| 284 | 0 |  |  |  |  |  | linuxaio_submits [linuxaio_submitcnt - 1] = &iocb->io; | 
| 285 |  |  |  |  |  |  | } | 
| 286 | 0 |  |  |  |  |  | } | 
| 287 |  |  |  |  |  |  |  | 
| 288 |  |  |  |  |  |  | static void | 
| 289 | 0 |  |  |  |  |  | linuxaio_epoll_cb (EV_P_ struct ev_io *w, int revents) | 
| 290 |  |  |  |  |  |  | { | 
| 291 | 0 |  |  |  |  |  | epoll_poll (EV_A_ 0); | 
| 292 | 0 |  |  |  |  |  | } | 
| 293 |  |  |  |  |  |  |  | 
| 294 |  |  |  |  |  |  | inline_speed | 
| 295 |  |  |  |  |  |  | void | 
| 296 | 0 |  |  |  |  |  | linuxaio_fd_rearm (EV_P_ int fd) | 
| 297 |  |  |  |  |  |  | { | 
| 298 | 0 |  |  |  |  |  | anfds [fd].events = 0; | 
| 299 | 0 |  |  |  |  |  | linuxaio_iocbps [fd]->io.aio_buf = 0; | 
| 300 | 0 |  |  |  |  |  | fd_change (EV_A_ fd, EV_ANFD_REIFY); | 
| 301 | 0 |  |  |  |  |  | } | 
| 302 |  |  |  |  |  |  |  | 
| 303 |  |  |  |  |  |  | static void | 
| 304 | 0 |  |  |  |  |  | linuxaio_parse_events (EV_P_ struct io_event *ev, int nr) | 
| 305 |  |  |  |  |  |  | { | 
| 306 | 0 | 0 |  |  |  |  | while (nr) | 
| 307 |  |  |  |  |  |  | { | 
| 308 | 0 |  |  |  |  |  | int fd       = ev->data & 0xffffffff; | 
| 309 | 0 |  |  |  |  |  | uint32_t gen = ev->data >> 32; | 
| 310 | 0 |  |  |  |  |  | int res      = ev->res; | 
| 311 |  |  |  |  |  |  |  | 
| 312 |  |  |  |  |  |  | assert (("libev: iocb fd must be in-bounds", fd >= 0 && fd < anfdmax)); | 
| 313 |  |  |  |  |  |  |  | 
| 314 |  |  |  |  |  |  | /* only accept events if generation counter matches */ | 
| 315 | 0 | 0 |  |  |  |  | if (ecb_expect_true (gen == (uint32_t)anfds [fd].egen)) | 
| 316 |  |  |  |  |  |  | { | 
| 317 |  |  |  |  |  |  | /* feed events, we do not expect or handle POLLNVAL */ | 
| 318 | 0 |  |  |  |  |  | fd_event ( | 
| 319 |  |  |  |  |  |  | EV_A_ | 
| 320 |  |  |  |  |  |  | fd, | 
| 321 | 0 | 0 |  |  |  |  | (res & (POLLOUT | POLLERR | POLLHUP) ? EV_WRITE : 0) | 
| 322 | 0 |  |  |  |  |  | | (res & (POLLIN | POLLERR | POLLHUP) ? EV_READ : 0) | 
| 323 |  |  |  |  |  |  | ); | 
| 324 |  |  |  |  |  |  |  | 
| 325 |  |  |  |  |  |  | /* linux aio is oneshot: rearm fd. TODO: this does more work than strictly needed */ | 
| 326 | 0 |  |  |  |  |  | linuxaio_fd_rearm (EV_A_ fd); | 
| 327 |  |  |  |  |  |  | } | 
| 328 |  |  |  |  |  |  |  | 
| 329 | 0 |  |  |  |  |  | --nr; | 
| 330 | 0 |  |  |  |  |  | ++ev; | 
| 331 |  |  |  |  |  |  | } | 
| 332 | 0 |  |  |  |  |  | } | 
| 333 |  |  |  |  |  |  |  | 
| 334 |  |  |  |  |  |  | /* get any events from ring buffer, return true if any were handled */ | 
| 335 |  |  |  |  |  |  | static int | 
| 336 | 0 |  |  |  |  |  | linuxaio_get_events_from_ring (EV_P) | 
| 337 |  |  |  |  |  |  | { | 
| 338 | 0 |  |  |  |  |  | struct aio_ring *ring = (struct aio_ring *)linuxaio_ctx; | 
| 339 |  |  |  |  |  |  | unsigned head, tail; | 
| 340 |  |  |  |  |  |  |  | 
| 341 |  |  |  |  |  |  | /* the kernel reads and writes both of these variables, */ | 
| 342 |  |  |  |  |  |  | /* as a C extension, we assume that volatile use here */ | 
| 343 |  |  |  |  |  |  | /* both makes reads atomic and once-only */ | 
| 344 | 0 |  |  |  |  |  | head = *(volatile unsigned *)&ring->head; | 
| 345 | 0 |  |  |  |  |  | ECB_MEMORY_FENCE_ACQUIRE; | 
| 346 | 0 |  |  |  |  |  | tail = *(volatile unsigned *)&ring->tail; | 
| 347 |  |  |  |  |  |  |  | 
| 348 | 0 | 0 |  |  |  |  | if (head == tail) | 
| 349 | 0 |  |  |  |  |  | return 0; | 
| 350 |  |  |  |  |  |  |  | 
| 351 |  |  |  |  |  |  | /* parse all available events, but only once, to avoid starvation */ | 
| 352 | 0 | 0 |  |  |  |  | if (ecb_expect_true (tail > head)) /* normal case, no wrap-around */ | 
| 353 | 0 |  |  |  |  |  | linuxaio_parse_events (EV_A_ ring->io_events + head, tail - head); | 
| 354 |  |  |  |  |  |  | else /* wrapped around */ | 
| 355 |  |  |  |  |  |  | { | 
| 356 | 0 |  |  |  |  |  | linuxaio_parse_events (EV_A_ ring->io_events + head, ring->nr - head); | 
| 357 | 0 |  |  |  |  |  | linuxaio_parse_events (EV_A_ ring->io_events, tail); | 
| 358 |  |  |  |  |  |  | } | 
| 359 |  |  |  |  |  |  |  | 
| 360 | 0 |  |  |  |  |  | ECB_MEMORY_FENCE_RELEASE; | 
| 361 |  |  |  |  |  |  | /* as an extension to C, we hope that the volatile will make this atomic and once-only */ | 
| 362 | 0 |  |  |  |  |  | *(volatile unsigned *)&ring->head = tail; | 
| 363 |  |  |  |  |  |  |  | 
| 364 | 0 |  |  |  |  |  | return 1; | 
| 365 |  |  |  |  |  |  | } | 
| 366 |  |  |  |  |  |  |  | 
| 367 |  |  |  |  |  |  | inline_size | 
| 368 |  |  |  |  |  |  | int | 
| 369 | 0 |  |  |  |  |  | linuxaio_ringbuf_valid (EV_P) | 
| 370 |  |  |  |  |  |  | { | 
| 371 | 0 |  |  |  |  |  | struct aio_ring *ring = (struct aio_ring *)linuxaio_ctx; | 
| 372 |  |  |  |  |  |  |  | 
| 373 | 0 |  |  |  |  |  | return ecb_expect_true (ring->magic == AIO_RING_MAGIC) | 
| 374 | 0 | 0 |  |  |  |  | && ring->incompat_features == EV_AIO_RING_INCOMPAT_FEATURES | 
| 375 | 0 | 0 |  |  |  |  | && ring->header_length == sizeof (struct aio_ring); /* TODO: or use it to find io_event[0]? */ | 
|  |  | 0 |  |  |  |  |  | 
| 376 |  |  |  |  |  |  | } | 
| 377 |  |  |  |  |  |  |  | 
| 378 |  |  |  |  |  |  | /* read at least one event from kernel, or timeout */ | 
| 379 |  |  |  |  |  |  | inline_size | 
| 380 |  |  |  |  |  |  | void | 
| 381 | 0 |  |  |  |  |  | linuxaio_get_events (EV_P_ ev_tstamp timeout) | 
| 382 |  |  |  |  |  |  | { | 
| 383 |  |  |  |  |  |  | struct timespec ts; | 
| 384 |  |  |  |  |  |  | struct io_event ioev[8]; /* 256 octet stack space */ | 
| 385 | 0 |  |  |  |  |  | int want = 1; /* how many events to request */ | 
| 386 | 0 |  |  |  |  |  | int ringbuf_valid = linuxaio_ringbuf_valid (EV_A); | 
| 387 |  |  |  |  |  |  |  | 
| 388 | 0 | 0 |  |  |  |  | if (ecb_expect_true (ringbuf_valid)) | 
| 389 |  |  |  |  |  |  | { | 
| 390 |  |  |  |  |  |  | /* if the ring buffer has any events, we don't wait or call the kernel at all */ | 
| 391 | 0 | 0 |  |  |  |  | if (linuxaio_get_events_from_ring (EV_A)) | 
| 392 | 0 |  |  |  |  |  | return; | 
| 393 |  |  |  |  |  |  |  | 
| 394 |  |  |  |  |  |  | /* if the ring buffer is empty, and we don't have a timeout, then don't call the kernel */ | 
| 395 | 0 | 0 |  |  |  |  | if (!timeout) | 
| 396 | 0 |  |  |  |  |  | return; | 
| 397 |  |  |  |  |  |  | } | 
| 398 |  |  |  |  |  |  | else | 
| 399 |  |  |  |  |  |  | /* no ringbuffer, request slightly larger batch */ | 
| 400 | 0 |  |  |  |  |  | want = sizeof (ioev) / sizeof (ioev [0]); | 
| 401 |  |  |  |  |  |  |  | 
| 402 |  |  |  |  |  |  | /* no events, so wait for some | 
| 403 |  |  |  |  |  |  | * for fairness reasons, we do this in a loop, to fetch all events | 
| 404 |  |  |  |  |  |  | */ | 
| 405 |  |  |  |  |  |  | for (;;) | 
| 406 |  |  |  |  |  |  | { | 
| 407 |  |  |  |  |  |  | int res; | 
| 408 |  |  |  |  |  |  |  | 
| 409 | 0 | 0 |  |  |  |  | EV_RELEASE_CB; | 
| 410 |  |  |  |  |  |  |  | 
| 411 | 0 |  |  |  |  |  | EV_TS_SET (ts, timeout); | 
| 412 | 0 |  |  |  |  |  | res = evsys_io_getevents (linuxaio_ctx, 1, want, ioev, &ts); | 
| 413 |  |  |  |  |  |  |  | 
| 414 | 0 | 0 |  |  |  |  | EV_ACQUIRE_CB; | 
| 415 |  |  |  |  |  |  |  | 
| 416 | 0 | 0 |  |  |  |  | if (res < 0) | 
| 417 | 0 | 0 |  |  |  |  | if (errno == EINTR) | 
| 418 |  |  |  |  |  |  | /* ignored, retry */; | 
| 419 |  |  |  |  |  |  | else | 
| 420 | 0 |  |  |  |  |  | ev_syserr ("(libev) linuxaio io_getevents"); | 
| 421 | 0 | 0 |  |  |  |  | else if (res) | 
| 422 |  |  |  |  |  |  | { | 
| 423 |  |  |  |  |  |  | /* at least one event available, handle them */ | 
| 424 | 0 |  |  |  |  |  | linuxaio_parse_events (EV_A_ ioev, res); | 
| 425 |  |  |  |  |  |  |  | 
| 426 | 0 | 0 |  |  |  |  | if (ecb_expect_true (ringbuf_valid)) | 
| 427 |  |  |  |  |  |  | { | 
| 428 |  |  |  |  |  |  | /* if we have a ring buffer, handle any remaining events in it */ | 
| 429 | 0 |  |  |  |  |  | linuxaio_get_events_from_ring (EV_A); | 
| 430 |  |  |  |  |  |  |  | 
| 431 |  |  |  |  |  |  | /* at this point, we should have handled all outstanding events */ | 
| 432 | 0 |  |  |  |  |  | break; | 
| 433 |  |  |  |  |  |  | } | 
| 434 | 0 | 0 |  |  |  |  | else if (res < want) | 
| 435 |  |  |  |  |  |  | /* otherwise, if there were fewer events than we wanted, we assume there are no more */ | 
| 436 | 0 |  |  |  |  |  | break; | 
| 437 |  |  |  |  |  |  | } | 
| 438 |  |  |  |  |  |  | else | 
| 439 | 0 |  |  |  |  |  | break; /* no events from the kernel, we are done */ | 
| 440 |  |  |  |  |  |  |  | 
| 441 | 0 |  |  |  |  |  | timeout = EV_TS_CONST (0.); /* only wait in the first iteration */ | 
| 442 | 0 |  |  |  |  |  | } | 
| 443 |  |  |  |  |  |  | } | 
| 444 |  |  |  |  |  |  |  | 
| 445 |  |  |  |  |  |  | inline_size | 
| 446 |  |  |  |  |  |  | int | 
| 447 | 0 |  |  |  |  |  | linuxaio_io_setup (EV_P) | 
| 448 |  |  |  |  |  |  | { | 
| 449 | 0 |  |  |  |  |  | linuxaio_ctx = 0; | 
| 450 | 0 |  |  |  |  |  | return evsys_io_setup (linuxaio_nr_events (EV_A), &linuxaio_ctx); | 
| 451 |  |  |  |  |  |  | } | 
| 452 |  |  |  |  |  |  |  | 
| 453 |  |  |  |  |  |  | static void | 
| 454 | 0 |  |  |  |  |  | linuxaio_poll (EV_P_ ev_tstamp timeout) | 
| 455 |  |  |  |  |  |  | { | 
| 456 |  |  |  |  |  |  | int submitted; | 
| 457 |  |  |  |  |  |  |  | 
| 458 |  |  |  |  |  |  | /* first phase: submit new iocbs */ | 
| 459 |  |  |  |  |  |  |  | 
| 460 |  |  |  |  |  |  | /* io_submit might accept fewer than the requested number of iocbs */ | 
| 461 |  |  |  |  |  |  | /* this is, afaics, only because of errors, but we go by the book and use a loop, */ | 
| 462 |  |  |  |  |  |  | /* which allows us to pinpoint the erroneous iocb */ | 
| 463 | 0 | 0 |  |  |  |  | for (submitted = 0; submitted < linuxaio_submitcnt; ) | 
| 464 |  |  |  |  |  |  | { | 
| 465 | 0 |  |  |  |  |  | int res = evsys_io_submit (linuxaio_ctx, linuxaio_submitcnt - submitted, linuxaio_submits + submitted); | 
| 466 |  |  |  |  |  |  |  | 
| 467 | 0 | 0 |  |  |  |  | if (ecb_expect_false (res < 0)) | 
| 468 | 0 | 0 |  |  |  |  | if (errno == EINVAL) | 
| 469 |  |  |  |  |  |  | { | 
| 470 |  |  |  |  |  |  | /* This happens for unsupported fds, officially, but in my testing, | 
| 471 |  |  |  |  |  |  | * also randomly happens for supported fds. We fall back to epoll | 
| 472 |  |  |  |  |  |  | * here (via epoll_modify), under the assumption that this is a very rare case. | 
| 473 |  |  |  |  |  |  | * See https://lore.kernel.org/patchwork/patch/1047453/ to see | 
| 474 |  |  |  |  |  |  | * discussion about such a case (ttys) where polling for POLLIN | 
| 475 |  |  |  |  |  |  | * fails but POLLIN|POLLOUT works. | 
| 476 |  |  |  |  |  |  | */ | 
| 477 | 0 |  |  |  |  |  | struct iocb *iocb = linuxaio_submits [submitted]; | 
| 478 | 0 |  |  |  |  |  | epoll_modify (EV_A_ iocb->aio_fildes, 0, anfds [iocb->aio_fildes].events); | 
| 479 | 0 |  |  |  |  |  | iocb->aio_reqprio = -1; /* mark iocb as epoll */ | 
| 480 |  |  |  |  |  |  |  | 
| 481 | 0 |  |  |  |  |  | res = 1; /* skip this iocb - another iocb, another chance */ | 
| 482 |  |  |  |  |  |  | } | 
| 483 | 0 | 0 |  |  |  |  | else if (errno == EAGAIN) | 
| 484 |  |  |  |  |  |  | { | 
| 485 |  |  |  |  |  |  | /* This happens when the ring buffer is full, or some other shit we | 
| 486 |  |  |  |  |  |  | * don't know and isn't documented. Most likely because we have too | 
| 487 |  |  |  |  |  |  | * many requests and linux aio can't be assed to handle them. | 
| 488 |  |  |  |  |  |  | * In this case, we try to allocate a larger ring buffer, freeing | 
| 489 |  |  |  |  |  |  | * ours first. This might fail, in which case we have to fall back to 100% | 
| 490 |  |  |  |  |  |  | * epoll. | 
| 491 |  |  |  |  |  |  | * God, how I hate linux not getting its act together. Ever. | 
| 492 |  |  |  |  |  |  | */ | 
| 493 | 0 |  |  |  |  |  | evsys_io_destroy (linuxaio_ctx); | 
| 494 | 0 |  |  |  |  |  | linuxaio_submitcnt = 0; | 
| 495 |  |  |  |  |  |  |  | 
| 496 |  |  |  |  |  |  | /* rearm all fds with active iocbs */ | 
| 497 |  |  |  |  |  |  | { | 
| 498 |  |  |  |  |  |  | int fd; | 
| 499 | 0 | 0 |  |  |  |  | for (fd = 0; fd < linuxaio_iocbpmax; ++fd) | 
| 500 | 0 | 0 |  |  |  |  | if (linuxaio_iocbps [fd]->io.aio_buf) | 
| 501 | 0 |  |  |  |  |  | linuxaio_fd_rearm (EV_A_ fd); | 
| 502 |  |  |  |  |  |  | } | 
| 503 |  |  |  |  |  |  |  | 
| 504 | 0 |  |  |  |  |  | ++linuxaio_iteration; | 
| 505 | 0 | 0 |  |  |  |  | if (linuxaio_io_setup (EV_A) < 0) | 
| 506 |  |  |  |  |  |  | { | 
| 507 |  |  |  |  |  |  | /* TODO: rearm all and recreate epoll backend from scratch */ | 
| 508 |  |  |  |  |  |  | /* TODO: might be more prudent? */ | 
| 509 |  |  |  |  |  |  |  | 
| 510 |  |  |  |  |  |  | /* too bad, we can't get a new aio context, go 100% epoll */ | 
| 511 | 0 |  |  |  |  |  | linuxaio_free_iocbp (EV_A); | 
| 512 | 0 |  |  |  |  |  | ev_io_stop (EV_A_ &linuxaio_epoll_w); | 
| 513 | 0 |  |  |  |  |  | ev_ref (EV_A); | 
| 514 | 0 |  |  |  |  |  | linuxaio_ctx = 0; | 
| 515 |  |  |  |  |  |  |  | 
| 516 | 0 |  |  |  |  |  | backend        = EVBACKEND_EPOLL; | 
| 517 | 0 |  |  |  |  |  | backend_modify = epoll_modify; | 
| 518 | 0 |  |  |  |  |  | backend_poll   = epoll_poll; | 
| 519 |  |  |  |  |  |  | } | 
| 520 |  |  |  |  |  |  |  | 
| 521 | 0 |  |  |  |  |  | timeout = EV_TS_CONST (0.); | 
| 522 |  |  |  |  |  |  | /* it's easiest to handle this mess in another iteration */ | 
| 523 | 0 |  |  |  |  |  | return; | 
| 524 |  |  |  |  |  |  | } | 
| 525 | 0 | 0 |  |  |  |  | else if (errno == EBADF) | 
| 526 |  |  |  |  |  |  | { | 
| 527 |  |  |  |  |  |  | assert (("libev: event loop rejected bad fd", errno != EBADF)); | 
| 528 | 0 |  |  |  |  |  | fd_kill (EV_A_ linuxaio_submits [submitted]->aio_fildes); | 
| 529 |  |  |  |  |  |  |  | 
| 530 | 0 |  |  |  |  |  | res = 1; /* skip this iocb */ | 
| 531 |  |  |  |  |  |  | } | 
| 532 | 0 | 0 |  |  |  |  | else if (errno == EINTR) /* not seen in reality, not documented */ | 
| 533 | 0 |  |  |  |  |  | res = 0; /* silently ignore and retry */ | 
| 534 |  |  |  |  |  |  | else | 
| 535 |  |  |  |  |  |  | { | 
| 536 | 0 |  |  |  |  |  | ev_syserr ("(libev) linuxaio io_submit"); | 
| 537 | 0 |  |  |  |  |  | res = 0; | 
| 538 |  |  |  |  |  |  | } | 
| 539 |  |  |  |  |  |  |  | 
| 540 | 0 |  |  |  |  |  | submitted += res; | 
| 541 |  |  |  |  |  |  | } | 
| 542 |  |  |  |  |  |  |  | 
| 543 | 0 |  |  |  |  |  | linuxaio_submitcnt = 0; | 
| 544 |  |  |  |  |  |  |  | 
| 545 |  |  |  |  |  |  | /* second phase: fetch and parse events */ | 
| 546 |  |  |  |  |  |  |  | 
| 547 | 0 |  |  |  |  |  | linuxaio_get_events (EV_A_ timeout); | 
| 548 |  |  |  |  |  |  | } | 
| 549 |  |  |  |  |  |  |  | 
| 550 |  |  |  |  |  |  | inline_size | 
| 551 |  |  |  |  |  |  | int | 
| 552 | 0 |  |  |  |  |  | linuxaio_init (EV_P_ int flags) | 
| 553 |  |  |  |  |  |  | { | 
| 554 |  |  |  |  |  |  | /* would be great to have a nice test for IOCB_CMD_POLL instead */ | 
| 555 |  |  |  |  |  |  | /* also: test some semi-common fd types, such as files and ttys in recommended_backends */ | 
| 556 |  |  |  |  |  |  | /* 4.18 introduced IOCB_CMD_POLL, 4.19 made epoll work, and we need that */ | 
| 557 | 0 | 0 |  |  |  |  | if (ev_linux_version () < 0x041300) | 
| 558 | 0 |  |  |  |  |  | return 0; | 
| 559 |  |  |  |  |  |  |  | 
| 560 | 0 | 0 |  |  |  |  | if (!epoll_init (EV_A_ 0)) | 
| 561 | 0 |  |  |  |  |  | return 0; | 
| 562 |  |  |  |  |  |  |  | 
| 563 | 0 |  |  |  |  |  | linuxaio_iteration = 0; | 
| 564 |  |  |  |  |  |  |  | 
| 565 | 0 | 0 |  |  |  |  | if (linuxaio_io_setup (EV_A) < 0) | 
| 566 |  |  |  |  |  |  | { | 
| 567 | 0 |  |  |  |  |  | epoll_destroy (EV_A); | 
| 568 | 0 |  |  |  |  |  | return 0; | 
| 569 |  |  |  |  |  |  | } | 
| 570 |  |  |  |  |  |  |  | 
| 571 | 0 |  |  |  |  |  | ev_io_init  (&linuxaio_epoll_w, linuxaio_epoll_cb, backend_fd, EV_READ); | 
| 572 | 0 |  |  |  |  |  | ev_set_priority (&linuxaio_epoll_w, EV_MAXPRI); | 
| 573 | 0 |  |  |  |  |  | ev_io_start (EV_A_ &linuxaio_epoll_w); | 
| 574 | 0 |  |  |  |  |  | ev_unref (EV_A); /* watcher should not keep loop alive */ | 
| 575 |  |  |  |  |  |  |  | 
| 576 | 0 |  |  |  |  |  | backend_modify = linuxaio_modify; | 
| 577 | 0 |  |  |  |  |  | backend_poll   = linuxaio_poll; | 
| 578 |  |  |  |  |  |  |  | 
| 579 | 0 |  |  |  |  |  | linuxaio_iocbpmax = 0; | 
| 580 | 0 |  |  |  |  |  | linuxaio_iocbps = 0; | 
| 581 |  |  |  |  |  |  |  | 
| 582 | 0 |  |  |  |  |  | linuxaio_submits = 0; | 
| 583 | 0 |  |  |  |  |  | linuxaio_submitmax = 0; | 
| 584 | 0 |  |  |  |  |  | linuxaio_submitcnt = 0; | 
| 585 |  |  |  |  |  |  |  | 
| 586 | 0 |  |  |  |  |  | return EVBACKEND_LINUXAIO; | 
| 587 |  |  |  |  |  |  | } | 
| 588 |  |  |  |  |  |  |  | 
| 589 |  |  |  |  |  |  | inline_size | 
| 590 |  |  |  |  |  |  | void | 
| 591 | 0 |  |  |  |  |  | linuxaio_destroy (EV_P) | 
| 592 |  |  |  |  |  |  | { | 
| 593 | 0 |  |  |  |  |  | epoll_destroy (EV_A); | 
| 594 | 0 |  |  |  |  |  | linuxaio_free_iocbp (EV_A); | 
| 595 | 0 |  |  |  |  |  | evsys_io_destroy (linuxaio_ctx); /* fails in child, aio context is destroyed */ | 
| 596 | 0 |  |  |  |  |  | } | 
| 597 |  |  |  |  |  |  |  | 
| 598 |  |  |  |  |  |  | ecb_cold | 
| 599 |  |  |  |  |  |  | static void | 
| 600 | 0 |  |  |  |  |  | linuxaio_fork (EV_P) | 
| 601 |  |  |  |  |  |  | { | 
| 602 | 0 |  |  |  |  |  | linuxaio_submitcnt = 0; /* all pointers were invalidated */ | 
| 603 | 0 |  |  |  |  |  | linuxaio_free_iocbp (EV_A); /* this frees all iocbs, which is very heavy-handed */ | 
| 604 | 0 |  |  |  |  |  | evsys_io_destroy (linuxaio_ctx); /* fails in child, aio context is destroyed */ | 
| 605 |  |  |  |  |  |  |  | 
| 606 | 0 |  |  |  |  |  | linuxaio_iteration = 0; /* we start over in the child */ | 
| 607 |  |  |  |  |  |  |  | 
| 608 | 0 | 0 |  |  |  |  | while (linuxaio_io_setup (EV_A) < 0) | 
| 609 | 0 |  |  |  |  |  | ev_syserr ("(libev) linuxaio io_setup"); | 
| 610 |  |  |  |  |  |  |  | 
| 611 |  |  |  |  |  |  | /* forking epoll should also effectively unregister all fds from the backend */ | 
| 612 | 0 |  |  |  |  |  | epoll_fork (EV_A); | 
| 613 |  |  |  |  |  |  | /* epoll_fork already did this. hopefully */ | 
| 614 |  |  |  |  |  |  | /*fd_rearm_all (EV_A);*/ | 
| 615 |  |  |  |  |  |  |  | 
| 616 | 0 |  |  |  |  |  | ev_io_stop  (EV_A_ &linuxaio_epoll_w); | 
| 617 | 0 |  |  |  |  |  | ev_io_set   (EV_A_ &linuxaio_epoll_w, backend_fd, EV_READ); | 
| 618 | 0 |  |  |  |  |  | ev_io_start (EV_A_ &linuxaio_epoll_w); | 
| 619 | 0 |  |  |  |  |  | } | 
| 620 |  |  |  |  |  |  |  |
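
The version gate in `linuxaio_init` compares `ev_linux_version ()` against `0x041300`, which reads as kernel 4.19.0 packed one byte per component. The following is a minimal sketch of how such a packed value could be derived from a release string. It is not libev's actual implementation: `pack_kernel_version` and `running_kernel_version` are hypothetical names introduced here for illustration, and only `uname` from `<sys/utsname.h>` is a real system call.

```c
#include <sys/utsname.h>

/* Hypothetical helper (not part of libev): pack the leading
 * "major.minor.patch" of a release string into 0xMMmmpp.
 * Missing components count as 0; suffixes such as "-42-generic"
 * are ignored. */
static unsigned int
pack_kernel_version (const char *release)
{
  unsigned int v = 0;
  int field;

  for (field = 0; field < 3; ++field)
    {
      unsigned int c = 0;

      /* accumulate the decimal digits of this component */
      while (*release >= '0' && *release <= '9')
        c = c * 10 + (unsigned int)(*release++ - '0');

      if (c > 255)
        c = 255; /* clamp so each component fits into one byte */

      v = (v << 8) | c;

      /* step over the separator, if there is one */
      if (*release == '.')
        ++release;
    }

  return v;
}

/* Hypothetical helper: packed version of the running kernel,
 * or 0 if uname fails. */
static unsigned int
running_kernel_version (void)
{
  struct utsname buf;

  if (uname (&buf) < 0)
    return 0;

  return pack_kernel_version (buf.release);
}
```

Under these assumptions, a check such as `running_kernel_version () < 0x041300` would reject 4.18.x kernels (for example `0x041214` for 4.18.20) and accept 4.19.0 and later, which matches the intent stated in the comment above the gate.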