Find a way to ignore errors within a chord
The `task_reject_on_worker_lost` setting should solve the problem, but for it to work, `task_acks_late` must be enabled as well.
This way, if the worker is killed externally, the task that was running at that time will be put back in the queue to be run later, and the chord will receive all of its expected results.
See:
- https://docs.celeryq.dev/en/stable/userguide/configuration.html#task-reject-on-worker-lost
- https://docs.celeryq.dev/en/stable/userguide/configuration.html#std-setting-task_acks_late
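A minimal sketch of the two settings, assuming the app loads its configuration from a module with `app.config_from_object("celeryconfig")` (the module name `celeryconfig` is hypothetical; adapt it to however WaveQC configures Celery). Note that late acknowledgement means a task may run more than once, so the chord header tasks should be idempotent, and the Celery docs warn that `task_reject_on_worker_lost` can cause message loops if a task crashes the worker every time it runs.

```python
# celeryconfig.py: hypothetical settings module, loaded with
# app.config_from_object("celeryconfig").

# Acknowledge the message only after the task has finished, instead of
# just before it starts, so an interrupted task is not treated as done.
task_acks_late = True

# If the worker child process exits abruptly (e.g. SIGSEGV or SIGKILL),
# reject the message so the broker re-queues it, instead of recording a
# WorkerLostError that makes the chord fail with ChordError.
task_reject_on_worker_lost = True
```

The same behaviour can also be enabled only for the tasks used in the chord header, with the `acks_late=True` and `reject_on_worker_lost=True` task options, which limits the re-queue behaviour to where it is needed.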
Sentry Issue: WAVEQC-W
```
ChordError: Dependency 4ed627ec-7d38-482c-ba36-2e2e128d733a raised WorkerLostError('Worker exited prematurely: signal 11 (SIGSEGV) Job: 4596.')
  File "celery/backends/redis.py", line 528, in on_chord_part_return
    resl = [unpack(tup, decode) for tup in resl]
  File "celery/backends/redis.py", line 528, in <listcomp>
    resl = [unpack(tup, decode) for tup in resl]
  File "celery/backends/redis.py", line 434, in _unpack_chord_result
    raise ChordError(f'Dependency {tid} raised {retval!r}')

Chord "93a7605a-84b6-447c-81e2-97a07cf34d33" raised: "ChordError(\"Dependency 4ed627ec-7d38-482c-ba36-2e2e128d733a raised WorkerLostError('Worker exited prematurely: signal 11 (SIGSEGV) Job: 4596.')\")"
```