Troubleshoot stream (fanout) queues
===================================

Beginning with Yaook operator version ``0.20250227.0``, all queues which are
created by OpenStack and contain the suffix ``_fanout`` are of type stream,
unless the override ``rabbit_stream_fanout=false`` is set inside the OpenStack
manifests. You can confirm the type by running:

.. code:: bash

    rabbitmqctl list_queues name type | grep "<queue name>"

If you want to know more about what RabbitMQ streams are, head to the
RabbitMQ documentation.

The commands in this guide must be run inside the ``rabbitmq`` Kubernetes
container.

Repair the RabbitMQ stream coordinator
--------------------------------------

If you encounter repeated OpenStack error messages like the following, the
stream coordinator [1]_ is possibly not working properly:

.. code::

    "message": "Failed to consume message from queue: (0, 0): (541) INTERNAL_ERROR",

To confirm that the stream coordinator is unhealthy, run:

.. code:: bash

    rabbitmq-diagnostics coordinator_status

With a healthy coordinator cluster, the command prints all cluster nodes, with
exactly one of them in "Raft State" leader. If it returns an error or does not
terminate, the coordinator cluster is broken and needs to be rebuilt. Single
``noproc`` members do not cause stream operations to fail as long as there is
still a leader, although a quorum loss may be more likely in that case. If the
coordinator is functional but you still see the error message above, read the
next section.

Probably the easiest and most reliable way to restart the coordinator is to
reset the coordinator process on all nodes and then trigger a coordinator start
by declaring a nonexistent stream queue.

.. code:: bash

    # Collect the names of all cluster nodes.
    nodes=$(rabbitmqadmin list nodes -f bash -u "${RABBITMQ_DEFAULT_USER}" -p "${RABBITMQ_DEFAULT_PASS}")

    # Build a single Erlang expression that force-deletes the coordinator on every node.
    reset_command=""
    for node in $nodes; do
        reset_command+="ra:force_delete_server(coordination, {rabbit_stream_coordinator, '${node}'}),"
    done
    # Replace the trailing comma with the terminating period.
    reset_command="${reset_command%?}."
    rabbitmqctl eval "${reset_command}"

    # Declaring a stream queue triggers the coordinator start; the queue is deleted again afterwards.
    queue_name="dont-mind-me-just-triggering-a-coordinator-start"
    rabbitmqadmin declare queue name="${queue_name}" durable=true auto_delete=false \
        arguments='{"x-queue-type": "stream"}' \
        -u "${RABBITMQ_DEFAULT_USER}" -p "${RABBITMQ_DEFAULT_PASS}"
    rabbitmqctl delete_queue "${queue_name}"

Run the status command again, this time to confirm that the coordinator is
running:

.. code:: bash

    rabbitmq-diagnostics coordinator_status

If the coordinator is still unhealthy, we recommend rebuilding the entire
RabbitMQ cluster [2]_. Otherwise, confirm that no more ``Failed to consume
message from queue`` messages appear in the logs. If they still appear,
continue with the next section.

.. [1] The stream coordinator is a set of coordinator processes that run on
   each RabbitMQ node and use the Raft algorithm.

.. [2] If resolving the issue is not a matter of urgency, we instead recommend
   finding a fix for your case and contributing it to the documentation.

Resolve "stream_not_found" errors
---------------------------------

This problem produces the same ``Failed to consume message from queue``
messages in the OpenStack logs as a dysfunctional stream coordinator, but the
RabbitMQ logs can additionally contain messages like the following:

.. code::

    errorContext: child_terminated
    reason: {{stream_not_found,
              {resource,<<"/">>,queue,<<"barbican.workers_fanout">>}},

If you already tried to delete the queue using ``rabbitmqctl delete_queue``,
you might have noticed that this does not work. In that case, an internal
delete call should still work and remove the queue data on all nodes.
Afterwards, the queue has to be recreated manually, along with its exchange
and binding:

.. code:: bash

    stream="insert-your-queue-name-here"
    if [[ ${stream} != *fanout ]]; then
        echo "Error: The recreation is only supported for stream queues created by OpenStack with the suffix '_fanout'. Not doing anything."
        echo "If this queue was created by OpenStack, the documentation you consulted may be out of date."
    else
        # Remove the queue record through the internal database API.
        rabbitmqctl eval 'rabbit_db_queue:delete({resource,<<"/">>,queue,<<"'"${stream}"'">>},normal).'
        # Recreate the stream queue, its fanout exchange and the binding between them.
        rabbitmqadmin declare queue name="${stream}" durable=true auto_delete=false \
            arguments='{"x-queue-type": "stream"}' \
            -u "${RABBITMQ_DEFAULT_USER}" -p "${RABBITMQ_DEFAULT_PASS}"
        rabbitmqadmin declare exchange name="${stream}" type=fanout durable=true auto_delete=true \
            -u "${RABBITMQ_DEFAULT_USER}" -p "${RABBITMQ_DEFAULT_PASS}"
        rabbitmqadmin declare binding source="${stream}" destination="${stream}" \
            routing_key="${stream%_fanout}" \
            -u "${RABBITMQ_DEFAULT_USER}" -p "${RABBITMQ_DEFAULT_PASS}"
    fi

If you want to recreate all stream queues, you can run:

.. code:: bash

    # List all queue names known to the database and keep those ending in "fanout".
    streams=$(rabbitmqctl eval "Q=rabbit_db_queue:list()." | cut -d "\"" -f 4 | grep "fanout$")
    for stream in $streams; do
        if [[ ${stream} != *fanout ]]; then
            echo "Error: The recreation is only supported for stream queues created by OpenStack with the suffix '_fanout'. Skipping queue '${stream}'."
            echo "If this queue was created by OpenStack, the documentation you consulted may be out of date."
        else
            rabbitmqctl eval 'rabbit_db_queue:delete({resource,<<"/">>,queue,<<"'"${stream}"'">>},normal).'
            rabbitmqadmin declare queue name="${stream}" durable=true auto_delete=false \
                arguments='{"x-queue-type": "stream"}' \
                -u "${RABBITMQ_DEFAULT_USER}" -p "${RABBITMQ_DEFAULT_PASS}"
            rabbitmqadmin declare exchange name="${stream}" type=fanout durable=true auto_delete=true \
                -u "${RABBITMQ_DEFAULT_USER}" -p "${RABBITMQ_DEFAULT_PASS}"
            rabbitmqadmin declare binding source="${stream}" destination="${stream}" \
                routing_key="${stream%_fanout}" \
                -u "${RABBITMQ_DEFAULT_USER}" -p "${RABBITMQ_DEFAULT_PASS}"
        fi
    done

This problem is caused by inconsistent information inside the RabbitMQ
database. For example, the queue might still exist inside
``MNESIA_DURABLE_TABLE`` but be missing inside ``MNESIA_TABLE``. There can,
however, also be other causes. Because stream queues are not critical for
OpenStack API operations, they can be deleted and recreated without backing up
messages.
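
After recreating queues, it can be useful to check that no ``_fanout`` queue
is left with a type other than stream. The following is only a minimal sketch
and not part of Yaook: the function name ``non_stream_fanout_queues`` is
hypothetical, and it assumes that ``rabbitmqctl -q list_queues name type``
prints one tab-separated name and type pair per line.

.. code:: bash

    # Hypothetical helper: reads "name<TAB>type" lines from stdin and prints
    # the names of queues that end in "_fanout" but are not of type "stream".
    # Assumption: the input is tab-separated, as printed by
    # "rabbitmqctl -q list_queues name type". A header line, if present,
    # does not end in "_fanout" and is therefore ignored.
    non_stream_fanout_queues() {
        awk -F '\t' '$1 ~ /_fanout$/ && $2 != "stream" { print $1 }'
    }

Used as ``rabbitmqctl -q list_queues name type | non_stream_fanout_queues``,
the helper should print nothing when every fanout queue is a stream. Any name
it does print belongs to a queue that was either created with
``rabbit_stream_fanout=false`` or not recreated as a stream.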