Description
We have noticed a situation where EM becomes unusable (unable to install/uninstall extensions on the main wiki or the affected subwiki) after a replay plan was used on a subwiki to uninstall some XAR extensions.
In the Extension History of the affected wiki, a replay plan is reported to be ongoing, but the job log shows a "Successfully applied [UNINSTALL] for extension [ludovicdubost:todo-macro] ..." last entry, while never finishing.
Upon doing a thread dump, we can see that, in fact, an underlying job from that plan is stuck waiting for an answer to a delete pages question:
"extension/wiki:subwiki job group daemon thread - org.xwiki.extension.job.history.internal.ReplayJob@f2193f3([extension, history, 1652692189767-830])" #10694 java.lang.Thread.State: WAITING at jdk.internal.misc.Unsafe.park(Native Method) - waiting on java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject@23b59026 at java.util.concurrent.locks.LockSupport.park(LockSupport.java:194) at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2081) at org.xwiki.job.AbstractJobStatus.ask(AbstractJobStatus.java:364) at org.xwiki.job.AbstractJobStatus.ask(AbstractJobStatus.java:331) at org.xwiki.extension.xar.internal.handler.XarExtensionJobFinishedListener.maybeDeletePages(XarExtensionJobFinishedListener.java:265) at org.xwiki.extension.xar.internal.handler.XarExtensionJobFinishedListener.onEvent(XarExtensionJobFinishedListener.java:155) at org.xwiki.observation.internal.DefaultObservationManager.notify(DefaultObservationManager.java:320) at org.xwiki.observation.internal.DefaultObservationManager.notify(DefaultObservationManager.java:285) at org.xwiki.job.AbstractJob.jobFinished(AbstractJob.java:294) at org.xwiki.extension.job.internal.AbstractExtensionJob.jobFinished(AbstractExtensionJob.java:134) at org.xwiki.job.AbstractJob.runInContext(AbstractJob.java:248) at org.xwiki.job.AbstractJob.run(AbstractJob.java:220) at org.xwiki.extension.job.history.internal.ReplayJob.replay(ReplayJob.java:137) at org.xwiki.extension.job.history.internal.ReplayJob.runInternal(ReplayJob.java:120) at org.xwiki.job.AbstractJob.runInContext(AbstractJob.java:243) at org.xwiki.job.AbstractJob.run(AbstractJob.java:233) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628) at java.lang.Thread.run(Thread.java:829)
Using the Active Jobs Application (https://extensions.xwiki.org/xwiki/bin/view/Extension/Active%20Jobs%20Application/) to look at the Jobs Module, and active job is reported as Running but with no question active:
( extra debugging added: )
println indentation + "* " + jobStatus.getCurrentRecord() println indentation + "* " + jobStatus.getCurrentRecord().id; println indentation + "* " + jobStatus.getCurrentRecord().answers; println indentation + "* " + jobStatus.getCurrentRecord().answers.values()[0]; println indentation + "* " + jobStatus.getCurrentRecord().answers.values()[0].class; println indentation + "* " + jobStatus.getCurrentRecord().answers.values()[0].answers; println indentation + "* " + jobStatus.getCurrentRecord().request; println indentation + "* " + jobStatus.getQuestion();
outputs:
[Group] extension/wiki:documentation executor: org.xwiki.job.internal.DefaultJobExecutor$JobGroupExecutor@36950378[Running, pool size = 1, active threads = 1, queued tasks = 1, completed tasks = 25] [Job] extension/history/1652692189767-830 (org.xwiki.extension.job.history.internal.ReplayJob) Started at: Mon May 16 11:09:49 CEST 2022 *debug begin:* org.xwiki.extension.job.history.ExtensionJobHistoryRecord@7d88cab3 1652688797164-uninstall-wiki:documentation [org.xwiki.extension.xar.question.CleanPagesQuestion:org.xwiki.extension.xar.internal.question.CleanPagesQuestionRecorder@26bb2a34] org.xwiki.extension.xar.internal.question.CleanPagesQuestionRecorder@26bb2a34 class org.xwiki.extension.xar.internal.question.CleanPagesQuestionRecorder [documentation:XWiki.TodoMacroService():true, documentation:XWiki.TodoListMacro():true, documentation:XWiki.TodoMacroInstall():true, documentation:XWiki.TodoMacroGroovy():true, documentation:XWiki.TodoMacro():true, documentation:Todo.ListDemo():true, documentation:Todo.AdvancedDemo():true, documentation:Todo.SimpleDemo():true, documentation:Todo.WebHome():true] org.xwiki.extension.job.UninstallRequest@9a083e74 null *debug end.* Cancelable: false Queue size: 1 extension/plan/org.xwiki.contrib:application-task-ui/wiki:documentation (org.xwiki.extension.job.internal.InstallPlanJob)
In the job status xml on the filesystem was only showing the success of the todo-macro uninstalled XAR extension, similar to the UI.
It seems like the question corresponds to a different subtask/subjob than the actually reported one (which is finished).
According to tmortagne, any question inside a subtask of the ReplayJob should have been bubbled up to the top job, to be displayed in the UI and even in the Active Jobs extension, but it was not happening in our case, which suggests a bug.
If more details and reproduction steps become available, I will provide them in the comments. So far, that is all we have.
Attachments
Issue Links
- relates to
-
XWIKI-19728 Unanswered job questions can block the wiki
- Open
-
XWIKI-21120 The Replay action gets stuck upon finding a conflict on a page, regardless of whether an answer is stored for that page or not
- Closed