Post-Planning Optimisation using Bandit-Controlled Local Optimisation
Fazlul Siddiqui
ARTIFICIAL INTELLIGENCE SEMINAR PhD monitoring
DATE: 2013-11-19
TIME: 13:30:00 - 14:00:00
LOCATION: RSISE Seminar Room, ground floor, building 115, cnr. North and Daley Roads, ANU
ABSTRACT:
Continual plan quality improvement is popular in automated planning because it strikes a balance between solution quality and solver efficiency. Anytime search has been quite successful at meeting this goal, especially in the early stages of a run, but it quickly reaches a limit beyond which it finds no further improvements. Post-processing is a good alternative for continuing to improve plan quality even after the current best planners have stopped improving. In anytime search, however, the lowest-cost solution found so far is typically used to bound the search. This bounding can harm a planner's performance, since the bound may prevent the search from ever finding additional plans for the post-processor to improve. This work presents a new approach to post-planning optimisation, which repeatedly decomposes a given plan into subplans and optimises each subplan locally. This strategy generates a diverse set of subplans for the post-processor to work on. In addition, it uses simple multi-armed bandit learning techniques to choose among the many possible top-level decisions, which in turn control the underlying post-planning optimisation process. I plan to make effective use of transfer learning over the decomposed plan to obtain quick further improvements in plan quality.
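To make the bandit-controlled loop described in the abstract concrete, here is a minimal sketch. It is not the method presented in the talk. It assumes UCB1 as the "simple multi-armed bandit" technique, a binary reward (1 if the plan cost dropped, 0 otherwise), and two toy arms (weak_arm, strong_arm) that stand in for real "decompose and locally optimise" decisions. All names below are hypothetical.

```python
import math
import random


class UCB1:
    """UCB1 bandit over a fixed set of arms (assumed technique, see lead-in)."""

    def __init__(self, n_arms):
        self.counts = [0] * n_arms    # number of times each arm was tried
        self.totals = [0.0] * n_arms  # summed reward per arm

    def select(self):
        # Try every arm once before using the confidence bound.
        for arm, count in enumerate(self.counts):
            if count == 0:
                return arm
        t = sum(self.counts)
        scores = [
            self.totals[a] / self.counts[a]
            + math.sqrt(2.0 * math.log(t) / self.counts[a])
            for a in range(len(self.counts))
        ]
        return max(range(len(scores)), key=scores.__getitem__)

    def update(self, arm, reward):
        self.counts[arm] += 1
        self.totals[arm] += reward


def weak_arm(cost, rng):
    # Toy stand-in: a local optimisation that rarely finds a cheaper subplan.
    return cost - 1 if rng.random() < 0.1 else cost


def strong_arm(cost, rng):
    # Toy stand-in: a local optimisation that often finds a cheaper subplan.
    return cost - 1 if rng.random() < 0.6 else cost


def optimise(plan_cost, arms, rounds, rng):
    """Let the bandit pick one arm per round and keep only improvements."""
    bandit = UCB1(len(arms))
    for _ in range(rounds):
        arm = bandit.select()
        new_cost = arms[arm](plan_cost, rng)
        improved = new_cost < plan_cost
        bandit.update(arm, 1.0 if improved else 0.0)  # assumed binary reward
        if improved:
            plan_cost = new_cost
    return plan_cost, bandit.counts


# Usage: start from a plan of cost 1000 and run 200 bandit-controlled rounds.
final_cost, pulls = optimise(1000, [weak_arm, strong_arm], 200, random.Random(0))
```

The bandit only sees whether a decision paid off, so the same loop would work with any set of top-level decisions, for example different decomposition strategies or different subplanners, as long as each can be wrapped as a callable that returns the cost of the resulting plan.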