We use
a distributed shared dynamic world model for team coordination.
Sharing the same world model enables the team members to reason
not only about their own actions, but also to predict the next
action a team mate will take. The world model is maintained
locally on each robot, which can cause small differences between
the robots' world models due to network lag. Communication
on the team coordination level is used to improve robustness
and is almost a necessity since the team is heterogeneous
both in hardware and software. The Utrecht Pioneer 2 robot
uses a subsumption architecture while the Amsterdam/Delft
Nomad Scouts operate on a hybrid architecture. Our team uses
a global team behavior (a strategy) from which each robot
derives its individual behavior (also known as role).
Our teamskills module determines the current team behavior
using a simple finite state machine.
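The sketch below gives a minimal illustration of such a state
machine; the states and transition conditions are hypothetical
and only serve to show the structure, not the actual strategy set.
\begin{verbatim}
# Minimal sketch of a team-behavior finite state machine.
# States and transition conditions are illustrative only.

ATTACK, DEFEND, INTERCEPT = "attack", "defend", "intercept"

def next_team_behavior(current, world):
    """One transition of the (simplified) team-behavior state machine."""
    if world["we_have_ball"]:
        return ATTACK
    if current == ATTACK and world["ball_lost"]:
        return INTERCEPT
    if world["ball_in_our_half"]:
        return DEFEND
    return current
\end{verbatim}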
Each player has its individual behavior defined by the team
behavior. There are about half a dozen different individual
behaviors available, all of which have an attacking, defending
or intercepting purpose. Utility functions based on potential
fields \cite{latombe91,khatib86} are used to distribute the
roles among the team members. This technique is similar to
the ones used by other participants in the middle size league
\cite{art00,freiburg01}.
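As a rough sketch of how such a potential-field utility can be
computed, the example below combines an attractive term towards
the ball with repulsive terms around opponents; the particular
terms, gains and the attacker interpretation are illustrative
assumptions rather than the team's actual parameters.
\begin{verbatim}
import math

def attraction(pos, target, gain=1.0):
    """Attractive potential pulling a robot towards a target position."""
    return -gain * math.dist(pos, target)

def repulsion(pos, obstacle, gain=1.0, radius=1.5):
    """Repulsive potential pushing a robot away from a nearby obstacle."""
    d = math.dist(pos, obstacle)
    return -gain / d if d < radius else 0.0

def attacker_utility(robot_pos, ball_pos, opponents):
    """Utility of a (hypothetical) attacker role: be near the ball,
    stay away from opponents."""
    score = attraction(robot_pos, ball_pos, gain=2.0)
    score += sum(repulsion(robot_pos, o) for o in opponents)
    return score
\end{verbatim}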
Each robot calculates its own scores for the individual behaviors
that are useful at the moment, communicates them, and then calculates
the scores of its team mates. Incoming scores from a team mate
overwrite the scores a robot calculated for that team mate.
This introduces a level of redundancy which can improve
team performance. Once every robot knows the scores for the
individual behaviors of each team member, the roles can easily
be distributed.
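The role-distribution step can be sketched as follows, assuming
hypothetical score tables and a fixed role ordering; incoming
scores from a team mate overwrite the locally estimated ones,
after which a simple greedy assignment suffices.
\begin{verbatim}
def assign_roles(robot_ids, local_scores, received_scores, roles):
    """Greedy role assignment from a shared score table.

    local_scores[r][role] is this robot's estimate for robot r;
    received_scores[r][role], if present, overwrites that estimate.
    All names and values here are illustrative."""
    scores = {r: dict(local_scores[r]) for r in robot_ids}
    for r, their in received_scores.items():
        if r in scores:
            scores[r].update(their)   # a robot's own scores take precedence
    assignment = {}
    for role in roles:                # roles ordered by importance
        free = [r for r in robot_ids if r not in assignment]
        if not free:
            break
        best = max(free, key=lambda r: scores[r].get(role, float("-inf")))
        assignment[best] = role
    return assignment
\end{verbatim}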
The exact
specification of an individual behavior is not (and cannot
be) the same on both architectures.
The following describes the Amsterdam/Delft approach which
is similar to the one Tambe \cite{tambe00persistent} described
for use in the simulation league.
A Markov decision process is used to search through the state
space of a robot's possible future states. First, a number of
possible actions and their parameters are generated with respect
to the current situation and individual behavior. For each of
these actions the probability of success is calculated, and its
impact on the world is evaluated together with the expected
concurrent actions of team mates and opponents.
Modelling team mate actions is relatively easy compared
to modelling the opponents with reasonable accuracy. The evaluation
criteria include both a role evaluation (using potential fields)
and concepts like ball possession. The action with
the maximum expected utility can then be chosen, and the process
can be repeated to look further ahead. One of the challenges
of this approach is calculating the probability that a certain
action will succeed; after an initial estimation by hand, we want
to use a technique like reinforcement learning to adjust these
probabilities.
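A one-step version of this expected-utility selection can be
sketched as follows; the action generator, prediction model and
evaluation function are placeholders standing in for the actual
modules, and the success probabilities are the hand-estimated
values mentioned above.
\begin{verbatim}
def expected_utility(action, world, predict, evaluate):
    """Expected utility of an action: success and failure outcomes
    weighted by the estimated probability of success."""
    p = action.success_probability            # hand-estimated for now
    good = evaluate(predict(world, action, succeeded=True))
    bad = evaluate(predict(world, action, succeeded=False))
    return p * good + (1 - p) * bad

def choose_action(world, generate_actions, predict, evaluate):
    """Pick the action with maximum expected utility; repeating this
    on the predicted world gives a deeper look-ahead."""
    actions = generate_actions(world)
    return max(actions,
               key=lambda a: expected_utility(a, world, predict, evaluate))
\end{verbatim}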
The Pioneer
2 robot of Utrecht University uses an extended version of
the subsumption architecture \cite{Brooks} in which particular
behaviors such as {\em Get\_Ball, Dribble}, and {\em Score}
compete for controlling the robot. The competition is modelled
by using preconditions for the behaviors and a priority scheme
for selecting a possible behavior; a behavior such as {\em
Emergency\_Brake} has
the highest priority.
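The priority scheme can be sketched as follows; only the behavior
names correspond to the ones mentioned above, while the
preconditions, priorities and returned commands are illustrative
assumptions.
\begin{verbatim}
# Sketch of priority-based arbitration in a subsumption-style
# architecture.

class Behavior:
    def __init__(self, name, priority, precondition, run):
        self.name = name
        self.priority = priority
        self.precondition = precondition
        self.run = run

def arbitrate(behaviors, sensors):
    """Run the highest-priority behavior whose precondition holds."""
    active = [b for b in behaviors if b.precondition(sensors)]
    if not active:
        return None
    return max(active, key=lambda b: b.priority).run(sensors)

behaviors = [
    Behavior("Emergency_Brake", 100, lambda s: s["obstacle_close"],
             lambda s: "stop"),
    Behavior("Score", 30, lambda s: s["has_ball"] and s["near_goal"],
             lambda s: "kick"),
    Behavior("Dribble", 20, lambda s: s["has_ball"],
             lambda s: "dribble"),
    Behavior("Get_Ball", 10, lambda s: True,
             lambda s: "drive_to_ball"),
]
\end{verbatim}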
Learning
is integrated as part of the behavior-based control architecture
of the robot. This is done by providing the robot with different
implementations of the same behavior; for example, we use three {\em
Get\_Ball} behaviors instead of just one. One implementation
may allow for fast but error-prone play, whereas another may
implement slow but precise play.
These behaviors are tested during online play against
an opponent. If a particular behavior is successful against
an opponent team (e.g. the {\em Get\_Ball} succeeds in getting
the ball within 10 seconds), the behavior gets strengthened,
and is therefore used more often. This allows us to learn
a sequence of behaviors which has the highest expected probability
of success against an opponent. Since there are not too many
alternative behaviors,
a single RoboCup game could be sufficient to test enough behaviors
so that the best behaviors can be executed during the final
part of the game. To guarantee that the best behavior is found,
there is an exploration function which also tests alternative
behaviors that are currently believed to be sub-optimal.
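The strengthening and exploration scheme can be sketched as
follows; the behavior names, learning step and exploration rate
are illustrative values, not the ones actually used.
\begin{verbatim}
import random

# Each alternative implementation of a behavior keeps a weight;
# success increases the weight, failure decreases it, and an
# exploration rate occasionally selects a currently sub-optimal
# alternative.

weights = {"Get_Ball_fast": 1.0,
           "Get_Ball_safe": 1.0,
           "Get_Ball_precise": 1.0}

def select_behavior(weights, exploration=0.1):
    if random.random() < exploration:
        return random.choice(list(weights))   # explore an alternative
    return max(weights, key=weights.get)      # exploit the strongest one

def update_weight(weights, name, success, step=0.2):
    weights[name] += step if success else -step
\end{verbatim}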
Although
the opponent is not explicitly modelled, which would require
a lot of training data and games, our method can deal with
different opponents by quickly learning the best-functioning
sequence of behaviors. This learning ability may provide
the robot player with additional strength, possibly leading
to quite attractive play during the RoboCup games.