ics 806-Week 10


  • 8/14/2019 ics 806-Week 10

    1/21

    November 2009 School of Computing and Informatics

    MSC COMPUTER SCIENCE

    Multi-agent Systems Learning

    Session Topics

    1. Defining Learning

    2. How Humans Learn

    3. Machine Learning

    4. Reinforcement Learning

    5. Q-Learning

    6. Learning in Multi-Agent Systems

    7. End of Session Exercises

  • 2/21

    Learning by humans - Educational perspective (Wikipedia)

    Learning is the process of gaining understanding that leads to the
    modification of attitudes and behaviors through the acquisition of
    knowledge, skills and values, through study and experience.

    Learning causes a change of behavior that is persistent, measurable,
    and specified, or allows an individual to formulate a new mental
    construct or revise a prior mental construct (conceptual knowledge
    such as attitudes or values).

    It depends on experience and leads to long-term changes in behavior
    potential.

  • 3/21

    ICS 806 - MULTI-AGENT SYSTEMS

    Learning by humans - Educational perspective (Wikipedia): Basic learning processes

    Bloom's Taxonomy divides the learning process into a six-level
    hierarchy, where knowledge is the lowest order of cognition and
    evaluation the highest:

    1. Knowledge is the memory of previously learned materials such as
       facts, terms, basic concepts, and answers.

    2. Comprehension is the understanding of facts and ideas by
       organization, comparison, translation, interpretation, and
       description.

    3. Application is the use of new knowledge to solve problems.

    4. Analysis is the examination and division of information into parts
       by identifying motives or causes. A person can analyze by making
       inferences and finding evidence to support generalizations.

    5. Synthesis is the compilation of information in a new way by
       combining elements into patterns or proposing alternative
       solutions.

    6. Evaluation is the presentation and defense of opinions by making
       judgments about information, validity of ideas, or quality of work
       based on a set of criteria.

  • 4/21

    Ways by which human learning may take place

    Learning by example. Imitation of a role model is a natural mechanism
    for infants and children when learning from experience.

    Learning by worked examples. Worked examples in books show exactly how
    the author solved problems, step by step, for example in mathematics.
    Such examples may help explain methods in different ways.

    Learning by teaching. Teachers share new lesson content with groups of
    students, who may prepare on their part in order to teach this content
    to the rest of the class.

    Learning which alternative methods exist. Sometimes different methods
    can be applied to solve a particular problem. These methods need to be
    pointed out by the teacher, in which case the student should also be
    made aware of how to select the "best" method from among those
    available, and which textbooks are likely to be especially helpful.

    Learning which shortcuts exist to solve specific problems. Sometimes
    shortcuts exist that can reduce the solution of practical problems by
    many hours. For example, maxima and minima of functions can be
    obtained "the hard way" by a whole series of numerical calculations,
    while the use of calculus is often a shortcut.

  • 5/21

    Overview of Learning in Artificial Intelligence: Machine learning (Wikipedia)

    Machine learning is a subfield of artificial intelligence. It focuses
    on the development of algorithms and techniques that allow computers
    to "learn". At a general level, there are two types of learning:
    inductive and deductive. Inductive machine learning methods create
    computer programs by extracting rules and patterns out of massive
    data sets.

    Application areas are several, including natural language processing,
    search engines, medical diagnosis, bioinformatics and cheminformatics,
    detecting credit card fraud, stock market analysis, classifying DNA
    sequences, speech and handwriting recognition, object recognition in
    computer vision, game playing, and robot locomotion.

  • 6/21

    Types of machine learning algorithms

    Machine learning algorithms are commonly grouped as follows:

    Supervised learning --- where the algorithm generates a function that
    maps inputs to desired outputs. One standard formulation of the
    supervised learning task is the classification problem: the learner
    is required to learn (to approximate the behavior of) a function
    which maps a vector into one of several classes by looking at several
    input-output examples of the function.

    Unsupervised learning --- which models a set of inputs: labeled
    examples are not available.

    Semi-supervised learning --- which combines both labeled and
    unlabeled examples to generate an appropriate function or classifier.

    Reinforcement learning --- where the algorithm learns a policy of how
    to act given an observation of the world. Every action has some
    impact in the environment, and the environment provides feedback that
    guides the learning algorithm.

    Transduction --- similar to supervised learning, but does not
    explicitly construct a function: instead, it tries to predict new
    outputs based on training inputs, training outputs, and new inputs.

    Learning to learn --- where the algorithm learns its own inductive
    bias based on previous experience.
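The supervised-learning formulation above (learn a function mapping input vectors to classes from input-output examples) can be sketched with a tiny 1-nearest-neighbour classifier. This is a hypothetical toy example, not something from the slides; the training data and class labels are made up for illustration.

```python
# Minimal supervised classification: approximate a vector -> class
# function from labeled (vector, class) examples by returning the class
# of the nearest training example.

def nearest_neighbour_classify(examples, x):
    """Return the class label of the training example closest to x."""
    def dist2(u, v):
        # Squared Euclidean distance between two vectors.
        return sum((a - b) ** 2 for a, b in zip(u, v))
    vec, cls = min(examples, key=lambda ex: dist2(ex[0], x))
    return cls

# Labeled examples: two classes, roughly separated in the plane.
train = [((0.0, 0.0), "A"), ((0.2, 0.1), "A"),
         ((1.0, 1.0), "B"), ((0.9, 1.1), "B")]

print(nearest_neighbour_classify(train, (0.1, 0.0)))  # -> A
```

A new input is classified purely by looking at the stored input-output examples, which is exactly the "approximate the behavior of a function" idea in the definition above.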

  • 7/21

    Reinforcement learning

    Reinforcement learning refers to a class of problems in machine
    learning which postulate an agent exploring an environment in which
    the agent perceives its current state and takes actions. The
    environment, in return, provides a reward (which can be positive or
    negative). Reinforcement learning algorithms attempt to find a policy
    for maximizing cumulative reward for the agent over the course of
    solving a problem.

  • 8/21

    Basic reinforcement learning model:

    1. a set of environment states S;
    2. a set of actions A; and
    3. a set of scalar "rewards" in ℝ.

    At each time t, the agent perceives its state s_t ∈ S and the set of
    possible actions A(s_t). It chooses an action a ∈ A(s_t) and receives
    from the environment the new state s_{t+1} and a reward r_{t+1}.
    Based on these interactions, the reinforcement learning agent must
    develop a policy π : S → A which maximizes the quantity
    R = r_0 + r_1 + ... + r_n.
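The perceive-act-reward loop just formalized can be sketched as runnable code. The 4-state chain environment and the fixed policy table below are hypothetical illustrations, not part of the model definition.

```python
# Sketch of the basic RL interaction: at each time t the agent perceives
# state s_t, chooses a = pi(s_t), and receives s_{t+1} and reward r_{t+1}.
# Hypothetical environment: a chain of states 0..3 where reaching state 3
# pays reward 1 and every other transition pays 0.

def step(state, action):
    """Environment dynamics: return (next_state, reward)."""
    nxt = min(state + 1, 3) if action == "right" else max(state - 1, 0)
    return nxt, (1.0 if nxt == 3 else 0.0)

def run_episode(pi, start=0, horizon=10):
    """Follow policy pi from `start`; return the cumulative reward R."""
    state, total = start, 0.0
    for _ in range(horizon):
        action = pi[state]                    # a = pi(s_t)
        state, reward = step(state, action)   # environment returns s_{t+1}, r_{t+1}
        total += reward                       # R accumulates the rewards
        if state == 3:                        # goal reached, episode ends
            break
    return total

always_right = {s: "right" for s in range(4)}
print(run_episode(always_right))  # -> 1.0 (the goal is reached once)
```

A policy that always moves left never reaches state 3 and collects R = 0, which is why the learning problem is to find the policy maximizing R.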

  • 9/21

    [Figure from Sutton and Barto, Reinforcement Learning: An Introduction.]

  • 10/21

    Q-learning

    Q-learning is a reinforcement learning technique. It involves
    learning an action-value function that gives the expected utility of
    taking a given action in a given state and following a fixed policy
    thereafter. An advantage of Q-learning is that it is able to compare
    the expected utility of the available actions without requiring a
    model of the environment.

    Q(s, a) ← Q(s, a) + α [r + γ max_{a'} Q(s', a') − Q(s, a)]

    where: s - current state, s' - next state, a - action, a' - action of
    the next state, r - immediate reward, α - learning rate, γ - discount
    factor, Q(s, a) - expected discounted reinforcement of taking action
    a in state s.
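A single application of the Q-learning update, using the quantities defined above (α the learning rate, γ the discount factor), can be sketched as follows. The table contents and the numbers are hypothetical.

```python
# One Q-learning update:
#   Q(s, a) <- Q(s, a) + alpha * (r + gamma * max_a' Q(s', a') - Q(s, a))

def q_update(Q, s, a, r, s_next, actions, alpha=0.5, gamma=0.9):
    """Move Q[(s, a)] toward the one-step lookahead target; return it."""
    best_next = max(Q[(s_next, a2)] for a2 in actions)  # max_a' Q(s', a')
    Q[(s, a)] += alpha * (r + gamma * best_next - Q[(s, a)])
    return Q[(s, a)]

actions = ["left", "right"]
Q = {(s, a): 0.0 for s in range(3) for a in actions}
Q[(1, "left")] = 2.0  # pretend state 1 already looks somewhat valuable

# Taking "right" in state 0 yields reward 1 and lands in state 1:
print(q_update(Q, 0, "right", 1.0, 1, actions))  # -> 0.5*(1 + 0.9*2.0) = 1.4
```

No model of the environment is needed: the update uses only the observed transition (s, a, r, s') and the current table.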

  • 11/21

    Q-Learning Algorithm

    The core of the algorithm is a value iteration update. For each state
    s from the state set S, and for each action a from A, we can
    calculate an update to its expected discounted reward with the
    following expression:

    Q(s, a) ← Q(s, a) + α [r + γ max_{a'} Q(s', a') − Q(s, a)]

    where r is an observed real reward, α is a convergence rate such that
    0 < α < 1, and γ is a discount rate such that 0 < γ < 1.

  • 12/21

    Begin
        For each s, a, initialize table entry Q(s, a) ← 0
        Observe the current state s
        Repeat:
            Select an action a and execute it
            Receive the immediate reward r
            Observe the new state s'
            Update the table entry:
                Q(s, a) ← Q(s, a) + α [r + γ max_{a'} Q(s', a') − Q(s, a)]
            s ← s'
    End
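Tabular Q-learning, with the table initialized as above, can be run end to end on a small example. The 4-state chain environment, the ε-greedy action selection, and the constants below are assumptions for illustration, not from the slides.

```python
# Tabular Q-learning on a hypothetical chain: states 0..3, actions
# left/right, reaching state 3 pays reward 1 and ends the episode.
import random

ACTIONS = ["left", "right"]
GOAL, ALPHA, GAMMA, EPSILON = 3, 0.5, 0.9, 0.2

def step(state, action):
    """Environment dynamics: return (next_state, reward)."""
    nxt = min(state + 1, GOAL) if action == "right" else max(state - 1, 0)
    return nxt, (1.0 if nxt == GOAL else 0.0)

def train(episodes=500, seed=0):
    rng = random.Random(seed)
    Q = {(s, a): 0.0 for s in range(GOAL + 1) for a in ACTIONS}  # init to 0
    for _ in range(episodes):
        s = 0
        for _ in range(10_000):  # step cap so an episode always terminates
            # Epsilon-greedy: mostly exploit the table, sometimes explore.
            if rng.random() < EPSILON:
                a = rng.choice(ACTIONS)
            else:
                a = max(ACTIONS, key=lambda x: Q[(s, x)])
            s2, r = step(s, a)
            best = max(Q[(s2, a2)] for a2 in ACTIONS)      # max_a' Q(s', a')
            Q[(s, a)] += ALPHA * (r + GAMMA * best - Q[(s, a)])
            s = s2
            if s == GOAL:
                break
    return Q

Q = train()
# The learned values approach gamma^k for k steps from the goal:
# Q(2, right) -> 1.0, Q(1, right) -> 0.9, Q(0, right) -> 0.81
print(Q[(2, "right")], Q[(1, "right")], Q[(0, "right")])
```

Note that the agent never uses the transition function directly when updating; it learns purely from observed (s, a, r, s') experience, as the definition of Q-learning above requires.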

  • 13/21

    Russell & Norvig's Q-learning algorithm, p. 614

    function Q-LEARNING-AGENT(e) returns an action
        static: Q, a table of action values
                N, a table of state-action frequencies
                a, the last action taken
                i, the previous state visited
                r, the reward received in state i

        j ← STATE[e]
        if i is non-null then
            N[a, i] ← N[a, i] + 1
            Q[a, i] ← Q[a, i] + α (r + max_{a'} Q[a', j] − Q[a, i])
        if TERMINAL?[e] then
            i ← null
        else
            i ← j; r ← REWARD[e]; a ← argmax_{a'} f(Q[a', j], N[a', j])
        return a
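The last line of the algorithm chooses an action through an exploration function f(Q, N), which trades greed against curiosity. A common choice (following Russell & Norvig) is optimism under uncertainty: treat any action tried fewer than N_e times as worth an optimistic reward R+. The concrete values below are hypothetical.

```python
# Exploration function used in a <- argmax_a' f(Q[a', j], N[a', j]).
R_PLUS = 2.0   # optimistic estimate of the best possible reward
N_E = 5        # try each state-action pair at least this many times

def f(u, n):
    """Return the optimistic value while under-explored, else the estimate u."""
    return R_PLUS if n < N_E else u

def choose_action(Q, N, j, actions):
    """a <- argmax_a' f(Q[(a', j)], N[(a', j)])."""
    return max(actions, key=lambda a: f(Q[(a, j)], N[(a, j)]))

actions = ["left", "right"]
Q = {("left", 0): 1.5, ("right", 0): 0.2}
N = {("left", 0): 10, ("right", 0): 2}   # "right" is still under-explored
print(choose_action(Q, N, 0, actions))   # -> right (f = 2.0 beats 1.5)
```

Once "right" has been tried N_E times, f falls back to the learned estimate 0.2 and the agent exploits "left" instead.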

  • 14/21

    Multi-agent systems learning (Stone & Veloso (1997))

    ML techniques can be directly applied in multi-agent systems.
    Multi-agent learning is more concerned with learning issues that
    arise because of the multi-agent aspect of a given domain.
    Multi-agent learning is learning that is done by several agents and
    that becomes possible only because several agents are present. For
    example, in football it involves learning those items that are
    related to group actions, or individual actions for the sake of the
    group (Stone & Veloso (1997)).

    Classification of agents (Stone and Veloso (1997)):
    homogeneous non-communicating;
    heterogeneous non-communicating; and
    heterogeneous communicating agents.

  • 15/21

    Homogeneous non-communicating agents

    Issues
    Reactive vs. deliberative agents
    Local or global perspective
    Modeling of other agents' states
    How to affect others

    Learning opportunities
    Enable others' actions
    Sensor data
    Other agents' sensor data

    Learning techniques and areas
    Reactive behaviors for formation maintenance. Balch [7]
    Local knowledge sometimes better. Roychowdhury [67]
    (Limited) Recursive Modeling Method (RMM). Durfee [24]
    Don't model others, just pay attention to reward. Schmidhuber [77]
    Stigmergy. Holland / Goldman and Rosenschein [39, 31]
    Q-learning for behaviors like foraging, homing, etc. Mataric [52]

  • 16/21

    Heterogeneous non-communicating agents

    Issues
    Benevolence vs. competitiveness
    Stable vs. evolving agents (arms race, credit/blame)
    Modeling of others' goals, actions, and knowledge
    Resource management (interdependent actions)
    Social conventions
    Roles

    Learning opportunities
    Credit/blame in competitive scenarios
    Behaviors that blend well with the team
    Prediction of others' actions
    Dynamic role assumption

  • 17/21

    Heterogeneous non-communicating agents

    Learning techniques and areas
    Game theory, iterative play. By Mor and Rosenschein / Sandholm and Crites
    Minimax-Q.
    Competitive coevolution. By Haynes and Sen / Grefenstette and Daley / Rosin and Belew
    Deduce intentions through observation. By Huber and Durfee
    Autoepistemic reasoning (ignorance). By Permpoontanalarp
    Model as a team (individual role). By Tambe
    Social reasoning: depend on others for goal (game theory). By Sichman and Demazeau
    GAs to deal with Braess' paradox (more resources, worse). By Arora and Sen
    Multi-agent RL for adaptive load balancing. By Schaerf, Shoham, and Tennenholtz
    Focal points / emergent conventions. By Fenster et al. / Walker and Wooldridge
    Design agents to play different roles. By Prasad et al.

  • 18/21

    Heterogeneous communicating agents

    Issues
    Understanding each other
    Planning communicative acts
    Benevolence vs. competitiveness
    Resource management (schedule coordination)
    Commitment/decommitment
    Truth in communication

    Learning opportunities
    Evolving language
    Effects of speech acts on global dynamics
    Communication utility and truthfulness
    Commitment utility

  • 19/21

    Heterogeneous communicating agents

    Learning and related techniques
    Language protocols: KIF for content (Genesereth and Fikes); KQML for message format (Finin et al.)
    Speech acts. By Cohen and Levesque / Lux and Steiner
    Learning social behaviors. By Mataric
    Bayesian learning in negotiation: model others. By Zeng and Sycara
    Multi-agent Q-learning. By Weiss
    Training other agents' Q-functions (track driving). By Clouse
    Minimize the need for training. By Potter and Grefenstette
    Cooperative coevolution. By Bull et al.
    Contract nets for electronic commerce. By Sandholm and Lesser
    Market-based systems. By Huberman and Clearwater
    Generalized Partial Global Planning (GPGP). By Decker
    Internal, social, and collective (role) commitments. By Castelfranchi
    Commitment states (potential, pre, and actual) as planning states. By Haddadi
    Belief/Desire/Intention (BDI) model: OASIS. By Rao and Georgeff
    BDI commitments only over intentions. By Rao and Georgeff
    Coalitions. By Zlotkin and Rosenschein / Shehory and Kraus / Sandholm and Lesser
    Reasoning about truthfulness. By Sandholm and Lesser / Rosenschein

  • 20/21

    A learning agent (adapted from Russell & Norvig, p. 526)

    [Figure: a learning agent. The Critic uses feedback from the sensors
    to judge how well the agent is doing; the Learning Element makes
    improvements to the agent's knowledge and sets learning goals for the
    Problem Generator, which suggests actions that may lead to new
    experiences; the Performance Element selects external actions, which
    the agent carries out in the Environment through its effectors.]

  • 21/21

    END OF SESSION (WEEK 11) EXERCISES

    1. Define the term learning.
    2. Describe how human beings learn.
    3. What is machine learning?
    4. Outline the various types of machine learning algorithms.
    5. What is reinforcement learning?
    6. What is Q-learning?
    7. Formulate algorithms for basic reinforcement learning and Q-learning.
    8. Describe learning in Multi-Agent Systems.