ics 806-Week 10


  • 8/14/2019 ics 806-Week 10

    1/21

    November 2009 School of Computing and Informatics

    MSC COMPUTER SCIENCE

    Multi-agent Systems Learning

    Session Topics

    1. Defining Learning

    2. How Humans Learn

    3. Machine Learning

    4. Reinforcement Learning

    5. Q-Learning

    6. Learning in Multi-Agent Systems

    7. End of Session Exercises

  • 2/21

    Learning by humans - Educational perspective (Wikipedia)

    Learning is the process of gaining understanding that leads to the
    modification of attitudes and behaviors through the acquisition of
    knowledge, skills and values, through study and experience.

    Learning causes a change of behavior that is persistent, measurable,
    and specified, or allows an individual to formulate a new mental
    construct or revise a prior mental construct (conceptual knowledge
    such as attitudes or values).

    It depends on experience and leads to long-term changes in behavior
    potential.

  • 3/21

    ICS 806 - MULTI-AGENT SYSTEMS

    Learning by humans - Educational perspective (Wikipedia): Basic learning processes

    Bloom's Taxonomy divides the learning process into a six-level
    hierarchy, where knowledge is the lowest order of cognition and
    evaluation the highest:

    1. Knowledge is the memory of previously learned materials such as
       facts, terms, basic concepts, and answers.

    2. Comprehension is the understanding of facts and ideas by
       organization, comparison, translation, interpretation, and
       description.

    3. Application is the use of new knowledge to solve problems.

    4. Analysis is the examination and division of information into parts
       by identifying motives or causes. A person can analyze by making
       inferences and finding evidence to support generalizations.

    5. Synthesis is the compilation of information in a new way by
       combining elements into patterns or proposing alternative
       solutions.

    6. Evaluation is the presentation and defense of opinions by making
       judgments about information, validity of ideas, or quality of work
       based on a set of criteria.

  • 4/21

    Ways by which human learning may take place

    Learning by example. Imitation of a role model is a natural mechanism
    for infants and children when learning from experience.

    Learning by worked examples. Worked examples in books show exactly how
    the author solved problems, step by step, for example in mathematics.
    Such examples may help explain methods in different ways.

    Learning by teaching. Teachers share new lesson content with groups of
    students, who may prepare on their part in order to teach this content
    to the rest of the class.

    Learning which alternative methods exist. Sometimes different methods
    can be applied to solve a particular problem. These methods need to be
    pointed out by the teacher, in which case the student should also be
    made aware of how to select the "best" method from among those
    available, and which textbooks are likely to be especially helpful.

    Learning which shortcuts exist to solve specific problems. Sometimes
    shortcuts exist that can reduce the solution of practical problems by
    many hours. For example, maxima and minima of functions can be
    obtained "the hard way" by a whole series of numerical calculations,
    while the use of calculus is often a shortcut.

  • 5/21

    Overview of Learning in Artificial Intelligence: Machine learning (Wikipedia)

    Machine learning is a subfield of artificial intelligence. It focuses
    on the development of algorithms and techniques that allow computers
    to "learn". At a general level, there are two types of learning:
    inductive and deductive. Inductive machine learning methods create
    computer programs by extracting rules and patterns out of massive
    data sets.

    Application areas are several, including natural language processing,
    search engines, medical diagnosis, bioinformatics and cheminformatics,
    detecting credit card fraud, stock market analysis, classifying DNA
    sequences, speech and handwriting recognition, object recognition in
    computer vision, game playing, and robot locomotion.

  • 6/21

    Types of machine learning algorithms

    Machine learning algorithms are commonly grouped as follows:

    Supervised learning --- where the algorithm generates a function that
    maps inputs to desired outputs. One standard formulation of the
    supervised learning task is the classification problem: the learner
    is required to learn (to approximate the behavior of) a function
    which maps a vector into one of several classes by looking at several
    input-output examples of the function.

    Unsupervised learning --- which models a set of inputs: labeled
    examples are not available.

    Semi-supervised learning --- which combines both labeled and
    unlabeled examples to generate an appropriate function or classifier.

    Reinforcement learning --- where the algorithm learns a policy of how
    to act given an observation of the world. Every action has some
    impact in the environment, and the environment provides feedback that
    guides the learning algorithm.

    Transduction --- similar to supervised learning, but does not
    explicitly construct a function: instead, it tries to predict new
    outputs based on training inputs, training outputs, and new inputs.

    Learning to learn --- where the algorithm learns its own inductive
    bias based on previous experience.
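The supervised-learning formulation above (learn a function mapping input vectors to classes from input-output examples) can be sketched with a tiny 1-nearest-neighbour classifier. This is a hypothetical toy example, not something from the slides; the training data and class labels are made up for illustration.

```python
# Minimal supervised classification: approximate a vector -> class
# function from labeled (vector, class) examples by returning the class
# of the nearest training example.

def nearest_neighbour_classify(examples, x):
    """Return the class label of the training example closest to x."""
    def dist2(u, v):
        # Squared Euclidean distance between two vectors.
        return sum((a - b) ** 2 for a, b in zip(u, v))
    vec, cls = min(examples, key=lambda ex: dist2(ex[0], x))
    return cls

# Labeled examples: two classes, roughly separated in the plane.
train = [((0.0, 0.0), "A"), ((0.2, 0.1), "A"),
         ((1.0, 1.0), "B"), ((0.9, 1.1), "B")]

print(nearest_neighbour_classify(train, (0.1, 0.0)))  # -> A
```

A new input is classified purely by looking at the stored input-output examples, which is exactly the "approximate the behavior of a function" idea in the definition above.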

  • 7/21

    Reinforcement learning

    Reinforcement learning refers to a class of problems in machine
    learning which postulate an agent exploring an environment in which
    the agent perceives its current state and takes actions. The
    environment, in return, provides a reward (which can be positive or
    negative). Reinforcement learning algorithms attempt to find a policy
    for maximizing cumulative reward for the agent over the course of
    solving a problem.

  • 8/21

    Basic reinforcement learning model:

    1. a set of environment states S;
    2. a set of actions A; and
    3. a set of scalar "rewards" in ℝ.

    At each time t, the agent perceives its state s_t ∈ S and the set of
    possible actions A(s_t). It chooses an action a ∈ A(s_t) and receives
    from the environment the new state s_{t+1} and a reward r_{t+1}.
    Based on these interactions, the reinforcement learning agent must
    develop a policy π : S → A which maximizes the quantity
    R = r_0 + r_1 + ... + r_n.
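The perceive-act-reward loop just formalized can be sketched as runnable code. The 4-state chain environment and the fixed policy table below are hypothetical illustrations, not part of the model definition.

```python
# Sketch of the basic RL interaction: at each time t the agent perceives
# state s_t, chooses a = pi(s_t), and receives s_{t+1} and reward r_{t+1}.
# Hypothetical environment: a chain of states 0..3 where reaching state 3
# pays reward 1 and every other transition pays 0.

def step(state, action):
    """Environment dynamics: return (next_state, reward)."""
    nxt = min(state + 1, 3) if action == "right" else max(state - 1, 0)
    return nxt, (1.0 if nxt == 3 else 0.0)

def run_episode(pi, start=0, horizon=10):
    """Follow policy pi from `start`; return the cumulative reward R."""
    state, total = start, 0.0
    for _ in range(horizon):
        action = pi[state]                    # a = pi(s_t)
        state, reward = step(state, action)   # environment returns s_{t+1}, r_{t+1}
        total += reward                       # R accumulates the rewards
        if state == 3:                        # goal reached, episode ends
            break
    return total

always_right = {s: "right" for s in range(4)}
print(run_episode(always_right))  # -> 1.0 (the goal is reached once)
```

A policy that always moves left never reaches state 3 and collects R = 0, which is why the learning problem is to find the policy maximizing R.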

  • 9/21

    [Figure from Sutton and Barto, Reinforcement Learning: An Introduction.]

  • 10/21

    Q-learning

    Q-learning is a reinforcement learning technique. It involves
    learning an action-value function that gives the expected utility of
    taking a given action in a given state and following a fixed policy
    thereafter. An advantage of Q-learning is that it is able to compare
    the expected utility of the available actions without requiring a
    model of the environment.

    Q(s, a) ← Q(s, a) + α [r + γ max_{a'} Q(s', a') − Q(s, a)]

    where: s - current state, s' - next state, a - action, a' - action of
    the next state, r - immediate reward, α - learning rate, γ - discount
    factor, Q(s, a) - expected discounted reinforcement of taking action
    a in state s.
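A single application of the Q-learning update, using the quantities defined above (α the learning rate, γ the discount factor), can be sketched as follows. The table contents and the numbers are hypothetical.

```python
# One Q-learning update:
#   Q(s, a) <- Q(s, a) + alpha * (r + gamma * max_a' Q(s', a') - Q(s, a))

def q_update(Q, s, a, r, s_next, actions, alpha=0.5, gamma=0.9):
    """Move Q[(s, a)] toward the one-step lookahead target; return it."""
    best_next = max(Q[(s_next, a2)] for a2 in actions)  # max_a' Q(s', a')
    Q[(s, a)] += alpha * (r + gamma * best_next - Q[(s, a)])
    return Q[(s, a)]

actions = ["left", "right"]
Q = {(s, a): 0.0 for s in range(3) for a in actions}
Q[(1, "left")] = 2.0  # pretend state 1 already looks somewhat valuable

# Taking "right" in state 0 yields reward 1 and lands in state 1:
print(q_update(Q, 0, "right", 1.0, 1, actions))  # -> 0.5*(1 + 0.9*2.0) = 1.4
```

No model of the environment is needed: the update uses only the observed transition (s, a, r, s') and the current table.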

  • 11/21

    Q-Learning Algorithm

    The core of the algorithm is a value iteration update. For each state
    s from the state set S, and for each action a from A, we can
    calculate an update to its expected discounted reward with the
    following expression:

    Q(s, a) ← Q(s, a) + α [r + γ max_{a'} Q(s', a') − Q(s, a)]

    where r is an observed real reward, α is a convergence rate such that
    0 < α < 1, and γ is a discount rate such that 0 < γ < 1.

  • 12/21

    Begin
        For each s, a, initialize table entry Q(s, a) ← 0
        Observe the current state s
        Repeat:
            Select an action a and execute it
            Receive the immediate reward r
            Observe the new state s'
            Update the table entry:
                Q(s, a) ← Q(s, a) + α [r + γ max_{a'} Q(s', a') − Q(s, a)]
            s ← s'
    End
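Tabular Q-learning, with the table initialized as above, can be run end to end on a small example. The 4-state chain environment, the ε-greedy action selection, and the constants below are assumptions for illustration, not from the slides.

```python
# Tabular Q-learning on a hypothetical chain: states 0..3, actions
# left/right, reaching state 3 pays reward 1 and ends the episode.
import random

ACTIONS = ["left", "right"]
GOAL, ALPHA, GAMMA, EPSILON = 3, 0.5, 0.9, 0.2

def step(state, action):
    """Environment dynamics: return (next_state, reward)."""
    nxt = min(state + 1, GOAL) if action == "right" else max(state - 1, 0)
    return nxt, (1.0 if nxt == GOAL else 0.0)

def train(episodes=500, seed=0):
    rng = random.Random(seed)
    Q = {(s, a): 0.0 for s in range(GOAL + 1) for a in ACTIONS}  # init to 0
    for _ in range(episodes):
        s = 0
        for _ in range(10_000):  # step cap so an episode always terminates
            # Epsilon-greedy: mostly exploit the table, sometimes explore.
            if rng.random() < EPSILON:
                a = rng.choice(ACTIONS)
            else:
                a = max(ACTIONS, key=lambda x: Q[(s, x)])
            s2, r = step(s, a)
            best = max(Q[(s2, a2)] for a2 in ACTIONS)      # max_a' Q(s', a')
            Q[(s, a)] += ALPHA * (r + GAMMA * best - Q[(s, a)])
            s = s2
            if s == GOAL:
                break
    return Q

Q = train()
# The learned values approach gamma^k for k steps from the goal:
# Q(2, right) -> 1.0, Q(1, right) -> 0.9, Q(0, right) -> 0.81
print(Q[(2, "right")], Q[(1, "right")], Q[(0, "right")])
```

Note that the agent never uses the transition function directly when updating; it learns purely from observed (s, a, r, s') experience, as the definition of Q-learning above requires.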

  • 13/21

    Russell & Norvig's Q-learning algorithm, p. 614

    function Q-LEARNING-AGENT(e) returns an action
        static: Q, a table of action values
                N, a table of state-action frequencies
                a, the last action taken
                i, the previous state visited
                r, the reward received in state i

        j ← STATE[e]
        if i is non-null then
            N[a, i] ← N[a, i] + 1
            Q[a, i] ← Q[a, i] + α (r + max_{a'} Q[a', j] − Q[a, i])
        if TERMINAL?[e] then
            i ← null
        else
            i ← j; r ← REWARD[e]; a ← argmax_{a'} f(Q[a', j], N[a', j])
        return a
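The last line of the algorithm chooses an action through an exploration function f(Q, N), which trades greed against curiosity. A common choice (following Russell & Norvig) is optimism under uncertainty: treat any action tried fewer than N_e times as worth an optimistic reward R+. The concrete values below are hypothetical.

```python
# Exploration function used in a <- argmax_a' f(Q[a', j], N[a', j]).
R_PLUS = 2.0   # optimistic estimate of the best possible reward
N_E = 5        # try each state-action pair at least this many times

def f(u, n):
    """Return the optimistic value while under-explored, else the estimate u."""
    return R_PLUS if n < N_E else u

def choose_action(Q, N, j, actions):
    """a <- argmax_a' f(Q[(a', j)], N[(a', j)])."""
    return max(actions, key=lambda a: f(Q[(a, j)], N[(a, j)]))

actions = ["left", "right"]
Q = {("left", 0): 1.5, ("right", 0): 0.2}
N = {("left", 0): 10, ("right", 0): 2}   # "right" is still under-explored
print(choose_action(Q, N, 0, actions))   # -> right (f = 2.0 beats 1.5)
```

Once "right" has been tried N_E times, f falls back to the learned estimate 0.2 and the agent exploits "left" instead.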

  • 14/21

    Multi-agent systems learning (Stone & Veloso (1997))

    ML techniques can be directly applied in multi-agent systems.
    Multi-agent learning is more concerned with learning issues that
    arise because of the multi-agent aspect of a given domain.
    Multi-agent learning is learning that is done by several agents and
    that becomes possible only because several agents are present. For
    example, in football it involves learning those items that are
    related to group actions, or individual actions for the sake of the
    group (Stone & Veloso (1997)).

    Classification of agents (Stone and Veloso (1997)):
    homogeneous non-communicating;
    heterogeneous non-communicating; and
    heterogeneous communicating agents.

  • 15/21

    Homogeneous non-communicating agents

    Issues
    Reactive vs. deliberative agents
    Local or global perspective
    Modeling of other agents' states
    How to affect others

    Learning opportunities
    Enable others' actions
    Sensor data
    Other agents' sensor data

    Learning techniques and areas
    Reactive behaviors for formation maintenance. Balch [7]
    Local knowledge sometimes better. Roychowdhury [67]
    (Limited) Recursive Modeling Method (RMM). Durfee [24]
    Don't model others, just pay attention to reward. Schmidhuber [77]
    Stigmergy. Holland / Goldman and Rosenschein [39, 31]
    Q-learning for behaviors like foraging, homing, etc. Mataric [52]

  • 16/21

    Heterogeneous non-communicating agents

    Issues
    Benevolence vs. competitiveness
    Stable vs. evolving agents (arms race, credit/blame)
    Modeling of others' goals, actions, and knowledge
    Resource management (interdependent actions)
    Social conventions
    Roles

    Learning opportunities
    Credit/blame in competitive scenarios
    Behaviors that blend well with the team
    Prediction of others' actions
    Dynamic role assumption

  • 17/21

    Heterogeneous non-communicating agents

    Learning techniques and areas
    Game theory, iterative play. By Mor and Rosenschein / Sandholm and Crites
    Minimax-Q.
    Competitive coevolution. By Haynes and Sen / Grefenstette and Daley / Rosin and Belew
    Deduce intentions through observation. By Huber and Durfee
    Autoepistemic reasoning (ignorance). By Permpoontanalarp
    Model as a team (individual role). By Tambe
    Social reasoning: depend on others for goal (game theory). By Sichman and Demazeau
    GAs to deal with Braess' paradox (more resources, worse). By Arora and Sen
    Multi-agent RL for adaptive load balancing. By Schaerf, Shoham, and Tennenholtz
    Focal points / emergent conventions. By Fenster et al. / Walker and Wooldridge
    Design agents to play different roles. By Prasad et al.

  • 18/21

    Heterogeneous communicating agents

    Issues
    Understanding each other
    Planning communicative acts
    Benevolence vs. competitiveness
    Resource management (schedule coordination)
    Commitment/decommitment
    Truth in communication

    Learning opportunities
    Evolving language
    Effects of speech acts on global dynamics
    Communication utility and truthfulness
    Commitment utility

  • 19/21

    Heterogeneous communicating agents

    Learning and related techniques
    Language protocols: KIF for content (Genesereth and Fikes); KQML for message format (Finin et al.)
    Speech acts. By Cohen and Levesque / Lux and Steiner
    Learning social behaviors. By Mataric
    Bayesian learning in negotiation: model others. By Zeng and Sycara
    Multi-agent Q-learning. By Weiss
    Training other agents' Q-functions (track driving). By Clouse
    Minimize the need for training. By Potter and Grefenstette
    Cooperative coevolution. By Bull et al.
    Contract nets for electronic commerce. By Sandholm and Lesser
    Market-based systems. By Huberman and Clearwater
    Generalized Partial Global Planning (GPGP). By Decker
    Internal, social, and collective (role) commitments. By Castelfranchi
    Commitment states (potential, pre, and actual) as planning states. By Haddadi
    Belief/Desire/Intention (BDI) model: OASIS. By Rao and Georgeff
    BDI commitments only over intentions. By Rao and Georgeff
    Coalitions. By Zlotkin and Rosenschein / Shehory and Kraus / Sandholm and Lesser
    Reasoning about truthfulness. By Sandholm and Lesser / Rosenschein

  • 20/21

    A learning agent (adapted from Russell & Norvig, p. 526)

    [Figure: a learning agent. The Critic uses feedback from the sensors
    to judge how well the agent is doing; the Learning Element makes
    improvements to the agent's knowledge and sets learning goals for the
    Problem Generator, which suggests actions that may lead to new
    experiences; the Performance Element selects external actions, which
    the agent carries out in the Environment through its effectors.]

  • 21/21

    END OF SESSION (WEEK 11) EXERCISES

    1. Define the term learning.
    2. Describe how human beings learn.
    3. What is machine learning?
    4. Outline the various types of machine learning algorithms.
    5. What is reinforcement learning?
    6. What is Q-learning?
    7. Formulate algorithms for basic reinforcement learning and Q-learning.
    8. Describe learning in Multi-Agent Systems.