Rule-Based Reinforcement Learning Augmented by External Knowledge

Nicolas Bougie (1,2), Ryutaro Ichise (1)
(1) National Institute of Informatics
(2) The Graduate University for Advanced Studies, Sokendai

Abstract

Reinforcement learning has achieved several successes in sequential decision problems. However, these methods require a large number of training iterations in complex environments. A standard paradigm to tackle this challenge is to extend reinforcement learning to handle function approximation with deep learning. The lack of interpretability and the impossibility of introducing background knowledge limit their usability in many safety-critical real-world scenarios. In this paper, we study how to combine reinforcement learning and external knowledge. We derive a rule-based variant of the Sarsa(λ) algorithm, which we call Sarsa-rb(λ), that augments data with complex knowledge and exploits similarities among states. We apply our method to a trading task from the Stock Market Environment. We show that the resulting algorithm not only leads to much better performance but also improves training speed compared to the Deep Q-learning (DQN) algorithm and the Deep Deterministic Policy Gradients (DDPG) algorithm.

1 Introduction

Over the last few years, reinforcement learning (RL) has made significant progress in learning good policies in many domains. Well-known temporal difference (TD) methods such as Sarsa [Sutton, 1996] or Q-learning [Watkins and Dayan, 1992] learn to predict the best action to take through step-wise interactions with the environment. In particular, Q-learning has been shown to be effective in solving the traveling salesman problem [Gambardella and Dorigo, 1995] and in learning to drive a bicycle [Randløv and Alstrøm, 1998]. However, large or continuous state spaces limit the application of these methods to simple environments.

Recently, combining advances in deep learning and reinforcement learning has proved to be very successful in mastering complex tasks. A significant example is the combination of neural networks and Q-learning, resulting in "Deep Q-Learning" (DQN) [Mnih et al., 2013], which is able to achieve human performance on many tasks including Atari video games [Bellemare et al., 2013].

Learning from scratch and the lack of interpretability pose problems for deep reinforcement learning methods. Randomly initializing the weights of a neural network is inefficient. Furthermore, training the model is likely to be intractable in many domains because of the large amount of data required. Additionally, most RL algorithms cannot incorporate external knowledge, which limits their performance. Moreover, the impossibility of explaining and understanding the reason for a decision restricts their use to non-safety-critical domains, excluding for example medicine or law. One approach to tackle these problems is to combine simple reinforcement learning techniques and external knowledge.

A powerful recent idea to address the problem of computational expense is to modularize the model into an ensemble of experts [Lample and Chaplot, 2017], [Bougie and Ichise, 2017]. The task is divided into a sequence of stages and, for each one, a policy is learned. Since each expert focuses on learning one stage of the task, the reduction of the number of actions to consider leads to a shorter learning period. Although this approach is conceptually simple, it does not handle very complicated environments or environments with a large set of actions.

Another technique is called Hierarchical Learning [Tessler et al., 2017], [Barto and Mahadevan, 2003] and is used to solve complex tasks, such as "simulating human brain" [Lake et al., 2016]. It is inspired by human learning, which uses previous experiences to face new situations. Instead of learning the entire task directly, the agent learns different sub-tasks. By reusing knowledge acquired from the previous sub-tasks, learning becomes faster and easier. Its limitations are the need to re-train the model, which is time-consuming, and problems related to the catastrophic forgetting of knowledge about previous tasks. All the previously cited approaches suffer from a lack of interpretability, which reduces their usage in critical applications such as autonomous driving.

Another approach, Symbolic Reinforcement Learning [Garnelo et al., 2016], [d'Avila Garcez et al., 2018], combines a system that learns an abstracted representation of the environment with high-order reasoning. However, it has several limitations: it cannot support ongoing adaptation to a new environment and cannot handle external sources of prior knowledge.

This paper demonstrates that a simple reinforcement learning agent can overcome these challenges to learn control policies. Our model is trained with a variant of the Sarsa(λ) algorithm [Singh and Sutton, 1996]. We introduce external knowledge by representing the states as rules. Rules transform the raw data into a compressed, high-level representation. To deal with the problem of training speed and highly fluctuating environments [Dundar et al., ], we use a sub-states mechanism. Sub-states allow a more frequent update of the Q-values, thereby smoothing and speeding up the learning. Furthermore, we adapted eligibility traces, which turned out to be critical in guiding the algorithm to solve tasks.

In order to evaluate our method, we constructed a variety of trading environment simulations based on real stock market data. Our rule-based approach, Sarsa-rb(λ), can learn to trade in a small number of iterations. In many cases, we are able to outperform the well-known Deep Q-learning algorithm in terms of policy quality and training time. Sarsa-rb(λ) also exhibits higher performance than DDPG [Lillicrap et al., 2015] after converging.

The paper is organized as follows. Section 2 gives an overview of reinforcement learning. Section 3 describes the main contributions of the paper. Section 4 presents the experiments and the results. Section 5 presents the main conclusions drawn from the work.

2 Reinforcement Learning

Reinforcement learning consists of an agent learning a policy $\pi$ by interacting with an environment. At each time-step the agent receives an observation $s_t$ and chooses an action $a_t$. The agent gets feedback from the environment called a reward $r_t$. Given this reward and the observation, the agent can update its policy to improve the future rewards.

Given a discount factor $\gamma$, the future discounted reward, called the return $R_t$, is defined as follows:

$$R_t = \sum_{t'=t}^{T} \gamma^{t'-t} r_{t'} \tag{1}$$

where $T$ is the time-step at which the epoch terminates. The goal of reinforcement learning is to learn to select the action with the maximum return $R_t$ achievable for a given observation [Sutton and Barto, 1998]. From Equation (1), we can define the action value $Q^{\pi}(s, a)$ at a time $t$ as the expected return for selecting an action $a$ in a given state $s_t$ and following a policy $\pi$:

$$Q^{\pi}(s, a) = \mathbb{E}\left[ R_t \mid s_t = s, a \right] \tag{2}$$

The optimal policy is defined as selecting the action with the optimal Q-value, that is, the highest expected return, followed by an optimal sequence of actions. This obeys the Bellman optimality equation:

$$Q^{*}(s, a) = \mathbb{E}\left[ r + \gamma \max_{a'} Q^{*}(s', a') \mid s, a \right] \tag{3}$$

In temporal difference (TD) learning methods such as Q-learning or Sarsa, the Q-values are updated after each time-step instead of after each epoch, as happens in Monte Carlo learning.

2.1 Q-learning algorithm

Q-learning [Watkins and Dayan, 1992] is a common technique to approximate $\pi \approx \pi^{*}$. The estimation of the action value function is performed iteratively by updating $Q(s, a)$. This algorithm is considered an off-policy method since its update rule does not depend on the policy being followed:

$$Q(s_t, a_t) \leftarrow Q(s_t, a_t) + \alpha \left[ r_{t+1} + \gamma \max_{a} Q(s_{t+1}, a) - Q(s_t, a_t) \right] \tag{4}$$

The choice of the action follows a policy derived from $Q$. The most common policy, called the $\epsilon$-greedy policy, trades off exploration against exploitation. In the case of exploration, a random action is sampled, whereas exploitation selects the action with the highest estimated return. In order to converge to a stable policy, the probability of exploitation must increase over time. An obvious approach to adapting Q-learning to continuous domains is to discretize the state space, which leads to an explosion of the number of Q-values. Therefore, a good estimation of the Q-values in this context is often intractable.

2.2 Sarsa algorithm

Sarsa is a temporal difference (TD) control method. The key difference between Q-learning and Sarsa is that Sarsa is an on-policy method. This implies that the Q-values are learned based on the action performed by the current policy instead of a greedy policy. The update rule becomes:

$$Q(s_t, a_t) \leftarrow Q(s_t, a_t) + \alpha \left[ r_{t+1} + \gamma Q(s_{t+1}, a_{t+1}) - Q(s_t, a_t) \right] \tag{5}$$

Algorithm 1 Sarsa: Learn function Q : X × A → R

    procedure SARSA(X, A, R, T, α, γ)
        Initialize Q : X × A → R uniformly
        while Q is not converged do
            Start in state s ∈ X
            Choose a from s using policy derived from Q (e.g., ε-greedy)
            while s is not terminal do
                Take action a, observe r, s'
                Choose a' from s' using policy derived from Q (e.g., ε-greedy)
                Q(s, a) ← Q(s, a) + α · (r + γ · Q(s', a') − Q(s, a))
                s ← s'
                a ← a'
        return Q

Sarsa converges with probability 1 to an optimal policy as long as all the state-action pairs are visited an infinite number of times. Unfortunately, it is not possible to straightforwardly apply Sarsa learning to continuous or large state spaces. Such large spaces are difficult to explore since each state must be visited frequently to estimate its value accurately, resulting in an inefficient estimation of the Q-values.
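To make the difference between the off-policy update of Equation (4) and the on-policy update of Equation (5) concrete, the following is a minimal tabular sketch in Python. It is an illustration under stated assumptions, not the authors' implementation: the toy chain environment (`chain_step`), the function names, the random tie-breaking, and the hyperparameter values are all hypothetical choices made for this example, and Q is simply initialized to zero everywhere.

```python
import random
from collections import defaultdict


def epsilon_greedy(Q, state, actions, epsilon, rng):
    """Explore with probability epsilon, otherwise pick a highest-valued action."""
    if rng.random() < epsilon:
        return rng.choice(actions)
    best = max(Q[(state, a)] for a in actions)
    # Break ties at random so an all-zero table does not bias the agent.
    return rng.choice([a for a in actions if Q[(state, a)] == best])


def sarsa_update(Q, s, a, r, s_next, a_next, alpha, gamma):
    """On-policy target (Eq. 5): uses the action actually chosen in s_next."""
    Q[(s, a)] += alpha * (r + gamma * Q[(s_next, a_next)] - Q[(s, a)])


def q_learning_update(Q, s, a, r, s_next, actions, alpha, gamma):
    """Off-policy target (Eq. 4): uses the greedy value in s_next."""
    best = max(Q[(s_next, b)] for b in actions)
    Q[(s, a)] += alpha * (r + gamma * best - Q[(s, a)])


def chain_step(state, action, length=4):
    """Hypothetical toy chain: action 1 moves right, action 0 moves left.
    Reaching the last state yields reward 1 and ends the episode."""
    nxt = min(state + 1, length) if action == 1 else max(state - 1, 0)
    done = nxt == length
    return nxt, (1.0 if done else 0.0), done


def train_sarsa(episodes=500, alpha=0.1, gamma=0.9, epsilon=0.1, seed=0):
    """Tabular Sarsa following the structure of Algorithm 1."""
    rng = random.Random(seed)
    actions = [0, 1]
    Q = defaultdict(float)  # every Q-value starts at 0
    for _ in range(episodes):
        s = 0
        a = epsilon_greedy(Q, s, actions, epsilon, rng)
        done = False
        while not done:
            s_next, r, done = chain_step(s, a)
            a_next = epsilon_greedy(Q, s_next, actions, epsilon, rng)
            sarsa_update(Q, s, a, r, s_next, a_next, alpha, gamma)
            s, a = s_next, a_next
    return Q
```

Calling `train_sarsa()` returns the learned table; on this small chain the greedy action should end up being "move right" in every non-terminal state. The table holds one entry per state-action pair, which is exactly what becomes impractical in the large or continuous state spaces discussed above.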